Comparing near-duplicate sheets for unique records

Good morning, afternoon, evening. I hope you are in fine fettle today.

I had this year’s Tiller Google sheet open alongside last year’s (doing tax prep). I mistakenly uploaded lots of transactions data forJanuary through mid-March 2024 into the 2023 Transactions sheet.

I’ve noticed that the number of records (rows) in the 2023 Transactions sheet for the period 01 January through mid-March 2024 exceeds the number of records in toe 2024 Transactions sheet over the same period. I suppose this could be due to a combination of factors: (1) I [mistakenly] did the “Fill Sheets” action in the 2023 spreadsheet more times than I did in the 2024 spreadsheet; and/or (2) I have done the “Reconcile transactions” action in the 2024 spreadsheet, thereby reducing the number of records (rows) in that spreadsheet.

What is a good way to compare the 2023 and 2024 Transactions sheets only for the relevant period with the following goals?
(1) In the relevant period, identify every record in the 2023 sheet that does not appear in the 2024 sheet.
(2) Move or copy any valid records that are in 2023 but missing from 2024 into the 2024 Transactions sheet.

Thank you.

I had a similar need. How I accomplished it was to save copies of the two transaction lists to a new sheet (better safe than sorry ;-). Then I setup a check column using a arrayformula vlookup to check the transactionid’s on the smaller transaction list against the longer transaction list. All that had “#n/a” errors were the ones that were missing so easy enough to filter on that and cut/paste them into my sheet with the missing transactions (or if you want to get fancy us an iferror to mark them with an “x” ;-).

Hmm… I may have spoken too soon. Can you direct me to a post on how to set up a check column, how to use an array formula, and what is an array formula anyway?
Thank you.

An arrayformula applies a formula across multiple columns or rows vs. having to copy a formula into each cell in the column - ARRAYFORMULA - Google Docs Editors Help

For the purposes of this example, the tab with your excess transactions will be “Check” and your other one will be “Transactions”. Put this formula into the A2 cell of “Check”

=arrayformula(if(Len(iferror(vlookup(K2:K,Check!K:K,1,FALSE),)),“X”))

It will put an “X” in the A column for every transaction where the TransactionID is not found on the “Check” tab

1 Like

Thank you so much!
Greg Corning

1 Like

Dear rgerrans,
I am sorry to keep bugging you. It’s just that I still don’t understand.
I have two separate Google Sheets files–one for 2023, and one for 2024. They are both built on the Tiller Google Sheets template.
In the following description, “transactions” are always on the “Transactions” tab (also called “Transactions sheet”).
The 2024 Google Sheets file has transactions from 01 January 2024 to 29 March 2024, but I am only interested in the transactions up to 10 March.
The 2023 Google Sheets file has transactions for the entire year of 2023 PLUS transactions from 01 January 2024 to 10 March 2024 (because I accidentally made new entries for that period in the wrong Google Sheets file- 2023).
The list of transactions in the 2023 Google Sheets file from 01 January 2024 to 10 March 2024 is longer than the list of transactions in the 2024 Google Sheets for the same period.
In order to discover which transactions in the 2023 Google Sheets file are missing from the 2024 Google Sheets file, you say I need to create a new tab or sheet, copy the transactions to that new sheet, and name the new sheet “Check”.
In which of the existing Google Sheets files do I create the new tab called “Check”? In the 2023 one with the longer list, or the 2024 one?
Or do I create an entirely new Google Sheets file for this comparison? (I tried this and the formula returned “Formula parse error”.)
Thank you.

Your “check” column should be on which ever tab has the extra transactions. The sheet doesn’t really matter, just copy the data over to a new “check” tab in the same sheet as the one with the extra data.

Thank you.
I did the steps and the result is #ERROR. A pop-up says “Error. Formula parse error.”
I think it’s time I let this go. You’ve got better things to do with your generously shared time.
Again, thank you.

1 Like

That means there’s a typo in your formula

Aha! I see now that the formula is based on column K. In my sheets, the column for transaction ID is different. So I have to change the “K” to “L”, or whatever column transaction ID is kept.
Thank you.

Oops–still returns the same error message. What a sticky fiddley problem!

1 Like

i’ve skimmed this thread, and I think the issue here is your level of expertise with the solutions given. Perhaps try this simpler approach…
(1) copy the data in question to another spreadsheet. use a different sheet/tab for each set of data that you believe are duplicates.
(2) in a 3rd sheet/tab use this formula… =Sheet1!A1=Sheet2!A1 … where Sheet1 is one data set and sheet2 is another data set, and both datasets start at cell A1
(3) then copy this formula down far enough to refer to the number of rows in each sheet, and then copy it right far enough to refer to the number of columns in each sheet.
(4) the formula result will read “TRUE” if the cell contents are identical, and “FALSE” otherwise.
(5) this method presupposes the columns are in the same order and the rows are sorted the same.

Thank you. I’ll save this workflow.