Hi @andrei - Your post is very interesting to me because I just embarked on a similar clean-up of my Balance History last week along what appears to be the very similar line of logic. I made a back-up of the Balance History first just in case and then removed a couple thousand redundant lines. I did it one account at a time just to kind of double-check that things were making sense as I went along, but it can be done all at once as well.
I’m posting a screenshot below along with the main formulas behind the companion sheet that I created to manage the purge operation.
The left panel of columns lists out the unique accounts, the number of rows of balance history of each, and the number of lines that could be purged.
I added some conditional formatting to highlight if there was any redundancy of a same account number using different names - I had one account like this where I had changed the name over time. I also used a conditional format to gray out the text where there were zero lines left to remove.
The middle panel of columns lists the balance history and possible duplicates for a single/individual account only, based on the account selected in cell C2.
The last panel of columns does the data check on all accounts and all balance history items. I did it this way because my method relies on the data being sorted by account and then date. This is critical to flagging the transactions using formulas 3 and 5. And I didn’t want to mess with the sorting in the Balance History sheet itself.
I realize this creates another lengthy sheet in the workbook, but after finishing you can hide the sheet or click cell A1 to Deactivate the formulas and delete most of the rows in this sheet. Then it can be fired up again later when you want to do another purge.
Here is the companion sheet screenshot:
Here are the main formulas corresponding to the numbers in the example screenshot.
1 -
=LET(bsheet, "Balance History!", acctid_col, REGEXEXTRACT(ADDRESS(1,MATCH("Account ID",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"), acctname_col, REGEXEXTRACT(ADDRESS(1,MATCH("Account",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"),
acctlist,UNIQUE({INDIRECT(bsheet&acctid_col&"2:"&acctid_col),INDIRECT(bsheet&acctname_col&"2:"&acctname_col)}), acctitems,MAP(CHOOSECOLS(acctlist,1),LAMBDA(acctid,IF(acctid<>"",COUNTIF(INDIRECT(bsheet&acctid_col&"2:"&acctid_col),acctid),))),sortedaccts,QUERY({acctlist,acctitems},"select * order by Col3 desc",0),{"Account","Account Name","# Rows";sortedaccts})
2 -
=MAP(B4:B,LAMBDA(acctid,IF(acctid<>"",COUNTIFS($AA$4:$AA,acctid,$X$4:$X,"x"),)))
3 -
=MAP(I4:I,N4:N,LAMBDA(datevals,balvals,IF(OFFSET(datevals,1,0)<>"",IF(AND(OFFSET(datevals,1,0)=datevals,OFFSET(balvals,1,0)=balvals),"x",IF(OR(OFFSET(balvals,1,0)<>balvals,OFFSET(balvals,-1,0)<>balvals),,"x")),)))
4 -
=IF($A$1=TRUE,,LET(bsheet, "Balance History!", acctid_col, REGEXEXTRACT(ADDRESS(1,MATCH("Account ID",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"), acctname_col, REGEXEXTRACT(ADDRESS(1,MATCH("Account",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"), acctname_colnum, MATCH("Account",INDIRECT(bsheet&"1:1"),0),
date_colnum, MATCH("Date",INDIRECT(bsheet&"1:1"),0), time_colnum, MATCH("Time",INDIRECT(bsheet&"1:1"),0),
acct_rng, SORT(QUERY({INDIRECT(bsheet&"A2:"&acctid_col),ARRAYFORMULA(ROW(INDIRECT(bsheet&acctid_col&"2:"&acctid_col)))},"select * where Col"&acctname_colnum&" ="""&C2&"""",0),date_colnum,TRUE,time_colnum,TRUE),
acct_rng))
5 -
=MAP(Y4:Y,AB4:AB,AA4:AA,LAMBDA(datevals,balvals,acctvals,IF(OFFSET(datevals,1,0)<>"",IF(OFFSET(acctvals,1,0)=acctvals,IF(AND(OFFSET(datevals,1,0)=datevals,OFFSET(balvals,1,0)=balvals),"x",IF(OR(OFFSET(balvals,1,0)<>balvals,OFFSET(balvals,-1,0)<>balvals),,"x")),),)))
6 -
=IF($A$1=TRUE,,LET(bsheet, "Balance History!", acctid_col, REGEXEXTRACT(ADDRESS(1,MATCH("Account ID",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"), acctbal_col, REGEXEXTRACT(ADDRESS(1,MATCH("Balance",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"),
date_col, REGEXEXTRACT(ADDRESS(1,MATCH("Date",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"), time_col, REGEXEXTRACT(ADDRESS(1,MATCH("Time",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"),
acct_rng, SORT(QUERY({INDIRECT(bsheet&date_col&"2:"&date_col),INDIRECT(bsheet&time_col&"2:"&time_col),INDIRECT(bsheet&acctid_col&"2:"&acctid_col),INDIRECT(bsheet&acctbal_col&"2:"&acctbal_col),ARRAYFORMULA(ROW(INDIRECT(bsheet&acctid_col&"2:"&acctid_col)))},"select *",0),3,TRUE,1,TRUE,2,TRUE),
acct_rng))
The last step is that I added a column to the far right in the Balance History sheet to hold the flag that will allow you to identify the lines for purging. It can be filtered upon like your previous stated solution. I added a variable called Type at the beginning that can be set with a 0, 1, or 2 to turn the formula off or run it against the individual account or all accounts. The formula contains a text reminder for what each of those numbers means. The header label will show you how many are flagged and whether you are running it on an individual account or all.
Here is the screenshot of that column in the Balance History sheet:
Here is the formula shown in O1:
=LET(type, 1, typekey, "0 = OFF, 1 = Individual, 2 = All", indvacct1, "G", indvacct2, "V", allacct1, "X", allacct2, "AC", bhs, "BalHistCleaner!",
markarray,IF(type=0,,ARRAYFORMULA(IFNA(XLOOKUP(ROW(A2:A),IF(type=1,INDIRECT(bhs&"$"&indvacct2&"$4:$"&indvacct2),INDIRECT(bhs&"$"&allacct2&"$4:$"&allacct2)),IF(type=1,INDIRECT(bhs&"$"&indvacct1&"$4:$"&indvacct1),INDIRECT(bhs&"$"&allacct1&"$4:$"&allacct1))),))),{"Marker ("&COUNTA(markarray)&") - "&IF(type=0,"OFF",IF(type=1,"INDV","ALL"));markarray})
This formula relies on setting the column letters from the companion sheet corresponding with the middle and right section Remove flags and the ROWNUM columns (G, V, X, and AC in my set-up). So if you don’t have the same columns, it may require an adjustment there. You may also need to change the name of your companion sheet for the variable bhs.
Once the flags are set, it can be filtered and those rows can be deleted.
Feel free to let me know if anyone tries it out and any questions or feedback.