Like "Trim Balance History", but lossless

I’ve used Tiller since January and I already have 18,252 rows in “Balance History”. Looking at the entries, there are a lot that can be safely removed without loss of information:

  • I have a few credit cards that I never use, so the balance on them is always $0. For one credit card there are 486 entries since Jan 2024! Everything but the first and last balance can be safely deleted, and any balance tracker can infer that the balance is the same all along.
  • Similarly, I have a couple of barely active bank accounts that do have activity, and again the balance stays unchanged for long periods of time. Everything except the first and the last entries in a run of equal balances can be deleted.
  • Even some investment accounts (retirement, 529, etc) have slow-moving balances that stay unchanged for days.

A tool that deletes runs of equal value for accounts (and keep only the first and last entry) would greatly reduce the size of by balance history, without losing any information.

Please let me know of any existing code that does that. Thanks!

2 Likes

This type of efficiency is fantastic. I don’t have the answer as I am even newer to Tiller than you are, but this makes a ton of sense.

To help the situation, I wrote the following formula and put it in ‘Balance History’!A1 (not guaranteed to work, use at your own risk, etc etc):

=let(
  lut, SORT(map(F2:F, B2:B, C2:C, I2:I, lambda(id, d, t, b, { id & TEXT(d, "yyyy-mm-dd") & TEXT(t, "hh:mm:ss"), b, id })), 1, TRUE),
  { image("https://storage.googleapis.com/assets.templates.tillermoney.com/Tiller_Icon_Reverse%402x.png");
    map(F2:F, B2:B, C2:C, I2:I, lambda(id, d, t, b,
      let(pkey, id & TEXT(d, "yyyy-mm-dd") & TEXT(t-time(0, 0, 1), "hh:mm:ss"),
        ppos, MATCH(pkey, INDEX(lut, 0, 1), 1),
        nkey, id & TEXT(d, "yyyy-mm-dd") & TEXT(t+time(0, 0, 1), "hh:mm:ss"),
        npos, 1+MATCH(nkey, INDEX(lut, 0, 1), 1),
        prevBalance, if(index(lut, ppos, 3)=id, index(lut, ppos, 2), 1000000000),
        nextBalance, if(index(lut, npos, 3)=id, index(lut, npos, 2), 1000000000),
        IF(AND(prevBalance=b, iferror(b=nextBalance, false)), "*", "")
      )
    ))
  }
)

The implementation creates a lookup table by account ID, date, and time, and then uses it for binary searches for each row in the sheet.

Each position that is redundant will receive an asterisk in column A. Then after inspection the rows with an asterisk can be filtered out or removed.

1 Like

Hi @andrei,

Thanks for your post and formula. To clarify, the formula is for Google Sheets, correct?

Yes, sorry for not being specific. I’d assume it can be ported with Excel with ease. In fact, I just asked ChatGPT o1-preview to translate the fomula to Excel and here’s what it generated (untested):

=LET(
  lut, SORT(
    MAP(F2:F, B2:B, C2:C, I2:I,
      LAMBDA(id, d, t, b,
        HSTACK(id & TEXT(d, "yyyy-mm-dd") & TEXT(t, "hh:mm:ss"), b, id)
      )
    ), 1, TRUE
  ),
  VSTACK(
    IMAGE("https://storage.googleapis.com/assets.templates.tillermoney.com/Tiller_Icon_Reverse%402x.png"),
    MAP(F2:F, B2:B, C2:C, I2:I,
      LAMBDA(id, d, t, b,
        LET(
          pkey, id & TEXT(d, "yyyy-mm-dd") & TEXT(t - TIME(0, 0, 1), "hh:mm:ss"),
          ppos, MATCH(pkey, INDEX(lut, ,1), 1),
          nkey, id & TEXT(d, "yyyy-mm-dd") & TEXT(t + TIME(0, 0, 1), "hh:mm:ss"),
          npos, 1 + MATCH(nkey, INDEX(lut, ,1), 1),
          prevBalance, IF(INDEX(lut, ppos, 3) = id, INDEX(lut, ppos, 2), 1000000000),
          nextBalance, IF(INDEX(lut, npos, 3) = id, INDEX(lut, npos, 2), 1000000000),
          IF(AND(prevBalance = b, IFERROR(b = nextBalance, FALSE)), "*", "")
        )
      )
    )
  )
)

Thanks for the confirmation. I’m not a script writer but it looks like the ChatGPT Excel formula is referencing a google storage location in the image statement. I assume that would not work for the Excel template, correct?

Hi @andrei - Your post is very interesting to me because I just embarked on a similar clean-up of my Balance History last week along what appears to be the very similar line of logic. I made a back-up of the Balance History first just in case and then removed a couple thousand redundant lines. I did it one account at a time just to kind of double-check that things were making sense as I went along, but it can be done all at once as well.

I’m posting a screenshot below along with the main formulas behind the companion sheet that I created to manage the purge operation.

The left panel of columns lists out the unique accounts, the number of rows of balance history of each, and the number of lines that could be purged.

I added some conditional formatting to highlight if there was any redundancy of a same account number using different names - I had one account like this where I had changed the name over time. I also used a conditional format to gray out the text where there were zero lines left to remove.

The middle panel of columns lists the balance history and possible duplicates for a single/individual account only, based on the account selected in cell C2.

The last panel of columns does the data check on all accounts and all balance history items. I did it this way because my method relies on the data being sorted by account and then date. This is critical to flagging the transactions using formulas 3 and 5. And I didn’t want to mess with the sorting in the Balance History sheet itself.

I realize this creates another lengthy sheet in the workbook, but after finishing you can hide the sheet or click cell A1 to Deactivate the formulas and delete most of the rows in this sheet. Then it can be fired up again later when you want to do another purge.

Here is the companion sheet screenshot:

Here are the main formulas corresponding to the numbers in the example screenshot.

1 -

=LET(bsheet, "Balance History!", acctid_col, REGEXEXTRACT(ADDRESS(1,MATCH("Account ID",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"), acctname_col, REGEXEXTRACT(ADDRESS(1,MATCH("Account",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"),
acctlist,UNIQUE({INDIRECT(bsheet&acctid_col&"2:"&acctid_col),INDIRECT(bsheet&acctname_col&"2:"&acctname_col)}), acctitems,MAP(CHOOSECOLS(acctlist,1),LAMBDA(acctid,IF(acctid<>"",COUNTIF(INDIRECT(bsheet&acctid_col&"2:"&acctid_col),acctid),))),sortedaccts,QUERY({acctlist,acctitems},"select * order by Col3 desc",0),{"Account","Account Name","# Rows";sortedaccts})

2 -

=MAP(B4:B,LAMBDA(acctid,IF(acctid<>"",COUNTIFS($AA$4:$AA,acctid,$X$4:$X,"x"),)))

3 -

=MAP(I4:I,N4:N,LAMBDA(datevals,balvals,IF(OFFSET(datevals,1,0)<>"",IF(AND(OFFSET(datevals,1,0)=datevals,OFFSET(balvals,1,0)=balvals),"x",IF(OR(OFFSET(balvals,1,0)<>balvals,OFFSET(balvals,-1,0)<>balvals),,"x")),)))

4 -

=IF($A$1=TRUE,,LET(bsheet, "Balance History!", acctid_col, REGEXEXTRACT(ADDRESS(1,MATCH("Account ID",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"), acctname_col, REGEXEXTRACT(ADDRESS(1,MATCH("Account",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"), acctname_colnum, MATCH("Account",INDIRECT(bsheet&"1:1"),0),
date_colnum, MATCH("Date",INDIRECT(bsheet&"1:1"),0), time_colnum, MATCH("Time",INDIRECT(bsheet&"1:1"),0),
acct_rng, SORT(QUERY({INDIRECT(bsheet&"A2:"&acctid_col),ARRAYFORMULA(ROW(INDIRECT(bsheet&acctid_col&"2:"&acctid_col)))},"select * where Col"&acctname_colnum&" ="""&C2&"""",0),date_colnum,TRUE,time_colnum,TRUE),
acct_rng))

5 -

=MAP(Y4:Y,AB4:AB,AA4:AA,LAMBDA(datevals,balvals,acctvals,IF(OFFSET(datevals,1,0)<>"",IF(OFFSET(acctvals,1,0)=acctvals,IF(AND(OFFSET(datevals,1,0)=datevals,OFFSET(balvals,1,0)=balvals),"x",IF(OR(OFFSET(balvals,1,0)<>balvals,OFFSET(balvals,-1,0)<>balvals),,"x")),),)))

6 -

=IF($A$1=TRUE,,LET(bsheet, "Balance History!", acctid_col, REGEXEXTRACT(ADDRESS(1,MATCH("Account ID",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"), acctbal_col, REGEXEXTRACT(ADDRESS(1,MATCH("Balance",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"),
date_col, REGEXEXTRACT(ADDRESS(1,MATCH("Date",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"), time_col, REGEXEXTRACT(ADDRESS(1,MATCH("Time",INDIRECT(bsheet&"1:1"),0),4),"[A-Z]+"),
acct_rng, SORT(QUERY({INDIRECT(bsheet&date_col&"2:"&date_col),INDIRECT(bsheet&time_col&"2:"&time_col),INDIRECT(bsheet&acctid_col&"2:"&acctid_col),INDIRECT(bsheet&acctbal_col&"2:"&acctbal_col),ARRAYFORMULA(ROW(INDIRECT(bsheet&acctid_col&"2:"&acctid_col)))},"select *",0),3,TRUE,1,TRUE,2,TRUE),
acct_rng))

The last step is that I added a column to the far right in the Balance History sheet to hold the flag that will allow you to identify the lines for purging. It can be filtered upon like your previous stated solution. I added a variable called Type at the beginning that can be set with a 0, 1, or 2 to turn the formula off or run it against the individual account or all accounts. The formula contains a text reminder for what each of those numbers means. The header label will show you how many are flagged and whether you are running it on an individual account or all.

Here is the screenshot of that column in the Balance History sheet:

Here is the formula shown in O1:

=LET(type, 1, typekey, "0 = OFF, 1 = Individual, 2 = All", indvacct1, "G", indvacct2, "V", allacct1, "X", allacct2, "AC", bhs, "BalHistCleaner!",
markarray,IF(type=0,,ARRAYFORMULA(IFNA(XLOOKUP(ROW(A2:A),IF(type=1,INDIRECT(bhs&"$"&indvacct2&"$4:$"&indvacct2),INDIRECT(bhs&"$"&allacct2&"$4:$"&allacct2)),IF(type=1,INDIRECT(bhs&"$"&indvacct1&"$4:$"&indvacct1),INDIRECT(bhs&"$"&allacct1&"$4:$"&allacct1))),))),{"Marker ("&COUNTA(markarray)&") - "&IF(type=0,"OFF",IF(type=1,"INDV","ALL"));markarray})

This formula relies on setting the column letters from the companion sheet corresponding with the middle and right section Remove flags and the ROWNUM columns (G, V, X, and AC in my set-up). So if you don’t have the same columns, it may require an adjustment there. You may also need to change the name of your companion sheet for the variable bhs.

Once the flags are set, it can be filtered and those rows can be deleted.

Feel free to let me know if anyone tries it out and any questions or feedback.

2 Likes

You mean IMAGE("https://storage.googleapis.com/assets.templates.tillermoney.com/Tiller_Icon_Reverse%402x.png")? That’s nothing to worry about - it’s just a formula to fetch an image off the Net. The fact that Tiller stores it in Google storage is not relevant. All you need is the IMAGE function, which Excel does support.

1 Like

Nice idea and thanks for sharing that formula.

I got two #N/A results:
Did not find value ‘655bd78af3667f002f26df252023-11-2022:01:59’ in MATCH evaluation.
Did not find value ‘655bd78af3667f002f26df252023-11-2022:01:59’ in MATCH evaluation.

It’s the first two (oldest) entries for that account.
The Time cell is: 10:02:00 PM , so one second off from 22:01:59 in the error message.
Any ideas?

image

1 Like

I have taken a simpler, albeit manual, approach. At the end of each month I manually delete (rows in the spreadsheet) all the balance history entries from the 3rd to the 29th. This makes sure I have preserved beginning and end of month balances in case I want to use them, but reduces the total number of entries significantly.

1 Like

I want to make sure community members seeing this thread know there is a built-in Community tool to “Trim Balance History.” Your title of the thread implies you know of it but want something different, but just in case…

Go to Extensions > Tiller Community Solutions > Tools > Trim Balance History.

3 Likes

@andrei this would be a good Show & Tell topic.

I did some testing with it and thought there could be an issue with reports that reference a time period with no balance entries for an account, like a QUERY between start/end dates … and do think that is a possible downside. It may be lossless wrt the Balance History sheet, but reporting could lose info if it doesn’t account for missing account balances within a time period.

The Net Worth sheet seems to handle this just fine, though, and will use the balance before the selected time period.

In my testing, I found this lossless formula to produce a 70% decrease in Balance History rows.
Trim Balance History with weekly frequency was similarly a 72% decrease.

So, Trim Balance History may lose some daily balances, but retaining weekly accuracy is quite good IMO, if a person isn’t concerned with reconciling financial statement details.

One nice advantage to this lossless formula over Trim Balance History, is that it would be easy to delete only older rows, say older than one year, where Trim Balance History doesn’t have that optional control.

2 Likes

Thanks very much for taking a look. Indeed the formula had a couple of bugs, here’s what I have right now (it only does one lookup per element so it should be simpler and more efficient):

=let(
  lut, SORT(map(F2:F, B2:B, C2:C, I2:I, lambda(id, d, t, b, { id & TEXT(d, "yyyy-mm-dd") & TEXT(t, "hh:mm:ss"), b, id })), 1, TRUE),
  r, map(F2:F, B2:B, C2:C, I2:I, lambda(id, d, t, b,
      if(and(id=ABOVE(id), d=ABOVE(d), t=ABOVE(t), b=ABOVE(b)), "🟰",
        let(key, id & TEXT(d, "yyyy-mm-dd") & TEXT(t, "hh:mm:ss"),
          pos, MATCH(key, INDEX(lut, 0, 1), 1),
          prevBalance, if(and(pos>1, index(lut, pos-1, 1)<>key, index(lut, pos-1, 3)=id), index(lut, pos-1, 2), 1000000000),
          nextBalance, if(pos<rows(lut), if(and(index(lut, pos+1, 1)<>key, index(lut, pos+1, 3)=id), index(lut, pos+1, 2), 1000000000), 1000000000),
          IF(AND(prevBalance=b, b=nextBalance), "⬍", "")
        )
      ))
    ),
  dupes, counta(filter(r, r<>"")),
  { if(dupes, dupes, image("https://storage.googleapis.com/assets.templates.tillermoney.com/Tiller_Icon_Reverse%402x.png")); r }
)

It also detects (and marks a different way) the sheer duplicates, by using a named function called ABOVE(r) (which is useful elsewhere, too) with the definition:

=offset(r, -1, 0, 1, columns(r))

BTW I think Tiller’s implementation could define and use named functions - they’d make a lot of things a lot simpler.

2 Likes

Thanks for sharing the updated formulas. I confirmed the #N/A 's are gone now :+1:

FYI - the "🟰" (? inside a box) doesn’t come through for me (Chrome on Windows 10).
Unless the intended result is an X inside a box.
Your code:


After I copy/paste into sheets:

Result:
image

2 Likes