On December 1st, 2025, a website went down because of an IndexError. That’s not surprising.
The surprising part is why it happened:
A data series that has delivered one new data point every month since 1913 suddenly skipped a month due to the 2025 United States federal government shutdown. That missing row was enough to cause an IndexError, resulting in a 92-minute outage.
The website
Total Real Returns is a website that lets you chart long-term stock / ETF / mutual fund performance, with two key features that differentiate it from conventional stock charts:
- Inflation-adjusted (“real”) returns
- Dividend-reinvested (“total”) returns
To adjust for inflation, it uses the Consumer Price Index (CPI) dataset published by the U.S. Bureau of Labor Statistics (BLS). This dataset includes:
- Monthly values back to 1913 (not seasonally adjusted)
- Monthly values back to 1947 (seasonally adjusted)
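The adjustment itself boils down to a ratio of CPI values. A minimal sketch (illustrative names and numbers, not the site’s actual code):
def inflation_adjust(nominal : Float64, cpi_then : Float64, cpi_now : Float64) : Float64
  nominal * (cpi_now / cpi_then)   # restate an old price in "today's" dollars
end

inflation_adjust(100.0, 9.8, 324.8)  # $100 in Jan 1913 ~= $3,314 in Sep 2025 dollars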
The dataset
Every month, the BLS publishes the CPI number for the prior month: for example, in mid-February, they publish the January number. (CNBC viewers will know this is dramatized with a “Breaking News” banner and a live readout by Rick Santelli.)
For 112 years, that series has looked straightforward, with one new row per month:
series_id    year  period  value    footnote_codes
CUUR0000SA0  1913  M01     9.8
CUUR0000SA0  1913  M02     9.8
CUUR0000SA0  1913  M03     9.8
CUUR0000SA0  1913  M04     9.8
CUUR0000SA0  1913  M05     9.7
CUUR0000SA0  1913  M06     9.8
CUUR0000SA0  1913  M07     9.9
CUUR0000SA0  1913  M08     9.9
CUUR0000SA0  1913  M09     10.0
...
CUUR0000SA0  2025  M01     317.671
CUUR0000SA0  2025  M02     319.082
CUUR0000SA0  2025  M03     319.799
CUUR0000SA0  2025  M04     320.795
CUUR0000SA0  2025  M05     321.465
CUUR0000SA0  2025  M06     322.561
CUUR0000SA0  2025  M07     323.048
CUUR0000SA0  2025  M08     323.976
CUUR0000SA0  2025  M09     324.800
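Parsing that flat file into a lookup table is simple enough; a hedged sketch (the local file name is an assumption, not necessarily what the site uses):
# Build {"2025 M09" => 324.8, ...} from the whitespace-separated rows above.
cpi = {} of String => Float64
File.each_line("cu.data.1.AllItems") do |line|
  cols = line.split
  next unless cols[0]? == "CUUR0000SA0"        # not-seasonally-adjusted, all items
  cpi["#{cols[1]} #{cols[2]}"] = cols[3].to_f  # key is "year period", e.g. "2025 M09"
end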
The assumption
The code assumed that on the first day of any month M, the CPI print for month M-2 would definitely be available. (For roughly the second half of each month, the newer M-1 value would be used instead, but that row was never assumed to be available, because the mid-month release date varies.)
For example, on December 1st, it was assumed that the October CPI value would be available, because it would have been published in mid-November.
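In code, the assumption looked roughly like this (a simplified, illustrative sketch, not the site’s exact implementation):
# One array element per month, in order, starting at January 1913.
def cpi_two_months_ago(values : Array(Float64), today : Time) : Float64
  wanted = today.shift(months: -2)                       # on Dec 1st: October
  index = (wanted.year - 1913) * 12 + (wanted.month - 1)
  values[index]  # IndexError if the series ends before that month, as it did on Dec 1st
end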
That assumption had been true for 112 years, across world wars, depressions, and pandemics… so it was never consciously registered as an assumption; it felt more like the definition of how the dataset worked.
The outage
On 2025-12-01, when the calendar flipped over to December, the HTTP checks for the homepage https://totalrealreturns.com/ started failing. After 15 minutes of continuous failures, HeyOnCall opened an incident and sent a Critical Alert.
The logs said:
Out of range (IndexError)
  from /app/src/inflation_adjuster.cr:50:5 in 'date='
  from /app/src/inflation_adjuster.cr:96:5 in 'normalize_to_date'
  from /app/src/asset_price.cr:65:21 in '->'
On December 1st, the code assumed the October 2025 CPI row existed, and tried to index into the time series for “two months ago”.
But there was no row for October 2025, so IndexError, and the homepage went down.
The missing month
It turns out that the BLS couldn’t publish the October 2025 CPI data. The official BLS explanation was:
“BLS could not collect October 2025 reference period survey data due to a lapse in appropriations. BLS is unable to retroactively collect these data.”
[…]
“BLS will publish the November 2025 CPI news release on December 18, 2025. This news release and database update will not include 1-month percent changes for November 2025 where the October 2025 data are missing.”
So the October 2025 value simply does not (and will not) exist in the data. The dataset broke the “one row per month” assumption that has held since 1913.
The fix
To get the site back up and running, the calculation was quickly adjusted to use the average CPI-U inflation rate over the prior 12 months to extrapolate a missing trailing datapoint. (And when the November CPI-U number is published in mid-December, it will interpolate between the published September and November values.)
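Roughly like this (an illustrative sketch of the approach, with made-up names, not the exact patch):
# Extrapolate a missing trailing month from the average month-over-month
# change (geometric mean) across the prior 12 published months.
def extrapolate_next(values : Array(Float64)) : Float64
  window = values.last(13)                         # 13 values span 12 monthly changes
  avg_ratio = (window.last / window.first) ** (1.0 / 12)
  values.last * avg_ratio
end

# Once the November print lands, the October gap can be interpolated between
# the published September and November values instead, e.g. geometrically:
def interpolate_gap(september : Float64, november : Float64) : Float64
  Math.sqrt(september * november)
end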
After this change was deployed, the site came back up, ending the 92-minute outage (status page link).
Couldn’t this outage have been avoided?
For this specific incident, yes:
- BLS did publish the shutdown note, so some workaround for October 2025 could have been implemented in advance.
- There were unit tests covering the parsing of the CPI data, but none of those specs tested the case of a missing month.
- Or the code could have been architected more resiliently, continuing to serve the November 30th version of the homepage if the December 1st data update failed (sketched below).
All reasonable ideas.
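The third idea, for instance, could be as simple as only swapping in new data after a refresh fully succeeds (an illustrative sketch, not the site’s actual architecture):
require "log"

# Pretend loader: download + parse the BLS file. On 2025-12-01 this is where
# the missing-October IndexError would surface.
def load_latest_cpi : Array(Float64)
  raise IndexError.new("Out of range")
end

# Last successfully loaded data (through September 2025).
cpi_data = [321.465, 322.561, 323.048, 323.976, 324.8]

begin
  cpi_data = load_latest_cpi   # only replaces the old data if loading succeeds
rescue ex
  Log.error(exception: ex) { "CPI refresh failed; keeping yesterday's data" }
end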
The bigger picture is where it gets more interesting:
- Can you prevent some particular failure? Often, yes.
- Can you prevent every weird failure like this in advance? Probably not.
The moment your data pipeline depends on a third-party dataset, API, dependency, etc., you’re running a distributed system. And in particular, a distributed system where some nodes belong to strangers with different incentives, constraints, budgets, and management. Even when that supplier has been reliably delivering a particular dataset for over a century, it turns out that there can still be surprises!
Invariants, assumptions, and defensive programming
The interesting part is that there are so many silent assumptions baked into any piece of software:
- “This value will never be null.”
- “This table always has exactly one row per month.”
- “The denominator will never be zero.”
- “Nobody will ever run this function twice at the same time.”
- “No web server will trickle out a response a few bytes at a time with seconds between them.”
- “This graph will never have cycles.”
- “This array won’t be mutated while we’re iterating over it.”
- “This file will never be larger than 2^32 - 1 bytes.”
- “This API will always return valid JSON.”
It takes experience, skill, and effort to recognize and defend against each of these.
And it’s a tradeoff: that defense can cost a lot of engineering time and energy spent guarding against situations that will never happen, plus all the future time spent reviewing and maintaining the code that handles those rare edge cases. That edge-case-handling code also inflates the codebase and may itself introduce new bugs! Taken to the limit, extremely defensive programming can seriously slow down your team’s velocity.
But even defensive programming can’t save you from the “unknown unknowns”: the potential failures you haven’t imagined. A failure mode can stay unimagined or deprioritized because:
- You’ve never heard of that particular failure mode before, or
- The assumption you’ve made is so deeply ingrained, or its failure seems so unlikely, that you never consciously register it as an assumption that simply hasn’t failed yet.
You can’t defend against everything: your team wouldn’t be able to ship anything. But you can try to:
- Remember that there are assumptions.
- If easy/cheap to evaluate, add them as runtime assertions.
- If not, add them as comments for the future.
- Remember that even after you’ve listed these assumptions, there are still more you haven’t considered.
- Fail loudly.
- Have some way to notice when reality doesn’t line up.
This outage was, in fact, a simple runtime assertion firing:
raise IndexError.new("Out of range") if val < @min_date || val > @max_date
The real lesson
Data pipelines are fragile. Apparently, even when you have a monthly dataset going back over 100 years, the assumption that there’s going to be a new row every month can bite you. :)
But it’s not just data pipelines: almost all software is fragile. Software is built on a pile of assumptions, and they’re not all obvious or practical to defend against. Being “too defensive” has real costs!
So what can we do about this fragility? We put smart humans back in the feedback loop.
A 1913-era assumption finally broke after 112 years. Monitoring picked it up and sent alerts once the configured 15-minute failure threshold was crossed. And after a good laugh at the situation, the site was back up after a 92-minute outage.
That feedback loop is where setting up something like HeyOnCall pays off: by making sure you or your team finds out when one of those invisible assumptions finally breaks.
When you step back and consider all the necessary conditions, all that fragility, it’s hard not to feel some gratitude and astonishment that despite all that, most of what we build mostly works… most of the time! ;-)