Today, we improved the APY display on our LST page. Which is great – but there’s a much larger story to tell with this update!
The old display:

The new display:

It's a lot clearer now and hides outliers, making the graph a lot more useful. There's a proper y-axis. You can scroll back and forth, zoom in, it's excellent. Go check it out! But what I actually want to talk about isn't this. I want to talk about the work we did building a generalised automation system that made new developments go from days to hours, and finally put a smile on long-suffering HY's face.
This frontend change hides a big infrastructure improvement under the hood of Sanctum.
There's always been a lot of backend work in Sanctum's LST service. On the LST side alone, we need to constantly monitor and measure the APY of every single LST. We need to constantly check for deposits into each LST and stake them to validators. At the start of every epoch, we need to run update commands to keep the state of each LST accurate. We have to accurately calculate fees every epoch and distribute half of it to our partners. We also need to monitor Infinity, Sanctum Reserve, and many other things.
The initial backend infra HY wrote had a few problems:
- It was written in Rust, making it very difficult for other developers who are not familiar with Rust to maintain.
- No observability - we didn't have a good way to find out when things weren't working.
- It was very messy because it was a bunch of random services we added one- by- one as our needs grew.
We had to do what we had to do at the time as a small scrappy team of 6, but technical debt always comes due sooner than you think. As Sanctum grew, all of this quickly became very painful, complex and unmanageable.
One day, I saw fur on the table. I thought a rat had come into our office, but it turned out that it was simply clumps of HY's hair. I knew we had to do something fast.
So, we took a step back and built a generalised automation system to replace the hodgepodge of microservices that had accreted over the years.
What is a "generalised automation system"? By that I mean something that READS the state of the chain and performs ACTIONS based on that state. This single, simple definition encompasses all of the actions we need to do. For example, in our fee-sharing action, we have to:
- (READ) read onchain data to determine the fees
- (ACTION) transfer the fees to our partners, retry in case the transaction fails, and save a record of the transfer to our DB
The system was built with observability, incident management, and reliability in mind. We have alerts on all our services so we immediately know when things go wrong—often before our users even notice.
We simplified our codebase by moving most of it to Node.js, which lets us easily expand the engineering team since good Rust talent is scarce. It also means that HY is no longer the only person who knows how everything works, which frees him up to do higher-value things (top-secret R&D stuff).
But by far the largest win was that we now have a proper abstraction to add new actions. This lets us build new business logic in hours, rather than days.
I'll give an example.
Let's say we want to implement a points program. This might require us to take snapshots of all LST balances every hour. Before, we had to build this service from scratch, which would have taken days. Now, any developer can now write that new action and hook it up to the system in a few hours. Better yet, that developer will have full confidence that it will run automatically, reliably, and ping us if it breaks.
Less development time, more reliability, and more confidence. What's not to like?
This migration taught me a lot about patience. We agreed to take a moratorium on most public-facing features until we had finished the migration, and that was really hard. I will say it was worth it though – this foundational infrastructure took a few months to build, but we have already seen morale and velocity increase greatly.
And really, you can't put a price on HY's hairline, haha.