I think I need a deeper dive into the "diagonal scaling" presented. From my understanding, this is actually no different from the "industry decoupling" he disparages earlier in the presentation. There are even off-the-shelf libraries, like SlateDB, for LSMs backed by object storage.
I feel the Expression Problem neatly frames the "diagonal scaling" proposition: which system design choices allow the architecture to scale vertically (and in what fashion) while still being able to scale horizontally (and along which dimension), without losing strict serialisability?
If we add a "vertical" capability, it cannot be at the cost of any existing "horizontal" capability, nor should doing so forfend any future "horizontal" capability. And vice-versa (adding horizontal capability should not mess with vertical ones). The point at which one will break the other is the theoretical design limit of the system.
in general these aren't in conflict. in particular once I have a system which can distribute work among faulty nodes and maintain serializability, exploiting parallelism _within_ a fault domain just falls out.
TigerBeetle's Joran Greef is a teacher / explainer par excellence.
Ah I appreciate your kind words, Aditya!
This was a team effort: the object storage connector, the scale test, the visualization, the slides, even provisioning the hardware had its challenges!
Oh most certainly; I am remiss to have not included the group effort in my comment, particularly as a person surrounded by theatre and film making friends.
Still, for the same reason, I have some idea of why their productions turn out well (or not). Where "well" is "a story well told", not "successful" as in "did well at the box office". The why is usually one person who keeps asking the questions and making the decisions that take the story from imagination to imagination via screen or floor.
Something tells me your doubtlessly excellent "production team" (in film terms) will agree with my original comment :)
I have tried to use tiger beetle in production. haven't been successful yet.
nice stuff, multi master replication.
user API, super small.
doubts about how to do streaming backup.
after studying the API and doing some spike architectures I came to this conclusion (I may be wrong):
tiger beetle is awesome for keeping account balances. that's it.
because you pretty much only get the transactions affecting an account, and IIRC there was not a lot you could do about how to query them or use them.
also I was thinking it would be nice to have something like an account grouping other accounts, to answer something like: how much money do our user accounts hold at this microsecond?
I think that was more or less about it. they have some special u128 fields to store the ids of the transactions they represent in your actual system
and IIRC they handle multi-currency in different books
my conclusion was: I think I don't get it yet. I think I'm missing something. I had to write a ruby client for it and build a UI to play with the API, do some transactions and see how it behaved. still, that was my conclusion
would be great to have an official UI client
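On the "account grouping other accounts" idea: a minimal sketch of how that question could be answered application-side, assuming a plain double-entry model. Everything here (`Account`, `transfer`, `group_balance`, the `bank` counterparty) is a hypothetical in-memory illustration, not the TigerBeetle client API.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical in-memory account; not the TigerBeetle client API."""
    id: int
    debits_posted: int = 0
    credits_posted: int = 0

    @property
    def balance(self) -> int:
        # Credit-normal balance: amount received minus amount sent.
        return self.credits_posted - self.debits_posted


def transfer(debit: Account, credit: Account, amount: int) -> None:
    """Double-entry: every transfer debits one account and credits another."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    debit.debits_posted += amount
    credit.credits_posted += amount


def group_balance(accounts: list[Account]) -> int:
    """Application-side 'grouping': sum the balances of the member accounts."""
    return sum(a.balance for a in accounts)


bank = Account(id=1)   # counterparty outside the user group
alice = Account(id=2)
bob = Account(id=3)
users = [alice, bob]

transfer(bank, alice, 100)  # deposit into the group
transfer(bank, bob, 50)     # deposit into the group
transfer(alice, bob, 30)    # moves money inside the group only

total = group_balance(users)
```

In this sketch the group total is the mirror image of the single counterparty account, so when all money enters and leaves the group through one such account, reading that one balance answers the question without summing every user account.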
On the streaming side, are you looking for Change Data Capture?
https://docs.tigerbeetle.com/operating/cdc/
The takeaway is that you can just buy a single beefy server instead of using kubernetes or whatever.
To a first approximation, yes. But, why? And for up to how many hundred terabytes of data can you get away with the single beefy server? Provided you make what design choices?
Which leads to the real takeaway which is "Tiger Style": https://tigerstyle.dev/ which I am partial to, along with Rich Hickey's "Hammock Driven Development" https://www.youtube.com/watch?v=f84n5oFoZBc
"Tiger on Hammock" will absolutely smoke the competition.
(edit: add links)
> But, why?
To keep things simple. My current company is running multiple instances of back-end services for absolutely no fucking reason, and I had to fix numerous race condition bugs for them. I had an interview with a startup where, after I asked why they were using distributed DynamoDB locks in a monolith app with only a single instance running, the person said "it works for us" and got defensive. Later they told me I wasn't experienced enough. I am so frustrated that there appears to be zero basic engineering rigor anywhere I can find nowadays.
> And for up to how many hundred terabytes of data can you get away with the single beefy server?
Do you even need to store many hundred terabytes of data? I have never encountered a scenario in my career (admittedly not very long so far) where there was a need to store even one terabyte of data. But in the case of TigerBeetle, from skimming through the video, it appears they offload the bulk of the data to "remote storage."
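On the single-instance locking point above: a minimal sketch, assuming one process with a few threads, of how an in-process lock already serializes updates without any network round trip. The names (`counter`, `add_many`) are hypothetical and this is not what either company actually runs.

```python
import threading

counter = 0
lock = threading.Lock()  # in-process lock: no network call, no external service


def add_many(n: int) -> None:
    """Increment the shared counter n times, one locked update at a time."""
    global counter
    for _ in range(n):
        with lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A distributed lock only starts to earn its cost once more than one process or machine can touch the same state.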
And when your beefy server goes down, what do you do? Where do you think that object storage lives, exactly?
Kubernetes is not just for scaling, it's a way to standardize all ops.
> And when your beefy server goes down, what do you do?
Boot it up again. You'll still have higher availability than AWS, GitHub, OpenAI, Anthropic, and many others.
> Where do you think that object storage lives, exactly?
On a RAID5 array with hot-swappable disks, of course.
(Edit to add: this is just a comment on Kubernetes being invoked whenever someone talks about scalability; I have massive respect for what the TigerBeetle folks are doing)
"Always has been"