On the subject, I can recommend the original paper from Google about Maglev: https://static.googleusercontent.com/media/research.google.c...
and a subsequent enhancement from the Yandex folks: https://github.com/kndrvt/mhs
An explanation (in Russian) is at https://habr.com/ru/companies/yandex/articles/858662/; use your favorite translation site.
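For readers who haven't seen the paper: Maglev's core contribution is its consistent-hashing lookup table. A minimal Go sketch of the table-filling step follows; the hash functions and the tiny table size are simplified placeholders, not the paper's production choices:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// maglevTable fills a lookup table the way the Maglev paper describes:
// each backend walks its own pseudo-random permutation of slots and
// claims the next empty one, round-robin across backends. This gives
// near-equal shares per backend and minimal churn when the set changes.
// m should be a prime comfortably larger than the number of backends.
func maglevTable(backends []string, m int) []string {
	offset := make([]int, len(backends))
	skip := make([]int, len(backends))
	for i, b := range backends {
		offset[i] = int(hash32(b+"|offset") % uint32(m))
		skip[i] = int(hash32(b+"|skip")%uint32(m-1)) + 1 // coprime with prime m
	}
	table := make([]string, m)
	next := make([]int, len(backends)) // position in each backend's permutation
	filled := 0
	for filled < m {
		for i, b := range backends {
			// Advance through backend i's permutation to its first empty slot.
			slot := (offset[i] + next[i]*skip[i]) % m
			for table[slot] != "" {
				next[i]++
				slot = (offset[i] + next[i]*skip[i]) % m
			}
			table[slot] = b
			next[i]++
			filled++
			if filled == m {
				break
			}
		}
	}
	return table
}

func main() {
	table := maglevTable([]string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}, 13)
	// Route a flow by hashing its connection 5-tuple (a string here) into the table.
	fmt.Println(table[int(hash32("192.0.2.7:51234->203.0.113.1:443"))%len(table)])
}
```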
Shower thoughts: since we can do service discovery pretty easily to know when a server was added or removed from a pool, we could also discover a metrics endpoint exposing a limited set of gauges like CPU load, memory load, threads available, etc. With a helper process/thread running alongside the load balancer's main processes, it could populate/update, in almost real time, the equivalent of HAProxy's stick tables but with much richer information. When the next request hits the load balancer, you know “exactly” where to route it for best performance.
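A minimal Go sketch of that helper idea, assuming a hypothetical /metrics endpoint on each backend that returns JSON with these made-up field names:

```go
package lb

import (
	"encoding/json"
	"net/http"
	"sync"
	"time"
)

// BackendMetrics is a hypothetical payload; the endpoint name and
// fields are assumptions for this sketch, not a standard.
type BackendMetrics struct {
	CPULoad     float64 `json:"cpu_load"`
	MemoryLoad  float64 `json:"memory_load"`
	FreeThreads int     `json:"free_threads"`
}

// MetricsTable is the richer stick-table analogue: backend address -> latest metrics.
type MetricsTable struct {
	mu      sync.RWMutex
	entries map[string]BackendMetrics
}

func NewMetricsTable() *MetricsTable {
	return &MetricsTable{entries: map[string]BackendMetrics{}}
}

func (t *MetricsTable) Get(addr string) (BackendMetrics, bool) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	m, ok := t.entries[addr]
	return m, ok
}

// Poll runs beside the LB's main loop and refreshes the table in near real time.
func (t *MetricsTable) Poll(backends []string, every time.Duration) {
	client := &http.Client{Timeout: 500 * time.Millisecond}
	for range time.Tick(every) {
		for _, addr := range backends {
			resp, err := client.Get("http://" + addr + "/metrics")
			if err != nil {
				continue // skip unreachable backends this round
			}
			var m BackendMetrics
			if json.NewDecoder(resp.Body).Decode(&m) == nil {
				t.mu.Lock()
				t.entries[addr] = m
				t.mu.Unlock()
			}
			resp.Body.Close()
		}
	}
}
```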
Author here. Two quick thoughts:
1. As I covered in an earlier part of this series, service discovery is not always easy at scale. High churn, partial failures, and the cost of health checks can make it tricky to get right.
2. Using server-side metrics for load balancing is a great idea. In many setups, feedback is embedded in response headers or health check responses so the LB can make more informed routing decisions. Hodor at LinkedIn is a good example of this in practice: https://www.linkedin.com/blog/engineering/data-management/ho...
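A rough sketch of the response-header feedback mechanism in Go (the X-Backend-Load header name and the single shared score are invented for illustration; Hodor's actual protocol may differ):

```go
package lb

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strconv"
	"sync/atomic"
)

// loadScore holds the most recent load reported by the backend.
// A real LB would keep one score per backend rather than a single value.
var loadScore atomic.Int64

// NewFeedbackProxy wraps a reverse proxy so every response updates the
// score that the next routing decision can consult.
func NewFeedbackProxy(backend *url.URL) *httputil.ReverseProxy {
	p := httputil.NewSingleHostReverseProxy(backend)
	p.ModifyResponse = func(resp *http.Response) error {
		// "X-Backend-Load" is a made-up convention for this sketch.
		if v := resp.Header.Get("X-Backend-Load"); v != "" {
			if n, err := strconv.ParseInt(v, 10, 64); err == nil {
				loadScore.Store(n)
			}
		}
		return nil
	}
	return p
}
```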
I was thinking something along the lines of a “map” with all the backends and their capabilities that would be recomputed every N seconds and atomically switched with the previous one. The LB would then be able to decide where to send a request and would also have a precomputed backup option in case the first choice becomes unavailable. You could also use those metrics to signal that a node needs to be drained of traffic, for example, so that it receives no new connections.
I understand the complexities of having a large set of distributed services behind load balancers; I just think there could be a better way of choosing a backend than relying only on least requests, TTFB, and an OK response from a health check every N seconds.
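A minimal Go sketch of that recompute-and-swap idea, using an atomic pointer so readers always see a complete snapshot; scoreBackends stands in for whatever metric ranking you choose:

```go
package lb

import (
	"sync/atomic"
	"time"
)

// routeEntry precomputes a first choice and a backup for each service.
type routeEntry struct {
	primary string
	backup  string
}

// snapshot is an immutable routing map; lookups never see a half-built one.
type snapshot struct {
	routes map[string]routeEntry
}

var current atomic.Pointer[snapshot]

func init() {
	current.Store(&snapshot{routes: map[string]routeEntry{}})
}

// rebuild recomputes the map every interval and swaps it in atomically;
// in-flight lookups keep using the old snapshot until they finish.
func rebuild(interval time.Duration, scoreBackends func() map[string]routeEntry) {
	for range time.Tick(interval) {
		current.Store(&snapshot{routes: scoreBackends()})
	}
}

// pick returns the best backend for a service, falling back to the
// precomputed backup if the primary was just marked draining/unavailable.
func pick(service string, unavailable func(string) bool) string {
	e := current.Load().routes[service]
	if unavailable(e.primary) {
		return e.backup
	}
	return e.primary
}
```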
HAProxy has been doing this sort of thing for a very, very long time.
You have stick tables and a very rich way of populating them, and then you can use these tables of in-RAM data to make routing decisions.
Sometimes you need another proxy too, e.g. Apache/nginx or whatever, perhaps for authn/authz.
Yes, it is a tricky concept, and this series of articles merely scratches the surface. Good effort though.
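To make the concept concrete for readers who haven't used them, here is a loose in-memory analogue of a stick table in Go; HAProxy's real implementation is far richer (rate counters, peer replication, ACL integration):

```go
package lb

import (
	"sync"
	"time"
)

// stickEntry mimics what a stick table row might track per client:
// a pinned backend plus a request counter, expiring after a TTL.
type stickEntry struct {
	backend  string
	requests int
	expires  time.Time
}

// StickTable is a loose in-RAM analogue of HAProxy's stick tables,
// keyed here by client IP.
type StickTable struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]*stickEntry
}

func NewStickTable(ttl time.Duration) *StickTable {
	return &StickTable{ttl: ttl, entries: map[string]*stickEntry{}}
}

// Lookup pins clientIP to a backend chosen by pickBackend on first sight,
// then keeps routing it there until the entry expires.
func (t *StickTable) Lookup(clientIP string, pickBackend func() string) string {
	t.mu.Lock()
	defer t.mu.Unlock()
	e, ok := t.entries[clientIP]
	if !ok || time.Now().After(e.expires) {
		e = &stickEntry{backend: pickBackend()}
		t.entries[clientIP] = e
	}
	e.requests++
	e.expires = time.Now().Add(t.ttl) // sliding expiry, refreshed by activity
	return e.backend
}
```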
Author here. Absolutely, HAProxy’s stick tables are a powerful way to implement advanced routing logic, and they’ve been around for years. This series focuses on explaining the broader concepts and tradeoffs rather than diving deep into any single implementation, and since it also covers other aspects of reverse proxies, the focus on load balancing here is mostly to present the challenges and high-level ideas.
Glad you found it a good effort, and I agree there’s room to go deeper in future posts.
"Common load balancing algoithims and challenges"
“Algorithms” is a genuinely hard spelling: it’s derived from Algoritmi, the Latinized name of al-Khwarizmi, the chap who documented an early notion of the idea. By the time English gets around to creating a word, you can be sure it will be ... painful!
Keep going mate, you have a great writing style and presentation.
it's honestly not, but younger developers can be forgiven for assuming traefik is all you need. the learn-to-code camps really did a number on kids these days :(
use DSR and 50% of your traffic is taken care of. https://www.loadbalancer.org/blog/direct-server-return-is-si...
explore load balancing lower in the stack based on ASN to preroute stuff for divide and conquer. (geolocated, etc...)
weighted load balancing only works for uniform traffic sources. you'll need to weight connections based on priority or location, backend-heavy transactions (checkout vs just browsing the store), and other conditions that can change the affinity of your user (sometimes dynamically). keepalived isn't mentioned once, nor 802.1Q trunk optimization, nor SRV records and the failover/HA that's performed in most modern browsers based on DNS information itself.
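On the weighting point, a common building block is smooth weighted round-robin (the interleaving variant nginx popularized), sketched below in Go; dynamic per-class weighting, as the comment suggests, would adjust the weight fields between picks:

```go
package main

import "fmt"

type server struct {
	name    string
	weight  int // effective weight; could be adjusted per request class
	current int
}

// pick implements smooth weighted round-robin: every server gains its
// weight, the richest one is chosen and pays back the total, which
// interleaves picks instead of sending bursts to the heaviest server.
func pick(servers []*server) *server {
	total := 0
	var best *server
	for _, s := range servers {
		s.current += s.weight
		total += s.weight
		if best == nil || s.current > best.current {
			best = s
		}
	}
	best.current -= total
	return best
}

func main() {
	pool := []*server{
		{name: "a", weight: 5},
		{name: "b", weight: 1},
		{name: "c", weight: 1},
	}
	// A "checkout" request class might lower the weights of backends already
	// busy with heavy transactions; weights are static here for brevity.
	for i := 0; i < 7; i++ {
		fmt.Print(pick(pool).name, " ")
	}
	// Prints: a a b a c a a
}
```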
> most modern browsers based on DNS information itself.
I went down this rabbit hole and was surprised by how all over the place the behavior was across various HTTP clients (not just browsers). There is very little consistency in how the IPs in the DNS response are retried, if at all.
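A sketch of what sensible fallback could look like, trying each resolved address in order; many real clients stop after the first:

```go
package lb

import (
	"fmt"
	"net"
	"time"
)

// dialAny resolves host and tries each returned address in turn, the
// fallback behavior the comment above found so inconsistent across clients.
func dialAny(host, port string) (net.Conn, error) {
	ips, err := net.LookupIP(host)
	if err != nil {
		return nil, err
	}
	var lastErr error
	for _, ip := range ips {
		conn, err := net.DialTimeout("tcp", net.JoinHostPort(ip.String(), port), 2*time.Second)
		if err == nil {
			return conn, nil
		}
		lastErr = err // remember the failure and fall through to the next IP
	}
	return nil, fmt.Errorf("all %d addresses failed: %w", len(ips), lastErr)
}
```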
Author here. Thanks for sharing these thoughts. You’re right that DSR, ASN-based routing, SRV records, and other lower-layer approaches are important in certain setups.
This post focuses primarily on Layer 7 load balancing, that is, connection and request routing based on application-level information, so it doesn't go into Layer 3/4 techniques like DSR or network-level optimizations. Those are certainly worth covering in a broader series that spans the full stack.
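As a small illustration of what "application-level information" buys you, here is a path-based router in Go (the backend addresses are placeholders); an L3/L4 balancer never sees the URL at all:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Placeholder backends: one pool for the API, one for static content.
	api, _ := url.Parse("http://10.0.0.10:8080")
	static, _ := url.Parse("http://10.0.0.20:8080")

	// Layer 7 routing: the balancer inspects the request path (an
	// application-level field) before choosing a backend.
	mux := http.NewServeMux()
	mux.Handle("/api/", httputil.NewSingleHostReverseProxy(api))
	mux.Handle("/", httputil.NewSingleHostReverseProxy(static))

	log.Fatal(http.ListenAndServe(":8080", mux))
}
```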