eBPF is incredible: I think of it as flipping the relationship between you and the perf tool. Instead of the tool giving you information, it asks you "what do you want to know?".
I don't know if it's google search but I find it hard to discover blogs etc. when I'm hunting for the syntax/language etc, having documented "case studies" is super useful.
This is cool!
eBPF is incredible: I think of it as flipping the relationship between you and the perf tool. Instead of the tool giving you information, it asks you "what do you want to know?".
I don't know if it's google search but I find it hard to discover blogs etc. when I'm hunting for the syntax/language etc, having documented "case studies" is super useful.
bpftrace was instrumental in helping me diagnose a latency spike on a hot path.
I call into a vendor library that eventually calls into Solarflare's TCPDirect API to send packets to the network.
This call usually stays under 5us and has very little variance, but I was seeing the occasional 30ms or 60ms outlier.
I attached probes to track the latency of each call down the stack and it became quite easy to pin down the specific function.
Eventually tracking it down to a single assignment to a variable that was mmapped to a file, and getting spikes due to page cache writeback.
After figuring it out I found these two articles that go into more detail on the problem.
https://rigtorp.se/virtual-memory/
https://tungdam.medium.com/our-lessons-on-linux-writeback-do...
Here's one of the probes producing a histogram of the latency of the call to the TCPDirect function, which was surprising in how simple it was.
Cool to see Linux catching up to dtrace :)