Exploring a Language Runtime with Bpftrace

(mgaudet.ca)

28 points | by mgaudet 5 days ago ago

3 comments

$mrlongroots 2 days ago

This is cool!
eBPF is incredible: I think of it as flipping the relationship between you and the perf tool. Instead of the tool giving you information, it asks you "what do you want to know?".
I don't know if it's google search but I find it hard to discover blogs etc. when I'm hunting for the syntax/language etc, having documented "case studies" is super useful.
$indiosmo a day ago
bpftrace was instrumental in helping me diagnose a latency spike on a hot path.
I call into a vendor library that eventually calls into Solarflare's TCPDirect API to send packets to the network.
This call usually stays under 5us and has very little variance, but I was seeing the occasional 30ms or 60ms outlier.
I attached probes to track the latency of each call down the stack and it became quite easy to pin down the specific function.
Eventually tracking it down to a single assignment to a variable that was mmapped to a file, and getting spikes due to page cache writeback.
After figuring it out I found these two articles that go into more detail on the problem.
https://rigtorp.se/virtual-memory/
https://tungdam.medium.com/our-lessons-on-linux-writeback-do...
Here's one of the probes producing a histogram of the latency of the call to the TCPDirect function, which was surprising in how simple it was.
```
  sudo bpftrace -p "$PID" -e '
    // —— TCPDirect ——
    uprobe:/lib/x86_64-linux-gnu/libonload_zf.so.1:zft_send_single {
      @zf_start[tid] = nsecs;
    }
    uretprobe:/lib/x86_64-linux-gnu/libonload_zf.so.1:zft_send_single /@zf_start[tid]/ {
      $lat = nsecs - @zf_start[tid];
      @h_zft = hist($lat / 1000);
      delete(@zf_start[tid]);
    }
  '
```
$panic a day ago

Cool to see Linux catching up to dtrace :)