I think once you get beyond a handful of GPUs, people probably end up using both levels of telemetry because they answer different questions. NVML is nice for per-request attribution and understanding model behavior, but it only reports GPU power. PDU/BMC measurements seem better suited for actual power draw since they capture everything else as well (CPUs, networking, PSU losses, fans, etc.).
For instance, people running 32+ GPU setups probably correlate timestamps between the two sources rather than trying to preserve strict per-request attribution at the rack level. That way rack/PDU power only needs to be sampled every second and can be lined up with the request logs afterwards.
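
To make the timestamp-correlation idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption on my part: the sample data, the `(timestamp, watts)` layout, the request log fields, and the function names `mean_rack_power` and `correlate` are all made up for illustration, and a real setup would pull these from whatever the PDU/BMC and the serving stack actually expose. It just averages the rack samples that fall inside each request's time window and puts that next to the GPU-level energy figure.

```python
from bisect import bisect_left, bisect_right

# Hypothetical PDU samples: (unix_seconds, watts), polled once per second.
pdu_samples = [
    (100, 5000.0),
    (101, 5200.0),
    (102, 6100.0),
    (103, 6000.0),
    (104, 5100.0),
]

# Hypothetical request log: (request_id, start_s, end_s, gpu_joules),
# where gpu_joules is assumed to come from GPU-level telemetry such as NVML.
requests = [("req-a", 101.2, 103.4, 900.0)]


def mean_rack_power(samples, start, end):
    """Mean of the rack power samples with timestamps in [start, end].

    `samples` must be sorted by timestamp. Returns None when no sample
    falls inside the window (e.g. a request shorter than the poll interval).
    """
    times = [t for t, _ in samples]
    lo = bisect_left(times, start)   # first sample at or after start
    hi = bisect_right(times, end)    # one past the last sample at or before end
    window = [watts for _, watts in samples[lo:hi]]
    if not window:
        return None
    return sum(window) / len(window)


def correlate(samples, request_log):
    """Attach mean rack power to each request by timestamp overlap."""
    rows = []
    for request_id, start, end, gpu_joules in request_log:
        rows.append({
            "id": request_id,
            "gpu_joules": gpu_joules,
            "rack_watts_mean": mean_rack_power(samples, start, end),
        })
    return rows


if __name__ == "__main__":
    for row in correlate(pdu_samples, requests):
        print(row)
```

Keep in mind the rack number covers everything running in that window, not just the one request, so it is context for the GPU-level figure rather than a per-request measurement.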
Either way, I haven't seen many people publish how they instrument this in practice, so take what I wrote with a grain of salt. I simply wanted to share a little bit of what I understand, and I hope it helps.