Get the Source
The Whole Story
I recently learned about a useful performance monitoring utility that I wasn’t aware of - nicstat. Somewhat coincidentally, I also discovered that my coworker Brendan Gregg was coauthor of this utility for Solaris back when he worked at Sun.
nicstat is modeled very much after iostat, vmstat, and similar tools. It works great, but I found network utilization (%Util column) was not being calculated correctly in EC2. This was no surprise, as a xen guest has no idea about AWS imposed bandwidth throttles, but I thought it was strange that nicstat reported an interface speed of 1.410Gbps when ethtool reports a 10Gbps interface for c3.4xl - the instance type I was running this on.
Strange, but since nicstat has a -S option where the bandwidth can be overridden, no problem. So I tried that - using measured bandwidth from iperf, but that didn’t work either and nicstat kept reporting a 1.410Gbps rate.
After perusing bugs, I found the cause for the strange 1.410Gbps value - on 10Gbps interfaces there is an integer overflow that happens resulting in a bogus value. Great - so this mystery was solved, but why doesn’t the -S option work?
It turns out that nicstat will ignore the -S option if it gets a valid result from the SIOCETHTOOL call (getting it from the underlying hardware). Normally, you would expect command line options to override values, but that wasn’t the case. So, I coded up a workaround for that. Great - except, the utilization values were still wrong!
Back to the source again, where I found a bug in the implementation where utilization is always computed as a half-duplex link on Linux. Easily fixed - and now I have a working version of nicstat where I can provide an appropriate -S override for any particular EC2 instance type (and in our use at Netflix, we have a wrapper script that provides an appropriate value for each type).
As the current author of nicstat isn’t actively maintaining it, I’ve forked the source to nicstat 1.95 on github and fixed several bugs. (3 of which I reported) that have not yet made their way into the official version.
If you use nicstat, you may find this repository handy, rather than having to apply several patches manually.
$ nicstat -S eth0:2000 -l Int Loopback Mbit/s Duplex State eth0 No 2000 full up lo Yes - unkn up $ nicstat -n 1 Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat 23:19:29 eth0 34157.0 34140.3 39139.6 39136.8 893.6 893.3 14.0 0.00 23:19:30 eth0 37426.4 37462.9 42513.8 42546.7 901.5 901.6 15.3 0.00 23:19:31 eth0 35787.9 35787.2 41224.3 41224.3 889.0 888.9 14.7 0.00 23:19:32 eth0 25482.9 25482.2 33962.9 33961.9 768.3 768.3 10.4 0.00 23:19:33 eth0 29245.6 29244.4 36125.4 36125.4 829.0 829.0 12.0 0.00 ^C