Jan 26, 2015 - CLOSE_WAIT socket leak in java.net.HttpURLConnection

Comments

I came across an interesting socket leak a few weeks ago which was maddeningly difficult to track down and reproduce so it could be fixed.

This all started when I noticed a large number of CLOSE_WAIT sockets to the AWS metadata service, 169.254.169.254 on every AWS instance - see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html :

tcp6       1      0 10.0.0.1:43396      169.254.169.254:80      CLOSE_WAIT
tcp6       1      0 10.0.0.1:43438      169.254.169.254:80      CLOSE_WAIT
tcp6       1      0 10.0.0.1:43045      169.254.169.254:80      CLOSE_WAIT
tcp6       1      0 10.0.0.1:43317      169.254.169.254:80      CLOSE_WAIT
[...]
tcp6       1      0 10.0.0.1:43388      169.254.169.254:80      CLOSE_WAIT
tcp6       1      0 10.0.0.1:43279      169.254.169.254:80      CLOSE_WAIT
tcp6       1      0 10.0.0.1:43357      169.254.169.254:80      CLOSE_WAIT
tcp6       1      0 10.0.0.1:43134      169.254.169.254:80      CLOSE_WAIT

The number of these would increase, by about 2 a minute, until the kernel does it’s housekeeping for these sort of sockets - at max, there’d be around 120-130 of them.

CLOSE_WAIT means the socket has been closed by the TCP stack but is waiting on the application (Java in this case) to acknowledge the closure and clean things up from it’s side. So it seems to clearly be a bug in our code, but I wasn’t sure where. I used tcpdump (“sudo tcpdump -A host 169.254.169.254”) to see which requests were being made and made an educated guess on which service might be responsible, eventually tracking it down. This code was a couple of years old, and didn’t change except that some new metadata requests were added for VPC support, but the problem didn’t occur with the older version of the client library.

It seemed strange that new requests would cause this, but after some more tcpdumps I discovered that one of the new requests always results in a HTTP 404 from the metadata service, and these requests always resulted in a CLOSE_WAIT socket (matching up the ports in tcpdump with those from netstat). This request was only useful in VPC, but running it in EC2 Classic isn’t really a problem, it just means no metadata for that case.

It seemed pretty obvious that somehow the code must not be closing the socket after a 404, but as another engineer and I looked at the code, it didn’t actually do anything different whether it was an HTTP 200 or a 404, aside from parsing the request or not. We tried making sure all data was read from the InputStream, closing the InputStream more aggressively, calling disconnect() explicitly, and a few other things, but nothing helped. The code uses Java’s HttpURLConnection: http://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html

After trying all these different things, the other engineer noticed a method on HttpURLConnection called getErrorStream() - documented as “Returns the error stream if the connection failed but the server sent useful data nonetheless”.

Turns out that whoever wrote HttpURLConnection (who I refer to as “bonehead”) decided to implement it sort of emulating stdout/stderr - if a request is a successful HTTP result code, then the output goes into one stream, but if it is an error then the output goes into a different stream. This doesn’t really help with coding because either way you still have to check the HTTP response code to see if it failed or not, but with this implementation you need another whole code block to process the ErrorStream if there was an error.

The docs for HttpURLConnection only say the following, you’ll note they don’t say anything about closing the ErrorStream: “Calling the close() methods on the InputStream or OutputStream of an HttpURLConnection after a request may free network resources associated with this instance but has no effect on any shared persistent connection. Calling the disconnect() method may close the underlying socket if a persistent connection is otherwise idle at that time”.

The fix was simple - a block of code where if the HTTP response code is not a 200, instead of just returning failure, we get the ErrorStream, read all the data off of it (discarding it), then close it.

Nov 1, 2014 - A Free 1GB RAM Upgrade

Comments

Last spring a built a cheap little Linux box for a server, consolidating NAS storage and a few other functions (fetchmail receiver, IMAP server, IRC bouncer, btsync master, Plex server, network backups, etc). Although I wasn’t on any particular budget, I was going for cheap but decent, with noise and power consumption in mind, and ended up paying around $250 for a case, mobo, CPU, and RAM. I already had a 1TB SATA drive from a Buffalo Cloudstor NAS (what a slow piece of garbage - except for the drive).

I went with an AMD A6-6400K (dual core 3.9GHz). It has an onboard Radeon GPU, which I don’t care about as this is a headless server. What I didn’t know at the time, is that this integrated GPU uses system RAM, and the MSI mainboard I got, an FM2-A75IA-E53, is somewhat limited in what options it exposes for managing this.

So I built the system, and it ran great, but I noticed only 3GB of RAM was available to Linux, even though the BIOS shows 4GB, and lshw showed a 4GB module installed. Suspecting the BIOS was allocating RAM for the GPU, I double checked the settings, but there was simply no way to choose the amount of RAM consumed by the GPU.

I found a BIOS update, but then discovered there was no way to install the update - you cannot update the BIOS from within the BIOS unless you have Windows installed (what the heck, who came up with this idea?) - and in fact the BIOS updates are distributed as a .exe file. Useless to me!

At this point, I gave up and just ignored the issue. 3GB really is enough RAM for this server, and I just lost interest in trying to solve it for a few months. I could have returned the motherboard, but in the end decided dealing with Fry’s return department wasn’t worth the effort and 3GB was enough versus the hassle. So, I forgot all about it for months.

A couple of weeks ago, I remembered this whole thing and decided to try and figure it out again. I started with the BIOS update, and found that it could be done by creating an MS-DOS bootable USB stick, and found the instructions to do that. And it worked! Great, now that I’m on the latest BIOS I bet they fixed this.

My hopes were quickly dashed as I discovered there were still no options in the MSI BIOS to control RAM for the GPU. I started to think that maybe it wasn’t this simple - maybe there was something else at fault here. Maybe the BIOS wasn’t exposing things correctly to Linux, or maybe some esoteric IOMMU driver was needed, or the Linux kernel needed to coordinate with the GPU, so I installed the radeon driver hoping that maybe Linux had to ask for the RAM back once aware of the GPU. All of this, whether crazy or not, led nowhere. And yes, of course I am running a 64bit kernel.

I did a ton of Google searches and research on this particular problem, back after building the system, then more recently as I worked through this problem. I found vague references to others with similar issues, but nobody had a solution that jived with the problem I was having. Searches of MSI’s own support site also turned up nothing, and I was about to check with their tech support site (perhaps I should have done this from the beginning), when I thought I’d try one last thing - was there a way to disable the GPU? I didn’t really want to go down this road, as while the server is headless, I do occasionally plug it in to a monitor. I was also a bit concerned that if I disabled the GPU then the system wouldn’t POST at all and I’d have to force the CMOS to be cleared on the BIOS and figure out how to get things back. That all seemed like more effort than just ignoring the missing 1GB of RAM to me.

So I duckduckgo’d on how to disable the GPU in the BIOS, wading through article after article and several posts on MSI’s tech support site when I came across this post. Oddly, I had never come across this post in all my searches about how to reduce the GPU memory, but this seemed to indicate there was a very sneaky trick - to reduce GPU RAM, you have to select “Dual Graphics”, then (and only then) will the MSI BIOS expose the option to change RAM allocation.

This is so very frustrating of MSI to bury this option inside some other option that no sane person would think to try. It seems so simple now that I know the answer, but just like a video game cheat code, you have to know the exact right set of options in order to expose this one.

So now, I was able to reduce the 1GB allocation to a minimum of 32MB. That’s still a small chunk of wasted RAM, but 32MB doesn’t make me lose any sleep.

So I rebooted, and… YES! Linux now has 4(ish)GB RAM available.

Oct 23, 2014 - VW Electric Golf (eGolf) Spotted

Comments

Yesterday I was behind a Volkswagen Electric Golf (VW eGolf) with manufacturer plates (Michigan) - the vehicle goes on sale in November, and it was clearly a production model with no attempt at stealth.

Here are a couple of pictures, Enjoy!

eGolf photo 1

eGolf photo 2

I’ve read that VW decided to air-cool the batteries, removing the liquid-cooled thermal management system (TMS) that any proper EV with lithium-ion batteries needs. Nissan found this out the hard way with the Leaf and the resulting class-action lawsuit when their batteries start to die after just a few years. Amazingly, Nissan never fixed this engineering error, and it looks like VW is going down the same road.

For comparison, the Tesla, Volt, and Spark EV (the latter I have), all use a liquid-cooled TMS to insure batteries stay at a proper operating temperature.

Sep 30, 2014 - Nvidia driver on backported kernel in Wheezy

Comments

Update 2015-01-05: This is no longer necessary. When I wrote this, the current version of nvidia-kernel-dkms was 319.82. The current version in the Wheezy Debian repository (as I write this comment) is 340.65 which resolves this bug, at least on the current backported kernel (3.16.0-0.bpo-4). If you are still encountering this issue, please make sure you are on both the latest kernel and nvidia driver.

If you like to use a backported kernel in Debian Wheezy, like I do, and have systems with Nvidia graphics cards, you’ve likely discovered an incompatibility between recent kernels (since 3.13) and the Nvidia driver.

Because of this problem, I’ve been using the Nouveau driver since 3.13, but today I decided to give nvidia-kernel-dkms another try after getting upgraded to the 3.16 kernel.

I was disappointed to find the same incompatibility still exists. Specifically, the problem is that a acpi_os_wait_events_complete was removed in the Linux kernel, but the Nvidia driver still references it. You’ll you know you’ve hit this problem when X doesn’t start and dmesg or /var/log/syslog contains:

kernel: [   27.487723] nvidia: Unknown symbol acpi_os_wait_events_complete (err 0)

Instead of just doing my usual thing of switching back to the Nouveau driver, I was curious how hard it would be to patch the driver. I know the patch for this exists, and part of the whole point of DKMS is to make it easy to compile drivers that are not part of the core kernel.

As it turns out, this problem is really, really easy to fix and you should be running with the Nvidia driver in just a few minutes if you follow these instructions.

Instructions

  • This assumes you are already running on a backported kernel. I’m using 3.16, but this should work for 3.13 and 3.14 kernels as well (there was no backport of 3.15 in Wheezy, as far as I am aware).
  • First, follow the instructions here on the Debian wiki under the Wheezy backports section. Make sure you follow all of those 6 steps exactly.
  • After rebooting, you’ll have no X display and see the Unknown symbol error message seen above.
  • Switch to a VT (Alt-F1) and login to your system.
  • As root, edit the file /usr/src/nvidia-current-319.82/nv-acpi.c (As of this time, the 319-82 part may change if the backport is updated, and of course the contents of the file might change as well - you may have to find the section in question). Comment out lines 304-310 so the file looks like:
/*
    if (pNvAcpiObject->notify_handler_installed)
    {
        NV_ACPI_OS_WAIT_EVENTS_COMPLETE();

        // remove event notifier
        status = acpi_remove_notify_handler(device->handle, ACPI_DEVICE_NOTIFY, nv_acpi_event);
    }
*/
  • Now run the following command to rebuild the Nvidia DKMS module:
$ sudo dpkg-reconfigure nvidia-kernel-dkms
  • And finally, restart your display manager (I use lightdm, so yours might be different):
$ sudo /etc/init.d/lightdm stop
$ sudo /etc/init.d/lightdm start
  • And that’s it! There’s no need to reboot, X will load the Nvidia driver automatically as long as you set up /etc/X11/xorg.conf.d/20-nvidia.conf per the Debian wiki’s instructions.

Of course, if you get a system update that changes the nv-acpi.c file, then you’ll have to reapply the patch. As long as that file isn’t changed, things will otherwise be fine, even if your kernel gets upgraded, since a new DKMS module will be built from the modified source.

Sep 29, 2014 - nicstat Patched for EC2

Comments

Get the Source

Get the patched nicstat utility here. It has several bugs fixed, of interest to anyone running Linux, whether in EC2 or on physical hardware.

The Whole Story

I recently learned about a useful performance monitoring utility that I wasn’t aware of - nicstat. Somewhat coincidentally, I also discovered that my coworker Brendan Gregg was coauthor of this utility for Solaris back when he worked at Sun.

nicstat is modeled very much after iostat, vmstat, and similar tools. It works great, but I found network utilization (%Util column) was not being calculated correctly in EC2. This was no surprise, as a xen guest has no idea about AWS imposed bandwidth throttles, but I thought it was strange that nicstat reported an interface speed of 1.410Gbps when ethtool reports a 10Gbps interface for c3.4xl - the instance type I was running this on.

Strange, but since nicstat has a -S option where the bandwidth can be overridden, no problem. So I tried that - using measured bandwidth from iperf, but that didn’t work either and nicstat kept reporting a 1.410Gbps rate.

After perusing bugs, I found the cause for the strange 1.410Gbps value - on 10Gbps interfaces there is an integer overflow that happens resulting in a bogus value. Great - so this mystery was solved, but why doesn’t the -S option work?

It turns out that nicstat will ignore the -S option if it gets a valid result from the SIOCETHTOOL call (getting it from the underlying hardware). Normally, you would expect command line options to override values, but that wasn’t the case. So, I coded up a workaround for that. Great - except, the utilization values were still wrong!

Back to the source again, where I found a bug in the implementation where utilization is always computed as a half-duplex link on Linux. Easily fixed - and now I have a working version of nicstat where I can provide an appropriate -S override for any particular EC2 instance type (and in our use at Netflix, we have a wrapper script that provides an appropriate value for each type).

As the current author of nicstat isn’t actively maintaining it, I’ve forked the source to nicstat 1.95 on github and fixed several bugs. (3 of which I reported) that have not yet made their way into the official version.

If you use nicstat, you may find this repository handy, rather than having to apply several patches manually.

$ nicstat -S eth0:2000 -l
Int      Loopback   Mbit/s Duplex State
eth0           No     2000   full    up
lo            Yes        -   unkn    up

$ nicstat -n 1
    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
23:19:29     eth0 34157.0 34140.3 39139.6 39136.8   893.6   893.3  14.0   0.00
23:19:30     eth0 37426.4 37462.9 42513.8 42546.7   901.5   901.6  15.3   0.00
23:19:31     eth0 35787.9 35787.2 41224.3 41224.3   889.0   888.9  14.7   0.00
23:19:32     eth0 25482.9 25482.2 33962.9 33961.9   768.3   768.3  10.4   0.00
23:19:33     eth0 29245.6 29244.4 36125.4 36125.4   829.0   829.0  12.0   0.00
^C