At work, we have a large-scale deployment at AWS on Ubuntu. As a member of the Performance and Operating Systems Engineering team, I am partially responsible for building out and stabilizing the base image we use to deploy our instances. We are currently in the process of migrating to Xenial, the current Ubuntu LTS release. There’s a lot that has to happen to go from the foundation image to our deployable image. There are a few manual steps, such as making our AWS AMI bootable on both PV and HVM instance types (we’ve shared how to do this with Canonical, but they don’t seem too interested, even though it reduces operational complexity by not having to maintain multiple base images). The vast majority of building out our image, on the other hand, is an automated process involving a relatively large and complex Chef recipe, which we keep backwards compatible with all versions of Ubuntu we support for our internal customers.

All this works pretty well in practice, but iterating on a new base AMI, like we are doing now for Xenial, takes some time as we try different recipes, update init scripts (systemd is new in Xenial since the last LTS, Trusty), and make various other customizations. Making Chef recipes idempotent is difficult and not worth the effort for us, but that also means it’s not really possible to re-run a failed recipe from the middle. The end-to-end delay in trying out changes is fairly long: we check package source into git, let Jenkins build packages, and kick off our automated AMI build process, which involves taking our foundation image, chrooting into it, running the Chef recipes, and snapshotting the EBS volume into an AMI. Only then can we finally launch an EC2 instance from the AMI and see if things worked.
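To give a sense of the shape of that pipeline, here’s a rough sketch. Every device name, ID, and recipe name below is a hypothetical stand-in, and the real process has far more plumbing around it:

$ sudo mount /dev/xvdf1 /mnt/foundation
$ sudo chroot /mnt/foundation chef-solo -c /etc/chef/solo.rb -o 'recipe[base-image]'
$ sudo umount /mnt/foundation

$ aws ec2 create-snapshot --volume-id vol-12345678 --description "xenial base"
$ aws ec2 register-image --name "xenial-base-$(date +%Y%m%d)" \
    --virtualization-type hvm --root-device-name /dev/sda1 \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={SnapshotId=snap-12345678}'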

This all takes a fair bit of time when rapidly iterating on our base image, and I wanted to find a quicker way to try potentially breaking changes. Even though we deploy on Ubuntu, all my personal and work laptops, desktops, and servers run base Debian. Lately, I’ve been building out all my filesystems (except for /boot) with ZFS using zfsonlinux (even on my LUKS/dm-crypt encrypted laptops).
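For the curious, the ZFS-on-LUKS part is less exotic than it sounds. A minimal sketch, with made-up device and dataset names: open the LUKS container, then build the pool on top of the mapped device.

$ sudo cryptsetup luksOpen /dev/sda2 crypt0
$ sudo zpool create -o ashift=12 pool0 /dev/mapper/crypt0
$ sudo zfs create -o mountpoint=/home pool0/home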

I’ve used LXC a fair bit in the past when I needed to do cross-distribution builds, and I’ve used Btrfs snapshots to make cloning containers fast and space-efficient. ZFS also supports copy-on-write snapshots and is natively supported by LXC on Debian Jessie, so this seemed like a good approach - and it is!

I’ve been using this method to iterate quickly on our recipes. I have a base Xenial container that I can clone and boot in a few seconds whenever I want to start from the beginning. I can also snapshot a container at any point in the process, so that I can repeat and retry what would otherwise not be idempotent.

Some of the ZFS integration in LXC is not well documented, so here are some rough steps describing how I’m doing this on my work desktop, to help anyone else trying to figure it out.

I started with a single ZFS pool called “pool0” with several filesystems:

$ sudo zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
pool0   238G   102G   136G         -    17%    42%  1.00x  ONLINE  -

$ sudo zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
pool0              110G   120G    96K  none
pool0/home        89.6G   120G  89.6G  /home
pool0/opt         2.61G   120G  2.61G  /opt
pool0/root        6.75G   120G  5.54G  /
pool0/swap        8.50G   129G   186M  -
pool0/tmp          728K   120G   728K  /tmp
pool0/var         2.17G   120G  2.17G  /var

In order to use ZFS-backed containers, I wanted a new filesystem just for /var/lib/lxc, the default location for LXC containers:

$ sudo zfs create -o mountpoint=/var/lib/lxc pool0/lxc

$ sudo zfs list pool0/lxc
NAME        USED  AVAIL  REFER  MOUNTPOINT
pool0/lxc   539M   120G   124K  /var/lib/lxc

Next, I created my base Xenial LXC container:

$ sudo lxc-create -n xenial -t download -B zfs --zfsroot=pool0/lxc -- --dist ubuntu --release xenial --arch amd64

The “zfsroot” option is important - without it, LXC doesn’t know what pool or filesystem to use (it defaults to ‘tank/lxc’).
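As an aside, if you’d rather not pass “--zfsroot” every time, my understanding is that you can set a system-wide default in “/etc/lxc/lxc.conf” (check lxc.system.conf(5) for your LXC version):

lxc.bdev.zfs.root = pool0/lxc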

At this point, we have a working Xenial container. Before starting it, I manually edited /var/lib/lxc/xenial/rootfs/etc/shadow, removing the passwords for the “root” and “ubuntu” users. I then launched the container, logged in through the console, and changed the passwords for both users. Finally, I installed openssh-server and stopped the container - this is the base that I can now clone.
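Condensed into commands, those steps look roughly like this - the sed expression simply blanks the password fields, so double-check it against your shadow file before running it:

$ sudo sed -i -e 's/^root:[^:]*:/root::/' -e 's/^ubuntu:[^:]*:/ubuntu::/' /var/lib/lxc/xenial/rootfs/etc/shadow

$ sudo lxc-start -n xenial -d
$ sudo lxc-console -n xenial -t 0

Inside the console, run “passwd” for both users and “apt-get install openssh-server”, then:

$ sudo lxc-stop -n xenial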

Cloning a container is easy, and takes just a couple of seconds:

$ sudo lxc-clone -s -o xenial -n try

$ sudo lxc-ls -f
NAME    STATE    IPV4           IPV6                                 AUTOSTART
------------------------------------------------------------------------------
try     STOPPED  -              -                                    NO
xenial  STOPPED  -              -                                    NO

$ sudo zfs list -r pool0/lxc
NAME               USED  AVAIL  REFER  MOUNTPOINT
pool0/lxc          539M   120G   124K  /var/lib/lxc
pool0/lxc/try      124M   120G   471M  /var/lib/lxc/try/rootfs
pool0/lxc/xenial   415M   120G   415M  /var/lib/lxc/xenial/rootfs

You can see that each container is in its own ZFS copy-on-write filesystem. I can easily clone and destroy containers now without going through the full build, bake, and deploy process.
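For example, the retry loop around a non-idempotent step looks something like this (the snapshot name is arbitrary):

$ sudo lxc-stop -n try
$ sudo zfs snapshot pool0/lxc/try@pre-recipe
$ sudo lxc-start -n try -d

Then run the non-idempotent step inside the container; if it fails, roll back and try again:

$ sudo lxc-stop -n try
$ sudo zfs rollback pool0/lxc/try@pre-recipe
$ sudo lxc-start -n try -d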

Here are a couple more hints. If you have trouble connecting to the LXC console before openssh and networking are enabled, make sure you are connecting to the console tty (for Xenial, I was otherwise getting tty1, which has no getty):

$ sudo lxc-console -n try -t 0

Finally, by default, LXC containers will not be set up with networking. It’s easy to supply an “/etc/lxc/default.conf” to resolve this:

lxc.network.type = veth
lxc.network.link = br0

And remember that bridged networking needs to be configured on the host.
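On Debian Jessie with ifupdown and bridge-utils, a minimal bridge in “/etc/network/interfaces” looks something like this - I’m assuming eth0 is the physical interface, so adjust the name to match your system:

auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_fd 0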