At work, we have a large-scale deployment at AWS on Ubuntu. As a member of the Performance and Operating Systems Engineering team, I am partially responsible for building out and stabilizing the base image we use to deploy our instances. We are currently in the process of migrating to Xenial, the current Ubuntu LTS release. A lot has to happen to go from the foundation image to our deployable image. A few steps are manual, such as making our AWS AMI bootable on both PV and HVM instance types (we’ve shared how to do this with Canonical, but they don’t seem too interested, even though it reduces operational complexity by removing the need to maintain multiple base images). The vast majority of building out our image, on the other hand, is an automated process involving a relatively large and complex Chef recipe, which we keep backwards compatible with all versions of Ubuntu we support for our internal customers.
All this works pretty well in practice, but iterating on a new base AMI, as we are doing now for Xenial, takes some time as we try different recipes, update init scripts (systemd is new in Xenial since the last LTS, Trusty), and adjust various other customizations. Making idempotent Chef recipes is difficult and not worth the effort, but that also means it’s not really possible to re-run after a failed recipe. The end-to-end delay in trying out changes is fairly long: we check package source into git, let Jenkins build packages, and kick off our automated AMI build process, which involves taking our foundation image, chrooting into it, running the Chef recipes, and snapshotting the EBS volume into an AMI. Only then can we finally launch an EC2 instance from the AMI and see if things worked.
This all takes a fair bit of time when rapidly iterating on our base image, and I wanted to find a quicker way to try potentially breaking changes. Even though we deploy on Ubuntu, all my personal and work laptops, desktops, and servers run base Debian. Lately, I’ve been building out all my filesystems (except for /boot) with ZFS using zfsonlinux (even on my LUKS/dm-crypt encrypted laptops).
I’ve used LXC a fair bit in the past when needing to do cross-distribution builds - and I’ve used BTRFS snapshots to make cloning containers fast and space efficient. ZFS also supports copy-on-write, and is natively supported by LXC on Debian Jessie, so this seemed like a good approach - and it is!
I’ve been using this method to iterate quickly on our recipes. I have a base xenial image that I can clone and start in a few seconds to start from the beginning. I can also snapshot a container at any point in the process so that I can repeat and retry what would otherwise not be idempotent.
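For example, before a risky step I can take a cheap ZFS snapshot of a cloned container’s filesystem and roll back if it fails. A rough sketch, assuming a clone named “try” under pool0/lxc (the snapshot name “pre-chef” is just an example, not something from my setup):

```shell
# Snapshot the clone's filesystem before a step that isn't idempotent
sudo zfs snapshot pool0/lxc/try@pre-chef

# If the step fails, stop the container and roll back to the snapshot
sudo lxc-stop -n try
sudo zfs rollback pool0/lxc/try@pre-chef
```

ZFS snapshots are copy-on-write, so taking one is nearly instant and costs almost no space until the container diverges from it.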
Some of the ZFS integration in LXC is not well documented, so here’s some rough steps on how I’m doing this on my work desktop, to help anyone else trying to figure this out.
I started with a single ZFS pool called “pool0” with several filesystems:
$ sudo zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
pool0   238G   102G   136G         -    17%    42%  1.00x  ONLINE  -
$ sudo zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
pool0        110G   120G    96K  none
pool0/home  89.6G   120G  89.6G  /home
pool0/opt   2.61G   120G  2.61G  /opt
pool0/root  6.75G   120G  5.54G  /
pool0/swap  8.50G   129G   186M  -
pool0/tmp    728K   120G   728K  /tmp
pool0/var   2.17G   120G  2.17G  /var
In order to use ZFS volumes, I wanted a new filesystem just for /var/lib/lxc, the default location for LXC containers:
$ sudo zfs create -o mountpoint=/var/lib/lxc pool0/lxc
$ sudo zfs list pool0/lxc
NAME        USED  AVAIL  REFER  MOUNTPOINT
pool0/lxc   539M   120G   124K  /var/lib/lxc
Next, I created my base Xenial LXC container:
$ sudo lxc-create -n xenial -t download -B zfs --zfsroot=pool0/lxc -- --dist ubuntu --release xenial --arch amd64
The “zfsroot” option is important - without it, LXC doesn’t know what pool or filesystem to use (it defaults to ‘tank/lxc’).
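If you don’t want to pass “--zfsroot” on every lxc-create, the system-wide LXC config can set the default. The key name below is what I’ve seen documented for LXC 2.x in lxc.system.conf(5); check the man page for your version:

```
# /etc/lxc/lxc.conf  (system-wide LXC config, not a container config)
lxc.bdev.zfs.root = pool0/lxc
```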
At this point, we have a working Xenial container. Before starting it, I manually edited /var/lib/lxc/xenial/rootfs/etc/shadow, removing the passwords for the “root” and “ubuntu” users. I then launch the container, log in through the console, and change the passwords for both users. Then, I install openssh-server and stop the container. This is my base that I can now clone.
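Sketched as commands, that preparation looks roughly like this (paths are LXC’s defaults for a container named “xenial”; the sed invocations blank the password field in shadow):

```shell
# Blank the password fields for root and ubuntu in the stopped
# container's rootfs, so console login works without a password
sudo sed -i 's/^root:[^:]*:/root::/' /var/lib/lxc/xenial/rootfs/etc/shadow
sudo sed -i 's/^ubuntu:[^:]*:/ubuntu::/' /var/lib/lxc/xenial/rootfs/etc/shadow

# Start the container and attach to its console to finish setup
sudo lxc-start -n xenial -d
sudo lxc-console -n xenial -t 0
# (inside the container: run passwd for both users,
#  then apt-get install openssh-server)

# Stop it; this becomes the base to clone from
sudo lxc-stop -n xenial
```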
Cloning a container is easy, and takes just a couple of seconds:
$ sudo lxc-clone -s -o xenial -n try
$ sudo lxc-ls -f
NAME    STATE    IPV4  IPV6  AUTOSTART
------------------------------------------------------------------------------
try     STOPPED  -     -     NO
xenial  STOPPED  -     -     NO
$ sudo zfs list -r pool0/lxc
NAME               USED  AVAIL  REFER  MOUNTPOINT
pool0/lxc          539M   120G   124K  /var/lib/lxc
pool0/lxc/try      124M   120G   471M  /var/lib/lxc/try/rootfs
pool0/lxc/xenial   415M   120G   415M  /var/lib/lxc/xenial/rootfs
You can see that each container is in its own ZFS copy-on-write volume. I can now easily clone and destroy containers without going through a full build, bake, and deploy process.
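The whole iteration loop, then, is just a few commands (“try” is an arbitrary clone name):

```shell
# Fresh clone from the base, start it, and try the change...
sudo lxc-clone -s -o xenial -n try
sudo lxc-start -n try -d

# ...and when it breaks, throw it away and start over
sudo lxc-stop -n try
sudo lxc-destroy -n try
```

Since the clone is a ZFS snapshot-based clone (the -s flag), creating and destroying it takes seconds rather than the minutes or hours of the full AMI pipeline.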
Here are a couple more hints. If you have trouble connecting to the LXC console before openssh and networking are enabled, make sure you are connecting to the console tty (for Xenial, I was otherwise getting tty1, which has no getty):
$ sudo lxc-console -n try -t 0
Finally, by default, LXC containers will not be set up with networking. It’s easy to supply a “/etc/lxc/default.conf” to resolve this:
lxc.network.type = veth
lxc.network.link = br0
And remember that the host needs bridged networking to be configured.
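For reference, a minimal bridge on a Debian host might look like this in /etc/network/interfaces (this assumes the bridge-utils package is installed and that eth0 is your physical interface; adjust both to your system):

```
auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0
```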