Sep 14, 2017 - Pelletheads is down, come on over to Pelletfan

If you cook on a pellet smoker/BBQ/grill, you might have noticed that the Pelletheads forum has been down for a while. I came across Pelletfan, which was started by one of the former Pelletheads moderators, and a good chunk of the pellet cooking community has found it.

So, if you have discovered that Pelletheads is down, like I did, and are looking for a replacement forum, give Pelletfan a try.

Dec 28, 2016 - ZFS Root Filesystem on AWS

Did you know you can create your own Linux AWS EC2 AMI which is running 100% ZFS for all filesystems (/, /boot - everything)? You can, and it’s not too hard as long as you are experienced with installing Linux without an installer. Here are the rough instructions for setting this up with a modern Debian-based system (I’ve tested with Debian and Ubuntu). As far as I know, this is the first published account of how to set this up. There aren’t any prebuilt AMIs available that I know of, but I might just do that unless someone else beats me to it.

Why run ZFS for the root filesystem? Not only is ZFS a high performing filesystem, but using native ZFS for everything makes storage management a cinch. For example, want to keep your root EBS volumes small? No problem - keep your AMI on a 1GB volume (yes, it’s possible to be that small), and extend the ZFS pool dynamically at runtime by attaching additional EBS volumes as needed. ZFS handles this extremely well.

Why build your own AMI instead of using a prebuilt one? There are a couple of good reasons, but the primary one is that you get a minimal AMI with the least cruft and bloat possible. Many of the prebuilt cloud AMIs have a bunch of packages installed that you might not need or want. By building from scratch, your AMI contains just the things you want, which not only lowers EBS costs but potentially reduces security risks.

Note that we don’t do anything special with ephemeral drives here - those are best kept in their own ZFS pool anyway, since mixing EBS and ephemeral drives will have some interesting performance consequences. You can use ephemerals on an instance, of course (in fact, it works great to stripe across all ephemeral drives, or you could use SSD ephemerals for ZFS L2ARC) - that’s just not the purpose of this article.

These instructions will only create an AMI that boots on HVM instance types. Although it’s easy enough to create a snapshot that can be registered separately as both an HVM and a PV AMI, all new AWS instance types support HVM. Because of this, I’ve decided to support only newer instance types, hence HVM-only.

I’ve tested this with the upcoming Debian Stretch (“testing” as of this writing), as well as Ubuntu Yakkety. It should work with Ubuntu Xenial as well, but I wouldn’t try anything earlier, since ZFS support is relatively new and maturing rapidly (last time I tried Debian Jessie with 100% ZFS I found that grub was too old to support booting into ZFS, although a separate EXT4 /boot works fine. This may have changed since then).

Again, these instructions assume you are pretty familiar with installing Debian via debootstrap, which means manually provisioning volumes, partitioning them, creating filesystems, bootstrapping, and chrooting in for final setup. If you don’t know what all these things mean, you might find this a difficult undertaking. Unlike installing to your own hardware, there’s very little instrumentation if things go wrong, and only a read-only console (if you are lucky - if networking does not initialize properly, you might not even get that). Expect this to take a few iterations and some frustration - this is a general guide, not step-by-step instructions!

If you’ve ever installed a Debian based system from scratch, you’ll note that most of these steps are no different than you’d do on physical hardware. There are only a few things that are AWS specific, but the vast majority is exactly how you’d install on bare metal.

Step 1 - Prepare Host Instance

Fire up a host instance to build out the AMI. This doesn’t need to be the same distribution or version as the AMI to be built, but it has to be recent enough to have ZFS. Debian Jessie (with jessie-backports) or Ubuntu Xenial will work.

We’ll use this instance (the “host”) to build out the target AMI, and if things don’t go well we can come back to it and try again (so don’t terminate it until you are ready, or have a working target AMI).

Once the host instance is up, provision a GP2 EBS volume via the AWS console and attach it to the host. We use a 10GB volume, but you could make this as small as 1GB if you really want to (be aware GP2 doesn’t perform well with small volumes).

We’ll assume the newly provisioned volume is attached at /dev/xvdf. The actual device might vary, so use “dmesg” if you aren’t sure.

Next, update /etc/apt/sources.list with the full sources list for your host distribution. For Debian, use “main contrib non-free” - and you’ll need jessie-backports if the host is Jessie. For Ubuntu, use “main restricted universe multiverse”.

Next, install ZFS and debootstrap:

$ apt update
$ apt install zfs-zed zfsutils-linux zfs-initramfs zfs-dkms debootstrap gdisk

Step 2 - Prepare Target Pools And Filesystems

Now it’s time to set up ZFS on the new EBS volume. Assuming the target volume device is /dev/xvdf, we’ll create a GPT partition table with a small BIOS boot partition for GRUB and leave the rest of the disk for ZFS.

Be careful - many instructions out on the net for ZFS munge the sector geometry, or fake sgdisk into using an unnatural alignment. The following is correct (per AWS documentation) not only for EBS, but is the exact same geometry I use when installing Linux with ZFS root on physical hardware.

$ sgdisk -Zg -n1:0:4095 -t1:EF02 -c1:GRUB -n2:0:0 -t2:BF01 -c2:ZFS /dev/xvdf

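This will create a small partition labelled GRUB (type EF02, the BIOS boot partition GRUB uses on GPT disks) ending at sector 4095, and use the rest of the disk for a partition labelled ZFS (type BF01). With sgdisk’s default 2048-sector alignment, the GRUB partition starts at sector 2048, so it is 1 MiB. It doesn’t technically need to be that big, but this ensures the ZFS partition that follows is aligned properly.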
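If you want to double-check the geometry, the arithmetic is easy to sketch in Python. The sketch assumes 512-byte logical sectors, sgdisk’s default 2048-sector alignment, and a standard GPT whose first usable sector is 34. The command above does not spell out any of these values. The sketch also checks that the ZFS partition starts on a 4096-byte boundary, to match the ashift=12 used when creating the pool.

```python
# Geometry produced by:
#   sgdisk -Zg -n1:0:4095 -t1:EF02 -c1:GRUB -n2:0:0 -t2:BF01 -c2:ZFS /dev/xvdf
# Assumptions (not from the command itself): 512-byte logical sectors,
# sgdisk's default 2048-sector alignment, first usable GPT sector = 34.
SECTOR_BYTES = 512
ALIGN_SECTORS = 2048      # 2048 * 512 bytes = 1 MiB
FIRST_USABLE = 34
ASHIFT = 12               # zpool create -o ashift=12 -> 2**12 = 4096-byte blocks


def align_up(sector, align=ALIGN_SECTORS):
    """Round a sector number up to the next multiple of the alignment."""
    return -(-sector // align) * align


# Partition 1: a start of "0" means the first aligned free sector; the end is 4095.
grub_start = align_up(FIRST_USABLE)
grub_end = 4095
grub_sectors = grub_end - grub_start + 1

# Partition 2: "0" again means the next aligned free sector after partition 1.
zfs_start = align_up(grub_end + 1)

print(f"GRUB: sectors {grub_start}-{grub_end} ({grub_sectors * SECTOR_BYTES // 1024} KiB)")
print(f"ZFS starts at sector {zfs_start}, byte offset {zfs_start * SECTOR_BYTES}")
print("4K aligned for ashift=12:", (zfs_start * SECTOR_BYTES) % (1 << ASHIFT) == 0)
```

Under those assumptions the GRUB partition covers sectors 2048-4095 (1 MiB), and the ZFS partition begins exactly 2 MiB into the volume.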

It’s worth noting that I never give ZFS a full disk; I always use partitions for ZFS pools instead. If you give ZFS the entire disk, it will create its own partition table, but waste 8MB in a Solaris partition that Linux has no use for.

OK, great, next up let’s create our ZFS pool and set up some filesystems. This will set the target up in /mnt. You can choose any mount point you want, just remember to use it consistently if you choose a different one.

I use the ZFS pool name “rpool”, but you can choose a different one, just be careful to substitute yours everywhere.

You may want different options - this will globally enable lz4 compression and disable atime for the pool. You may want to disable compression generally and only enable it for specific filesystems; the choice is up to you. We also allow an overlay mount on /var. This is an obscure but important bit: when the system initially boots, it will log to /var/log before the /var ZFS filesystem is mounted. Because the mount point directory is then no longer empty, ZFS won’t mount /var unless the overlay flag is set. Note that /dev/xvdf2 is the second GPT partition we created above.

$ zpool create -o ashift=12 -O compression=lz4 -O atime=off -m none -R /mnt rpool /dev/xvdf2
$ zfs create -o mountpoint=/ rpool/root
$ zfs create -o mountpoint=/home rpool/home
$ zfs create -o mountpoint=/tmp rpool/tmp
$ zfs create -o overlay=on -o mountpoint=/var rpool/var
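The -R /mnt option sets the pool’s altroot, which ZFS prepends to every dataset’s mountpoint while the pool is imported that way. The datasets keep their real mountpoints (/, /var, and so on) for when the target boots, but on the host they appear under /mnt. The Python sketch below only models that path mapping. The dataset table mirrors the zfs create commands above.

```python
import posixpath

# Dataset -> mountpoint property, as set by the zfs create commands above.
DATASETS = {
    "rpool/root": "/",
    "rpool/home": "/home",
    "rpool/tmp": "/tmp",
    "rpool/var": "/var",
}


def host_path(mountpoint, altroot="/mnt"):
    """Where a dataset is mounted on the host when the pool has an altroot."""
    return posixpath.normpath(posixpath.join(altroot, mountpoint.lstrip("/")))


for dataset, mountpoint in DATASETS.items():
    print(f"{dataset:<11} mountpoint={mountpoint:<6} mounted on host at {host_path(mountpoint)}")
```

This is also why the pool has to be imported with an altroot again (for example `zpool import -R /mnt rpool`) when you come back to the host to fix a target that didn’t boot.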

You may wish to have different ZFS filesystems, of course. And note we don’t set any quotas - we let all our filesystems share the entire storage pool.

At this point, the usual storage commands should show everything mounted up and ready for bootstrap (“zpool status”, “zfs list”, “df”, etc).

Step 3 - Bootstrap The Target

Now we’ll install our target distribution on the newly provisioned volume. There’s not much to do in this step:

$ debootstrap --arch amd64 stretch /mnt

Or, for Ubuntu Yakkety:

$ debootstrap --arch amd64 yakkety /mnt

Note that we can do this cross-distribution. We can bootstrap Ubuntu from a Debian host, or a Debian target from an Ubuntu host.

Step 4 - Chroot Into Target

Next up we need to chroot into the target before doing final configuration.

$ mount --rbind /dev /mnt/dev
$ mount --rbind /proc /mnt/proc
$ mount --rbind /sys /mnt/sys
$ chroot /mnt

At this point, you should have a root shell into the target system.

Step 5 - Finalize Target Configuration

Now we’ll do some final configuration. Some of the steps here are different between Debian and Ubuntu, but the general theme is the same.

Update /etc/apt/sources.list with the full sources list for your target distribution. For Debian, use “main contrib non-free”. For Ubuntu, use “main restricted universe multiverse”. Be sure you are setting up sources.list for your target distribution, not the host like we did before!

Install packages, but be sure NOT to install grub when it asks - you’ll have to acknowledge that this will result in a broken system (for now, anyway).

$ apt update

# Debian
$ apt install linux-image-amd64 linux-headers-amd64 grub-pc zfs-zed zfsutils-linux zfs-initramfs zfs-dkms cloud-init gdisk locales
$ ln -s /proc/mounts /etc/mtab

# Ubuntu
$ apt install linux-image-generic linux-headers-generic grub-pc zfs-zed zfsutils-linux zfs-initramfs zfs-dkms cloud-init gdisk

# All
$ dpkg-reconfigure locales # Choose en_US.UTF-8 or as appropriate
$ apt install --no-install-recommends openssh-server

Note the /etc/mtab symlink created for Debian: there was a bug in ZFS that relied on /etc/mtab. We got that bug fixed in Ubuntu by Canonical, but as of a couple of months ago Stretch didn’t have the fix yet. It’s probably fixed in Debian by now as well.

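On Debian, I found I needed to set GRUB_CMDLINE_LINUX in /etc/default/grub as follows. Note the escaped ‘$’: grub-mkconfig sources this file as a shell script, so the backslash keeps a literal $bootfs in the generated grub.cfg instead of letting the shell expand it: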

GRUB_CMDLINE_LINUX="boot=zfs \$bootfs"

This additional step might go away (or already be resolved) with a newer version of ZFS and grub in stretch. You could (should) probably add this to the grub.d configuration we add later, rather than here.

Verify grub and ZFS are happy. This is very important. If this step doesn’t work, there’s no point in continuing - the target will not boot.

$ grub-probe /

This verifies that grub is able to probe filesystems and devices and has ZFS support. If this returns an error, the target system isn’t going to boot.

Everything is good, so let’s install grub:

$ grub-install /dev/xvdf

Note we give grub the entire EBS volume of xvdf, not just xvdf1. This is important (installing to just the GRUB partition will result in a non-booting system).

Again, if this fails, you’ll need to diagnose why and potentially start over, as you won’t have a bootable target system.

Now we need to add a configuration file for grub to set a few things. To do this, create a file in “/etc/default/grub.d/50-aws-settings.cfg”:

GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 ip=dhcp tsc=reliable net.ifnames=0"

This will configure grub to log as much as possible to the AWS console, get an IP address as early as possible, and force the TSC (time source) to be treated as reliable (an obscure boot parameter required for some AWS instance classes). net.ifnames=0 is set so Ethernet adapters are named ethX instead of ensXX.

Now, let’s update grub:

$ update-grub

You might want to check “/boot/grub/grub.cfg” at this stage to see whether the zfs module will be loaded and whether it has the right boot line (vague advice, I know).
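Here is a minimal Python sketch that makes this check a bit more concrete. It scans grub.cfg for an `insmod zfs` line and checks that the first `linux` line carries the parameters set in this article. Treating the first `linux` line as the default entry is an assumption (it holds with GRUB_DEFAULT=0). Recovery entries deliberately leave out GRUB_CMDLINE_LINUX_DEFAULT, so they are not checked. The EXPECTED_PARAMS list is also an assumption: boot=zfs only applies if you used the Debian workaround above.

```python
import re
import sys
from pathlib import Path

# Parameters expected on the default entry: boot=zfs from the Debian
# GRUB_CMDLINE_LINUX workaround, the rest from 50-aws-settings.cfg.
EXPECTED_PARAMS = ["boot=zfs", "console=tty1", "console=ttyS0",
                   "ip=dhcp", "tsc=reliable", "net.ifnames=0"]


def check_grub_cfg(text, expected=EXPECTED_PARAMS):
    """Return a list of problems found in grub.cfg text (empty list = looks OK)."""
    problems = []
    if not re.search(r"^[ \t]*insmod[ \t]+zfs\b", text, re.MULTILINE):
        problems.append("no 'insmod zfs' line")
    linux_lines = re.findall(r"^[ \t]*linux[ \t]+(.+)$", text, re.MULTILINE)
    if not linux_lines:
        return problems + ["no 'linux' boot line"]
    params = linux_lines[0].split()  # assumed to be the default menu entry
    problems += [f"default entry is missing {p}" for p in expected if p not in params]
    return problems


if __name__ == "__main__":
    cfg = Path(sys.argv[1] if len(sys.argv) > 1 else "/boot/grub/grub.cfg")
    if cfg.exists():
        for problem in check_grub_cfg(cfg.read_text()):
            print(problem)
```

Run it inside the chroot (or point it at /mnt/boot/grub/grub.cfg from the host). No output means both checks passed, which is necessary but not sufficient for a bootable target.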

Finally, set the ZFS cache and reconfigure - these might be unnecessary, but since this works, I superstitiously don’t skip it :-).

$ zpool set cachefile=/etc/zfs/zpool.cache rpool
$ dpkg-reconfigure zfs-dkms

Now, just a few sundry things left to do.

Update “/etc/network/interfaces” with:

auto eth0
iface eth0 inet dhcp

Again note that we’ve altered the boot commandline so network devices will be enumerated as ethX, instead of ensXX.

Don’t drop this config into “/etc/network/interfaces.d/eth0.cfg” - cloud-init will blacklist that configuration.

Finally, you may wish to provision and configure a user (cloud-init will already set up a “debian” or “ubuntu” user by default). You may also want to give the root user a secure password and set PermitRootLogin in /etc/ssh/sshd_config, if that is appropriate for your environment and security policies.

Step 6 - Quiesce Target Volume

Before creating an AMI, we need to exit the chroot, unmount everything, and export the pool. In short, we quiesce the target so the volume can be snapshotted.

Exit the chroot:

$ exit

Now, you should be back in the host instance.

Unmount the bind mounts (we use the lazy option, otherwise unmounts can fail):

$ umount -l /mnt/dev
$ umount -l /mnt/proc
$ umount -l /mnt/sys

And finally, export the ZFS pool.

$ zpool export rpool

Now, “zpool status”, “df”, etc should show that our target filesystems are unmounted, and /dev/xvdf is free to be safely cloned. If anything here fails (unmounting, exporting), the target will not be in a good state and won’t boot.
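Before taking the snapshot, it’s worth confirming that nothing under the target root is still mounted, whether a bind mount or a ZFS dataset that didn’t go away with the export. Here is a small Python sketch that reads /proc/mounts and lists any mount point at or below /mnt. /mnt is assumed, as elsewhere in this article. Mount points containing spaces (which /proc/mounts escapes as \040) are not decoded.

```python
from pathlib import Path

TARGET = "/mnt"  # target root used throughout this article


def mounts_under(mounts_text, root=TARGET):
    """Return mount points from /proc/mounts-style text that are at or below root."""
    root = root.rstrip("/") or "/"
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        mountpoint = fields[1]  # second field is the mount point
        if mountpoint == root or mountpoint.startswith(root + "/"):
            found.append(mountpoint)
    return found


if __name__ == "__main__":
    proc_mounts = Path("/proc/mounts")
    if proc_mounts.exists():
        leftovers = mounts_under(proc_mounts.read_text())
        print("still mounted:" if leftovers else f"nothing mounted under {TARGET}")
        for mp in leftovers:
            print("  " + mp)
```

An empty result, together with `zpool status` no longer listing rpool, is a reasonable sign the volume is ready to snapshot.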

Step 7 - Snapshot EBS And Create AMI

Now we are all set to create an AMI from our target EBS volume.

In the AWS console, take a snapshot of the target EBS volume - this should take a minute or two.

Next, also in the AWS console, select the snapshot and register a new AMI. Be sure to register as HVM and set up ephemeral mappings as you wish. Don’t mess with kernel ID and other parameters.

Step 8 - Launch And Add Storage

Once registered, launch your shiny new AMI on an instance and enjoy ZFS root filesystem goodness.

If your instance never comes up, take a look at the console logging available in the AWS console. This is the only real avenue you have to debug a failed launch, and it’s very limited. If grub fails, the log might be empty. If networking fails, the log should have some details, but the instance will not be reachable.

A very useful debugging technique for AMIs is to terminate the instance, but don’t destroy the EBS volume - instead, attach the volume to another instance and import the ZFS pool there. This will allow you to look at logs so hopefully you can figure out why the boot failed.

If the instance doesn’t come up, you can re-import the ZFS pool on the host used to stage the target and try to fix it (remember above, I suggested leaving the host and target EBS volume around so you can iterate on it). Do the bind mounts before your chroot, and don’t forget to unmount everything and export the pool before taking another snapshot.

If cloud-init provisioned its default user, log in as “debian” or “ubuntu” with the SSH key pair you selected at launch (or however users are provisioned, if you customize cloud-init). Or log in as root, if you set a root password and modified the ssh configuration to allow root login.

Did it work? If so great! If not, give it another try, paying careful attention to any errors, as well as scouring output of dkms builds, etc. This isn’t completely straightforward, and it took me a few tries to get things figured out.

Now, let’s show the power of ZFS by adding 100GB, which will be available across the entire rpool without having to fracture filesystems, mount new storage in its own directory, or move files around to the new device.

Assuming we used a 10GB EBS volume for the AMI, our pool probably looks something like:

$ zpool list
rpool   9.94G   784M  9.22G        -     0%     0%  1.00x  ONLINE  -

In the AWS console, create a new 100GB GP2 EBS volume and attach it to your running instance.

Assuming the volume is attached as /dev/xvdf, let’s extend rpool into this new volume:

$ sgdisk -Z -n1:0:0 -t1:BF01 -c1:ZFS /dev/xvdf
$ zpool add rpool /dev/xvdf1

This partitions the volume with a new GPT table, using everything for ZFS (again, I don’t like giving ZFS the raw volume, as it will waste a bit of space when it partitions the volume for Solaris compatibility). Finally, we extend rpool onto the new volume.

That’s it! Now we see:

$ zpool list -v
rpool     109G   784M   109G         -     0%     0%  1.00x  ONLINE  -
  xvda1  9.94G   735M  9.22G         -     7%     7%
  xvdf1  99.5G  48.7M  99.5G         -     0%     0%

We’ve added 100GB of storage completely transparently, and unlike creating a traditional EXT or XFS volume we don’t have to mount it into a new directory - with ZFS the storage is just there, and available to all our ZFS filesystems.

Thanks For Reading

Hope that helps anyone else looking to run ZFS exclusively in AWS. While not as easy as taking an off-the-shelf prebuilt AMI, you end up with an AMI that has only a minimal Debian or Ubuntu install - you know exactly what went into it, and the process for doing so.

If you run into any issues trying this, you can indirectly contact me by commenting on this blog entry, or try in ##aws on Freenode.

Jul 19, 2016 - LXC containers on ZFS

At work, we have a large-scale deployment at AWS on Ubuntu. As a member of the Performance and Operating Systems Engineering team, I am partially responsible for building out and stabilizing the base image we use to deploy our instances. We are currently in the process of migrating to Xenial, the current Ubuntu LTS release. A lot has to happen to go from the foundation image to our deployable image. There are a few manual things, such as making our AWS AMI bootable on both PV and HVM instance types (we’ve shared how to do this with Canonical, but they don’t seem too interested, even though it reduces operational complexity by not having to maintain multiple base images). The vast majority of building out our image, on the other hand, is an automated process involving a relatively large and complex chef recipe, which we keep backwards compatible with all versions of Ubuntu we support for our internal customers.

All this works pretty well in practice, but iterating on a new base AMI, as we are doing now for Xenial, takes some time as we try different recipes, update init scripts (systemd is new in Xenial since the last LTS, Trusty), and make various other customizations. Making chef recipes idempotent is difficult and not worth the effort, but that also means it’s not really possible to re-run a recipe after it fails. Trying out a change end to end is a fairly long process: we check package source into git, let jenkins build packages, and kick off our automated AMI build process, which involves taking our foundation image, chrooting into it, running the chef recipes, and snapshotting the EBS volume into an AMI. Only then can we launch an EC2 instance on the AMI and see if things worked.

This all takes a fair bit of time when rapidly iterating on our base image, and I wanted to find a quicker way to try potentially breaking changes. Even though we deploy on Ubuntu, all my personal and work laptops, desktops, and servers run base Debian. Lately, I’ve been building out all my filesystems (except for /boot) with ZFS using zfsonlinux (even on my LUKS/dm-crypt encrypted laptops).

I’ve used LXC a fair bit in the past when needing to do cross-distribution builds - and I’ve used BTRFS snapshots to make cloning containers fast and space efficient. ZFS also supports copy-on-write, and is natively supported by LXC on Debian Jessie, so this seemed like a good approach - and it is!

I’ve been using this method to iterate quickly on our recipes. I have a base xenial image that I can clone and start in a few seconds to start from the beginning. I can also snapshot a container at any point in the process so that I can repeat and retry what would otherwise not be idempotent.

Some of the ZFS integration in LXC is not well documented, so here’s some rough steps on how I’m doing this on my work desktop, to help anyone else trying to figure this out.

I started with a single ZFS pool called “pool0” with several filesystems:

$ sudo zpool list
pool0   238G   102G   136G         -    17%    42%  1.00x  ONLINE  -

$ sudo zfs list
pool0              110G   120G    96K  none
pool0/home        89.6G   120G  89.6G  /home
pool0/opt         2.61G   120G  2.61G  /opt
pool0/root        6.75G   120G  5.54G  /
pool0/swap        8.50G   129G   186M  -
pool0/tmp          728K   120G   728K  /tmp
pool0/var         2.17G   120G  2.17G  /var

To have LXC keep its containers on ZFS, I wanted a new filesystem just for /var/lib/lxc, the default location for LXC containers:

$ sudo zfs create -o mountpoint=/var/lib/lxc pool0/lxc

$ sudo zfs list pool0/lxc
pool0/lxc   539M   120G   124K  /var/lib/lxc

Next, I created my base Xenial LXC container:

$ sudo lxc-create -n xenial -t download -B zfs --zfsroot=pool0/lxc -- --dist ubuntu --release xenial --arch amd64

The “zfsroot” option is important - without it, LXC doesn’t know what pool or filesystem to use (it defaults to ‘tank/lxc’).

At this point, we have a working Xenial container. Before starting it, I manually edited /var/lib/lxc/xenial/rootfs/etc/shadow, removing the passwords for the “root” and “ubuntu” users. I then launched the container, logged in through the console, and changed the passwords for both users. Finally, I installed openssh-server and stopped the container. This is my base that I can now clone.

Cloning a container is easy, and takes just a couple of seconds:

$ sudo lxc-clone -s -o xenial -n try

$ sudo lxc-ls -f
NAME    STATE    IPV4           IPV6                                 AUTOSTART
try     STOPPED  -              -                                    NO
xenial  STOPPED  -              -                                    NO

$ sudo zfs list -r pool0/lxc
pool0/lxc          539M   120G   124K  /var/lib/lxc
pool0/lxc/try      124M   120G   471M  /var/lib/lxc/try/rootfs
pool0/lxc/xenial   415M   120G   415M  /var/lib/lxc/xenial/rootfs

You can see that each container is in its own copy-on-write ZFS dataset. I can now easily clone and destroy containers without going through a full build, bake, and deploy process.
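The post doesn’t say which tool takes the mid-process checkpoints mentioned earlier. One way to do it is with plain ZFS snapshots on the container’s dataset, and lxc-snapshot is another option. The Python sketch below only builds and prints the zfs command lines unless dry_run is turned off. The container name “try” and the pool0/lxc/<name> layout come from the listing above. The tag name “before-chef” is hypothetical. Stop the container before rolling back.

```python
import shlex
import subprocess

ZFS_ROOT = "pool0/lxc"  # the --zfsroot passed to lxc-create above


def snapshot_cmd(container, tag):
    # Each container's rootfs is its own dataset: <zfsroot>/<container>
    return ["zfs", "snapshot", f"{ZFS_ROOT}/{container}@{tag}"]


def rollback_cmd(container, tag):
    # -r also destroys any snapshots newer than <tag>; stop the container first
    return ["zfs", "rollback", "-r", f"{ZFS_ROOT}/{container}@{tag}"]


def run(cmd, dry_run=True):
    """Print the command; only execute it (via sudo) when dry_run is False."""
    print(("would run: " if dry_run else "running: ") + shlex.join(cmd))
    if not dry_run:
        subprocess.run(["sudo", *cmd], check=True)


# Hypothetical checkpoint taken just before running the chef recipes:
run(snapshot_cmd("try", "before-chef"))
run(rollback_cmd("try", "before-chef"))
```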

Here are a couple more hints. If you have trouble connecting to the LXC console before openssh and networking are enabled, make sure you are connecting to the console tty (for Xenial, I was otherwise getting tty1, which has no getty):

$ sudo lxc-console -n try -t 0

Finally, by default, LXC containers will not be set up with networking. It’s easy to supply an “/etc/lxc/default.conf” to resolve this:

lxc.network.type = veth
lxc.network.link = br0

And remember that the host needs bridged networking to be configured.

May 10, 2016 - Be Careful With Apache mod_headers

Note: This post has been updated since I discovered that this is NOT an Apache issue. It turns out to be entirely a problem in the request processing framework of the application Apache is proxying requests to. Some frameworks follow old CGI specs that prohibit hyphens (“-”) in request header names. Apache passes along both its header and the client-generated headers, but the proxied framework converts “-” to “_”, which results in a map/dictionary key collision.

As a result, my “Do this” advice has been updated.

While doing some Apache TLS configuration this week for work, I came across a security edge case with mod_headers and the RequestHeader directive.

A fairly common use-case for this is to pass TLS/SSL headers to a proxied backend service when TLS termination is done in Apache. Imagine a case where client certificates are optional but the backend uses information from the certificate, such as the DN, or just validating if a client certificate was used.

Let’s take that last case as an example to illustrate this security risk, where we wish to pass along the SSL_CLIENT_VERIFY Apache variable to a backend, indicating that a client certificate was successfully used and validated. A common, but insecure configuration (which you’ll find in many guides and blogs if you search) is to do this:

RequestHeader set SSL_CLIENT_VERIFY "%{SSL_CLIENT_VERIFY}s"  # Don't do this!

This directive will add the header “Ssl-Client-Verify” to the request passed to the backend service, however this header can be overridden and spoofed by a client!

Instead, use the following configuration, which is not vulnerable to header forgery:

RequestHeader set SSLCLIENTVERIFY "%{SSL_CLIENT_VERIFY}s"  # Do this

Some request processing frameworks follow an old CGI specification that prohibits “-” in header names and convert it to “_”. To prevent a client from spoofing headers through a map/dictionary key collision, avoid these characters entirely.
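Here is a rough Python model of that collision. It assumes the backend builds CGI-style names (uppercase, “-” replaced by “_”, HTTP_ prefix, as in RFC 3875). It also models “RequestHeader set” as replacing only headers with the same name, compared case-insensitively. Which of two colliding values a framework keeps is implementation-specific (in my test the client’s value won), so the sketch only reports the collision. The header values are made up.

```python
from collections import defaultdict


def cgi_key(header_name):
    # CGI-style meta-variable name: uppercase, "-" -> "_", prefixed with HTTP_
    return "HTTP_" + header_name.upper().replace("-", "_")


def request_header_set(headers, name, value):
    # Rough model of Apache's "RequestHeader set": drop headers whose name
    # matches ours (case-insensitively), then append ours.
    kept = [(n, v) for n, v in headers if n.lower() != name.lower()]
    return kept + [(name, value)]


def colliding_keys(headers):
    # Group distinct header names by the CGI key they map to; any key with
    # more than one source name is ambiguous to the backend.
    sources = defaultdict(set)
    for n, _ in headers:
        sources[cgi_key(n)].add(n.lower())
    return {key: names for key, names in sources.items() if len(names) > 1}


client = [("Host", "example.com"), ("Ssl-Client-Verify", "SPOOFED")]

unsafe = request_header_set(client, "SSL_CLIENT_VERIFY", "SUCCESS")
safe = request_header_set(client, "SSLCLIENTVERIFY", "SUCCESS")

print("underscore name:", colliding_keys(unsafe))  # collision on HTTP_SSL_CLIENT_VERIFY
print("plain name:     ", colliding_keys(safe))    # {}
```

Apache does not treat the client’s “Ssl-Client-Verify” as the same header as “SSL_CLIENT_VERIFY”, so it never replaces it, yet both end up under one key in the backend. A client-supplied “SSLCLIENTVERIFY”, by contrast, is simply overwritten by RequestHeader set.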

Here’s an example of header forgery, where we can easily override the Apache generated headers when specified like the “Don’t do this” case above:

$ curl --header "Ssl-Client-Verify: SPOOFED" -i

With a valid certificate we can still override the Apache generated header:

$ curl --header "Ssl-Client-Verify: SPOOFED" --cert cert.crt --key cert.key --cacert all.cert \

This is easy to test using a simple Python flask backend service with a route like the following (for easy illustration purposes only, of course):

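from flask import Flask, request
app = Flask(__name__)

@app.route('/')
def root():
    print(request)          # method and URL as seen by the backend
    print(request.headers)  # headers after Apache's RequestHeader processing
    return ''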

The resulting output will show that the client was able to override the Apache header if underscores are used in the RequestHeader directive:

Ssl-Client-Verify: SPOOFED

Whereas with the second form, where the header name contains neither dashes nor underscores, the client cannot spoof the header:

Ssl-Client-Verify: SUCCESS

Or, if client certificates are optional and none was provided:

Ssl-Client-Verify: (null)

This vulnerability happens if the client passes a header that matches the final header of “Ssl-Client-Verify” (case doesn’t matter, so a spoofed header of “SSL-CLIENT-VERIFY” will result in header forgery). Passing a header of “SSL_CLIENT_VERIFY” from the client will not result in a spoofed header, potentially giving a false sense of security in testing.

The security risk is pretty clear: a misconfigured Apache, combined with a backend request processing framework that munges header names, can let clients spoof headers so that a proxied service incorrectly concludes authentication or authorization has been confirmed when it has not.

Be careful: do not use “-” or “_” in header names in RequestHeader!

May 8, 2016 - Jekyll With Isso Comments

I switched this blog over to the Isso commenting system from Disqus, and added support for Isso to my popular Jekyll theme jekyll-clean. It was always a bit of a battle getting Disqus to work right. I had quite a few comments that would not show up, and just logging into Disqus doesn’t work right if you use privacy blockers like I do (Privacy Badger, uBlock Origin, and HTTPS Everywhere for those interested - these are all worthwhile browser extensions to use). There were always some questions about what Disqus does with data, as well.

Isso is self-hosted, which means you can’t directly use it on static webhosting such as github pages, and while your data is arguably no more safe on someone’s random self-hosted blog (such as this one!), Isso allows anonymous comments - so people only have to provide as much detail as they wish. For people who want to demand it, you can make the email and name fields mandatory, but there’s no verification so in practice there’s not much point (when I come across comment forms that require an email I always give a fake one).

We’ll see if spam is an issue - Isso has a basic moderation system. That’s one benefit of hosted solutions such as Disqus: they have shared knowledge about spammers and can make some reasonable attempts to control it, along with requiring you to create an account (with the obvious downside being the lack of anonymous comments I mention above).

So, in the end, it’s not a clear choice so everyone has to choose what matters most to them - there are a few other options other than Isso as well, but I liked the fact that Isso is small and simple, written in Python, and uses sqlite for storage. There’s not much to go wrong nor much attack surface for abuse.

Integrating Isso with Jekyll is pretty easy, you can take a look at jekyll-clean to see how I approached it.

On the topic of Jekyll for blogs - I switched this blog over to Jekyll about a year and a half ago and don’t regret it for a moment. It’s simple, easy to modify and theme, and super super fast.