Did you know you can create your own Linux AWS EC2 AMI that runs 100% ZFS for
all filesystems (/, /boot - everything)? You can, and it’s not too hard
as long as you are experienced with installing Linux without an installer.
Here are the rough instructions for setting this up with a modern Debian-based
system (I’ve tested with Debian and Ubuntu). As far as I know, this is the
first published account of how to set this up. There aren’t any prebuilt AMIs
available that I know of, but I might just build one unless someone else beats
me to it.
Why run ZFS for the root filesystem? Not only is ZFS a high performing
filesystem, but using native ZFS for everything makes storage management a
cinch. For example, want to keep your root EBS volumes small? No problem - keep
your AMI on a 1GB volume (yes, it’s possible to be that small), and extend the
ZFS pool dynamically at runtime by attaching additional EBS volumes as needed.
ZFS handles this extremely well.
Why build your own AMI instead of using a prebuilt one? There are a couple of
good reasons, but the primary one is that you get a minimal AMI with the least
cruft and bloat possible. Many of the prebuilt cloud AMIs have a bunch of
packages installed that you might not need or want. By building from scratch,
your AMI contains just the things you want, not only lowering EBS costs, but
potentially reducing security risks.
Note that we don’t do anything special with ephemeral drives here - those are
best kept in their own ZFS pool anyway, since mixing EBS and ephemeral drives
will have some interesting performance consequences. You can use ephemerals on an
instance, of course (in fact, it works great to stripe across all ephemeral
drives, or you could use SSD ephemerals for ZFS L2ARC) - that’s just not the
purpose of this article.
These instructions will only create an AMI that will boot on an HVM instance
type. Although it’s easy enough to create a snapshot that can be registered
separately as either an HVM or a PV AMI, all new AWS instance types support HVM.
Because of this, I’ve decided to support only newer instance types.
I’ve tested this with the upcoming Debian Stretch (“testing” as of this
writing), as well as Ubuntu Yakkety. It should work with Ubuntu Xenial as well,
but I wouldn’t try anything earlier, since ZFS support is relatively new and
maturing rapidly (last time I tried Debian Jessie with 100% ZFS I found that
grub was too old to support booting into ZFS, although a separate EXT4 /boot
works fine. This may have changed since then).
Again, these instructions assume you are pretty familiar with installing Debian
via debootstrap, which means manually provisioning volumes, partitioning them,
creating filesystems, bootstrapping, and chrooting in for final setup. If you
don’t know what all these things mean, you might find this a difficult
undertaking. Unlike installing to your own hardware, there’s very little
instrumentation if things go wrong, and only a read-only console (if you are
lucky - if networking does not initialize properly, you might not even get
that). Expect this to take a few iterations and some frustration - this is a
general guide, not step-by-step instructions!
If you’ve ever installed a Debian based system from scratch, you’ll note that
most of these steps are no different than you’d do on physical hardware. There
are only a few things that are AWS specific, but the vast majority is exactly
how you’d install on bare metal.
Step 1 - Prepare Host Instance
Fire up a host instance to build out the AMI. This doesn’t need to be the same
distribution or version as the AMI to be built, but it has to be recent enough
to have ZFS. Debian Jessie (with jessie-backports) or Ubuntu Xenial will work.
We’ll use this instance (the “host”) to build out the target AMI, and if things
don’t go well we can come back to it and try again (so don’t terminate it until
you are ready, or have a working target AMI).
Once the host instance is up, provision a GP2 EBS volume via the AWS console
and attach it to the host. We use a 10GB volume, but you could make this as
small as 1GB if you really want to (be aware GP2 doesn’t perform well with
small volumes).
We’ll assume the newly provisioned volume is attached at /dev/xvdf. The actual
device might vary; use “dmesg” if you aren’t sure.
Next, update /etc/apt/sources.list with the full sources list for your host
distribution. For Debian, use “main contrib non-free” - and you’ll need
jessie-backports if the host is Jessie. For Ubuntu, use “main restricted
universe multiverse”.
Next, install ZFS and debootstrap:
$ apt update
$ apt install zfs-zed zfsutils-linux zfs-initramfs zfs-dkms debootstrap gdisk
Step 2 - Prepare Target Pools And Filesystems
Now it’s time to set up ZFS on the new EBS volume. Assuming the target volume
device is /dev/xvdf, we’ll create a GPT partition table with a small BIOS boot
partition for GRUB and leave the rest of the disk for ZFS.
Be careful - many instructions out on the net for ZFS munge the sector
geometry, or fake sgdisk into using an unnatural alignment. The following is
correct (per AWS documentation) not only for EBS, but is the exact same
geometry I use when installing Linux with ZFS root on physical hardware.
$ sgdisk -Zg -n1:0:4095 -t1:EF02 -c1:GRUB -n2:0:0 -t2:BF01 -c2:ZFS /dev/xvdf
This will create a small partition labelled GRUB (type EF02, ending at sector
4095), and use the rest of the disk for a partition labelled ZFS (type BF01).
The grub partition doesn’t technically need to extend that far, but this
ensures everything is aligned properly.
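As a rough sanity check on that alignment claim, here is a minimal sketch. It
assumes 512-byte logical sectors (so 2048 sectors per MiB) and that the ZFS
partition begins at sector 4096, right after the GRUB partition ends at 4095.
The helper name is_mib_aligned is hypothetical.

```shell
# Hypothetical helper: succeed if a start sector sits on a 1 MiB boundary.
# Assumes 512-byte logical sectors, i.e. 2048 sectors per MiB.
is_mib_aligned() {
    [ $(( $1 % 2048 )) -eq 0 ]
}

# The ZFS partition is expected to start at sector 4096 (2 MiB into the disk).
if is_mib_aligned 4096; then
    echo "sector 4096 is 1 MiB aligned"
fi
```

To check a real volume, “sgdisk -i 2 /dev/xvdf” prints the first sector of
partition 2, which can be passed to the helper.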
It’s worth noting that I never give ZFS a full disk, and instead I always use
partitions for ZFS pools. If you give ZFS the entire disk, it will create its
own partition table, but waste 8MB in a Solaris partition that Linux has no use
for.
OK, great, next up let’s create our ZFS pool and set up some filesystems. This
will set the target up in /mnt. You can choose any mount point you want,
just remember to use it consistently if you choose a different one.
I use the ZFS pool name “rpool”, but you can choose a different one, just be
careful to substitute yours everywhere.
You may want different options - this will globally enable lz4 compression and
disable atime for the pool. You may want to disable compression generally and
only enable it for specific filesystems. The choice is up to you. We also allow
overlay mount on /var. This is an obscure but important bit - when the system
initially boots, it will log to /var/log before the /var ZFS filesystem is
mounted. Because the mount point is dirty, ZFS won’t mount /var without setting
the overlay flag. Note that /dev/xvdf2 is the second GPT partition we created
above.
$ zpool create -o ashift=12 -O compression=lz4 -O atime=off -m none -R /mnt rpool /dev/xvdf2
$ zfs create -o mountpoint=/ rpool/root
$ zfs create -o mountpoint=/home rpool/home
$ zfs create -o mountpoint=/tmp rpool/tmp
$ zfs create -o overlay=on -o mountpoint=/var rpool/var
You may wish to have different ZFS filesystems, of course. And note we don’t
set any quotas - we let all our filesystems share the entire storage pool.
At this point, the usual storage commands should show everything mounted up
and ready for bootstrap (“zpool status”, “zfs list”, “df”, etc).
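If you prefer a scripted check over eyeballing “df”, the following is a small
sketch that reports any expected mount point missing from a mounts table. The
function name check_mounts is hypothetical, and the mount points listed in the
usage line assume the /mnt layout created above.

```shell
# check_mounts FILE MOUNTPOINT...
# Print each MOUNTPOINT that does not appear in FILE (a /proc/mounts style
# table) and return non-zero if any are missing. Hypothetical helper.
check_mounts() {
    mounts_file=$1
    shift
    missing=0
    for mp in "$@"; do
        # Field 2 of each /proc/mounts line is the mount point.
        if ! awk -v mp="$mp" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"; then
            echo "not mounted: $mp"
            missing=1
        fi
    done
    return $missing
}

# On the host:
#   check_mounts /proc/mounts /mnt /mnt/home /mnt/tmp /mnt/var
```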
Step 3 - Bootstrap The Target
Now we’ll install our target distribution on the newly provisioned volume.
There’s not much to do in this step:
$ debootstrap --arch amd64 stretch /mnt
Or, for Ubuntu Yakkety:
$ debootstrap --arch amd64 yakkety /mnt
Note that we can do this cross-distribution. We can bootstrap Ubuntu from a
Debian host, or a Debian target from an Ubuntu host.
Step 4 - Chroot Into Target
Next up we need to chroot into the target before doing final configuration.
$ mount --rbind /dev /mnt/dev
$ mount --rbind /proc /mnt/proc
$ mount --rbind /sys /mnt/sys
$ chroot /mnt
At this point, you should have a root shell into the target system.
Step 5 - Finalize Target Configuration
Now we’ll do some final configuration. Some of the steps here are different
between Debian and Ubuntu, but the general theme is the same.
Update /etc/apt/sources.list with the full sources list for your target
distribution. For Debian, use “main contrib non-free”. For Ubuntu, use
“main restricted universe multiverse”. Be sure you are setting up sources.list
for your target distribution, not the host like we did before!
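As an illustration, the sketch below prints a single sources.list line for
either target. The mirror URLs (deb.debian.org and archive.ubuntu.com) are
assumptions, as is the helper name sources_line. A real sources.list would
normally also carry security and updates entries, which are left out here.

```shell
# sources_line DISTRO SUITE
# Print one sources.list line with the full component list.
# Hypothetical helper; the mirror URLs are assumptions, use your own.
sources_line() {
    case "$1" in
        debian)
            echo "deb http://deb.debian.org/debian $2 main contrib non-free"
            ;;
        ubuntu)
            echo "deb http://archive.ubuntu.com/ubuntu $2 main restricted universe multiverse"
            ;;
        *)
            echo "unknown distribution: $1" >&2
            return 1
            ;;
    esac
}

# Inside the chroot, for a Debian Stretch target:
#   sources_line debian stretch > /etc/apt/sources.list
```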
Install packages, but be sure NOT to install grub to any device when it asks -
you’ll have to acknowledge that this will result in a broken system (for now,
anyway).
$ apt update
For Debian:
$ apt install linux-image-amd64 linux-headers-amd64 grub-pc zfs-zed zfsutils-linux zfs-initramfs zfs-dkms cloud-init gdisk locales
$ ln -s /proc/mounts /etc/mtab
For Ubuntu:
$ apt install linux-image-generic linux-headers-generic grub-pc zfs-zed zfsutils-linux zfs-initramfs zfs-dkms cloud-init gdisk
Then, for both:
$ dpkg-reconfigure locales # Choose en_US.UTF-8 or as appropriate
$ apt install --no-install-recommends openssh-server
Note the /etc/mtab symlink created for Debian - there was a bug in ZFS that
relied on /etc/mtab. We got that bug fixed in Ubuntu by Canonical, but as
of a couple of months ago, Stretch didn’t yet have the fix - it’s probably
fixed in Debian as well by now.
On Debian, I found I needed to modify GRUB_CMDLINE_LINUX in /etc/default/grub
with the following. Note escaping ‘$’:
This additional step might go away (or already be resolved) with a newer
version of ZFS and grub in stretch. You could (should) probably add this to the
grub.d configuration we add later, rather than here.
Verify grub and ZFS are happy. This is very important. If this step doesn’t
work, there’s no point in continuing - the target will not boot.
This verifies that grub is able to probe filesystems and devices and has ZFS
support. If this returns an error, the target system isn’t going to boot.
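The check command itself does not appear in this copy of the article. One way
to run such a check, assuming the stock grub-probe tool (which prints the
filesystem type GRUB detects for a given path), is sketched here. The helper
name check_probe is hypothetical.

```shell
# Hypothetical helper: succeed only if the probed filesystem type is zfs.
check_probe() {
    if [ "$1" = "zfs" ]; then
        echo "grub sees ZFS"
    else
        echo "grub sees '$1', expected zfs" >&2
        return 1
    fi
}

# Inside the chroot (assumes grub-probe from the grub packages is on PATH):
#   check_probe "$(grub-probe /)"
```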
Everything is good, so let’s install grub:
Note we give grub the entire EBS volume of xvdf, not just xvdf1. This is
important (installing to just the GRUB partition will result in a non-booting
system).
Again, if this fails, you’ll need to diagnose why and potentially start over,
as you won’t have a bootable target system.
Now we need to add a configuration file for grub to set a few things. To do
this, create a file in “/etc/default/grub.d/50-aws-settings.cfg”:
GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 ip=dhcp tsc=reliable net.ifnames=0"
This will configure grub to log as much as possible to the AWS console, get an
IP address as early as possible, and force TSC (time source) to be reliable (an
obscure boot parameter required for some AWS instance classes). net.ifnames is
set so ethernet adapters are enumerated as ethX instead of ensXX.
Now, let’s update grub:
You might want to check “/boot/grub/grub.cfg” at this stage to see if the zfs
module will be probed and it’s got the right boot line (vague advice, I know).
Finally, set the ZFS cache and reconfigure - these might be unnecessary, but
since this works, I superstitiously don’t skip it :-).
$ zpool set cachefile=/etc/zfs/zpool.cache rpool
$ dpkg-reconfigure zfs-dkms
Now, just a few sundry things left to do.
Update “/etc/network/interfaces” with:
iface eth0 inet dhcp
Again note that we’ve altered the boot commandline so network devices will be
enumerated as ethX, instead of ensXX.
Don’t drop this config into “/etc/network/interfaces.d/eth0.cfg” - cloud-init
will blacklist that configuration.
Finally, you may wish to provision and configure a user (cloud-init will set up
a “debian” or “ubuntu” user already by default). You may want to give the root
user a secure password and set PermitRootLogin in /etc/ssh/sshd_config if this
is appropriate for your environment and security policies.
Step 6 - Quiesce Target Volume
Before creating an AMI, we need to exit the chroot, unmount everything, and
export the pool - basically quiesce the target so the volume can be snapshotted.
Exit the chroot:
Now, you should be back in the host instance.
Unmount the bind mounts (we use the lazy option, otherwise unmounts can fail):
$ umount -l /mnt/dev
$ umount -l /mnt/proc
$ umount -l /mnt/sys
And finally, export the ZFS pool.
Now, “zpool status”, “df”, etc should show that our target filesystems are
unmounted, and /dev/xvdf is free to be safely cloned. If anything here fails
(unmounting, exporting), the target will not be in a good state and won’t boot.
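The export command is not shown in this copy; with the pool name used in this
article it would presumably be “zpool export rpool”. As a follow-up check, the
sketch below lists anything still mounted at or under the target root. The
helper name target_mounts is hypothetical, and it assumes a /proc/mounts style
table.

```shell
# target_mounts FILE ROOT
# Print every mount point in FILE (a /proc/mounts style table) that is ROOT
# itself or lives underneath it. Hypothetical helper.
target_mounts() {
    awk -v root="$2" '$2 == root || index($2, root "/") == 1 { print $2 }' "$1"
}

# On the host, after unmounting and exporting, this should print nothing:
#   target_mounts /proc/mounts /mnt
```

Empty output means nothing under /mnt is still holding the volume.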
Step 7 - Snapshot EBS And Create AMI
Now we are all set to create an AMI from our target EBS volume.
In the AWS console, take a snapshot of the target EBS volume - this should take
a minute or two.
Next, also in the AWS console, select the snapshot and register a new AMI. Be
sure to register as HVM and set up ephemeral mappings as you wish. Don’t mess
with kernel ID and other parameters.
Step 8 - Launch And Add Storage
Once registered, launch your shiny new AMI on an instance and enjoy ZFS root
If your instance never comes up, take a look at the console logging available
in the AWS console. This is the only real avenue you have to debug a failed
launch, and it’s very limited. If grub fails, the log might be empty. If
networking fails, the log should have some details, but the instance will not
A very useful debugging technique for AMIs is to terminate the instance, but
don’t destroy the EBS volume - instead, attach the volume to another instance
and import the ZFS pool there. This will allow you to look at logs so hopefully
you can figure out why the boot failed.
If the instance doesn’t come up, you can re-import the ZFS pool on the host
used to stage the target and try to fix it (remember above, I suggested leaving
the host and target EBS volume around so you can iterate on it). Do the bind
mounts before your chroot, and don’t forget to unmount everything and export
the pool before taking another snapshot.
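A minimal sketch of that retry loop follows, assuming the pool is named rpool
and the target is staged at /mnt as in the rest of this article. The run
wrapper and the reopen_target function are hypothetical names. Setting
DRY_RUN=1 only prints the commands, which is a cheap way to review them first.

```shell
# Run a command, or just print it when DRY_RUN=1 (hypothetical wrapper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# reopen_target POOL MOUNTPOINT
# Import the pool under an alternate root, redo the bind mounts, and chroot.
reopen_target() {
    pool=$1
    mnt=$2
    # -f is only needed if the pool was last in use by another instance.
    run zpool import -f -R "$mnt" "$pool"
    for d in dev proc sys; do
        run mount --rbind "/$d" "$mnt/$d"
    done
    run chroot "$mnt"
}

# Preview the commands without touching anything:
#   DRY_RUN=1 reopen_target rpool /mnt
```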
Log in with the “debian” or “ubuntu” users (with the default passwords), if
provisioned by default cloud-init - or however they are provisioned by
cloud-init if you customize it. Or log in as root if you set the root password
and modified the ssh configuration to allow root login.
Did it work? If so, great! If not, give it another try, paying careful attention
to any errors, as well as scouring output of dkms builds, etc. This isn’t
completely straightforward, and it took me a few tries to get things figured
out.
Now, let’s show the power of ZFS by adding 100GB, which will be available
across the entire rpool, without having to fracture filesystems, mount new
storage to its own directory, or move files around to the new device.
Assuming we used a 10GB EBS volume for the AMI, our pool probably looks
like this:
$ zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 9.94G 784M 9.22G - 0% 0% 1.00x ONLINE -
In the AWS console, create a new 100GB GP2 EBS volume and attach it to your
instance.
Assuming the volume is attached as /dev/xvdf, let’s extend rpool into this
volume:
$ sgdisk -Z -n1:0:0 -t1:BF01 -c1:ZFS /dev/xvdf
$ zpool add rpool /dev/xvdf1
This partitions the volume with a new GPT table, using everything for ZFS
(again, I don’t like giving ZFS the raw volume, as it will waste a bit of space
when it partitions the volume for Solaris compatibility). Finally, we extend
rpool onto the new volume.
That’s it! Now we see:
$ zpool list -v
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 109G 784M 109G - 0% 0% 1.00x ONLINE -
xvda1 9.94G 735M 9.22G - 7% 7%
xvdf1 99.5G 48.7M 99.5G - 0% 0%
We’ve added 100GB of storage completely transparently, and unlike creating a
traditional EXT or XFS volume we don’t have to mount it into a new directory -
with ZFS the storage is just there, and available to all our ZFS filesystems.
Thanks For Reading
I hope this helps anyone else looking to run ZFS exclusively in AWS. While
not as easy as taking an off-the-shelf prebuilt AMI, you end up with an AMI
that has only a minimal Debian or Ubuntu install - you know exactly what went
into it, and the process for doing so.
If you run into any issues trying this, you can indirectly contact me by
commenting on this blog entry, or try in ##aws on Freenode.