====== Using bhyve and jails on FreeBSD to host services ======

This is a short guide on how I have migrated from various Linux distributions to FreeBSD with [[https://docs.freebsd.org/en/books/handbook/jails/|jails]] and [[https://docs.freebsd.org/en/books/handbook/virtualization/#virtualization-host-bhyve|bhyve]] for the occasional service that must run on Linux. I have a decent amount of professional experience building reliable FreeBSD systems, but for the sake of convenience I have been running Linux in my lab for quite a while. I am sure that all of this can be improved in one way or another, but in my lab I tend to follow the path of least resistance, since I am not expected by customers or managers to do 110% :)

A few quick warnings:

  * This is deliberately not a copy/paste guide, so read carefully
  * I do not cover headless/serial installs for *nix. I already use VNC in my environment so I was fine with using it here
  * You may find my network setup (or setup in general) opinionated
  * I have not personally benchmarked NVMe vs sparse-zvol storage, but on my setup sparse-zvol is "good enough"

If you are confused at any point, consult the [[https://docs.freebsd.org/en/books/handbook/|FreeBSD Handbook]]. It is very easy to parse, and written by experts in the FreeBSD community. It puts a certain Linux wiki to shame...

===== Initial host setup =====

You should start by installing the latest release of [[https://www.freebsd.org/|FreeBSD]], or [[https://hardenedbsd.org/|HardenedBSD]] if you are looking for a more proactively secure experience. The same steps should generally apply to both, but I have not tested on HardenedBSD, so YMMV; some things may need adapting. If you are looking to use FreeBSD you should be able to figure those things out.

I would recommend running ZFS unless you are on an extremely memory-limited system, and even then I would still recommend running ZFS. The rest of this guide assumes you are. The [[https://wiki.freebsd.org/ZFSTuningGuide#i386|FreeBSD tuning guide]] has an example of one maintainer running ZFS on a laptop with 768MB of RAM using only minor tweaking. The advantages are worth it, and ZFS will only use as much memory as it needs. Rumors of ZFS being a memory hog originate from loud and confident but ultimately clueless people. More memory is always good of course, but that is true of any system. Unless you are building a high-performance multi-user pool I would not worry about it; ZFS will handle your workloads beautifully.

Zpool layout is ultimately up to you, but if you have enough disks for a RAIDZ layout you should [[https://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/|use mirrored vdevs, not RAIDZ]]. I will not summarize that post here because you should read and understand all of it, but I have happily accepted the tradeoffs for every system I've deployed and have never regretted that choice.

Both the FreeBSD and HardenedBSD installers will ask you to apply security tunables. You should probably pick all of these, but depending on your environment you may need one or two disabled.

There are a few things to start with on our freshly installed system. First, you'll want to bootstrap ''pkg'' and install the [[https://github.com/churchers/vm-bhyve|vm-bhyve]] package, along with anything else you need such as editors, shells, etc. I don't use CBSD; I found it hard to work with and too opinionated for my tastes.
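If ''pkg'' has not been bootstrapped yet, the first ''pkg'' command will offer to do it for you; you can also do it explicitly and pull down a fresh catalogue first (a trivial sketch, nothing here is specific to this setup):

<code>
# bootstrap the package manager and fetch the current package catalogue
pkg bootstrap
pkg update
</code>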
The packages I install on a new machine of this type are as follows; tweak as you see fit:

<code>
# pkg ins -y zsh git wget curl nginx emacs automake autoconf gmake m4 gcc gsed tmux rsync python39 py39-pip bash vm-bhyve bhyve-firmware grub2-bhyve
</code>

Next, we'll configure our bhyve/jail datasets and NIC. In my configuration VMs and jails are on 10.0.0.0/24, the gateway is on 10.0.0.254, and traffic is NAT'd by pf. Each service is CNAME'd to the host running it, and the traffic is routed based on hostname by Nginx. Other configurations are out of scope.

The machine I am using as an example has two pools (technically three with the root pool): ''data'' (4x4TB SATA HDDs in a 2x2 mirror + 1 spare) and ''ssd'' (2x1TB SATA SSDs in a 1x1 mirror + 1 spare). Yes, inspiring pool names, I know. :) Depending on how much I care about IOPS and responsiveness I throw things on one pool or the other. Both pools have the same layout:

<code>
data/jail/
data/jail/tmpl/
data/vm/
data/vm/tmpl/
ssd/jail/
ssd/jail/tmpl/
ssd/vm/
ssd/vm/tmpl/
</code>

''vm-bhyve'''s main pool is ''data/vm'', so templates, ISOs, etc can live there without needing to be replicated to the other pool. ''tmpl'' is used for VM/jail template images, not the ''vm-bhyve'' templates themselves. Create your datasets however you would like and then proceed with setting up ''vm-bhyve''. Initial setup goes something like this; change as needed:

<code>
# zfs create pool/vm
# sysrc vm_enable="YES"
# sysrc vm_dir="zfs:pool/vm"
# vm init
# cp /usr/local/share/examples/vm-bhyve/* /mountpoint/for/pool/vm/.templates/
# vm switch create -a 10.0.0.254/24 public
# vm switch add public em0
</code>

These commands create the ''vm'' dataset, enable and initialize ''vm-bhyve'', set up templates, and create a virtual switch. It is recommended that you not change the name of the switch. The ''-a'' argument is also important; in my experience, attempting to set the IP address of the switch after creation results in strange behavior.

As mentioned before, in this setup traffic is NAT'd by the venerable [[https://docs.freebsd.org/en/books/handbook/firewalls/#firewalls-pf|pf]]. The following very simple configuration will work for this setup; additional filtering/translation is up to your discretion.

''/etc/pf.conf''
<code>
# network macros
ext_if = "em0"
pub_ip = "10.5.5.252"
int_if = "vm-public"
jail_net = $int_if:network

# nat jail traffic
nat on $ext_if from $jail_net to any -> ($ext_if)
</code>

You now have a functional vswitch with NAT on the 10.0.0.0/24 address range. Start pf and vm-bhyve:

<code>
# sysrc pf_enable=yes
# service pf start
# service vm start
</code>

===== Deploying a VM via bhyve =====

You now have a functional backend for deploying VMs and jails. FreeBSD will run Linux jails, but some applications will not work in that environment and must be virtualized. FreeBSD can always run in a jail, so I would not bother virtualizing it unless you have some specific use case. The following instructions are for Debian, but you can use whatever distribution you want; nothing should change.

bhyve and GRUB behave a bit strangely, so my VMs use the UEFI loader. A working (as of Bookworm) template for Debian follows:

''/data/vm/.templates/debian-uefi.conf''
<code>
loader="uefi"
debug="yes"
cpu="1"
memory="1024M"
graphics="yes"
xhci_mouse="yes"
network0_type="virtio-net"
network0_switch="public"
disk0_type="virtio-blk"
disk0_name="disk0"
disk0_dev="sparse-zvol"
zfs_zvol_opts="volblocksize=128k"
uefi_vars="yes"
graphics_listen="127.0.0.1"
</code>

In this config, VNC will only listen on localhost; I port forward the port over SSH and use a VNC client on my local machine.
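The forward itself is just a standard SSH local port forward; the hostname below is a placeholder, and the port assumes vm-bhyve's usual starting VNC port of 5900 (check the port actually assigned to your VM if it differs):

<code>
# forward the VM's VNC port from the bhyve host to your workstation,
# then point a VNC client at localhost:5900
ssh -L 5900:127.0.0.1:5900 you@bhyve-host
</code>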
If you want it to listen on all available interfaces (the default) just remove the ''graphics_listen'' line.

Create a VM using this template and a 20GB disk:

<code>
# vm create -t debian-uefi -s 20G debian
</code>

If you have multiple pools/datasets like I do, you can add them and deploy to them specifically:

<code>
# vm datastore add ssd zfs:ssd/vm
# vm datastore list
NAME     TYPE  PATH      ZFS DATASET
default  zfs   /data/vm  data/vm
ssd      zfs   /ssd/vm   ssd/vm
# vm create -d ssd -t debian-uefi -s 20G debian-ssd
</code>

Now you can download the latest Debian netinstall image for amd64, and use it to boot your VM:

<code>
# vm iso https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-12.4.0-amd64-netinst.iso
# vm install debian debian-12.4.0-amd64-netinst.iso
</code>

The VNC console will come up and you can proceed with the installation as normal. On Debian at least I had an issue where rebooting would not bring up the bootloader, but instead boot to a UEFI shell. From the shell this can be fixed by doing the following (credit to [[https://www.jongibbins.com/solving-uefi-boot-problems-on-bhyve-freenas-vm/|Jon Gibbins]]):

  - Type ''exit'' at the UEFI shell
  - Arrow down to **Boot Maintenance Manager**
  - Select **Boot From File**
  - Select the disk, then the ''EFI'' folder, then the ''debian'' folder
  - Boot from ''grubx64.efi''

To fix this inside the OS:

  - ''mkdir /boot/efi/EFI/BOOT''
  - ''cp /boot/efi/EFI/debian/grubx64.efi /boot/efi/EFI/BOOT/bootx64.efi''

This is likely to break on bootloader updates but is the best solution I am currently aware of.

Using sparse zvols, you may run into issues deploying multiple VMs. If you have a VM failing to start and are seeing the following in dmesg:

<code>
Jan 25 11:58:17 datz kernel: g_dev_taste: g_dev_taste(zvol/data/vm/pwm/disk0) failed to g_attach, error=6
</code>

you will need to set ''vfs.zfs.vol.mode=2'' in ''sysctl.conf''. This appears to be related to FreeBSD bug [[https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262189|#262189]], though I am not completely sure. From [[https://justinholcomb.me/blog/2016/05/23/zfs-volume-manipulations.html|Justin Holcomb]]:

> There are 4 user settable modes to place a volume in as follows:
>
> default, which is what is set by the sysctl “vfs.zfs.vol.mode” on FreeBSD machines. This typically is set to geom mode, however I was having issues with a particular volume which required expressly setting the mode to geom even though the default was already set to geom.
>
> geom or mode 1 - “exposes volumes as geom(4) providers, providing maximal functionality.”[1]
>
> **dev or mode 2 - “exposes volumes only as cdev device in devfs**. Such volumes can be accessed only as raw disk device files, i.e. they can not be partitioned, mounted, participate in RAIDs, etc, but they are faster, and in some use scenarios with untrusted consumer, such as NAS or VM storage, can be more safe.”[1]
>
> none or mode 3 - “are not exposed outside ZFS, but can be snapshoted, cloned, replicated, etc, that can be suitable for backup purposes.”[1] This is also a helpful mode to set on a bhyve template as accidental modifications are harder to achieve.

Since VM disks generally do not need to be manipulated via ''geom'' you are not missing anything here.
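Setting it is an ordinary sysctl; a minimal sketch for applying it immediately and persisting it across reboots (note this changes the default mode for all zvols on the host):

<code>
# apply the new default zvol mode right away
sysctl vfs.zfs.vol.mode=2
# persist it across reboots
echo 'vfs.zfs.vol.mode=2' >> /etc/sysctl.conf
</code>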
===== Deploying jails =====

Jails are fairly simple to manage with ZFS and the built-in tools, so I do not use any jail managers. You can if you want to; I just do not find them necessary. Linux jails are hit or miss; I typically virtualize Linux if I need to run it at all.

There are two types of jails: **thick jails**, which are completely separate installs of FreeBSD and are therefore highly isolated from one another, and **thin jails**, which are essentially copied from a ZFS snapshot of a template system. Thin jails are smaller but can introduce issues. It is recommended that you read the [[https://docs.freebsd.org/en/books/handbook/jails/|handbook entry for jails]] to gain a better understanding of the advantages and disadvantages of each.

In this guide we'll deploy thick jails, as there are [[https://jrs-s.net/2017/03/15/zfs-clones-probably-not-what-you-really-want/|maintenance issues with snapshots and clones I would like to avoid]]. Instead of cloning, we will copy a template with ''zfs send''/''zfs recv''; the snapshot is largely just a convenience (and ZFS will not ''send'' an active/mounted filesystem :D).

Set up your datasets as you want, and create a dataset for your first base jail. If you are using my dataset layout this would be:

<code>
# zfs create data/jail/tmpl/freebsd-14
</code>

Grab the base distribution and extract it:

<code>
# fetch https://download.freebsd.org/ftp/releases/amd64/amd64/14.0-RELEASE/base.txz -o /data/jail/media/freebsd-14.0-base.txz
# sudo tar xf /data/jail/media/freebsd-14.0-base.txz -C /data/jail/tmpl/freebsd-14/ --unlink
</code>

Copy in some important files:

<code>
# cp /etc/resolv.conf /data/jail/tmpl/freebsd-14/etc/resolv.conf
# cp /etc/localtime /data/jail/tmpl/freebsd-14/etc/localtime
</code>

Update to the latest patchlevel:

<code>
# freebsd-update -b /data/jail/tmpl/freebsd-14/ fetch install
</code>

You'll probably want a template config to make deploying jails quicker and easier.

''/etc/jail.conf.d/TMPL''
<code>
TMPL {
    # startup and logging
    exec.start = "/bin/sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown";
    exec.consolelog = "/var/log/jail_console_${name}.log";

    # permissions
    allow.mount;
    allow.reserved_ports;
    allow.raw_sockets;
    exec.clean;

    # set path/hostname
    path = "/data/jail/${name}";
    host.hostname = "${name}";

    # use vnet
    vnet;
    vnet.interface = "${if}b";

    # set up NIC and plumbing
    $id = "";
    $ip = "10.0.0.${id}/24";
    $gw = "10.0.0.254";
    $br = "vm-public";
    $if = "epair${id}";

    exec.prestart += "ifconfig ${if} create up";
    exec.prestart += "ifconfig ${if}a up descr jail:${name}";
    exec.prestart += "ifconfig ${br} addm ${if}a up";
    exec.start += "ifconfig ${if}b ${ip} up";
    exec.start += "route add default ${gw}";
    exec.poststop += "ifconfig ${if}a destroy";
}
</code>

This config is fairly permissive, so you may want to read up on the available permissions in the [[https://man.freebsd.org/cgi/man.cgi?jail(8)|manpage]].

Now we can make some changes to our template jail such as adding users, installing packages, etc. Unfortunately this cannot be done via chroot. You'll need to first create the jail configuration file by copying the template above, then add the jail to the jail list:

<code>
# cp /etc/jail.conf.d/TMPL /etc/jail.conf.d/freebsd-14.conf
# sysrc jail_list+=freebsd-14
jail_list: -> freebsd-14
</code>

Next, edit ''freebsd-14.conf'' and update ''TMPL'' to match the jail's name (in this case ''freebsd-14''), and set ''$id'' to the final octet of the IP address you want the jail on. For example, ''$id = "101";'' will give the jail an IP address of 10.0.0.101. For this example you will also need to alter the ''path'' variable to reflect that the jail lives under ''/data/jail/tmpl'' rather than ''/data/jail''. Once that is all in place you can start the jail and log in:

<code>
# service jail start freebsd-14
# jexec freebsd-14 csh
</code>

You will probably find that the system is pretty barren. That is ok, because ''pkg'' and the network are both functional.
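As an illustration, from inside the jail (the package choices here are just my own habits, not anything this guide depends on):

<code>
# the first pkg run inside the jail will offer to bootstrap the package manager
pkg install -y tmux zsh rsync
</code>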
Bootstrap the system to your heart's desire, and when you are done just ^D or ''exit'' to log out. Since this is a template, do not make any changes that would be specific to any one jail. Once logged out, stop the jail with ''service jail stop freebsd-14'' and copy the base jail dataset to create a new jail:

<code>
# zfs snap data/jail/tmpl/freebsd-14@gold
# zfs send data/jail/tmpl/freebsd-14@gold | zfs recv data/jail/myjail
# zfs list data/jail/myjail
NAME              USED  AVAIL  REFER  MOUNTPOINT
data/jail/myjail  618M  7.12T   583M  /data/jail/myjail
</code>

A ''zfs send''/''zfs recv'' copy is just that, a full and independent copy, so as you can see we can remove the original snapshot without affecting the new jail:

<code>
# zfs destroy data/jail/tmpl/freebsd-14@gold
# zfs list data/jail/myjail
NAME              USED  AVAIL  REFER  MOUNTPOINT
data/jail/myjail  618M  7.12T   583M  /data/jail/myjail
</code>

You can now remove the template jail from ''jail_list'' and add your new jail:

<code>
# sysrc jail_list-=freebsd-14
jail_list: freebsd-14 ->
# sysrc jail_list+=myjail
jail_list: -> myjail
</code>

Be aware that this configuration requires that **all** jails you want autostarted exist in the ''jail_list'' rcvar. So for instance, on a system with the jails ''foo'', ''bar'', ''baz'', and ''qux'', the following will only start ''foo'' and ''baz'' at boot:

<code>
# sysrc jail_list
jail_list: foo baz
</code>

You can add to (or subtract from) the list easily with ''sysrc'', and this is the recommended method of dealing with ''rc.conf'' and ''loader.conf'', as it does a good job of preventing changes that lead to an unbootable system:

<code>
# sysrc jail_list+="bar qux"
jail_list: foo baz -> foo baz bar qux
# sysrc jail_list-="foo bar"
jail_list: foo baz bar qux -> baz qux
</code>

You can convert these configs to work in ''/etc/jail.conf'', and in that case all jails will autostart, but that is extra credit work.

===== Linux jails =====

TBD