Virtual Block Devices / Virtual Disks in Xen - HOWTO
====================================================

HOWTO for Xen 1.2

Mark A. Williamson (mark.a.williamson@intel.com)
(C) Intel Research Cambridge 2004

Introduction
------------

This document describes the new Virtual Block Device (VBD) and Virtual Disk
features available in Xen release 1.2.  First, a brief introduction to some
basic disk concepts on a Xen system:

Virtual Block Devices (VBDs):
    VBDs are the disk abstraction provided by Xen.  All XenoLinux disk
    accesses go through the VBD driver.  Using the VBD functionality, it is
    possible to selectively grant domains access to portions of the physical
    disks in the system.

    A virtual block device can also consist of multiple extents from the
    physical disks in the system, allowing them to be accessed as a single
    uniform device from the domain with access to that VBD.  The
    functionality is somewhat similar to that underpinning LVM: from the
    point of view of a guest virtual machine, multiple regions from physical
    devices can be combined into a single logical device.

    Everyone who boots Xen / XenoLinux from a hard drive uses VBDs, but for
    some purposes they can almost be ignored.

Virtual Disks (VDs):
    VDs are an abstraction built on top of the functionality provided by
    VBDs.  The VD management code maintains a "free pool" of disk space on
    the system that has been reserved for use with VDs.  The tools can
    automatically allocate collections of extents from this free pool to
    create "virtual disks" on demand.

    VDs can then be used just like normal disks by domains.  VDs appear just
    like any other disk to guest domains, since they use the same VBD
    abstraction, as provided by Xen.

    Using VDs is optional, since it's always possible to dedicate partitions
    or entire disks to your virtual machines.  VDs are handy when you have a
    dynamically changing set of virtual machines and you don't want to have
    to keep repartitioning in order to provide them with disk space.
    Virtual Disks are rather like "logical volumes" in LVM.

If that didn't all make sense, it doesn't matter too much ;-)  Using the
functionality is fairly straightforward, and some examples will clarify
things.  The text below expands a bit on the concepts involved, finishing up
with a walk-through of some simple virtual disk management tasks.

Virtual Block Devices
---------------------

Before covering VD management, it's worth discussing some aspects of the VBD
functionality that will be useful to know.

A VBD is made up of a number of extents from physical disk devices.  The
extents for a VBD don't have to be contiguous, or even on the same device.
Xen performs address translation so that they appear as a single contiguous
device to a domain.

When the VBD layer is used to give access to entire drives or entire
partitions, the VBD simply consists of a single extent that corresponds to
the drive or partition used.  Lists of extents are usually only used when
virtual disks (VDs) are being used.

Xen 1.2 and its associated XenoLinux release support automatic registration
and removal of VBDs.  It has always been possible to add a VBD to a running
XenoLinux domain, but it was previously necessary to run the
"xen_vbd_refresh" tool for the new device to be detected.  Now, when a VBD
is added, the domain it's added to registers the disk automatically, with no
special action by the user required.

Note that it is possible to use the VBD functionality to allow multiple
domains write access to the same areas of disk.  This is almost always a bad
thing!  The example scripts provided for creating domains do their best to
check that disk areas are not shared unsafely and will catch many cases of
this.  Setting the vbd_expert variable in config files for xc_dom_create.py
controls how unsafe it allows VBD mappings to be.  Level 0 (only read-only
sharing allowed) should be right for most people ;-).  Level 1 attempts to
allow at most one writer to any area of disk.
Level 2 allows multiple writers (i.e. anything!).

Virtual Disk Management
-----------------------

The VD management code runs entirely in user space.  The code is written in
Python and can therefore be accessed from custom scripts, as well as from
the convenience scripts provided.  The underlying VD database is a SQLite
database in /var/db/xen_vdisks.sqlite.

Most virtual disk management can be performed using the xc_vd_tool.py script
provided in the tools/examples/ directory of the source tree.  It supports
the following operations:

initialise - "Formats" a partition or disk device for use storing virtual
             disks.  This does not actually write data to the specified
             device.  Rather, it adds the device to the VD free-space pool,
             for later allocation.

             You should only add devices that correspond directly to
             physical disks / partitions - trying to use a VBD that you have
             created yourself as part of the free-space pool has undefined
             (possibly nasty) results.

create     - Creates a virtual disk of the specified size by allocating
             space from the free-space pool.  The virtual disk is identified
             in future by the unique ID returned by this script.

             The disk can be given an expiry time, if desired.  For most
             users, the best idea is to specify a time of 0 (which has the
             special meaning "never expire") and then explicitly delete the
             VD when finished with it - otherwise, VDs will disappear if
             allowed to expire.

delete     - Explicitly deletes a VD.  Makes it disappear immediately!

setexpiry  - Allows the expiry time of a (not yet expired) virtual disk to
             be modified.  Be aware that the VD will disappear when the time
             has expired.

enlarge    - Increases the space allocated to a virtual disk.  Currently,
             this will not be immediately visible to running domain(s) using
             it.  You can make it visible by destroying the corresponding
             VBDs and then using xc_dom_control.py to add them to the domain
             again.
             Note: doing this to filesystems that are in use may well cause
             errors in the guest Linux, or even a crash, although it will
             probably be OK if you stop the domain before updating the VBD
             and restart it afterwards.

import     - Allocates a virtual disk and populates it with the contents of
             some disk file.  This can be used to import root file system
             images or to restore backups of virtual disks, for instance.

export     - Writes the contents of a virtual disk out to a disk file.
             Useful for creating disk images for use elsewhere, such as
             standard root file systems and backups.

list       - Lists the non-expired virtual disks currently available in the
             system.

undelete   - Attempts to recover an expired (or deleted) virtual disk.

freespace  - Gets the free space (in megabytes) available for allocating
             new virtual disk extents.

The functionality provided by these scripts is also available directly from
Python functions in the xenctl.utils module - you can use this functionality
in your own scripts.

Populating VDs:

Once you've created a VD, you might want to populate it from DOM0 (for
instance, to put a root file system onto it for a guest domain).  This can
be done by creating a VBD through which DOM0 can access the VD - this is
discussed below.

More detail on how virtual disks work:

When you "format" a device for virtual disks, the device is logically split
up into extents.  These extents are recorded in the Virtual Disk Management
database in /var/db/xen_vdisks.sqlite.

When you use xc_vd_tool.py to create a virtual disk, some of the extents in
the free-space pool are reallocated to that virtual disk and a record for
that VD is added to the database.  When VDs are mapped into domains as VBDs,
the system looks up the allocated extents for the virtual disk in order to
set up the underlying VBD.

Free space is identified by the fact that it belongs to an "expired" disk.
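By way of orientation, a typical VD lifecycle using the operations above
might look like the following.  This is only a sketch: /dev/sda3 is a
placeholder for a spare partition on your system, and the remaining
arguments (elided as "...") are as described for each operation above.

> xc_vd_tool.py initialise /dev/sda3    # add a spare partition to the free pool
> xc_vd_tool.py freespace               # check how many megabytes are available
> xc_vd_tool.py create ...              # create a VD; note the ID it prints
> xc_vd_tool.py list                    # the new VD should now be listed
> xc_vd_tool.py setexpiry ...           # optionally adjust its expiry time
> xc_vd_tool.py delete ...              # explicitly remove it when finished

Remember that a VD created with an expiry time of 0 never expires, so it
will persist until you delete it explicitly.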
When "initialising" a real device with xc_vd_tool.py, the tool actually
divides the device into extents and adds them to an already-expired virtual
disk.  The device is not written to during this operation - its availability
is simply recorded in the virtual disks database.

If you set an expiry time on a VD, its extents will be liable to be
reallocated to new VDs as soon as that expiry time runs out.  Therefore, be
careful when setting expiry times!  Many users will find it simplest to set
all VDs to never expire automatically, then explicitly delete them later on.

Deleted / expired virtual disks may sometimes be undeleted - currently this
only works when none of the virtual disk's extents have been reallocated to
other virtual disks, since that's the only situation in which the disk is
likely to be fully intact.  You should try undeletion as soon as you realise
you've mistakenly deleted (or allowed to expire) a virtual disk.  At some
point in the future, an "unsafe" undelete, which can recover what remains of
partially reallocated virtual disks, may also be implemented.

Security note: the disk space for VDs is not zeroed when it is initially
added to the free-space pool, when a VD expires, or when a VD is created.
Therefore, unless this is done manually, it is possible for a domain to read
a VD to determine what was written by previous owners of its constituent
extents.  If this is a problem, users should manually clean VDs in some way,
either on allocation or just before deallocation (automated support for this
may be added at a later date).

Side note: the xvd* devices
---------------------------

The examples in this document make frequent use of the xvd* device nodes for
representing virtual block devices.  It is not a requirement to use these
with Xen, since VBDs can be mapped to any IDE or SCSI device node in the
system.  Changing the references to xvd* nodes in the examples below to
refer to some unused hd* or sd* node would also be valid.
The xvd* nodes can be useful when accessing VBDs from dom0, since binding
VBDs to xvd* devices avoids clashes with real IDE or SCSI drives.

There is a shell script provided in tools/misc/xen-mkdevnodes to create
these nodes.  Specify on the command line the directory that the nodes
should be placed under (e.g. /dev):

> cd {root of Xen source tree}/tools/misc/
> ./xen-mkdevnodes /dev

Dynamically Registering VBDs
----------------------------

The domain control tool (xc_dom_control.py) includes the ability to add VBDs
to, and remove them from, running domains.  As usual, the command format is:

  xc_dom_control.py [operation] [arguments]

The operations (and their arguments) are as follows:

vbd_add dom uname dev mode - Creates a VBD corresponding to either a
             physical device or a virtual disk and adds it as the specified
             device in the target domain, with either read or write access.

vbd_remove dom dev - Removes the VBD associated with the specified device
             node from the target domain.

These operations are most useful when populating VDs.  VDs can't be
populated directly, since they don't correspond to real devices.  Using:

  xc_dom_control.py vbd_add 0 vd:your_vd_id /dev/whatever w

you can make a virtual disk available to DOM0.  Sensible devices to map VDs
to in DOM0 are the /dev/xvd* nodes, since that makes it obvious that they
are Xen virtual devices that don't correspond to real physical devices.  You
can then format, mount and populate the VD through the nominated device
node.  When you've finished, use:

  xc_dom_control.py vbd_remove 0 /dev/whatever

to revoke DOM0's access to it.  It's then ready for use in a guest domain.

You can also use this functionality to grant a guest domain access to a
physical device - you might use this to temporarily share a partition, or to
add access to a partition that wasn't granted at boot time.

When playing with VBDs, remember that, in general, it is only safe for two
domains to have access to a file system if they both have read-only access.
You shouldn't be trying to share anything which is writable, even if only by
one domain, unless you're really sure you know what you're doing!

Granting access to real disks and partitions
--------------------------------------------

During the boot process, Xen automatically creates a VBD for each physical
disk and gives Dom0 read / write access to it.  This makes it look like Dom0
has normal access to the disks, just as if Xen weren't being used - in
reality, even Dom0 talks to the disks through Xen VBDs.

To give another domain access to a partition or a whole disk, you need to
create a corresponding VBD for that partition, for use by that domain.  As
with virtual disks, you can grant access to a running domain, or specify
that the domain should have access when it is first booted.

To grant access to a physical partition or disk while a domain is running,
use the xc_dom_control.py script - the usage is very similar to adding
virtual disks to a running domain (described above).  Specify the device as
"phy:device", where device is the name of the device as seen from domain 0,
or from normal Linux without Xen.  For instance:

> xc_dom_control.py vbd_add 2 phy:hdc /dev/whatever r

will grant domain 2 read-only access to the device /dev/hdc (as seen from
Dom0 / normal Linux running on the same machine - i.e. the master drive on
the secondary IDE chain), as /dev/whatever in the target domain.

Note that you can use this within domain 0 to map disks / partitions to
other device nodes within domain 0.  For instance, you could map /dev/hda to
also be accessible through /dev/xvda.  This is not generally recommended:
if you (for instance) mount both device nodes read / write, you could
corrupt the underlying filesystem.  It's also quite confusing ;-)

To grant a domain access to a partition or disk when it boots, the
appropriate VBD needs to be created before the domain is started.  This can
be done very easily using the tools provided.
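To illustrate how these pieces fit together, here is a hypothetical excerpt
from an xc_dom_create.py config file.  The variable names (vbd_list,
vbd_expert, cmdline_root) are those used elsewhere in this document, but the
device names and the VD ID are placeholders; config files are really small
Python scripts, so the usual Python quoting rules apply.

  # Placeholder example - substitute your own devices and VD ID.
  vbd_expert = 0                       # only read-only sharing allowed
  vbd_list = [ ('phy:hdc', '/dev/whatever', 'r'),   # physical device, read-only
               ('vd:your_vd_id', '/dev/xvda', 'w') ]  # virtual disk, read/write
  cmdline_root = 'root=/dev/xvda'      # tell Linux to boot from the VD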
To specify this to the xc_dom_create.py tool (either in a startup script or
on the command line), use triples of the format:

  phy:dev,target_dev,perms

where dev is the device name as seen from Dom0, target_dev is the device you
want it to appear as in the target domain, and perms is 'w' if you want to
give write privileges, or 'r' otherwise.

These may be specified either on the command line or in an initialisation
script.  For instance, to grant the same access rights as in the command
example above, you would use the triple:

  phy:hdc,/dev/whatever,r

If you are using a config file, then you should add this triple to the
vbd_list variable, for instance using the line:

  vbd_list = [ ('phy:hdc', '/dev/whatever', 'r') ]

(Note that you need to use quotes here, since config files are really small
Python scripts.)

To specify the mapping on the command line, you'd use the -d switch and
supply the triple as the argument, e.g.:

> xc_dom_create.py [other arguments] -d phy:hdc,/dev/whatever,r

(You don't need to explicitly quote things in this case.)

Walk-through: Booting a domain from a VD
----------------------------------------

As an example, here is a sequence of commands you might use to create a
virtual disk, populate it with a root file system and boot a domain from it.
These steps assume that you've installed the example scripts somewhere on
your PATH - if you haven't done that, you'll need to specify a fully
qualified pathname in the examples below.  It is also assumed that you know
how to use the xc_dom_create.py tool (apart from configuring virtual
disks!).

[ This example is intended only for users of virtual disks (VDs).  You don't
need to follow it if you'll be booting a domain from a dedicated partition,
since you can create and populate that partition directly from Dom0, as
normal. ]

First, if you haven't done so already, initialise the free space pool by
adding a real partition to it.
The details are stored in the database, so you'll only need to do this once.
You can also use this command to add further partitions to the existing free
space pool.

> xc_vd_tool.py format /dev/

Now you'll want to allocate the space for your virtual disk.  Do so using
the following, specifying the size in megabytes:

> xc_vd_tool.py create

At this point, the program will tell you the virtual disk's ID.  Note it
down, as it is how you will identify the virtual disk in future.

If you don't want the VD to be bootable (i.e. you're booting a domain from
some other medium and just want it to be able to access this VD), you can
simply add it to the vbd_list used by xc_dom_create.py, either by putting it
in a config file or by specifying it on the command line.  Formatting and
populating the VD could then be done from that domain once it's started.

If you want to boot off your new VD as well, then you need to populate it
with a standard Linux root filesystem.  You'll need to temporarily add the
VD to DOM0 in order to do this.  To give DOM0 r/w access to the VD, use the
following command line, substituting the ID you got earlier:

> xc_dom_control.py vbd_add 0 vd: /dev/xvda w

This attaches the VD to the device /dev/xvda in domain zero, with read /
write privileges - you can use other device nodes if you choose to.

Now make a filesystem on this device, mount it and populate it with a root
filesystem.  These steps are exactly the same as under normal Linux.  When
you've finished, unmount the filesystem again.

You should now remove the VD from DOM0.  This will prevent you from
accidentally changing it in DOM0 whilst the guest domain is using it (which
could cause filesystem corruption, and confuse Linux):

> xc_dom_control.py vbd_remove 0 /dev/xvda

It should now be possible to boot a guest domain from the VD.  To do this,
you should specify the VD's details in some way so that xc_dom_create.py
will be able to set up the corresponding VBD for the domain to access.
If you're using a config file, you should include:

  ('vd:', '/dev/whatever', 'w')

in the vbd_list, substituting the appropriate virtual disk ID, device node
and read / write setting.

To specify access on the command line as you start the domain, you would use
the -d switch (note that you don't need to use quote marks here):

> xc_dom_create.py [other arguments] -d vd:,/dev/whatever,w

To tell Linux which device to boot from, either include:

  root=/dev/whatever

in your cmdline_root in the config file, or specify it on the command line,
using the -R option:

> xc_dom_create.py [other arguments] -R root=/dev/whatever

That should be it: sit back and watch your domain boot off its virtual disk!

Getting help
------------

The main source of help using Xen is the developers' e-mail list: .  The
developers will help with problems, listen to feature requests and fix bugs.
It is, however, helpful if you can look through the mailing list archives
and the HOWTOs provided to make sure your question is not already answered
there.  If you post to the list, please provide as much information as
possible about your setup and your problem.

There is also a general Xen FAQ, kindly started by Jan van Rensburg, which
(at the time of writing) is located at: .

Contributing
------------

Patches and extra documentation are also welcomed ;-) and should be posted
to the xen-devel e-mail list.