\chapter{Storage and File System Management}

Storage can be made available to virtual machines in a number of
different ways.  This chapter covers some possible configurations.

The most straightforward method is to export a physical block device (a
hard drive or partition) from dom0 directly to the guest domain as a
virtual block device (VBD).

Storage may also be exported from a filesystem image or a partitioned
filesystem image as a \emph{file-backed VBD}.

Finally, standard network storage protocols such as NBD, iSCSI, NFS,
etc., can be used to provide storage to virtual machines.


\section{Exporting Physical Devices as VBDs}
\label{s:exporting-physical-devices-as-vbds}

One of the simplest configurations is to directly export individual
partitions from domain~0 to other domains. To achieve this, use the
\path{phy:} specifier in your domain configuration file. For example,
a line like
\begin{quote}
  \verb_disk = ['phy:hda3,sda1,w']_
\end{quote}
specifies that the partition \path{/dev/hda3} in domain~0 should be
exported read-write to the new domain as \path{/dev/sda1}; one could
equally well export it as \path{/dev/hda} or \path{/dev/sdb5} should
one wish.

In addition to local disks and partitions, it is possible to export
any device that Linux considers to be ``a disk'' in the same manner.
For example, if you have iSCSI disks or GNBD volumes imported into
domain~0, you can export these to other domains using the \path{phy:}
disk syntax, e.g.:
\begin{quote}
  \verb_disk = ['phy:vg/lvm1,sda2,w']_
\end{quote}

\begin{center}
  \framebox{\bf Warning: Block device sharing}
\end{center}
\begin{quote}
  Block devices should typically only be shared between domains in a
  read-only fashion; otherwise, the Linux kernel's file systems will
  become very confused, as the file system structure may change
  underneath them (having the same ext3 partition mounted \path{rw}
  twice is a surefire way to cause irreparable damage)!  \Xend\ will
  attempt to prevent you from doing this by checking that the device
  is not mounted read-write in domain~0, and hasn't already been
  exported read-write to another domain.  If you want read-write
  sharing, export the directory to other domains via NFS from
  domain~0 (or use a cluster file system such as GFS or ocfs2).
\end{quote}
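
For instance, if a partition must be visible to more than one domain,
export it read-only by using the \path{r} mode flag in place of
\path{w} (the device names here are illustrative):
\begin{quote}
  \verb_disk = ['phy:hda3,sda1,r']_
\end{quote}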


\section{Using File-backed VBDs}

It is also possible to use a file in Domain~0 as the primary storage
for a virtual machine.  As well as being convenient, this also has the
advantage that the virtual block device will be \emph{sparse} ---
space will only really be allocated as parts of the file are used.  So
if a virtual machine uses only half of its disk space, then the file
really takes up only half of the size allocated.

For example, to create a 2GB sparse file-backed virtual block device
(which initially consumes only about 1KB of disk):
\begin{quote}
  \verb_# dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1_
\end{quote}
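
You can verify that the file really is sparse by comparing its
apparent size with the disk space it actually occupies (the exact
figures will vary with the dom0 filesystem):
\begin{quote}
\begin{verbatim}
# ls -lh vm1disk   # shows the 2GB apparent size
# du -h vm1disk    # shows the few KB actually allocated
\end{verbatim}
\end{quote}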

Make a file system in the disk file:
\begin{quote}
  \verb_# mkfs -t ext3 vm1disk_
\end{quote}

(When the tool asks for confirmation because the target is not a
block special device, answer `y'.)

Populate the file system e.g.\ by copying from the current root:
\begin{quote}
\begin{verbatim}
# mount -o loop vm1disk /mnt
# cp -ax /{root,dev,var,etc,usr,bin,sbin,lib} /mnt
# mkdir /mnt/{proc,sys,home,tmp}
\end{verbatim}
\end{quote}

Tailor the file system by editing \path{/etc/fstab},
\path{/etc/hostname}, etc.\ Don't forget to edit the files in the
mounted file system rather than those in your domain~0 filesystem:
e.g.\ you would edit \path{/mnt/etc/fstab} instead of
\path{/etc/fstab}.  For this example, set the root device in fstab to
\path{/dev/sda1}.
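
As a sketch, the root entry in \path{/mnt/etc/fstab} might then look
like the following (the mount options are illustrative):
\begin{quote}
\begin{verbatim}
/dev/sda1   /   ext3   defaults   1   1
\end{verbatim}
\end{quote}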

Now unmount (this is important!):
\begin{quote}
  \verb_# umount /mnt_
\end{quote}

In the configuration file set:
\begin{quote}
  \verb_disk = ['file:/full/path/to/vm1disk,sda1,w']_
\end{quote}

As the virtual machine writes to its `disk', the sparse file will be
filled in and consume more space up to the original 2GB.

{\bf Note that file-backed VBDs may not be appropriate for backing
  I/O-intensive domains.}  File-backed VBDs are known to experience
substantial slowdowns under heavy I/O workloads, due to the I/O
handling by the loopback block device used to support file-backed VBDs
in dom0.  Better I/O performance can be achieved by using either
LVM-backed VBDs (Section~\ref{s:using-lvm-backed-vbds}) or physical
devices as VBDs (Section~\ref{s:exporting-physical-devices-as-vbds}).

Linux supports a maximum of eight file-backed VBDs across all domains
by default.  This limit can be statically increased by using the
\emph{max\_loop} module parameter if CONFIG\_BLK\_DEV\_LOOP is
compiled as a module in the dom0 kernel, or by using the
\emph{max\_loop=n} boot option if CONFIG\_BLK\_DEV\_LOOP is compiled
directly into the dom0 kernel.
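
For example, to allow up to 64 loop devices when the driver is built
as a module (the value 64 is illustrative):
\begin{quote}
\begin{verbatim}
# modprobe loop max_loop=64
\end{verbatim}
\end{quote}
For a built-in driver, append \path{max_loop=64} to the dom0 kernel
command line instead.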


\section{Using LVM-backed VBDs}
\label{s:using-lvm-backed-vbds}

A particularly appealing solution is to use LVM volumes as backing
for domain file-systems, since this allows dynamic growing/shrinking
of volumes, as well as snapshots and other features.

To initialize a partition to support LVM volumes:
\begin{quote}
\begin{verbatim}
# pvcreate /dev/sda10
\end{verbatim} 
\end{quote}

Create a volume group named `vg' on the physical partition:
\begin{quote}
\begin{verbatim}
# vgcreate vg /dev/sda10
\end{verbatim} 
\end{quote}

Create a logical volume of size 4GB named `myvmdisk1':
\begin{quote}
\begin{verbatim}
# lvcreate -L4096M -n myvmdisk1 vg
\end{verbatim}
\end{quote}

You should now see that you have a \path{/dev/vg/myvmdisk1}.  Make a
filesystem, mount it and populate it, e.g.:
\begin{quote}
\begin{verbatim}
# mkfs -t ext3 /dev/vg/myvmdisk1
# mount /dev/vg/myvmdisk1 /mnt
# cp -ax / /mnt
# umount /mnt
\end{verbatim}
\end{quote}

Now configure your VM with the following disk configuration:
\begin{quote}
\begin{verbatim}
 disk = [ 'phy:vg/myvmdisk1,sda1,w' ]
\end{verbatim}
\end{quote}

LVM enables you to grow the size of logical volumes, but you'll need
to resize the corresponding file system to make use of the new space.
Some file systems (e.g.\ ext3) now support online resize.  See the LVM
manuals for more details.
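
As a sketch, to add 1GB to the volume and then grow its ext3 file
system to match (resize offline, or online where the tools and kernel
support it; the size is illustrative):
\begin{quote}
\begin{verbatim}
# lvextend -L+1024M /dev/vg/myvmdisk1
# resize2fs /dev/vg/myvmdisk1
\end{verbatim}
\end{quote}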

You can also use LVM for creating copy-on-write (CoW) clones of LVM
volumes (known as writable persistent snapshots in LVM terminology).
This facility is new in Linux 2.6.8, so isn't as stable as one might
hope.  In particular, using lots of CoW LVM disks consumes a lot of
dom0 memory, and error conditions such as running out of disk space
are not handled well. Hopefully this will improve in future.

To create two copy-on-write clones of the above file system, you
would use the following commands:

\begin{quote}
\begin{verbatim}
# lvcreate -s -L1024M -n myclonedisk1 /dev/vg/myvmdisk1
# lvcreate -s -L1024M -n myclonedisk2 /dev/vg/myvmdisk1
\end{verbatim}
\end{quote}

Each of these can grow to have 1GB of differences from the master
volume. You can grow the amount of space for storing the differences
using the lvextend command, e.g.:
\begin{quote}
\begin{verbatim}
# lvextend -L+100M /dev/vg/myclonedisk1
\end{verbatim}
\end{quote}

Don't ever let the `differences volume' fill up, otherwise LVM gets
rather confused. It may be possible to automate the growing process by
using \path{dmsetup wait} to spot the volume getting full and then
issuing an \path{lvextend}.
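
A rough sketch of such a watcher follows; the device-mapper name and
growth increment are illustrative, and a real script would parse
\path{dmsetup status} output to check how full the snapshot is before
extending it:
\begin{quote}
\begin{verbatim}
#!/bin/sh
# Block until the snapshot device reports an event, then
# grow the differences volume by a fixed increment.
while true; do
    dmsetup wait vg-myclonedisk1
    lvextend -L+100M /dev/vg/myclonedisk1
done
\end{verbatim}
\end{quote}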

In principle, it is possible to continue writing to the volume that
has been cloned (the changes will not be visible to the clones), but
we wouldn't recommend this: keep the cloned volume as a `pristine'
file system install that isn't mounted directly by any of the virtual
machines.


\section{Using NFS Root}

First, populate a root filesystem in a directory on the server
machine. This can be on a distinct physical machine, or simply run
within a virtual machine on the same node.

Now configure the NFS server to export this filesystem over the
network by adding a line to \path{/etc/exports}; note that there must
be no space between the client specification and its options, for
instance:

\begin{quote}
  \begin{small}
\begin{verbatim}
/export/vm1root      1.2.3.4/24(rw,sync,no_root_squash)
\end{verbatim}
  \end{small}
\end{quote}

Finally, configure the domain to use NFS root.  In addition to the
normal variables, you should make sure to set the following values in
the domain's configuration file:

\begin{quote}
  \begin{small}
\begin{verbatim}
root       = '/dev/nfs'
nfs_server = '2.3.4.5'       # substitute IP address of server
nfs_root   = '/path/to/root' # path to root FS on the server
\end{verbatim}
  \end{small}
\end{quote}

The domain will need network access at boot time, so either statically
configure an IP address using the config variables \path{ip},
\path{netmask}, \path{gateway}, \path{hostname}; or enable DHCP
(\path{dhcp='dhcp'}).
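
For example, a statically configured domain might contain lines such
as the following (the addresses are placeholders):
\begin{quote}
\begin{verbatim}
ip       = '1.2.3.4'
netmask  = '255.255.255.0'
gateway  = '1.2.3.1'
hostname = 'vm1'
\end{verbatim}
\end{quote}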

Note that the Linux NFS root implementation is known to have stability
problems under high load (this is not a Xen-specific problem), so this
configuration may not be appropriate for critical servers.