Title   : How to do PCI Passthrough with VT-d
Authors : Allen Kay    <allen.m.kay@intel.com>
          Weidong Han  <weidong.han@intel.com>
          Yuji Shimada <shimada-yxb@necst.nec.co.jp>
Created : October-24-2007
Updated : July-07-2009

How to turn on VT-d in Xen
--------------------------

Xen with 2.6.18 dom0:
1 ) cd xen-unstable.hg
2 ) make install
3 ) make linux-2.6-xen-config CONFIGMODE=menuconfig
4 ) change XEN->"PCI-device backend driver" from "M" to "*".
5 ) make linux-2.6-xen-build
6 ) make linux-2.6-xen-install
7 ) depmod 2.6.18.8-xen
8 ) mkinitrd -v -f --with=ahci --with=aacraid --with=sd_mod --with=scsi_mod initrd-2.6.18-xen.img 2.6.18.8-xen
9 ) cp initrd-2.6.18-xen.img /boot
10) lspci - select the PCI BDF you want to assign to the guest OS
11) "hide" the PCI device from dom0, as in the following sample grub entry:

title Xen-Fedora Core (2.6.18-xen)
        root (hd0,0)
        kernel /boot/xen.gz com1=115200,8n1 console=com1 iommu=1
        module /boot/vmlinuz-2.6.18.8-xen root=LABEL=/ ro xencons=ttyS console=tty0 console=ttyS0 pciback.hide=(01:00.0)(03:00.0)
        module /boot/initrd-2.6.18-xen.img

    or use dynamic hiding via the PCI backend sysfs interface:
        a) check whether a driver is currently bound to the device
            ls -l /sys/bus/pci/devices/0000:01:00.0/driver
            ... /sys/bus/pci/devices/0000:01:00.0/driver -> ../../../../bus/pci/drivers/igb
        b) if so, unbind the driver first
            echo -n 0000:01:00.0 >/sys/bus/pci/drivers/igb/unbind
        c) add the device to the PCI backend
            echo -n 0000:01:00.0 >/sys/bus/pci/drivers/pciback/new_slot
        d) let the PCI backend bind to the device
            echo -n 0000:01:00.0 >/sys/bus/pci/drivers/pciback/bind
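        e) optionally, verify the binding; the driver symlink should now
           point at pciback:
            ls -l /sys/bus/pci/devices/0000:01:00.0/driver
            ... /sys/bus/pci/devices/0000:01:00.0/driver -> ../../../../bus/pci/drivers/pciback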

12) reboot system (not required if you use the dynamic hiding method)
13) add a "pci" line in /etc/xen/hvm.conf for the devices to be assigned:
        pci = [ '01:00.0', '03:00.0' ]
14) start the hvm guest and use "lspci" to see the passthru device and
    "ifconfig" to see if an IP address has been assigned to the NIC devices.


Xen with pv-ops dom0:
1 ) cd xen-unstable.hg
2 ) make install
3 ) make linux-2.6-pvops-config CONFIGMODE=menuconfig
4 ) change Bus options (PCI etc.)->"PCI Stub driver" to "*".
5 ) make linux-2.6-pvops-build
6 ) make linux-2.6-pvops-install
7 ) mkinitrd -v -f --with=ahci --with=aacraid --with=sd_mod --with=scsi_mod initrd-2.6.30-rc3-tip.img 2.6.30-rc3-tip
    (change 2.6.30-rc3-tip to whatever pv-ops dom0 kernel version you are using)
8 ) cp initrd-2.6.30-rc3-tip.img /boot
9 ) edit grub:

title Xen-Fedora Core (pv-ops)
        root (hd0,0)
        kernel /boot/xen.gz console=com1,vga console=com1 com1=115200,8n1 iommu=1
        module /boot/vmlinuz-2.6.30-rc3-tip root=LABEL=/ ro console=hvc0 earlyprintk=xen
        module /boot/initrd-2.6.30-rc3-tip.img

10) reboot system
11) hide the device using pci-stub (example: PCI device 01:00.0):

    - lspci -n
    - locate the entry for device 01:00.0 and note down the vendor &
      device ID, e.g. 8086:10b9:
        ...
        01:00.0 0200: 8086:10b9 (rev 06)
        ...
    - then use the following commands to hide it:
        echo "8086 10b9" > /sys/bus/pci/drivers/pci-stub/new_id
        echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
        echo "0000:01:00.0" > /sys/bus/pci/drivers/pci-stub/bind

12) add "pci" line in /etc/xen/hvm.conf for to assigned devices
        pci = [ '01:00.0' ]
13) start the hvm guest and use "lspci" to see the passthru device and
    "ifconfig" to see if an IP address has been assigned to the NIC devices.


Enable MSI/MSI-x for assigned devices
-------------------------------------
Add "msi=1" option in kernel line of host grub.


MSI-INTx translation for passthrough devices in HVM
---------------------------------------------------

If the assigned device uses a physical IRQ that is shared by more than
one device among multiple domains, there may be significant impact on
device performance. Unfortunately, this is quite a common case if the
IO-APIC (INTx) IRQ is used. MSI can avoid this issue, but is only
available if the guest enables it.

With MSI-INTx translation turned on, Xen enables device MSI if it's
available, regardless of whether the guest uses INTx or MSI. If the
guest uses an INTx IRQ, Xen injects a translated INTx IRQ into the
guest's virtual ioapic whenever an MSI message is received. This reduces the
interrupt sharing of the system. If the guest OS enables MSI or MSI-X,
the translation is automatically turned off.

To enable or disable MSI-INTx translation globally, add a
"pci_msitranslate" line to the guest config file:
	pci_msitranslate = 1         (default is 1)

To override for a specific device:
	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]


Caveat on Conventional PCI Device Passthrough
---------------------------------------------

The VT-d spec requires that all conventional PCI devices behind a
PCIe-to-PCI bridge be assigned to the same domain.

PCIe devices do not have this restriction.
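
The tree view of lspci can help identify which conventional PCI devices
sit behind the same PCIe-to-PCI bridge (illustrative output; your
topology will differ):

    $ lspci -t
    -[0000:00]-+-00.0
               +-1e.0-[05]--+-00.0
               |            \-01.0
               ...

Here devices 05:00.0 and 05:01.0 are behind the bridge at 00:1e.0 and
would have to be assigned to the same domain.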


VT-d Works on OS:
-----------------

1) Host OS: PAE, 64-bit
2) Guest OS: 32-bit, PAE, 64-bit


Combinations Tested:
--------------------

1) 64-bit host: 32/PAE/64 Linux/XP/Win2003/Vista guests
2) PAE host: 32/PAE Linux/XP/Win2003/Vista guests


VTd device hotplug:
-------------------
 
Two virtual PCI slots (6 and 7) are reserved in the HVM guest to support
VT-d hotplug. If you have more VT-d devices, only two of them can be
hotplugged at a time. Usage is simple:

 1. List the VT-d devices assigned to a domain. Here a VT-d device 0:2:0.0
    has been inserted in the HVM domain's virtual PCI slot 6; "lspci"
    inside the guest should show the same device.

	[root@vt-vtd ~]# xm pci-list HVMDomainVtd
	VSlt domain   bus   slot   func
	0x6    0x0  0x02   0x00    0x0

 2. Detach the device from the guest by its physical BDF. The HVM guest
    will then receive a virtual PCI hot removal event and detach the
    physical device:

	[root@vt-vtd ~]# xm pci-detach HVMDomainVtd 0:2:0.0

 3. Attach a PCI device to the guest by its physical BDF and the desired
    virtual slot (optional). The following command inserts the physical
    device into the guest's virtual slot 7:

	[root@vt-vtd ~]# xm pci-attach HVMDomainVtd 0:2:0.0 7

    To specify options for the device, use -o or --options=. The
    following command disables MSI-INTx translation for the device:

        [root@vt-vtd ~]# xm pci-attach -o msitranslate=0 HVMDomainVtd 0:2:0.0 7


VTd hotplug usage model:
------------------------

 * For live migration: a VT-d device breaks live migration, because the
   state of a physical device cannot be saved and restored like that of
   a virtual device. With hotplug, live migration is possible again:
   simply hot-remove all VT-d devices before live migration and hot-add
   new VT-d devices on the target machine afterwards (see the sketch
   after this list).

 * VT-d hotplug for device switching: VT-d hotplug can be used to
   dynamically switch a physical device between different HVM guests
   without shutting them down.
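
    A sketch of the live migration sequence, assuming a guest named
    HVMDomainVtd and a target host named "targethost" (adapt the names
    to your setup):

        [root@vt-vtd ~]# xm pci-detach HVMDomainVtd 0:2:0.0
        [root@vt-vtd ~]# xm migrate --live HVMDomainVtd targethost
        (then, on the target machine)
        [root@targethost ~]# xm pci-attach HVMDomainVtd 0:2:0.0 6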


VT-d Enabled Systems
--------------------

1) For VT-d enabling work on Xen, we have been using development
systems with the following Intel motherboards:
    - DQ35MP
    - DQ35JO

2) As far as we know, the following OEM systems also have VT-d enabled.
Feel free to add others as they become available.

- Dell: Optiplex 755
http://www.dell.com/content/products/category.aspx/optix?c=us&cs=555&l=en&s=biz

- HP Compaq:  DC7800
http://h10010.www1.hp.com/wwpc/us/en/en/WF04a/12454-12454-64287-321860-3328898.html

For more information, please refer to http://wiki.xen.org/wiki/VTdHowTo.


Assigning devices to HVM domains
--------------------------------

Most device types such as NIC, HBA, EHCI and UHCI can be assigned to
an HVM domain.

But some devices have design features which make them unsuitable for
assignment to an HVM domain. Examples include:

 * The device has an internal resource, such as private memory, which
   is mapped into the memory address space via a BAR (Base Address
   Register).
 * The driver submits commands containing a pointer to a buffer within
   the internal resource. The device decodes the pointer (address) and
   accesses the buffer.

In an HVM domain, the BAR is virtualized, so the host-BAR value and the
guest-BAR value differ. The address of the internal resource as seen by
the device therefore differs from the address seen by the driver, and
the same holds for the address of a buffer within the internal
resource. As a result, the device cannot access the buffer specified by
the driver.

Such devices currently do not work when assigned to an HVM domain.


Using SR-IOV with VT-d
----------------------

Single Root I/O Virtualization (SR-IOV) is a PCI Express feature,
supported by some devices such as the Intel 82576, which allows you to
create virtual PCI devices (Virtual Functions) and assign them to the
HVM guest.

You can use a recent lspci (v3.1 and above) to check whether your PCIe
device supports the SR-IOV capability:

  $ lspci -s 01:00.0 -vvv

  01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
        Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter

        ...

        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 8, Total VFs: 8, Number of VFs: 7, Function Dependency Link: 00
                VF offset: 128, stride: 2, Device ID: 10ca
                Supported Page Size: 00000553, System Page Size: 00000001
                VF Migration: offset: 00000000, BIR: 0
        Kernel driver in use: igb


The function that has the SR-IOV capability is also known as the
Physical Function. You need the Physical Function driver (which runs in
dom0 and controls the allocation of physical resources) to enable the
Virtual Functions.
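
For example, with the igb driver for the 82576, the number of Virtual
Functions to create is typically requested via a module parameter when
loading the Physical Function driver (a sketch; the parameter name is
driver-specific, so check your driver's documentation):

  $ modprobe igb max_vfs=7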
The following are the Virtual Functions associated with the above
Physical Function.

  $ lspci | grep -e 01:1[01].[0246]

  01:10.0 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:10.2 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:10.4 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:10.6 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:11.0 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:11.2 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:11.4 Ethernet controller: Intel Corporation Device 10ca (rev 01)

We can tell that Physical Function 01:00.0 has 7 Virtual Functions
(01:10.0, 01:10.2, 01:10.4, 01:10.6, 01:11.0, 01:11.2, 01:11.4), and a
Virtual Function's PCI configuration space looks just like that of a
normal PCI device.

  $ lspci -s 01:10.0 -vvv

  01:10.0 Ethernet controller: Intel Corporation 82576 Gigabit Virtual Function
        Subsystem: Intel Corporation Gigabit Virtual Function
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Region 0: [virtual] Memory at d2840000 (64-bit, non-prefetchable) [size=16K]
        Region 3: [virtual] Memory at d2860000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [70] MSI-X: Enable+ Mask- TabSize=3
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [a0] Express (v2) Endpoint, MSI 00

        ...


The Virtual Functions only appear after the Physical Function driver is
loaded. Once the Physical Function driver is unloaded, all Virtual
Functions associated with that Physical Function disappear.

A Virtual Function is essentially the same as a normal PCI device when
used in a VT-d environment. You need to hide the Virtual Function, put
the Virtual Function's bus, device and function numbers in the HVM
guest configuration file, and then boot the HVM guest. You also need
the Virtual Function driver, which is a normal PCI device driver, in
the HVM guest to drive the Virtual Function. The PCIe SR-IOV
specification requires that a Virtual Function support only MSI/MSI-x
if it uses interrupts at all, so you also need to enable MSI support as
described above. Since the Virtual Functions are allocated dynamically
by the Physical Function driver, you might want to use the dynamic
hiding method mentioned above.
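
Putting it together for Virtual Function 01:10.0 from the example above
(a sketch; the "8086 10ca" vendor/device ID pair comes from the lspci
output shown earlier, and if a VF driver is bound to the device in dom0
you must unbind it first):

    echo "8086 10ca" > /sys/bus/pci/drivers/pci-stub/new_id
    echo "0000:01:10.0" > /sys/bus/pci/devices/0000:01:10.0/driver/unbind
    echo "0000:01:10.0" > /sys/bus/pci/drivers/pci-stub/bind

Then add the Virtual Function to the guest configuration file and boot
the HVM guest:

    pci = [ '01:10.0' ]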