| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The host bridge device (i.e. 0x18 for AMD) does not require IOMMU, and
therefore is not included in the IVRS. The current logic tries to map
all PCI devices to an IOMMU. In this case, "xl dmesg" shows the
following message on AMD sytem.
(XEN) setup 0000:00:18.0 for d0 failed (-19)
(XEN) setup 0000:00:18.1 for d0 failed (-19)
(XEN) setup 0000:00:18.2 for d0 failed (-19)
(XEN) setup 0000:00:18.3 for d0 failed (-19)
(XEN) setup 0000:00:18.4 for d0 failed (-19)
(XEN) setup 0000:00:18.5 for d0 failed (-19)
This patch adds a new device type (i.e. DEV_TYPE_PCI_HOST_BRIDGE) which
corresponds to PCI class code 0x06 and sub-class 0x00. Then, it uses
this new type to filter when trying to map device to IOMMU.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
On VT-d refuse (un)mapping host bridges for other than the hardware
domain.
Coding style cleanup.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
|
|
|
|
|
|
|
|
|
| |
With yet another case to come in a subsequent patch, it seems time to
do this in a single place rather than hand crafting it in various
scattered around locations.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Considering that a significant share of PCI devices out there (not the
least the myriad of CPU-exposed ones) don't support MSI-X at all, and
that the amount of data is well beyond a handful of bytes, break this
out of the common structure, at once allowing the actual data to be
tracked to become architecture specific.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds two new physdev operations for Dom0 to invoke when resource
allocation for devices is known to be complete, so that the hypervisor
can arrange for the respective MMIO ranges to be marked read-only
before an eventual guest getting such a device assigned even gets
started, such that it won't be able to set up writable mappings for
these MMIO ranges before Xen has a chance to protect them.
This also addresses another issue with the code being modified here,
in that so far write protection for the address ranges in question got
set up only once during the lifetime of a device (i.e. until either
system shutdown or device hot removal), while teardown happened when
the last interrupt was disposed of by the guest (which at least allowed
the tables to be writable when the device got assigned to a second
guest [instance] after the first terminated).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apart from generating device context entries for the base function,
all phantom functions also need context entries to be generated for
them.
In order to distinguish different use cases, a variant of
pci_get_pdev() is being introduced that, even when passed a phantom
function number, would return the underlying actual device.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add an "unknown" device types as well as one for PCI-to-PCIe bridges
(the latter of which other IOMMU code with or without this patch
doesn't appear to handle properly).
Make sure we don't mistake a device for which we can't access its
config space as a legacy PCI device (after all we in fact don't know
how to deal with such a device, and hence shouldn't try to).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
|
|
|
|
|
|
|
| |
... to use a (struct pci_dev *, devfn) pair.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
Instead, give the owning domain at least a small opportunity of fixing
things up, and allow for rare faults to not bring down the device at
all.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
|
|
|
|
|
|
|
|
|
|
| |
This covers the devices used for the console and the AMD IOMMU ones (as
would be any others that might get passed to pci_ro_device()).
Boot video device determination cloned from similar Linux logic.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recent Dom0 kernels want to disable PCI MSI on all devices, yet doing
so on AMD IOMMUs (which get represented by a PCI device) disables part
of the functionality set up by the hypervisor.
Add a mechanism to mark certain PCI devices as having write protected
config spaces (both through port based [method 1] accesses and, for
x86-64, mmconfig), and use that for AMD's IOMMUs.
Note that due to ptwr_do_page_fault() being run first, there'll be a
MEM_LOG() issued for each such mmconfig based write attempt. If that's
undesirable, the order of the calls in fixup_page_fault() would need
to be swapped.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This addresses a number of problems in msixtbl_{read,write}():
- address alignment was not checked, allowing for memory corruption in
the hypervisor (write case) or returning of hypervisor private data
to the guest (read case)
- the interrupt mask bit was permitted to be written by the guest
(while Xen's interrupt flow control routines need to control it)
- MAX_MSIX_TABLE_{ENTRIES,PAGES} were pointlessly defined to plain
numbers (making it unobvious why they have these values, and making
the latter non-portable)
- MAX_MSIX_TABLE_PAGES was also off by one (failing to account for a
non-zero table offset); this was also affecting host MSI-X code
- struct msixtbl_entry's table_flags[] was one element larger than
necessary due to improper open-coding of BITS_TO_LONGS()
- msixtbl_read() unconditionally accessed the physical table, even
though the data was only needed in a quarter of all cases
- various calculations were done unnecessarily for both of the rather
distinct code paths in msixtbl_read()
Additionally it is unclear on what basis MAX_MSIX_ACC_ENTRIES was
chosen to be 3.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As we expect all source files to include the header as the first thing
anyway, stop doing this by repeating the inclusion in each and every
source file (and in many headers), but rather enforce this uniformly
through the compiler command line.
As a first cleanup step, remove the explicit inclusion from all common
headers. Further cleanup can be done incrementally.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x86's used_vectors member was both misplaced (in struct pci_dev_info,
which acts as a hypercall input data passing container only) and
improperly abstracted (requiring a CONFIG_X86 conditional in a generic
header).
The adjustment requires hiding several more lines in IA64's pci.h, but
as a benefit this allows removing one of the "null" headers.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
They are used as boolean flags only, so convert them accordingly
(shrinking the structure size by 8 bytes).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
This addresses all remaining build problems introduced over the last
several months.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bus on crash
It turns out that this causes all mannor of problems on certain
motherboards (so far with no pattern I can discern)
Problems include:
* Hanging forever checking hlt instruction.
* Panics when trying to change switch root device
* Drivers hanging when trying to check for interrupts.
From: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Since extended config space accesses may not be possible when
scan_pci_devices() runs (due to MMCFG resources not being reserved in
the E820 table, which the specification allows to be the case),
functionality enabling of which requires such must be re-attempted
when it is known whether MMCFG is safe to use.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: "Kay, Allen M" <allen.m.kay@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a non-zero PCI segment isn't accessible during Xen boot (because
firmware decided to not enter the necessary MMIO space into the E820
table), devices referred to on those segments through DRHD or RMRR
structures should not be rejected just because the devices can't be
found.
This is in line with what is being done in at least one other case
already: Systems with more than one PCI segment (usually high end
ones) are assumed to have valid firmware provided data, while systems
with just segment 0 continue to have their firmware tables validated.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: "Kay, Allen M" <allen.m.kay@intel.com>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new PHYSDEVOP_pci_device_add is intended to be extensible, with a
first extension (to pass the proximity domain of a device) added right
away.
A couple of directly related functions at once get adjusted to account
for the segment number.
Should we deprecate the PHYSDEVOP_manage_pci_* sub-hypercalls?
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
| |
As was discussed a couple of times on this list, SR-IOV virtual
functions have their BARs read as zero - the physical function's
SR-IOV capability structure must be consulted instead. The bogus
warnings people complained about are being eliminated with this
change.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a vector-map to pci_dev, and add an option to point MSI-related
IRQs to the vector-map of the device.
This prevents irqs from the same device from being assigned
the same vector on different pcpus. This is required for systems
using an AMD IOMMU, since the intremap tables on AMD only look at
vector, and not destination ID.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
|
|
|
|
| |
The functionality of pci_add_device_ext() can be easily folded into
pci_add_device(), and eliminates the need to change two functions for
future adjustments.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the case of a crash, IOMMU DMA remapping gets turned off so that
the kdump kernel may boot. However, this is warned as being dangerous
in the VTD specification if a DMA transaction is in progress.
Also, in the case of a crash, DMA transactions and interrupts from
peripheral devices such as network cards are likely to keep coming in.
Without DMA remapping enabled, the transactions will be writing over
low memory, corrupting the crash state, and perhaps even the kdump
reserved memory.
Therefore, on the crash path, we can disconnect all PCI devices from
their respective buses so that they are no longer able to be DMA
busmasters. This reduces the risk of DMA transactions corrupting
state (and will also reduce spurious interrupts arriving to the kdump
kernel) until the kdump kernel and properly reset the PCI devices.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this it is questionable whether retaining struct domain's
nr_pirqs is actually necessary - the value now only serves for bounds
checking, and this boundary could easily be nr_irqs.
Note that ia64, the build of which is broken currently anyway, is only
being partially fixed up.
v2: adjustments for split setup/teardown of translation data
v3: re-sync with radix tree implementation changes
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
| |
Move all extern declarations into appropriate header files.
This also fixes up a few places where the caller and the definition
had different signatures.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
|
|
|
|
|
|
| |
Fails current lock checking mechanism in spinlock.c in debug=y builds.
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this it is questionable whether retaining struct domain's
nr_pirqs is actually necessary - the value now only serves for bounds
checking, and this boundary could easily be nr_irqs.
Another thing to consider is whether it's worth storing the pirq
number in struct pirq, to avoid passing the number and a pointer to
quite a number of functions.
Note that ia64, the build of which is broken currently anyway, is only
partially fixed up.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
Instead, use scan_pci_devices() just like VT-d does. This at once
allows making {alloc,free}_pdev() static.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These structures are used by Xen, and hence guests must not be able
to fiddle with them.
qemu-dm currently plays with the MSI-X table, requiring Dom0 to
still have write access. This is broken (explicitly allowing the guest
write access to the mask bit) and should be fixed in qemu-dm, at which
time Dom0 won't need any special casing anymore.
The changes are made under the assumption that p2m_mmio_direct will
only ever be used for order 0 pages.
An open question is whether dealing with pv guests (including the
IOMMU-less case) is necessary, as handling mappings a domain may
already have in place at the time the first interrupt gets set up
would require scanning all of the guest's L1 page table pages.
Currently a hole still remains allowing PV guests to map these ranges
before actually setting up any MSI-X vector for a device.
An alternative would be to determine and insert the address ranges
earlier into mmio_ro_ranges, but that would require a hook in the
PCI config space writes, which is particularly problematic in case
MMCONFIG accesses are being used.
A second alternative would be to require Dom0 to report all devices
(or at least all MSI-X capable ones) regardless of whether they would
be used by that domain, and do so after resources got determined/
assigned for them (i.e. a second notification later than the one
currently happening from the PCI bus scan would be needed).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, Xen checks RMRR range and disables VT-d if RMRR range is
set incorrectly in BIOS rigorously. But, actually we can ignore the
RMRR if the device under its scope are not pci discoverable, because
the RMRR won't be used by non-existed or disabled devices.
This patch ignores the RMRR if the device under its scope are not pci
discoverable, and only checks the validity of RMRRs that are actually
used. In order to avoid duplicate pci device detection code, this
patch defines a function pci_device_detect for it.
Signed-off-by: Weidong Han <weidong.han@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables P2P upstream forwarding in ACS capable PCIe
switches. The enabling is conditioned on iommu_enabled variable.
This code solves two potential problems in virtualization environment
where a PCIe device is as signed to a guest domain using a HW iommu
such as VT-d:
1) Unintentional failure caused by guest physical address programmed
into the device's DMA that happens to match the memory address range
of other downstream ports in the same PCIe switch. This causes the
PCI transaction to go to the matching downstream port instead of go to
the root complex to get translated by VT-d as it should be.
2) Malicious guest software intentionally attacks another downstream
PCIe device by programming the DMA address into the assigned device
that matches memory address range of the downstream PCIe port.
Corresponding ACS filtering code is already in upstream control panel
code that do not allow PCI device passthrough to guests if it is
behind a PCIe switch that does not have ACS capability or with ACS
capability but is not enabled.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
|
|
|
|
|
|
|
|
| |
- remove redundant declarations
- add/move prototypes to headers
- move things where they belong to
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch moves the pci code from iommu.c to pci.c. Instead of setup
pci hierarchy in array bus2bridge in iommu_context_mapping, use
scan_pci_devices once to add all existed PCI devices in system to
alldevs_list and setup pci hierarchy in array bus2bridge. In addition,
implement find_upstream_bridge to find the upstream PCIe-to-PCI/PCIX
bridge or PCI legacy bridge for a PCI device, therefore it's cleanly
to handle context map/unmap for PCI device, even for source-id
setting.
Signed-off-by: Weidong Han <weidong.han@intel.com>
|
|
|
|
|
|
| |
This patch enables PCI MMCONFIG in xen and turns on hooks for ATS.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PCIe Alternative Routing-ID Interpretation (ARI) ECN defines the Extended
Function -- a function whose function number is greater than 7 within an
ARI Device. Intel VT-d spec 1.2 section 8.3.2 specifies that the Extended
Function is under the scope of the same remapping unit as the traditional
function. The hypervisor needs to know if a function is Extended
Function so it can find proper DMAR for it.
And section 8.3.3 specifies that the SR-IOV Virtual Function is under the
scope of the same remapping unit as the Physical Function. The hypervisor
also needs to know if a function is the Virtual Function and which
Physical Function it's associated with for same reason.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
|
|
|
|
|
|
|
|
|
| |
This patch fixes the ia64 link error and some clean up of MSI-X code.
- add ia64 dummy function to link
- fix unmatched prototype
- add error check
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
|
|
|
|
|
|
|
|
|
|
| |
Add a new parameter to DOMCTL_bind_pt_irq to allow Xen to know the
guest physical address of MSI-X table. Also add a new MMIO intercept
handler to intercept that gpa in order to handle MSI-X vector mask
bit operation in the hypervisor. This reduces the load of device model
considerably if the guest does mask and unmask frequently
Signed-off-by: Qing He <qing.he@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, msix table pages are allocated a fixmap page per vector,
the available fixmap pages will be depleted when assigning devices
with large number of vectors. This patch fixes it, and a bug that
prevents cross-page MSI-X table from working properly
It now allocates msix table fixmap pages per device, if the table
entries of two msix vectors share the same page, it will only be
mapped to fixmap once. A ref count is maintained so that it can
be unmapped when all the vectors are freed.
Also changes the meaning of msi_desc->mask_base from the va of msix
table start to the va of the target entry. The former one is currently
buggy (it always maps the first page but msix can support up to 2048
entries) and can't handle separately allocated pages.
Signed-off-by: Qing He <qing.he@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As pcidevs_lock is changed from protecting only the alldevs_list to
more than that, it doesn't benifit too much from the rw_lock. Also the
previous patch 18906:2941b1a97c60 is wrong to use read_lock to protect some
sensitive data (thanks Espen pointed out that).
Also two minor fix in this patch:
a) deassign_device will deadlock when try to get the pcidevs_lock if
called by pci_release_devices, remove the lock to the caller.
b) The iommu_domain_teardown should not ASSERT for the pcidevs_lock
because it just update the domain's vt-d mapping.
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
| |
Currently the MSI is disabled because of some lock issue. This patch
tries to clean up the locking related to MSI lock.
Signed-off-by: Jiang Yunhong <yunhong.jiang@intel.com>
|
|
|
|
| |
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current simple logic has some issues: 1) Dstate transition is not
guaranteed to properly clear the device state; 2) the current code for
PCIe FLR is actually buggy: PCI_EXP_DEVSTA_TRPND doesn't mean the
completion of FLR; according to the PCIe spec, after issuing FLR, we
should wait at least 100ms.
The improved FLR logic will be added into xend.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
|
|
|
|
|
|
|
| |
Release all pci devices before doing iommu domain teardown. Also
moved pdev_flr() into generic pci code.
Signed-off-by: Espen Skoglund <espen.skoglund@netronome.com>
|
|
|
|
|
|
|
|
|
| |
The add hypercall will add a new PCI device and register it. The
remove hypercall will remove the pci_dev strucure for the device. The
IOMMU hardware (if present) will be notifed as well.
Signed-off-by: Espen Skoglund <espen.skoglund@netronome.com>
Signed-off-by: Joshua LeVasseur <joshua.levasseur@netronome.com>
|
|
|
|
|
|
|
|
|
| |
Add functions for managing pci_dev structures. Create a list
containing all current pci_devs. Remove msi_pdev_list. Create a
read-write lock protecting all pci_dev lists. Add spinlocks for
pci_dev access. Do necessary modifications to MSI code.
Signed-off-by: Espen Skoglund <espen.skoglund@netronome.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Create a bitmap for each device scope indicating which buses are
covered by the scope. Upon mapping PCI-PCI bridges we now detect
whether we have a bridge to a non-PCIe bus. If so, all devices mapped
on that bus are squashed to the requester-id of the bridge. Bridges
to PCIe busses are ignored. The requester-id squashing also
determines the iommu device group id for the device.
Signed-off-by: Espen Skoglund <espen.skoglund@netronome.com>
|
|
|
|
|
|
|
|
|
|
| |
Move pci_dev lists from hvm to arch_domain
Move the pci_dev list from hvm to arch_domain since PCI devs are no
longer hvm specific. Also removed locking for pci_dev lists. Will
reintroduce them later.
Signed-off-by: Espen Skoglund <espen.skoglund@netronome.com>
|