| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Considering that a significant share of PCI devices out there (not the
least the myriad of CPU-exposed ones) don't support MSI-X at all, and
that the amount of data is well beyond a handful of bytes, break this
out of the common structure, at once allowing the actual data to be
tracked to become architecture specific.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implies
- extending the public interface to have a way to request a block of
MSIs
- allocating a block of contiguous pIRQ-s for the target domain (but
note that the Xen IRQs allocated have no need of being contiguous)
- repeating certain operations for all involved IRQs
- fixing multi_msi_enable()
- adjusting the mask bit accesses for maskable MSIs
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 899110e3 ("AMD IOMMU: include IOMMU interrupt information in 'M'
debug key output") made the AMD IOMMU MSI setup code use more of the
generic MSI setup code (as other than for VT-d this is an ordinary MSI-
capable PCI device), but failed to notice that till now interrupt setup
there _required_ the subsequent affinity setup to be done, as that was
the only point where the MSI message would get written. The generic MSI
affinity setting routine, however, does only an incremental change,
i.e. relies on this setup to have been done before.
In order to not make the code even more clumsy, introduce a new low
level helper routine __setup_msi_irq(), thus eliminating the need for
the AMD IOMMU code to directly fiddle with the IRQ descriptor.
Reported-by: Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the need to allocate multiple contiguous IRTEs for multi-vector
MSI, the chance of failure here increases. While on the AMD side
there's no allocation of IRTEs at present at all (and hence no way for
this allocation to fail, which is going to change with a later patch in
this series), VT-d already ignores an eventual error here, which this
patch fixes.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The major aspect being the removal of the overload of the MSI entry's
mask_base field for MSI purposes - a proper union is being installed
instead, tracking both the config space position needed and the number
of vectors used (which is going to be 1 until the actual multi-vector
MSI patches arrive).
It also corrects misleading information from debug key 'M': When
msi_get_mask_bit() returns a negative value, there's no mask bit, and
hence output shouldn't give the impression there is.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds two new physdev operations for Dom0 to invoke when resource
allocation for devices is known to be complete, so that the hypervisor
can arrange for the respective MMIO ranges to be marked read-only
before an eventual guest getting such a device assigned even gets
started, such that it won't be able to set up writable mappings for
these MMIO ranges before Xen has a chance to protect them.
This also addresses another issue with the code being modified here,
in that so far write protection for the address ranges in question got
set up only once during the lifetime of a device (i.e. until either
system shutdown or device hot removal), while teardown happened when
the last interrupt was disposed of by the guest (which at least allowed
the tables to be writable when the device got assigned to a second
guest [instance] after the first terminated).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
| |
Note that this also adds a few pieces missing from c/s
25903:5e4a00b4114c (relevant only when the PCI MSI mask bit is
supported by an IOMMU, which apparently isn't the case for existing
implementations).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
| |
This requires some additions to the VT-d side; AMD IOMMUs use the
"normal" MSI message format even when interrupt remapping is enabled,
thus making adjustments here unnecessary.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Xiantao Zhang<xiantao.zhang@intel.com>
|
|
|
|
|
|
|
|
| |
... instead of open coding it.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given that here, other than for VT-d, the MSI interface gets surfaced
through a normal PCI device, the code should use as much as possible of
the "normal" MSI support code.
Further, the code can (and should) follow the "normal" MSI code in
distinguishing the maskable and non-maskable cases at the IRQ
controller level rather than checking the respective flag in the
individual actors.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
... as it conflicts with the one made in asm/byteorder.h, and hence
build fails when both happen to be included from the same source file.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This addresses a number of problems in msixtbl_{read,write}():
- address alignment was not checked, allowing for memory corruption in
the hypervisor (write case) or returning of hypervisor private data
to the guest (read case)
- the interrupt mask bit was permitted to be written by the guest
(while Xen's interrupt flow control routines need to control it)
- MAX_MSIX_TABLE_{ENTRIES,PAGES} were pointlessly defined to plain
numbers (making it unobvious why they have these values, and making
the latter non-portable)
- MAX_MSIX_TABLE_PAGES was also off by one (failing to account for a
non-zero table offset); this was also affecting host MSI-X code
- struct msixtbl_entry's table_flags[] was one element larger than
necessary due to improper open-coding of BITS_TO_LONGS()
- msixtbl_read() unconditionally accessed the physical table, even
though the data was only needed in a quarter of all cases
- various calculations were done unnecessarily for both of the rather
distinct code paths in msixtbl_read()
Additionally it is unclear on what basis MAX_MSIX_ACC_ENTRIES was
chosen to be 3.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the .end() accessor having become optional and noting that
several of the accessors' behavior really depends on the result of
msi_maskable_irq(), the splits the MSI IRQ chip type into two - one
for the maskable ones, and the other for the (MSI only) non-maskable
ones.
At once the implementation of those methods gets moved from io_apic.c
to msi.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is again because the descriptor is generally more useful (with
the IRQ number being accessible in it if necessary) and going forward
will hopefully allow to remove all direct accesses to the IRQ
descriptor array, in turn making it possible to make this some other,
more efficient data structure.
This additionally makes the .end() accessor optional, noting that in a
number of cases the functions were empty.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
| |
This is because the descriptor is generally more useful (with the IRQ
number being accessible in it if necessary) and going forward will
hopefully allow to remove all direct accesses to the IRQ descriptor
array, in turn making it possible to make this some other, more
efficient data structure.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
| |
This is particularly relevant as the number of CPUs to be supported
increases (as recently happened for the default thereof).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new PHYSDEVOP_pci_device_add is intended to be extensible, with a
first extension (to pass the proximity domain of a device) added right
away.
A couple of directly related functions at once get adjusted to account
for the segment number.
Should we deprecate the PHYSDEVOP_manage_pci_* sub-hypercalls?
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
| |
This particularly eliminates the bogus passing of NULL by hpet.c.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The original vmsi_deliver is renamed to vmsi_deliver_pirq. New
vmsi_deliver is dedicated to the actually delivering.
Original HVMOP number is unchanged. New operation is numbered 16
and enclosed by (__XEN__) and (__XEN_TOOLS__).
Signed-off-by: Wei Liu <liuw@liuw.name>
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
|
|
|
|
|
|
|
|
|
|
| |
Eliminate redundant ones, fix names (where so far inappropriately
referring to capability structure fields the don't really relate to),
use symbolic names instead of raw numbers, and remove an unusable one.
No functional change intended.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
| |
Make IRQ related data const or __read_mostly where possible/reasonable,
use platform_legacy_irq() where feasible, and remove the now unused
definition of vector_to_irq().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
| |
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enabled
When x2APIC and Interrupt Remapping(IR) with EIM are enabled, we
should use 32-bit Destination ID for IOAPIC and MSI.
We implemented the IR support in xen by hooking the functions like
io_apic_write(),io_apic_modify(), write_msi_message(), and as a
result, in the hook functions in intremap.c, we can only see the 8-bit
dest id rather the 32-bit id, so we can't set IR table Entry that
requires a 32-bit dest id.
To solve the issue throughly, we need find every place in io_apic.c
and msi.c that could write ioapic RTE and and device's msi message and
explicitly handle the 32-bit dest id carefully (namely, when genapic
is x2apic, cpu_mask_to_apic could return a 32-bit value); and we have
to change the iommu_ops->{.update_ire_from_apic, .update_ire_from_msi}
interfaces. We may have to write an over-1000-LOC patch for this.
Instead, we could use a workround:
1) for ioapic, in the struct IO_APIC_route_entry, we could use a new
"dest32" to refer to the dest field;
2) for msi, in the struct msi_msg, we could add a new "u32 dest".
And in intremap.c, if x2apic_enabled, we use the new names to refer to
the dest fields.
We can improve this in future.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
from vector-based to IRQ-based.
In per-cpu vector environment, vector space changes to
multi-demension resource, so vector number is not appropriate
to index irq_desc which stands for unique interrupt source. As
Linux does, irq number is chosen to index irq_desc. This patch
changes vector-based interrupt infrastructure to irq-based one.
Mostly, it follows upstream linux's changes, and some parts are
adapted for Xen.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
Add a new parameter to DOMCTL_bind_pt_irq to allow Xen to know the
guest physical address of MSI-X table. Also add a new MMIO intercept
handler to intercept that gpa in order to handle MSI-X vector mask
bit operation in the hypervisor. This reduces the load of device model
considerably if the guest does mask and unmask frequently
Signed-off-by: Qing He <qing.he@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, msix table pages are allocated a fixmap page per vector,
the available fixmap pages will be depleted when assigning devices
with large number of vectors. This patch fixes it, and a bug that
prevents cross-page MSI-X table from working properly
It now allocates msix table fixmap pages per device, if the table
entries of two msix vectors share the same page, it will only be
mapped to fixmap once. A ref count is maintained so that it can
be unmapped when all the vectors are freed.
Also changes the meaning of msi_desc->mask_base from the va of msix
table start to the va of the target entry. The former one is currently
buggy (it always maps the first page but msix can support up to 2048
entries) and can't handle separately allocated pages.
Signed-off-by: Qing He <qing.he@intel.com>
|
|
|
|
|
| |
From: "Jiang, Yunhong" <yunhong.jiang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
|
|
|
| |
Currently the MSI is disabled because of some lock issue. This patch
tries to clean up the locking related to MSI lock.
Signed-off-by: Jiang Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes sure that there are no assumptions about NR_IRQS==NR_VECTORS
anymore, and it also renames various variables to properly reflect
what they represent.
While coded correctly, I wonder whether dump_irqs() shouldn't iterate
over the vector space rather than the irq space, so that MSI entries
are also processed.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
| |
... as that's not really correct, and there are devices which can't
even cope with that. Instead, check whether an MSI IRQ can be masked,
and if it can't, treat it just like a level triggered IO-APIC IRQ.
There's one other bug fix in here, correcting an off-by-one error on
the entry_nr range check in __pci_enable_msix().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
MSI-x may have multiple vectors, however in current interrupt
remapping code, one device only has one entry in interrupt remapping
table.
This patch adds 'remap_index' in msi_desc structure to track its index
in interrupt remapping table.
Signed-off-by: Haitao Shan <haitao.shan@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
|
|
|
|
|
|
|
|
| |
Per-domain irq mappings are now protected by d->evtchn_lock and by the
per-vector irq_desc lock.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
| |
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
|
|
|
|
|
|
|
|
|
| |
Add functions for managing pci_dev structures. Create a list
containing all current pci_devs. Remove msi_pdev_list. Create a
read-write lock protecting all pci_dev lists. Add spinlocks for
pci_dev access. Do necessary modifications to MSI code.
Signed-off-by: Espen Skoglund <espen.skoglund@netronome.com>
|
|
|
|
| |
Signed-off-by: Shan Haitao <Haitao.shan@intel.com>
|
|
Signed-off-by: Jiang Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Shan Haitao <haitao.shan@intel.com>
|