| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Right after the loop the lock is being dropped, so all loop exits
should happen with the lock still held.
Reported-by: Kristoffer Egefelt <kristoffer@itoc.dk>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Kristoffer Egefelt <kristoffer@itoc.dk>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Considering that a significant share of PCI devices out there (not the
least the myriad of CPU-exposed ones) don't support MSI-X at all, and
that the amount of data is well beyond a handful of bytes, break this
out of the common structure, at once allowing the actual data to be
tracked to become architecture specific.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implies
- extending the public interface to have a way to request a block of
MSIs
- allocating a block of contiguous pIRQ-s for the target domain (but
note that the Xen IRQs allocated have no need of being contiguous)
- repeating certain operations for all involved IRQs
- fixing multi_msi_enable()
- adjusting the mask bit accesses for maskable MSIs
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While boot time calls don't need this, run time uses of the function
which may result in L2 page tables getting populated need to be
serialized to avoid two CPUs populating the same L2 (or L3) entry,
overwriting each other's results.
This is expected to fix what would seem to be a regression from commit
b0581b92 ("x86: make map_domain_page_global() a simple wrapper around
vmap()"), albeit that change only made more readily visible the already
existing issue.
This patch intentionally does not
- add locking to the page table de-allocation logic in
destroy_xen_mappings() (the only user having potential races here,
msix_put_fixmap(), gets converted to use __set_fixmap() instead)
- avoid races between super page splitting and reconstruction in
map_pages_to_xen() (no such uses exist; races between multiple
splitting attempts or between multiple reconstruction attempts are
being taken care of)
If we wanted to take care of these, we'd need to alter the behavior
of virt_to_xen_l?e() - they would need to return with the lock held
then.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 899110e3 ("AMD IOMMU: include IOMMU interrupt information in 'M'
debug key output") made the AMD IOMMU MSI setup code use more of the
generic MSI setup code (as other than for VT-d this is an ordinary MSI-
capable PCI device), but failed to notice that till now interrupt setup
there _required_ the subsequent affinity setup to be done, as that was
the only point where the MSI message would get written. The generic MSI
affinity setting routine, however, does only an incremental change,
i.e. relies on this setup to have been done before.
In order to not make the code even more clumsy, introduce a new low
level helper routine __setup_msi_irq(), thus eliminating the need for
the AMD IOMMU code to directly fiddle with the IRQ descriptor.
Reported-by: Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Since the remaining uses of IS_PRIV are actually concerned with the
domain having control of the hardware (i.e. being the initial domain),
clarify this by renaming IS_PRIV to is_hardware_domain. This also
removes IS_PRIV_FOR since the only remaining user was xsm/dummy.h.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com> (for 4.3 release)
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the need to allocate multiple contiguous IRTEs for multi-vector
MSI, the chance of failure here increases. While on the AMD side
there's no allocation of IRTEs at present at all (and hence no way for
this allocation to fail, which is going to change with a later patch in
this series), VT-d already ignores an eventual error here, which this
patch fixes.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The major aspect being the removal of the overload of the MSI entry's
mask_base field for MSI purposes - a proper union is being installed
instead, tracking both the config space position needed and the number
of vectors used (which is going to be 1 until the actual multi-vector
MSI patches arrive).
It also corrects misleading information from debug key 'M': When
msi_get_mask_bit() returns a negative value, there's no mask bit, and
hence output shouldn't give the impression there is.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... rather than the IOMMU as a whole.
That in turn required to make sure iommu_intremap gets properly
cleared when the respective initialization fails (or isn't being
done at all).
Along with making sure interrupt remapping doesn't get inconsistently
enabled on some IOMMUs and not on others in the VT-d code, this in turn
allowed quite a bit of cleanup on the VT-d side (if desired, that
cleanup could of course be broken out into a separate patch).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds two new physdev operations for Dom0 to invoke when resource
allocation for devices is known to be complete, so that the hypervisor
can arrange for the respective MMIO ranges to be marked read-only
before an eventual guest getting such a device assigned even gets
started, such that it won't be able to set up writable mappings for
these MMIO ranges before Xen has a chance to protect them.
This also addresses another issue with the code being modified here,
in that so far write protection for the address ranges in question got
set up only once during the lifetime of a device (i.e. until either
system shutdown or device hot removal), while teardown happened when
the last interrupt was disposed of by the guest (which at least allowed
the tables to be writable when the device got assigned to a second
guest [instance] after the first terminated).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- force use of physical APIC mode if indicated so (as we don't support
xAPIC cluster mode, the respective flag is taken to force physical
mode too)
- don't use MSI if indicated so (implies no IOMMU)
Both can be overridden on the command line, for the MSI case this at
once adds a new command line option allowing to turn off PCI MSI (IOMMU
and HPET are unaffected by this).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Include the default XSM hook action as the first argument of the hook
to facilitate quick understanding of how the call site is expected to
be used (dom0-only, arbitrary guest, or device model). This argument
does not solely define how a given hook is interpreted, since any
changes to the hook's default action need to be made identically to
all callers of a hook (if there are multiple callers; most hooks only
have one), and may also require changing the arguments of the hook.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
| |
Note that this also adds a few pieces missing from c/s
25903:5e4a00b4114c (relevant only when the PCI MSI mask bit is
supported by an IOMMU, which apparently isn't the case for existing
implementations).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given that here, other than for VT-d, the MSI interface gets surfaced
through a normal PCI device, the code should use as much as possible of
the "normal" MSI support code.
Further, the code can (and should) follow the "normal" MSI code in
distinguishing the maskable and non-maskable cases at the IRQ
controller level rather than checking the respective flag in the
individual actors.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
| |
The first resume from S3 was corrupting internal data structures (in
that pci_restore_msi_state() updated the globally stored MSI message
from traditional to interrupt remapped format, which would then be
translated a second time during the second resume, breaking interrupt
delivery).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
Actions requiring IS_PRIV should also require some XSM access control
in order for XSM to be useful in confining multiple privileged
domains. Add XSM hooks for new hypercalls and sub-commands that are
under IS_PRIV but not currently under any access checks.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
|
|
|
|
|
|
| |
The function must not blindly take the lock on IRQ descriptors.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
| |
This includes delaying the initialization of dynamically created IRQs
until their actual first use and some further elimination of uses of
struct irq_cfg.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
struct irq_cfg really has become an architecture extension to struct
irq_desc, and hence it should be treated as such (rather than as IRQ
chip specific data, which it was meant to be originally).
For a first step, only convert a subset of the uses; subsequent
patches (partly to be sent later) will aim at fully eliminating the
use of the old structure type.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
|
| |
The function gets called only during initialization/resume (when no
other CPUs are running) or with the IRQ descriptor lock held, so
there's no way for the CPU mask to change under its feet.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the .end() accessor having become optional and noting that
several of the accessors' behavior really depends on the result of
msi_maskable_irq(), the splits the MSI IRQ chip type into two - one
for the maskable ones, and the other for the (MSI only) non-maskable
ones.
At once the implementation of those methods gets moved from io_apic.c
to msi.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is again because the descriptor is generally more useful (with
the IRQ number being accessible in it if necessary) and going forward
will hopefully allow to remove all direct accesses to the IRQ
descriptor array, in turn making it possible to make this some other,
more efficient data structure.
This additionally makes the .end() accessor optional, noting that in a
number of cases the functions were empty.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
| |
This is because the descriptor is generally more useful (with the IRQ
number being accessible in it if necessary) and going forward will
hopefully allow to remove all direct accesses to the IRQ descriptor
array, in turn making it possible to make this some other, more
efficient data structure.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
| |
This is particularly relevant as the number of CPUs to be supported
increases (as recently happened for the default thereof).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new PHYSDEVOP_pci_device_add is intended to be extensible, with a
first extension (to pass the proximity domain of a device) added right
away.
A couple of directly related functions at once get adjusted to account
for the segment number.
Should we deprecate the PHYSDEVOP_manage_pci_* sub-hypercalls?
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
| |
This particularly eliminates the bogus passing of NULL by hpet.c.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
| |
As was discussed a couple of times on this list, SR-IOV virtual
functions have their BARs read as zero - the physical function's
SR-IOV capability structure must be consulted instead. The bogus
warnings people complained about are being eliminated with this
change.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As part of the nested HVM patch series, many p2m functions were changed
to take pointers to p2m tables rather than to domains. This patch
reverses that for almost all of them, which:
- gets rid of a lot of "p2m_get_hostp2m(d)" in code which really
shouldn't have to know anything about how gfns become mfns.
- ties sharing and paging interfaces to a domain, which is
what they actually act on, rather than a particular p2m table.
In developing this patch it became clear that memory-sharing and nested
HVM are unlikely to work well together. I haven't tried to fix that
here beyond adding some assertions around suspect paths (as this patch
is big enough with just the interface changes)
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
|
|
|
|
|
|
| |
Fixes the build under gcc-4.6 -Werror=unused-but-set-variable
Signed-off-by: Olaf Hering <olaf@aepfle.de>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
| |
Additionally simplify operations on them in a few cases.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
| |
This leads to an erroneous WARN_ON and possibly other side effects. It
seems to me that even multi-function devices ought to enjoy the
privilege of MSI-X capabilities.
Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These structures are used by Xen, and hence guests must not be able
to fiddle with them.
qemu-dm currently plays with the MSI-X table, requiring Dom0 to
still have write access. This is broken (explicitly allowing the guest
write access to the mask bit) and should be fixed in qemu-dm, at which
time Dom0 won't need any special casing anymore.
The changes are made under the assumption that p2m_mmio_direct will
only ever be used for order 0 pages.
An open question is whether dealing with pv guests (including the
IOMMU-less case) is necessary, as handling mappings a domain may
already have in place at the time the first interrupt gets set up
would require scanning all of the guest's L1 page table pages.
Currently a hole still remains allowing PV guests to map these ranges
before actually setting up any MSI-X vector for a device.
An alternative would be to determine and insert the address ranges
earlier into mmio_ro_ranges, but that would require a hook in the
PCI config space writes, which is particularly problematic in case
MMCONFIG accesses are being used.
A second alternative would be to require Dom0 to report all devices
(or at least all MSI-X capable ones) regardless of whether they would
be used by that domain, and do so after resources got determined/
assigned for them (i.e. a second notification later than the one
currently happening from the PCI bus scan would be needed).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
| |
From: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
| |
When __pci_enable_msix() returns early, output parameter (struct
msi_desc **desc) will not be initialized. On my machine, a Broadcom
BCM5709 nic has both MSI and MSIX capability blocks and when guest
tries to enable msix interrupts but __pci_enable_msix() returns early
for encountering a msi block, the whole system will crash for fatal
page fault immediately.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
|
|
|
|
|
|
|
|
|
|
| |
Eliminate redundant ones, fix names (where so far inappropriately
referring to capability structure fields the don't really relate to),
use symbolic names instead of raw numbers, and remove an unusable one.
No functional change intended.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
Underlying interfaces allow this, and unduly (and silently) truncating
addresses doesn't seem nice.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
| |
This matches similar checks done in Linux, since no good can come from
a domain trying to enable both MSI and MSI-X on the same device at the
same time.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
| |
Just to avoid confusing readers - no functional change.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When Interrupt Remapping is used, after Dom0 S3, Dom0's filesystem
might become inaccessible as the SATA disk's MSI interrupt becomes
buggy. The cause is: After set_msi_affinity() or setup_msi_irq()
invokes write_msi_msg(), entry->msg records the remappable format
message; during S3 resume, Dom0 invokes the PHYSDEVOP_restore_msi
hypercall to restore the MSI registers of devices, and in
pci_restore_msi_state() -> write_msi_msg(), the 'entry->msg' of
remappable format is passed, but in write_msi_msg() -> ... ->
msi_msg_to_remap_entry(), the 'msg' is assumed to be in compatibility
format. As a result, after s3, the IRTE is corrupted.
Actually the only users of 'entry->msg' are pci_restore_msi_state()
and dump_msi(). That's why we don't have issue except Dom0 S3.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
|
|
|
|
|
|
|
| |
The (only) two callers of it don't need it, as the MSI-X case of
msi_set_mask_bit() already does the necessary readl().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
| |
Equivalent to dumping IO-APIC state; the question is whether this
ought to live on its own key (as done here), or whether it should be
chanined to from the 'i' handler.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
This patch changes the IRTE allocation method, and frees unused
IRTE when device is de-assigned.
Signed-Off-By: Zhai Edwin <edwin.zhai@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
To programme MSI's addr/vector safely, delay irq migration
operation before acking next interrupt. In this way, it should
avoid inconsistent interrupts generation due to non-atomic writing
addr and data registers about MSI.
Port the logic from Linux and tailor it for Xen.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
|
|
|
|
|
|
|
|
| |
When program msi, it has to mask it first, otherwise, it
may generate inconsistent interrupts. According to spec,
if not masked, the interrupt generation behaviour is undefined.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enabled
When x2APIC and Interrupt Remapping(IR) with EIM are enabled, we
should use 32-bit Destination ID for IOAPIC and MSI.
We implemented the IR support in xen by hooking the functions like
io_apic_write(),io_apic_modify(), write_msi_message(), and as a
result, in the hook functions in intremap.c, we can only see the 8-bit
dest id rather the 32-bit id, so we can't set IR table Entry that
requires a 32-bit dest id.
To solve the issue throughly, we need find every place in io_apic.c
and msi.c that could write ioapic RTE and and device's msi message and
explicitly handle the 32-bit dest id carefully (namely, when genapic
is x2apic, cpu_mask_to_apic could return a 32-bit value); and we have
to change the iommu_ops->{.update_ire_from_apic, .update_ire_from_msi}
interfaces. We may have to write an over-1000-LOC patch for this.
Instead, we could use a workround:
1) for ioapic, in the struct IO_APIC_route_entry, we could use a new
"dest32" to refer to the dest field;
2) for msi, in the struct msi_msg, we could add a new "u32 dest".
And in intremap.c, if x2apic_enabled, we use the new names to refer to
the dest fields.
We can improve this in future.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
|
|
|
|
|
|
|
| |
Remove the redundant logic in set_msi_affinity. And it is introduced
accidently, maybe something wrong when I generated the patch.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
|