path: root/xen/arch/x86/msi.c
Commit log, most recent first. Each entry: subject (author, date; files changed, lines -removed/+added).
* x86/MSI: fix locking in pci_restore_msi_state() (Jan Beulich, 2013-10-14; 1 file, -1/+1)

  Right after the loop the lock is being dropped, so all loop exits should
  happen with the lock still held.

  Reported-by: Kristoffer Egefelt <kristoffer@itoc.dk>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Kristoffer Egefelt <kristoffer@itoc.dk>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
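The pattern the fix restores can be sketched in a minimal, self-contained form: the unlock sits once after the loop, so every exit from the loop (including `break`) must leave the lock held. The names (`toy_lock`, `restore_entries`) are illustrative, not Xen's actual code.

```c
#include <assert.h>

/* Stand-in for a real spinlock: a depth counter lets us check balance. */
static int lock_depth;
static void toy_lock(void)   { lock_depth++; }
static void toy_unlock(void) { lock_depth--; }

/* Processes entries until a negative sentinel; returns the count.
 * All loop exits fall through to the single unlock below, so the
 * lock is balanced on every path. */
static int restore_entries(const int *entries, int n)
{
    int done = 0;

    toy_lock();
    for (int i = 0; i < n; i++) {
        if (entries[i] < 0)
            break;          /* exit with the lock still held ... */
        done++;
    }
    toy_unlock();           /* ... because the only unlock is here */

    return done;
}
```

The buggy variant dropped the lock inside the loop before one exit path, so the common unlock after the loop released it a second time.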
* PCI: break MSI-X data out of struct pci_dev_info (Jan Beulich, 2013-08-23; 1 file, -50/+51)

  Considering that a significant share of PCI devices out there (not least
  the myriad of CPU-exposed ones) don't support MSI-X at all, and that the
  amount of data is well beyond a handful of bytes, break this out of the
  common structure, at once allowing the tracking of the actual data to
  become architecture specific.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: enable multi-vector MSI (Jan Beulich, 2013-08-08; 1 file, -36/+88)

  This implies
  - extending the public interface to have a way to request a block of MSIs
  - allocating a block of contiguous pIRQ-s for the target domain (but note
    that the Xen IRQs allocated have no need of being contiguous)
  - repeating certain operations for all involved IRQs
  - fixing multi_msi_enable()
  - adjusting the mask bit accesses for maskable MSIs

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: add locking to map_pages_to_xen() (Jan Beulich, 2013-07-15; 1 file, -3/+1)

  While boot time calls don't need this, run time uses of the function
  which may result in L2 page tables getting populated need to be
  serialized to avoid two CPUs populating the same L2 (or L3) entry,
  overwriting each other's results.

  This is expected to fix what would seem to be a regression from commit
  b0581b92 ("x86: make map_domain_page_global() a simple wrapper around
  vmap()"), albeit that change only made more readily visible the already
  existing issue.

  This patch intentionally does not
  - add locking to the page table de-allocation logic in
    destroy_xen_mappings() (the only user having potential races here,
    msix_put_fixmap(), gets converted to use __set_fixmap() instead)
  - avoid races between super page splitting and reconstruction in
    map_pages_to_xen() (no such uses exist; races between multiple
    splitting attempts or between multiple reconstruction attempts are
    being taken care of)

  If we wanted to take care of these, we'd need to alter the behavior of
  virt_to_xen_l?e() - they would need to return with the lock held then.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* AMD IOMMU: make interrupt work again (Jan Beulich, 2013-06-17; 1 file, -2/+9)

  Commit 899110e3 ("AMD IOMMU: include IOMMU interrupt information in 'M'
  debug key output") made the AMD IOMMU MSI setup code use more of the
  generic MSI setup code (as other than for VT-d this is an ordinary
  MSI-capable PCI device), but failed to notice that until now interrupt
  setup there _required_ the subsequent affinity setup to be done, as that
  was the only point where the MSI message would get written. The generic
  MSI affinity setting routine, however, does only an incremental change,
  i.e. relies on this setup to have been done before.

  In order to not make the code even more clumsy, introduce a new low
  level helper routine __setup_msi_irq(), thus eliminating the need for
  the AMD IOMMU code to directly fiddle with the IRQ descriptor.

  Reported-by: Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
  Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
* rename IS_PRIV to is_hardware_domain (Daniel De Graaf, 2013-05-07; 1 file, -1/+1)

  Since the remaining uses of IS_PRIV are actually concerned with the
  domain having control of the hardware (i.e. being the initial domain),
  clarify this by renaming IS_PRIV to is_hardware_domain. This also
  removes IS_PRIV_FOR since the only remaining user was xsm/dummy.h.

  Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
  Acked-by: George Dunlap <george.dunlap@eu.citrix.com> (for 4.3 release)
  Acked-by: Keir Fraser <keir@xen.org>
* IOMMU: allow MSI message to IRTE propagation to fail (Jan Beulich, 2013-04-15; 1 file, -10/+12)

  With the need to allocate multiple contiguous IRTEs for multi-vector
  MSI, the chance of failure here increases. While on the AMD side there's
  no allocation of IRTEs at present at all (and hence no way for this
  allocation to fail, which is going to change with a later patch in this
  series), VT-d already ignores an eventual error here, which this patch
  fixes.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
* x86/MSI: cleanup to prepare for multi-vector MSI (Jan Beulich, 2013-04-10; 1 file, -19/+22)

  The major aspect being the removal of the overload of the MSI entry's
  mask_base field for MSI purposes - a proper union is being installed
  instead, tracking both the config space position needed and the number
  of vectors used (which is going to be 1 until the actual multi-vector
  MSI patches arrive).

  It also corrects misleading information from debug key 'M': When
  msi_get_mask_bit() returns a negative value, there's no mask bit, and
  hence output shouldn't give the impression there is.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
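The de-overloading described above can be sketched with a union: MSI-X needs the mapped address of its mask bits, while plain MSI only needs the capability's config-space offset plus the vector count, so the two never coexist in one descriptor. Struct and field names here are illustrative, not Xen's actual layout.

```c
#include <stdint.h>

/* Hypothetical miniature of replacing an overloaded field with a
 * union keyed on the descriptor type (requires C11 anonymous unions). */
struct toy_msi_desc {
    enum { TOY_MSI, TOY_MSIX } type;
    union {
        void *mask_base;        /* MSI-X: mapped table base             */
        struct {
            uint8_t pos;        /* MSI: capability offset in cfg space  */
            uint8_t nvec;       /* MSI: vectors in use (1 until
                                 * multi-vector support lands)          */
        } msi;
    };
};

static uint8_t toy_msi_vectors(const struct toy_msi_desc *d)
{
    return d->type == TOY_MSI ? d->msi.nvec : 1;
}
```

The design point is that neither side pays for the other's data, and the type tag makes misuse of the overloaded storage detectable at the access site.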
* IOMMU: properly check whether interrupt remapping is enabled (Jan Beulich, 2013-03-25; 1 file, -3/+3)

  ... rather than the IOMMU as a whole. That in turn required making sure
  iommu_intremap gets properly cleared when the respective initialization
  fails (or isn't being done at all).

  Along with making sure interrupt remapping doesn't get inconsistently
  enabled on some IOMMUs and not on others in the VT-d code, this in turn
  allowed quite a bit of cleanup on the VT-d side (if desired, that
  cleanup could of course be broken out into a separate patch).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
* x86/MSI: add mechanism to fully protect MSI-X table from PV guest accesses (Jan Beulich, 2013-03-08; 1 file, -69/+133)

  This adds two new physdev operations for Dom0 to invoke when resource
  allocation for devices is known to be complete, so that the hypervisor
  can arrange for the respective MMIO ranges to be marked read-only before
  an eventual guest getting such a device assigned even gets started, such
  that it won't be able to set up writable mappings for these MMIO ranges
  before Xen has a chance to protect them.

  This also addresses another issue with the code being modified here, in
  that so far write protection for the address ranges in question got set
  up only once during the lifetime of a device (i.e. until either system
  shutdown or device hot removal), while teardown happened when the last
  interrupt was disposed of by the guest (which at least allowed the
  tables to be writable when the device got assigned to a second guest
  [instance] after the first terminated).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* honor ACPI v4 FADT flags (Jan Beulich, 2013-02-22; 1 file, -2/+16)

  - force use of physical APIC mode if indicated so (as we don't support
    xAPIC cluster mode, the respective flag is taken to force physical
    mode too)
  - don't use MSI if indicated so (implies no IOMMU)

  Both can be overridden on the command line; for the MSI case this at
  once adds a new command line option allowing PCI MSI to be turned off
  (IOMMU and HPET are unaffected by this).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* xen/xsm: Add xsm_default parameter to XSM hooks (Daniel De Graaf, 2013-01-11; 1 file, -1/+1)

  Include the default XSM hook action as the first argument of the hook
  to facilitate quick understanding of how the call site is expected to
  be used (dom0-only, arbitrary guest, or device model).

  This argument does not solely define how a given hook is interpreted,
  since any changes to the hook's default action need to be made
  identically to all callers of a hook (if there are multiple callers;
  most hooks only have one), and may also require changing the arguments
  of the hook.

  Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
  Acked-by: Tim Deegan <tim@xen.org>
  Committed-by: Keir Fraser <keir@xen.org>
* AMD IOMMU: include IOMMU interrupt information in 'M' debug key output (Jan Beulich, 2012-11-28; 1 file, -1/+1)

  Note that this also adds a few pieces missing from c/s
  25903:5e4a00b4114c (relevant only when the PCI MSI mask bit is
  supported by an IOMMU, which apparently isn't the case for existing
  implementations).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86/HPET: include FSB interrupt information in 'M' debug key output (Jan Beulich, 2012-11-22; 1 file, -9/+18)

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* amd iommu: use base platform MSI implementation (Jan Beulich, 2012-09-14; 1 file, -16/+60)

  Given that here, other than for VT-d, the MSI interface gets surfaced
  through a normal PCI device, the code should use as much as possible of
  the "normal" MSI support code. Further, the code can (and should)
  follow the "normal" MSI code in distinguishing the maskable and
  non-maskable cases at the IRQ controller level rather than checking the
  respective flag in the individual actors.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Wei Wang <wei.wang2@amd.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86/MSI: fix 2nd S3 resume with interrupt remapping enabled (Jan Beulich, 2012-09-07; 1 file, -1/+6)

  The first resume from S3 was corrupting internal data structures (in
  that pci_restore_msi_state() updated the globally stored MSI message
  from traditional to interrupt remapped format, which would then be
  translated a second time during the second resume, breaking interrupt
  delivery).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* xsm: Add missing access checks (Daniel De Graaf, 2011-12-18; 1 file, -0/+6)

  Actions requiring IS_PRIV should also require some XSM access control
  in order for XSM to be useful in confining multiple privileged domains.
  Add XSM hooks for new hypercalls and sub-commands that are under
  IS_PRIV but not currently under any access checks.

  Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
* x86/MSI: fix dump_msi() after c/s 24068:6928172f7ded (Jan Beulich, 2011-11-09; 1 file, -0/+3)

  The function must not blindly take the lock on IRQ descriptors.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* IRQ: allocate CPU masks dynamically (Jan Beulich, 2011-11-03; 1 file, -2/+2)

  This includes delaying the initialization of dynamically created IRQs
  until their actual first use and some further elimination of uses of
  struct irq_cfg.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* fold struct irq_cfg into struct irq_desc (Jan Beulich, 2011-10-19; 1 file, -6/+4)

  struct irq_cfg really has become an architecture extension to struct
  irq_desc, and hence it should be treated as such (rather than as IRQ
  chip specific data, which it was meant to be originally).

  For a first step, only convert a subset of the uses; subsequent patches
  (partly to be sent later) will aim at fully eliminating the use of the
  old structure type.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* x86/MSI: drop local cpumask_t variable from msi_compose_msg() (Jan Beulich, 2011-10-14; 1 file, -5/+3)

  The function gets called only during initialization/resume (when no
  other CPUs are running) or with the IRQ descriptor lock held, so
  there's no way for the CPU mask to change under its feet.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* PCI multi-seg: config space accessor adjustments (Jan Beulich, 2011-09-22; 1 file, -52/+64)

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* x86: split MSI IRQ chip (Jan Beulich, 2011-09-18; 1 file, -33/+77)

  With the .end() accessor having become optional and noting that several
  of the accessors' behavior really depends on the result of
  msi_maskable_irq(), this splits the MSI IRQ chip type into two - one
  for the maskable ones, and the other for the (MSI only) non-maskable
  ones.

  At once the implementation of those methods gets moved from io_apic.c
  to msi.c.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* pass struct irq_desc * to all other IRQ accessors (Jan Beulich, 2011-09-18; 1 file, -8/+8)

  This is again because the descriptor is generally more useful (with the
  IRQ number being accessible in it if necessary) and going forward will
  hopefully allow removing all direct accesses to the IRQ descriptor
  array, in turn making it possible to make this some other, more
  efficient data structure.

  This additionally makes the .end() accessor optional, noting that in a
  number of cases the functions were empty.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* pass struct irq_desc * to set_affinity() IRQ accessors (Jan Beulich, 2011-09-18; 1 file, -2/+1)

  This is because the descriptor is generally more useful (with the IRQ
  number being accessible in it if necessary) and going forward will
  hopefully allow removing all direct accesses to the IRQ descriptor
  array, in turn making it possible to make this some other, more
  efficient data structure.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* convert more literal uses of cpumask_t to pointers (Jan Beulich, 2011-09-18; 1 file, -2/+2)

  This is particularly relevant as the number of CPUs to be supported
  increases (as recently happened for the default thereof).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* PCI multi-seg: add new physdevop-s (Jan Beulich, 2011-09-18; 1 file, -9/+13)

  The new PHYSDEVOP_pci_device_add is intended to be extensible, with a
  first extension (to pass the proximity domain of a device) added right
  away.

  A couple of directly related functions at once get adjusted to account
  for the segment number.

  Should we deprecate the PHYSDEVOP_manage_pci_* sub-hypercalls?

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* x86: drop unused parameter from msi_compose_msg() and setup_msi_irq() (Jan Beulich, 2011-08-27; 1 file, -4/+3)

  This particularly eliminates the bogus passing of NULL by hpet.c.

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86/PCI-MSI: properly determine VF BAR values (Jan Beulich, 2011-08-13; 1 file, -10/+63)

  As was discussed a couple of times on this list, SR-IOV virtual
  functions have their BARs read as zero - the physical function's SR-IOV
  capability structure must be consulted instead. The bogus warnings
  people complained about are being eliminated with this change.

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86/mm/p2m: Make p2m interfaces take struct domain arguments. (Tim Deegan, 2011-06-02; 1 file, -1/+1)

  As part of the nested HVM patch series, many p2m functions were changed
  to take pointers to p2m tables rather than to domains. This patch
  reverses that for almost all of them, which:
  - gets rid of a lot of "p2m_get_hostp2m(d)" in code which really
    shouldn't have to know anything about how gfns become mfns.
  - ties sharing and paging interfaces to a domain, which is what they
    actually act on, rather than a particular p2m table.

  In developing this patch it became clear that memory-sharing and nested
  HVM are unlikely to work well together. I haven't tried to fix that
  here beyond adding some assertions around suspect paths (as this patch
  is big enough with just the interface changes)

  Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
* xen: Remove some initialised but otherwise unused variables. (Olaf Hering, 2011-05-20; 1 file, -9/+0)

  Fixes the build under gcc-4.6 -Werror=unused-but-set-variable

  Signed-off-by: Olaf Hering <olaf@aepfle.de>
* Define new <pfn.h> header for PFN_{DOWN,UP} macros. (Keir Fraser, 2011-03-23; 1 file, -0/+1)

  Signed-off-by: Keir Fraser <keir@xen.org>
* x86/IRQ: pass CPU masks by reference rather than by value in more places (Keir Fraser, 2010-12-01; 1 file, -3/+2)

  Additionally simplify operations on them in a few cases.

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* msi: Mask out multi-function flag from PCI_HEADER_TYPE in read_pci_mem_bar() (Keir Fraser, 2010-10-20; 1 file, -1/+1)

  This leads to an erroneous WARN_ON and possibly other side effects. It
  seems to me that even multi-function devices ought to enjoy the
  privilege of MSI-X capabilities.

  Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com>
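The bug class here is easy to sketch: per the PCI specification, the header-type byte carries the layout in its low 7 bits and the multi-function flag in bit 7, so comparisons against a layout value must mask the flag out first or multi-function devices get misclassified. The helper name below is illustrative; the constants follow the PCI spec.

```c
#include <stdint.h>

#define PCI_HEADER_TYPE_NORMAL  0x00    /* type-0 (ordinary device) layout */
#define PCI_HEADER_TYPE_BRIDGE  0x01    /* type-1 (PCI-PCI bridge) layout  */
#define PCI_HEADER_TYPE_MULTI   0x80    /* multi-function flag, bit 7      */

/* Returns nonzero if the header uses the type-0 layout, regardless of
 * whether the device is multi-function. Comparing the raw byte against
 * PCI_HEADER_TYPE_NORMAL would wrongly fail for 0x80. */
static int toy_is_normal_header(uint8_t header_type)
{
    return (header_type & 0x7f) == PCI_HEADER_TYPE_NORMAL;
}
```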
* x86/msi: fix inverted masks in c/s 22182:68cc3c514a0a (Keir Fraser, 2010-10-18; 1 file, -2/+2)

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: protect MSI-X table and pending bit array from guest writes (Keir Fraser, 2010-09-20; 1 file, -2/+112)

  These structures are used by Xen, and hence guests must not be able to
  fiddle with them.

  qemu-dm currently plays with the MSI-X table, requiring Dom0 to still
  have write access. This is broken (explicitly allowing the guest write
  access to the mask bit) and should be fixed in qemu-dm, at which time
  Dom0 won't need any special casing anymore.

  The changes are made under the assumption that p2m_mmio_direct will
  only ever be used for order 0 pages.

  An open question is whether dealing with pv guests (including the
  IOMMU-less case) is necessary, as handling mappings a domain may
  already have in place at the time the first interrupt gets set up would
  require scanning all of the guest's L1 page table pages. Currently a
  hole still remains allowing PV guests to map these ranges before
  actually setting up any MSI-X vector for a device.

  An alternative would be to determine and insert the address ranges
  earlier into mmio_ro_ranges, but that would require a hook in the PCI
  config space writes, which is particularly problematic in case MMCONFIG
  accesses are being used.

  A second alternative would be to require Dom0 to report all devices (or
  at least all MSI-X capable ones) regardless of whether they would be
  used by that domain, and do so after resources got determined/assigned
  for them (i.e. a second notification later than the one currently
  happening from the PCI bus scan would be needed).

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
  Acked-by: Jiang, Yunhong <yunhong.jiang@intel.com>
* Rename irq_cfg->domain to irq_cfg->cpu_mask (Keir Fraser, 2010-08-30; 1 file, -1/+1)

  From: Sheng Yang <sheng.yang@intel.com>

  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* msi: Avoid uninitialized msi descriptors (Keir Fraser, 2010-08-11; 1 file, -6/+16)

  When __pci_enable_msix() returns early, the output parameter (struct
  msi_desc **desc) will not be initialized. On my machine, a Broadcom
  BCM5709 nic has both MSI and MSIX capability blocks, and when a guest
  tries to enable msix interrupts but __pci_enable_msix() returns early
  on encountering an msi block, the whole system will crash with a fatal
  page fault immediately.

  Signed-off-by: Wei Wang <wei.wang2@amd.com>
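The fix pattern is general: an output parameter must get a defined value on every path, early error returns included, or the caller dereferences whatever garbage was on its stack. A minimal sketch, with names that only mimic the shape of `__pci_enable_msix` (they are not Xen's):

```c
#include <stddef.h>

struct toy_desc { int irq; };

/* Returns 0 on success with *desc pointing at the descriptor, or -1 on
 * failure. *desc is assigned first thing, so even the early-return
 * error path leaves it well defined (NULL) for the caller to test. */
static int toy_enable_msix(int has_msi_block, struct toy_desc **desc)
{
    *desc = NULL;                   /* defined on every path from here on */

    if (has_msi_block)
        return -1;                  /* early return: *desc is safely NULL */

    static struct toy_desc d = { .irq = 7 };
    *desc = &d;
    return 0;
}
```

In the buggy version the assignment only happened on the success path, so a caller checking `*desc` after an early return read an uninitialized pointer.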
* Consolidate MSI-X related definitions (Keir Fraser, 2010-07-12; 1 file, -2/+2)

  Eliminate redundant ones, fix names (where they so far inappropriately
  referred to capability structure fields they don't really relate to),
  use symbolic names instead of raw numbers, and remove an unusable one.

  No functional change intended.

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: allow the MSI-X table to reside beyond 4G even on 32-bit systems (Keir Fraser, 2010-07-12; 1 file, -5/+7)

  Underlying interfaces allow this, and unduly (and silently) truncating
  addresses doesn't seem nice.

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: prevent simultaneous use of MSI and MSI-X (Keir Fraser, 2010-07-12; 1 file, -0/+16)

  This matches similar checks done in Linux, since no good can come from
  a domain trying to enable both MSI and MSI-X on the same device at the
  same time.

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
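The mutual-exclusion check amounts to refusing to enable one MSI flavour while the other is active on the same device. A minimal sketch under assumed names (the struct and functions below are illustrative, not Xen's):

```c
/* Per-device interrupt mode state; both flags set at once is the
 * condition this check makes unreachable. */
struct toy_pdev {
    int msi_enabled;
    int msix_enabled;
};

static int toy_enable_msi(struct toy_pdev *dev)
{
    if (dev->msix_enabled)
        return -1;              /* MSI-X already active: reject */
    dev->msi_enabled = 1;
    return 0;
}

static int toy_enable_msix(struct toy_pdev *dev)
{
    if (dev->msi_enabled)
        return -1;              /* MSI already active: reject */
    dev->msix_enabled = 1;
    return 0;
}
```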
* x86: fix a benign typo (Keir Fraser, 2010-07-12; 1 file, -1/+1)

  Just to avoid confusing readers - no functional change.

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: s3: write_msi_msg: entry->msg should be in the compatibility format (Keir Fraser, 2010-03-25; 1 file, -1/+2)

  When Interrupt Remapping is used, after Dom0 S3, Dom0's filesystem
  might become inaccessible as the SATA disk's MSI interrupt becomes
  buggy.

  The cause is: After set_msi_affinity() or setup_msi_irq() invokes
  write_msi_msg(), entry->msg records the remappable format message;
  during S3 resume, Dom0 invokes the PHYSDEVOP_restore_msi hypercall to
  restore the MSI registers of devices, and in pci_restore_msi_state() ->
  write_msi_msg(), the 'entry->msg' of remappable format is passed, but
  in write_msi_msg() -> ... -> msi_msg_to_remap_entry(), the 'msg' is
  assumed to be in compatibility format. As a result, after s3, the IRTE
  is corrupted.

  Actually the only users of 'entry->msg' are pci_restore_msi_state() and
  dump_msi(). That's why we don't have issue except Dom0 S3.

  Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
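The failure mode is a double translation: the cached copy of the message must stay in the compatibility format, because the restore path runs the remapping translation on whatever was cached. A toy model (all names and the fake "remap" transform are illustrative assumptions, not the real VT-d encoding):

```c
#include <stdint.h>

struct toy_msg { uint32_t data; };

#define TOY_REMAP_BIT 0x8000u

/* stand-in for msi_msg_to_remap_entry(): compat format in, remapped out */
static void toy_remap(struct toy_msg *m) { m->data |= TOY_REMAP_BIT; }

/* Fixed write path: cache the compatibility-format message first, then
 * translate only the copy that goes to the device. Returns the value
 * written to hardware. The buggy variant cached the remapped form, so
 * a later restore translated it a second time. */
static uint32_t toy_write_msg(struct toy_msg *cached, struct toy_msg msg)
{
    *cached = msg;              /* kept in compat format for restore     */
    toy_remap(&msg);            /* remapped form reaches the device only */
    return msg.data;
}
```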
* x86: kill msix_flush_writes() (Keir Fraser, 2010-01-21; 1 file, -23/+0)

  The (only) two callers of it don't need it, as the MSI-X case of
  msi_set_mask_bit() already does the necessary readl().

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: add keyhandler to dump MSI state (Keir Fraser, 2010-01-21; 1 file, -0/+77)

  Equivalent to dumping IO-APIC state; the question is whether this ought
  to live on its own key (as done here), or whether it should be chained
  to from the 'i' handler.

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* VT-d: Free unused interrupt remapping table entry (Keir Fraser, 2009-11-27; 1 file, -0/+5)

  This patch changes the IRTE allocation method, and frees unused IRTE
  when device is de-assigned.

  Signed-Off-By: Zhai Edwin <edwin.zhai@intel.com>
* x86: IRQ Migration logic enhancement. (Keir Fraser, 2009-10-26; 1 file, -4/+0)

  To program the MSI's addr/vector safely, delay the IRQ migration
  operation until just before acking the next interrupt. In this way, it
  should avoid inconsistent interrupt generation due to non-atomic writes
  of the MSI addr and data registers. Port the logic from Linux and
  tailor it for Xen.

  Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
* x86: MSI: Mask/unmask msi irq during the window which programs msi. (Keir Fraser, 2009-10-21; 1 file, -0/+4)

  When programming MSI, the IRQ has to be masked first; otherwise it may
  generate inconsistent interrupts. According to the spec, if not masked,
  the interrupt generation behaviour is undefined.

  Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
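The window being closed here is the gap between the two register writes: the MSI address and data cannot be updated atomically together, so an interrupt fired mid-update could use a mix of old and new values. Masking brackets the writes. A minimal sketch with illustrative names (real code would write PCI config or MSI-X table registers):

```c
#include <stdint.h>

struct toy_vec {
    int masked;
    uint64_t addr;      /* MSI address register (target, delivery mode) */
    uint32_t data;      /* MSI data register (vector, trigger)          */
};

/* Mask around the non-atomic pair of writes so the device can only
 * ever observe a fully old or fully new addr/data combination. */
static int toy_program_msi(struct toy_vec *v, uint64_t addr, uint32_t data)
{
    v->masked = 1;      /* close the window before touching registers */
    v->addr = addr;     /* two separate writes happen here ...        */
    v->data = data;
    v->masked = 0;      /* ... and only now may the device fire       */
    return 0;
}
```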
* vt-d: use 32-bit Destination ID when Interrupt Remapping with EIM is enabled (Keir Fraser, 2009-09-07; 1 file, -0/+2)

  When x2APIC and Interrupt Remapping (IR) with EIM are enabled, we
  should use a 32-bit Destination ID for IOAPIC and MSI. We implemented
  the IR support in xen by hooking functions like io_apic_write(),
  io_apic_modify(), write_msi_message(), and as a result, in the hook
  functions in intremap.c, we can only see the 8-bit dest id rather than
  the 32-bit id, so we can't set an IR table entry that requires a 32-bit
  dest id.

  To solve the issue thoroughly, we would need to find every place in
  io_apic.c and msi.c that could write an ioapic RTE or a device's msi
  message and explicitly handle the 32-bit dest id carefully (namely,
  when genapic is x2apic, cpu_mask_to_apic could return a 32-bit value);
  and we would have to change the iommu_ops->{.update_ire_from_apic,
  .update_ire_from_msi} interfaces. We may have to write an
  over-1000-LOC patch for this.

  Instead, we could use a workaround:
  1) for ioapic, in the struct IO_APIC_route_entry, we could use a new
     "dest32" to refer to the dest field;
  2) for msi, in the struct msi_msg, we could add a new "u32 dest".
  And in intremap.c, if x2apic_enabled, we use the new names to refer to
  the dest fields. We can improve this in future.

  Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
* x86: Remove the redundant logic in set_msi_affinity (Keir Fraser, 2009-09-02; 1 file, -14/+0)

  Remove the redundant logic in set_msi_affinity. It was introduced
  accidentally; maybe something went wrong when I generated the patch.

  Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>