| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
... in favor of using the new, nr_cpumask_bits-based ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
While doing so, eliminate the use of struct irq_cfg and convert the
CPU mask accessors to the new style ones as far as possible.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
struct irq_cfg really has become an architecture extension to struct
irq_desc, and hence it should be treated as such (rather than as IRQ
chip specific data, which it was meant to be originally).
For a first step, only convert a subset of the uses; subsequent
patches (partly to be sent later) will aim at fully eliminating the
use of the old structure type.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
| |
This includes the removal of a redundant memset() from microcode_amd.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix and clean up the logic to __clear_irq_vector().
We always need to clear the things related to cfg->vector.
If the IRQ is currently in motion, then we need to also clear
out things related to cfg->old_vector.
This patch reorganizes the function to make the parallels between
the two clean-ups more obvious.
The main functional change here is with cfg->used_vectors; make
sure to clear cfg->vector always (even if !cfg->move_in_progress);
if cfg->move_in_progress, clear cfg->old_vector as well.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The IRQ handling code requires pcidevs_lock to be held only for MSI
interrupts.
As the handling of which was now fully moved into msi.c (i.e. while
applying fine without, the patch needs to be applied after the one
titled "x86: split MSI IRQ chip"), io_apic.c now also doesn't need to
include PCI headers anymore.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the .end() accessor having become optional and noting that
several of the accessors' behavior really depends on the result of
msi_maskable_irq(), the splits the MSI IRQ chip type into two - one
for the maskable ones, and the other for the (MSI only) non-maskable
ones.
At once the implementation of those methods gets moved from io_apic.c
to msi.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is again because the descriptor is generally more useful (with
the IRQ number being accessible in it if necessary) and going forward
will hopefully allow to remove all direct accesses to the IRQ
descriptor array, in turn making it possible to make this some other,
more efficient data structure.
This additionally makes the .end() accessor optional, noting that in a
number of cases the functions were empty.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
| |
This is because the descriptor is generally more useful (with the IRQ
number being accessible in it if necessary) and going forward will
hopefully allow to remove all direct accesses to the IRQ descriptor
array, in turn making it possible to make this some other, more
efficient data structure.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
| |
This is particularly relevant as the number of CPUs to be supported
increases (as recently happened for the default thereof).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new PHYSDEVOP_pci_device_add is intended to be extensible, with a
first extension (to pass the proximity domain of a device) added right
away.
A couple of directly related functions at once get adjusted to account
for the segment number.
Should we deprecate the PHYSDEVOP_manage_pci_* sub-hypercalls?
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Looking more closely at usage of action field with relation to
IRQ_GUEST flag. It appears that set IRQ_GUEST implies that action
is not NULL. As result it is not safe to set action to NULL and
leave IRQ_GUEST set.
Hence IRQ_GUEST should be cleared in dynamic_irq_cleanup where
action is set to NULL.
An addition remove BUGON at __pirq_guest_unbind that appears to be
bogus and not needed anymore.
Thanks Paolo Bonzini for NACKing previous patch, and pointing at the
correct solution.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reinstate the BUG_ON, but after the action==NULL check. Since we then
go and start interpreting action as an irq_guest_action_t, the BUG_ON
is relevant here.
More generally, the brute-force nature of dynamic_irq_cleanup() looks
a bit worrying. Possibly there should be more integratioin with
pirq_guest_unbind() logic, for cleaning up un-acked EOIs and the like.
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
PV on HVM guests can have more GSIs than the host, in that case we
could run out of pirq < nr_irqs_gsi. When that happens use pirq >=
nr_irqs_gsi rather than returning an error.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Benjamin Schweikert <b.schweikert@googlemail.com>
|
|
|
|
|
|
|
|
|
| |
Introduce old_vector to irq_cfg with the same principle as
old_cpu_mask. This removes a brute force loop from
__clear_irq_vector(), and paves the way to correct bitrotten logic
elsewhere in the irq code.
Signed-off-by Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
| |
irq_status is an int for each of nr_irqs which represents a single
boolean variable. Fold it into the bitfield in irq_cfg, which saves
768 bytes per CPU with per-cpu IDTs in use.
Signed-off-by Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
| |
irq_desc.depth is a write only variable.
LEGACY_IRQ_FROM_VECTOR(vec) is never referenced.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As mentioned in previous changesets, AMD IOMMU interrupt
remapping tables only look at the vector, not the destination
id of an interrupt. This means that all IRQs going through
the same interrupt remapping table need to *not* share vectors.
The irq "vector map" functionality was originally introduced
after a patch which disabled global AMD IOMMUs entirely. That
patch has since been reverted, meaning that AMD intremap tables
can either be per-device or global.
This patch therefore introduces a global irq vector map option,
and enables it if we're using an AMD IOMMU with a global
interrupt remapping table.
This patch removes the "irq-perdev-vector-map" boolean
command-line optino and replaces it with "irq_vector_map",
which can have one of three values: none, global, or per-device.
Setting the irq_vector_map to any value will override the
default that the AMD code sets.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
| |
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
|
|
|
|
|
|
|
|
|
|
| |
hvm_domain_use_pirq should return true when the guest is using a
certain pirq, no matter if the corresponding event channel is
currently enabled or disabled. As an additional complication, qemu is
going to request pirqs for passthrough devices even for Xen unaware
HVM guests, so we need to wait for an event channel to be connected
before considering the pirq of a passthrough device as "in use".
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When migrating IO-APIC line level interrupts between PCPUs, the
migration code rewrites the IO-APIC entry to point to the new
CPU/Vector before EOI'ing it.
The EOI process says that EOI'ing the Local APIC will cause a
broadcast with the vector number, which the IO-APIC must listen to to
clear the IRR and Status bits.
In the case of migrating, the IO-APIC has already been
reprogrammed so the EOI broadcast with the old vector fails to match
the new vector, leaving the IO-APIC with an outstanding vector,
preventing any more use of that line interrupt. This causes a lockup
especially when your root device is using PCI INTA (megaraid_sas
driver *ehem*)
However, the problem is mostly hidden because send_cleanup_vector()
causes a cleanup of all moving vectors on the current PCPU in such a
way which does not cause the problem, and if the problem has occured,
the writes it makes to the IO-APIC clears the IRR and Status bits
which unlocks the problem.
This fix is distinctly a temporary hack, waiting on a cleanup of the
irq code. It checks for the edge case where we have moved the irq,
and manually EOI's the old vector with the IO-APIC which correctly
clears the IRR and Status bits. Also, it protects the code which
updates irq_cfg by disabling interrupts.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
| |
This particularly eliminates the bogus passing of NULL by hpet.c.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to make sure that cfg->used_vector is only cleared once;
otherwise there may be a race condition that allows the same vector to
be assigned twice, defeating the whole purpose of the map.
This makes two changes:
* __clear_irq_vector() only clears the vector if the irq is not being
moved
* smp_iqr_move_cleanup_interrupt() only clears used_vector if this
is the last place it's being used (move_cleanup_count==0 after
decrement).
Also make use of asserts more consistent, to catch this kind of logic
bug in the future.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the old code, tmp_mask is the cpu_and of cfg->cpu_mask and
cpu_online_map. However, in the usual case of moving an IRQ from one
PCPU to another because the scheduler decides its a good idea,
cfg->cpu_mask and cfg->old_cpu_mask do not intersect. This causes the
old cpu vector_irq table to keep the irq reference when it shouldn't.
This leads to a resource leak if a domain is shut down wile an irq has
a move pending, which results in Xen's create_irq() eventually failing
with -ENOSPC when all vector_irq tables are full of stale references.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
| |
Automatically enable per-device vector maps when using IOMMU,
unless disabled specifically by an IOMMU parameter.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a vector-map to pci_dev, and add an option to point MSI-related
IRQs to the vector-map of the device.
This prevents irqs from the same device from being assigned
the same vector on different pcpus. This is required for systems
using an AMD IOMMU, since the intremap tables on AMD only look at
vector, and not destination ID.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
|
|
|
|
| |
Laying the groundwork for per-device vector maps. This generic
code allows any irq to point to a vector map; all irqs sharing the
same vector map will avoid sharing vectors.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reduce size of extra_data in to avoid possible crash in trace_var.
(XEN) Assertion 'extra_word <= TRACE_EXTRA_MAX' failed at trace.c:687
(XEN) Xen call trace:
(XEN) [<ffff82c480128783>] __trace_var+0x4d/0x3b8
(XEN) [<ffff82c480162172>] trace_irq_mask+0x49/0x4b
(XEN) [<ffff82c4801631ae>] __assign_irq_vector+0x241/0x374
(XEN) [<ffff82c48015d813>] set_desc_affinity+0x5d/0xd4
(XEN) [<ffff82c480160708>] set_msi_affinity+0x44/0x1c1
(XEN) [<ffff82c480162938>] move_masked_irq+0x9c/0xcd
(XEN) [<ffff82c4801629a7>] move_native_irq+0x3e/0x53
(XEN) [<ffff82c48015d969>] ack_msi_irq+0x2c/0x6e
(XEN) [<ffff82c4801622e3>] do_IRQ+0x16f/0x66d
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
|
|
|
|
|
|
| |
The reason is:
1. The logic has negative impact on 10G NIC performance (assigned to
guest) by lowering the interrupt frequency that Xen can handle.
2. Xen already has IRQ rate limit logic, which can also help to
prevent IRQ storms.
Signed-off-by: Shan Haitao <haitao.shan@intel.com>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
| |
...and drop the now unused struct domain * parameter of the latter.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
Remove unnecessary/bogus assertions and add retry loop matching
domain_spin_lock_irq_desc().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
| |
Add tracing for various IRQ-related events. Also, move
the exiting TRC_TRACE_IRQ from the "generic" class into the
new TRC_HW_IRQ sub-class.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this it is questionable whether retaining struct domain's
nr_pirqs is actually necessary - the value now only serves for bounds
checking, and this boundary could easily be nr_irqs.
Note that ia64, the build of which is broken currently anyway, is only
being partially fixed up.
v2: adjustments for split setup/teardown of translation data
v3: re-sync with radix tree implementation changes
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes PV on HVM interrupt remapping with recent Linux
kernels and upstream qemu. hvm_domain_use_pirq should return positive
even if the evtchn is not currently bound. If it doesn't assert_irq
ends up injecting legacy interrupts even after the guest disabled the
irq.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
|
|
|
|
|
|
|
| |
Move all extern declarations into appropriate header files.
This also fixes up a few places where the caller and the definition
had different signatures.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
|
|
|
|
| |
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It would seem possible to fold the two trees into one (making e.g. the
emuirq bits stored in the upper half of the pointer), but I'm not
certain that's worth it as it would make deletion of entries more
cumbersome. Unless pirq-s and emuirq-s were mutually exclusive...
v2: Split setup/teardown into two stages - (de-)allocation (tree node
(de-)population) is done with just d->event_lock held (and hence
interrupts enabled), while actual insertion/removal of translation
data gets done with irq_desc's lock held (and interrupts disabled).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Fix up for new radix-tree implementation. In particular, we should
never insert NULL into a radix tree, as that means empty slot (which
can be reclaimed during a deletion). Make use of
radix_tree_int_to_ptr() (and its inverse) to hide some of these
details.
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
| |
Fails current lock checking mechanism in spinlock.c in debug=y builds.
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this it is questionable whether retaining struct domain's
nr_pirqs is actually necessary - the value now only serves for bounds
checking, and this boundary could easily be nr_irqs.
Another thing to consider is whether it's worth storing the pirq
number in struct pirq, to avoid passing the number and a pointer to
quite a number of functions.
Note that ia64, the build of which is broken currently anyway, is only
partially fixed up.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
| |
It would seem possible to fold the two trees into one (making e.g. the
emuirq bits stored in the upper half of the pointer), but I'm not
certain that's worth it as it would make deletion of entries more
cumbersome. Unless pirq-s and emuirq-s were mutually exclusive...
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is accomplished by converting a couple of embedded arrays (in one
case a structure containing an array) into separately allocated
pointers, and (just as for struct arch_vcpu in a prior patch)
overlaying some PV-only fields with HVM-only ones.
One particularly noteworthy change in the opposite direction is that
of PITState - this field so far lived in the HVM-only portion, but is
being used by PV guests too, and hence needed to be moved out of
struct hvm_domain.
The change to XENMEM_set_memory_map (and hence libxl__build_pre() and
the movement of the E820 related pieces to struct pv_domain) are
subject to a positive response to a query sent to xen-devel regarding
the need for this to happen for HVM guests (see
http://lists.xensource.com/archives/html/xen-devel/2011-03/msg01848.html).
The protection of arch.hvm_domain.irq.dpci accesses by is_hvm_domain()
is subject to confirmation that the field is used for HVM guests only
(see
http://lists.xensource.com/archives/html/xen-devel/2011-03/msg02004.html).
In the absence of any reply to these queries, and given the early
state of 4.2 development, I think it should be acceptable to take the
risk of having to later undo/redo some of this.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
With no modular drivers, all interrupt setup is supposed to happen
during boot.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
With no modular drivers, all interrupt setup is supposed to happen
during boot.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
| |
guest-bound irq before accessing desc->action.
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
| |
This also includes the removal of some entirely unused functions.
The patch builds upon the makefile adjustments done in the earlier
sent patch titled "move more kernel decompression bits to .init.*
sections".
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
| |
Remove unused and pointless bits from IO-APIC handling code. Move
whatever possible into .init.*, and some data items into
.data.read_mostly. Adjust some types.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
| |
The privilege check in unmap_domain_pirq() fails since the teardown
completes in RCU (idle domain) context. We can remove the check since
it is covered in physdev_op() already, which is the only potentially
unprivileged caller.
Signed-off-by: Wei Gang <gang.wei@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
... decreasing cache footprint. As a prerequisite this requires making
cmdline_parse() a little more flexible.
Also remove a few variables altogether, and adjust sections
annotations for several others.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
... and remove some variables the value of which is never used
altogether.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
| |
Additionally simplify operations on them in a few cases.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|