| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the permission checks for memory mapping were moved from
get_pg_owner to xsm_mmu_update in aaba7a677, the exception for DOMID_IO
was not taken into account. This will cause IO memory mappings by PV
domains (mini-os in particular) to fail when XSM/FLASK is not being
used. This patch reintroduces the exception for DOMID_IO; the actual
restrictions on IO memory mappings have always been checked separately
using iomem_access_permitted, so this change should not break existing
access control.
Reported-by: Eduardo Peixoto Macedo <epm@cin.ufpe.br>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
master commit: 07344c0f33be13bf9232a113681ef9087557f706
master date: 2013-09-26 10:15:47 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... as there doesn't really exists any valid mapping for them.
Particularly in the case of do_page_walk() this also avoids returning
non-NULL for such invalid input.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 6fd9b0361e2eb5a7f12bdd5cbf7e42c0d1937d26
master date: 2013-10-11 09:31:16 +0200
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just like for guest_get_eff_l1e() this prevents accessing as page
tables (and with the wrong memory attribute) internal data inside Xen
happening to be mapped with 1Gb pages.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: d06a0d715ec1423b6c42141ab1b0ff69a3effb56
master date: 2013-10-11 09:29:43 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- MMUEXT_SET_LDT should behave as similarly to the LLDT instruction as
possible: fail only if the base address is non-canonical
- instead LDT descriptor accesses should fault if the descriptor
address ends up being non-canonical (by ensuring this we at once
avoid reading an entry from the mach-to-phys table and consider it a
page table entry)
- fault propagation on using LDT selectors must distinguish #PF and #GP
(the latter must be raised for a non-canonical descriptor address,
which also applies to several other uses of propagate_page_fault(),
and hence the problem is being fixed there)
- map_ldt_shadow_page() should properly wrap addresses for 32-bit VMs
At once remove the odd invokation of map_ldt_shadow_page() from the
MMUEXT_SET_LDT handler: There's nothing really telling us that the
first LDT page is going to be preferred over others.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 40d66baa46ca8a9ffa6df3e063a967d08ec92bcf
master date: 2013-10-11 09:28:26 +0200
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
since it errors out, asking for at least one argument, and does
not display any useful output, which is wrong (we want the list
and the info about all the existing cpupools).
IOW, the output is as follows:
~# xl cpupool-list -c
'xl cpupool-list' requires at least 1 argument.
...
While it should be as follows:
~# xl cpupool-list -c
Name CPU list
Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
(cherry picked from commit 3998afdbf99959582dcd9f9f4df5a6fe7ce4ded8)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...otherwise it will return freed memory. All the current users of this
function check already for a NULL return, so use that.
Coverity-ID: 1056194
This is CVE-2013-4371 / XSA-70
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 4c37ed562224295c0f8b00211287d57cae629782)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not sure how it got there...
Coverity-ID: 1056196
This is CVE-2013-4370 / XSA-69
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 3cd10fd21220f2b814324e6e732004f8f0487d0a)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
strtok can return NULL here. We don't need to use strtok anyway, so just
use a simple strchr method.
Coverity-ID: 1055642
This is CVE-2013-4369 / XSA-68
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Fix type. Add test case
Signed-off-by: Ian Campbell <Ian.campbell@citrix.com>
(cherry picked from commit c53702cee1d6f9f1b72f0cae0b412e21bcda8724)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When emulating such an operation from a 64-bit context (CS has long
mode set), and the data segment is overridden to FS/GS, the result of
reading the overridden segment's descriptor (read_descriptor) is not
checked. If it fails, data_base is left uninitialized.
This can lead to 8 bytes of Xen's stack being leaked to the guest
(implicitly, i.e. via the address given in a #PF).
Coverity-ID: 1055116
This is CVE-2013-4368 / XSA-67.
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Fix formatting.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 0771faba163769089c9f05f7f76b63e397677613
master date: 2013-10-10 15:19:53 +0200
|
|
|
|
|
|
|
|
|
|
| |
The CONSOLEIO_read operation was incorrectly allowed to PV guests if the
hypervisor was compiled in debug mode (with VERBOSE defined).
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
master commit: 65ba631bcb62c79eb33ebfde8a0471fd012c37a8
master date: 2013-10-04 12:51:44 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, it use hardcode value for IA32_VMX_CR4_FIXED1. This is wrong.
We should check guest's cpuid to know which bits are writeable in CR4 by guest
and allow the guest to set the corresponding bit only when guest has the feature.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Cleanup.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
master commit: c6f92aed0e209df823d2cb5780dbb1ea12fc6d4a
master date: 2013-10-04 12:30:09 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VMX MSRs only available when the CPU support the VMX feature. In addition,
VMX_TRUE* MSRs only available when bit 55 of VMX_BASIC MSR is set.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Cleanup.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
master commit: 190b667ac20e8175758f4a3a0f13c4d990e6af7e
master date: 2013-10-04 12:28:14 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This causes accidental uses of per_cpu() on a pcpu with an INVALID_PERCPU_AREA
to result in a #GF for attempting to access the middle of the non-canonical
virtual address region.
This is preferable to the current behaviour, where incorrect use of per_cpu()
will result in an effective NULL structure dereference which has security
implication in the context of PV guests.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 7cfb0053629c4dd1a6f01dc43cca7c0c25b8b7bf
master date: 2013-10-04 12:24:34 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Checking for "idle_vcpu[cpu] != NULL" is insufficient protection against
offline pcpus. From a hypercall, vcpu_runstate_get() will determine "v !=
current", and try to take the vcpu_schedule_lock(). This will try to look up
per_cpu(schedule_data, v->processor) and promptly suffer a NULL structure
deference as v->processors' __per_cpu_offset is INVALID_PERCPU_AREA.
One example might look like this:
...
Xen call trace:
[<ffff82c4c0126ddb>] vcpu_runstate_get+0x50/0x113
[<ffff82c4c0126ec6>] get_cpu_idle_time+0x28/0x2e
[<ffff82c4c012b5cb>] do_sysctl+0x3db/0xeb8
[<ffff82c4c023280d>] compat_hypercall+0xbd/0x116
Pagetable walk from 0000000000000040:
L4[0x000] = 0000000186df8027 0000000000028207
L3[0x000] = 0000000188e36027 00000000000261c9
L2[0x000] = 0000000000000000 ffffffffffffffff
****************************************
Panic on CPU 11:
...
get_cpu_idle_time() has been updated to correctly deal with offline pcpus
itself by returning 0, in the same way as it would if it was missing the
idle_vcpu[] pointer.
In doing so, XENPF_getidletime needed updating to correctly retain its
described behaviour of clearing bits in the cpumap for offline pcpus.
As this crash can only be triggered with toolstack hypercalls, it is not a
security issue and just a simple bug.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 0aa27ce3351f7eb09d13e863a1d5f303086aa32a
master date: 2013-10-04 12:23:23 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the direct map area can extend all the way up to almost the
end of address space, this is wasteful.
Also fold two almost redundant messages in SRAT parsing into one.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: ca145fe70bad3a25ad54c6ded1ef237e45a2311e
master date: 2013-09-30 15:28:12 +0200
|
| |
|
|
|
|
|
|
|
|
|
| |
This is CVE-2013-4361 / XSA-66.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
master commit: 28b706efb6abb637fabfd74cde70a50935a5640b
master date: 2013-09-30 14:18:58 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shadowed PV L4 tables must have the same Xen mappings as their
unshadowed equivalent. This is done by copying the Xen entries
verbatim from the idle pagetable, and then using guest_l4_slot()
in the SHADOW_FOREACH_L4E() iterator to avoid touching those entries.
adc5afbf1c70ef55c260fb93e4b8ce5ccb918706 (x86: support up to 16Tb)
changed the definition of ROOT_PAGETABLE_XEN_SLOTS to extend right to
the top of the address space, which causes the shadow code to
copy Xen mappings into guest-kernel-address slots too.
In the common case, all those slots are zero in the idle pagetable,
and no harm is done. But if any slot above #271 is non-zero, Xen will
crash when that slot is later cleared (it attempts to drop
shadow-pagetable refcounts on its own L4 pagetables).
Fix by using the new ROOT_PAGETABLE_PV_XEN_SLOTS when appropriate.
Monitor pagetables need the full Xen mappings, so they keep using the
old name (with its new semantics).
This is CVE-2013-4356 / XSA-64.
Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f46befdd825c8a459c5eb21adb7d5b0dc6e30ad5
master date: 2013-09-30 14:18:25 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ignoring them generally implies using uninitialized data and, in all
but two of the cases dealt with here, potentially leaking hypervisor
stack contents to guests.
This is CVE-2013-4355 / XSA-63.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 6bb838e7375f5b031e9ac346b353775c90de45dc
master date: 2013-09-30 14:17:46 +0200
|
|
|
|
|
|
|
|
|
|
|
|
| |
We shouldn't do any acceleration for
- "rep movs" when either side is passed through MMIO or when both sides
are handled by qemu
- "rep ins" and "rep outs" when the memory operand is any kind of MMIO
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 14fcce2fa883405bab26b60821a6cc5f2c770833
master date: 2013-09-23 09:55:14 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... rather than just for the first byte.
While at it, also
- make the real mode case at least dpo a wrap around check
- drop the mis-named "gpf" label (we're not generating faults here)
and use in-place returns instead
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 7f12732670b31b2fea899a4160d455574658474f
master date: 2013-09-23 09:53:55 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in _csched_cpu_pick(), as not doing so may result in the domain's
node-affinity mask (as retrieved by csched_balance_cpumask() )
and online mask (as retrieved by cpupool_scheduler_cpumask() )
having an empty intersection.
Therefore, when attempting a node-affinity load balancing step
and running this:
...
/* Pick an online CPU from the proper affinity mask */
csched_balance_cpumask(vc, balance_step, &cpus);
cpumask_and(&cpus, &cpus, online);
...
we end up with an empty cpumask (in cpus). At this point, in
the following code:
....
/* If present, prefer vc's current processor */
cpu = cpumask_test_cpu(vc->processor, &cpus)
? vc->processor
: cpumask_cycle(vc->processor, &cpus);
....
an ASSERT (from inside cpumask_cycle() ) triggers like this:
(XEN) Xen call trace:
(XEN) [<ffff82d08011b124>] _csched_cpu_pick+0x1d2/0x652
(XEN) [<ffff82d08011b5b2>] csched_cpu_pick+0xe/0x10
(XEN) [<ffff82d0801232de>] vcpu_migrate+0x167/0x31e
(XEN) [<ffff82d0801238cc>] cpu_disable_scheduler+0x1c8/0x287
(XEN) [<ffff82d080101b3f>] cpupool_unassign_cpu_helper+0x20/0xb4
(XEN) [<ffff82d08010544f>] continue_hypercall_tasklet_handler+0x4a/0xb1
(XEN) [<ffff82d080127793>] do_tasklet_work+0x78/0xab
(XEN) [<ffff82d080127a70>] do_tasklet+0x5f/0x8b
(XEN) [<ffff82d080158985>] idle_loop+0x57/0x5e
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481
It is for example sufficient to have a domain with node-affinity
to NUMA node 1 running, and issueing a `xl cpupool-numa-split'
would make the above happen. That is because, by default, all
the existing domains remain assigned to the first cpupool, and
it now (after the cpupool-numa-split) only includes NUMA node 0.
This change prevents that by generalizing the function used
for figuring out whether a node-affinity load balancing step
is legit or not. This way we can, in _csched_cpu_pick(),
figure out early enough that the mask would end up empty,
skip the step all together and avoid the splat.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
master commit: 5e5a44b6c942d6ea47f15d6f1ed02b03e0d69445
master date: 2013-09-20 11:37:28 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Depending on the state of the conring and serial_tx_buffer,
console_force_unlock() can be a long running operation, usually because of
serial_start_sync()
XenServer testing has found a reliable case where console_force_unlock() on
one PCPU takes long enough for another PCPU to timeout due to the watchdog
(such as waiting for a tlb flush callin).
The watchdog timeout causes the second PCPU to repeat the
console_force_unlock(), at which point the first PCPU typically fails an
assertion in spin_unlock_irqrestore(&port->tx_lock) (because the tx_lock has
been unlocked behind itself).
console_force_unlock() is only on emergency paths, so one way or another the
host is going down. Disable the watchdog before forcing the console lock to
help prevent having pcpus completing with each other to bring the host down.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 7b9fa702ca323164d6b49e8b639a57f880454a8c
master date: 2013-08-13 14:31:01 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
console_lock_busted gets set when an NMI/MCE/Double Fault handler decides to
bring Xen down in an emergency. conring_puts() cannot block and does
not have problematic interactions with the console_lock.
Therefore, choosing to not put the string into the console ring simply means
that the kexec environment cant find any panic() message caused by an IST
interrupt, which is unhelpful for debugging purposes.
In the case that two pcpus fight with console_force_unlock(), having slightly
garbled strings in the console ring is far more useful than having nothing at
all.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Matt Wilson <msw@amazon.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 66450c1d1ab3c4480bbba949113b95d1ab6a943a
master date: 2013-08-06 17:45:00 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Till now, when setting previously unset bits in XCR0 we wouldn't touch
the active register state, thus leaving in the newly enabled registers
whatever a prior user of it left there, i.e. potentially leaking
information between guests.
This is CVE-2013-1442 / XSA-62.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 63a75ba0de817d6f384f96d25427a05c313e2179
master date: 2013-09-25 10:41:25 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since xen-3.3 an official unplug protocol for emulated hardware is
available in the toolstack. The pvops kernel does the unplug per
default, so it is safe to do it also in the drivers for forward ported
xenlinux.
Currently its required to load xen-platform-pci with the module
parameter dev_unplug=all, which is cumbersome.
Also recognize the dev_unplug=never parameter, which provides the
default before this patch.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
master commit: df17e9c889c48c9c10aa3f9dd0bb11077f54efc4
master date: 2013-09-20 11:41:08 +0200
|
|
|
|
|
|
|
|
|
|
|
| |
Just like real hardware we ought to split such accesses transparently
to the caller. With little extra effort we can at once even handle page
crossing accesses correctly.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 3b89f08a498ddac09d4002d9849e329018ceb107
master date: 2013-09-20 11:01:08 +0200
|
|
|
|
|
|
|
|
|
|
|
| |
If the allocation fails, make sure to call vmx_vmcs_exit().
This is a candidate for backport.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
master commit: dad7e45bf44c0569546a3ed7d0fa4182a4a73f0a
master date: 2013-09-18 14:45:42 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like one of the failure cases in hvm_vcpu_initialise jumps to
the wrong label; this could lead to slow leaks if something isn't
cleaned up properly.
I will probably change these labels in a future patch, but I figured
it was better to have this fix separately.
This is also a candidate for backport.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
master commit: 925fbcb7fdd6238f26b1576dc1f3e297f1f24f1e
master date: 2013-09-18 14:45:24 +0200
|
|
|
|
|
|
|
|
|
|
| |
Coverity CID 1055502
Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 72fa4fdf647ba99ecaf39589a93cde8dd36eed3c
master date: 2013-09-17 16:36:25 +0100
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Discovered by Coverity, CID 1055181
core2_vpmu_dump() was incorrectly setting VPMU_CONTEXT_LOADED when it
was intending to check for it.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
This would have been avoided if the dump function declared all its
pointers "const" - doing this now (also in SVM).
Also fixing some indentation issues at once.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 42c5b1214071d363a52c6356dfe2ed820f500849
master date: 2013-09-16 12:22:20 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
.. as that function is not idempotent (it always alters the table
checksum). The (generally) duplicate call was a result from it being
made before machine_restart() re-invoking itself on the boot CPU.
Considering that no problem arose so far from the table corruption I
doubt that we need to restore the correct table signature on the
reboot path in general. The only case I can see this as potentially
necessary is the tboot one, hence do the call just in that case.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: a54dc5f4fe1eae6b1beb21326ef0338cd3969cd1
master date: 2013-09-13 14:27:34 +0200
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity CID 1055131
Coverity CID 1055132
Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 803f9a6cdfeda64beee908576de0ad02d6b0c480
master date: 2013-09-12 17:47:08 +0100
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The switch-over logic from one page directory to the next was wrong;
it needs to be deferred until we actually reach the last page within
a given region, instead of being done when the last entry of a page
directory gets started with.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
master commit: 06d086832155fc7f5344e9d108b979de34674d11
master date: 2013-09-12 17:41:04 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For one setup_max_pdx(), when invoked a second time (after SRAT got
parsed), needs to start from the original max_page value again (using
the already adjusted one from the first invocation would not allow the
cut-off boundary to be moved up).
Second, _if_ we need to cut off some part of memory, we must not allow
this to also propagate into the NUMA accounting. Otherwise
cutoff_node() results in nodes_cover_memory() to find some parts of
memory apparently not having a PXM association, causing all SRAT info
to be ignored.
The only possibly problematic consumer of node_spanned_pages (the
meaning of which gets altered here in that it now also includes memory
Xen can't actively make use of) is XEN_SYSCTL_numainfo: At a first
glance the potentially larger reported memory size shouldn't confuse
tool stacks.
And finally we must not put our boot time modules at addresses which
(at that time) can't be guaranteed to be accessible later. This applies
to both the EFI boot loader and the module relocation code.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
master commit: 8efce9d69998a3d3c720ac7dbdb9b7e240369957
master date: 2013-09-12 09:52:53 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity-ID: 1055121
Coverity-ID: 1055122
Coverity-ID: 1055123
Coverity-ID: 1055124
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 546ba2f17008387cf9821df46e6dac04f0883a9b
master date: 2013-09-10 17:16:02 +0200
|
|
|
|
|
|
|
|
|
|
|
|
| |
The return value of vasprintf must be checked. This check is enforced
with the compiler options used in Debian by request and in Ubuntu by
default.
Check the return value and abort on error.
Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 1efe90faa31be104a24fe75323429d227eae1d9f)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of commit 05bfd984dfe7014f1f5ea1133608b9bab589c120, hotplug scripts
are not run if backend_domid != LIBXL_TOOSTACK_DOMID; so there is no reason
to restrict this for network driver domains any more.
This is a candidate for backporting to 4.3.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 8f46b1cb99fe519ac39d10d0796c6be37fb1d178)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bit 31 of revision_id will set to 1 if vmcs shadowing enabled. And
according intel SDM, the bit 31 of IA32_VMX_BASIC MSR is always 0. So we
cannot set low 32 bit of IA32_VMX_BASIC to revision_id directly. Must clear
the bit 31 to 0.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: f3a4eb9253826d1e49e682314c8666b28fa0b717
master date: 2013-09-10 16:41:35 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With CPUID features suitably masked this is supposed to work, but was
completely broken (i.e. the case wasn't even considered when the
original xsave save/restore code was written).
First of all, xsave_enabled() wrongly returned the value of
cpu_has_xsave, i.e. not even taking into consideration attributes of
the vCPU in question. Instead this function ought to check whether the
guest ever enabled xsave support (by writing a [non-zero] value to
XCR0). As a result of this, a vCPU's xcr0 and xcr0_accum must no longer
be initialized to XSTATE_FP_SSE (since that's a valid value a guest
could write to XCR0), and the xsave/xrstor as well as the context
switch code need to suitably account for this (by always enforcing at
least this part of the state to be saved/loaded).
This involves undoing large parts of c/s 22945:13a7d1f7f62c ("x86: add
strictly sanity check for XSAVE/XRSTOR") - we need to cleanly
distinguish between hardware capabilities and vCPU used features.
Next both HVM and PV save code needed tweaking to not always save the
full state supported by the underlying hardware, but just the parts
that the guest actually used. Similarly the restore code should bail
not just on state being restored that the hardware cannot handle, but
also on inconsistent save state (inconsistent XCR0 settings or size of
saved state not in line with XCR0).
And finally the PV extended context get/set code needs to use slightly
different logic than the HVM one, as here we can't just key off of
xsave_enabled() (i.e. avoid doing anything if a guest doesn't use
xsave) because the tools use this function to determine host
capabilities as well as read/write vCPU state. The set operation in
particular needs to be capable of cleanly dealing with input that
consists of only the xcr0 and xcr0_accum values (if they're both zero
then no further data is required).
While for things to work correctly both sides (saving _and_ restoring
host) need to run with the fixed code, afaict no breakage should occur
if either side isn't up to date (other than the breakage that this
patch attempts to fix).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 4cc1344447a0458df5d222960f2adf1b65084fa8
master date: 2013-09-09 14:36:54 +0200
|
|
|
|
|
|
|
|
|
|
|
|
| |
- properly validate available feature set on APs
- also validate xsaveopt availability on APs
- properly indicate whether the initialization is on the BSP (we
shouldn't be using "cpu == 0" checks for this)
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: c6066e78f4a66005b0d5d86c6ade32e2ab78923a
master date: 2013-08-30 10:56:07 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not doing this was found to cause problems with sequences of allocation
(multi-page), freeing, and then again allocation of the same page upon
boot when interrupts are still disabled (causing the owner field to be
non-zero, thus making the allocator attempt a TLB flush and, in its
processing, triggering an assertion).
Reported-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 0fbf3208d9c1a568aeeb61d9f4fbca03b1cfa1f8
master date: 2013-09-09 14:34:12 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Guest needs the ability to enable and disable MSI-X interrupts
by setting the MSI-X control bit, for a passed-through device.
Guest is allowed to write MSI-X mask bit only if Xen *thinks*
that mask is clear (interrupts enabled). If the mask is set by
Xen (interrupts disabled), writes to mask bit by the guest is
ignored.
Currently, a write to MSI-X mask bit by the guest is silently
ignored.
A likely scenario is where we have a 82599 SR-IOV nic passed
through to a guest. From the guest if you do
ifconfig <ETH_DEV> down
ifconfig <ETH_DEV> up
the interrupts remain masked. On VF reset, the mask bit is set
by the controller. At this point, Xen is not aware that mask is set.
However, interrupts are enabled by VF driver by clearing the mask
bit by writing directly to BAR3 region containing the MSI-X table.
From dom0, we can verify that
interrupts are being masked using 'xl debug-keys M'.
Initially, guest was allowed to modify MSI-X bit.
Later this behaviour was changed.
See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.
Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
master commit: a35137373aa9042424565e5ee76dc0a3bb7642ae
master date: 2013-09-09 10:43:11 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Namely with PFN compression, MMIO ranges that the firmware may need
runtime access to can live in the holes that gets shrunk/eliminated by
PFN compression, and hence no mappings would result from simply
copying Xen's direct mapping table's L3 page table entries. Build
mappings for this "manually" in the EFI runtime call 1:1 page tables.
Use the opportunity to also properly identify (via a forcibly undefined
manifest constant) all the disabled code regions associated with it not
being acceptable for us to call SetVirtualAddressMap().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: a350f3f43bcfac9c1591e28d8e43c505fcb172a5
master date: 2013-09-09 10:40:11 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In all cases when a hypercall page is written, __HYPERVISOR_iret is first
written as a regular hypercall, then subsequently rewritten in its special
case.
For VMX and SVM, this means that following the ud2a instruction is 3 bytes of
an imm32 parameter. For a ring3 kernel, this means that following the syscall
instruction is the second half of 'pop %r11'.
For a ring1 kernel, the iret case ends up as the same number of bytes as the
rest of the hypercalls, but it is pointless writing it twice, and is changed
for consistency.
Therefore, skip the loop iteration which would write the incorrect
__HYPERVISOR_iret hypercall. This removes junk machine code from the tail and
makes disassemblers rather more happy when looking at the hypercall page.
Also, a miscellaneous whitespace fix in the comment for ring3 kernel.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: fca11da0ec956b17d7450d7776c3ffa22a8f538a
master date: 2013-07-16 11:10:45 +0200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SeaBIOS ROM image may validly exceed 128k in size, it's only our
interface code that so far assumed that it wouldn't. Remove that
restriction by setting the base address depending on image size.
Add a check to HVM loader so that too big images won't result in silent
guest failure anymore.
Uncomment the intended build-time size check for rombios, moving it
into a function so that it would actually compile.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 5f2875739beef3a75c7a7e8579b6cbcb464e61b3
master date: 2013-09-05 11:47:03 +0200
|
|
|
|
|
|
| |
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: cc29e450d1abd2f8be67208dfb78046885a50cca
master date: 2013-09-04 18:19:01 +0100
|