Commit message | Author | Age | Files | Lines
* update Xen version to 4.3.1-rc1 [4.3.1-rc1] | Jan Beulich | 2013-10-01 | 1 | -1/+1
* x86: properly set up fbld emulation operand address | Jan Beulich | 2013-09-30 | 1 | -2/+2
| | | | | | | | | This is CVE-2013-4361 / XSA-66. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> master commit: 28b706efb6abb637fabfd74cde70a50935a5640b master date: 2013-09-30 14:18:58 +0200
* x86/mm/shadow: Fix initialization of PV shadow L4 tables. | Tim Deegan | 2013-09-30 | 1 | -1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Shadowed PV L4 tables must have the same Xen mappings as their unshadowed equivalent. This is done by copying the Xen entries verbatim from the idle pagetable, and then using guest_l4_slot() in the SHADOW_FOREACH_L4E() iterator to avoid touching those entries. adc5afbf1c70ef55c260fb93e4b8ce5ccb918706 (x86: support up to 16Tb) changed the definition of ROOT_PAGETABLE_XEN_SLOTS to extend right to the top of the address space, which causes the shadow code to copy Xen mappings into guest-kernel-address slots too. In the common case, all those slots are zero in the idle pagetable, and no harm is done. But if any slot above #271 is non-zero, Xen will crash when that slot is later cleared (it attempts to drop shadow-pagetable refcounts on its own L4 pagetables). Fix by using the new ROOT_PAGETABLE_PV_XEN_SLOTS when appropriate. Monitor pagetables need the full Xen mappings, so they keep using the old name (with its new semantics). This is CVE-2013-4356 / XSA-64. Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> master commit: f46befdd825c8a459c5eb21adb7d5b0dc6e30ad5 master date: 2013-09-30 14:18:25 +0200
* x86: properly handle hvm_copy_from_guest_{phys,virt}() errors | Jan Beulich | 2013-09-30 | 4 | -31/+66
| | | | | | | | | | | | | | Ignoring them generally implies using uninitialized data and, in all but two of the cases dealt with here, potentially leaking hypervisor stack contents to guests. This is CVE-2013-4355 / XSA-63. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 6bb838e7375f5b031e9ac346b353775c90de45dc master date: 2013-09-30 14:17:46 +0200
* x86/HVM: refuse doing string operations in certain situations | Jan Beulich | 2013-09-27 | 1 | -0/+14
    We shouldn't do any acceleration for
    - "rep movs" when either side is passed through MMIO or when both sides
      are handled by qemu
    - "rep ins" and "rep outs" when the memory operand is any kind of MMIO

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
    master commit: 14fcce2fa883405bab26b60821a6cc5f2c770833
    master date: 2013-09-23 09:55:14 +0200
* x86/HVM: linear address must be canonical for the whole accessed range | Jan Beulich | 2013-09-27 | 1 | -12/+13
    ... rather than just for the first byte.

    While at it, also
    - make the real mode case at least do a wrap around check
    - drop the mis-named "gpf" label (we're not generating faults here) and
      use in-place returns instead

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
    master commit: 7f12732670b31b2fea899a4160d455574658474f
    master date: 2013-09-23 09:53:55 +0200
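Note: the range check described above can be sketched as follows. This is a minimal illustration, assuming 48-bit virtual addresses; the function names are hypothetical and are not taken from the Xen sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addressing, so bits 63..47 must all be equal. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 uppermost bits */
    return top == 0 || top == 0x1ffff;
}

/* Hypothetical helper: check every byte of [addr, addr + bytes - 1]. */
static bool range_is_canonical(uint64_t addr, uint64_t bytes)
{
    assert(bytes > 0);
    uint64_t last = addr + bytes - 1;

    if (last < addr)                    /* the range wraps around 2^64 */
        return false;
    if ((addr >> 63) != (last >> 63))   /* the range spans the non-canonical hole */
        return false;
    /* Both ends lie in the same half, so checking the ends covers the range. */
    return is_canonical(addr) && is_canonical(last);
}
```

An 8-byte access starting at 0x00007ffffffffffc passes a first-byte-only check but is rejected here, because its last byte falls into the non-canonical hole.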
* sched_credit: filter node-affinity mask against online cpus | Dario Faggioli | 2013-09-27 | 1 | -11/+46
    in _csched_cpu_pick(), as not doing so may result in the domain's
    node-affinity mask (as retrieved by csched_balance_cpumask()) and online
    mask (as retrieved by cpupool_scheduler_cpumask()) having an empty
    intersection.

    Therefore, when attempting a node-affinity load balancing step and
    running this:

        ...
        /* Pick an online CPU from the proper affinity mask */
        csched_balance_cpumask(vc, balance_step, &cpus);
        cpumask_and(&cpus, &cpus, online);
        ...

    we end up with an empty cpumask (in cpus). At this point, in the
    following code:

        ....
        /* If present, prefer vc's current processor */
        cpu = cpumask_test_cpu(vc->processor, &cpus)
                ? vc->processor
                : cpumask_cycle(vc->processor, &cpus);
        ....

    an ASSERT (from inside cpumask_cycle()) triggers like this:

        (XEN) Xen call trace:
        (XEN)    [<ffff82d08011b124>] _csched_cpu_pick+0x1d2/0x652
        (XEN)    [<ffff82d08011b5b2>] csched_cpu_pick+0xe/0x10
        (XEN)    [<ffff82d0801232de>] vcpu_migrate+0x167/0x31e
        (XEN)    [<ffff82d0801238cc>] cpu_disable_scheduler+0x1c8/0x287
        (XEN)    [<ffff82d080101b3f>] cpupool_unassign_cpu_helper+0x20/0xb4
        (XEN)    [<ffff82d08010544f>] continue_hypercall_tasklet_handler+0x4a/0xb1
        (XEN)    [<ffff82d080127793>] do_tasklet_work+0x78/0xab
        (XEN)    [<ffff82d080127a70>] do_tasklet+0x5f/0x8b
        (XEN)    [<ffff82d080158985>] idle_loop+0x57/0x5e
        (XEN)
        (XEN)
        (XEN) ****************************************
        (XEN) Panic on CPU 1:
        (XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481

    It is for example sufficient to have a domain with node-affinity to NUMA
    node 1 running, and issuing a `xl cpupool-numa-split' would make the
    above happen. That is because, by default, all the existing domains
    remain assigned to the first cpupool, and it now (after the
    cpupool-numa-split) only includes NUMA node 0.

    This change prevents that by generalizing the function used for figuring
    out whether a node-affinity load balancing step is legit or not. This
    way we can, in _csched_cpu_pick(), figure out early enough that the mask
    would end up empty, skip the step altogether and avoid the splat.

    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
    Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
    master commit: 5e5a44b6c942d6ea47f15d6f1ed02b03e0d69445
    master date: 2013-09-20 11:37:28 +0200
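Note: the idea of skipping a balancing step whose mask would end up empty can be sketched with plain 64-bit bitmasks. This is an illustration only; the `cpumask` typedef and the `pick_cpu()` helper are hypothetical stand-ins, not the scheduler's real data structures or API.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask;   /* hypothetical: bit n set means CPU n is in the mask */

/*
 * Try the node-affinity step first, then the plain cpu-affinity step.
 * A step is skipped when its mask has no online CPU, so no "pick next CPU"
 * logic ever runs on an empty mask. Returns a CPU number, or -1 if none fits.
 */
static int pick_cpu(cpumask node_affinity, cpumask cpu_affinity,
                    cpumask online, int current)
{
    const cpumask steps[2] = { node_affinity, cpu_affinity };

    assert(current >= 0 && current < 64);

    for (int s = 0; s < 2; s++) {
        cpumask cpus = steps[s] & online;

        if (cpus == 0)                      /* empty intersection: skip this step */
            continue;
        if (cpus & (1ULL << current))       /* prefer the current processor */
            return current;
        for (int cpu = 0; cpu < 64; cpu++)  /* otherwise take the first candidate */
            if (cpus & (1ULL << cpu))
                return cpu;
    }
    return -1;
}
```

With node affinity on CPUs 4-7 and a cpupool that only contains CPUs 0-3, the first step is skipped and the pick falls back to the cpu-affinity step.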
* watchdog/crash: Always disable watchdog in console_force_unlock() | Andrew Cooper | 2013-09-27 | 2 | -2/+3
    Depending on the state of the conring and serial_tx_buffer,
    console_force_unlock() can be a long running operation, usually because
    of serial_start_sync().

    XenServer testing has found a reliable case where console_force_unlock()
    on one PCPU takes long enough for another PCPU to time out due to the
    watchdog (such as waiting for a tlb flush callin).

    The watchdog timeout causes the second PCPU to repeat the
    console_force_unlock(), at which point the first PCPU typically fails an
    assertion in spin_unlock_irqrestore(&port->tx_lock) (because the tx_lock
    has been unlocked behind itself).

    console_force_unlock() is only on emergency paths, so one way or another
    the host is going down. Disable the watchdog before forcing the console
    lock to help prevent having pcpus competing with each other to bring the
    host down.

    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: Keir Fraser <keir@xen.org>
    master commit: 7b9fa702ca323164d6b49e8b639a57f880454a8c
    master date: 2013-08-13 14:31:01 +0200
* xen/conring: Write to console ring even if console lock is busted | Andrew Cooper | 2013-09-27 | 1 | -4/+3
    console_lock_busted gets set when an NMI/MCE/Double Fault handler decides
    to bring Xen down in an emergency. conring_puts() cannot block and does
    not have problematic interactions with the console_lock.

    Therefore, choosing to not put the string into the console ring simply
    means that the kexec environment can't find any panic() message caused by
    an IST interrupt, which is unhelpful for debugging purposes.

    In the case that two pcpus fight with console_force_unlock(), having
    slightly garbled strings in the console ring is far more useful than
    having nothing at all.

    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: Matt Wilson <msw@amazon.com>
    Acked-by: Keir Fraser <keir@xen.org>
    master commit: 66450c1d1ab3c4480bbba949113b95d1ab6a943a
    master date: 2013-08-06 17:45:00 +0200
* x86/xsave: initialize extended register state when guests enable it | Jan Beulich | 2013-09-25 | 1 | -0/+15
| | | | | | | | | | | | | | Till now, when setting previously unset bits in XCR0 we wouldn't touch the active register state, thus leaving in the newly enabled registers whatever a prior user of it left there, i.e. potentially leaking information between guests. This is CVE-2013-1442 / XSA-62. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 63a75ba0de817d6f384f96d25427a05c313e2179 master date: 2013-09-25 10:41:25 +0200
* unmodified_drivers: enable unplug per default | Olaf Hering | 2013-09-23 | 1 | -1/+7
    Since xen-3.3 an official unplug protocol for emulated hardware is
    available in the toolstack. The pvops kernel does the unplug per default,
    so it is safe to do it also in the drivers for forward ported xenlinux.
    Currently it's required to load xen-platform-pci with the module
    parameter dev_unplug=all, which is cumbersome. Also recognize the
    dev_unplug=never parameter, which provides the default before this patch.

    Signed-off-by: Olaf Hering <olaf@aepfle.de>
    master commit: df17e9c889c48c9c10aa3f9dd0bb11077f54efc4
    master date: 2013-09-20 11:41:08 +0200
* x86/HVM: properly handle MMIO reads and writes wider than a machine word | Jan Beulich | 2013-09-23 | 1 | -20/+95
| | | | | | | | | | | Just like real hardware we ought to split such accesses transparently to the caller. With little extra effort we can at once even handle page crossing accesses correctly. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> master commit: 3b89f08a498ddac09d4002d9849e329018ceb107 master date: 2013-09-20 11:01:08 +0200
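Note: one way to split a wide access into word-sized pieces can be sketched as below. The 8-byte machine word and the natural-alignment policy are assumptions of this sketch, and the helper names are hypothetical; the actual patch may choose its chunks differently.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BYTES 8u   /* assumption: 64-bit machine word */

/* Largest power-of-two chunk that fits in `bytes` and is aligned at `addr`. */
static unsigned int chunk_size(uint64_t addr, unsigned int bytes)
{
    assert(bytes > 0);
    unsigned int chunk = bytes < WORD_BYTES ? bytes : WORD_BYTES;

    while (chunk & (chunk - 1))     /* round down to a power of two */
        chunk &= chunk - 1;
    while (addr & (chunk - 1))      /* shrink until naturally aligned */
        chunk >>= 1;
    return chunk;
}

/*
 * Record the chunk sizes used to cover [addr, addr + bytes) in sizes[].
 * Returns the number of chunks written (at most max).
 */
static unsigned int split_access(uint64_t addr, unsigned int bytes,
                                 unsigned int sizes[], unsigned int max)
{
    unsigned int n = 0;

    while (bytes > 0 && n < max) {
        unsigned int c = chunk_size(addr, bytes);

        sizes[n++] = c;
        addr += c;
        bytes -= c;
    }
    return n;
}
```

Because every chunk is naturally aligned, no single chunk straddles a page boundary, so an 8-byte access at offset 0xffc becomes two 4-byte accesses, one on each page.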
* VMX: fix failure path in construct_vmcs | George Dunlap | 2013-09-23 | 1 | -0/+3
| | | | | | | | | | | If the allocation fails, make sure to call vmx_vmcs_exit(). This is a candidate for backport. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> master commit: dad7e45bf44c0569546a3ed7d0fa4182a4a73f0a master date: 2013-09-18 14:45:42 +0200
* x86/HVM: fix failure path in hvm_vcpu_initialise | George Dunlap | 2013-09-23 | 1 | -1/+1
| | | | | | | | | | | | | | | | It looks like one of the failure cases in hvm_vcpu_initialise jumps to the wrong label; this could lead to slow leaks if something isn't cleaned up properly. I will probably change these labels in a future patch, but I figured it was better to have this fix separately. This is also a candidate for backport. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> master commit: 925fbcb7fdd6238f26b1576dc1f3e297f1f24f1e master date: 2013-09-18 14:45:24 +0200
* passthrough/amd: Missing 'break' | Tim Deegan | 2013-09-23 | 1 | -0/+1
| | | | | | | | | | Coverity CID 1055502 Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> master commit: 72fa4fdf647ba99ecaf39589a93cde8dd36eed3c master date: 2013-09-17 16:36:25 +0100
* hvm/vpmu: Prevent dump handlers from incorrectly mutating state | Andrew Cooper | 2013-09-23 | 3 | -16/+17
    Discovered by Coverity, CID 1055181

    core2_vpmu_dump() was incorrectly setting VPMU_CONTEXT_LOADED when it was
    intending to check for it.

    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

    This would have been avoided if the dump function declared all its
    pointers "const" - doing this now (also in SVM). Also fixing some
    indentation issues at once.

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    master commit: 42c5b1214071d363a52c6356dfe2ed820f500849
    master date: 2013-09-16 12:22:20 +0200
* x86: machine_restart() must not call acpi_dmar_reinstate() twice | Jan Beulich | 2013-09-23 | 1 | -2/+3
| | | | | | | | | | | | | | | | .. as that function is not idempotent (it always alters the table checksum). The (generally) duplicate call was a result from it being made before machine_restart() re-invoking itself on the boot CPU. Considering that no problem arose so far from the table corruption I doubt that we need to restore the correct table signature on the reboot path in general. The only case I can see this as potentially necessary is the tboot one, hence do the call just in that case. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> master commit: a54dc5f4fe1eae6b1beb21326ef0338cd3969cd1 master date: 2013-09-13 14:27:34 +0200
* cpufreq: missing check of copy_from_guest() | Tim Deegan | 2013-09-23 | 1 | -2/+6
| | | | | | | | | | | Coverity CID 1055131 Coverity CID 1055132 Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> master commit: 803f9a6cdfeda64beee908576de0ad02d6b0c480 master date: 2013-09-12 17:47:08 +0100
* libxc/x86: fix page table creation for huge guests | Jan Beulich | 2013-09-23 | 1 | -8/+16
| | | | | | | | | | | | | The switch-over logic from one page directory to the next was wrong; it needs to be deferred until we actually reach the last page within a given region, instead of being done when the last entry of a page directory gets started with. Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> master commit: 06d086832155fc7f5344e9d108b979de34674d11 master date: 2013-09-12 17:41:04 +0200
* x86: fix memory cut-off when using PFN compression | Jan Beulich | 2013-09-23 | 2 | -9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | For one setup_max_pdx(), when invoked a second time (after SRAT got parsed), needs to start from the original max_page value again (using the already adjusted one from the first invocation would not allow the cut-off boundary to be moved up). Second, _if_ we need to cut off some part of memory, we must not allow this to also propagate into the NUMA accounting. Otherwise cutoff_node() results in nodes_cover_memory() to find some parts of memory apparently not having a PXM association, causing all SRAT info to be ignored. The only possibly problematic consumer of node_spanned_pages (the meaning of which gets altered here in that it now also includes memory Xen can't actively make use of) is XEN_SYSCTL_numainfo: At a first glance the potentially larger reported memory size shouldn't confuse tool stacks. And finally we must not put our boot time modules at addresses which (at that time) can't be guaranteed to be accessible later. This applies to both the EFI boot loader and the module relocation code. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Dario Faggioli <dario.faggioli@citrix.com> master commit: 8efce9d69998a3d3c720ac7dbdb9b7e240369957 master date: 2013-09-12 09:52:53 +0200
* sched/arinc653: check for guest data transfer failures | Matthew Daley | 2013-09-23 | 1 | -2/+11
| | | | | | | | | | | | | Coverity-ID: 1055121 Coverity-ID: 1055122 Coverity-ID: 1055123 Coverity-ID: 1055124 Signed-off-by: Matthew Daley <mattjd@gmail.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org> master commit: 546ba2f17008387cf9821df46e6dac04f0883a9b master date: 2013-09-10 17:16:02 +0200
* tools: xen-mceinj: Add missing return value checks | Bastian Blank | 2013-09-13 | 1 | -2/+4
| | | | | | | | | | | | The return value of vasprintf must be checked. This check is enforced with the compiler options used in Debian by request and in Ubuntu by default. Check the return value and abort on error. Signed-off-by: Bastian Blank <waldi@debian.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> (cherry picked from commit 1efe90faa31be104a24fe75323429d227eae1d9f)
* libxl: Allow network driver domains when run_hotplug_scripts is set | George Dunlap | 2013-09-13 | 2 | -11/+2
    As of commit 05bfd984dfe7014f1f5ea1133608b9bab589c120, hotplug scripts
    are not run if backend_domid != LIBXL_TOOLSTACK_DOMID; so there is no
    reason to restrict this for network driver domains any more.

    This is a candidate for backporting to 4.3.

    Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
    Acked-by: Roger Pau Monné <roger.pau@citrix.com>
    CC: Ian Campbell <ian.campbell@citrix.com>
    CC: Ian Jackson <ian.jackson@citrix.com>
    CC: Jan Beulich <jbeulich@suse.com>
    (cherry picked from commit 8f46b1cb99fe519ac39d10d0796c6be37fb1d178)
* make this tree's maintainership explicit | Jan Beulich | 2013-09-13 | 1 | -3/+8
* Nested VMX: Clear bit 31 of IA32_VMX_BASIC MSR | Yang Zhang | 2013-09-12 | 1 | -1/+1
    Bit 31 of revision_id is set to 1 if VMCS shadowing is enabled, while
    according to the Intel SDM bit 31 of the IA32_VMX_BASIC MSR is always 0.
    So we cannot copy revision_id directly into the low 32 bits of
    IA32_VMX_BASIC; bit 31 must be cleared first.

    Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    master commit: f3a4eb9253826d1e49e682314c8666b28fa0b717
    master date: 2013-09-10 16:41:35 +0200
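Note: the masking step can be sketched in a few lines. The function and macro names below are hypothetical and chosen for illustration; only the rule that bit 31 of the MSR's low half must read as 0 comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: the revision identifier occupies bits 30:0 only. */
#define REVISION_ID_MASK 0x7fffffffu

static_assert((REVISION_ID_MASK & 0x80000000u) == 0,
              "bit 31 must never leak into the MSR value");

/*
 * Build the IA32_VMX_BASIC value shown to the L1 guest: keep the host's
 * high 32 bits and place the revision id, with bit 31 cleared, in the
 * low 32 bits.
 */
static uint64_t nested_vmx_basic(uint64_t host_msr, uint32_t revision_id)
{
    return (host_msr & ~0xffffffffULL) | (revision_id & REVISION_ID_MASK);
}
```

With a revision_id of 0x80000012 (shadowing indicator set), the low half of the result is 0x00000012.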
* x86/xsave: fix migration from xsave-capable to xsave-incapable host | Jan Beulich | 2013-09-12 | 10 | -88/+172
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With CPUID features suitably masked this is supposed to work, but was completely broken (i.e. the case wasn't even considered when the original xsave save/restore code was written). First of all, xsave_enabled() wrongly returned the value of cpu_has_xsave, i.e. not even taking into consideration attributes of the vCPU in question. Instead this function ought to check whether the guest ever enabled xsave support (by writing a [non-zero] value to XCR0). As a result of this, a vCPU's xcr0 and xcr0_accum must no longer be initialized to XSTATE_FP_SSE (since that's a valid value a guest could write to XCR0), and the xsave/xrstor as well as the context switch code need to suitably account for this (by always enforcing at least this part of the state to be saved/loaded). This involves undoing large parts of c/s 22945:13a7d1f7f62c ("x86: add strictly sanity check for XSAVE/XRSTOR") - we need to cleanly distinguish between hardware capabilities and vCPU used features. Next both HVM and PV save code needed tweaking to not always save the full state supported by the underlying hardware, but just the parts that the guest actually used. Similarly the restore code should bail not just on state being restored that the hardware cannot handle, but also on inconsistent save state (inconsistent XCR0 settings or size of saved state not in line with XCR0). And finally the PV extended context get/set code needs to use slightly different logic than the HVM one, as here we can't just key off of xsave_enabled() (i.e. avoid doing anything if a guest doesn't use xsave) because the tools use this function to determine host capabilities as well as read/write vCPU state. The set operation in particular needs to be capable of cleanly dealing with input that consists of only the xcr0 and xcr0_accum values (if they're both zero then no further data is required). 
While for things to work correctly both sides (saving _and_ restoring host) need to run with the fixed code, afaict no breakage should occur if either side isn't up to date (other than the breakage that this patch attempts to fix). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Yang Zhang <yang.z.zhang@intel.com> Acked-by: Keir Fraser <keir@xen.org> master commit: 4cc1344447a0458df5d222960f2adf1b65084fa8 master date: 2013-09-09 14:36:54 +0200
* x86/xsave: initialization improvements | Jan Beulich | 2013-09-12 | 3 | -15/+17
    - properly validate available feature set on APs
    - also validate xsaveopt availability on APs
    - properly indicate whether the initialization is on the BSP (we
      shouldn't be using "cpu == 0" checks for this)

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
    master commit: c6066e78f4a66005b0d5d86c6ade32e2ab78923a
    master date: 2013-08-30 10:56:07 +0200
* xmalloc: make whole pages xfree() clear the order field (ab)used by xmalloc() | Jan Beulich | 2013-09-12 | 1 | -0/+1
| | | | | | | | | | | | | | | Not doing this was found to cause problems with sequences of allocation (multi-page), freeing, and then again allocation of the same page upon boot when interrupts are still disabled (causing the owner field to be non-zero, thus making the allocator attempt a TLB flush and, in its processing, triggering an assertion). Reported-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com> Acked-by: Keir Fraser <keir@xen.org> master commit: 0fbf3208d9c1a568aeeb61d9f4fbca03b1cfa1f8 master date: 2013-09-09 14:34:12 +0200
* x86: allow guest to set/clear MSI-X mask bit (try 2) | Joby Poriyath | 2013-09-12 | 1 | -12/+63
    Guest needs the ability to enable and disable MSI-X interrupts by setting
    the MSI-X control bit, for a passed-through device. Guest is allowed to
    write MSI-X mask bit only if Xen *thinks* that mask is clear (interrupts
    enabled). If the mask is set by Xen (interrupts disabled), writes to the
    mask bit by the guest are ignored.

    Currently, a write to MSI-X mask bit by the guest is silently ignored.

    A likely scenario is where we have an 82599 SR-IOV nic passed through to
    a guest. From the guest if you do

        ifconfig <ETH_DEV> down
        ifconfig <ETH_DEV> up

    the interrupts remain masked. On VF reset, the mask bit is set by the
    controller. At this point, Xen is not aware that mask is set. However,
    interrupts are enabled by VF driver by clearing the mask bit by writing
    directly to BAR3 region containing the MSI-X table.

    From dom0, we can verify that interrupts are being masked using
    'xl debug-keys M'.

    Initially, guest was allowed to modify MSI-X bit. Later this behaviour
    was changed. See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.

    Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
    master commit: a35137373aa9042424565e5ee76dc0a3bb7642ae
    master date: 2013-09-09 10:43:11 +0200
* x86/EFI: properly handle run time memory regions outside the 1:1 map | Jan Beulich | 2013-09-12 | 1 | -11/+93
    Namely with PFN compression, MMIO ranges that the firmware may need
    runtime access to can live in the holes that get shrunk/eliminated by
    PFN compression, and hence no mappings would result from simply copying
    Xen's direct mapping table's L3 page table entries. Build mappings for
    this "manually" in the EFI runtime call 1:1 page tables.

    Use the opportunity to also properly identify (via a forcibly undefined
    manifest constant) all the disabled code regions associated with it not
    being acceptable for us to call SetVirtualAddressMap().

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    master commit: a350f3f43bcfac9c1591e28d8e43c505fcb172a5
    master date: 2013-09-09 10:40:11 +0200
* x86: Special case __HYPERVISOR_iret rather more when writing hypercall pages | Andrew Cooper | 2013-09-12 | 4 | -2/+14
| | | | | | | | | | | | | | | | | | | | | | | | In all cases when a hypercall page is written, __HYPERVISOR_iret is first written as a regular hypercall, then subsequently rewritten in its special case. For VMX and SVM, this means that following the ud2a instruction is 3 bytes of an imm32 parameter. For a ring3 kernel, this means that following the syscall instruction is the second half of 'pop %r11'. For a ring1 kernel, the iret case ends up as the same number of bytes as the rest of the hypercalls, but it is pointless writing it twice, and is changed for consistency. Therefore, skip the loop iteration which would write the incorrect __HYPERVISOR_iret hypercall. This removes junk machine code from the tail and makes disassemblers rather more happy when looking at the hypercall page. Also, a miscellaneous whitespace fix in the comment for ring3 kernel. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: fca11da0ec956b17d7450d7776c3ffa22a8f538a master date: 2013-07-16 11:10:45 +0200
* hvmloader: fix SeaBIOS interface | Jan Beulich | 2013-09-09 | 4 | -7/+7
| | | | | | | | | | | | | | | | | The SeaBIOS ROM image may validly exceed 128k in size, it's only our interface code that so far assumed that it wouldn't. Remove that restriction by setting the base address depending on image size. Add a check to HVM loader so that too big images won't result in silent guest failure anymore. Uncomment the intended build-time size check for rombios, moving it into a function so that it would actually compile. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> master commit: 5f2875739beef3a75c7a7e8579b6cbcb464e61b3 master date: 2013-09-05 11:47:03 +0200
* xen/docs: Correct documentation for the conswitch parameter | Andrew Cooper | 2013-09-09 | 1 | -2/+3
| | | | | | Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: cc29e450d1abd2f8be67208dfb78046885a50cca master date: 2013-09-04 18:19:01 +0100
* xend: fix file descriptor leak in pci utilities | Xi Xiong | 2013-09-09 | 1 | -0/+6
| | | | | | | | | | | | A file descriptor leak was detected after creating multiple domUs with pass-through PCI devices. This patch fixes the issue. Signed-off-by: Xi Xiong <xixiong@amazon.com> Reviewed-by: Matt Wilson <msw@amazon.com> [msw: adjusted commit message] Signed-off-by: Matt Wilson <msw@amazon.com> master commit: 749019afca4fd002d36856bad002cc11f7d0ddda master date: 2013-09-03 16:36:52 +0100
* xend: handle extended PCI configuration space when saving state | Steven Noonan | 2013-09-09 | 1 | -1/+2
| | | | | | | | | | | | | | | | Newer PCI standards (e.g., PCI-X 2.0 and PCIe) introduce extended configuration space which is larger than 256 bytes. This patch uses stat() to determine the amount of space used to correctly save all of the PCI configuration space. Resets handled by the xen-pciback driver don't have this problem, as that code correctly handles saving extended configuration space. Signed-off-by: Steven Noonan <snoonan@amazon.com> Reviewed-by: Matt Wilson <msw@amazon.com> [msw: adjusted commit message] Signed-off-by: Matt Wilson <msw@amazon.com> master commit: 1893cf77992cc0ce9d827a8d345437fa2494b540 master date: 2013-09-03 16:36:47 +0100
* public/hvm_xs_strings.h: Fix ABI regression for OEM SMBios strings | Andrew Cooper | 2013-09-09 | 1 | -1/+1
    The old code for OEM SMBios strings was:

        char path[20] = "bios-strings/oem-XX";
        path[(sizeof path) - 3] = '0' + ((i < 10) ? i : i / 10);
        path[(sizeof path) - 2] = (i < 10) ? '\0' : '0' + (i % 10);

    Where oem-1 thru 9 specifically had no leading 0.

    However, the definition of HVM_XS_OEM_STRINGS specifically requires
    leading 0s.

    This regression was introduced by the combination of c/s 4d23036e709627
    and e64c3f71ceb662

    I realise that this patch causes a change to the public headers. However
    I feel it is justified as:

    * All toolstacks used to have to embed the magic string (and almost
      certainly still do)
    * If by some miracle a new toolstack has started using the new define,
      it will continue to work.
    * The only in-tree consumer of the define is hvmloader itself.

    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: Keir Fraser <keir@xen.org>
    master commit: 0f4cb23c3ea5b987c49c9a9368e7a0d505ec064f
    master date: 2013-08-30 10:40:48 +0200
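Note: the mismatch between the old key construction and a zero-padded format can be reproduced with the sketch below. The first function wraps the code quoted in the commit message; the two format strings are assumptions used to illustrate padded versus unpadded keys and are not quoted from the header.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The original key construction quoted above, wrapped in a function. */
static void old_oem_key(int i, char out[20])
{
    char path[20] = "bios-strings/oem-XX";

    assert(i >= 1 && i <= 99);
    path[(sizeof path) - 3] = '0' + ((i < 10) ? i : i / 10);
    path[(sizeof path) - 2] = (i < 10) ? '\0' : '0' + (i % 10);
    memcpy(out, path, sizeof path);
}

/* Assumed zero-padded format: yields "oem-07" for i == 7. */
static void padded_key(int i, char out[20])
{
    snprintf(out, 20, "bios-strings/oem-%02d", i);
}

/* Assumed unpadded format: yields "oem-7" for i == 7. */
static void unpadded_key(int i, char out[20])
{
    snprintf(out, 20, "bios-strings/oem-%d", i);
}
```

For i below 10 only the unpadded format reproduces the keys that existing toolstacks write; for two-digit indices all three variants agree.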
* hvmloader/smbios: Correctly count the number of tables written | Andrew Cooper | 2013-09-09 | 1 | -1/+2
    Fixes regression indirectly introduced by c/s 4d23036e709627

    That changeset added some smbios tables which were optional, based on
    the toolstack providing appropriate xenstore keys. The do_struct() macro
    would unconditionally increment nr_structs, even if a table was not
    actually written.

    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: Keir Fraser <keir@xen.org>
    master commit: 4aa19549e17650b9bfe2b31d7f52a95696d388f0
    master date: 2013-08-30 10:40:29 +0200
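Note: counting only the tables that were actually written can be sketched as follows. The writer state, `write_table()` and the function form of `do_struct()` are hypothetical stand-ins; in hvmloader do_struct() is a macro and the table writers have their own signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct writer_state {
    unsigned char *p;           /* next free byte in the output buffer */
    unsigned int nr_structs;    /* number of tables actually emitted */
};

/*
 * Hypothetical table writer: copies the table and returns the new end
 * pointer, or returns `start` unchanged when the optional table is absent.
 */
static unsigned char *write_table(unsigned char *start,
                                  const void *data, size_t len)
{
    if (data == NULL || len == 0)
        return start;
    memcpy(start, data, len);
    return start + len;
}

/* Count a structure only if the writer advanced the output pointer. */
static void do_struct(struct writer_state *st, const void *data, size_t len)
{
    assert(st != NULL);
    unsigned char *q = write_table(st->p, data, len);

    if (q != st->p)
        st->nr_structs++;
    st->p = q;
}
```

Skipped optional tables leave both the pointer and the count untouched, so the reported structure count matches what is in the buffer.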
* x86: AVX instruction emulation fixes | Jan Beulich | 2013-09-09 | 2 | -15/+145
    - we used the C4/C5 (first prefix) byte instead of the apparent ModR/M
      one as the second prefix byte
    - early decoding normalized vex.reg, thus corrupting it for the main
      consumer (copy_REX_VEX()), resulting in #UD on the two-operand
      instructions we emulate

    Also add respective test cases to the testing utility plus
    - fix get_fpu() (the fall-through order was inverted)
    - add cpu_has_avx2, even if it's currently unused (as in the new test
      cases I decided to refrain from using AVX2 instructions in order to be
      able to actually run all the tests on the hardware I have)
    - slightly tweak cpu_has_avx to more consistently express the outputs we
      don't care about (sinking them all into the same variable)

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
    master commit: 062919448e2f4b127c9c3c085b1a8e1d56a33051
    master date: 2013-08-28 17:03:50 +0200
* x86: don't allow Dom0 access to the MSI address range | Jan Beulich | 2013-09-09 | 1 | -0/+4
    In particular, MMIO assignments should not be done using this area.

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
    master commit: 850188e1278cecd1dfb9b936024bee2d8dfdcc18
    master date: 2013-08-27 11:11:38 +0200
* AMD IOMMU: add missing check | Jan Beulich | 2013-09-06 | 1 | -0/+7
| | | | | | | | | | | We shouldn't accept IVHD tables specifying IO-APIC IDs beyond the limit we support (MAX_IO_APICS, currently 128). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulapanit@amd.com> master commit: 3785d30efe8264b899499e0883b10cc434bd0959 master date: 2013-08-29 09:31:37 +0200
* Fix inactive timer list corruption on second S3 resume | Tomasz Wroblewski | 2013-09-06 | 1 | -1/+3
| | | | | | | | | | | | | init_timer cannot be safely called multiple times on same timer since it does memset(0) on the structure, erasing the auxiliary member used by linked list code. This breaks inactive timer list in common/timer.c. Moved resume_timer initialisation to ns16550_init_postirq, so it's only done once. Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com> Acked-by: Keir Fraser <keir@xen.org> master commit: 9e2c5938246546a5b3f698b7421640d85602b994 master date: 2013-08-28 10:18:39 +0200
* x86/Intel: add support for Haswell CPU models | Jan Beulich | 2013-09-06 | 3 | -3/+10
| | | | | | | | | ... according to their most recent public documentation. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> master commit: 3e787021fb2420851c7bdc3911ea53c728ba5ac0 master date: 2013-08-27 11:15:15 +0200
* pygrub: add Debian extlinux.conf path | Ian Campbell | 2013-09-03 | 1 | -0/+1
| | | | | | | | | | This is Debian bug #697407. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=697407 Signed-off-by: Ian Campbell <ijc@hellion.org.uk> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> (cherry picked from commit 258d27a1d9fb33a490bef1381f52d522225c3dca)
* oxenstored: Protect oxenstored from malicious domains. | John Liu | 2013-09-03 | 5 | -6/+31
    Add check logic when reading from the IO ring; if an error happens, mark
    the reading connection as "bad". Unless the VM reboots, oxenstored will
    not handle messages from this connection any more.

    xs_ring_stubs.c: add a more strict check on ring reading
    connection.ml, domain.ml: add getter and setter for bad flag
    process.ml: if exception raised when reading from domain's ring, mark
                this domain as "bad"
    xenstored.ml: if a domain is marked as "bad", do not handle it.

    Signed-off-by: John Liu <john.liuqiming@huawei.com>
    Acked-by: David Scott <dave.scott@eu.citrix.com>
    (cherry picked from commit 704302ce9404c73cfb687d31adcf67094ab5bb53)
* Nested VMX: Update APIC-v(RVI/SVI) when vmexit to L1 | Yang Zhang | 2013-08-27 | 8 | -17/+62
    With APIC-v enabled, all interrupts to L1 are delivered through APIC-v.
    But while L2 is running, an external interrupt causes an L1 vmexit with
    reason "external interrupt", and L1 then picks up the interrupt through
    vmcs12. When L1 acks the interrupt, APIC-v is enabled (L1 is running),
    so the APIC-v hardware still performs the vEOI update. The problem is
    that the interrupt was not delivered through the APIC-v hardware, so
    SVI/RVI/vPPR were not set, yet the hardware requires them for the vEOI
    update. The solution: when L1 picks up the interrupt from vmcs12, the
    hypervisor updates SVI/RVI/vPPR so that the subsequent vEOI and vPPR
    updates work correctly.

    Also, since the interrupt is delivered through vmcs12, the APIC-v
    hardware will not clear vIRR, so the hypervisor needs to clear it before
    L1 runs.

    Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
    Acked-by: "Dong, Eddie" <eddie.dong@intel.com>
    master commit: 84e6af58707520baf59c1c86c29237419e439afb
    master date: 2013-08-22 10:59:01 +0200
* Nested VMX: Clear APIC-v control bit in vmcs02  Yang Zhang  2013-08-27  1  -0/+12
| There is no vAPIC-v support, so mask the APIC-v control bit when
| constructing vmcs02.
|
| Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
| Acked-by: "Dong, Eddie" <eddie.dong@intel.com>
| master commit: 375a1035002fb257087756a86e6caeda649fc0f1
| master date: 2013-08-22 10:52:05 +0200
* Nested VMX: Force check ISR when L2 is running  Yang Zhang  2013-08-27  1  -1/+3
| An external interrupt is allowed to notify the CPU only when it has higher
| priority than the interrupt currently in service. With APIC-v, the
| priority comparison is done by hardware, and hardware will inject the
| interrupt into the VCPU when it recognizes an interrupt. Currently, there
| is no virtual APIC-v feature available for L1 to use, so when L2 is
| running, we still need to compare the interrupt priority with the ISR in
| the hypervisor instead of via hardware.
|
| Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
| Acked-by: "Dong, Eddie" <eddie.dong@intel.com>
| master commit: b35d0a26983843c092bfa353fd6b9aa8c3bf4886
| master date: 2013-08-22 10:50:13 +0200
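The software priority comparison mentioned above follows the usual local APIC convention that a vector's priority class is its upper four bits. A minimal sketch, assuming hypothetical helper names (`prio_class()`, `beats_in_service()`) and a caller that passes -1 when nothing is in service; it is not the code from this commit:

```c
#include <stdbool.h>
#include <stdint.h>

/* Priority class of an interrupt vector: its upper four bits. */
static unsigned int prio_class(uint8_t vector)
{
    return vector >> 4;
}

/*
 * Return true if the pending vector may interrupt the one in service.
 * highest_isr is the highest in-service vector, or -1 if the ISR is
 * empty (a convention assumed for this sketch).
 */
static bool beats_in_service(uint8_t pending, int highest_isr)
{
    if (highest_isr < 0)
        return true;
    return prio_class(pending) > prio_class((uint8_t)highest_isr);
}
```

Note that two vectors in the same priority class do not preempt each other, even if the pending vector number is numerically higher.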
* Nested VMX: Check whether interrupt is blocked by TPR  Yang Zhang  2013-08-27  1  -0/+5
| If an interrupt is blocked by L1's TPR, L2 should not see it and should
| keep running. Add the check before L2 retrieves the interrupt.
|
| Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
| Acked-by: "Dong, Eddie" <eddie.dong@intel.com>
| master commit: 7fb5c6b9ef22915e3fcac95cd44857f4457ba783
| master date: 2013-08-22 10:49:24 +0200
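The TPR check can be illustrated in the same spirit. On a local APIC, bits 7:4 of the task priority register hold a priority class, and an interrupt whose vector class is not strictly higher is held back. The function name `blocked_by_tpr()` is hypothetical and the sketch assumes the raw 8-bit TPR value is available to the caller:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the vector is masked by the task priority register.
 * Only the upper four bits (the priority class) of each value take
 * part in the comparison.
 */
static bool blocked_by_tpr(uint8_t vector, uint8_t tpr)
{
    return (vector >> 4) <= (tpr >> 4);
}
```

For example, with a TPR of 0x30 every vector from 0x00 to 0x3f is held back, while 0x40 and above can still be delivered.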
* VT-d: warn about Compatibility Format Interrupts being enabled by firmware  Jan Beulich  2013-08-27  1  -6/+10
| ... as being insecure.
|
| Also drop the second (redundant) read of DMAR_GSTS_REG from
| enable_intremap().
|
| Signed-off-by: Jan Beulich <jbeulich@suse.com>
| Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
| master commit: c9c6abab583d27fdca1d979a7f1d18ae30f54e9b
| master date: 2013-08-21 16:44:58 +0200
* ACPI: fix acpi_os_map_memory()  Jan Beulich  2013-08-27  3  -12/+21
| Its use of map_domain_page() was entirely wrong. Use __acpi_map_table()
| instead for the time being, with locking added as the mappings it
| produces get replaced with subsequent invocations. Using locking in
| this way is acceptable here since the only two runtime callers are
| acpi_os_{read,write}_memory(), which don't leave mappings pending upon
| returning to their callers.
|
| Also fix __acpi_map_table()'s first parameter's type - while benign for
| unstable, backports to pre-4.3 trees will need this.
|
| Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
| ACPI: use ioremap() in acpi_os_map_memory()
|
| This drops the post-boot use of __acpi_map_table() here again (together
| with the somewhat awkward locking), in favor of using ioremap().
|
| Signed-off-by: Jan Beulich <jbeulich@suse.com>
| master commit: 2ee9cbf9d8eaeff6e21222905d22dbd58dc5fe29
| master date: 2013-08-21 08:38:40 +0200
| master commit: c5ba8ed4c6f005d332a49d93a3ef8ff2b690b256
| master date: 2013-08-21 08:40:22 +0200