path: root/xen/arch/x86/domain.c
Commit log (subject; author, date; files changed, lines -/+):
* x86: use {rd,wr}{fs,gs}base when available (Jan Beulich, 2013-10-11; 1 file, -2/+2)
  ... as they are intended to be faster than MSR reads/writes. In the case
  of emulate_privileged_op(), also use them in favor of the cached (but
  possibly stale) addresses from arch.pv_vcpu. This allows entirely
  removing the code that was the subject of XSA-67.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
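  For illustration, a minimal sketch of the access pattern this commit
  switches to: read the FS base via the RDFSBASE instruction when the CPU
  supports it (and CR4.FSGSBASE is set), falling back to the architectural
  MSR otherwise. The cpu_has_fsgsbase predicate stands in for whatever
  feature test is used; the MSR index is architectural.

      #define MSR_FS_BASE 0xc0000100  /* architectural FS base MSR */

      static inline unsigned long read_fs_base(void)
      {
          unsigned long base;

          if ( cpu_has_fsgsbase )  /* assumed feature predicate */
              asm volatile ( "rdfsbase %0" : "=r" (base) );
          else
          {
              unsigned int lo, hi;

              asm volatile ( "rdmsr"
                             : "=a" (lo), "=d" (hi) : "c" (MSR_FS_BASE) );
              base = lo | ((unsigned long)hi << 32);
          }

          return base;
      }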
* x86: correct LDT checks (Jan Beulich, 2013-10-11; 1 file, -14/+6)
  - MMUEXT_SET_LDT should behave as similarly to the LLDT instruction as
    possible: fail only if the base address is non-canonical
  - instead, LDT descriptor accesses should fault if the descriptor
    address ends up being non-canonical (by ensuring this we at once avoid
    reading an entry from the mach-to-phys table and considering it a page
    table entry)
  - fault propagation on using LDT selectors must distinguish #PF and #GP
    (the latter must be raised for a non-canonical descriptor address,
    which also applies to several other uses of propagate_page_fault(),
    and hence the problem is being fixed there)
  - map_ldt_shadow_page() should properly wrap addresses for 32-bit VMs

  At once remove the odd invocation of map_ldt_shadow_page() from the
  MMUEXT_SET_LDT handler: there's nothing really telling us that the first
  LDT page is going to be preferred over others.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
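  The canonicality test these checks revolve around can be sketched as
  follows; on 48-bit x86-64 implementations an address is canonical when
  bits 63:47 are all copies of bit 47:

      /* Canonical iff bits 63:47 are a sign extension of bit 47. */
      static inline int is_canonical_address(unsigned long addr)
      {
          return ((long)addr >> 47) == ((long)addr >> 63);
      }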
* x86: add missing va_end to hypercall_xlat_continuation (Matthew Daley, 2013-09-10; 1 file, -0/+4)
  Coverity-ID: 1056208

  Signed-off-by: Matthew Daley <mattjd@gmail.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
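  The class of defect fixed here is a va_start() without a matching
  va_end() on some exit path. A minimal sketch of the corrected pattern
  (the function below is illustrative, not Xen's actual code):

      #include <stdarg.h>

      static int xlat_args(unsigned int nr, ...)
      {
          va_list args;
          unsigned int i;
          int rc = 0;

          va_start(args, nr);
          for ( i = 0; i < nr; i++ )
              if ( va_arg(args, unsigned long) == 0 )
              {
                  rc = -1; /* error path... */
                  break;   /* ...must still pass through va_end() */
              }
          va_end(args);    /* required on every exit path */

          return rc;
      }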
* x86/xsave: fix migration from xsave-capable to xsave-incapable host (Jan Beulich, 2013-09-09; 1 file, -4/+8)
  With CPUID features suitably masked this is supposed to work, but was
  completely broken (i.e. the case wasn't even considered when the
  original xsave save/restore code was written).

  First of all, xsave_enabled() wrongly returned the value of
  cpu_has_xsave, i.e. not even taking into consideration attributes of the
  vCPU in question. Instead this function ought to check whether the guest
  ever enabled xsave support (by writing a [non-zero] value to XCR0). As a
  result of this, a vCPU's xcr0 and xcr0_accum must no longer be
  initialized to XSTATE_FP_SSE (since that's a valid value a guest could
  write to XCR0), and the xsave/xrstor as well as the context switch code
  need to suitably account for this (by always enforcing at least this
  part of the state to be saved/loaded). This involves undoing large parts
  of c/s 22945:13a7d1f7f62c ("x86: add strictly sanity check for
  XSAVE/XRSTOR") - we need to cleanly distinguish between hardware
  capabilities and vCPU used features.

  Next, both HVM and PV save code needed tweaking to not always save the
  full state supported by the underlying hardware, but just the parts that
  the guest actually used. Similarly the restore code should bail not just
  on state being restored that the hardware cannot handle, but also on
  inconsistent save state (inconsistent XCR0 settings or size of saved
  state not in line with XCR0).

  And finally the PV extended context get/set code needs to use slightly
  different logic than the HVM one, as here we can't just key off of
  xsave_enabled() (i.e. avoid doing anything if a guest doesn't use xsave)
  because the tools use this function to determine host capabilities as
  well as read/write vCPU state. The set operation in particular needs to
  be capable of cleanly dealing with input that consists of only the xcr0
  and xcr0_accum values (if they're both zero then no further data is
  required).

  While for things to work correctly both sides (saving _and_ restoring
  host) need to run with the fixed code, afaict no breakage should occur
  if either side isn't up to date (other than the breakage that this patch
  attempts to fix).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
  Acked-by: Keir Fraser <keir@xen.org>
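  A minimal sketch of the corrected predicate described above, keyed off
  the accumulated XCR0 value rather than the bare hardware capability
  (field names follow the commit text; the exact signature is assumed):

      /* A vCPU has used xsave iff it ever wrote a non-zero XCR0
       * (xcr0_accum accumulates every value ever written). */
      static bool_t xsave_enabled(const struct vcpu *v)
      {
          return cpu_has_xsave && v->arch.xcr0_accum != 0;
      }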
* x86/xsave: adjust state management (Jan Beulich, 2013-07-02; 1 file, -0/+4)
  The initial state for a vCPU is using default values, so there's no need
  to force the XRSTOR to read the state from memory. This saves a couple
  of thousand restores from memory just during boot of Linux on my Sandy
  Bridge system (I didn't try to make further measurements).

  The above requires that arch_set_info_guest() updates the state flags in
  the save area when valid floating point state got passed in, but that
  would really have been needed even before in case XSAVE{,OPT} decided to
  clear one or both of the FP and SSE bits.

  Furthermore, hvm_vcpu_reset_state() shouldn't just clear out the FPU/SSE
  area, but needs to re-initialize MXCSR and FCW.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
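  The architectural reset values in question are FCW = 0x037f (what FINIT
  establishes) and MXCSR = 0x1f80; both mask all floating point
  exceptions. A hedged sketch of the re-initialization, with an
  illustrative subset of the FXSAVE layout:

      #include <stdint.h>
      #include <string.h>

      struct fpu_sse_state {    /* illustrative subset of the save area */
          uint16_t fcw;         /* x87 control word */
          uint32_t mxcsr;       /* SSE control/status register */
          /* ... remainder of the FXSAVE image ... */
      };

      static void fpu_sse_reset(struct fpu_sse_state *s)
      {
          memset(s, 0, sizeof(*s));
          s->fcw   = 0x037f;    /* FINIT default: x87 exceptions masked */
          s->mxcsr = 0x1f80;    /* reset default: SSE exceptions masked */
      }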
* x86: fix page refcount handling in page table pin error path (Jan Beulich, 2013-06-26; 1 file, -6/+21)
  In the original patch 7 of the series addressing XSA-45 I mistakenly
  took the addition of the call to get_page_light() in alloc_page_type()
  to cover two decrements that would happen: one for the PGT_partial bit
  that is getting set along with the call, and the other for the page
  reference the caller holds (and would be dropping on its error path).
  But of course the additional page reference is tied to the PGT_partial
  bit, and hence any caller of a function that may leave
  ->arch.old_guest_table non-NULL for error cleanup purposes has to make
  sure a respective page reference gets retained.

  Similar issues were then also spotted elsewhere: in effect all callers
  of get_page_type_preemptible() need to deal with errors in similar ways.
  To make sure error handling can work this way without leaking page
  references, a respective assertion gets added to that function.

  This is CVE-2013-1432 / XSA-58.

  Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Reviewed-by: Tim Deegan <tim@xen.org>
* x86: fix XCR0 handling (Jan Beulich, 2013-06-04; 1 file, -2/+3)
  - both VMX and SVM ignored the ECX input to XSETBV
  - both SVM and VMX used the full 64-bit RAX when calculating the input
    mask to XSETBV
  - faults on XSETBV did not get recovered from

  Also consolidate the handling for PV and HVM into a single function, and
  make the per-CPU variable "xcr0" static to xstate.c.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
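  Architecturally, XSETBV writes the XCR selected by ECX with the value
  EDX:EAX; the upper halves of RAX and RDX do not participate. A hedged
  sketch of the corrected operand handling (struct and helper names are
  assumptions):

      static int handle_xsetbv(const struct cpu_user_regs *regs)
      {
          uint32_t index = (uint32_t)regs->rcx;  /* must not be ignored */
          uint64_t new_bv = ((uint64_t)(uint32_t)regs->rdx << 32) |
                            (uint32_t)regs->rax; /* EDX:EAX, not RDX/RAX */

          if ( index != 0 )        /* only XCR0 is defined */
              return -EINVAL;      /* caller injects #GP */

          return set_xcr0(new_bv); /* assumed helper; must handle faults */
      }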
* x86: re-enable VCPUOP_register_vcpu_time_memory_area (Jan Beulich, 2013-05-27; 1 file, -6/+3)
  By moving the call to update_vcpu_system_time() out of schedule() into
  arch-specific context switch code, the original problem of the function
  accessing the wrong domain's address space goes away (obvious even from
  patch context, as update_runstate_area() does similar copying).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* xen: move VCPUOP_register_runstate_memory_area to common code (Stefano Stabellini, 2013-05-08; 1 file, -28/+0)
  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* xen: move VCPUOP_register_vcpu_info to common code (Stefano Stabellini, 2013-05-08; 1 file, -113/+0)
  Move the implementation of VCPUOP_register_vcpu_info from x86 specific
  to common code. Move vcpu_info_mfn from an arch specific vcpu sub-field
  to the common vcpu struct. Move the initialization of vcpu_info_mfn to
  common code. Move unmap_vcpu_info and the call to unmap_vcpu_info at
  domain destruction time to common code.

  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: cleanup after making various page table manipulation operations preemptible (Jan Beulich, 2013-05-02; 1 file, -1/+1)
  This drops the "preemptible" parameters from various functions where now
  they can't (or shouldn't, validated by assertions) be run in
  non-preemptible mode anymore, to prove that manipulations of at least L3
  and L4 page tables and page table entries are now always preemptible,
  i.e. the earlier patches actually fulfill their purpose of fixing the
  resulting security issue.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Tim Deegan <tim@xen.org>
* x86: make arch_set_info_guest() preemptible (Jan Beulich, 2013-05-02; 1 file, -52/+61)
  ... as the root page table validation (and the dropping of an eventual
  old one) can require meaningful amounts of time.

  This is part of CVE-2013-1918 / XSA-45.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Tim Deegan <tim@xen.org>
* x86: make vcpu_reset() preemptible (Jan Beulich, 2013-05-02; 1 file, -7/+6)
  ... as dropping the old page tables may take significant amounts of
  time.

  This is part of CVE-2013-1918 / XSA-45.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Tim Deegan <tim@xen.org>
* x86: make vcpu_destroy_pagetables() preemptible (Jan Beulich, 2013-05-02; 1 file, -56/+4)
  ... as it may take significant amounts of time.

  The function is moved to mm.c as the better home for it anyway, and to
  avoid having to make a new helper function there non-static. It is given
  a "preemptible" parameter temporarily (until, in a subsequent patch, its
  other caller is also made capable of dealing with preemption).

  This is part of CVE-2013-1918 / XSA-45.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Tim Deegan <tim@xen.org>
* x86: call unmap_vcpu_info() regardless of guest type (Jan Beulich, 2013-05-02; 1 file, -2/+4)
  This fixes a regression from 63753b3e ("x86: allow
  VCPUOP_register_vcpu_info to work again on PVHVM guests").

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
* x86: allow VCPUOP_register_vcpu_info to work again on PVHVM guests (Konrad Rzeszutek Wilk, 2013-04-17; 1 file, -6/+7)
  For details on the hypercall please see commit
  c58ae69360ccf2495a19bf4ca107e21cf873c75b (VCPUOP_register_vcpu_info) and
  c/s 23143 (git commit 6b063a4a6f44245a727aa04ef76408b2e00af9c7) (x86:
  move pv-only members of struct vcpu to struct pv_vcpu) that introduced
  the regression.

  The current code allows the PVHVM guest to make this hypercall. But for
  a PVHVM guest it always returns -EINVAL (-22) for Xen 4.2 and above. Xen
  4.1 and earlier worked. The reason is that the check in map_vcpu_info
  would fail at:

      if ( v->arch.vcpu_info_mfn != INVALID_MFN )

  because the vcpu_info_mfn for PVHVM guests ends up by default with the
  value of zero (introduced by c/s 23143). The code in vcpu_initialise
  which initialized vcpu_info_mfn to a valid value (INVALID_MFN) would
  never be reached for PVHVM:

      if ( is_hvm_domain(d) )
      {
          rc = hvm_vcpu_initialise(v);
          goto done;
      }
      v->arch.pv_vcpu.vcpu_info_mfn = INVALID_MFN;

  while previously it was

      v->arch.vcpu_info_mfn = INVALID_MFN;

  right at the start of the function in Xen 4.1.

  This fixes the problem with Linux advertising this error:

      register_vcpu_info failed: err=-22

  Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* x86/VPMU: Save/restore VPMU only when necessary (Boris Ostrovsky, 2013-04-15; 1 file, -2/+12)
  VPMU doesn't need to always be saved during context switch. If we are
  coming back to the same processor and no other vCPU has run here, we can
  simply continue running. This is especially useful on Intel processors
  where the Global Control MSR is stored in the VMCS, thus not requiring
  us to stop the counters during the save operation. On AMD we need to
  explicitly stop the counters, but we don't need to save them.

  Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
  Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
  Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
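  A hedged sketch of the lazy-restore test described above: remember which
  physical CPU a vCPU's counters were last loaded on, and which vCPU last
  used the PMU on each physical CPU (all field and helper names here are
  assumptions):

      static DEFINE_PER_CPU(struct vcpu *, last_vpmu_vcpu);

      struct vpmu_struct {
          int last_pcpu;    /* pCPU this vCPU's PMU state was loaded on */
          /* ... counter and control state ... */
      };

      static void vpmu_load(struct vcpu *v, int this_pcpu)
      {
          struct vpmu_struct *vpmu = vcpu_vpmu(v);  /* assumed accessor */

          /* Same pCPU and nobody else used the PMU here meanwhile:
           * the hardware state is still ours, skip the reload. */
          if ( vpmu->last_pcpu == this_pcpu &&
               per_cpu(last_vpmu_vcpu, this_pcpu) == v )
              return;

          vpmu_reload_counters(v);                  /* assumed helper */
          vpmu->last_pcpu = this_pcpu;
          per_cpu(last_vpmu_vcpu, this_pcpu) = v;
      }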
* vpmu intel: Dump vpmu infos in 'q' keyhandler (Dietmar Hahn, 2013-04-08; 1 file, -0/+3)
  This patch extends the printout of the vCPU infos of the keyhandler 'q'.
  If vPMU is enabled and active on the vCPU, lines like the following are
  printed (when running HVM openSuSE-12.3 with 'perf top'):

      (XEN) vPMU running
      (XEN) general_0: 0x000000ffffff3ae1 ctrl: 0x000000000053003c
      (XEN) fixed_1: 0x000000ff90799188 ctrl: 0xb

  This means general counter 0 and fixed counter 1 are running, showing
  their contents and the contents of their configuration MSRs.

  Signed-off-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
  Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  Acked-by: Jun Nakajima <jun.nakajima@intel.com>
* x86: use linear L1 page table for map_domain_page() page table manipulation (Jan Beulich, 2013-02-28; 1 file, -3/+0)
  This saves allocation of a Xen heap page for tracking the L1 page table
  pointers.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: introduce create_perdomain_mapping() (Jan Beulich, 2013-02-28; 1 file, -61/+31)
  ... as well as free_perdomain_mappings(), and use them to carry out the
  existing per-domain mapping setup/teardown. This at once makes the setup
  of the first sub-range PV domain specific (with idle domains also
  excluded), as the GDT/LDT mapping area is needed only for those.

  Also fix an improperly scaled BUILD_BUG_ON() expression in
  mapcache_domain_init().

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* Fix emacs local variable block to use correct C style variable. (David Vrabel, 2013-02-21; 1 file, -1/+1)
  The emacs variable to set the C style from a local variable block is
  c-file-style, not c-set-style.

  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* xmalloc: make close-to-PAGE_SIZE allocations more efficient (Jan Beulich, 2013-02-19; 1 file, -22/+11)
  Rather than bumping their sizes to slightly above (a multiple of)
  PAGE_SIZE (in order to store tracking information), thus requiring a
  non-order-0 allocation even when no more than a page is being requested,
  return the result of alloc_xenheap_pages() directly, and use the struct
  page_info field underlying PFN_ORDER() to store the actual size (needed
  for freeing the memory).

  This leverages the fact that sub-allocation of memory obtained from the
  page allocator can only ever result in non-page-aligned memory chunks
  (with the exception of zero size allocations with sufficiently high
  alignment being requested, which is why zero-size allocations now get
  special cased).

  Use the new property to simplify allocation of the trap info array for
  PV guests on x86.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
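  The property being leveraged lets the free path dispatch purely on
  alignment: page-aligned pointers can only have come straight from the
  page allocator. A hedged sketch (only PFN_ORDER() is named in the log;
  the sub-allocation helper is an assumption):

      void xfree(void *p)
      {
          if ( p == NULL )
              return;

          if ( !((unsigned long)p & (PAGE_SIZE - 1)) )
          {
              /* Page-aligned: returned directly by alloc_xenheap_pages();
               * the order was stashed via PFN_ORDER() on the page_info. */
              free_xenheap_pages(p, PFN_ORDER(virt_to_page(p)));
              return;
          }

          /* Otherwise a sub-allocated chunk preceded by a header. */
          xfree_suballocated(p);   /* assumed helper */
      }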
* x86: fix map_domain_page() leak from vcpu_destroy_pagetables() (Jan Beulich, 2013-02-13; 1 file, -0/+1)
  Introduced by c/s 26450:4816763549e0 and exposed with 26523:fd997a96d448.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* x86: properly use map_domain_page() during domain creation/destruction (Jan Beulich, 2013-01-23; 1 file, -44/+51)
  This involves no longer storing virtual addresses of the per-domain
  mapping L2 and L3 page tables.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: consolidate initialization of PV guest L4 page tables (Jan Beulich, 2013-01-23; 1 file, -7/+2)
  So far this has been repeated in 3 places, all of which one had to
  remember to update whenever a change was being made.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* x86: re-introduce map_domain_page() et al (Jan Beulich, 2013-01-23; 1 file, -8/+17)
  This is being done mostly in the form previously used on x86-32,
  utilizing the second L3 page table slot within the per-domain mapping
  area for those mappings. It remains to be determined whether that
  concept is really suitable, or whether instead re-implementing at least
  the non-global variant from scratch would be better.

  Also add the helpers {clear,copy}_domain_page() as well as initial uses
  of them.

  One question is whether, to exercise the non-trivial code paths, we
  shouldn't make the trivial shortcuts conditional upon NDEBUG being
  defined. See the debugging patch at the end of the series.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* miscellaneous cleanup (Jan Beulich, 2013-01-17; 1 file, -2/+2)
  ... noticed while putting together the 16Tb support patches for x86.

  Briefly, this (in order of the changes below)
  - fixes an inefficiency in x86's context switch code (translations
    to/from struct page are more involved than to/from MFNs)
  - drops unnecessary MFN-to-page conversions
  - drops a redundant call to destroy_xen_mappings() (an identical call is
    being made a few lines up)
  - simplifies a VA-to-MFN translation
  - drops dead code (several occurrences)
  - adds a missing __init annotation

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: consistently mask floating point exceptions (Jan Beulich, 2013-01-16; 1 file, -1/+3)
  c/s 23142:f5e8d152a565 resulted in v->arch.fpu_ctxt pointing into the
  save area allocated for xsave/xrstor (when they're available). The way
  vcpu_restore_fpu_lazy() works (using fpu_init() for an uninitialized
  vCPU only when there's no xsave support) causes this to load whatever
  arch_set_info_guest() put there, irrespective of whether the i387 state
  was specified to be valid in the respective input structure.

  Consequently, with a cleared (all zeroes) incoming FPU context, and with
  xsave available, one gets all exceptions unmasked (as opposed to the
  legacy case, where FINIT and LDMXCSR get used, masking all exceptions).
  This causes e.g. para-virtualized NetWare to crash.

  The behavior of arch_set_info_guest() is thus being made more
  hardware-like for the FPU portion of it: considering it to be similar to
  INIT, it will now leave untouched all floating point state. An
  alternative would be to make the behavior RESET-like, forcing all state
  to known values, albeit - taking into account legacy behavior - not to
  precisely the values RESET would enforce (which masks only SSE
  exceptions, but not x87 ones); that would come closest to mimicking
  FINIT behavior in the xsave case. Another option would be to continue
  copying whatever was provided, but override (at least) FCW and MXCSR if
  VGCF_I387_VALID isn't set.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: save/restore only partial register state where possible (Jan Beulich, 2012-10-30; 1 file, -2/+3)
  ... and make restore conditional not only upon having saved the state,
  but also upon whether the saved state was actually modified (and
  register values are known to have been preserved).

  Note that RBP is unconditionally considered a volatile register (i.e.
  irrespective of CONFIG_FRAME_POINTER), since the RBP handling would
  become overly complicated due to the need to save/restore it on the
  compat mode hypercall path [6th argument].

  Note further that for compat mode code paths, saving/restoring R8...R15
  is entirely unnecessary - we don't allow those guests to enter 64-bit
  mode, and hence they have no way of seeing these registers' contents
  (and there consequently also is no information leak, except if the
  context saving domctl would be considered such).

  Finally, note that this may not properly deal with gdbstub's needs, yet
  (but if so, I can't really suggest adjustments, as I don't know that
  code).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* xen: replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when appropriate (Stefano Stabellini, 2012-10-17; 1 file, -1/+1)
  Note: these changes don't make any difference on x86.

  Replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when it is used as
  a hypercall argument.

  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Committed-by: Ian Campbell <ian.campbell@citrix.com>
* x86: replace literal numbers (Jan Beulich, 2012-09-28; 1 file, -5/+6)
  In various cases, 256 was being used instead of NR_VECTORS or a derived
  ARRAY_SIZE() expression. In one case (guest_has_trap_callback()), a
  wrong (unrelated) constant was used instead of NR_VECTORS.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: vMCE emulation (Liu, Jinsong, 2012-09-26; 1 file, -5/+0)
  This patch provides virtual MCE support to guests. It emulates a simple
  and clean MCE MSR interface by faking caps to the guest if needed and
  masking caps if unnecessary:
  1. Providing a well-defined MCG_CAP to the guest, filtering out
     unnecessary caps and providing only the caps the guest needs;
  2. Disabling MCG_CTL to avoid model-specific behavior;
  3. Sticking all 1's into MCi_CTL for the guest to avoid model-specific
     behavior;
  4. Enabling the CMCI cap but never really injecting it, to prevent the
     guest from polling periodically;
  5. Masking the MSCOD field of MCi_STATUS to avoid model-specific
     behavior;
  6. Keeping natural semantics by using per-vcpu instead of per-domain
     variables;
  7. Using bank1 and reserving bank0 to work around the 'bank0 quirk' of
     some very old processors;
  8. Cleaning up some vMCE# injection logic which is shared by Intel and
     AMD but useless under the new vMCE implementation;
  9. Keeping compatible with old Xen versions (this has been backported to
     SLES11 SP2), so that old vMCE would not be blocked when migrating to
     new vMCE.

  Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>

  - make printing consistent (and non-exploitable)
  - fix return values of intel_mce_{rd,wr}msr() for out of range banks
  - miscellaneous cleanup

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Committed-by: Jan Beulich <jbeulich@suse.com>
* x86/S3: add cache flush on secondary CPUs before going to sleep (Ben Guthro, 2012-09-25; 1 file, -2/+8)
  Secondary CPUs, between doing their final memory writes (particularly
  updating cpu_initialized) and getting a subsequent INIT, may not write
  back all modified data. The INIT itself then causes those modifications
  to be lost, so in the cpu_initialized case the CPU would find itself
  already initialized, (intentionally) entering an infinite loop instead
  of actually coming online.

  Signed-off-by: Ben Guthro <ben@guthro.net>

  Make acpi_dead_idle() call default_dead_idle() rather than duplicating
  the logic there.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Committed-by: Jan Beulich <jbeulich@suse.com>
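  A minimal sketch of what a dead-idle routine of this shape looks like:
  write back the cache with WBINVD after the final memory writes, then
  halt with interrupts disabled (default_dead_idle() is named in the log;
  this body is an assumption):

      static void default_dead_idle(void)
      {
          /* Flush modified lines before a later INIT can discard them. */
          asm volatile ( "wbinvd" ::: "memory" );
          for ( ; ; )
              asm volatile ( "cli; hlt" ::: "memory" );
      }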
* x86: Remove CONFIG_COMPAT ifdef'ery from arch/x86 -- it is always defined. (Keir Fraser, 2012-09-12; 1 file, -20/+0)
  Signed-off-by: Keir Fraser <keir@xen.org>
* x86: We can assume CONFIG_PAGING_LEVELS==4. (Keir Fraser, 2012-09-12; 1 file, -7/+2)
  Signed-off-by: Keir Fraser <keir@xen.org>
* xen: Remove x86_32 build target. (Keir Fraser, 2012-09-12; 1 file, -96/+11)
  Signed-off-by: Keir Fraser <keir@xen.org>
* x86-64: refine the XSA-9 fix (Jan Beulich, 2012-08-20; 1 file, -0/+15)
  Our product management wasn't happy with the "solution" for XSA-9, and
  demanded that customer systems must continue to boot. Rather than having
  our and perhaps other distros carry non-trivial patches, allow for more
  fine-grained control (panic on boot, deny guest creation, or merely
  warn) by means of a single-line change.

  Also, as this was found to be a problem with remotely managed systems,
  don't default to boot denial (just deny guest creation).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
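  A hedged sketch of how a three-way policy like this is typically wired
  up so that switching the behavior is a single-line change (all names
  below are assumptions; the log doesn't give the actual option):

      static int erratum_deny_guest_creation; /* checked at domain create */

      static enum {
          ERRATUM_WARN,
          ERRATUM_DENY_GUESTS,   /* chosen default per the log */
          ERRATUM_PANIC,
      } opt_erratum_policy = ERRATUM_DENY_GUESTS;

      static void check_erratum(void)
      {
          if ( !erratum_affected_hardware() ) /* assumed detection helper */
              return;

          switch ( opt_erratum_policy )
          {
          case ERRATUM_PANIC:
              panic("affected hardware: refusing to boot");
          case ERRATUM_DENY_GUESTS:
              erratum_deny_guest_creation = 1;
              /* fall through */
          case ERRATUM_WARN:
              printk("WARNING: affected hardware detected\n");
          }
      }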
* x86-64: don't allow non-canonical addresses to be set for any callback (Jan Beulich, 2012-06-18; 1 file, -0/+12)
  Rather than deferring the detection of these to the point where they get
  actually used (the fix for XSA-7, 25480:76eaf5966c05, causing a #GP to
  be raised by IRET, which invokes the guest's [fragile] fail-safe
  callback), don't even allow such addresses to be set.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: Use get_page_from_gfn() instead of get_gfn()/put_gfn. (Tim Deegan, 2012-05-17; 1 file, -42/+43)
  Signed-off-by: Tim Deegan <tim@xen.org>
* x86: restore vcpu_destroy_pagetables() call on HVM domain teardown. (Tim Deegan, 2012-04-20; 1 file, -3/+4)
  HVM vcpus that are using shadow pagetables have valid guest_table
  fields, which need to be tidied up on domain teardown.

  Signed-off-by: Tim Deegan <tim@xen.org>
  Acked-by: Jan Beulich <jbeulich@suse.com>
  Committed-by: Tim Deegan <tim@xen.org>
* x86/mm/sharing: Clean ups for relinquishing shared pages on destroy (Andres Lagar-Cavilla, 2012-04-18; 1 file, -2/+14)
  When a domain is destroyed, its pages are freed in relinquish_resources
  in a preemptible mode, in the context of a synchronous domctl. P2m
  entries pointing to shared pages are, however, released during p2m
  cleanup in an RCU callback, and in non-preemptible mode. This is an O(n)
  operation for a very large n, which may include actually freeing shared
  pages for which the domain is the last holder.

  To improve responsiveness, move this operation to the preemptible
  portion of domain destruction, during the synchronous domain_kill
  hypercall. And remove the bulk of the work from the RCU callback.

  Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
  Acked-by: Tim Deegan <tim@xen.org>
  Committed-by: Tim Deegan <tim@xen.org>
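  The preemptible-teardown shape this refers to can be sketched as
  follows, using Xen's hypercall_preempt_check(); the cursor field and
  the helper are assumptions:

      static int relinquish_shared_pages(struct domain *d)
      {
          unsigned long gfn;

          for ( gfn = d->shared_teardown_cursor;   /* assumed cursor */
                gfn < d->max_shared_gfn; gfn++ )
          {
              drop_shared_page(d, gfn);            /* assumed helper */

              /* Yield back to the synchronous domctl now and then. */
              if ( hypercall_preempt_check() )
              {
                  d->shared_teardown_cursor = gfn + 1;
                  return -EAGAIN;  /* caller retries to make progress */
              }
          }

          return 0;
      }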
* passthrough: release assigned PCI devices earlier during domain shutdown (Jan Beulich, 2012-02-24; 1 file, -1/+2)
  At least with xend, where there's not even a tool stack side attempt to
  de-assign devices during domain shutdown, this allows immediate restarts
  of a domain to work reliably. (There's no apparent reason why c/s
  18010:c1577f094ae4 chose to put this in the asynchronous part of domain
  destruction.)

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86/vMCE: save/restore MCA capabilities (Jan Beulich, 2012-02-24; 1 file, -0/+2)
  This allows migration to a host with fewer MCA banks than the source
  host had; without this patch, accesses to the excess banks' MSRs caused
  #GPs in the guest after migration (and it depended on the guest kernel
  whether this would be fatal).

  A fundamental question is whether we should also save/restore MCG_CTL
  and MCi_CTL, as the HVM save record would better be defined to cover the
  complete state that needs saving from the beginning (I'm unsure whether
  the save/restore logic allows for future extension of an existing
  record).

  Of course, this change is expected to make migration from new to older
  Xen impossible (again I'm unsure what the save/restore logic does with
  records it doesn't even know about).

  The (trivial) tools side change may seem unrelated, but the code should
  have been that way from the beginning to allow the hypervisor to look at
  currently unused ext_vcpucontext fields without risking reading garbage
  when those fields get a meaning assigned in the future. This isn't being
  enforced here - should it be? (Obviously, for backwards compatibility,
  the hypervisor must assume these fields to be clear only when the
  extended context's size exceeds the old original one.)

  A future addition to this change might be to allow configuration of the
  number of banks and other MCA capabilities for a guest before it starts
  (i.e. to not inherit the values seen on the first host it runs on).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* xen: introduce PHYSDEVOP_pirq_eoi_gmfn_v2 (Stefano Stabellini, 2012-01-28; 1 file, -0/+1)
  PHYSDEVOP_pirq_eoi_gmfn changes the semantics of PHYSDEVOP_eoi. In order
  to improve the interface this patch:
  - renames PHYSDEVOP_pirq_eoi_gmfn to PHYSDEVOP_pirq_eoi_gmfn_v1;
  - introduces PHYSDEVOP_pirq_eoi_gmfn_v2, that is like
    PHYSDEVOP_pirq_eoi_gmfn_v1 but doesn't modify the behaviour of another
    hypercall;
  - bumps __XEN_LATEST_INTERFACE_VERSION__;
  - #defines PHYSDEVOP_pirq_eoi_gmfn to PHYSDEVOP_pirq_eoi_gmfn_v1 or
    PHYSDEVOP_pirq_eoi_gmfn_v2 depending on __XEN_INTERFACE_VERSION__.

  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Committed-by: Keir Fraser <keir@xen.org>
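  The header-level compatibility trick in the last bullet follows the
  usual Xen public-header pattern; a sketch (the op numbers and version
  boundary below are assumptions, not taken from the log):

      #define PHYSDEVOP_pirq_eoi_gmfn_v1   17   /* assumed op numbers */
      #define PHYSDEVOP_pirq_eoi_gmfn_v2   28

      #if __XEN_INTERFACE_VERSION__ < 0x00040200   /* assumed boundary */
      /* Old consumers keep the original, semantics-changing op. */
      #define PHYSDEVOP_pirq_eoi_gmfn PHYSDEVOP_pirq_eoi_gmfn_v1
      #else
      /* New consumers get the side-effect-free variant. */
      #define PHYSDEVOP_pirq_eoi_gmfn PHYSDEVOP_pirq_eoi_gmfn_v2
      #endif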
* x86/hvm: No need to arch_set_info_guest() before restoring per-vcpu HVM state. (Keir Fraser, 2012-01-22; 1 file, -27/+18)
  Signed-off-by: Keir Fraser <keir@xen.org>
* move declarations of some required per-arch functions into common headers (Jan Beulich, 2012-01-12; 1 file, -1/+1)
  ... since it is pointless to have each arch declare them on its own (and
  now and then - see ia64 - forget to do so).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: reduce scope of some symbols used with reset_stack_and_jump() (Jan Beulich, 2011-12-21; 1 file, -12/+12)
  By making the macro properly advertise the use of the input symbol to
  the compiler, it is no longer necessary for them to be global if they're
  defined and used in just one source file.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* Modify naming of queries into the p2m (Andres Lagar-Cavilla, 2011-11-11; 1 file, -6/+21)
  Callers of lookups into the p2m code are now variants of get_gfn. All
  callers need to call put_gfn. The code behind it is a no-op at the
  moment, but will change to proper locking in a later patch. This patch
  does not change functionality. Only naming, and adds put_gfn's.

  set_p2m_entry retains its name because it is always called with p2m_lock
  held.

  This patch is humongous, unfortunately, given the dozens of call sites
  involved. After this patch, anyone using old style gfn_to_mfn will not
  succeed in compiling their code. This is on purpose: adapt to the new
  API.

  Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
  Acked-by: Tim Deegan <tim@xen.org>
  Committed-by: Keir Fraser <keir@xen.org>
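  A hedged sketch of a typical call site after the renaming - each get_gfn
  lookup is bracketed by a put_gfn on every path (the wrapper function and
  the error handling here are illustrative):

      static int use_gfn(struct domain *d, unsigned long gfn)
      {
          p2m_type_t t;
          mfn_t mfn = get_gfn(d, gfn, &t);  /* replaces old gfn_to_mfn() */

          if ( !mfn_valid(mfn) )
          {
              put_gfn(d, gfn);  /* required on the error path too */
              return -EINVAL;
          }

          /* ... operate on mfn while the gfn reference is held ... */

          put_gfn(d, gfn);
          return 0;
      }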
* Hypercall continuation cancelation in compat mode for XENMEM_get/set_pod_target (Jean Guyader, 2011-11-11; 1 file, -0/+18)
  If copy_to_guest failed in the compat code after a continuation has been
  set up in the native code, we need to cancel it, so we won't re-execute
  the hypercall but instead return from it with the appropriate error.

  Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com>
  Acked-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Committed-by: Jan Beulich <jbeulich@suse.com>
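  A hedged sketch of the compat-wrapper shape this describes (the handler
  and structure names are assumptions; only the cancellation requirement
  comes from the log):

      static long compat_pod_target_op(XEN_GUEST_HANDLE_PARAM(void) arg)
      {
          struct compat_pod_target cmp;
          long rc = native_pod_target_op(arg); /* may queue continuation */

          if ( rc >= 0 && copy_to_guest(arg, &cmp, 1) )
          {
              /* A continuation may already be pending; cancel it so the
               * guest sees -EFAULT instead of re-executing the call. */
              hypercall_cancel_continuation();
              rc = -EFAULT;
          }

          return rc;
      }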
* eliminate cpus_xyz() (Jan Beulich, 2011-11-08; 1 file, -2/+3)
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>