xen/xen - xen

	Commit message (Collapse)	Author	Age	Files	Lines
*	x86/HVM: cache emulated instruction for retry processing	Jan Beulich	2013-10-14	1	-14/+43
\| \| \| \| \| \| \| \| \| \| \| \| \|	Rather than re-reading the instruction bytes upon retry processing, stash away and re-use what we already read. That way we can be certain that the retry won't do something different from what requested the retry, getting once again closer to real hardware behavior (where what we use retries for is simply a bus operation, not involving redundant decoding of instructions). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86/HVM: properly deal with hvm_copy_*_guest_phys() errors	Jan Beulich	2013-10-14	2	-16/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In memory read/write handling the default case should tell the caller that the operation cannot be handled rather than the operation having succeeded, so that when new HVMCOPY_* states get added not handling them explicitly will not result in errors being ignored. In task switch emulation code stop handling some errors, but not others. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86/HVM: don't ignore hvm_copy_to_guest_phys() errors during I/O intercept	Jan Beulich	2013-10-14	1	-13/+107
\| \| \| \| \| \| \| \| \|	Building upon the extended retry logic we can now also make sure to not ignore errors resulting from writing data back to guest memory. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86/HVM: fix direct PCI port I/O emulation retry and error handling	Jan Beulich	2013-10-14	2	-18/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dpci_ioport_{read,write}() guest memory access failure handling should be modelled after process_portio_intercept()'s (and others): Upon encountering an error on other than the first iteration, the count successfully handled needs to be stored and X86EMUL_OKAY returned, in order for the generic instruction emulator to update register state correctly before reporting failure or retrying (both of which would only happen after re-invoking emulation). Further we leverage (and slightly extend, due to the above mentioned need to return X86EMUL_OKAY) the "large MMIO" retry model. Note that there is still a special case not explicitly taken care of here: While the first retry on the last iteration of a "rep ins" correctly recovers the already read data, an eventual subsequent retry is being handled by the pre-existing mmio-large logic (through hvmemul_do_io() storing the [recovered] data [again], also taking into consideration that the emulator converts a single iteration "ins" to ->read_io() plus ->write()). Also fix an off-by-one in the mmio-large-read logic, and slightly simplify the copying of the data. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86/HVM: properly handle backward string instruction emulation	Jan Beulich	2013-10-14	3	-44/+23
\| \| \| \| \| \| \| \| \| \| \|	Multiplying a signed 32-bit quantity with an unsigned 32-bit quantity produces an unsigned 32-bit result, yet for emulation of backward string instructions we need the result sign extended before getting added to the base address. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
*	hvm/vidirian: Avoid printing page_to_mfn(NULL) on error paths	Andrew Cooper	2013-10-09	2	-18/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While working in the viridian code, I noticed that 4cb6c4f4941 "x86/hvm: Use get_page_from_gfn() instead of get_gfn()/put_gfn." introduced two error paths where page_to_mfn(NULL) would be formatted and presented as a bad MFN. This provides junk in the warning rather than something useful. These two codepaths are fixed up to match their counterpart in wrmsr_hypervisor_regs() While auditing the other changes from 4cb6c4f4941, I noticed a small optimisation which could be made by changing the order of the validity checks to remove 6 NULL pointer checks. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86/HPET: basic cleanup	Andrew Cooper	2013-10-08	1	-9/+9
\| \| \| \| \| \| \| \| \|	* Strip trailing whitespace * Remove redundant definitions * Update stale documentation links * Move hpet_address into __initdata Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
*	x86: allow HVM guests to make console_io hypercall	Konrad Rzeszutek Wilk	2013-10-04	1	-0/+2
\| \| \| \| \| \| \| \| \|	The console_io hypercall is provided for PV guests and for HVM guests it is done via the 0xe9 port. However the PV hypercall is more efficient as it takes a string rather than one character per write. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
*	x86: make hvm_cpuid() tolerate NULL pointers	Jan Beulich	2013-10-04	3	-19/+29
\| \| \| \| \| \| \| \| \| \| \|	Now that other HVM code started making more extensive use of hvm_cpuid(), let's not force every caller to declare dummy variables for output not cared about. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
*	Nested VMX: fix IA32_VMX_CR4_FIXED1 msr emulation	Yang Zhang	2013-10-04	1	-4/+51
\| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, it use hardcode value for IA32_VMX_CR4_FIXED1. This is wrong. We should check guest's cpuid to know which bits are writeable in CR4 by guest and allow the guest to set the corresponding bit only when guest has the feature. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Cleanup. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
*	VMX: clean up capability checks	Jan Beulich	2013-10-04	1	-17/+27
\| \| \| \| \| \| \| \| \| \| \| \|	VMCS size validation on APs should check against BP's size. No need for a separate cpu_has_vmx_ins_outs_instr_info variable anymore. Use proper symbolics. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
*	Nested VMX: check VMX capability before read VMX related MSRs	Yang Zhang	2013-10-04	2	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \|	VMX MSRs only available when the CPU support the VMX feature. In addition, VMX_TRUE* MSRs only available when bit 55 of VMX_BASIC MSR is set. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Cleanup. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
*	x86: properly handle hvm_copy_from_guest_{phys,virt}() errors	Jan Beulich	2013-09-30	4	-31/+66
\| \| \| \| \| \| \| \| \| \| \| \|	Ignoring them generally implies using uninitialized data and, in all but two of the cases dealt with here, potentially leaking hypervisor stack contents to guests. This is CVE-2013-4355 / XSA-63. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
*	Nested VMX: Expose unrestricted guest feature to guest	Yang Zhang	2013-09-30	2	-1/+5
\| \| \| \| \| \| \| \|	With virtual unrestricted guest feature, L2 guest is allowed to run with PG cleared. Also, allow PAE not set during virtual vmexit emulation. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Acked-by: Eddie.Dong@intel.com
*	VMX: also use proper instruction mnemonic for VMREAD	Jan Beulich	2013-09-23	6	-148/+233
\| \| \| \| \| \| \| \| \| \| \|	... when assembler supports it, following commit cfd54835 ("VMX: use proper instruction mnemonics if assembler supports them"). This merely got split off from the earlier change becase of the significant number of call sites needing to be changed. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
*	x86/HVM: refuse doing string operations in certain situations	Jan Beulich	2013-09-23	1	-0/+14
\| \| \| \| \| \| \| \| \| \|	We shouldn't do any acceleration for - "rep movs" when either side is passed through MMIO or when both sides are handled by qemu - "rep ins" and "rep outs" when the memory operand is any kind of MMIO Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86/HVM: linear address must be canonical for the whole accessed range	Jan Beulich	2013-09-23	1	-12/+13
\| \| \| \| \| \| \| \| \| \| \| \|	... rather than just for the first byte. While at it, also - make the real mode case at least dpo a wrap around check - drop the mis-named "gpf" label (we're not generating faults here) and use in-place returns instead Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86/HVM: properly handle MMIO reads and writes wider than a machine word	Jan Beulich	2013-09-20	1	-20/+95
\| \| \| \| \| \| \| \| \|	Just like real hardware we ought to split such accesses transparently to the caller. With little extra effort we can at once even handle page crossing accesses correctly. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	VMX: fix failure path in construct_vmcs	George Dunlap	2013-09-18	1	-0/+3
\| \| \| \| \| \| \| \| \|	If the allocation fails, make sure to call vmx_vmcs_exit(). This is a candidate for backport. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
*	x86/HVM: fix failure path in hvm_vcpu_initialise	George Dunlap	2013-09-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It looks like one of the failure cases in hvm_vcpu_initialise jumps to the wrong label; this could lead to slow leaks if something isn't cleaned up properly. I will probably change these labels in a future patch, but I figured it was better to have this fix separately. This is also a candidate for backport. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
*	hvm/vpmu: Prevent dump handlers from incorrectly mutating state	Andrew Cooper	2013-09-16	2	-15/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Discovered by Coverity, CID 1055181 core2_vpmu_dump() was incorrectly setting VPMU_CONTEXT_LOADED when it was intending to check for it. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> This would have been avoided if the dump function declared all its pointers "const" - doing this now (also in SVM). Also fixing some indentation issues at once. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
*	Nested VMX: Clear bit 31 of IA32_VMX_BASIC MSR	Yang Zhang	2013-09-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The bit 31 of revision_id will set to 1 if vmcs shadowing enabled. And according intel SDM, the bit 31 of IA32_VMX_BASIC MSR is always 0. So we cannot set low 32 bit of IA32_VMX_BASIC to revision_id directly. Must clear the bit 31 to 0. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
*	console: buffer and show origin of guest PV writes	Daniel De Graaf	2013-09-10	1	-17/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	Guests other than domain 0 using the console output have previously been controlled by the VERBOSE #define, but with no designation of which guest's output was on the console. This patch converts the HVM output buffering to be used by all domains except the hardware domain (dom0): stripping non-printable characters, line buffering the output, and prefixing it with the domain ID. This is especially useful for debugging stub domains during early boot. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Keir Fraser <keir@xen.org>
*	x86/xsave: fix migration from xsave-capable to xsave-incapable host	Jan Beulich	2013-09-09	2	-35/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With CPUID features suitably masked this is supposed to work, but was completely broken (i.e. the case wasn't even considered when the original xsave save/restore code was written). First of all, xsave_enabled() wrongly returned the value of cpu_has_xsave, i.e. not even taking into consideration attributes of the vCPU in question. Instead this function ought to check whether the guest ever enabled xsave support (by writing a [non-zero] value to XCR0). As a result of this, a vCPU's xcr0 and xcr0_accum must no longer be initialized to XSTATE_FP_SSE (since that's a valid value a guest could write to XCR0), and the xsave/xrstor as well as the context switch code need to suitably account for this (by always enforcing at least this part of the state to be saved/loaded). This involves undoing large parts of c/s 22945:13a7d1f7f62c ("x86: add strictly sanity check for XSAVE/XRSTOR") - we need to cleanly distinguish between hardware capabilities and vCPU used features. Next both HVM and PV save code needed tweaking to not always save the full state supported by the underlying hardware, but just the parts that the guest actually used. Similarly the restore code should bail not just on state being restored that the hardware cannot handle, but also on inconsistent save state (inconsistent XCR0 settings or size of saved state not in line with XCR0). And finally the PV extended context get/set code needs to use slightly different logic than the HVM one, as here we can't just key off of xsave_enabled() (i.e. avoid doing anything if a guest doesn't use xsave) because the tools use this function to determine host capabilities as well as read/write vCPU state. The set operation in particular needs to be capable of cleanly dealing with input that consists of only the xcr0 and xcr0_accum values (if they're both zero then no further data is required). While for things to work correctly both sides (saving _and_ restoring host) need to run with the fixed code, afaict no breakage should occur if either side isn't up to date (other than the breakage that this patch attempts to fix). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Yang Zhang <yang.z.zhang@intel.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86: allow guest to set/clear MSI-X mask bit (try 2)	Joby Poriyath	2013-09-09	1	-12/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Guest needs the ability to enable and disable MSI-X interrupts by setting the MSI-X control bit, for a passed-through device. Guest is allowed to write MSI-X mask bit only if Xen thinks that mask is clear (interrupts enabled). If the mask is set by Xen (interrupts disabled), writes to mask bit by the guest is ignored. Currently, a write to MSI-X mask bit by the guest is silently ignored. A likely scenario is where we have a 82599 SR-IOV nic passed through to a guest. From the guest if you do ifconfig <ETH_DEV> down ifconfig <ETH_DEV> up the interrupts remain masked. On VF reset, the mask bit is set by the controller. At this point, Xen is not aware that mask is set. However, interrupts are enabled by VF driver by clearing the mask bit by writing directly to BAR3 region containing the MSI-X table. From dom0, we can verify that interrupts are being masked using 'xl debug-keys M'. Initially, guest was allowed to modify MSI-X bit. Later this behaviour was changed. See changeset 74c213c506afcd74a8556dd092995fd4dc38b225. Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
*	x86: Introduce and use GLOBAL() in asm code	Andrew Cooper	2013-09-09	1	-2/+1
\| \| \| \| \| \|	Also clean up some cases of misused/opencoded ENTRY() Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
*	SVM: streamline entry.S code	Jan Beulich	2013-09-09	2	-37/+26
\| \| \| \| \| \| \| \| \| \| \| \| \|	- fix a bogus "test" with zero immediate - move stuff easily/better done in C into C code - re-arrange code paths so that no redundant GET_CURRENT() would remain on the fast paths - move long latency operations earlier - slightly defer disabling global interrupts on the VM entry path Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
*	VMX: use proper instruction mnemonics if assembler supports them	Jan Beulich	2013-09-09	2	-9/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the hex byte emission we were taking away a good part of flexibility from the compiler, as for simplicity reasons these were built using fixed operands. All half way modern build environments would allow using the mnemonics (but we can't disable the hex variants yet, since the binutils around at the time gcc 4.1 got released didn't support these yet). I didn't convert __vmread() yet because that would, just like for __vmread_safe(), imply converting to a macro so that the output operand can be the caller supplied variable rather than an intermediate one. As that would require touching all invocation points of __vmread() (of which there are quite a few), I'd first like to be certain the approach is acceptable; the main question being whether the now conditional code might be considered to cause future maintenance issues, and the second being that of parameter/argument ordering (here I made __vmread_safe() match __vmwrite(), but one could also take the position that read and write should use the inverse order of one another, in line with the actual instruction operands). Additionally I was quite puzzled to find that all the asm()-s involved here have memory clobbers - what are they needed for? Or can they be dropped at least in some cases? Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
*	VMX: streamline entry.S code	Jan Beulich	2013-09-09	2	-62/+33
\| \| \| \| \| \| \| \| \| \| \| \| \|	- move stuff easily/better done in C into C code - re-arrange code paths so that no redundant GET_CURRENT() would remain on the fast paths - move long latency operations earlier - slightly defer disabling interrupts on the VM entry path - use ENTRY() instead of open coding it Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
*	x86/Intel: add support for Haswell CPU models	Jan Beulich	2013-08-27	2	-2/+7
\| \| \| \| \| \| \|	... according to their most recent public documentation. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	VMX: convert EOI exit bitmap to a proper bitmap	Jan Beulich	2013-08-27	2	-32/+19
\| \| \| \| \| \| \| \| \|	... allowing bitmap operations to be used on it, making things consistent with struct pi_desc's pir field, and shrinking overall source code size. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	Revert "interrupts: allow guest to set/clear MSI-X mask bit"	Jan Beulich	2013-08-26	1	-47/+14
\| \| \| \| \|	This reverts commit 54a46bce768033b1c36e25eace15f7abde972389. It's not fully cooked yet.
*	PCI: break MSI-X data out of struct pci_dev_info	Jan Beulich	2013-08-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Considering that a significant share of PCI devices out there (not the least the myriad of CPU-exposed ones) don't support MSI-X at all, and that the amount of data is well beyond a handful of bytes, break this out of the common structure, at once allowing the actual data to be tracked to become architecture specific. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	Correct X2-APIC HVM emulation	Juergen Gross	2013-08-22	1	-0/+1
\| \| \| \| \| \| \| \|	commit 6859874b61d5ddaf5289e72ed2b2157739b72ca5 ("x86/HVM: fix x2APIC APIC_ID read emulation") introduced an error for the hvm emulation of x2apic. Any try to write to APIC_ICR MSR will result in a GP fault. Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
*	Nested VMX: Update APIC-v(RVI/SVI) when vmexit to L1	Yang Zhang	2013-08-22	5	-15/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If enabling APIC-v, all interrupts to L1 are delivered through APIC-v. But when L2 is running, external interrupt will casue L1 vmexit with reason external interrupt. Then L1 will pick up the interrupt through vmcs12. when L1 ack the interrupt, since the APIC-v is enabled when L1 is running, so APIC-v hardware still will do vEOI updating. The problem is that the interrupt is delivered not through APIC-v hardware, this means SVI/RVI/vPPR are not setting, but hardware required them when doing vEOI updating. The solution is that, when L1 tried to pick up the interrupt from vmcs12, then hypervisor will help to update the SVI/RVI/vPPR to make sure the following vEOI updating and vPPR updating corrently. Also, since interrupt is delivered through vmcs12, so APIC-v hardware will not cleare vIRR and hypervisor need to clear it before L1 running. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Acked-by: "Dong, Eddie" <eddie.dong@intel.com>
*	Nested VMX: Clear APIC-v control bit in vmcs02	Yang Zhang	2013-08-22	1	-0/+12
\| \| \| \| \| \| \| \|	There is no vAPIC-v support, so mask APIC-v control bit when constructing vmcs02. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Acked-by: "Dong, Eddie" <eddie.dong@intel.com>
*	Nested VMX: Force check ISR when L2 is running	Yang Zhang	2013-08-22	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	External interrupt is allowed to notify CPU only when it has higher priority than current in servicing interrupt. With APIC-v, the priority comparing is done by hardware and hardware will inject the interrupt to VCPU when it recognizes an interrupt. Currently, there is no virtual APIC-v feature available for L1 to use, so when L2 is running, we still need to compare interrupt priority with ISR in hypervisor instead via hardware. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Acked-by: "Dong, Eddie" <eddie.dong@intel.com>
*	Nested VMX: Check whether interrupt is blocked by TPR	Yang Zhang	2013-08-22	1	-0/+5
\| \| \| \| \| \| \| \|	If interrupt is blocked by L1's TPR, L2 should not see it and keep running. Adding the check before L2 to retrive interrupt. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Acked-by: "Dong, Eddie" <eddie.dong@intel.com>
*	interrupts: allow guest to set/clear MSI-X mask bit	Joby Poriyath	2013-08-20	1	-14/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Guest needs the ability to enable and disable MSI-X interrupts by setting the MSI-X control bit, for a passed-through device. Guest is allowed to write MSI-X mask bit only if Xen thinks that mask is clear (interrupts enabled). If the mask is set by Xen (interrupts disabled), writes to mask bit by the guest is ignored. Currently, a write to MSI-X mask bit by the guest is silently ignored. A likely scenario is where we have a 82599 SR-IOV nic passed through to a guest. From the guest if you do ifconfig <ETH_DEV> down ifconfig <ETH_DEV> up the interrupts remain masked. On VF reset, the mask bit is set by the controller. At this point, Xen is not aware that mask is set. However, interrupts are enabled by VF driver by clearing the mask bit by writing directly to BAR3 region containing the MSI-X table. From dom0, we can verify that interrupts are being masked using 'xl debug-keys M'. Initially, guest was allowed to modify MSI-X bit. Later this behaviour was changed. See changeset 74c213c506afcd74a8556dd092995fd4dc38b225. Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com> [jb: add assertion to msixtbl_pt_register()]
*	VMX: add boot parameter to enable/disable APIC-v dynamically	Yang Zhang	2013-08-13	1	-2/+5
\| \| \| \| \| \| \|	Add a boot parameter to enable/disable the APIC-v dynamically. APIC-v is enabled by default. User can use apicv=0 to disable it. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
*	x86/AMD: Inject #GP instead of #UD when unable to map vmcb	Suravee Suthikulpanit	2013-08-13	1	-10/+14
\| \| \| \| \| \| \| \| \| \|	According to AMD Programmer's Manual vol2, vmrun, vmsave and vmload should inject #GP instead of #UD when unable to access memory location for vmcb. Also, the code should make sure that L1 guest EFER.SVME is not zero. Otherwise, #UD should be injected. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Tim Deegan <tim@xen.org>
*	x86/AMD: Fix nested svm crash due to assertion in __virt_to_maddr	Suravee Suthikulpanit	2013-08-13	1	-9/+43
\| \| \| \| \| \| \| \| \|	Fix assertion in __virt_to_maddr when starting nested SVM guest in debug mode. Investigation has shown that svm_vmsave/svm_vmload make use of __pa() with invalid address. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Tim Deegan <tim@xen.org>
*	Intel/VPMU: Add support for full-width PMC writes	Boris Ostrovsky	2013-08-07	1	-5/+36
\| \| \| \| \| \| \| \| \| \| \|	A recent Linux commit (069e0c3c405814778c7475d95b9fff5318f39834) added support for full-width PMC writes to performance counter registers, making these registers default for perf. Since current Xen VPMU does not support these new MSRs perf will fail to initialise in guests. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com> Acked-by: Keir Faser <keir@xen.org>
*	Viridian: cleanup	Jan Beulich	2013-07-17	1	-4/+4
\| \| \| \| \| \| \| \| \|	- functions used only locally should be static - constify parameters of dump functions Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
*	Viridian: populate CPUID leaf 6	Jan Beulich	2013-07-17	1	-0/+14
\| \| \| \| \| \| \| \| \|	Properly reporting hardware features we use can only help Windows in making decisions towards its own performance tuning. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Yang Zhang <yang.z.zhang@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
*	VMX: suppress pointless indirect calls	Jan Beulich	2013-07-17	1	-10/+8
\| \| \| \| \| \| \| \| \| \| \|	Get the other virtual interrupt delivery related actors in sync with the newly added handle_eoi() one: Clear the respective pointers (thus avoiding the call from generic code) when the feature is unavailable instead of checking feature availability in the actors. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
*	VMX: fix interaction of APIC-V and Viridian emulation	Jan Beulich	2013-07-17	2	-1/+17
\| \| \| \| \| \| \| \| \| \| \| \| \|	Viridian using a synthetic MSR for issuing EOI notifications bypasses the normal in-processor handling, which would clear GUEST_INTR_STATUS.SVI. Hence we need to do this in software in order for future interrupts to get delivered. Based on analysis by Yang Z Zhang <yang.z.zhang@intel.com>. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
*	x86: Special case __HYPERVISOR_iret rather more when writing hypercall pages	Andrew Cooper	2013-07-16	2	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In all cases when a hypercall page is written, __HYPERVISOR_iret is first written as a regular hypercall, then subsequently rewritten in its special case. For VMX and SVM, this means that following the ud2a instruction is 3 bytes of an imm32 parameter. For a ring3 kernel, this means that following the syscall instruction is the second half of 'pop %r11'. For a ring1 kernel, the iret case ends up as the same number of bytes as the rest of the hypercalls, but it is pointless writing it twice, and is changed for consistency. Therefore, skip the loop iteration which would write the incorrect __HYPERVISOR_iret hypercall. This removes junk machine code from the tail and makes disassemblers rather more happy when looking at the hypercall page. Also, a miscellaneous whitespace fix in the comment for ring3 kernel. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
*	x86/HVM: key handler registration functions can be __init	Jan Beulich	2013-07-10	2	-2/+2
\| \| \| \| \| \| \|	This applies to both SVM and VMX. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86: drop MAX_VECTOR definition	Jan Beulich	2013-07-04	1	-3/+3
\| \| \| \| \| \| \| \| \|	.. in favor of NR_VECTORS, as being redundant and as the latter is correct in terms of its naming, while the former is off by one. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>