path: root/xen/include/asm-x86
Commit log for this path, most recent first (commit message; author, date; files changed, lines removed/added):

* x86/irq: local_irq_restore() should not blindly popf  (Andrew Cooper, 2013-10-22; 1 file, -3/+8)

  local_irq_restore() should only be concerned with possibly changing the
  interrupt flag.  A blind popf could corrupt other system flags.

  While playing in this area, fixup an opencoded use of X86_EFLAGS_IF.

  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

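  A minimal sketch of the idea (illustrative, not the exact Xen macro): act
  only on the IF bit recorded in the saved flags and leave every other
  system flag alone.

      /* Sketch only: restore just the interrupt flag from saved EFLAGS;
       * X86_EFLAGS_IF is the IF bit.  Nothing else in EFLAGS is touched. */
      static inline void sketch_irq_restore(unsigned long flags)
      {
          if ( flags & X86_EFLAGS_IF )
              asm volatile ( "sti" : : : "memory" );
          else
              asm volatile ( "cli" : : : "memory" );
      }
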
* x86/HVM: cache emulated instruction for retry processing  (Jan Beulich, 2013-10-14; 1 file, -0/+3)

  Rather than re-reading the instruction bytes upon retry processing, stash
  away and re-use what we already read.  That way we can be certain that the
  retry won't do something different from what requested the retry, getting
  once again closer to real hardware behavior (where what we use retries for
  is simply a bus operation, not involving redundant decoding of
  instructions).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

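  A sketch of the stash-and-reuse idea; the structure and helper below are
  illustrative, not the actual hvmemul interface.

      #include <stdbool.h>
      #include <stdint.h>

      /* Cache the instruction bytes fetched on the first pass so a retry
       * decodes exactly the same bytes instead of re-reading guest memory
       * (which may have changed in the meantime). */
      struct insn_cache {
          uint8_t      buf[16];   /* bytes stashed on the first pass */
          unsigned int nr_bytes;  /* how many of them are valid      */
      };

      static bool fetch_cached(const struct insn_cache *c, unsigned int idx,
                               uint8_t *byte)
      {
          if ( idx >= c->nr_bytes )
              return false;       /* not cached: do a real fetch instead */
          *byte = c->buf[idx];    /* retry path: reuse the stashed byte  */
          return true;
      }
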
* x86/HVM: fix direct PCI port I/O emulation retry and error handling  (Jan Beulich, 2013-10-14; 1 file, -0/+5)

  dpci_ioport_{read,write}() guest memory access failure handling should be
  modelled after process_portio_intercept()'s (and others): Upon encountering
  an error on other than the first iteration, the count successfully handled
  needs to be stored and X86EMUL_OKAY returned, in order for the generic
  instruction emulator to update register state correctly before reporting
  failure or retrying (both of which would only happen after re-invoking
  emulation).

  Further we leverage (and slightly extend, due to the above mentioned need
  to return X86EMUL_OKAY) the "large MMIO" retry model.

  Note that there is still a special case not explicitly taken care of here:
  While the first retry on the last iteration of a "rep ins" correctly
  recovers the already read data, an eventual subsequent retry is being
  handled by the pre-existing mmio-large logic (through hvmemul_do_io()
  storing the [recovered] data [again], also taking into consideration that
  the emulator converts a single iteration "ins" to ->read_io() plus
  ->write()).

  Also fix an off-by-one in the mmio-large-read logic, and slightly simplify
  the copying of the data.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

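  The convention described above, in sketch form (X86EMUL_OKAY and
  X86EMUL_UNHANDLEABLE are the emulator's return codes; the per-iteration
  callback and the helper itself are hypothetical):

      /* When iteration i (> 0) fails, report the i iterations that did
       * complete as success, so the emulator can update register state
       * (e.g. rCX/rSI/rDI for "rep outs") before the failure or retry is
       * dealt with on re-invocation. */
      typedef int (*portio_iter_fn)(unsigned int port, unsigned int bytes,
                                    unsigned long iter);

      static int rep_portio(portio_iter_fn one, unsigned int port,
                            unsigned int bytes, unsigned long *reps)
      {
          unsigned long i, todo = *reps;

          for ( i = 0; i < todo; ++i )
          {
              if ( one(port, bytes, i) != 0 )
              {
                  if ( i == 0 )
                      return X86EMUL_UNHANDLEABLE; /* or a retry, in real code */
                  *reps = i;            /* count successfully handled */
                  return X86EMUL_OKAY;  /* let register state be committed */
              }
          }
          return X86EMUL_OKAY;
      }
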
* x86: fix bug_line()  (Jan Beulich, 2013-10-14; 1 file, -2/+4)

  Due to the packing into a bit field together with a relocated field, the
  computation can overflow when the relocated field ends up getting a
  negative value stored.  Hence it isn't sufficient to correct the value by 1
  in this case, but we also need to mask the result to the width of the
  original bit field.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

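  In sketch form, with a made-up field width: once the adjustment can
  overflow the packed field, the result has to be masked back to the field's
  width rather than only corrected by 1.

      /* Illustrative only: LINE_BITS stands in for the real bit-field width
       * used by the bug frame; the masking is the point being made above. */
      #define LINE_BITS 14
      #define LINE_MASK ((1u << LINE_BITS) - 1)

      static unsigned int sketch_bug_line(unsigned int packed, int adjust)
      {
          /* Mask after adjusting so an overflowed value wraps back into
           * the original field width instead of growing beyond it. */
          return (packed + adjust) & LINE_MASK;
      }
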
* x86: use {rd,wr}{fs,gs}base when available  (Jan Beulich, 2013-10-11; 2 files, -3/+59)

  ... as being intended to be faster than MSR reads/writes.

  In the case of emulate_privileged_op() also use these in favor of the
  cached (but possibly stale) addresses from arch.pv_vcpu.  This allows
  entirely removing the code that was the subject of XSA-67.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

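  A sketch of the pattern, assuming a feature predicate of my own naming:
  use RDFSBASE when the CPU (and assembler) support it, and fall back to the
  MSR otherwise.

      /* Prefer RDFSBASE over the (slower) MSR read when FSGSBASE is usable. */
      static inline unsigned long sketch_read_fs_base(void)
      {
          unsigned long base;

          if ( cpu_has_fsgsbase )                     /* placeholder check */
              asm volatile ( "rdfsbase %0" : "=r" (base) );
          else
              rdmsrl(MSR_FS_BASE, base);              /* MSR fallback */

          return base;
      }
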
* x86: add address validity check to guest_map_l1e()  (Jan Beulich, 2013-10-11; 1 file, -1/+2)

  Just like for guest_get_eff_l1e(), this prevents internal Xen data that
  happens to be mapped with 1Gb pages from being accessed as page tables
  (and with the wrong memory attribute).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

* x86: correct LDT checks  (Jan Beulich, 2013-10-11; 2 files, -2/+3)

  - MMUEXT_SET_LDT should behave as similarly to the LLDT instruction as
    possible: fail only if the base address is non-canonical.
  - Instead, LDT descriptor accesses should fault if the descriptor address
    ends up being non-canonical (by ensuring this we at once avoid reading
    an entry from the mach-to-phys table and considering it a page table
    entry).
  - Fault propagation on using LDT selectors must distinguish #PF and #GP
    (the latter must be raised for a non-canonical descriptor address, which
    also applies to several other uses of propagate_page_fault(), and hence
    the problem is being fixed there).
  - map_ldt_shadow_page() should properly wrap addresses for 32-bit VMs.

  At once remove the odd invocation of map_ldt_shadow_page() from the
  MMUEXT_SET_LDT handler: There's nothing really telling us that the first
  LDT page is going to be preferred over others.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

* xen/x86: Remove GB macro in asm-x86/config.h  (Julien Grall, 2013-10-08; 1 file, -1/+0)

  Commit 983843e "xen: Add macros MB and GB" introduced a generic GB macro.
  By mistake, the macro in asm-x86/config.h was not removed, which results in
  a compilation error when Xen is built for x86.

  Signed-off-by: Julien Grall <julien.grall@linaro.org>
  CC: Keir Fraser <keir@xen.org>
  CC: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

* x86/HPET: basic cleanup  (Andrew Cooper, 2013-10-08; 1 file, -4/+2)

  * Strip trailing whitespace
  * Remove redundant definitions
  * Update stale documentation links
  * Move hpet_address into __initdata

  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

* xen: add LZ4 decompression support  (Kyungsik Lee, 2013-10-07; 1 file, -0/+1)

  Add support for LZ4 decompression in Xen.  The LZ4 decompression APIs for
  Xen are based on the LZ4 implementation by Yann Collet.

  Benchmark results (PATCH v3), compiler: Linaro ARM gcc 4.6.2

  1. ARMv7, 1.5GHz based board (kernel: linux 3.4, uncompressed kernel size: 14MB)
         Compressed Size   Decompression Speed
     LZO      6.7MB        20.1MB/s, 25.2MB/s (UA)
     LZ4      7.3MB        29.1MB/s, 45.6MB/s (UA)

  2. ARMv7, 1.7GHz based board (kernel: linux 3.7, uncompressed kernel size: 14MB)
         Compressed Size   Decompression Speed
     LZO      6.0MB        34.1MB/s, 52.2MB/s (UA)
     LZ4      6.5MB        86.7MB/s

  - UA: Unaligned memory Access support
  - Latest patch set for LZO applied

  This patch set adds support for an LZ4-compressed kernel.  LZ4 is a very
  fast lossless compression algorithm and it also features an extremely fast
  decoder [1].

  We already have five decompressors, so one question which does arise is
  where we stop adding new ones.  This issue had been discussed and a
  conclusion reached [2].  Russell King said that we should have:
  - one decompressor which is the fastest
  - one decompressor for the highest compression ratio
  - one popular decompressor (eg conventional gzip)
  If we have a replacement one for one of these, then it should do exactly
  that: replace it.

  The benchmark shows an 8% increase in image size vs a 66% increase in
  decompression speed compared to LZO (which has been known as the fastest
  decompressor in the kernel).  Therefore the "fast but may not be small"
  compression title has clearly been taken by LZ4 [3].

  [1] http://code.google.com/p/lz4/
  [2] http://thread.gmane.org/gmane.linux.kbuild.devel/9157
  [3] http://thread.gmane.org/gmane.linux.kbuild.devel/9347

  LZ4 homepage: http://fastcompression.blogspot.com/p/lz4.html
  LZ4 source repository: http://code.google.com/p/lz4/

  Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
  Signed-off-by: Yann Collet <yann.collet.73@gmail.com>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* x86: Improve information from domain_crash_synchronous  (Andrew Cooper, 2013-10-04; 1 file, -0/+4)

  As it currently stands, the string "domain_crash_sync called from entry.S"
  is not helpful at identifying why the domain was crashed, and a debug build
  of Xen doesn't help the matter.

  This patch improves the information printed, by pointing to where the crash
  decision was made.  Specific improvements include:

  * Moving the ascii string "domain_crash_sync called from entry.S\n" away
    from some semi-hot code cache lines.
  * Moving the printk into C code (especially as this_cpu() is miserable to
    use in assembly code).
  * Undoing the previous confusing situation of having
    domain_crash_synchronous() as a macro in C code, yet a global symbol in
    assembly code.

  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

* Nested VMX: fix IA32_VMX_CR4_FIXED1 msr emulation  (Yang Zhang, 2013-10-04; 2 files, -0/+2)

  Currently, a hardcoded value is used for IA32_VMX_CR4_FIXED1.  This is
  wrong.  We should check the guest's CPUID to know which CR4 bits are
  writeable by the guest, and allow the guest to set a bit only when it has
  the corresponding feature.

  Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>

  Cleanup.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Jun Nakajima <jun.nakajima@intel.com>

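  A sketch of the idea with two representative bits; the predicates are
  placeholders for lookups into the guest's CPUID view, not real Xen
  symbols.

      /* Only advertise a CR4 bit as settable if the guest's CPUID view
       * actually contains the matching feature. */
      uint64_t cr4_fixed1 = X86_CR4_VMXE;       /* required under VMX */

      if ( guest_cpuid_has_pge )                /* CPUID.1:EDX.PGE */
          cr4_fixed1 |= X86_CR4_PGE;
      if ( guest_cpuid_has_fsgsbase )           /* CPUID.7:EBX.FSGSBASE */
          cr4_fixed1 |= X86_CR4_FSGSBASE;
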
* VMX: clean up capability checks  (Jan Beulich, 2013-10-04; 1 file, -2/+8)

  VMCS size validation on APs should check against the BP's size.

  No need for a separate cpu_has_vmx_ins_outs_instr_info variable anymore.

  Use proper symbolics.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Jun Nakajima <jun.nakajima@intel.com>

* Nested VMX: check VMX capability before read VMX related MSRs  (Yang Zhang, 2013-10-04; 1 file, -0/+2)

  VMX MSRs are only available when the CPU supports the VMX feature.  In
  addition, the VMX_TRUE* MSRs are only available when bit 55 of the
  VMX_BASIC MSR is set.

  Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>

  Cleanup.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Jun Nakajima <jun.nakajima@intel.com>

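  The bit-55 gate in sketch form (MSR index name as in Xen's msr-index.h;
  the helper is illustrative):

      /* The VMX_TRUE_* control MSRs only exist when bit 55 of IA32_VMX_BASIC
       * is set; reading them unconditionally on older CPUs faults. */
      static bool vmx_true_ctls_available(void)
      {
          uint64_t vmx_basic;

          rdmsrl(MSR_IA32_VMX_BASIC, vmx_basic);
          return (vmx_basic >> 55) & 1;
      }
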
* x86: don't blindly create L3 tables for the direct map  (Jan Beulich, 2013-09-30; 1 file, -1/+1)

  Now that the direct map area can extend all the way up to almost the end of
  address space, this is wasteful.

  Also fold two almost redundant messages in SRAT parsing into one.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

* x86/AMD-Vi: Fix IVRS HPET special->handle override  (Suravee Suthikulpanit, 2013-09-30; 1 file, -2/+5)

  The current logic does not handle the case when the HPET special->handle is
  invalid in the IVRS.  On such a system, the following message is shown:

      (XEN) AMD-Vi: Failed to setup HPET MSI remapping: Wrong HPET

  This patch allows ivrs_hpet[<handle>]=<sbdf> to override the IVRS.  Also,
  it removes struct hpet_sbdf.iommu since it is not used anywhere in the
  code.

  Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

* x86: fix rdrand asm()  (Jan Beulich, 2013-09-26; 1 file, -1/+1)

  Just learned the hard way that at least for non-volatile asm()s gcc indeed
  does what the documentation says: It may move them across jumps (i.e. ahead
  of the cpu_has() check).  While the documentation claims that this can also
  happen for volatile asm()s, if that was the case we'd have many more
  problems in our code (and e.g. Linux would too).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

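  The hazard and the fix in sketch form: the asm must be volatile so gcc
  cannot hoist it above the feature check.  The hex bytes are the usual
  encoding of "rdrand %eax" for assemblers lacking the mnemonic; the
  carry-flag validity check is omitted here for brevity, and the helper name
  is made up.

      static inline unsigned int sketch_rdrand(void)
      {
          unsigned int val = 0;

          if ( cpu_has(&boot_cpu_data, X86_FEATURE_RDRAND) )
              /* volatile: must not be reordered ahead of the check above */
              asm volatile ( ".byte 0x0f,0xc7,0xf0" : "=a" (val) );

          return val;
      }
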
* VMX: drop memory clobbers from vmread/vmwrite wrappers  (Jan Beulich, 2013-09-23; 1 file, -9/+6)

  All effects are properly being described by the asm() constraints.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Jun Nakajima <jun.nakajima@intel.com>

* VMX: also use proper instruction mnemonic for VMREAD  (Jan Beulich, 2013-09-23; 1 file, -8/+13)

  ... when the assembler supports it, following commit cfd54835 ("VMX: use
  proper instruction mnemonics if assembler supports them").  This merely got
  split off from the earlier change because of the significant number of call
  sites needing to be changed.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Jun Nakajima <jun.nakajima@intel.com>

* x86: mark BUG()s and assertion failures as terminal.  (Tim Deegan, 2013-09-19; 1 file, -3/+8)

  This helps avoid static analysis false-positives, and might lead to better
  code density as the compiler knows it doesn't have to restore spilled
  state &c.

  Signed-off-by: Tim Deegan <tim@xen.org>
  Acked-by: Keir Fraser <keir@xen.org>

* hvm/vpmu: Prevent dump handlers from incorrectly mutating state  (Andrew Cooper, 2013-09-16; 1 file, -1/+1)

  Discovered by Coverity, CID 1055181.

  core2_vpmu_dump() was incorrectly setting VPMU_CONTEXT_LOADED when it was
  intending to check for it.

  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

  This would have been avoided if the dump function declared all its pointers
  "const" - doing this now (also in SVM).  Also fixing some indentation
  issues at once.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

* console: buffer and show origin of guest PV writes  (Daniel De Graaf, 2013-09-10; 1 file, -6/+0)

  Guests other than domain 0 using the console output have previously been
  controlled by the VERBOSE #define, but with no designation of which guest's
  output was on the console.  This patch converts the HVM output buffering to
  be used by all domains except the hardware domain (dom0): stripping
  non-printable characters, line buffering the output, and prefixing it with
  the domain ID.  This is especially useful for debugging stub domains during
  early boot.

  Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
  Acked-by: Keir Fraser <keir@xen.org>

* x86/xsave: fix migration from xsave-capable to xsave-incapable host  (Jan Beulich, 2013-09-09; 3 files, -5/+7)

  With CPUID features suitably masked this is supposed to work, but was
  completely broken (i.e. the case wasn't even considered when the original
  xsave save/restore code was written).

  First of all, xsave_enabled() wrongly returned the value of cpu_has_xsave,
  i.e. not even taking into consideration attributes of the vCPU in question.
  Instead this function ought to check whether the guest ever enabled xsave
  support (by writing a [non-zero] value to XCR0).  As a result of this, a
  vCPU's xcr0 and xcr0_accum must no longer be initialized to XSTATE_FP_SSE
  (since that's a valid value a guest could write to XCR0), and the
  xsave/xrstor as well as the context switch code need to suitably account
  for this (by always enforcing at least this part of the state to be
  saved/loaded).

  This involves undoing large parts of c/s 22945:13a7d1f7f62c ("x86: add
  strictly sanity check for XSAVE/XRSTOR") - we need to cleanly distinguish
  between hardware capabilities and vCPU used features.

  Next both HVM and PV save code needed tweaking to not always save the full
  state supported by the underlying hardware, but just the parts that the
  guest actually used.  Similarly the restore code should bail not just on
  state being restored that the hardware cannot handle, but also on
  inconsistent save state (inconsistent XCR0 settings or size of saved state
  not in line with XCR0).

  And finally the PV extended context get/set code needs to use slightly
  different logic than the HVM one, as here we can't just key off of
  xsave_enabled() (i.e. avoid doing anything if a guest doesn't use xsave)
  because the tools use this function to determine host capabilities as well
  as read/write vCPU state.  The set operation in particular needs to be
  capable of cleanly dealing with input that consists of only the xcr0 and
  xcr0_accum values (if they're both zero then no further data is required).

  While for things to work correctly both sides (saving _and_ restoring host)
  need to run with the fixed code, afaict no breakage should occur if either
  side isn't up to date (other than the breakage that this patch attempts to
  fix).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
  Acked-by: Keir Fraser <keir@xen.org>

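  The first point above, as a sketch: "has this vCPU ever enabled xsave?"
  has to look at the vCPU's own XCR0 history (xcr0_accum), not at the host
  capability bit.  The helper name is made up; the field is the one named in
  the description.

      /* Illustrative check: a vCPU counts as using xsave only if it ever
       * wrote a non-zero value to XCR0. */
      static bool sketch_xsave_enabled(const struct vcpu *v)
      {
          return v->arch.xcr0_accum != 0;
      }
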
* x86: Introduce and use GLOBAL() in asm code  (Andrew Cooper, 2013-09-09; 1 file, -0/+3)

  Also clean up some cases of misused/opencoded ENTRY().

  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

* SVM: streamline entry.S code  (Jan Beulich, 2013-09-09; 1 file, -2/+8)

  - fix a bogus "test" with zero immediate
  - move stuff easily/better done in C into C code
  - re-arrange code paths so that no redundant GET_CURRENT() would remain
    on the fast paths
  - move long latency operations earlier
  - slightly defer disabling global interrupts on the VM entry path

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Reviewed-by: Tim Deegan <tim@xen.org>

* VMX: use proper instruction mnemonics if assembler supports them  (Jan Beulich, 2013-09-09; 1 file, -20/+70)

  With the hex byte emission we were taking away a good part of flexibility
  from the compiler, as for simplicity reasons these were built using fixed
  operands.  All half way modern build environments would allow using the
  mnemonics (but we can't disable the hex variants yet, since the binutils
  around at the time gcc 4.1 got released didn't support these yet).

  I didn't convert __vmread() yet because that would, just like for
  __vmread_safe(), imply converting to a macro so that the output operand can
  be the caller supplied variable rather than an intermediate one.  As that
  would require touching all invocation points of __vmread() (of which there
  are quite a few), I'd first like to be certain the approach is acceptable;
  the main question being whether the now conditional code might be
  considered to cause future maintenance issues, and the second being that of
  parameter/argument ordering (here I made __vmread_safe() match __vmwrite(),
  but one could also take the position that read and write should use the
  inverse order of one another, in line with the actual instruction
  operands).

  Additionally I was quite puzzled to find that all the asm()-s involved here
  have memory clobbers - what are they needed for?  Or can they be dropped at
  least in some cases?

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Reviewed-by: Tim Deegan <tim@xen.org>

* VMX: move various uses of UD2 out of fast paths  (Jan Beulich, 2013-09-09; 2 files, -6/+43)

  ... at once making conditional forward jumps, which are statically
  predicted to be not taken, only used for the unlikely (error) cases.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Reviewed-by: Tim Deegan <tim@xen.org>

* VMX: streamline entry.S code  (Jan Beulich, 2013-09-09; 1 file, -0/+15)

  - move stuff easily/better done in C into C code
  - re-arrange code paths so that no redundant GET_CURRENT() would remain
    on the fast paths
  - move long latency operations earlier
  - slightly defer disabling interrupts on the VM entry path
  - use ENTRY() instead of open coding it

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Reviewed-by: Tim Deegan <tim@xen.org>

* xen/x86: don't use '.ifnes' in bug frame construction.  (Tim Deegan, 2013-08-30; 1 file, -6/+8)

  Spotted because it breaks the clang build for LLVM <3.2.

  .ifnes is not right here as it will choke on a string with embedded quotes.
  .ifnb would be better except that LLVM <3.2 doesn't support that either
  (and nor does binutils 2.16).

  It should be possible to use something like !!msg or !!msg[0] instead of a
  separate flag, but I gave up trying to find something that would make it
  through CPP, asm() and gas as a usable constant. :|

  Signed-off-by: Tim Deegan <tim@xen.org>
  Acked-by: Jan Beulich <jbeulich@suse.com>

* x86/mwait_idle: export both C1 and C1E  (Len Brown, 2013-08-30; 1 file, -0/+2)

  Here we disable HW promotion of C1 to C1E and export both C1 and C1E as
  distinct C-states.  This allows a cpuidle governor to choose a lower
  latency C-state than C1E when necessary to satisfy performance and QOS
  constraints -- and still save power versus polling.

  This also corrects the erroneous latency previously reported for C1E -- it
  is 10usec, not 1usec.

  Signed-off-by: Len Brown <len.brown@intel.com>

  Avoided the effect of changing the meaning of "max_cstate=".

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* x86/mwait_idle: remove assumption of one C-state per MWAIT flag  (Len Brown, 2013-08-30; 1 file, -1/+0)

  Remove the assumption that cstate_tables are indexed by MWAIT flag values.
  Each entry identifies itself via its own flags value.  This change is
  needed to support multiple states that share the same MWAIT flags.

  Note that this can have an effect on what state is described by 'N' on
  cmdline max_cstate=N on some systems.

  Signed-off-by: Len Brown <len.brown@intel.com>

  Avoided the effect of changing the meaning of "max_cstate=".  Drop
  MWAIT_MAX_NUM_CSTATES (done differently in a prior patch on Linux).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* x86/xsave: initialization improvements  (Jan Beulich, 2013-08-30; 1 file, -1/+1)

  - properly validate available feature set on APs
  - also validate xsaveopt availability on APs
  - properly indicate whether the initialization is on the BSP (we shouldn't
    be using "cpu == 0" checks for this)

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* AMD IOMMU: allow command line overrides for broken IVRS tables  (Jan Beulich, 2013-08-29; 1 file, -0/+1)

  With there being so many systems with broken ACPI tables, and with it
  generally being known what's wrong with those tables, give people a handle
  to overcome the resulting disabling of their IOMMUs.

  Inspired by Linux side patches providing similar functionality.

  Suggested-by: Sander Eikelenboom <linux@eikelenboom.it>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Suravee Suthikulpanit <suravee.suthikulapanit@amd.com>

* x86/time: remove Cyclone as a platform timer  (Matt Wilson, 2013-08-27; 1 file, -1/+0)

  The Cyclone time source was part of IBM's Summit chipset, which was only
  used for 32-bit only ccNUMA and IA-64 machines.  Neither of these are
  supported by Xen anymore.

  Signed-off-by: Matt Wilson <msw@amazon.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

* x86/apic: remove Summit support  (Matt Wilson, 2013-08-27; 2 files, -109/+1)

  IBM's Summit chipset was only used for 32-bit only Intel ccNUMA and IA-64
  machines, neither of which are supported by Xen anymore.

  Signed-off-by: Matt Wilson <msw@amazon.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

* VMX: convert EOI exit bitmap to a proper bitmap  (Jan Beulich, 2013-08-27; 1 file, -9/+3)

  ... allowing bitmap operations to be used on it, making things consistent
  with struct pi_desc's pir field, and shrinking overall source code size.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* un-alias cpumask_any() from cpumask_first()  (Jan Beulich, 2013-08-23; 1 file, -0/+16)

  In order to achieve more symmetric distribution of certain things,
  cpumask_any() shouldn't always pick the first CPU (which frequently will
  end up being CPU0).  To facilitate that, introduce a library-like function
  to obtain random numbers.

  The per-architecture function is supposed to return zero if no valid random
  number can be obtained (implying that if occasionally zero got produced as
  random number, it wouldn't be considered such).

  As fallback this uses the trivial algorithm from the C standard, extended
  to produce "unsigned int" results.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

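  A sketch of the fallback generator (naming my own): the C standard's
  example rand() recurrence, iterated so that its 15 usable bits per step
  fill a whole unsigned int.

      /* C-standard example recurrence, run enough times to cover all bits
       * of an unsigned int (each step only yields 15 usable bits). */
      static unsigned int sketch_rand(unsigned int *seed)
      {
          unsigned int i, val = 0;

          for ( i = 0; i < (sizeof(val) * 8 + 14) / 15; ++i )
          {
              *seed = *seed * 1103515245 + 12345;
              val = (val << 15) | ((*seed >> 16) & 0x7fff);
          }

          return val;
      }
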
* PCI: break MSI-X data out of struct pci_dev_info  (Jan Beulich, 2013-08-23; 1 file, -0/+16)

  Considering that a significant share of PCI devices out there (not the
  least the myriad of CPU-exposed ones) don't support MSI-X at all, and that
  the amount of data is well beyond a handful of bytes, break this out of the
  common structure, at once allowing the actual data to be tracked to become
  architecture specific.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* x86: move struct bug_frame instances out of line  (Jan Beulich, 2013-08-23; 2 files, -40/+47)

  Just like Linux did many years ago, move them into a separate (data)
  section, such that they no longer pollute instruction caches and TLBs.

  Assertion frames, requiring two pointers to be stored, occupy two slots in
  the array, with the second slot mimicking a frame the location pointer of
  which doesn't match any address within .text or .init.text (it effectively
  points back to the slot itself, which - being in a data section - can't be
  reached by non-buggy execution).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* Nested VMX: Update APIC-v(RVI/SVI) when vmexit to L1  (Yang Zhang, 2013-08-22; 3 files, -2/+4)

  If APIC-v is enabled, all interrupts to L1 are delivered through APIC-v.
  But when L2 is running, an external interrupt will cause an L1 vmexit with
  reason "external interrupt", and L1 will then pick up the interrupt through
  vmcs12.  When L1 acks the interrupt, the APIC-v hardware will still perform
  vEOI updating, since APIC-v is enabled while L1 is running.  The problem is
  that the interrupt was not delivered through the APIC-v hardware, which
  means SVI/RVI/vPPR are not set, yet the hardware requires them when doing
  the vEOI update.

  The solution is that when L1 picks up the interrupt from vmcs12, the
  hypervisor helps to update SVI/RVI/vPPR, to make sure the following vEOI
  and vPPR updates happen correctly.

  Also, since the interrupt is delivered through vmcs12, the APIC-v hardware
  will not clear vIRR, so the hypervisor needs to clear it before L1 runs.

  Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
  Acked-by: "Dong, Eddie" <eddie.dong@intel.com>

* xen: remove evtchn_upcall_mask from interface on ARM  (Ian Campbell, 2013-08-20; 2 files, -5/+6)

  On ARM event-channel upcalls are masked using the hardware's interrupt mask
  bit and not by a software bit.  Leaving this field present in the interface
  has caused some confusion already and is liable to mean it gets
  inadvertently used in the future.

  So arrange for this field to be turned into a padding field on ARM by
  introducing a XEN_HAVE_PV_UPCALL_MASK define.

  This bit is also unused for x86 PV-on-HVM guests, but we can't
  realistically distinguish those from x86 PV guests in the headers.

  Add a per-arch vcpu_event_delivery_is_enabled function to replace an open
  coded use of evtchn_upcall_mask in common code (in a debug keyhandler).
  The existing local_event_delivery_is_enabled, which operates only on
  current, was unimplemented on ARM and unused on x86, so remove it.

  ifdef the use of evtchn_upcall_mask when setting up a new vcpu info page.

  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

* xen: only expose start_info on architectures which have a PV boot path  (Ian Campbell, 2013-08-20; 1 file, -0/+1)

  Most of this struct is PV MMU specific and it is not used on ARM at all.

  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Jan Beulich <JBeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

* watchdog: Move watchdog from being x86 specific to common code  (Andrew Cooper, 2013-08-13; 2 files, -4/+1)

  Augment watchdog_setup() to be able to possibly return an error, and
  introduce watchdog_enabled() as a better alternative to knowing the
  architecture's internal details.

  This patch does not change the x86 implementation, beyond making it
  compile.

  For header files, some includes of xen/nmi.h were only for the watchdog
  functions, so are replaced rather than adding an extra include of
  xen/watchdog.h.

  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

* x86/AMD: Inject #GP instead of #UD when unable to map vmcb  (Suravee Suthikulpanit, 2013-08-13; 1 file, -1/+1)

  According to the AMD Programmer's Manual vol2, vmrun, vmsave and vmload
  should inject #GP instead of #UD when unable to access the memory location
  for the VMCB.  Also, the code should make sure that the L1 guest's
  EFER.SVME is not zero; otherwise, #UD should be injected.

  Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
  Reviewed-by: Tim Deegan <tim@xen.org>

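  The check order described above in sketch form (the injection helper and
  trap/EFER constants are the ones Xen commonly uses for hardware
  exceptions; guest_efer and vmcb_page are illustrative locals, not the real
  code):

      /* EFER.SVME clear           -> #UD
       * VMCB address not mappable -> #GP  */
      if ( !(guest_efer & EFER_SVME) )
          hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
      else if ( vmcb_page == NULL )
          hvm_inject_hw_exception(TRAP_gp_fault, 0);
      else
      {
          /* proceed with vmrun/vmsave/vmload emulation */
      }
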
* x86/AMD: Fix nested svm crash due to assertion in __virt_to_maddr  (Suravee Suthikulpanit, 2013-08-13; 1 file, -4/+7)

  Fix the assertion in __virt_to_maddr when starting a nested SVM guest in
  debug mode.  Investigation has shown that svm_vmsave/svm_vmload make use of
  __pa() with an invalid address.

  Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
  Reviewed-by: Tim Deegan <tim@xen.org>

* x86: enable multi-vector MSI  (Jan Beulich, 2013-08-08; 2 files, -1/+2)

  This implies
  - extending the public interface to have a way to request a block of MSIs
  - allocating a block of contiguous pIRQ-s for the target domain (but note
    that the Xen IRQs allocated have no need of being contiguous)
  - repeating certain operations for all involved IRQs
  - fixing multi_msi_enable()
  - adjusting the mask bit accesses for maskable MSIs

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

* Intel/VPMU: Add support for full-width PMC writes  (Boris Ostrovsky, 2013-08-07; 1 file, -1/+1)

  A recent Linux commit (069e0c3c405814778c7475d95b9fff5318f39834) added
  support for full-width writes to performance counter registers, making
  these registers the default for perf.  Since the current Xen VPMU does not
  support these new MSRs, perf will fail to initialise in guests.

  Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
  Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
  Acked-by: Keir Fraser <keir@xen.org>

* x86: don't use destroy_xen_mappings() for vunmap()  (Jan Beulich, 2013-07-17; 1 file, -0/+1)

  Its attempt to tear down intermediate page table levels may race with
  map_pages_to_xen() establishing them, and now that map_domain_page_global()
  is backed by vmap() this teardown is also wasteful (as it's very likely to
  need the same address space populated again within foreseeable time).

  As the race between vmap() and vunmap(), according to the latest stage
  tester logs, doesn't appear to be the only one still left, the patch also
  adds logging for vmap() and vunmap() uses (there shouldn't be too many of
  them, so logs shouldn't get flooded).  These are supposed to get removed
  (and are made to stand out clearly) as soon as we're certain that there's
  no issue left.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* VMX: fix interaction of APIC-V and Viridian emulation  (Jan Beulich, 2013-07-17; 1 file, -0/+1)

  Viridian using a synthetic MSR for issuing EOI notifications bypasses the
  normal in-processor handling, which would clear GUEST_INTR_STATUS.SVI.
  Hence we need to do this in software in order for future interrupts to get
  delivered.

  Based on analysis by Yang Z Zhang <yang.z.zhang@intel.com>.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>

* AMD IOMMU: allocate IRTE entries instead of using a static mapping  (Jan Beulich, 2013-07-16; 3 files, -7/+8)

  For multi-vector MSI, where we surely don't want to allocate contiguous
  vectors and be able to set affinities of the individual vectors separately,
  we need to drop the use of the tuple of vector and delivery mode to
  determine the IRTE to use, and instead allocate IRTEs (which imo should
  have been done from the beginning).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>