path: root/xen/arch/x86/x86_64/entry.S
Commit message (Author, Date, Files, Lines)
* x86: Improve information from domain_crash_synchronous (Andrew Cooper, 2013-10-04, 1 file, -24/+24)
  As it currently stands, the string "domain_crash_sync called from entry.S" is not helpful at identifying why the domain was crashed, and a debug build of Xen doesn't help the matter.
  This patch improves the information printed, by pointing to where the crash decision was made. Specific improvements include:
  - Moving the ascii string "domain_crash_sync called from entry.S\n" away from some semi-hot code cache lines.
  - Moving the printk into C code (especially as this_cpu() is miserable to use in assembly code).
  - Undoing the previous confusing situation of having domain_crash_synchronous() as a macro in C code, yet a global symbol in assembly code.
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: Introduce and use GLOBAL() in asm code (Andrew Cooper, 2013-09-09, 1 file, -9/+4)
  Also clean up some cases of misused/opencoded ENTRY().
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
* x86: clear EFLAGS.NT in SYSENTER entry path (Jan Beulich, 2013-04-18, 1 file, -0/+7)
  ... as it causes problems if we happen to exit back via IRET: in the course of trying to handle the fault, the hypervisor creates a stack frame by hand, and uses PUSHFQ to set the respective EFLAGS field, but expects to be able to IRET through that stack frame to the second portion of the fixup code (which causes a #GP due to the stored EFLAGS having NT set). And even if this worked (e.g. if we cleared NT in that path), it would then (through the fail safe callback) cause a #GP in the guest with the SYSENTER handler's first instruction as the source, which in turn would allow guest user mode code to crash the guest kernel.
  Inject a #GP on the fake (NULL) address of the SYSENTER instruction instead, just like in the case where the guest kernel didn't register a corresponding entry point.
  This is CVE-2013-1917 / XSA-44.
  Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
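As a sketch of the flag handling the fix performs (illustrative C, not the Xen assembly itself; the bit position is the architectural definition of EFLAGS.NT):

```c
#include <stdint.h>

/* EFLAGS.NT ("Nested Task") is architecturally bit 14. */
#define X86_EFLAGS_NT (1ULL << 14)

/* What the SYSENTER entry path does in effect: mask NT out of the
 * saved flags, so a later IRET through a hand-crafted frame cannot
 * fault on a stale NT bit inherited from guest user mode. */
static uint64_t clear_nt(uint64_t eflags)
{
    return eflags & ~X86_EFLAGS_NT;
}
```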
* x86: don't rely on __softirq_pending to be the first field in irq_cpustat_t (Jan Beulich, 2013-03-04, 1 file, -3/+3)
  This is even more so as the field doesn't have a comment to that effect in the structure definition.
  Once modifying the respective assembly code, also convert the IRQSTAT_shift users to do a 32-bit shift only (as we won't support 48M CPUs any time soon) and use "cmpl" instead of "testl" when checking the field (both reducing code size).
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: defer processing events on the NMI exit path (Jan Beulich, 2013-03-04, 1 file, -5/+22)
  Otherwise, we may end up in the scheduler, keeping NMIs masked for a possibly unbounded period of time (until whenever the next IRET gets executed). Enforce timely event processing by sending a self IPI.
  Of course it's open for discussion whether to always use the straight exit path from handle_ist_exception.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: Assert !in_atomic() before exiting to guest context. (Keir Fraser, 2013-01-14, 1 file, -0/+1)
  Signed-off-by: Keir Fraser <keir@xen.org>
* x86/kexec: Change NMI and MCE handling on kexec path (Andrew Cooper, 2012-12-13, 1 file, -0/+33)
  Experimentally, certain crash kernels will triple fault very early after starting if started with NMIs disabled. This was discovered when experimenting with a debug keyhandler which deliberately created a reentrant NMI, causing stack corruption.
  Because of this discovered bug, and that the future changes to the NMI handling will make the kexec path more fragile, take the time now to bullet-proof the kexec behaviour to be safer in more circumstances.
  This patch adds three new low level routines:
  - nmi_crash: a special NMI handler for use during a kexec crash.
  - enable_nmis: enables NMIs by executing an iret-to-self, to disengage the hardware NMI latch.
  - trap_nop: a no op handler which irets immediately. It is not declared with ENTRY() to avoid the extra alignment overhead.
  And adds three new IDT entry helper routines:
  - _write_gate_lower: a substitute for using cmpxchg16b to update a 128bit structure at once. It assumes that the top 64 bits are unchanged (and ASSERT()s the fact) and performs a regular write on the lower 64 bits.
  - _set_gate_lower: functionally equivalent to the already present _set_gate(), except it uses _write_gate_lower rather than updating both 64bit values.
  - _update_gate_addr_lower: designed to update an IDT entry handler only, without altering any other settings in the entry. It also uses _write_gate_lower.
  The IDT entry helpers are required because:
  - It is unsafe to attempt a disable/update/re-enable cycle on the NMI or MCE IDT entries.
  - We need to be able to update NMI handlers without changing the IST entry.
  As a result, the new behaviour of the kexec_crash path is:
  nmi_shootdown_cpus() will:
  - Disable the crashing cpu's NMI/MCE interrupt stack tables. Disabling the stack tables removes race conditions which would lead to corrupt exception frames and infinite loops. As this pcpu is never planning to execute a sysret back to a pv vcpu, the update is safe from a security point of view.
  - Swap the NMI trap handlers. The crashing pcpu gets the nop handler, to prevent it getting stuck in an NMI context, causing a hang instead of a crash. The non-crashing pcpus all get the nmi_crash handler, which is designed never to return.
  do_nmi_crash() will:
  - Save the crash notes and shut the pcpu down. There is now an extra per-cpu variable to prevent us from executing this multiple times. In the case where we reenter midway through, attempt the whole operation again in preference to not completing it in the first place.
  - Set up another NMI at the LAPIC. Even when the LAPIC has been disabled, the ID and command registers are still usable. As a result, we can deliberately queue up a new NMI to re-interrupt us later if NMIs get unlatched. Because of the call to __stop_this_cpu(), we have to hand craft self_nmi() to be safe from General Protection Faults.
  - Fall into an infinite loop.
  machine_kexec() will:
  - Swap the MCE handlers to be a nop. We cannot prevent MCEs from being delivered when we pass off to the crash kernel, and the less Xen context is being touched the better.
  - Explicitly enable NMIs.
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Tim Deegan <tim@xen.org>
  Minor style changes.
  Signed-off-by: Keir Fraser <keir@xen.org>
  Committed-by: Keir Fraser <keir@xen.org>
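The _write_gate_lower idea can be sketched in C as follows. This is an illustrative model, not the Xen code: the field packing of a real 64-bit IDT gate is simplified to two raw qwords, which is all the technique needs.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit IDT entry is 16 bytes.  Model it as two aligned qwords:
 * the handler address is split across both halves, but the pieces we
 * want to change here all live in the low qword. */
struct gate {
    uint64_t lo;   /* offset[31:0] pieces, selector, IST, type (simplified) */
    uint64_t hi;   /* offset[63:32], reserved (simplified) */
};

/* Analogue of _write_gate_lower(): instead of a cmpxchg16b over the
 * whole 128-bit entry, assert the top half is unchanged and do one
 * ordinary aligned 64-bit store on the bottom half. */
static void write_gate_lower(struct gate *gate, const struct gate *new)
{
    assert(gate->hi == new->hi);   /* top 64 bits must match */
    gate->lo = new->lo;            /* single aligned store   */
}
```

A single aligned 64-bit store is atomic on x86-64, which is why restricting the update to the low half is safe where a full 128-bit read-modify-write would not be needed.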
* x86: save/restore only partial register state where possible (Jan Beulich, 2012-10-30, 1 file, -10/+11)
  ... and make restore conditional not only upon having saved the state, but also upon whether saved state was actually modified (and register values are known to have been preserved).
  Note that RBP is unconditionally considered a volatile register (i.e. irrespective of CONFIG_FRAME_POINTER), since the RBP handling would become overly complicated due to the need to save/restore it on the compat mode hypercall path [6th argument].
  Note further that for compat mode code paths, saving/restoring R8...R15 is entirely unnecessary - we don't allow those guests to enter 64-bit mode, and hence they have no way of seeing these registers' contents (and there consequently also is no information leak, except if the context saving domctl would be considered such).
  Finally, note that this may not properly deal with gdbstub's needs, yet (but if so, I can't really suggest adjustments, as I don't know that code).
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: use MOV instead of PUSH/POP when saving/restoring register state (Jan Beulich, 2012-10-30, 1 file, -10/+6)
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* trace: rename trace_hypercall() to __trace_hypercall_entry() (David Vrabel, 2012-10-09, 1 file, -2/+2)
  Tracing functions that don't check tb_init_done are (by convention) prefixed with __.
  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Committed-by: Keir Fraser <keir@xen.org>
* x86: enhance rsp-relative calculations (Jan Beulich, 2012-09-26, 1 file, -4/+4)
  The use of "or" in GET_CPUINFO_FIELD so far wasn't ideal, as it doesn't lend itself to folding this operation with a possibly subsequent one (e.g. the well known mov+add=lea conversion). Split out the sub-operations, and shorten assembly code slightly with this.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: use only a single branch for upcall-pending exit path checks (Jan Beulich, 2012-09-12, 1 file, -6/+6)
  This utilizes the fact that the two bytes of interest are adjacent to one another and that the resulting 16-bit values of interest are within a contiguous range of numbers.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
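The trick can be sketched in C (a model under the assumption that the two flag bytes only ever hold 0 or 1, as with the evtchn_upcall_pending/evtchn_upcall_mask pair in the public vcpu_info layout; the struct and function names here are illustrative):

```c
#include <stdint.h>
#include <string.h>

/* Two adjacent status bytes, mirroring the vcpu_info arrangement. */
struct upcall_info {
    uint8_t evtchn_upcall_pending;   /* nonzero: an event is waiting */
    uint8_t evtchn_upcall_mask;      /* nonzero: delivery is masked  */
};

/* Two-branch version: test each byte separately. */
static int must_deliver_slow(const struct upcall_info *v)
{
    return v->evtchn_upcall_pending && !v->evtchn_upcall_mask;
}

/* One-branch version: read both bytes as one little-endian 16-bit
 * value (as on x86).  If each byte is only ever 0 or 1, the possible
 * values are 0x0000, 0x0001, 0x0100, 0x0101 -- and exactly 0x0001
 * means "pending and not masked", so a single compare suffices. */
static int must_deliver_fast(const struct upcall_info *v)
{
    uint16_t w;
    memcpy(&w, v, sizeof(w));
    return w == 0x0001;
}
```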
* x86-64: drop updating of UREGS_rip when converting sysenter to #GP (Jan Beulich, 2012-07-27, 1 file, -5/+2)
  This was set to zero immediately before the #GP injection code, since SYSENTER doesn't really have a return address.
  Reported-by: Ian Campbell <Ian.Campbell@citrix.com>
  Furthermore, UREGS_cs and UREGS_rip don't need to be written a second time, as the PUSHes above already can/do take care of putting in place the intended values.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86_64: Do not execute sysret with a non-canonical return address (Jan Beulich, 2012-06-12, 1 file, -0/+11)
  Check for non-canonical guest RIP before attempting to execute sysret. If sysret is executed with a non-canonical value in RCX, Intel CPUs take the fault in ring0, but we will necessarily already have switched to the user's stack pointer.
  This is a security vulnerability, XSA-7 / CVE-2012-0217.
  Signed-off-by: Jan Beulich <JBeulich@suse.com>
  Signed-off-by: Ian Campbell <Ian.Campbell@citrix.com>
  Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
  Tested-by: Ian Campbell <Ian.Campbell@citrix.com>
  Acked-by: Keir Fraser <keir.xen@gmail.com>
  Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
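The canonical-address test the fix relies on can be sketched in C (assuming 48-bit virtual addressing, i.e. bits 63..47 must all equal bit 47; the function name is illustrative):

```c
#include <stdint.h>

/* A 64-bit virtual address is canonical when bits 63..47 are the
 * sign-extension of bit 47.  SYSRET with a non-canonical RCX faults
 * in ring 0 on Intel CPUs, hence the guest RIP must be checked
 * before taking the SYSRET fast path. */
static int is_canonical(uint64_t addr)
{
    /* Shift the low 48 bits up (defined on the unsigned type), then
     * arithmetic-shift back down to sign-extend from bit 47. */
    return ((int64_t)(addr << 16) >> 16) == (int64_t)addr;
}
```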
* x86_64: Record entry vector for double faults. (Andrew Cooper, 2012-05-25, 1 file, -0/+1)
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Committed-by: Keir Fraser <keir@xen.org>
* x86: don't hold off NMI delivery when MCE is masked (Jan Beulich, 2012-05-22, 1 file, -1/+2)
  Likely through copy'n'paste, all three instances of guest MCE processing jumped to the wrong place (where NMI processing code correctly jumps to) when MCE-s are temporarily masked (due to one currently being processed by the guest). A nested, unmasked NMI should get delivered immediately, however.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86-64: fix updating of UREGS_rip when converting sysenter to #GP (Jan Beulich, 2012-04-17, 1 file, -1/+1)
  (I spotted this copy-and-paste mistake only when backporting c/s 25200:80f4113be500 to 4.1 and 4.0.)
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86-64: fix #GP generation in assembly code (Jan Beulich, 2012-04-17, 1 file, -10/+13)
  When guest use of sysenter (64-bit PV guest) or syscall (32-bit PV guest) gets converted into a GP fault (due to no callback having got registered), we must
  - honor the GP fault handler's request to keep enabled or mask event delivery
  - not allow TBF_EXCEPTION to remain set past the generation of the (guest) exception in the vCPU's trap_bounce.flags, as that would otherwise allow for the next exception occurring in guest mode, should it happen to get handled in Xen itself, to nevertheless get bounced to the guest kernel.
  Also, just like compat mode syscall handling already did, native mode sysenter handling should, when converting to #GP, subtract 2 from the RIP present in the frame so that the guest's GP fault handler would see the fault pointing to the offending instruction instead of past it.
  Finally, since those exception generating code blocks needed to be modified anyway, convert them to make use of UNLIKELY_{START,END}().
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: remove unnecessary ALIGN directives (Jan Beulich, 2011-07-01, 1 file, -2/+0)
  ENTRY() already includes an ALIGN.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: streamline page fault path (Jan Beulich, 2011-07-01, 1 file, -6/+4)
  #PF is, in all "normal" usage models, the only potentially high frequency (and hence performance sensitive) exception. Thus make it the fall-through case into handle_exception (rather than divide_error for x86-32 and not having one at all for x86-64).
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86, vtd: [CVE-2011-1898] Protect against malicious MSIs from untrusted devices. (Keir Fraser, 2011-05-12, 1 file, -0/+8)
  In the absence of VT-d interrupt remapping support, a device can send arbitrary APIC messages to host CPUs. One class of attack that results is to confuse the hypervisor by delivering asynchronous interrupts to vectors that are expected to handle only synchronous traps/exceptions.
  We block this class of attack by:
  (1) setting APIC.TPR=0x10, to block all interrupts below vector 0x20. This blocks delivery to all architectural exception vectors.
  (2) checking APIC.ISR[vec] for vectors 0x80 (fast syscall) and 0x82 (hypercall). In these cases we BUG if we detect we are handling a hardware interrupt -- turning a potentially more severe infiltration into a straightforward system crash (i.e., DoS).
  Thanks to Invisible Things Lab <http://www.invisiblethingslab.com> for discovery and detailed investigation of this attack.
  Signed-off-by: Keir Fraser <keir@xen.org>
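Why TPR=0x10 blocks exactly the vectors below 0x20 follows from the local APIC's priority-class rule: an interrupt's priority class is bits 7:4 of its vector, and it is inhibited whenever that class is less than or equal to the class in TPR bits 7:4. A minimal C model of that rule (the function name is illustrative):

```c
#include <stdint.h>

/* Local APIC TPR gating: an interrupt is blocked when its priority
 * class (vector[7:4]) is <= the TPR class (TPR[7:4]).  TPR = 0x10 is
 * class 1, which inhibits classes 0 and 1, i.e. vectors 0x00-0x1f:
 * precisely the architectural exception vectors. */
static int tpr_blocks(uint8_t vector, uint8_t tpr)
{
    return (vector >> 4) <= (tpr >> 4);
}
```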
* x86: split struct vcpu (Jan Beulich, 2011-04-05, 1 file, -2/+3)
  This is accomplished by splitting the guest_context member, which by itself is larger than a page on x86-64. Quite a number of fields of this structure are completely meaningless for HVM guests, and thus a new struct pv_vcpu gets introduced, which is being overlaid with struct hvm_vcpu in struct arch_vcpu. The one member that is mostly responsible for the large size is trap_ctxt, which now gets allocated separately (unless fitting on the same page as struct arch_vcpu, as is currently the case for x86-32), and only for non-hvm, non-idle domains.
  This change pointed out a latent problem in arch_set_info_guest(), which is permitted to be called on already initialized vCPU-s, but so far copied the new state into struct arch_vcpu without (in this case) actually going through all the necessary accounting/validation steps. The logic gets changed so that the pieces that bypass accounting will at least be verified to be no different from the currently active bits, and the whole change will fail in case they are. The logic does *not* get adjusted here to do full error recovery, that is, partially modified state continues to not get unrolled in case of failure.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: Do not pollute namespace with asm defns of PERFC_*. (Keir Fraser, 2011-01-26, 1 file, -2/+2)
  This fixes the build with perfc=y.
  Signed-off-by: Keir Fraser <keir@xen.org>
* x86-64: use PC-relative exception table entries (Keir Fraser, 2010-12-24, 1 file, -14/+14)
  ... thus allowing to make the entries half their current size. Rather than adjusting all instances to the new layout, abstract the construction of the table entries via a macro (paralleling a similar one in recent Linux).
  Also change the name of the section (to allow easier detection of missed cases) and merge the final resulting output sections into .data.read_mostly.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
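The space saving works because each entry stores 32-bit displacements relative to the entry's own fields instead of two 64-bit absolute addresses. A C sketch of the scheme (illustrative names; real implementations emit the displacements from assembler macros, and the demo objects below stand in for code addresses so that the 32-bit offsets are guaranteed to fit):

```c
#include <stdint.h>

/* Half-size entry: two self-relative 32-bit displacements instead of
 * two absolute 64-bit addresses. */
struct ex_entry {
    int32_t addr;   /* faulting address, relative to &entry->addr */
    int32_t cont;   /* fixup address, relative to &entry->cont    */
};

/* Recover an absolute address by adding the displacement to the
 * address of the field that holds it. */
static uintptr_t ex_addr(const struct ex_entry *e)
{
    return (uintptr_t)&e->addr + (uintptr_t)(intptr_t)e->addr;
}

static uintptr_t ex_cont(const struct ex_entry *e)
{
    return (uintptr_t)&e->cont + (uintptr_t)(intptr_t)e->cont;
}

/* Demo objects in the same object file, so the displacement fits. */
static char fault_site, fixup_site;
static struct ex_entry entry;

static void build_entry(void)
{
    entry.addr = (int32_t)((intptr_t)&fault_site - (intptr_t)&entry.addr);
    entry.cont = (int32_t)((intptr_t)&fixup_site - (intptr_t)&entry.cont);
}
```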
* x86: move early page fault code into .init.text (Keir Fraser, 2010-12-16, 1 file, -0/+2)
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86/asm: allow some unlikely taken branches to be statically predicted this way (Keir Fraser, 2010-12-16, 1 file, -7/+10)
  ... by moving the respective code out of line (into sub-section 1 of the particular section). A few other branches could be eliminated altogether.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86-64: fix restoring of hypercall arguments after trace callout (Keir Fraser, 2010-12-15, 1 file, -2/+2)
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: Fix NMI injection to PV guests (Keir Fraser, 2010-08-05, 1 file, -1/+1)
  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
* x86: Avoid assumptions about C struct layouts from asm code. (Keir Fraser, 2010-07-13, 1 file, -14/+3)
  Largely this involves avoiding assumptions about 'struct cpu_info'.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* x86: fix MCE/NMI injection (Keir Fraser, 2009-12-01, 1 file, -10/+12)
  This attempts to address all the concerns raised in http://lists.xensource.com/archives/html/xen-devel/2009-11/msg01195.html, but I'm nevertheless still not convinced that all aspects of the injection handling really work reliably.
  In particular, while the patch here on top of the fixes for the problems mentioned in the referenced mail also adds code to keep send_guest_trap() from injecting multiple events at a time, I don't think this is the right mechanism - it should be possible to handle NMI/MCE nested within each other.
  Another fix on top of the ones for the earlier described problems is that the vCPU affinity restore logic didn't account for software injected NMIs - these never set cpu_affinity_tmp, but due to it most likely being different from cpu_affinity it would have got restored (to a potentially random value) nevertheless.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Miscellaneous data placement adjustments (Keir Fraser, 2009-10-28, 1 file, -1/+1)
  Make various data items const or __read_mostly where possible/reasonable.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* tmem: Placeholder hypercall. (Keir Fraser, 2009-03-20, 1 file, -0/+2)
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* x86_64: Remove bogus extra do_xsm_op from hypercall_args_table (Keir Fraser, 2009-01-30, 1 file, -1/+0)
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* x86: MCA support. (Keir Fraser, 2008-07-04, 1 file, -3/+32)
  Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
* x86: Change a local label in asm entry stubs to really be local. (Keir Fraser, 2008-05-22, 1 file, -3/+2)
  This prevents it appearing in crash traces, where it can be a bit confusing.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* x86: Clean up NMI delivery logic. (Keir Fraser, 2007-10-29, 1 file, -6/+4)
  Allow set_trap_table vector 2 to be specified as not disabling event delivery, just like any other vector.
  Signed-off-by: Keir Fraser <keir@xensource.com>
* x86: Remove CALLBACKTYPE_sysexit. (Keir Fraser, 2007-10-24, 1 file, -4/+2)
  Looking at the Linux patch as an example, it adds more code and complexity than it removes, for no obvious performance benefit.
  Signed-off-by: Keir Fraser <keir@xensource.com>
* x86-64: syscall/sysenter support for 32-bit apps (Keir Fraser, 2007-10-24, 1 file, -10/+57)
  For both 32-bit apps in 64-bit pv guests and 32on64.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
  Signed-off-by: Keir Fraser <keir@xensource.com>
* x86: Fix xentrace of hypercalls in debug builds of Xen. (Keir Fraser, 2007-10-23, 1 file, -7/+11)
  Based on a patch by Yosuke Iwamatsu <y-iwamatsu@ab.jp.nec.com>
  Signed-off-by: Keir Fraser <keir@xensource.com>
* x86/64: Do not clobber %r11 (user rflags) on syscall from guest userspace to guest kernel. (Keir Fraser, 2007-10-15, 1 file, -1/+2)
  The flags are saved on the guest kernel stack anyway, but some guests rely on %r11 instead.
  Signed-off-by: Keir Fraser <keir@xensource.com>
* xentrace/x86: PV guest tracing extensions. (Keir Fraser, 2007-10-12, 1 file, -0/+12)
  From: George Dunlap <gdunlap@xensource.com>
  Signed-off-by: Keir Fraser <keir@xensource.com>
* x86: Rename math_state_restore() to more logical do_device_not_available(). (Keir Fraser, 2007-10-01, 1 file, -1/+1)
  This follows the naming convention for all other C exception handlers.
  Signed-off-by: Keir Fraser <keir@xensource.com>
* Xen Security Modules: ACM. (kfraser@localhost.localdomain, 2007-08-31, 1 file, -3/+2)
  Signed-off-by: George Coker <gscoker@alpha.ncsc.mil>
* Xen Security Modules: XSM (kfraser@localhost.localdomain, 2007-08-31, 1 file, -0/+2)
  Signed-off-by: George Coker <gscoker@alpha.ncsc.mil>
* x86-64: clear DF for kernel when forwarding syscall. (kfraser@localhost.localdomain, 2007-07-03, 1 file, -0/+1)
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Fix x86/64 failsafe callback handling. (kfraser@localhost.localdomain, 2007-06-21, 1 file, -11/+11)
  Signed-off-by: Keir Fraser <keir@xensource.com>
* x86: machine check exception handling (kfraser@localhost.localdomain, 2007-06-21, 1 file, -19/+21)
  Properly handle MCE (connecting the existing, but so far unused, vendor specific handlers). HVM guests don't own CR4.MCE (and hence can't suppress the exception) anymore, preventing silent machine shutdown.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
  Signed-off-by: Keir Fraser <keir@xensource.com>
* xen: More 'IS_COMPAT' cleanups. (kfraser@localhost.localdomain, 2007-04-27, 1 file, -4/+4)
  Signed-off-by: Keir Fraser <keir@xensource.com>
* xen: Fix up use of trap_bounce structure. (kfraser@localhost.localdomain, 2007-04-25, 1 file, -11/+11)
  Fixes suggested by Jan Beulich.
  Signed-off-by: Keir Fraser <keir@xensource.com>
* xen x86/64: Fix int80 direct trap. (Keir Fraser, 2007-04-06, 1 file, -2/+4)
  It must check for events and also disable interrupts before exiting to guest context. Also sprinkle about some assertions about interrupt-enable status.
  Signed-off-by: Keir Fraser <keir@xensource.com>