path: root/xen/arch/x86/x86_64
* x86: check for canonical address before doing page walks (Jan Beulich, 2013-10-11; 2 files, -1/+3)
  ... as there doesn't really exist any valid mapping for them. Particularly in the case of do_page_walk() this also avoids returning non-NULL for such invalid input.
  Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
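  A minimal sketch of the kind of check involved (the helper name mirrors Xen's is_canonical_address(); treat the exact definition as an assumption): with 48-bit virtual addresses, an address is canonical iff bits 63..47 are all copies of bit 47.

      #include <stdbool.h>
      #include <stdint.h>

      static inline bool is_canonical_address(uint64_t addr)
      {
          /* Arithmetic shifts by 47 and 63 agree iff bits 63..47 are a
           * sign-extension of bit 47. */
          return ((int64_t)addr >> 47) == ((int64_t)addr >> 63);
      }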
* x86: Improve information from domain_crash_synchronous (Andrew Cooper, 2013-10-04; 2 files, -28/+31)
  As it currently stands, the string "domain_crash_sync called from entry.S" is not helpful for identifying why the domain was crashed, and a debug build of Xen doesn't help the matter.
  This patch improves the information printed by pointing at where the crash decision was made. Specific improvements include:
  * Moving the ASCII string "domain_crash_sync called from entry.S\n" away from some semi-hot code cache lines.
  * Moving the printk into C code (especially as this_cpu() is miserable to use in assembly code).
  * Undoing the previous confusing situation of having domain_crash_synchronous() as a macro in C code, yet a global symbol in assembly code.
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: don't blindly create L3 tables for the direct map (Jan Beulich, 2013-09-30; 1 file, -17/+12)
  Now that the direct map area can extend all the way up to almost the end of the address space, doing so blindly is wasteful.
  Also fold two almost-redundant messages in SRAT parsing into one.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: fix compat guest handling of XENPF_enter_acpi_sleep (Jan Beulich, 2013-09-26; 1 file, -11/+7)
  Rather than blindly defining the native name to the compat one, when we want to pass the compat structure to a native function we ought to verify that their layouts match. With a respective xlat.lst entry there is then also no need to do such aliasing anymore.
  While cleaning up that file I also noticed that the Cx and Px interface handling here has quite a few unnecessary #define-s; delete them.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* x86: Introduce and use GLOBAL() in asm code (Andrew Cooper, 2013-09-09; 1 file, -9/+4)
  Also clean up some cases of misused/open-coded ENTRY().
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
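  Such assembler annotation macros conventionally look like this (a sketch; the alignment value is an assumption, not necessarily Xen's exact definition):

      /* Emit a global symbol at the current position. */
      #define GLOBAL(name)    \
          .globl name;        \
          name:

      /* An entry point: a global symbol, aligned. */
      #define ENTRY(name)     \
          .align 16;          \
          GLOBAL(name)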
* watchdog/crash: Always disable watchdog in console_force_unlock() (Andrew Cooper, 2013-08-13; 1 file, -2/+0)
  Depending on the state of the conring and serial_tx_buffer, console_force_unlock() can be a long-running operation, usually because of serial_start_sync().
  XenServer testing has found a reliable case where console_force_unlock() on one PCPU takes long enough for another PCPU to time out due to the watchdog (such as while waiting for a TLB flush call-in). The watchdog timeout causes the second PCPU to repeat the console_force_unlock(), at which point the first PCPU typically fails an assertion in spin_unlock_irqrestore(&port->tx_lock) (because the tx_lock has been unlocked behind it).
  console_force_unlock() is only used on emergency paths, so one way or another the host is going down. Disable the watchdog before forcing the console lock, to help prevent PCPUs competing with each other to bring the host down.
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
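  A sketch of the resulting shape of console_force_unlock() (simplified; the helpers exist in the Xen tree, but this is not the verbatim function):

      void console_force_unlock(void)
      {
          /* Emergency path: stop the NMI watchdog first, so the possibly
           * slow synchronous flush below can't trigger a watchdog timeout
           * and a re-entrant force-unlock on another PCPU. */
          watchdog_disable();
          spin_lock_init(&console_lock);
          serial_force_unlock(sercon_handle);
          console_start_sync();
      }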
* watchdog: Move watchdog from being x86 specific to common code (Andrew Cooper, 2013-08-13; 1 file, -0/+1)
  Augment watchdog_setup() to be able to return an error, and introduce watchdog_enabled() as a better alternative to knowing the architecture's internal details. This patch does not change the x86 implementation, beyond making it compile.
  For header files, some includes of xen/nmi.h were only there for the watchdog functions, so they are replaced rather than adding an extra include of xen/watchdog.h.
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: Special case __HYPERVISOR_iret rather more when writing hypercall pages (Andrew Cooper, 2013-07-16; 2 files, -2/+8)
  In all cases when a hypercall page is written, __HYPERVISOR_iret is first written as a regular hypercall, then subsequently rewritten in its special-cased form. For VMX and SVM, this means that following the ud2a instruction are 3 bytes of an imm32 parameter. For a ring3 kernel, this means that following the syscall instruction is the second half of 'pop %r11'. For a ring1 kernel, the iret case ends up as the same number of bytes as the rest of the hypercalls, but it is pointless to write it twice, so it is changed for consistency.
  Therefore, skip the loop iteration which would write the incorrect __HYPERVISOR_iret hypercall. This removes junk machine code from the tail and makes disassemblers rather happier when looking at the hypercall page.
  Also, a miscellaneous whitespace fix in the comment for ring3 kernels.
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
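  The description boils down to one skipped loop iteration; a sketch (the 32-byte stub stride matches the hypercall page layout, while the writer helper is hypothetical):

      for ( i = 0; i < (PAGE_SIZE / 32); i++ )
      {
          if ( i == __HYPERVISOR_iret )
              continue;   /* don't emit the regular stub; no junk tail */
          write_stub(hypercall_page + i * 32, i);
      }
      /* ... then emit the special-cased __HYPERVISOR_iret stub ... */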
* x86: add locking to map_pages_to_xen() (Jan Beulich, 2013-07-15; 1 file, -62/+0)
  While boot-time calls don't need this, run-time uses of the function which may result in L2 page tables getting populated need to be serialized, to avoid two CPUs populating the same L2 (or L3) entry and overwriting each other's results.
  This is expected to fix what would seem to be a regression from commit b0581b92 ("x86: make map_domain_page_global() a simple wrapper around vmap()"), albeit that change only made the already existing issue more readily visible.
  This patch intentionally does not:
  - add locking to the page table de-allocation logic in destroy_xen_mappings() (the only user having potential races here, msix_put_fixmap(), gets converted to use __set_fixmap() instead)
  - avoid races between superpage splitting and reconstruction in map_pages_to_xen() (no such uses exist; races between multiple splitting attempts or between multiple reconstruction attempts are taken care of)
  If we wanted to take care of these, we'd need to alter the behavior of virt_to_xen_l?e(): they would need to return with the lock held.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
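  A sketch of the populate-under-lock pattern described above (the lock and helper names are assumptions loosely modelled on the Xen tree):

      static DEFINE_SPINLOCK(map_pgdir_lock);

      if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
      {
          l2_pgentry_t *l2t = alloc_xen_pagetable();

          clear_page(l2t);
          spin_lock(&map_pgdir_lock);
          if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )  /* re-check */
              l3e_write(pl3e, l3e_from_paddr(__pa(l2t), __PAGE_HYPERVISOR));
          else
              free_xen_pagetable(l2t);   /* another CPU won the race */
          spin_unlock(&map_pgdir_lock);
      }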
* x86: drop setup_idle_pagetable() (Jan Beulich, 2013-06-12; 1 file, -8/+0)
  With vcpu->domain->arch.perdomain_l3_pg no longer getting set up for the idle domain, this creates an invalid L4 entry (due to translating a NULL struct page_info pointer to a physical address).
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
* also move compat mode VCPUOP_register_vcpu_info into common code (Jan Beulich, 2013-05-13; 1 file, -9/+0)
  Otherwise, with arch_compat_vcpu_op() calling arch_do_vcpu_op() to handle it, the operation results in -ENOSYS for 32-bit x86 domains after 6ff9e4f7 ("xen: move VCPUOP_register_vcpu_info to common code").
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: make page table unpinning preemptible (Jan Beulich, 2013-05-02; 1 file, -5/+18)
  ... as it may take significant amounts of time.
  Since we can't re-invoke the operation in a second attempt, the continuation logic must be slightly tweaked so that we make sure do_mmuext_op() gets run one more time even when the preempted unpin operation was the last one in a batch.
  This is part of CVE-2013-1918 / XSA-45.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Tim Deegan <tim@xen.org>
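  In sketch form, the continuation pattern being tweaked (argument packing follows the general shape of do_mmuext_op(); treat the details as assumptions rather than the exact patch):

      for ( i = 0; i < count; i++ )
      {
          rc = process_one_op(&op);                /* may be preempted  */
          if ( rc == -EAGAIN ||                    /* op itself yielded */
               (i + 1 < count && hypercall_preempt_check()) )
          {
              /* Encode the remaining work into the count argument so the
               * guest re-issues the hypercall; note this must also fire
               * when the preempted (unrepeatable) unpin was the last op
               * in the batch, so do_mmuext_op() gets run once more. */
              rc = hypercall_create_continuation(
                  __HYPERVISOR_mmuext_op, "hihi", uops,
                  (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
              break;
          }
      }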
* x86: make vcpu_destroy_pagetables() preemptible (Jan Beulich, 2013-05-02; 1 file, -1/+1)
  ... as it may take significant amounts of time.
  The function is moved to mm.c as the better home for it anyway, and, to avoid having to make a new helper function there non-static, is temporarily given a "preemptible" parameter (until, in a subsequent patch, its other caller is also made capable of dealing with preemption).
  This is part of CVE-2013-1918 / XSA-45.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Tim Deegan <tim@xen.org>
* x86: clear EFLAGS.NT in SYSENTER entry path (Jan Beulich, 2013-04-18; 1 file, -0/+7)
  ... as it causes problems if we happen to exit back via IRET: in the course of trying to handle the fault, the hypervisor creates a stack frame by hand, and uses PUSHFQ to set the respective EFLAGS field, but expects to be able to IRET through that stack frame to the second portion of the fixup code (which causes a #GP due to the stored EFLAGS having NT set). And even if this worked (e.g. if we cleared NT in that path), it would then (through the failsafe callback) cause a #GP in the guest with the SYSENTER handler's first instruction as the source, which in turn would allow guest user-mode code to crash the guest kernel.
  Inject a #GP on the fake (NULL) address of the SYSENTER instruction instead, just like in the case where the guest kernel didn't register a corresponding entry point.
  This is CVE-2013-1917 / XSA-44.
  Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
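  A sketch of the entry-path fix (instruction sequence illustrative; EFLAGS.NT is bit 14, mask 0x4000):

      pushfq
      andl  $~0x4000, (%rsp)    # clear NT in the saved flags word
      popfq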
* x86: don't rely on __softirq_pending to be the first field in irq_cpustat_t (Jan Beulich, 2013-03-04; 3 files, -7/+8)
  This is even more warranted as the field doesn't have a comment to that effect in the structure definition.
  While modifying the respective assembly code, also convert the IRQSTAT_shift users to do a 32-bit shift only (as we won't support 48M CPUs any time soon), and use "cmpl" instead of "testl" when checking the field (both reducing code size).
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: defer processing events on the NMI exit path (Jan Beulich, 2013-03-04; 2 files, -6/+23)
  Otherwise, we may end up in the scheduler, keeping NMIs masked for a possibly unbounded period of time (until whenever the next IRET gets executed). Enforce timely event processing by sending a self-IPI instead.
  Of course it's open for discussion whether to always use the straight exit path from handle_ist_exception.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
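  A sketch of the deferral (the vector and check use conventional Xen names; their exact use here is an assumption): rather than running softirq/scheduler work with NMIs still masked, post a self-IPI, which is delivered as soon as the IRET from the NMI handler unmasks NMIs.

      if ( softirq_pending(smp_processor_id()) )
          send_IPI_self(EVENT_CHECK_VECTOR);  /* handled with NMIs unmasked */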
* x86: rework hypercall argument translation area setup (Jan Beulich, 2013-02-28; 1 file, -15/+5)
  ... using the new per-domain mapping management functions, adding destroy_perdomain_mapping() to the previously introduced pair.
  Rather than using an order-1 Xen heap allocation, use (currently two) individual domain heap pages to populate space in the per-domain mapping area.
  Also fix a benign off-by-one mistake in is_compat_arg_xlat_range().
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* xen: consolidate implementations of LOG() macro (Ian Campbell, 2013-02-22; 1 file, -7/+1)
  arm64 is going to add another one shortly, so take control now.
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Tim Deegan <tim@xen.org>
  Acked-by: Keir Fraser <keir@xen.org>
* Fix emacs local variable block to use correct C style variable. (David Vrabel, 2013-02-21; 10 files, -10/+10)
  The emacs variable to set the C style from a local variable block is c-file-style, not c-set-style.
  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* xen: Define debug_build() based on NDEBUG. Use it in a few printk's. (Keir Fraser, 2013-01-30; 1 file, -6/+1)
  Signed-off-by: Keir Fraser <keir@xen.org>
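  Presumably along these lines (a sketch; the real definition may differ, e.g. be an inline function rather than a macro):

      #ifdef NDEBUG
      #define debug_build() 0
      #else
      #define debug_build() 1
      #endif

      /* e.g.: printk("%s build of Xen\n", debug_build() ? "debug" : "release"); */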
* x86: support up to 16Tb (Jan Beulich, 2013-01-23; 1 file, -4/+17)
  This mainly involves adjusting the number of L4 entries needing copying between page tables (which is now different between PV and HVM/idle domains), and changing the cutoff point and method when more than the supported amount of memory is found in a system.
  Since TMEM doesn't currently cope with the full 1:1 map not always being visible, it gets forcefully disabled in that case.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* x86: properly use map_domain_page() during page table manipulation (Jan Beulich, 2013-01-23; 3 files, -40/+40)
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: properly use map_domain_page() during domain creation/destruction (Jan Beulich, 2013-01-23; 1 file, -11/+7)
  This involves no longer storing virtual addresses of the per-domain mapping L2 and L3 page tables.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: extend frame table virtual space (Jan Beulich, 2013-01-23; 1 file, -2/+2)
  ... to allow frames for up to 16Tb.
  At the same time, add the super page frame table coordinates to the comment describing the address space layout.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: introduce virt_to_xen_l1e() (Jan Beulich, 2013-01-23; 1 file, -0/+22)
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* miscellaneous cleanup (Jan Beulich, 2013-01-17; 1 file, -16/+11)
  ... noticed while putting together the 16Tb support patches for x86.
  Briefly, this (in order of the changes below):
  - fixes an inefficiency in x86's context switch code (translations to/from struct page are more involved than to/from MFNs)
  - drops unnecessary MFN-to-page conversions
  - drops a redundant call to destroy_xen_mappings() (an identical call is being made a few lines up)
  - simplifies a VA-to-MFN translation
  - drops dead code (several occurrences)
  - adds a missing __init annotation
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: Assert !in_atomic() before exiting to guest context. (Keir Fraser, 2013-01-14; 2 files, -0/+2)
  Signed-off-by: Keir Fraser <keir@xen.org>
* x86: compat_show_guest_stack() should not truncate MFN (Jan Beulich, 2013-01-07; 1 file, -2/+3)
  Re-using "addr" here was a mistake, as it is a 32-bit quantity.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: also print CRn register values upon double fault (Jan Beulich, 2012-12-20; 1 file, -16/+13)
  Do so by simply re-using _show_registers().
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86/kexec: Change NMI and MCE handling on kexec path (Andrew Cooper, 2012-12-13; 1 file, -0/+33)
  Experimentally, certain crash kernels will triple fault very early after starting if started with NMIs disabled. This was discovered when experimenting with a debug keyhandler which deliberately created a reentrant NMI, causing stack corruption.
  Because of this discovered bug, and because the future changes to NMI handling will make the kexec path more fragile, take the time now to bullet-proof the kexec behaviour to be safer in more circumstances.
  This patch adds three new low-level routines:
  * nmi_crash: a special NMI handler for use during a kexec crash.
  * enable_nmis: a function which enables NMIs by executing an iret-to-self, to disengage the hardware NMI latch.
  * trap_nop: a no-op handler which irets immediately. It is not declared with ENTRY() to avoid the extra alignment overhead.
  And adds three new IDT entry helper routines:
  * _write_gate_lower: a substitute for using cmpxchg16b to update a 128-bit structure at once. It assumes that the top 64 bits are unchanged (and ASSERT()s that fact) and performs a regular write on the lower 64 bits.
  * _set_gate_lower: functionally equivalent to the already present _set_gate(), except that it uses _write_gate_lower rather than updating both 64-bit values.
  * _update_gate_addr_lower: designed to update an IDT entry's handler only, without altering any other settings in the entry. It also uses _write_gate_lower.
  The IDT entry helpers are required because:
  * It is unsafe to attempt a disable/update/re-enable cycle on the NMI or MCE IDT entries.
  * We need to be able to update NMI handlers without changing the IST entry.
  As a result, the new behaviour of the kexec_crash path is:
  nmi_shootdown_cpus() will:
  * Disable the crashing CPU's NMI/MCE interrupt stack tables. Disabling the stack tables removes race conditions which would lead to corrupt exception frames and infinite loops. As this PCPU is never planning to execute a sysret back to a PV vcpu, the update is safe from a security point of view.
  * Swap the NMI trap handlers. The crashing PCPU gets the nop handler, to prevent it getting stuck in an NMI context and causing a hang instead of a crash. The non-crashing PCPUs all get the nmi_crash handler, which is designed never to return.
  do_nmi_crash() will:
  * Save the crash notes and shut the PCPU down. There is now an extra per-cpu variable to prevent us from executing this multiple times. In the case where we re-enter midway through, attempt the whole operation again in preference to not completing it in the first place.
  * Set up another NMI at the LAPIC. Even when the LAPIC has been disabled, the ID and command registers are still usable. As a result, we can deliberately queue up a new NMI to re-interrupt us later if NMIs get unlatched. Because of the call to __stop_this_cpu(), we have to hand-craft self_nmi() to be safe from General Protection Faults.
  * Fall into an infinite loop.
  machine_kexec() will:
  * Swap the MCE handlers to be a nop. We cannot prevent MCEs from being delivered when we pass off to the crash kernel, and the less Xen context is being touched the better.
  * Explicitly enable NMIs.
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Tim Deegan <tim@xen.org>
  Minor style changes.
  Signed-off-by: Keir Fraser <keir@xen.org>
  Committed-by: Keir Fraser <keir@xen.org>
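  A sketch of the iret-to-self trick behind enable_nmis (the selector names are Xen's constants, the exact sequence is illustrative): the CPU disengages its NMI latch when executing IRET, so build an interrupt frame that simply "returns" to the next instruction.

      movq   %rsp, %rax
      pushq  $__HYPERVISOR_DS   # SS
      pushq  %rax               # RSP (pre-frame value)
      pushfq                    # RFLAGS
      pushq  $__HYPERVISOR_CS   # CS
      pushq  $1f                # RIP = the label below
      iretq                     # NMIs are unlatched from here on
  1:  retq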
* x86/IST: Create set_ist() helper function (Andrew Cooper, 2012-12-11; 1 file, -3/+3)
  ... to save using open-coded bitwise operations, and update all IST manipulation sites to use the function.
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Committed-by: Jan Beulich <jbeulich@suse.com>
* streamline guest copy operations (Jan Beulich, 2012-12-10; 2 files, -8/+8)
  - Use the variants not validating the VA range when writing back structures/fields to the same space that they were previously read from.
  - When only a single field of a structure actually changed, copy back just that field where possible.
  - Consolidate copying back results in a few places.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: mark certain items static (Jan Beulich, 2012-12-07; 1 file, -1/+1)
  ... and at once constify the data items among them.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: fix hypercall continuation cancellation in XENMAPSPACE_gmfn_range compat wrapper (Jan Beulich, 2012-11-28; 1 file, -11/+10)
  When no continuation was established, there must also not be an attempt to cancel it: hypercall_cancel_continuation(), in the non-HVM, non-multicall case, adjusts the guest-mode return address in a way that assumes an earlier call to hypercall_create_continuation() took place.
  While touching this code, also restructure it slightly to improve readability, and switch to using the more relaxed copy function (copying from the same guest memory already validated the virtual address range).
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: save/restore only partial register state where possible (Jan Beulich, 2012-10-30; 3 files, -19/+25)
  ... and make restore conditional not only upon having saved the state, but also upon whether the saved state was actually modified (and register values are known to have been preserved).
  Note that RBP is unconditionally considered a volatile register (i.e. irrespective of CONFIG_FRAME_POINTER), since the RBP handling would become overly complicated due to the need to save/restore it on the compat mode hypercall path [6th argument].
  Note further that for compat mode code paths, saving/restoring R8...R15 is entirely unnecessary: we don't allow those guests to enter 64-bit mode, and hence they have no way of seeing these registers' contents (and there consequently also is no information leak, except if the context-saving domctl would be considered such).
  Finally, note that this may not properly deal with gdbstub's needs yet (but if so, I can't really suggest adjustments, as I don't know that code).
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: use MOV instead of PUSH/POP when saving/restoring register state (Jan Beulich, 2012-10-30; 2 files, -14/+8)
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
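  A sketch of the substitution (the UREGS_* frame offsets are Xen's convention; the exact offsets are assumed): fixed-displacement MOVs keep %rsp constant across the whole sequence, avoiding the chained %rsp updates that a PUSH/POP sequence imposes.

      # instead of: pushq %rdi; pushq %rsi; ...; popq %rsi; popq %rdi
      movq  %rdi, UREGS_rdi(%rsp)     # save
      movq  %rsi, UREGS_rsi(%rsp)
      movq  UREGS_rsi(%rsp), %rsi     # restore
      movq  UREGS_rdi(%rsp), %rdi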
* xen: more XEN_GUEST_HANDLE_PARAM substitutions (Stefano Stabellini, 2012-10-17; 3 files, -2/+7)
  More substitutions in this patch, not as obvious as the ones in the previous patch.
  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Committed-by: Ian Campbell <ian.campbell@citrix.com>
* xen: replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when appropriate (Stefano Stabellini, 2012-10-17; 4 files, -8/+8)
  Note: these changes don't make any difference on x86.
  Replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when it is used as a hypercall argument.
  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Committed-by: Ian Campbell <ian.campbell@citrix.com>
* trace: rename trace_hypercall() to __trace_hypercall_entry() (David Vrabel, 2012-10-09; 2 files, -4/+4)
  Tracing functions that don't check tb_init_done are (by convention) prefixed with __.
  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Committed-by: Keir Fraser <keir@xen.org>
* x86: replace literal numbers (Jan Beulich, 2012-09-28; 1 file, -1/+1)
  In various cases, 256 was being used instead of NR_VECTORS or a derived ARRAY_SIZE() expression. In one case (guest_has_trap_callback()), a wrong (unrelated) constant was used instead of NR_VECTORS.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: enhance rsp-relative calculations (Jan Beulich, 2012-09-26; 1 file, -4/+4)
  The use of "or" in GET_CPUINFO_FIELD so far wasn't ideal, as it doesn't lend itself to folding this operation with a possibly subsequent one (e.g. the well-known mov+add=lea conversion). Split out the sub-operations, and shorten the assembly code slightly with this.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: enable VIA CPU support (Jan Beulich, 2012-09-21; 1 file, -1/+2)
  Newer VIA CPUs have both 64-bit and VMX support. Enable them to be recognized for these purposes, at once stripping off any 32-bit-CPU-only bits from the respective CPU support file, and adding 64-bit ones found in recent Linux.
  This particularly implies untying the VMX == Intel assumption in a few places.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: use only a single branch for upcall-pending exit path checks (Jan Beulich, 2012-09-12; 2 files, -10/+10)
  This utilizes the fact that the two bytes of interest are adjacent to one another, and that the resulting 16-bit values of interest fall within a contiguous range of numbers.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
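  The trick, sketched in C (the two-byte layout matches the public vcpu_info ABI; the helper itself is hypothetical): "pending and not masked" becomes a single range check on one little-endian 16-bit load.

      #include <stdint.h>
      #include <string.h>

      struct upcall_bytes {            /* prefix of struct vcpu_info */
          uint8_t evtchn_upcall_pending;
          uint8_t evtchn_upcall_mask;
      };

      static int upcall_deliverable(const struct upcall_bytes *v)
      {
          uint16_t w;

          memcpy(&w, v, sizeof(w));    /* one 16-bit load covers both bytes */
          /* pending != 0 && mask == 0  <=>  0x0001 <= w <= 0x00ff */
          return (uint16_t)(w - 1) < 0x00ff;
      }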
* x86-64: construct static, uniform parts of page tables at build time (Jan Beulich, 2012-09-11; 1 file, -18/+0)
  ... rather than at boot time, removing unnecessary redundancy between the EFI and legacy boot code.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: construct static part of 1:1 mapping at build time (Jan Beulich, 2012-09-11; 1 file, -2/+0)
  ... rather than at boot time, removing unnecessary redundancy between the EFI and legacy boot code.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: allow early use of fixmaps (Jan Beulich, 2012-09-11; 1 file, -0/+4)
  As a prerequisite for adding an EHCI debug port based console implementation, set up the page tables needed for (a sub-portion of) the fixmaps together with the other boot time page table construction.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86-64: drop updating of UREGS_rip when converting sysenter to #GP (Jan Beulich, 2012-07-27; 1 file, -5/+2)
  This was set to zero immediately before the #GP injection code, since SYSENTER doesn't really have a return address.
  Furthermore, UREGS_cs and UREGS_rip don't need to be written a second time, as the PUSHes above already take care of putting the intended values in place.
  Reported-by: Ian Campbell <Ian.Campbell@citrix.com>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* AMD IOMMU: add mechanism to protect their PCI devices' config spaces (Jan Beulich, 2012-06-22; 1 file, -0/+33)
  Recent Dom0 kernels want to disable PCI MSI on all devices, yet doing so on AMD IOMMUs (which get represented by a PCI device) disables part of the functionality set up by the hypervisor.
  Add a mechanism to mark certain PCI devices as having write-protected config spaces (both through port-based [method 1] accesses and, for x86-64, mmconfig), and use that for AMD's IOMMUs.
  Note that due to ptwr_do_page_fault() being run first, there'll be a MEM_LOG() issued for each such mmconfig-based write attempt. If that's undesirable, the order of the calls in fixup_page_fault() would need to be swapped.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Wei Wang <wei.wang2@amd.com>
  Acked-by: Keir Fraser <keir@xen.org>
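  A generic sketch of such a mechanism (all names here are hypothetical; the real code hooks both the port-based and the mmconfig write paths): mark a device once, then silently discard guest config-space writes to it.

      static DECLARE_BITMAP(ro_bdf_map, 0x10000);   /* one bit per seg-0 BDF */

      void pci_mark_config_readonly(unsigned int bdf)
      {
          __set_bit(bdf, ro_bdf_map);
      }

      /* Called from the config-space write intercepts: returns true if
       * the write is to be (deliberately) swallowed. */
      bool pci_config_write_filtered(unsigned int bdf)
      {
          return test_bit(bdf, ro_bdf_map);
      }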
* x86-64: revert mmconfig part of c/s 24425:053a44894279 (Jan Beulich, 2012-06-22; 1 file, -13/+0)
  These additions did not fulfill their purpose: they checked hypervisor config space accesses instead of guest (Dom0) ones.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86_64: Do not execute sysret with a non-canonical return address (Jan Beulich, 2012-06-12; 1 file, -0/+11)
  Check for a non-canonical guest RIP before attempting to execute sysret. If sysret is executed with a non-canonical value in RCX, Intel CPUs take the fault in ring 0, but we will necessarily already have switched to the user's stack pointer.
  This is a security vulnerability, XSA-7 / CVE-2012-0217.
  Signed-off-by: Jan Beulich <JBeulich@suse.com>
  Signed-off-by: Ian Campbell <Ian.Campbell@citrix.com>
  Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
  Tested-by: Ian Campbell <Ian.Campbell@citrix.com>
  Acked-by: Keir Fraser <keir.xen@gmail.com>
  Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
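  The guard takes roughly this shape on the SYSRET exit path (a sketch; the stack offset and label are illustrative):

      # Don't use the SYSRET path if the return address is non-canonical.
      movq  8(%rsp), %rcx       # guest RIP that SYSRET would load
      sarq  $47, %rcx           # 0 or -1 exactly for canonical RIPs
      incl  %ecx                # hence 1 or 0 for canonical RIPs
      cmpl  $1, %ecx
      ja    .Lforce_iret        # anything else: exit via IRET instead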