| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
... as there doesn't really exists any valid mapping for them.
Particularly in the case of do_page_walk() this also avoids returning
non-NULL for such invalid input.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Depending on the state of the conring and serial_tx_buffer,
console_force_unlock() can be a long running operation, usually because of
serial_start_sync()
XenServer testing has found a reliable case where console_force_unlock() on
one PCPU takes long enough for another PCPU to timeout due to the watchdog
(such as waiting for a tlb flush callin).
The watchdog timeout causes the second PCPU to repeat the
console_force_unlock(), at which point the first PCPU typically fails an
assertion in spin_unlock_irqrestore(&port->tx_lock) (because the tx_lock has
been unlocked behind itself).
console_force_unlock() is only on emergency paths, so one way or another the
host is going down. Disable the watchdog before forcing the console lock to
help prevent having pcpus completing with each other to bring the host down.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Augment watchdog_setup() to be able to possibly return an error, and introduce
watchdog_enabled() as a better alternative to knowing the architectures
internal details.
This patch does not change the x86 implementaion, beyond making it compile.
For header files, some includes of xen/nmi.h were only for the watchdog
functions, so are replaced rather than adding an extra include of
xen/watchdog.h
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In all cases when a hypercall page is written, __HYPERVISOR_iret is first
written as a regular hypercall, then subsequently rewritten in its special
case.
For VMX and SVM, this means that following the ud2a instruction is 3 bytes of
an imm32 parameter. For a ring3 kernel, this means that following the syscall
instruction is the second half of 'pop %r11'.
For a ring1 kernel, the iret case ends up as the same number of bytes as the
rest of the hypercalls, but it is pointless writing it twice, and is changed
for consistency.
Therefore, skip the loop iteration which would write the incorrect
__HYPERVISOR_iret hypercall. This removes junk machine code from the tail and
makes disassemblers rather more happy when looking at the hypercall page.
Also, a miscellaneous whitespace fix in the comment for ring3 kernel.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
| |
The emacs variable to set the C style from a local variable block is
c-file-style, not c-set-style.
Signed-off-by: David Vrabel <david.vrabel@citrix.com
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
Do so by simply re-using _show_registers().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
... to save using open-coded bitwise operations, and update all IST
manipulation sites to use the function.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
| |
..., and at once constify the data items among them.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... and make restore conditional not only upon having saved the state,
but also upon whether saved state was actually modified (and register
values are known to have been preserved).
Note that RBP is unconditionally considered a volatile register (i.e.
irrespective of CONFIG_FRAME_POINTER), since the RBP handling would
become overly complicated due to the need to save/restore it on the
compat mode hypercall path [6th argument].
Note further that for compat mode code paths, saving/restoring R8...R15
is entirely unnecessary - we don't allow those guests to enter 64-bit
mode, and hence they have no way of seeing these registers' contents
(and there consequently also is no information leak, except if the
context saving domctl would be considered such).
Finally, note that this may not properly deal with gdbstub's needs, yet
(but if so, I can't really suggest adjustments, as I don't know that
code).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note: these changes don't make any difference on x86.
Replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when it is used as
an hypercall argument.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Newer VIA CPUs have both 64-bit and VMX support. Enable them to be
recognized for these purposes, at once stripping off any 32-bit CPU
only bits from the respective CPU support file, and adding 64-bit ones
found in recent Linux.
This particularly implies untying the VMX == Intel assumption in a few
places.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Interrupt Stack Table entries in a 64bit TSS are a 1 based data
structure as far as hardware is concerned. As a result, the code
setting up stacks in subarch_percpu_traps_init() fills in the wrong
IST entries.
The result is that the MCE handler executes on the stack set up for
NMIs; the NMI handler executes on a stack set up for Double Faults,
and Double Faults are executed with a stack pointer set to 0.
Once the #DF handler starts to execute, it will usually take a page
fault looking up the address at 0xfffffffffffffff8, which will cause a
triple fault. If a guest has mapped a page in that location, then it
will have some state overwritten, but as the #DF handler always calls
panic(), this is not a problem the guest will have time to care about.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
At once, move the common (between 32- and 64-bit) definition of
machine_to_phys_mapping_valid to a common location.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
| |
This patch moves some more, mostly data, extern declarations into
header files. I haven't been as strict as I was with functions;
in particular there are a number of declarations of assembler labels
that are only used in one place. I've also left a few compat-mode
tricks, and all the magic in symbols.c
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
|
|
|
|
| |
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
|
|
|
|
|
|
|
|
|
|
|
| |
... thus further shrinking overall size of struct arch_vcpu.
This has a minor effect on XEN_DOMCTL_{get,set}_ext_vcpucontext - for
HVM guests, some meaningless fields will no longer get stored or
retrieved: reads will now return zero, and writes are required to be
(mostly) zero (the same as was already done on x86-32).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is accomplished by splitting the guest_context member, which by
itself is larger than a page on x86-64. Quite a number of fields of
this structure is completely meaningless for HVM guests, and thus a
new struct pv_vcpu gets introduced, which is being overlaid with
struct hvm_vcpu in struct arch_vcpu. The one member that is mostly
responsible for the large size is trap_ctxt, which now gets allocated
separately (unless fitting on the same page as struct arch_vcpu, as is
currently the case for x86-32), and only for non-hvm, non-idle
domains.
This change pointed out a latent problem in arch_set_info_guest(),
which is permitted to be called on already initialized vCPU-s, but
so far copied the new state into struct arch_vcpu without (in this
case) actually going through all the necessary accounting/validation
steps. The logic gets changed so that the pieces that bypass
accounting
will at least be verified to be no different from the currently active
bits, and the whole change will fail in case they are. The logic does
*not* get adjusted here to do full error recovery, that is, partially
modified state continues to not get unrolled in case of failure.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This attempts to address all the concerns raised in
http://lists.xensource.com/archives/html/xen-devel/2009-11/msg01195.html,
but I'm nevertheless still not convinced that all aspects of the
injection handling really work reliably. In particular, while the
patch here on top of the fixes for the problems menioned in the
referenced mail also adds code to keep send_guest_trap() from
injecting multiple events at a time, I don't think the is the right
mechanism - it should be possible to handle NMI/MCE nested within
each other.
Another fix on top of the ones for the earlier described problems is
that the vCPU affinity restore logic didn't account for software
injected NMIs - these never set cpu_affinity_tmp, but due to it most
likely being different from cpu_affinity it would have got restored
(to a potentially random value) nevertheless.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
|
|
| |
Also add in a missing line in x86-64's do_page_walk().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
| |
When MCE# happens, if the error has been contained/recovered by XEN
and it impacts one guest Domain(DOM0/HVM Guest/PV Guest), we will
inject the corresponding vMCE# into the impacted Domain. Guest OS will
go on its own recovery job if it has MCA handler.
Signed-off-by: Liping Ke <liping.ke@intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Those patches based on AMD and SUN's MCA related jobs.
We have latest rebase after SUN's latest improvements.
We will have late following patches for recovery actions. This is a
basic framework for Intel.
Some implementation notes:
1) When error happens, if the error is fatal (pcc = 1) or can't be
recovered (pcc = 0, yet no good recovery methods),
for avoiding losing logs in DOM0, we will reset machine
immediately. Most of MCA MSRs are sticky. After reboot,
MCA polling mechanism will send vIRQ to DOM0 for logging.
2) When MCE# happens, all CPUs enter MCA context. The first CPU who
read&clear the error MSR bank will be this
MCE# owner. Necessary locks/synchronization will help to judge the
owner and select most severe error.
3) For convenience, we will select the most offending CPU to do most
of processing&recovery job.
4) MCE# happens, we will do three jobs:
a. Send vIRQ to DOM0 for logging
b. Send vMCE# to Impacted Guest (Currently Only inject to impacted
DOM0)
c. Guest vMCE MSR virtualization
5) Some further improvement/adds for newer CPUs might be done later
a) Connection with recovery actions (cpu/memory online/offline)
b) More software-recovery identification in severity_scan
c) More refines and tests for HVM might be done when needed.
This patch Enable basic MCA support For Intel
Signed-off-by: Jiang, Yunhong<yunhong.jiang@intel.com>
Signed-off-by: Ke, Liping <Liping.ke@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The major issue with supporting a significantly larger number of
physical CPUs appears to be the use of per-CPU GDT entries - at
present, x86-64 could support only up to 126 CPUs (with code changes
to also use the top-most GDT page, that would be 254). Instead of
trying to go with incremental steps here, by converting the GDT itself
to be per-CPU, limitations in that respect go away entirely.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
| |
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This event channel will be notified when the domain transitions to the
suspended state, which can be much faster than raising VIRQ_DOM_EXC
and waiting for the notification to be propagated via xenstore.
No attempt is made here to prevent multiple subscribers (last one
wins), or to detect that the subscriber has gone away. Userspace tools
should take care.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
|
|
|
|
|
|
|
|
|
|
| |
Since xenctx cannot (for obvious reasons) display the context of
dom0's vCPU-s, here are the beginnings of a console based mechanism to
achieve the same (useful if dom0 hangs with one or more de-scheduled
vCPU-s). The stack handling obviously needs improvement, but the
register context should come out fine in all cases.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At once adjust the 2/4Mb page handling slightly in a few places (to
match the newly added code):
- when re-creating a large page mapping after finding that all small
page mappings in the respective area are using identical flags and
suitable MFNs, the virtual address was already incremented pas the
area to be dealt with, which needs to be accounted for in the
invocation of flush_area() in that path
- don't or-in/and-out _PAGE_PSE on non-present pages
- when comparing flags, try minimse the number of l1f_to_lNf()/
lNf_to_l1f() instances used
- instead of skipping a single page when encountering a big page
mapping equalling to what a small page mapping would establish, skip
to the next larger page boundary
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
| |
Also, automatically generate const version of every guest handle
definition.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
|
|
| |
declare these (mostly static) argument structures 'const'.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
Looking at the Linux patch as an example, it adds more code and
complexity than it removes, for no obvious performance benefit.
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
|
|
|
|
|
| |
in 64-bit pv guests and 32on64.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
|
|
|
|
|
| |
guest_cpu_user_regs(). Reduces complexity at little or no performance
cost (except on really old Intel P4 hardware where VMREAD/VMWRITE are
silly expensive).
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
|
|
|
|
|
| |
selectors. We have separate accessors for that now. It is now an
invariant that guest_cpu_user_regs()->{cs,ds,es,fs,gs,ss} are invalid
for an HVM guest.
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
__volatile__ in most places) and ensure we use volatile keyword
wherever we have an asm stmt that produces outputs but has other
unspecified side effects or dependencies other than the
explicitly-stated inputs.
Also added volatile in a few places where its not strictly necessary
but where it's unlikely to produce worse code and it makes our
intentions perfectly clear.
The original problem this patch fixes was tracked down by Joseph
Cihula <joseph.cihula@intel.com>.
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
|
|
|
| |
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Need to ensure all the code slice in the wakeup path still
existing. For this purpose, we have to use __devinit instead
of __init, since the former is null for CONFIG_HOTPLUG while
the latter always causes related code to be free-ed after
first boot.
Later when adding __init to some function, be sure to consider
wakeup case and cpu hotplug!
Signed-off-by <Kevin.Tian@intel.com>
|
|
|
|
|
| |
without undue squeezing.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
| |
Properly handle MCE (connecting the exisiting, but so far unused
vendor specific handlers). HVM guests don't own CR4.MCE (and hence
can't suppress the exception) anymore, preventing silent machine
shutdown.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
|
|
|
|
| |
memset() calls (likewise a few replacements memcpy -> copy_page).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. x86/64 Xen now relocates itself to physical high memory. This is
useful if we have devices that need very low memory, or if in
future we want to grant a 1:1 mapping of low physical memory to a
special 'native client domain'.
2. We now only map low 16MB RAM statically. All other RAM is mapped
dynamically within the constraints of the e820 map. It is
recommended never to map MMIO regions, and this change means that
Xen now obeys this constraint.
3. The CPU bootup trampoline is now permanently installed at
0x90000. This is necessary prereq for CPU hotplug.
4. Start-of-day asm is generally cleaned up and diff between x86/32
and x86/64 is reduced.
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
|
|
|
| |
Fixes suggested by Jan Beulich.
Signed-off-by: Keir Fraser <keir@xensource.com>
|