| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
... as being intended to be faster than MSR reads/writes.
In the case of emulate_privileged_op() also use these in favor of the
cached (but possibly stale) addresses from arch.pv_vcpu. This allows
entirely removing the code that was the subject of XSA-67.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For one setup_max_pdx(), when invoked a second time (after SRAT got
parsed), needs to start from the original max_page value again (using
the already adjusted one from the first invocation would not allow the
cut-off boundary to be moved up).
Second, _if_ we need to cut off some part of memory, we must not allow
this to also propagate into the NUMA accounting. Otherwise
cutoff_node() results in nodes_cover_memory() to find some parts of
memory apparently not having a PXM association, causing all SRAT info
to be ignored.
The only possibly problematic consumer of node_spanned_pages (the
meaning of which gets altered here in that it now also includes memory
Xen can't actively make use of) is XEN_SYSCTL_numainfo: At a first
glance the potentially larger reported memory size shouldn't confuse
tool stacks.
And finally we must not put our boot time modules at addresses which
(at that time) can't be guaranteed to be accessible later. This applies
to both the EFI boot loader and the module relocation code.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Augment watchdog_setup() to be able to possibly return an error, and introduce
watchdog_enabled() as a better alternative to knowing the architectures
internal details.
This patch does not change the x86 implementaion, beyond making it compile.
For header files, some includes of xen/nmi.h were only for the watchdog
functions, so are replaced rather than adding an extra include of
xen/watchdog.h
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
| |
With vcpu->domain->arch.perdomain_l3_pg no longer getting set up for
the idle domain, this creates an invalid L4 entry (due to translating
a NULL struct page_info pointer to a physical address).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
|
|
|
|
| |
Move for_each_set_bit from asm-x86/bitops.h to xen/bitops.h.
Replace #include <asm/bitops.h> with #include <xen/bitops.h> everywhere.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are BIOSes that want to map the IO-APIC MMIO region from some
ACPI method(s), and there is at least one BIOS flavor that wants to
use this mapping to clear an RTE's mask bit. While we can't allow the
latter, we can permit reads and simply drop write attempts, leveraging
the already existing infrastructure introduced for dealing with AMD
IOMMUs' representation as PCI devices.
This fixes an interrupt setup problem on a system where _CRS evaluation
involved the above described BIOS/ACPI behavior, and is expected to
also deal with a boot time crash of pv-ops Linux upon encountering the
same kind of system.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
| |
This allow modules to set initializer functions.
This is used by Gcc instrumentation code for profiling arcs and test
coverage.
|
|
|
|
|
|
|
| |
The emacs variable to set the C style from a local variable block is
c-file-style, not c-set-style.
Signed-off-by: David Vrabel <david.vrabel@citrix.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Copying the contents of the VGA hole is at best pointless and at worst
dangerous. Booting Xen on Xen, it causes a very long delay as each
byte is referred to qemu.
Since we were already discarding the first 1MB of the relocated area,
just avoid copying it in the first place.
Reported-by: Jon Ludlam <jonathan.ludlam@eu.citrix.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This mainly involves adjusting the number of L4 entries needing copying
between page tables (which is now different between PV and HVM/idle
domains), and changing the cutoff point and method when more than the
supported amount of memory is found in a system.
Since TMEM doesn't currently cope with the full 1:1 map not always
being visible, it gets forcefully disabled in that case.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
|
|
|
|
|
|
|
|
|
| |
... to allow frames for up to 16Tb.
At the same time, add the super page frame table coordinates to the
comment describing the address space layout.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
... and use it as basis for a proper ioremap() on x86.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
Performance is not an issue with printk(), so let the function do
minimally more work and instead save a byte per affected format
specifier.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given that here, other than for VT-d, the MSI interface gets surfaced
through a normal PCI device, the code should use as much as possible of
the "normal" MSI support code.
Further, the code can (and should) follow the "normal" MSI code in
distinguishing the maskable and non-maskable cases at the IRQ
controller level rather than checking the respective flag in the
individual actors.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Low level hardware interface pieces adapted from Linux.
For setup information, see Linux'es Documentation/x86/earlyprintk.txt
and/or http://www.coreboot.org/EHCI_Debug_Port.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While triggered by the XSA-9 fix, this really is of more general use;
that fix just pointed out very sharply that the current situation
with all domain creation failures reported to user (tools) space as
-ENOMEM is very unfortunate (actively misleading users _and_ support
personnel).
Pull over the pointer <-> error code conversion infrastructure from
Linux, and use it in domain_create() and all it callers.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some UEFI systems do not provide e820 information. In this case we
should take the detailed memory map provided by a multiboot-capable
loader, rather than rely on very conservative values from the e801
bios call. Using the latter on any modern system really hardly makes
good sense.
[Excellent candidate for 4.1 backport]
Signed-off-by: Keir Fraser <keir@xen.org>
Tested-by: Jonathan Tripathy <jonnyt@abpni.co.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is now quite easy to buy servers with incorrectly populated DIMMs,
especially with AMD Magny-Cours and Interlagos systems which have two
NUMA nodes per socket.
Currently, Xen will assign all CPUs on nodes without memory to node 0,
which leads to interestingly wrong NUMA information, causing numa
aware functionality such as alloc_domheap_pages() to get things very
wrong.
This patch splits the current logic to accept NUMA nodes without
memory, which corrects the accounting of CPUs to online NUMA nodes.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- no need for calling nestedhvm_setup() explicitly (can be a normal
init-call, and can be __init)
- calling _xmalloc() for multi-page, page-aligned memory regions is
inefficient - use alloc_xenheap_pages() instead
- albeit an allocation error is unlikely here, add error handling
nevertheless (and have nestedhvm_vcpu_initialise() bail if an error
occurred during setup)
- nestedhvm_enabled() must no access d->arch.hvm_domain without first
checking that 'd' actually represents a HVM domain
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
| |
Use it to replace x86-specific early_boot boolean variable.
Also use it to detect suspend/resume case during cpu offline/online
to avoid unnecessarily breaking vcpu and cpupool affinities.
Signed-off-by: Keir Fraser <keir@xen.org>
Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On 64bit Xen with 32bit dom0 and crashkernel, xmalloc'ing items such
as the CPU crash notes will go into the xenheap, which tends to be in
upper memory. This causes problems on machines with more than 64GB
(or 4GB if no PAE support) of ram as the crashkernel physically cant
access the crash notes.
The solution is to force Xen to allocate certain structures in lower
memory. This is achieved by introducing two new command line
parameters; low_crashinfo and crashinfo_maxaddr. Because of the
potential impact on 32bit PV guests, and that this problem does not
exist for 64bit dom0 on 64bit Xen, this new functionality defaults to
the codebase's previous behavior, requiring the user to explicitly
add extra command line parameters to change the behavior.
This patch consists of 3 logically distinct but closely related
changes.
1) Add the two new command line parameters.
2) Change crash note allocation to use lower memory when instructed.
3) Change the conring buffer to use lower memory when instructed.
There result is that the crash notes and console ring will be placed
in lower memory so useful information can be recovered in the case of
a crash.
Changes since v1:
- Patch xen-command-line.markdown to document new options
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
| |
Introduce a command parameter to set the watchtog timeout. Manually
specifying "watchdog_timeout=<seconds>" on the command line will also
turn the watchdog on. For consistency, move opt_watchdog into nmi.c
along with opt_watchdog_timeout.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
By making the macro properly advertise the use of the input symbol to
the compiler, it is no longer necessary for them to be global if
they're defined and used in just one source file.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Largely as a result of the continuing resistance of Linux maintainers
to accept a microcode loading patch for pv-ops Xen kernels, this
follows the suggested route and provides a means to load microcode
updates without the assistance of Dom0, thus also addressing eventual
problems in the hardware much earlier.
This leverages the fact that via the multiboot protocol another blob
of data can be easily added in the form of just an extra module. Since
microcode data cannot reliably be recognized by looking at the
provided data, this requires (in the non-EFI case) the use of a
command line parameter ("ucode=<number>") to identify which of the
modules is to be parsed for an eventual microcode update (in the EFI
case the module is being identified in the config file, and hence the
command line argument, if given, will be ignored).
This required to adjust the XSM module determination logic accordingly.
The format of the data to be provided is the raw binary blob already
used for AMD CPUs, and the output of the intel-microcode2ucode utility
for the Intel case (either the per-(family,model,stepping) file or -
to make things easier for distro-s integration-wise - simply the
concatenation of all of them).
In order to not convert the spin_lock() in microcode_update_cpu() (and
then obviously also all other uses on microcode_mutex) to
spin_lock_irqsave() (which would be undesirable for the hypercall
context in which the function also runs), the boot time handling gets
done using a tasklet (instead of using on_selected_cpus()).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
This includes the conversion from for_each_cpu_mask() to for_each-cpu().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS,
where necessary, get adjusted accordingly), while the latter is for the
sole use of determining the allocation size when dynamically allocating
CPU masks (done later in this series).
Adjust accessors to use either of the two to bound their bitmap
operations - which one gets used depends on whether accessing the bits
in the gap between nr_cpu_ids and nr_cpumask_bits is benign but more
efficient.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AMD CPUs by default enable X86_FEATURE_TSC_RELIABLE, and depend upon a
later check to disable this feature if TSC drift is detected.
Unfortunately, this check is done in time.c:init_xen_time(), which is
done before any secondary CPUs are brought up, and is thus guaranteed
to succed.
This patch moves the check into its own function, and calls it after
cpus are brought up.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Forcibly enabling the MMCFG space on AMD Fam10 CPUs cannot be expected
to work since with the firmware not being aware of the address range
used it cannot possibly reserve the space in E820 or ACPI resources.
Hence we need to manually insert the range into the E820 table, and
enable the range only when the insertion actually works without
conflict.
Further, the actual enabling of the space is done from identify_cpu(),
which means that acpi_mmcfg_init() muts be called after that function
(and hance should not be called from acpi_boot_init()). Otherwise,
Dom0 would be able to use MMCFG, but Xen wouldn't.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Changeset 19632:b0966b6f5180 wasn't really complete: The Xen image
mapping doesn't end at _end, but a full 16Mb gets mapped during boot
(and never got unmapped so far), hence all of this space was subject
to alias mappings when it comes to cache attribute changes. Unmap all
full large pages between _end and the 16Mb boundary, and include all
other pages beyond _end when checking for aliases.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Besides introducing the relevant code paralleling parts of what is
under xen/arch/x86/boot/, this adjusts the build logic so that with a
single compilation two images (gzip-compressed ELF and EFI
application)
can get created. The EFI part of this depends on a new enough compiler
(supposedly gcc 4.4.x and above, but so far only tested to work with
4.5.x) and a properly configured linker (must support the i386pep
emulation). If either functionality is found to not be available, the
EFI part of the build will simply be skipped.
The patch adds all code to allow Xen and the (accordingly enabled)
Dom0 kernel to boot, but doesn't allow Dom0 to make use of EFI
runtime calls (this will be the subject of the next patch).
Parts of the code were lifted from an earlier never published OS
project of ours - whether respective license information needs to be
added to the respective source file is unclear to me (I was told
internally that adding a GPLv2 license header can be done if needed by
the community).
Open issues (not preventing this from being committed imo):
The trampoline allocation and initialization isn't really nice. This
is due to the trampoline needing to be placed at a fixed address, and
hence making the trampoline relocatable would seem desirable here (as
well as for BIOS-based booting, where the trampoline location needed
to be adjusted a number of time already in the past, due to it
colliding with firmware data).
By excluding mem.S, edd.S, and video.S from copied trampoline (i.e.
moving up wakeup.S? and making sure none of the symbols are used from
EFI code), the effective trampoline size could at least be reduced.
Should the mappings of [__XEN_VIRT_START, mbi.mem_upper) and
[_end, __XEN_VIRT_START+BOOTSTRAP_MAP_BASE) be destroyed, despite
non-EFI code also keeping them?
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Intel new CPU supports SMEP (Supervisor Mode Execution
Protection). SMEP prevents software operating with CPL < 3 (supervisor
mode) from fetching instructions from any linear address with a valid
translation for which the U/S flag (bit 2) is 1 in every
paging-structure entry controlling the translation for the linear
address.
This patch enables SMEP in Xen to protect Xen hypervisor from
executing pv guest instructions, whose translation paging-structure
entries' U/S flags are all set.
Signed-off-by: Yang Wei <wei.y.yang@intel.com>
Signed-off-by: Shan Haitao <haitao.shan@intel.com>
Signed-off-by: Li Xin <xin.li@intel.com>
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
| |
This patch moves some more, mostly data, extern declarations into
header files. I haven't been as strict as I was with functions;
in particular there are a number of declarations of assembler labels
that are only used in one place. I've also left a few compat-mode
tricks, and all the magic in symbols.c
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
|
|
|
|
|
|
|
| |
With no modular drivers, all CPU notifier setup is supposed to happen
during boot. There also is a respective comment in the function.=20
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
| |
This also includes the removal of some entirely unused functions.
The patch builds upon the makefile adjustments done in the earlier
sent patch titled "move more kernel decompression bits to .init.*
sections".
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This was already inconsistent, so make them consistently quiet and
leave it to callers to log an error. Add suitable error logging to the
arch-specific CPU bringup loops,
In particular this avoids printing error on EBUSY, in which case
caller may want a silent retry loop.
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
| |
... decreasing cache footprint. As a prerequisite this requires making
cmdline_parse() a little more flexible.
Also remove a few variables altogether, and adjust sections
annotations for several others.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|