path: root/xen/arch/x86/setup.c
Commit message | Author | Age | Files | Lines
* x86: use {rd,wr}{fs,gs}base when available | Jan Beulich | 2013-10-11 | 1 | -0/+3
| | | | | | | | | | | | ... as being intended to be faster than MSR reads/writes. In the case of emulate_privileged_op() also use these in favor of the cached (but possibly stale) addresses from arch.pv_vcpu. This allows entirely removing the code that was the subject of XSA-67. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
* x86: fix memory cut-off when using PFN compression | Jan Beulich | 2013-09-12 | 1 | -8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | For one, setup_max_pdx(), when invoked a second time (after SRAT got parsed), needs to start from the original max_page value again (using the already adjusted one from the first invocation would not allow the cut-off boundary to be moved up). Second, _if_ we need to cut off some part of memory, we must not allow this to also propagate into the NUMA accounting. Otherwise cutoff_node() results in nodes_cover_memory() finding some parts of memory apparently not having a PXM association, causing all SRAT info to be ignored. The only possibly problematic consumer of node_spanned_pages (the meaning of which gets altered here in that it now also includes memory Xen can't actively make use of) is XEN_SYSCTL_numainfo: At a first glance the potentially larger reported memory size shouldn't confuse tool stacks. And finally we must not put our boot time modules at addresses which (at that time) can't be guaranteed to be accessible later. This applies to both the EFI boot loader and the module relocation code. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
* watchdog: Move watchdog from being x86 specific to common code | Andrew Cooper | 2013-08-13 | 1 | -0/+1
| | | | | | | | | | | | | | | Augment watchdog_setup() to be able to possibly return an error, and introduce watchdog_enabled() as a better alternative to knowing the architecture's internal details. This patch does not change the x86 implementation, beyond making it compile. For header files, some includes of xen/nmi.h were only for the watchdog functions, so are replaced rather than adding an extra include of xen/watchdog.h Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
* x86: drop setup_idle_pagetable() | Jan Beulich | 2013-06-12 | 1 | -1/+0
| | | | | | | | | | | With vcpu->domain->arch.perdomain_l3_pg no longer getting set up for the idle domain, this creates an invalid L4 entry (due to translating a NULL struct page_info pointer to a physical address). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
* xen: move for_each_set_bit to xen/bitops.h | Stefano Stabellini | 2013-05-08 | 1 | -1/+1
| | | | | | | | Move for_each_set_bit from asm-x86/bitops.h to xen/bitops.h. Replace #include <asm/bitops.h> with #include <xen/bitops.h> everywhere. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org>
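For readers unfamiliar with the macro being moved, the following is a minimal sketch of how a for_each_set_bit-style iterator can be built. It is not Xen's implementation: the helper next_set_bit() and the function sum_set_bit_indices() are hypothetical names introduced only for illustration, and the real macro is expected to rely on optimized bit-search helpers rather than this simple loop.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: index of the first set bit at or after 'start',
 * or 'size' if there is none. A plain loop, chosen for clarity.
 */
static size_t next_set_bit(const unsigned long *addr, size_t size, size_t start)
{
    for (size_t i = start; i < size; i++)
        if (addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
            return i;
    return size;
}

/* Visit every set bit in the bitmap 'addr' of 'size' bits, lowest first. */
#define for_each_set_bit(bit, addr, size)               \
    for ((bit) = next_set_bit((addr), (size), 0);       \
         (bit) < (size);                                \
         (bit) = next_set_bit((addr), (size), (bit) + 1))

/* Example use: add up the indices of all set bits in one word. */
static size_t sum_set_bit_indices(unsigned long word)
{
    size_t bit, sum = 0;

    for_each_set_bit(bit, &word, BITS_PER_LONG)
        sum += bit;
    return sum;
}
```

Keeping such a macro in a common header means every architecture only has to supply the underlying bit-search primitive.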
* x86: allow Dom0 read-only access to IO-APICs | Jan Beulich | 2013-05-02 | 1 | -0/+3
| | | | | | | | | | | | | | | | | There are BIOSes that want to map the IO-APIC MMIO region from some ACPI method(s), and there is at least one BIOS flavor that wants to use this mapping to clear an RTE's mask bit. While we can't allow the latter, we can permit reads and simply drop write attempts, leveraging the already existing infrastructure introduced for dealing with AMD IOMMUs' representation as PCI devices. This fixes an interrupt setup problem on a system where _CRS evaluation involved the above described BIOS/ACPI behavior, and is expected to also deal with a boot time crash of pv-ops Linux upon encountering the same kind of system. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* gcov: Call constructors during initialization | Frediano Ziglio | 2013-02-21 | 1 | -0/+2
| | | | | | This allows modules to set initializer functions. This is used by GCC instrumentation code for profiling arcs and test coverage.
* Fix emacs local variable block to use correct C style variable. | David Vrabel | 2013-02-21 | 1 | -1/+1
| | | | | | | The emacs variable to set the C style from a local variable block is c-file-style, not c-set-style. Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* x86/setup: don't relocate the VGA hole. | Tim Deegan | 2013-02-14 | 1 | -5/+3
| | | | | | | | | | | | | Copying the contents of the VGA hole is at best pointless and at worst dangerous. Booting Xen on Xen, it causes a very long delay as each byte is referred to qemu. Since we were already discarding the first 1MB of the relocated area, just avoid copying it in the first place. Reported-by: Jon Ludlam <jonathan.ludlam@eu.citrix.com> Signed-off-by: Tim Deegan <tim@xen.org> Committed-by: Keir Fraser <keir@xen.org>
* x86: debugging code for testing 16Tb support on smaller memory systems | Jan Beulich | 2013-02-08 | 1 | -1/+20
| | | | | Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* xen: Fix some over-long source lines. | Keir Fraser | 2013-01-30 | 1 | -1/+2
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* x86: support up to 16Tb | Jan Beulich | 2013-01-23 | 1 | -3/+50
| | | | | | | | | | | | | | This mainly involves adjusting the number of L4 entries needing copying between page tables (which is now different between PV and HVM/idle domains), and changing the cutoff point and method when more than the supported amount of memory is found in a system. Since TMEM doesn't currently cope with the full 1:1 map not always being visible, it gets forcefully disabled in that case. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* x86: extend frame table virtual space | Jan Beulich | 2013-01-23 | 1 | -2/+2
| | | | | | | | | | ... to allow frames for up to 16Tb. At the same time, add the super page frame table coordinates to the comment describing the address space layout. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* implement vmap() | Jan Beulich | 2012-11-22 | 1 | -0/+1
| | | | | | | ... and use it as basis for a proper ioremap() on x86. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* printk: prefer %#x et al over 0x%x | Jan Beulich | 2012-09-21 | 1 | -2/+2
| | | | | | | | | Performance is not an issue with printk(), so let the function do minimally more work and instead save a byte per affected format specifier. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
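The trade-off described above can be checked with the standard C library, as in the sketch below. It uses snprintf() rather than Xen's own printk() (whose formatter may differ in corner cases), and same_output() is a hypothetical helper name. One caveat with standard formatting: the `#` flag only adds the prefix to non-zero values, so the two spellings differ for zero.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if "%#x" and "0x%x" format 'v'
 * identically under the standard C library, else 0.
 */
static int same_output(unsigned int v)
{
    char a[32], b[32];

    snprintf(a, sizeof(a), "%#x", v);   /* prefix added by the formatter */
    snprintf(b, sizeof(b), "0x%x", v);  /* prefix spelled out in the string */
    return strcmp(a, b) == 0;
}
```

For any non-zero value the outputs match, and the `%#x` form makes the format string one byte shorter per specifier.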
* amd iommu: use base platform MSI implementation | Jan Beulich | 2012-09-14 | 1 | -0/+3
| | | | | | | | | | | | | | | Given that here, other than for VT-d, the MSI interface gets surfaced through a normal PCI device, the code should use as much as possible of the "normal" MSI support code. Further, the code can (and should) follow the "normal" MSI code in distinguishing the maskable and non-maskable cases at the IRQ controller level rather than checking the respective flag in the individual actors. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Wang <wei.wang2@amd.com> Acked-by: Keir Fraser <keir@xen.org>
* x86: HYPERVISOR_VIRT_END is always defined. Remove ifdef'ery. | Keir Fraser | 2012-09-12 | 1 | -2/+0
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* x86: Remove CONFIG_COMPAT ifdef'ery from arch/x86 -- it is always defined. | Keir Fraser | 2012-09-12 | 1 | -4/+0
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* xen: Remove x86_32 build target. | Keir Fraser | 2012-09-12 | 1 | -73/+0
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* console: add EHCI debug port based serial console | Jan Beulich | 2012-09-11 | 1 | -0/+1
| | | | | | | | | Low level hardware interface pieces adapted from Linux. For setup information, see Linux's Documentation/x86/earlyprintk.txt and/or http://www.coreboot.org/EHCI_Debug_Port. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Keir Fraser <keir@xen.org>
* make domain_create() return a proper error code | Jan Beulich | 2012-09-03 | 1 | -1/+2
| | | | | | | | | | | | While triggered by the XSA-9 fix, this really is of more general use; that fix just pointed out very sharply that the current situation with all domain creation failures reported to user (tools) space as -ENOMEM is very unfortunate (actively misleading users _and_ support personnel). Pull over the pointer <-> error code conversion infrastructure from Linux, and use it in domain_create() and all its callers. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
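The pointer <-> error code conversion mentioned above follows a well-known Linux idiom: small negative errno values are encoded in the top of the pointer range, so one return value can carry either a valid object or a specific failure reason. The sketch below is written from that general idiom, not copied from Xen or Linux; struct widget, widget_create() and widget_create_status() are hypothetical stand-ins for a domain_create()-style constructor and its caller.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* largest errno value that may be encoded */

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }

/* Recover the errno value from an encoded pointer. */
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }

/* True if 'ptr' lies in the top MAX_ERRNO bytes of the address space. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical object and constructor, for illustration only. */
struct widget { int id; };
static struct widget the_widget;

static struct widget *widget_create(int id)
{
    if (id < 0)
        return ERR_PTR(-EINVAL);    /* a specific reason, not just NULL */
    the_widget.id = id;
    return &the_widget;
}

/* Caller side: 0 on success, otherwise the precise negative errno. */
static long widget_create_status(int id)
{
    struct widget *w = widget_create(id);

    return IS_ERR(w) ? PTR_ERR(w) : 0;
}
```

With this pattern a caller no longer has to collapse every NULL return into -ENOMEM, which is the problem the commit message describes.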
* x86: Prefer multiboot-provided e820 over bios-provided e801 memory info. | Keir Fraser | 2012-08-28 | 1 | -11/+11
| | | | | | | | | | | | | Some UEFI systems do not provide e820 information. In this case we should take the detailed memory map provided by a multiboot-capable loader, rather than rely on very conservative values from the e801 bios call. Using the latter on any modern system really hardly makes good sense. [Excellent candidate for 4.1 backport] Signed-off-by: Keir Fraser <keir@xen.org> Tested-by: Jonathan Tripathy <jonnyt@abpni.co.uk>
* x86/numa: Correct assumption that each NUMA node has memory | Andrew Cooper | 2012-07-19 | 1 | -1/+3
| | | | | | | | | | | | | | | | | It is now quite easy to buy servers with incorrectly populated DIMMs, especially with AMD Magny-Cours and Interlagos systems which have two NUMA nodes per socket. Currently, Xen will assign all CPUs on nodes without memory to node 0, which leads to interestingly wrong NUMA information, causing numa aware functionality such as alloc_domheap_pages() to get things very wrong. This patch splits the current logic to accept NUMA nodes without memory, which corrects the accounting of CPUs to online NUMA nodes. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
* x86: fix nested HVM initialization | Jan Beulich | 2012-05-30 | 1 | -2/+0
| | | | | | | | | | | | | | | - no need for calling nestedhvm_setup() explicitly (can be a normal init-call, and can be __init) - calling _xmalloc() for multi-page, page-aligned memory regions is inefficient - use alloc_xenheap_pages() instead - albeit an allocation error is unlikely here, add error handling nevertheless (and have nestedhvm_vcpu_initialise() bail if an error occurred during setup) - nestedhvm_enabled() must not access d->arch.hvm_domain without first checking that 'd' actually represents a HVM domain Signed-off-by: Jan Beulich <JBeulich@suse.com> Committed-by: Keir Fraser <keir@xen.org>
* Introduce system_state variable. | Keir Fraser | 2012-03-22 | 1 | -4/+4
| | | | | | | | | | Use it to replace x86-specific early_boot boolean variable. Also use it to detect suspend/resume case during cpu offline/online to avoid unnecessarily breaking vcpu and cpupool affinities. Signed-off-by: Keir Fraser <keir@xen.org> Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
* KEXEC: Allocate crash structures in low memory | Andrew Cooper | 2012-03-16 | 1 | -0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On 64bit Xen with 32bit dom0 and crashkernel, xmalloc'ing items such as the CPU crash notes will go into the xenheap, which tends to be in upper memory. This causes problems on machines with more than 64GB (or 4GB if no PAE support) of ram as the crashkernel physically can't access the crash notes. The solution is to force Xen to allocate certain structures in lower memory. This is achieved by introducing two new command line parameters; low_crashinfo and crashinfo_maxaddr. Because of the potential impact on 32bit PV guests, and that this problem does not exist for 64bit dom0 on 64bit Xen, this new functionality defaults to the codebase's previous behavior, requiring the user to explicitly add extra command line parameters to change the behavior. This patch consists of 3 logically distinct but closely related changes. 1) Add the two new command line parameters. 2) Change crash note allocation to use lower memory when instructed. 3) Change the conring buffer to use lower memory when instructed. The result is that the crash notes and console ring will be placed in lower memory so useful information can be recovered in the case of a crash. Changes since v1: - Patch xen-command-line.markdown to document new options Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
* NMI: Command line parameter for watchdog timeout | Andrew Cooper | 2012-03-08 | 1 | -4/+1
| | | | | | | | Introduce a command parameter to set the watchdog timeout. Manually specifying "watchdog_timeout=<seconds>" on the command line will also turn the watchdog on. For consistency, move opt_watchdog into nmi.c along with opt_watchdog_timeout. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
* x86: reduce scope of some symbols used with reset_stack_and_jump() | Jan Beulich | 2011-12-21 | 1 | -1/+1
| | | | | | | | | By making the macro properly advertise the use of the input symbol to the compiler, it is no longer necessary for them to be global if they're defined and used in just one source file. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* x86: move some function scope statics into .init.data | Jan Beulich | 2011-12-21 | 1 | -3/+3
| | | | | Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* x86/microcode: enable boot time (pre-Dom0) loading | Jan Beulich | 2011-12-01 | 1 | -3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Largely as a result of the continuing resistance of Linux maintainers to accept a microcode loading patch for pv-ops Xen kernels, this follows the suggested route and provides a means to load microcode updates without the assistance of Dom0, thus also addressing eventual problems in the hardware much earlier. This leverages the fact that via the multiboot protocol another blob of data can be easily added in the form of just an extra module. Since microcode data cannot reliably be recognized by looking at the provided data, this requires (in the non-EFI case) the use of a command line parameter ("ucode=<number>") to identify which of the modules is to be parsed for an eventual microcode update (in the EFI case the module is being identified in the config file, and hence the command line argument, if given, will be ignored). This required to adjust the XSM module determination logic accordingly. The format of the data to be provided is the raw binary blob already used for AMD CPUs, and the output of the intel-microcode2ucode utility for the Intel case (either the per-(family,model,stepping) file or - to make things easier for distro-s integration-wise - simply the concatenation of all of them). In order to not convert the spin_lock() in microcode_update_cpu() (and then obviously also all other uses on microcode_mutex) to spin_lock_irqsave() (which would be undesirable for the hypercall context in which the function also runs), the boot time handling gets done using a tasklet (instead of using on_selected_cpus()). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* eliminate first_cpu() etc | Jan Beulich | 2011-11-08 | 1 | -4/+4
| | | | | | This includes the conversion from for_each_cpu_mask() to for_each_cpu(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* introduce and use nr_cpu_ids and nr_cpumask_bits | Jan Beulich | 2011-10-21 | 1 | -7/+15
| | | | | | | | | | | | | | | The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS, where necessary, get adjusted accordingly), while the latter is for the sole use of determining the allocation size when dynamically allocating CPU masks (done later in this series). Adjust accessors to use either of the two to bound their bitmap operations - which one gets used depends on whether accessing the bits in the gap between nr_cpu_ids and nr_cpumask_bits is benign but more efficient. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
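The distinction between the two values can be illustrated with a small sketch. It assumes, without confirmation from this log, that the allocation width is simply the runtime CPU count rounded up to a whole number of unsigned long words; cpumask_bits_for() and cpumask_alloc_bytes() are hypothetical names, not functions from the Xen tree.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Assumed relationship: the mask width used for allocation is the
 * runtime CPU count rounded up to a multiple of BITS_PER_LONG.
 */
static unsigned int cpumask_bits_for(unsigned int nr_cpu_ids)
{
    return (nr_cpu_ids + BITS_PER_LONG - 1) & ~(BITS_PER_LONG - 1);
}

/* Bytes needed to dynamically allocate a CPU mask of that width. */
static size_t cpumask_alloc_bytes(unsigned int nr_cpu_ids)
{
    return cpumask_bits_for(nr_cpu_ids) / CHAR_BIT;
}
```

Under this assumption, bits between the CPU count and the rounded-up width are padding, which is why an accessor may bound its bitmap operation by either value depending on which is cheaper and whether touching the padding is harmless.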
* PCI multi-seg: introduce notion of PCI segments | Jan Beulich | 2011-09-18 | 1 | -0/+2
| | | | Signed-off-by: Jan Beulich <jbeulich@suse.com>
* x86/time: verify_tsc_reliability() can be run as a generic initcall. | Keir Fraser | 2011-09-17 | 1 | -3/+0
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* xen: Move tsc reliability check until after CPUs have booted | George Dunlap | 2011-09-17 | 1 | -0/+3
| | | | | | | | | | | AMD CPUs by default enable X86_FEATURE_TSC_RELIABLE, and depend upon a later check to disable this feature if TSC drift is detected. Unfortunately, this check is done in time.c:init_xen_time(), which is done before any secondary CPUs are brought up, and is thus guaranteed to succeed. This patch moves the check into its own function, and calls it after cpus are brought up. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* x86-64/MMCFG: finally make Fam10 enabling work | Jan Beulich | 2011-07-25 | 1 | -0/+2
| | | | | | | | | | | | | | Forcibly enabling the MMCFG space on AMD Fam10 CPUs cannot be expected to work since with the firmware not being aware of the address range used it cannot possibly reserve the space in E820 or ACPI resources. Hence we need to manually insert the range into the E820 table, and enable the range only when the insertion actually works without conflict. Further, the actual enabling of the space is done from identify_cpu(), which means that acpi_mmcfg_init() must be called after that function (and hence should not be called from acpi_boot_init()). Otherwise, Dom0 would be able to use MMCFG, but Xen wouldn't. Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86-64: properly handle alias mappings beyond _end | Jan Beulich | 2011-07-14 | 1 | -0/+5
| | | | | | | | | | | Changeset 19632:b0966b6f5180 wasn't really complete: The Xen image mapping doesn't end at _end, but a full 16Mb gets mapped during boot (and never got unmapped so far), hence all of this space was subject to alias mappings when it comes to cache attribute changes. Unmap all full large pages between _end and the 16Mb boundary, and include all other pages beyond _end when checking for aliases. Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86-64: EFI boot code | Jan Beulich | 2011-06-28 | 1 | -3/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Besides introducing the relevant code paralleling parts of what is under xen/arch/x86/boot/, this adjusts the build logic so that with a single compilation two images (gzip-compressed ELF and EFI application) can get created. The EFI part of this depends on a new enough compiler (supposedly gcc 4.4.x and above, but so far only tested to work with 4.5.x) and a properly configured linker (must support the i386pep emulation). If either functionality is found to not be available, the EFI part of the build will simply be skipped. The patch adds all code to allow Xen and the (accordingly enabled) Dom0 kernel to boot, but doesn't allow Dom0 to make use of EFI runtime calls (this will be the subject of the next patch). Parts of the code were lifted from an earlier never published OS project of ours - whether respective license information needs to be added to the respective source file is unclear to me (I was told internally that adding a GPLv2 license header can be done if needed by the community). Open issues (not preventing this from being committed imo): The trampoline allocation and initialization isn't really nice. This is due to the trampoline needing to be placed at a fixed address, and hence making the trampoline relocatable would seem desirable here (as well as for BIOS-based booting, where the trampoline location needed to be adjusted a number of times already in the past, due to it colliding with firmware data). By excluding mem.S, edd.S, and video.S from the copied trampoline (i.e. moving up wakeup.S? and making sure none of the symbols are used from EFI code), the effective trampoline size could at least be reduced. Should the mappings of [__XEN_VIRT_START, mbi.mem_upper) and [_end, __XEN_VIRT_START+BOOTSTRAP_MAP_BASE) be destroyed, despite non-EFI code also keeping them? Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: Enable Supervisor Mode Execution Protection (SMEP) | Keir Fraser | 2011-06-03 | 1 | -0/+10
| | | | | | | | | | | | | | | | | | New Intel CPUs support SMEP (Supervisor Mode Execution Protection). SMEP prevents software operating with CPL < 3 (supervisor mode) from fetching instructions from any linear address with a valid translation for which the U/S flag (bit 2) is 1 in every paging-structure entry controlling the translation for the linear address. This patch enables SMEP in Xen to protect Xen hypervisor from executing pv guest instructions, whose translation paging-structure entries' U/S flags are all set. Signed-off-by: Yang Wei <wei.y.yang@intel.com> Signed-off-by: Shan Haitao <haitao.shan@intel.com> Signed-off-by: Li Xin <xin.li@intel.com> Signed-off-by: Keir Fraser <keir@xen.org>
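The enabling step can be sketched as pure bit logic. The two bit positions below come from the Intel architecture (the SMEP feature flag is bit 7 of EBX for CPUID leaf 7, subleaf 0, and the control is CR4 bit 20). Everything else is an assumption for illustration: cr4_enable_smep() and its opt_disable argument are hypothetical and are not taken from the patch, and actually reading CPUID or writing CR4 needs privileged code that is omitted here.

```c
#include <stdint.h>

#define CPUID7_EBX_SMEP (1u << 7)    /* CPUID.(EAX=7,ECX=0):EBX bit 7 */
#define X86_CR4_SMEP    (1ul << 20)  /* CR4 bit 20 */

/*
 * Hypothetical helper: given the current CR4 value and the EBX output
 * of CPUID leaf 7, return the CR4 value to load. SMEP is only set when
 * the CPU advertises it and the (assumed) opt-out flag is clear.
 */
static unsigned long cr4_enable_smep(unsigned long cr4, uint32_t cpuid7_ebx,
                                     int opt_disable)
{
    if (!opt_disable && (cpuid7_ebx & CPUID7_EBX_SMEP))
        cr4 |= X86_CR4_SMEP;
    return cr4;
}
```

Gating on the CPUID bit matters because setting a reserved CR4 bit on a CPU without SMEP raises a fault.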
* xen: remove more declarations from C files. | Tim Deegan | 2011-05-27 | 1 | -5/+1
| | | | | | | | | | This patch moves some more, mostly data, extern declarations into header files. I haven't been as strict as I was with functions; in particular there are a number of declarations of assembler labels that are only used in one place. I've also left a few compat-mode tricks, and all the magic in symbols.c Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
* A little bit of SMP boot code cleanup | Jan Beulich | 2011-05-01 | 1 | -4/+3
| | | | Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Nested Virtualization core implementation | cegger | 2011-02-28 | 1 | -0/+2
| | | | | | | Signed-off-by: Christoph Egger <Christoph.Egger@amd.com> Acked-by: Eddie Dong <eddie.dong@intel.com> Acked-by: Tim Deegan <Tim.Deegan@citrix.com> Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
* move register_cpu_notifier() into .init.text | Jan Beulich | 2011-04-02 | 1 | -1/+1
| | | | | | | With no modular drivers, all CPU notifier setup is supposed to happen during boot. There also is a respective comment in the function. Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Remove unmaintained Access Control Module (ACM) from hypervisor. | Keir Fraser | 2011-03-25 | 1 | -2/+1
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* Define new <pfn.h> header for PFN_{DOWN,UP} macros. | Keir Fraser | 2011-03-23 | 1 | -0/+1
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
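As a reminder of what PFN_{DOWN,UP} compute, here is a minimal sketch assuming 4 KiB pages (PAGE_SHIFT of 12). The macro bodies follow the conventional definitions and may not match the header character for character; pages_spanned() is a hypothetical helper added only to show typical use.

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Page frame number containing address x (round down). */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* First page frame number at or above address x (round up). */
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical helper: number of page frames touched by [start, end). */
static unsigned long pages_spanned(unsigned long start, unsigned long end)
{
    return PFN_UP(end) - PFN_DOWN(start);
}
```

Rounding the start down and the end up gives the smallest page-aligned range covering a byte range, which is the usual reason both macros appear together in memory setup code.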
* move various bits into .init.* sections | Jan Beulich | 2011-03-09 | 1 | -1/+3
| | | | | | | | | | This also includes the removal of some entirely unused functions. The patch builds upon the makefile adjustments done in the earlier sent patch titled "move more kernel decompression bits to .init.* sections". Signed-off-by: Jan Beulich <jbeulich@novell.com>
* cpu hotplug: Core functions are quiet on failure. | Keir Fraser | 2011-01-14 | 1 | -1/+5
| | | | | | | | | This was already inconsistent, so make them consistently quiet and leave it to callers to log an error. Add suitable error logging to the arch-specific CPU bringup loops. In particular, this avoids printing an error on EBUSY, in which case the caller may want a silent retry loop. Signed-off-by: Keir Fraser <keir@xen.org>
* Use bool_t for various boolean variables | Keir Fraser | 2010-12-24 | 1 | -11/+6
| | | | | | | | | | | ... decreasing cache footprint. As a prerequisite this requires making cmdline_parse() a little more flexible. Also remove a few variables altogether, and adjust sections annotations for several others. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Keir Fraser <keir@xen.org>
* x86: adjust x2apic section placement | Keir Fraser | 2010-12-15 | 1 | -1/+1
| | | | Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: x2apic: Large cleanup | Keir Fraser | 2010-12-09 | 1 | -5/+1
| | | | Signed-off-by: Keir Fraser <keir@xen.org>