path: root/xen/arch/x86/x86_64/mm.c
Commit message | Author | Age | Files | Lines
* x86: check for canonical address before doing page walks | Jan Beulich | 2013-10-11 | 1 | -1/+1
  ... as there doesn't really exist any valid mapping for them. Particularly in the case of do_page_walk() this also avoids returning non-NULL for such invalid input.
  Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
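As an illustration of the check this commit describes, here is a minimal sketch of a canonical-address test. It assumes 48-bit virtual addresses (4-level paging); the name `is_canonical` and the `VADDR_BITS` constant are made up for this example and are not taken from the Xen source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 implemented virtual address bits (4-level paging). */
#define VADDR_BITS 48

/*
 * An address is canonical when bits 63..47 are all copies of bit 47,
 * i.e. the upper 17 bits are either all zero or all one.
 */
static bool is_canonical(uint64_t va)
{
    uint64_t upper = va >> (VADDR_BITS - 1);            /* bits 63..47 */
    uint64_t all_ones = UINT64_MAX >> (VADDR_BITS - 1); /* 17 one bits */

    return upper == 0 || upper == all_ones;
}
```

A page walk guarded this way can return NULL straight away for a non-canonical input instead of walking tables that cannot hold a valid mapping for it.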
* x86: don't blindly create L3 tables for the direct map | Jan Beulich | 2013-09-30 | 1 | -17/+12
  Now that the direct map area can extend all the way up to almost the end of address space, this is wasteful. Also fold two almost redundant messages in SRAT parsing into one.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: add locking to map_pages_to_xen() | Jan Beulich | 2013-07-15 | 1 | -62/+0
  While boot time calls don't need this, run time uses of the function which may result in L2 page tables getting populated need to be serialized to avoid two CPUs populating the same L2 (or L3) entry, overwriting each other's results.
  This is expected to fix what would seem to be a regression from commit b0581b92 ("x86: make map_domain_page_global() a simple wrapper around vmap()"), albeit that change only made more readily visible the already existing issue.
  This patch intentionally does not
  - add locking to the page table de-allocation logic in destroy_xen_mappings() (the only user having potential races here, msix_put_fixmap(), gets converted to use __set_fixmap() instead)
  - avoid races between super page splitting and reconstruction in map_pages_to_xen() (no such uses exist; races between multiple splitting attempts or between multiple reconstruction attempts are being taken care of)
  If we wanted to take care of these, we'd need to alter the behavior of virt_to_xen_l?e() - they would need to return with the lock held then.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
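The race described above (two CPUs populating the same empty page table entry) can be sketched as a check-and-populate step done under a lock. This is only an illustration: the C11 `atomic_flag` spinlock stands in for Xen's own locking, and `demo_slot`, `demo_lock` and `get_or_populate` are hypothetical names, not functions from mm.c.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_SLOTS   4    /* hypothetical number of upper-level entries */
#define DEMO_ENTRIES 512  /* entries per lower-level table */

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static uint64_t *demo_slot[DEMO_SLOTS]; /* NULL means "not populated yet" */

static void demo_lock_acquire(void)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire))
        ;
}

static void demo_lock_release(void)
{
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/*
 * Return the lower-level table for slot idx, allocating it if needed.
 * The emptiness check and the store happen under the same lock, so two
 * callers cannot both see "empty" and overwrite each other's table.
 */
static uint64_t *get_or_populate(unsigned int idx)
{
    uint64_t *tbl;

    if (idx >= DEMO_SLOTS)
        return NULL;

    demo_lock_acquire();
    tbl = demo_slot[idx];
    if (tbl == NULL) {
        tbl = calloc(DEMO_ENTRIES, sizeof(*tbl));
        demo_slot[idx] = tbl; /* stays NULL if the allocation failed */
    }
    demo_lock_release();

    return tbl;
}
```

Calling `get_or_populate(1)` twice yields the same table pointer; without the lock, two concurrent callers could each allocate a table and one result would be lost.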
* x86: drop setup_idle_pagetable() | Jan Beulich | 2013-06-12 | 1 | -8/+0
  With vcpu->domain->arch.perdomain_l3_pg no longer getting set up for the idle domain, this creates an invalid L4 entry (due to translating a NULL struct page_info pointer to a physical address).
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
* x86: rework hypercall argument translation area setup | Jan Beulich | 2013-02-28 | 1 | -15/+5
  ... using the new per-domain mapping management functions, adding destroy_perdomain_mapping() to the previously introduced pair.
  Rather than using an order-1 Xen heap allocation, use (currently 2) individual domain heap pages to populate space in the per-domain mapping area.
  Also fix a benign off-by-one mistake in is_compat_arg_xlat_range().
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* Fix emacs local variable block to use correct C style variable. | David Vrabel | 2013-02-21 | 1 | -1/+1
  The emacs variable to set the C style from a local variable block is c-file-style, not c-set-style.
  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
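For reference, a file-local variable block of the kind this commit corrects looks roughly like the following when placed at the end of a C source file. The style name and offsets are illustrative assumptions, not a copy of the block in mm.c; the point is only that the style is selected with `c-file-style`.

```c
/*
 * Local variables:
 * mode: C
 * c-file-style: "BSD"
 * c-basic-offset: 4
 * indent-tabs-mode: nil
 * End:
 */
```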
* x86: support up to 16Tb | Jan Beulich | 2013-01-23 | 1 | -4/+17
  This mainly involves adjusting the number of L4 entries needing copying between page tables (which is now different between PV and HVM/idle domains), and changing the cutoff point and method when more than the supported amount of memory is found in a system.
  Since TMEM doesn't currently cope with the full 1:1 map not always being visible, it gets forcefully disabled in that case.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* x86: properly use map_domain_page() during page table manipulation | Jan Beulich | 2013-01-23 | 1 | -36/+27
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: properly use map_domain_page() during domain creation/destruction | Jan Beulich | 2013-01-23 | 1 | -11/+7
  This involves no longer storing virtual addresses of the per-domain mapping L2 and L3 page tables.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: extend frame table virtual space | Jan Beulich | 2013-01-23 | 1 | -2/+2
  ... to allow frames for up to 16Tb.
  At the same time, add the super page frame table coordinates to the comment describing the address space layout.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: introduce virt_to_xen_l1e() | Jan Beulich | 2013-01-23 | 1 | -0/+22
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* miscellaneous cleanup | Jan Beulich | 2013-01-17 | 1 | -16/+11
  ... noticed while putting together the 16Tb support patches for x86.
  Briefly, this (in order of the changes below)
  - fixes an inefficiency in x86's context switch code (translations to/from struct page are more involved than to/from MFNs)
  - drops unnecessary MFN-to-page conversions
  - drops a redundant call to destroy_xen_mappings() (an identical call is being made a few lines up)
  - simplifies a VA-to-MFN translation
  - drops dead code (several occurrences)
  - adds a missing __init annotation
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* streamline guest copy operations | Jan Beulich | 2012-12-10 | 1 | -3/+3
  - use the variants not validating the VA range when writing back structures/fields to the same space that they were previously read from
  - when only a single field of a structure actually changed, copy back just that field where possible
  - consolidate copying back results in a few places
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* xen: replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when appropriate | Stefano Stabellini | 2012-10-17 | 1 | -1/+1
  Note: these changes don't make any difference on x86.
  Replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when it is used as a hypercall argument.
  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Committed-by: Ian Campbell <ian.campbell@citrix.com>
* x86-64: construct static, uniform parts of page tables at build time | Jan Beulich | 2012-09-11 | 1 | -18/+0
  ... rather than at boot time, removing unnecessary redundancy between EFI and legacy boot code.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: construct static part of 1:1 mapping at build time | Jan Beulich | 2012-09-11 | 1 | -2/+0
  ... rather than at boot time, removing unnecessary redundancy between EFI and legacy boot code.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: allow early use of fixmaps | Jan Beulich | 2012-09-11 | 1 | -0/+4
  As a prerequisite for adding an EHCI debug port based console implementation, set up the page tables needed for (a sub-portion of) the fixmaps together with other boot time page table construction.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86-64: Fix memory hotplug epfn upper limit test for updating the compat M2P table | Malcolm Crossley | 2012-04-25 | 1 | -1/+1
  The epfn is being compared to (RDWR_COMPAT_MPT_VIRT_END - RDWR_COMPAT_MPT_VIRT_START) without a 2 bit shift, resulting in the epfn being compared to the size of the RDWR_COMPAT_MPT table in bytes instead of the maximum page frame number that the RDWR_COMPAT_MPT table can map.
  Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
  Committed-by: Jan Beulich <jbeulich@suse.com>
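The arithmetic behind this fix can be sketched as follows. The sketch assumes that each compat M2P entry is 4 bytes wide, which is what the 2 bit shift implies; the window bounds and all `DEMO_`/`compat_m2p_` names are hypothetical and chosen only to make the numbers easy to check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical virtual window for the table; not Xen's real constants. */
#define DEMO_COMPAT_MPT_VIRT_START UINT64_C(0x0000000080000000)
#define DEMO_COMPAT_MPT_VIRT_END   UINT64_C(0x00000000c0000000) /* 1 GiB */

/* Assumption: one entry is a 32-bit value, so bytes >> 2 == entries. */
#define DEMO_COMPAT_ENTRY_SHIFT 2

/* Size of the window in bytes (what the unshifted test compared against). */
static uint64_t compat_m2p_bytes(void)
{
    return DEMO_COMPAT_MPT_VIRT_END - DEMO_COMPAT_MPT_VIRT_START;
}

/* Number of page frames the window can describe (the intended limit). */
static uint64_t compat_m2p_entries(void)
{
    return compat_m2p_bytes() >> DEMO_COMPAT_ENTRY_SHIFT;
}

/* True if every frame below epfn has a slot in the table. */
static bool epfn_within_compat_m2p(uint64_t epfn)
{
    return epfn <= compat_m2p_entries();
}
```

With these numbers the window holds 2^28 entries, so an epfn of 2^29 is still below the byte size (2^30) yet lies well beyond what the table can map. That gap is the error the commit message describes.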
* Introduce system_state variable. | Keir Fraser | 2012-03-22 | 1 | -1/+1
  Use it to replace x86-specific early_boot boolean variable.
  Also use it to detect suspend/resume case during cpu offline/online to avoid unnecessarily breaking vcpu and cpupool affinities.
  Signed-off-by: Keir Fraser <keir@xen.org>
  Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
* x86/mm: New sharing audit memop | Andres Lagar-Cavilla | 2012-02-10 | 1 | -0/+2
  Remove costly mem_sharing audits from the inline path, and instead make them callable as a memop.
  Have the audit function return the number of errors detected.
  Update memshrtool to be able to trigger audits.
  Set sharing audits as enabled by default.
  Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
  Signed-off-by: Adin Scannell <adin@scannell.ca>
  Acked-by: Tim Deegan <tim@xen.org>
  Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
  Committed-by: Tim Deegan <tim@xen.org>
* Use memops for mem paging, sharing, and access, instead of domctls | Andres Lagar-Cavilla | 2012-02-10 | 1 | -0/+23
  Per page operations in the paging, sharing, and access tracking subsystems are all implemented with domctls (e.g. a domctl to evict one page, or to share one page).
  Under heavy load, the domctl path reveals a lack of scalability. The domctl lock serializes dom0's vcpus in the hypervisor. When performing thousands of per-page operations on dozens of domains, these vcpus will spin in the hypervisor. Beyond the aggressive locking, an added inefficiency of blocking vcpus in the domctl lock is that dom0 is prevented from re-scheduling any of its other work-starved processes.
  We retain the domctl interface for setting up and tearing down paging/sharing/mem access for a domain. But we migrate all the per page operations to use the memory_op hypercalls (e.g. XENMEM_*).
  Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla>
  Signed-off-by: Adin Scannell <adin@scannell.ca>
  Acked-by: Tim Deegan <tim@xen.org>
  Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
  Committed-by: Tim Deegan <tim@xen.org>
* x86/mm: Check how many mfns are shared, in addition to how many are saved | Andres Lagar-Cavilla | 2012-01-26 | 1 | -0/+7
  This patch also moves the existing sharing-related memory op to the correct location, and adds logic to the audit() method that uses the new information.
  This patch only provides the Xen implementation of the domctls.
  Signed-off-by: Andres Lagar-Cavilla <andres@scannell.ca>
  Signed-off-by: Adin Scannell <adin@scannell.ca>
  Acked-by: Tim Deegan <tim@xen.org>
  Committed-by: Tim Deegan <tim@xen.org>
* x86/mm: remove 0x55 debug pattern from M2P table | Tim Deegan | 2011-12-02 | 1 | -3/+6
  It's not really any more useful than explicitly setting new M2P entries to the invalid value.
  Signed-off-by: Tim Deegan <tim@xen.org>
  Committed-by: Keir Fraser <keir@xen.org>
* x86: make run-time part of trampoline relocatable | Jan Beulich | 2011-08-19 | 1 | -1/+1
  In order to eliminate an initial hack in the EFI boot code (where memory for the trampoline was just "claimed" instead of properly allocated), the trampoline code must no longer make assumptions on the address at which it would be located. For the time being, the fixed address is being retained for the traditional multiboot path.
  As an additional benefit (at least from my pov) it allows confining the visibility of the BOOT_TRAMPOLINE definition to just the boot code.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86-64: EFI boot code | Jan Beulich | 2011-06-28 | 1 | -1/+2
  Besides introducing the relevant code paralleling parts of what is under xen/arch/x86/boot/, this adjusts the build logic so that with a single compilation two images (gzip-compressed ELF and EFI application) can get created. The EFI part of this depends on a new enough compiler (supposedly gcc 4.4.x and above, but so far only tested to work with 4.5.x) and a properly configured linker (must support the i386pep emulation). If either functionality is found to not be available, the EFI part of the build will simply be skipped.
  The patch adds all code to allow Xen and the (accordingly enabled) Dom0 kernel to boot, but doesn't allow Dom0 to make use of EFI runtime calls (this will be the subject of the next patch).
  Parts of the code were lifted from an earlier never published OS project of ours - whether respective license information needs to be added to the respective source file is unclear to me (I was told internally that adding a GPLv2 license header can be done if needed by the community).
  Open issues (not preventing this from being committed imo):
  The trampoline allocation and initialization isn't really nice. This is due to the trampoline needing to be placed at a fixed address, and hence making the trampoline relocatable would seem desirable here (as well as for BIOS-based booting, where the trampoline location needed to be adjusted a number of times already in the past, due to it colliding with firmware data).
  By excluding mem.S, edd.S, and video.S from copied trampoline (i.e. moving up wakeup.S? and making sure none of the symbols are used from EFI code), the effective trampoline size could at least be reduced.
  Should the mappings of [__XEN_VIRT_START, mbi.mem_upper) and [_end, __XEN_VIRT_START+BOOTSTRAP_MAP_BASE) be destroyed, despite non-EFI code also keeping them?
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: show page walk also for early page faults | Jan Beulich | 2011-06-23 | 1 | -2/+0
  At once, move the common (between 32- and 64-bit) definition of machine_to_phys_mapping_valid to a common location.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: Disable set_gpfn_from_mfn until m2p table is allocated. | Keir Fraser | 2011-06-10 | 1 | -0/+4
  This is a prerequisite for calling set_gpfn_from_mfn() unconditionally from free_heap_pages().
  Signed-off-by: Keir Fraser <keir@xen.org>
* xen: Remove some initialised but otherwise unused variables. | Olaf Hering | 2011-05-20 | 1 | -3/+1
  Fixes the build under gcc-4.6 -Werror=unused-but-set-variable
  Signed-off-by: Olaf Hering <olaf@aepfle.de>
* x86: split struct vcpu | Jan Beulich | 2011-04-05 | 1 | -5/+5
  This is accomplished by splitting the guest_context member, which by itself is larger than a page on x86-64. Quite a number of fields of this structure are completely meaningless for HVM guests, and thus a new struct pv_vcpu gets introduced, which is being overlaid with struct hvm_vcpu in struct arch_vcpu. The one member that is mostly responsible for the large size is trap_ctxt, which now gets allocated separately (unless fitting on the same page as struct arch_vcpu, as is currently the case for x86-32), and only for non-hvm, non-idle domains.
  This change pointed out a latent problem in arch_set_info_guest(), which is permitted to be called on already initialized vCPU-s, but so far copied the new state into struct arch_vcpu without (in this case) actually going through all the necessary accounting/validation steps. The logic gets changed so that the pieces that bypass accounting will at least be verified to be no different from the currently active bits, and the whole change will fail in case they are. The logic does *not* get adjusted here to do full error recovery, that is, partially modified state continues to not get unrolled in case of failure.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: Remove _PAGE_NX definition (with implicit use of cpu_has_nx). | Keir Fraser | 2011-03-28 | 1 | -0/+2
  Most users can use _PAGE_NX_BIT directly. The few genuine users in mm.c can do the cpu_has_nx check more clearly in other ways.
  Signed-off-by: Keir Fraser <keir@xen.org>
* x86: run-time callers of map_pages_to_xen() must check for errors | Jan Beulich | 2011-03-09 | 1 | -11/+22
  Again, (out-of-memory) errors must not cause hypervisor crashes, and hence ought to be propagated.
  This also adjusts the cache attribute changing loop in get_page_from_l1e() to not go through an unnecessary iteration. While this could be considered mere cleanup, it is actually a requirement for the subsequent now necessary error recovery path.
  Also make a few functions static, easing the check for potential callers needing adjustment.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: don't BUG() post-boot in alloc_xen_pagetable() | Jan Beulich | 2011-03-09 | 1 | -2/+12
  Instead, propagate the condition to the caller, all of which also get adjusted to check for that situation.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86-64: use PC-relative exception table entries | Keir Fraser | 2010-12-24 | 1 | -4/+1
  ... thus allowing to make the entries half their current size. Rather than adjusting all instances to the new layout, abstract the construction of the table entries via a macro (paralleling a similar one in recent Linux).
  Also change the name of the section (to allow easier detection of missed cases) and merge the final resulting output sections into .data.read_mostly.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
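The size reduction mentioned above comes from storing each address as a 32-bit offset relative to the location of the field itself, rather than as a 64-bit absolute value. The following is a rough sketch of that encoding under the assumption that entry and target lie within 2 GiB of each other; the struct and function names are invented for the example and do not reproduce Xen's definitions.

```c
#include <stdint.h>

/* Absolute layout: two 64-bit addresses, 16 bytes per entry. */
struct abs_entry {
    uint64_t addr; /* faulting instruction */
    uint64_t cont; /* fixup / continuation address */
};

/* Relative layout: two signed 32-bit offsets, 8 bytes per entry. */
struct rel_entry {
    int32_t addr;
    int32_t cont;
};

/* Store target as an offset from the field that holds it. */
static void rel_set(int32_t *field, const void *target)
{
    *field = (int32_t)((intptr_t)target - (intptr_t)field);
}

/* Recover the absolute address from a self-relative field. */
static const void *rel_get(const int32_t *field)
{
    return (const void *)((intptr_t)field + *field);
}

/* Stand-ins for code and table placed in the same image. */
static char demo_code[64];
static struct rel_entry demo_entry;
```

Because the offsets are relative to the entry itself, such a table also stays valid if the whole image is relocated as a unit.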
* x86_64: Make 32-bit-hypercall translate area per-vcpu. | Keir Fraser | 2010-11-16 | 1 | -16/+15
  This is a prerequisite for allowing guest descheduling within a hypercall.
  Signed-off-by: Keir Fraser <keir@xen.org>
* x86: do away with the boot time low-memory 1:1 mapping | Keir Fraser | 2010-11-09 | 1 | -0/+6
  By doing so, we're no longer restricted to be able to place all boot loader modules into the low 1Gb/4Gb (32-/64-bit) of memory, nor is there a dependency anymore on where the boot loader places the modules.
  We're also no longer restricted to copy the modules into a place below 4Gb, nor to put them all together into a single piece of memory.
  Further it allows even the 32-bit Dom0 kernel to be loaded anywhere in physical memory (except if it doesn't support PAE-above-4G).
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86_64: Ensure frame-table compression leaves MAX_ORDER aligned | Keir Fraser | 2010-09-01 | 1 | -2/+8
  contiguous ranges of page_info structs.
  This allows page-pointer arithmetic in places like our buddy allocator. This restriction was already implicitly guaranteed, but it is good to make it explicit in the pdx-related initialisation.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* iommu: New options iommu=dom-strict and iommu=dom0-passthrough | Keir Fraser | 2010-07-09 | 1 | -10/+12
  The former strips dom0 of its usual 1:1 mapping of all memory, and only provides it with mappings of its own memory, like any other domain. The latter is a new consistent name for iommu=passthrough.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* xen: msr safe cleanup | Keir Fraser | 2010-06-11 | 1 | -3/+3
  Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
* iommu: Specify access permissions to iommu_map_page(). | Keir Fraser | 2010-05-28 | 1 | -1/+1
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* x86: Pull dynamic memory allocation out of do_boot_cpu(). | Keir Fraser | 2010-05-18 | 1 | -9/+11
  This has two advantages:
  (a) We can move the allocations to a context where we can handle failure.
  (b) We can implement matching deallocations on CPU offline.
  Only the idle vcpu structure is now not freed on CPU offline. This probably does not really matter.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* x86-64: fix hotplug fault handling for 32-bit domains' M2P range | Keir Fraser | 2010-03-03 | 1 | -9/+6
  - handle only when memory hotplug regions were actually found
  - fix off-by-one error in fault handler's sanity checking
  - use first L4 table entry
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* mem hotplug: Fix an incorrect sanity check in memory add | Keir Fraser | 2010-02-04 | 1 | -5/+25
  Currently, memory hot-add will fail if the newly added memory is bigger than the current max_pages. This is really a stupid check, considering the user may hot-add the riser card with the biggest addresses first.
  This patch fixes the issue. It checks whether all newly added memory is unpopulated; if yes, then it is ok.
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
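The replacement check described here, accepting a hot-add range when none of its frames is already populated, might be sketched like this. The bitmap-style `demo_populated` array, the `DEMO_MAX_PFN` bound, and both function names are assumptions for illustration; Xen tracks populated memory differently.

```c
#include <stdbool.h>

#define DEMO_MAX_PFN 64 /* hypothetical size of the tracked frame space */

/* demo_populated[pfn] is true if that frame already has memory behind it. */
static bool demo_populated[DEMO_MAX_PFN];

static void demo_mark_populated(unsigned long pfn)
{
    if (pfn < DEMO_MAX_PFN)
        demo_populated[pfn] = true;
}

/*
 * Accept the half-open range [spfn, epfn) only if it is well formed and
 * no frame in it is populated. Note that the range is not compared
 * against the highest currently populated frame.
 */
static bool hotadd_range_ok(unsigned long spfn, unsigned long epfn)
{
    unsigned long pfn;

    if (spfn >= epfn || epfn > DEMO_MAX_PFN)
        return false;

    for (pfn = spfn; pfn < epfn; pfn++)
        if (demo_populated[pfn])
            return false;

    return true;
}
```

A range far above the current top of memory passes as long as it is empty, which is the case the old max_pages comparison rejected.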
* numa: Correct handling node with CPU populated but no memory populated | Keir Fraser | 2010-01-05 | 1 | -1/+1
  In changeset 20599, the node that has no memory populated is marked parsed, but not online. However, if there are CPUs populated in this node, the corresponding CPU mapping (i.e. the cpu_to_node) is still set up to the offline node, which will cause trouble for memory allocation.
  This patch changes init_cpu_to_node() and srant_detect_node() to take the offlined-node situation into account.
  Now the apicid_to_node is only used to keep the mapping between cpu/node provided by BIOS, and should not be used for memory allocation anymore.
  One thing left is to update the cpu_to_node mapping after memory is populated by memory hot-add.
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
  This is a reintroduction of 20726:ddb8c5e798f9, which I incorrectly reverted in 20745:d3215a968db9
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Revert 20726:ddb8c5e798f9 | Keir Fraser | 2010-01-04 | 1 | -1/+1
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
* numa: Correct handling node with CPU populated but no memory populated | Keir Fraser | 2009-12-28 | 1 | -1/+1
  In changeset 20599, the node that has no memory populated is marked parsed, but not online. However, if there are CPUs populated in this node, the corresponding CPU mapping (i.e. the cpu_to_node) is still set up to the offline node, which will cause trouble for memory allocation.
  This patch changes init_cpu_to_node() and srant_detect_node() to take the offlined-node situation into account.
  Now the apicid_to_node is only used to keep the mapping between cpu/node provided by BIOS, and should not be used for memory allocation anymore.
  One thing left is to update the cpu_to_node mapping after memory is populated by memory hot-add.
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
* Check m2p/compat m2p table for new added memory. | Keir Fraser | 2009-12-21 | 1 | -1/+27
  As we allocate m2p/compat m2p/frametable page tables from newly added memory, we want to make sure the new range can hold the new page tables. This is because the m2p/frametable need to be aligned and cover more than the newly added range.
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
* Fix bugs in frame table setup function when memory hot-add. | Keir Fraser | 2009-12-21 | 1 | -2/+4
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
* Clean up memory hotplug functions. | Keir Fraser | 2009-12-21 | 1 | -24/+46
  Move the range checking to mem_hotadd_check.
  Add more error handling, to restore the node information, unmap iommu page tables, destroy xen mapping when error happens.
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
* memory hotadd 7/7: hypercall support | Keir Fraser | 2009-12-11 | 1 | -0/+122
  The basic work flow to handle the memory hotadd is:
  - Update node information
  - Map new pages to xen 1:1 mapping
  - Setup frametable for new memory range
  - Setup m2p table for new memory range
  - Put the new pages to domheap
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
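The five-step work flow above, combined with the error handling that the later "Clean up memory hotplug functions" entry in this log mentions (restoring node information and destroying the Xen mapping on failure), can be sketched as a sequence with goto-based unwinding. Every name here is hypothetical: the steps are stubs that only record a trace letter, and `demo_fail_step` is a test knob, not anything from Xen.

```c
#include <stddef.h>
#include <string.h>

static char demo_trace[32]; /* letters for steps and undo actions that ran */
static int demo_fail_step;  /* step number that should fail, 0 = none */

static void demo_note(char c)
{
    size_t n = strlen(demo_trace);

    if (n + 1 < sizeof(demo_trace)) {
        demo_trace[n] = c;
        demo_trace[n + 1] = '\0';
    }
}

/* Stub for one step: fails if selected, otherwise records its tag. */
static int demo_step(int n, char tag)
{
    if (n == demo_fail_step)
        return -1;
    demo_note(tag);
    return 0;
}

static int memory_add_sketch(void)
{
    int rc;

    demo_trace[0] = '\0';

    if ((rc = demo_step(1, 'N')) != 0) /* update node information */
        goto out;
    if ((rc = demo_step(2, 'M')) != 0) /* map pages into the 1:1 mapping */
        goto undo_node;
    if ((rc = demo_step(3, 'F')) != 0) /* set up frame table */
        goto undo_map;
    if ((rc = demo_step(4, 'P')) != 0) /* set up m2p table */
        goto undo_map;
    if ((rc = demo_step(5, 'H')) != 0) /* hand pages to the domain heap */
        goto undo_map;
    return 0;

 undo_map:
    demo_note('m'); /* destroy the mapping created in step 2 */
 undo_node:
    demo_note('n'); /* restore the previous node information */
 out:
    return rc;
}
```

With `demo_fail_step` set to 3, the trace reads "NMmn": the first two steps ran, then the mapping and the node update were undone in reverse order.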
* memory hotadd 6/7: Allocate L3 table for whole direct mapping range if | Keir Fraser | 2009-12-11 | 1 | -0/+26
  memory hotplug is supported.
  Hot-added memory may need a new L4 entry for the 1:1 mapping. This patch sets up all L4 entries for the 1:1 mapping if memory hotadd is needed, so that we don't need to sync the guest page table in the page fault handler.
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>