path: root/xen/common/page_alloc.c
Commit message | Author | Date | Files | Lines
* xen: support RAM at addresses 0 and 4096 | Ian Campbell | 2013-09-26 | 1 | -3/+3
  Currently the mapping from pages to zones causes the page at zero to go
  into zone -1 and the page at 4096 to go into zone 0, which is the Xen zone
  (confusing various assertions). Arrange instead for the mapping to be such
  that zone 0 is always reserved for Xen and all other pages map to a
  zone >= 1.
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Cc: jbeulich@suse.com
  Acked-by: Tim Deegan <tim@xen.org>
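  A minimal sketch of the kind of mapping the fix implies (illustrative
  only; the real page_to_zone() macro in page_alloc.c differs in detail):
  zone 0 is hard-reserved for Xen, and every other page's zone is derived
  from the position of the MFN's highest set bit, offset so it can never
  fall below 1.

      /* Hypothetical sketch, not the exact Xen macro. */
      #define MEMZONE_XEN 0

      static unsigned int flsl_(unsigned long x)   /* find last set bit */
      {
          return x ? (unsigned int)(8 * sizeof(x) - __builtin_clzl(x)) : 0;
      }

      static unsigned int page_to_zone_sketch(unsigned long mfn, int xen_heap)
      {
          if ( xen_heap )
              return MEMZONE_XEN;          /* zone 0 is Xen-only */
          /* MFN 0 (address 0) and MFN 1 (address 4096) both land in
           * zone 1, rather than zones -1 and 0 as before. */
          return flsl_(mfn) ? flsl_(mfn) : 1;
      }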
* xen: gate split heap code on its own config option rather than !X86 | Ian Campbell | 2013-08-20 | 1 | -1/+1
  I'm going to want to disable this for 64-bit ARM.
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* use SMP barrier in common code dealing with shared memory protocols | Ian Campbell | 2013-07-04 | 1 | -1/+1
  Xen currently makes no strong distinction between the SMP barriers (smp_mb
  etc) and the regular barriers (mb etc). In Linux, from which these names
  were inherited along with the imported code that uses them, the SMP
  barriers are intended to be sufficient for implementing shared-memory
  protocols between processors in an SMP system, while the standard barriers
  are useful for MMIO etc.

  On x86, with its stronger ordering model, there is not much practical
  difference here, but ARM has weaker barriers available which are suitable
  for use as SMP barriers. Therefore ensure that common code uses the SMP
  barriers when that is all that is required.

  On both ARM and x86 both types of barrier are currently identical, so there
  is no actual change. A future patch will change smp_mb to a weaker barrier
  on ARM.

  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
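  An illustrative producer/consumer exchange over shared memory (not Xen's
  actual ring code; it assumes the usual smp_wmb()/smp_rmb() macros),
  showing where the SMP barriers are the right tool:

      struct ring {
          unsigned int prod, cons;
          unsigned long slot[16];
      };

      static void produce(struct ring *r, unsigned long v)
      {
          r->slot[r->prod % 16] = v;
          smp_wmb();               /* make the data visible before the index */
          r->prod++;
      }

      static int consume(struct ring *r, unsigned long *v)
      {
          if ( r->cons == r->prod )
              return 0;
          smp_rmb();               /* read the index before the data */
          *v = r->slot[r->cons % 16];
          r->cons++;
          return 1;
      }

  A full mb() would only be needed if the ordering had to extend to MMIO,
  which is exactly the distinction this patch starts to exploit.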
* hypervisor/xen/tools: Remove the XENMEM_get_outstanding_pages and provide the data via xc_phys_info | Konrad Rzeszutek Wilk | 2013-05-14 | 1 | -2/+5
  During review of the patches it was noticed that there exists a race
  wherein the 'free_memory' value consists of information from two
  hypercalls: XEN_SYSCTL_physinfo and XENMEM_get_outstanding_pages. The free
  memory the host has available for guests is the difference between
  'free_pages' (from XEN_SYSCTL_physinfo) and 'outstanding_pages'. Since
  these are two separate hypercalls, many things can happen between their
  execution.

  This patch resolves the race by eliminating the
  XENMEM_get_outstanding_pages hypercall and providing the free_pages and
  outstanding_pages information via the xc_phys_info structure. It also
  removes the XSM hooks and adds locking as needed.

  Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
  Reviewed-by: Tim Deegan <tim@xen.org>
  Acked-by: Keir Fraser <keir.xen@gmail.com>
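  The shape of the fix, sketched (field names follow the commit message, not
  necessarily the final public header): returning both counters from a
  single hypercall makes the subtraction race-free.

      #include <stdint.h>

      struct phys_info_sketch {
          uint64_t free_pages;         /* pages on the heap right now   */
          uint64_t outstanding_pages;  /* claimed but not yet allocated */
      };

      /* Memory actually available for new guests, from one atomic read: */
      /*     avail = info.free_pages - info.outstanding_pages;           */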
* x86: reserve pages when SandyBridge integrated graphics | Xudong Hao | 2013-03-26 | 1 | -0/+23
  SNB graphics devices have a bug that prevents them from accessing certain
  memory ranges, namely anything below 1M and the pages listed in the table.
  Xen does not add memory below 1MB to the heap, i.e. pages below 1MB are
  never allocated, so it is unnecessary to reserve memory below the 1MB mark
  that has not already been reserved. So reserve the pages listed in the
  table at Xen boot, if an SNB gfx device is detected, to avoid GPU hangs.
  Signed-off-by: Xudong Hao <xudong.hao@intel.com>
  Acked-by: Keir Fraser <keir@xen.org>
* mmu: Introduce XENMEM_claim_pages (subop of memory ops) | Dan Magenheimer | 2013-03-11 | 1 | -2/+114
  When guests' memory consumption is volatile (multiple guests ballooning
  up/down) we are presented with the problem of determining exactly how much
  memory there is for allocation of new guests without negatively impacting
  existing guests. Note that the existing models (xapi, xend) drive the
  memory consumption from the toolstack and assume that the guest will
  eventually hit the memory target. Other models, such as the dynamic memory
  utilized by tmem, do this differently: the guest drives the memory
  consumption (up to the d->max_pages ceiling). With the dynamic memory
  model, the guest can frequently balloon up and down as it sees fit. This
  presents the toolstack with the problem that it does not know atomically
  how much free memory there is (the information is stale the moment the
  d->tot_pages information is provided to the toolstack), and hence starting
  a guest can fail during the memory creation process, especially if guests
  are created in parallel.

  In a nutshell, what we need is an atomic value of all domains' tot_pages
  during the allocation of guests. Naturally, holding a lock for such a long
  time is unacceptable. Hence the goal of this hypercall is to attempt to
  atomically and very quickly determine if there are sufficient pages
  available in the system and, if so, "set aside" that quantity of pages for
  future allocations by that domain. Unlike existing hypercalls such as
  increase_reservation or populate_physmap, specific physical pageframes are
  not assigned to the domain, because this cannot be done sufficiently
  quickly (especially for very large allocations in an arbitrarily
  fragmented system) and so the existing mechanisms result in classic
  time-of-check-time-of-use (TOCTOU) races. One can think of claiming as
  similar to a "lazy" allocation, but subsequent hypercalls are required to
  do the actual physical pageframe allocation.

  Note that one effect of this hypercall is that, from the perspective of
  other running guests, there is suddenly a new guest occupying X amount of
  pages. This means that when they try to balloon up they will hit the
  system-wide ceiling of available free memory (if the total sum of the
  existing d->max_pages >= host memory). This is OK, as that is part of the
  overcommit. What we DO NOT want to do is dictate what their ceiling
  (d->max_pages) should be, as that is risky and can lead to guests OOM-ing.
  It is something the guest needs to figure out.

  In order for a toolstack to "get" information about whether a domain has a
  claim and, if so, how large, and also for the toolstack to measure the
  total system-wide claim, a second subop has been added and exposed through
  domctl and libxl (see "xen: XENMEM_claim_pages: xc").

  == Alternative solutions ==

  There has been a variety of discussion about whether the problem this
  hypercall solves can be handled in user-space, such as:

  - For all the existing guests, set their d->max_pages temporarily to
    d->tot_pages and create the domain. This forces those domains to stay at
    their current consumption level (fyi, this is what the tmem freeze call
    does). The disadvantage is that it needlessly forces the guests to stay
    at their current memory usage instead of allowing them to decide the
    optimal target.

  - Account for free memory using only d->max_pages. This ignores ballooning
    changes and any overcommit scenario: it only works when the sum of all
    d->max_pages (plus the domain to be allocated now) is smaller than the
    available free memory on the host, and as such it ignores the overcommit
    problem.

  - Provide a ring/FIFO along with an event channel to notify a userspace
    daemon of guests' memory consumption. This daemon can then provide
    up-to-date information to the toolstack about how much free memory there
    is. This duplicates what the hypervisor is already doing and introduces
    latency issues, leaving the toolstack catching its breath: there might
    be millions of these updates on a heavily used machine, no quiescent
    state ever, and the toolstack would heavily consume CPU cycles while
    still never having up-to-date information.

  It has been noted that this claim mechanism solves the underlying problem
  (slow failure of domain creation) for a large class of domains but not
  all; specifically, it does not handle (but also does not make the problem
  worse for) PV domains that specify the "superpages" flag, and 32-bit PV
  domains on large-RAM systems. These will be addressed at a later time.

  Code overview:

  Though the hypercall simply does arithmetic within locks, some of the
  semantics in the code may be a bit subtle. The key variables
  (d->unclaimed_pages and total_unclaimed_pages) start at zero if no claim
  has yet been staked for any domain. (Perhaps a better name is
  "claimed_but_not_yet_possessed", but that's a bit unwieldy.) If no claim
  hypercalls are executed, there should be no impact on existing usage.

  When a claim is successfully staked by a domain, it acts like a watermark,
  but there is no record kept of the size of the claim. Instead,
  d->unclaimed_pages is set to the difference between d->tot_pages and the
  claim. When d->tot_pages increases or decreases, d->unclaimed_pages
  atomically decreases or increases. Once d->unclaimed_pages reaches zero,
  the claim is satisfied and d->unclaimed_pages stays at zero, unless a new
  claim is subsequently staked.

  The system-wide variable total_unclaimed_pages is always the sum of
  d->unclaimed_pages across all domains. A non-domain-specific heap
  allocation will fail if total_unclaimed_pages exceeds free (plus, on
  tmem-enabled systems, freeable) pages.

  Claim semantics could be modified by flags. The initial implementation had
  three flags, discerning whether the caller would like tmem freeable pages
  to be considered in determining whether or not the claim can be
  successfully staked. This was removed in later patches and there are no
  flags. A claim can be cancelled by requesting a claim with the number of
  pages being zero. A second subop returns the total outstanding claimed
  pages system-wide.

  Note: Save/restore/migrate may need to be modified, else it can be
  documented that all claims are cancelled.

  This patch of the proposed XENMEM_claim_pages hypercall/subop takes into
  account review feedback from Jan and Keir and IanC and Matthew Daley, plus
  some fixes found via runtime debugging.

  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
  Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  Acked-by: Tim Deegan <tim@xen.org>
  Acked-by: Keir Fraser <keir@xen.org>
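  A simplified model of the claim arithmetic described in the code overview
  (field and variable names follow the commit message; locking is reduced to
  a comment):

      struct domain {
          unsigned long tot_pages;        /* pages currently possessed  */
          unsigned long unclaimed_pages;  /* claimed, not yet possessed */
      };

      static unsigned long total_avail_pages;      /* free heap pages  */
      static unsigned long total_unclaimed_pages;  /* sum over domains */

      /* Stake (or, with pages == 0, cancel) a claim for d. The real code
       * performs this arithmetic under the heap_lock. */
      static int set_claim_sketch(struct domain *d, unsigned long pages)
      {
          total_unclaimed_pages -= d->unclaimed_pages;  /* drop old claim */
          d->unclaimed_pages = 0;

          if ( pages > d->tot_pages )
          {
              unsigned long outstanding = pages - d->tot_pages;

              if ( outstanding > total_avail_pages - total_unclaimed_pages )
                  return -1;   /* cannot be backed right now: claim fails */
              d->unclaimed_pages = outstanding;
              total_unclaimed_pages += outstanding;
          }
          return 0;
      }

  As d->tot_pages later grows through ordinary allocations,
  d->unclaimed_pages shrinks in step until it reaches zero and the claim is
  satisfied.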
* Fix emacs local variable block to use correct C style variable. | David Vrabel | 2013-02-21 | 1 | -1/+1
  The emacs variable to set the C style from a local variable block is
  c-file-style, not c-set-style.
  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* xen/arm: move setup_mm right after setup_pagetables | Stefano Stabellini | 2013-02-15 | 1 | -6/+0
  At the moment we destroy the DTB mappings we have in setup_pagetables and
  restore them only in setup_mm. Move setup_mm right after setup_pagetables.
  This ensures we have a valid DTB mapping while running the subsequent
  initialization code.
  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
  [ ijc -- s/atag_paddr/fdt_paddr/ ]
  Committed-by: Ian Campbell <ian.campbell@citrix.com>
* x86: debugging code for testing 16Tb support on smaller memory systems | Jan Beulich | 2013-02-08 | 1 | -0/+20
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: support up to 16Tb | Jan Beulich | 2013-01-23 | 1 | -2/+17
  This mainly involves adjusting the number of L4 entries needing copying
  between page tables (which is now different between PV and HVM/idle
  domains), and changing the cutoff point and method when more than the
  supported amount of memory is found in a system.
  Since TMEM doesn't currently cope with the full 1:1 map not always being
  visible, it gets forcefully disabled in that case.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* xen: centralize accounting for domain tot_pages | Dan Magenheimer | 2012-12-10 | 1 | -2/+8
  Provide and use a common function for all adjustments to a domain's
  tot_pages counter, in anticipation of future and/or out-of-tree patches
  that must adjust related counters atomically.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
  Committed-by: Keir Fraser <keir@xen.org>
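  A sketch of the centralized helper this commit describes (the function it
  introduces is domain_adjust_tot_pages(); the body here is a simplified
  assumption, with locking omitted):

      /* Single choke point for every tot_pages change, so that related
       * counters (e.g. the later claim accounting) can be kept in step. */
      static unsigned long domain_adjust_tot_pages(struct domain *d,
                                                   long pages)
      {
          d->tot_pages += pages;
          return d->tot_pages;
      }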
* More efficient TLB-flush filtering in alloc_heap_pages(). | Keir Fraser | 2012-10-15 | 1 | -13/+16
  Rather than per-cpu filtering for every page in a super-page allocation,
  simply remember the most recent TLB timestamp across all allocated pages,
  and filter on that, just once, at the end of the function.
  For large-CPU systems doing 2MB allocations during domain creation, this
  cuts down the domain creation time *massively*.
  TODO: It may make sense to move the filtering out into some callers, such
  as memory.c:populate_physmap() and memory.c:increase_reservation(), so
  that the filtering can be moved outside their loops, too.
  Signed-off-by: Keir Fraser <keir@xen.org>
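  The change, sketched as a fragment with field names modeled on the
  description (pg, order, and mask come from the surrounding function; not
  the exact code):

      /* One pass to find the newest timestamp across the allocation ... */
      uint32_t max_stamp = 0;
      unsigned int i;

      for ( i = 0; i < (1U << order); i++ )
          if ( pg[i].u.free.need_tlbflush &&
               pg[i].tlbflush_timestamp > max_stamp )
              max_stamp = pg[i].tlbflush_timestamp;

      /* ... then a single CPU-mask filter at the end of the function,
       * instead of one tlbflush_filter() call per allocated page: */
      tlbflush_filter(mask, max_stamp);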
* printk: prefer %#x et al over 0x%x | Jan Beulich | 2012-09-21 | 1 | -1/+1
  Performance is not an issue with printk(), so let the function do
  minimally more work and instead save a byte per affected format specifier.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* In most of the codebase, use CONFIG_X86 in place of __i386__||__x86_64__ | Keir Fraser | 2012-09-12 | 1 | -1/+1
  Signed-off-by: Keir Fraser <keir@xen.org>
* remove ia64 | Jan Beulich | 2012-04-03 | 1 | -1/+1
  It retains IA64-specific bits in code imported from elsewhere (e.g. ACPI,
  EFI) as well as in the public headers. It also doesn't touch the tools,
  mini-os, and unmodified_drivers sub-trees.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* low-mem virq: Parentheses around ternary operator in check_low_mem_virq() | Keir Fraser | 2012-03-09 | 1 | -1/+1
  Signed-off-by: Keir Fraser <keir@xen.org>
* Low mem virq incremental adjustments | Andres Lagar-Cavilla | 2012-03-08 | 1 | -2/+5
  Consider tmem before firing the virq. Add .gitignore rune.
  Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
  Committed-by: Keir Fraser <keir@xen.org>
* Global virq for low memory situations | Andres Lagar-Cavilla | 2012-03-01 | 1 | -0/+108
  When a low memory threshold on the Xen heap is reached, we fire a global
  dom0 virq. If someone's listening, they can free up some more memory. The
  low threshold is configurable via the command line token
  'low_mem_virq_limit', and defaults to 64MiB. If the user specifies zero
  via the command line, the virq is disabled.

  We define a new virq, VIRQ_ENOMEM. Potential listeners include squeezed,
  xenballoond, or anything else that can be fired through xencommons.

  We error-check the low mem virq against the initial available heap (after
  dom0 allocation), to avoid firing immediately. Virq issuing is controlled
  by a hysteresis algorithm: when memory dips below a threshold, the virq is
  issued and the next virq will fire when memory shrinks another order of
  magnitude. The virq will not fire again in the current "band" until memory
  grows over the next higher order of magnitude.

  Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
  Committed-by: Keir Fraser <keir@xen.org>
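  A loose sketch of that hysteresis (variable names and the decade stepping
  are assumptions; the real logic lives in check_low_mem_virq(), and the
  initialization of the band edges is elided):

      static unsigned long virq_th;    /* current firing threshold       */
      static unsigned long virq_high;  /* level that re-arms the band up */

      static void check_low_mem_virq_sketch(unsigned long free_pages)
      {
          if ( free_pages <= virq_th )
          {
              send_global_virq(VIRQ_ENOMEM);  /* notify the dom0 listener */
              /* Drop into the next band: don't fire again until memory
               * shrinks another order of magnitude ... */
              virq_high = virq_th * 10;
              virq_th  /= 10;
          }
          else if ( free_pages > virq_high )
          {
              /* ... but once memory grows past the band's upper edge,
               * raise the threshold back up. */
              virq_th   *= 10;
              virq_high *= 10;
          }
      }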
* xenpaging: Fix c/s 23507:0a29c8c3ddf7 ("update machine_to_phys_mapping[] during page deallocation") | Keir Fraser | 2011-11-30 | 1 | -4/+4
  That patch clobbers the page owner in free_heap_pages() before we are
  finished using it. This means that a subsequent test, to determine whether
  it is safe to avoid safety TLB flushes, incorrectly always determines that
  it is safe to do so. The fix is simple: we can defer the original patch's
  work until after we are done with the page-owner field.
  Thanks to Christian Limpach for spotting this one.
  Signed-off-by: Keir Fraser <keir@xen.org>
* ia64: fix the build | Jan Beulich | 2011-11-15 | 1 | -0/+5
  This addresses all remaining build problems introduced over the last
  several months.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
* eliminate cpumask accessors referencing NR_CPUS | Jan Beulich | 2011-10-21 | 1 | -5/+7
  ... in favor of using the new, nr_cpumask_bits-based ones.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* X86 MCE: Prevent malicious guest from accessing broken page again | Keir Fraser | 2011-09-30 | 1 | -0/+14
  To avoid a recursive MCE.
  Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
  Committed-by: Keir Fraser <keir@xen.org>
* xenpaging: update machine_to_phys_mapping[] during page deallocation | Keir Fraser | 2011-06-10 | 1 | -1/+5
  The machine_to_phys_mapping[] array needs updating during page
  deallocation. If that page is allocated again, a call to
  get_gpfn_from_mfn() will still return an old gfn from another guest. This
  will cause trouble because this gfn number has no, or a different, meaning
  in the context of the current guest.

  This happens when the entire guest ram is paged out before
  xen_vga_populate_vram() runs. Then XENMEM_populate_physmap is called with
  gfn 0xff000. A new page is allocated with alloc_domheap_pages. This new
  page does not have a gfn yet. However, in guest_physmap_add_entry() the
  passed mfn still maps to an old gfn (perhaps from another old guest). This
  old gfn is in paged-out state in this guest's context and has no mfn
  anymore. As a result, the ASSERT() triggers because p2m_is_ram() is true
  for p2m_ram_paging* types.

  If the machine_to_phys_mapping[] array is updated properly, both loops in
  guest_physmap_add_entry() turn into no-ops for the new page and the
  mfn/gfn mapping will be done at the end of the function. If
  XENMEM_add_to_physmap is used with XENMAPSPACE_gmfn, get_gpfn_from_mfn()
  will return an apparently valid gfn. As a result,
  guest_physmap_remove_page() is called. The ASSERT in p2m_remove_page
  triggers because the passed mfn does not match the old mfn for the passed
  gfn.

  Signed-off-by: Olaf Hering <olaf@aepfle.de>
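  The fix implied here, sketched as a loop added on the free path
  (set_gpfn_from_mfn() and INVALID_M2P_ENTRY are the real Xen names; mfn,
  order, and i come from the surrounding function):

      /* Invalidate the M2P entries as a chunk is freed, so a later
       * get_gpfn_from_mfn() on a recycled page cannot return a stale
       * gfn belonging to a previous owner. */
      for ( i = 0; i < (1UL << order); i++ )
          set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);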
* X86: offline/broken page handler for pod cache | Liu, Jinsong | 2011-04-07 | 1 | -0/+18
  When a page is offlined, or a broken page occurs, the page may be
  populated or may be in the PoD cache. This patch handles offline/broken
  pages found in the PoD cache: it scans the cache and, on a hit, removes
  and replaces the page, then puts the offline/broken page on
  page_offlined_list/page_broken_list.
  Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
* X86: Fix mce offline page bug | Liu, Jinsong | 2011-04-07 | 1 | -10/+1
  c/s 19913 broke the mce offline-page logic: for page_state_is(pg, free)
  the case is impossible to trigger, and for page_state_is(pg, offlined) it
  in fact didn't offline the related page. This patch fixes the bug and
  removes an ambiguous comment.
  Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
* Revert 22706:ca10302ac285 "xenpaging: update machine_to_phys_mapping[] during page deallocation" | Keir Fraser | 2011-01-12 | 1 | -5/+1
  Crashes during boot, due to m2p tables not being set up early enough.
  Signed-off-by: Keir Fraser <keir@xen.org>
* xenpaging: update machine_to_phys_mapping[] during page deallocation | Keir Fraser | 2011-01-11 | 1 | -1/+5
  The machine_to_phys_mapping[] array needs updating during page
  deallocation. If that page is allocated again, a call to
  get_gpfn_from_mfn() will still return an old gfn from another guest. This
  will cause trouble because this gfn number has no, or a different, meaning
  in the context of the current guest.

  This happens when the entire guest ram is paged out before
  xen_vga_populate_vram() runs. Then XENMEM_populate_physmap is called with
  gfn 0xff000. A new page is allocated with alloc_domheap_pages. This new
  page does not have a gfn yet. However, in guest_physmap_add_entry() the
  passed mfn still maps to an old gfn (perhaps from another old guest). This
  old gfn is in paged-out state in this guest's context and has no mfn
  anymore. As a result, the ASSERT() triggers because p2m_is_ram() is true
  for p2m_ram_paging* types.

  If the machine_to_phys_mapping[] array is updated properly, both loops in
  guest_physmap_add_entry() turn into no-ops for the new page and the
  mfn/gfn mapping will be done at the end of the function. If
  XENMEM_add_to_physmap is used with XENMAPSPACE_gmfn, get_gpfn_from_mfn()
  will return an apparently valid gfn. As a result,
  guest_physmap_remove_page() is called. The ASSERT in p2m_remove_page
  triggers because the passed mfn does not match the old mfn for the passed
  gfn.

  Signed-off-by: Olaf Hering <olaf@aepfle.de>
* Use bool_t for various boolean variables | Keir Fraser | 2010-12-24 | 1 | -1/+1
  ... decreasing cache footprint. As a prerequisite this requires making
  cmdline_parse() a little more flexible. Also remove a few variables
  altogether, and adjust section annotations for several others.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
  Signed-off-by: Keir Fraser <keir@xen.org>
* x86: Add -Wredundant-decls to Xen build flags. | Keir Fraser | 2010-12-02 | 1 | -0/+1
  Fix up the fallout.
  Signed-off-by: Keir Fraser <keir@xen.org>
* page_alloc: Hold heap_lock while adjusting page states to/from PGC_state_free. | Keir Fraser | 2010-09-13 | 1 | -4/+4
  This avoids races with the buddy-merging logic in free_heap_pages().
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* page_alloc: Check neighbouring chunks belong to same NUMA node before merging in free_heap_pages(). | Keir Fraser | 2010-09-13 | 1 | -23/+10
  This avoids more convoluted logic in init_heap_pages().
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
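  The guard, sketched (phys_to_nid() and pfn_to_paddr() are the real Xen
  helpers; the wrapper is illustrative):

      /* Only coalesce a chunk with its buddy when both sit on the same
       * NUMA node, so per-node heaps never span a node boundary. */
      static int same_node(unsigned long mfn1, unsigned long mfn2)
      {
          return phys_to_nid(pfn_to_paddr(mfn1)) ==
                 phys_to_nid(pfn_to_paddr(mfn2));
      }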
* numa: Attempt more efficient NUMA allocation in hypervisor by default. | Keir Fraser | 2010-08-04 | 1 | -21/+47
  1. Try to allocate from nodes containing CPUs which a guest can be
     scheduled on.
  2. Remember which node we allocated from last, and round-robin allocations
     among the above-mentioned nodes (sketched after this entry).
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Add an exact-node request flag for mem allocations. | Keir Fraser | 2010-07-05 | 1 | -0/+8
  Signed-off-by: Dulloor <dulloor@gmail.com>
* tmem: skip special case in alloc_heap_pages() if tmem holds no pages | Keir Fraser | 2010-06-23 | 1 | -3/+6
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* x86: fix Dom0 booting time regression | Keir Fraser | 2010-05-04 | 1 | -2/+3
  Unfortunately the changes in c/s 21035 caused boot time to go up
  significantly on certain large systems. To rectify this without going back
  to the old behavior, introduce a new memory allocation flag so that Dom0
  allocations can exhaust non-DMA memory before starting to consume DMA
  memory. For the latter, the behavior introduced in the aforementioned c/s
  gets retained, while for the former we can now even try larger chunks
  first.

  This builds on the fact that alloc_chunk() gets called with non-increasing
  'max_pages' arguments, and hence it can store locally the allocation order
  last used (as larger-order allocations can't succeed during subsequent
  invocations if they failed once).

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
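  The memoization, sketched (simplified: flag handling omitted, helper names
  as in Xen, the static cache an assumption about placement):

      /* Because alloc_chunk() is called with non-increasing sizes, an
       * order that failed once can be skipped on every later call. */
      static unsigned int last_order = MAX_ORDER;

      static struct page_info *alloc_chunk_sketch(struct domain *d,
                                                  unsigned long max_pages)
      {
          unsigned int order = min(get_order_from_pages(max_pages),
                                   last_order);
          struct page_info *pg;

          while ( (pg = alloc_domheap_pages(d, order, 0)) == NULL && order )
              order--;                 /* fall back to smaller chunks */
          last_order = order;          /* remember for the next call  */
          return pg;
      }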
* Host Numa information in dom0 | Keir Fraser | 2010-04-07 | 1 | -0/+6
  The 'xm info' command now also gives the cpu topology & host numa
  information. This will later be used to build guest numa support. The
  patch basically changes the physinfo sysctl, adds the topology_info &
  numa_info sysctls, and changes the python & libxc code accordingly.
  Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
* Reduce boot-time memory fragmentation | Keir Fraser | 2010-03-15 | 1 | -4/+32
  On certain NUMA configurations, having init_node_heap() consume the first
  few pages of a new node's memory for internal data structures leads to
  unnecessary memory fragmentation, which can - with sufficiently many
  nodes - result in there not remaining enough memory below 4G for Dom0 to
  set up its swiotlb and PCI-consistent buffers. Since alloc_boot_pages()
  generally consumes from the end of available regions, make
  init_node_heap() prefer the end of such regions too, so that fragmentation
  occurs at only one end of a region. (Adjustment from the first version:
  use the tail of the region when the end address's alignment is less than
  or equal to the beginning one's, not just when it is less.)

  Further, in order to prefer allocations from higher memory locations,
  insert memory regions in reverse order in end_boot_allocator(), with the
  exception of inserting one region residing on the boot CPU's node first
  (so that the statically allocated structures - used for the first node
  seen - get used for this node).

  Finally, reduce MAX_ORDER on x86 to the maximum useful value (1Gb), so
  that the reservation of a page on node boundaries (again leading to
  fragmentation) can be avoided as much as possible (having node boundaries
  at less than 1Gb-aligned addresses is expected to be rare, if found in
  practice at all).

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* tmem: When failing allocs from "midsize alloc zone", try the tmem pools rather than fail outright. | Keir Fraser | 2010-02-17 | 1 | -3/+3
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* When tmem is enabled, reserve a fraction of memory for allocations of 0<order<9 to avoid fragmentation issues. | Keir Fraser | 2010-02-15 | 1 | -1/+18
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* Don't scrub broken pages | Keir Fraser | 2010-02-08 | 1 | -0/+3
  Don't touch poisoned pages when scrubbing: consuming a poisoned page will
  contaminate the CPU context and may cause a system crash.
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
* tboot: fix S3 issue for Intel Trusted Execution Technology. | Keir Fraser | 2010-02-03 | 1 | -2/+2
  Those unmapped pages cause a page fault when MACing them and ultimately
  cause S3 failure.
  Signed-off-by: Shane Wang <shane.wang@intel.com>
* Replace process_pending_timers() with process_pending_softirqs(). | Keir Fraser | 2009-12-22 | 1 | -1/+1
  This ensures that any critical softirqs are handled in a timely manner
  (e.g., TIME_CALIBRATE_SOFTIRQ) while still avoiding being preempted by the
  scheduler (by SCHEDULE_SOFTIRQ), which is the reason for avoiding use of
  do_softirq() directly.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* This patch defines a new PGT type called PGT_shared_page and a new synthetic domain called 'dom_cow' | Keir Fraser | 2009-12-17 | 1 | -1/+8
  In order to share a page, the type needs to be changed to PGT_shared_page
  and the owner to dom_cow. Only pages with type PGT_none and no type count
  are allowed to become sharable. Conversely, sharable pages can only be
  made 'private' again if the type count equals one. page_make_sharable()
  and page_make_private() handle these transitions.
  Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
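  A sketch of the sharable transition described above (simplified; the real
  page_make_sharable() also takes the page lock and validates reference
  counts):

      static int make_sharable_sketch(struct page_info *pg)
      {
          /* Precondition from the commit message: typeless, count 0. */
          if ( (pg->u.inuse.type_info & PGT_type_mask) != PGT_none ||
               (pg->u.inuse.type_info & PGT_count_mask) != 0 )
              return -EINVAL;
          pg->u.inuse.type_info = PGT_shared_page | 1;
          page_set_owner(pg, dom_cow);  /* hand it to the synthetic domain */
          return 0;
      }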
* Track free pages live rather than count pages in all nodes/zones | Keir Fraser | 2009-12-08 | 1 | -0/+11
  Trying to fix a livelock condition in tmem that occurs only when the
  system is totally out of memory requires the ability to easily determine
  if all zones in all nodes are empty, and this must be checked at a fairly
  high frequency. So, to avoid walking all the zones in all the nodes each
  time, I'd like a fast way to determine if "free_pages" is zero. This patch
  tracks the sum of the free pages in all nodes/zones. Since I think the
  value is modified only when heap_lock is held, it need not be atomic.

  I don't know this for sure, but suspect this will be useful in other
  future memory utilization code, e.g. page sharing. This has had limited
  testing, though I did drive free memory down to zero and up and down a few
  times with debug on and no asserts were triggered.

  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
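  The mechanism, sketched (the counter mirrors the live total the commit
  adds; the two update sites shown are the idea, not the exact lines):

      static unsigned long total_avail_pages;  /* protected by heap_lock */

      /* in free_heap_pages():   total_avail_pages += 1UL << order;  */
      /* in alloc_heap_pages():  total_avail_pages -= 1UL << order;  */

      /* "Is everything empty?" becomes a single read under the lock,
       * not a walk over every node and zone. */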
* tmem: printk too chatty when tmem enabled | Keir Fraser | 2009-11-23 | 1 | -2/+4
  Two gdprintk's that are rarely encountered with tmem disabled are frequent
  but meaningless when tmem is enabled. Printing these tens-to-hundreds of
  times per second (in certain circumstances even higher) slows down domain
  execution.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* Fix hypervisor crash with unpopulated NUMA nodes | Keir Fraser | 2009-10-07 | 1 | -6/+5
  On NUMA systems with memory-less nodes Xen crashes quite early in the
  hypervisor (while initializing the heaps). This is not an issue if this
  happens to be the last node, but "inner" nodes trigger this reliably. On
  multi-node processors it is much more likely to leave a node unequipped.

  The attached patch fixes this by enumerating the nodes via the
  node_online_map instead of counting from 0 to num_nodes. The resulting
  NUMA setup is still somewhat strange, but at least it does not crash. In
  lowlevel/xc/xc.c there is again this enumeration bug, but I suppose we
  cannot access the HV's node_online_map from that context, so the xm info
  output is not correct (but xm debug-keys H is). I plan to rework the
  handling of memory-less nodes later.

  Signed-off-by: Andre Przywara <andre.przywara@amd.com>
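  The enumeration change, in outline (for_each_online_node is the real Xen
  iterator; the loop body is illustrative):

      /* Before: assumed node IDs 0..num_nodes-1 were all populated.  */
      /* After: visit only the nodes present in node_online_map.      */
      for_each_online_node ( node )
          setup_node_heap(node);   /* hypothetical per-node init call */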
* Introduce new flavour of map_domain_page() | Keir Fraser | 2009-09-22 | 1 | -1/+1
  Introduce a variant of map_domain_page() that is directly passed a
  struct page_info * argument, based on the observation that in many places
  the argument to this function was simply the result of page_to_mfn().
  This is meaningful for the x86-64 case, where map_domain_page() really is
  just an invocation of mfn_to_virt(), and hence the combined
  mfn_to_virt(page_to_mfn()) represented a needless round-trip conversion
  compressed -> uncompressed -> compressed of the MFN representation.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Fix an obviously inverted check in offline_page() | Keir Fraser | 2009-09-09 | 1 | -1/+1
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* properly __initdata-annotate command line option string buffers | Keir Fraser | 2009-08-31 | 1 | -1/+1
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Add a single trigger for all diagnostic keyhandlers | Keir Fraser | 2009-08-02 | 1 | -3/+14
  Add a new keyhandler that triggers all the side-effect-free keyhandlers.
  This lets automated tests (and users) log the full set of keyhandlers
  without having to be aware of which ones might reboot the host.
  Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>