path: root/xen/arch/x86/numa.c
* xen: Remove x86_32 build target. (Keir Fraser, 2012-09-12, 1 file, -8/+0)
  Signed-off-by: Keir Fraser <keir@xen.org>
* xen: enhance dump_numa output (Dario Faggioli, 2012-06-06, 1 file, -2/+3)
  By showing the number of free pages on each node.
  Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
  Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
  Committed-by: Keir Fraser <keir@xen.org>
* eliminate cpu_set() (Jan Beulich, 2011-11-08, 1 file, -1/+1)
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* eliminate direct assignments of CPU masks (Jan Beulich, 2011-10-21, 1 file, -1/+1)
  Use cpumask_copy() instead of direct variable assignments for copying CPU
  masks. While direct assignments are not a problem when both sides are
  variables actually defined as cpumask_t (except for possibly copying *much*
  more than actually needs to be copied), they must not happen when the
  original variable is of type cpumask_var_t (which may have less space
  allocated to it than a full cpumask_t). Eliminate as many such assignments
  as possible (in several cases it is even possible to collapse two operations
  [copy, then clear one bit] into one [cpumask_andnot()]), and thus pave the
  way for reducing the allocation size in alloc_cpumask_var().
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
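  A minimal sketch of the pattern described above (illustrative only, not the
  patch itself; the helper name is made up, while cpumask_copy(),
  cpumask_andnot() and cpumask_of() are the real accessors):

      #include <xen/cpumask.h>

      static void narrow_mask(cpumask_t *dst, const cpumask_t *src,
                              unsigned int cpu)
      {
          /* Unsafe with cpumask_var_t: "*dst = *src" copies sizeof(cpumask_t)
             bytes regardless of how much was actually allocated. */

          /* Safe: copies only the bits the allocation covers. */
          cpumask_copy(dst, src);

          /* "Copy, then clear one bit" collapsed into a single operation. */
          cpumask_andnot(dst, src, cpumask_of(cpu));
      }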
* introduce and use nr_cpu_ids and nr_cpumask_bits (Jan Beulich, 2011-10-21, 1 file, -3/+3)
  The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS,
  where necessary, get adjusted accordingly), while the latter is for the
  sole use of determining the allocation size when dynamically allocating
  CPU masks (done later in this series). Adjust accessors to use either of
  the two to bound their bitmap operations - which one gets used depends on
  whether accessing the bits in the gap between nr_cpu_ids and
  nr_cpumask_bits is benign but more efficient.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
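  A sketch of the distinction (the accessor names here are invented;
  bitmap_weight() and bitmap_zero() are the underlying bitmap operations):

      #include <xen/bitmap.h>
      #include <xen/cpumask.h>

      static inline unsigned int sketch_cpumask_weight(const cpumask_t *m)
      {
          /* The result must not include bits beyond the last possible CPU,
             so bound the operation by the runtime value nr_cpu_ids. */
          return bitmap_weight(m->bits, nr_cpu_ids);
      }

      static inline void sketch_cpumask_clear(cpumask_t *m)
      {
          /* Touching the gap bits is benign here and lets the operation run
             over whole words, so bound it by nr_cpumask_bits instead. */
          bitmap_zero(m->bits, nr_cpumask_bits);
      }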
* Remove __init specifier from function declarations in header files. (Keir Fraser, 2011-04-06, 1 file, -1/+1)
  The specifier only needs to be added to the function's definition. At the
  same time, fix init_cpu_to_node() to be __init rather than __devinit (it
  is only called at boot time).
  Signed-off-by: Keir Fraser <keir@xen.org>
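  A sketch of the convention, using init_cpu_to_node(), which the commit
  mentions (the header name is illustrative):

      /* In the header (e.g. asm/numa.h): no section annotation needed. */
      void init_cpu_to_node(void);

      /* In numa.c: only the definition carries __init, so the function is
         placed in (and freed with) the .init.text section after boot. */
      void __init init_cpu_to_node(void)
      {
          /* ... build cpu_to_node[] from the firmware-provided tables ... */
      }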
* Define new <pfn.h> header for PFN_{DOWN,UP} macros. (Keir Fraser, 2011-03-23, 1 file, -0/+1)
  Signed-off-by: Keir Fraser <keir@xen.org>
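  The two macros convert byte addresses/sizes into frame numbers, rounding
  down and up respectively; they are conventionally defined along these
  lines:

      #define PFN_DOWN(x)   ((x) >> PAGE_SHIFT)                     /* round down to a frame */
      #define PFN_UP(x)     (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)   /* round up to a frame   */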
* move various bits into .init.* sections (Jan Beulich, 2011-03-09, 1 file, -1/+1)
  This also includes the removal of some entirely unused functions. The
  patch builds upon the makefile adjustments done in the earlier sent patch
  titled "move more kernel decompression bits to .init.* sections".
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Walking the page lists needs the page_alloc lock (Keir Fraser, 2010-07-28, 1 file, -0/+2)
  There are a few places in Xen where we walk a domain's page lists without
  holding the page_alloc lock. They race with updates to the page lists,
  which are normally rare but can be quite common under PoD when the domain
  is close to its memory limit and the PoD reclaimer is busy. This patch
  protects those places by taking the page_alloc lock. I think this is OK
  for the two debug-key printouts - they don't run from irq context and look
  deadlock-free. The tboot change seems safe too unless tboot shutdown
  functions are called from irq context or with the page_alloc lock held.
  The p2m one is the scariest but there are already code paths in PoD that
  take the page_alloc lock with the p2m lock held so it's no worse than
  existing code.
  Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
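  A sketch of the pattern the patch enforces for the debug-key paths (the
  function name is illustrative, not the actual patch):

      #include <xen/mm.h>
      #include <xen/sched.h>

      static void dump_domain_page_count(struct domain *d)
      {
          struct page_info *page;
          unsigned long count = 0;

          /* Hold the lock so the walk cannot race with allocation/freeing
             (e.g. the PoD reclaimer moving pages between lists). */
          spin_lock(&d->page_alloc_lock);
          page_list_for_each ( page, &d->page_list )
              count++;
          spin_unlock(&d->page_alloc_lock);

          printk("d%d: %lu pages on page_list\n", d->domain_id, count);
      }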
* x86: Do not include apic.h/io_apic.h from asm/smp.h (Keir Fraser, 2010-06-11, 1 file, -0/+5)
  ...and fix up the ensuing fall-out of implicit dependencies.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* x86 numa: Fix i386 to not do bogus mfn_to_virt(alloc_boot_pages(...)) (Keir Fraser, 2010-02-25, 1 file, -1/+9)
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: fix NUMA handling (c/s 20599:e5a757ce7845) (Keir Fraser, 2010-01-08, 1 file, -8/+37)
  c/s 20599 caused the hash shift to become significantly smaller on systems
  with an SRAT like this:
    (XEN) SRAT: Node 0 PXM 0 0-a0000
    (XEN) SRAT: Node 0 PXM 0 100000-80000000
    (XEN) SRAT: Node 1 PXM 1 80000000-d0000000
    (XEN) SRAT: Node 1 PXM 1 100000000-130000000
  Combined with the static size of the memnodemap[] array, NUMA therefore
  got disabled on such systems. The backport from Linux was really
  incomplete, as Linux had much earlier already introduced a dynamically
  allocated memnodemap[]. Further, doing to/from pdx translations on
  addresses just past a valid range is not correct, as it may strip/fail to
  insert non-zero bits in this case. Finally, using 63 as the cover-it-all
  shift value is invalid on 32-bit, since pdx values are unsigned long.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* numa: Correct handling of a node with CPUs populated but no memory populated (Keir Fraser, 2010-01-05, 1 file, -4/+8)
  In changeset 20599, a node that has no memory populated is marked parsed,
  but not online. However, if there are CPUs populated in such a node, the
  corresponding CPU mapping (i.e. cpu_to_node) is still set up to point at
  the offline node, which will cause trouble for memory allocation. This
  patch changes init_cpu_to_node() and srat_detect_node() to consider the
  situation where the node is offline. Now apicid_to_node is only used to
  keep the cpu/node mapping provided by the BIOS, and should not be used for
  memory allocation anymore. One thing left is to update the cpu_to_node
  mapping after memory is populated by memory hot-add.
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
  This is a reintroduction of 20726:ddb8c5e798f9, which I incorrectly
  reverted in 20745:d3215a968db9.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Revert 20726:ddb8c5e798f9 (Keir Fraser, 2010-01-04, 1 file, -8/+4)
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
* numa: Correct handling of a node with CPUs populated but no memory populated (Keir Fraser, 2009-12-28, 1 file, -4/+8)
  In changeset 20599, a node that has no memory populated is marked parsed,
  but not online. However, if there are CPUs populated in such a node, the
  corresponding CPU mapping (i.e. cpu_to_node) is still set up to point at
  the offline node, which will cause trouble for memory allocation. This
  patch changes init_cpu_to_node() and srat_detect_node() to consider the
  situation where the node is offline. Now apicid_to_node is only used to
  keep the cpu/node mapping provided by the BIOS, and should not be used for
  memory allocation anymore. One thing left is to update the cpu_to_node
  mapping after memory is populated by memory hot-add.
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
* SRAT memory hotplug 2/2: Support overlapped and sparse node memory arrangement. (Keir Fraser, 2009-12-09, 1 file, -28/+56)
  Currently the Xen hypervisor uses the nodes array to keep each node's
  start/end address, assuming memory among nodes has no overlap. This is not
  always true, especially if we have memory hotplug support in the system.
  This patch backports the Linux kernel's memblks to support overlapping
  among nodes. The memblks are used both for checking conflicts and for
  calculating memnode_shift. Also, currently if there is no memory populated
  in a node when the system boots, the node is later un-parsed and the
  corresponding CPU's NUMA information is removed as well; this patch keeps
  the CPU information. One thing to notice is that memnode_shift is
  currently calculated over all memory, including un-populated ranges. This
  should work as long as the smallest chunk is not too small. Another option
  would be flags in the page_info structure, etc. The memnodemap is changed
  from paddr to pdx, both to save space and because most accesses are from a
  pfn anyway. A mem_hotplug flag is added if there is a hotplug memory
  range.
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
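  A sketch of the pdx-based lookup this scheme enables (names and types
  simplified; the real accessor is phys_to_nid()): memnodemap[] holds one
  node id per 2^memnode_shift-sized pdx chunk, with the shift chosen so that
  no chunk spans two nodes.

      extern uint8_t *sketch_memnodemap;      /* one node id per pdx chunk   */
      extern unsigned int sketch_memnode_shift;

      static inline uint8_t sketch_phys_to_nid(paddr_t addr)
      {
          return sketch_memnodemap[paddr_to_pdx(addr) >> sketch_memnode_shift];
      }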
* xen: turn numa=on by default (Keir Fraser, 2009-12-01, 1 file, -2/+1)
  I did some benchmark runs (lmbench & kernel compile) with a number of
  guests running in parallel to compare the performance of numa=on vs.
  numa=off. As soon as one starts to load the machine, the performance goes
  down in the numa=off case. The tests were done on an 8-node machine
  (4 cores each). lmbench (actually copying large amounts of memory) shows a
  dramatic drop, but I even noticed a significant performance decrease for a
  tmpfs based Linux kernel compile. Here is a summary of the data:

  lmbench's rd benchmark (normalized to native Linux = 100):

    guests   numa=off (min/avg/max)   numa=on (min/avg/max)   avg increase
       1            78.0                     102.3
       7      37.4 / 45.6 / 62.0       90.6 / 102.3 / 110.9      124.4%
      15      21.0 / 25.8 / 31.7       41.7 /  48.7 /  54.1       88.2%
      23      13.4 / 17.5 / 23.2       25.0 /  28.0 /  30.1       60.2%

  Kernel compile in tmpfs, 1 VCPU, 2GB RAM, average of elapsed time:

    guests   numa=off   numa=on   increase
       1      480.610   464.320     3.4%
       7      482.109   461.721     4.2%
      15      515.297   477.669     7.3%
      23      548.427   495.180     9.7%

  Again with 2 VCPUs and make -j2:

    guests   numa=off   numa=on   increase
       1      264.580   261.690     1.1%
       7      279.763   258.907     7.7%
      15      330.385   272.762    17.4%
      23      463.510   390.547    15.7%   (46 VCPUs on 32 pCPUs)

  Selected tests on a 4-node machine showed similar behavior (7.9% increase
  with 6 parallel guests on the 2 VCPU kernel compile benchmark). Note that
  this does not affect non-NUMA machines at all, since NUMA will be turned
  off again by the code if no NUMA topology is detected.
  Signed-off-by: Andre Przywara <andre.przywara@amd.com>
* Support physical CPU hot-add in xen hypervisor (Keir Fraser, 2009-11-12, 1 file, -4/+4)
  This patch adds CPU hot-add support to the system.
  a) It marks all CPUs as possible when booting, if CONFIG_HOTPLUG_CPU is
     set. BTW, this will increase the per_cpu area.
  b) When a CPU is added through a hypercall, the CPU is marked as present
     and offline, and the NUMA information is set up if NUMA is supported.
     The CPU is brought online by dom0 explicitly.
  Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
* Miscellaneous data placement adjustments (Keir Fraser, 2009-10-28, 1 file, -1/+1)
  Make various data items const or __read_mostly where possible/reasonable.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Add a single trigger for all diagnostic keyhandlers (Keir Fraser, 2009-08-02, 1 file, -1/+7)
  Add a new keyhandler that triggers all the side-effect-free keyhandlers.
  This lets automated tests (and users) log the full set of keyhandlers
  without having to be aware of which ones might reboot the host.
  Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* x86 numa: Fix left shift overflows (Keir Fraser, 2009-04-23, 1 file, -3/+3)
  Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
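  A generic illustration of this class of bug (not the specific lines
  changed): a shift performed in a 32-bit type overflows before the result
  is widened.

      static paddr_t frame_to_paddr_sketch(unsigned long pfn)
      {
          /* Buggy on 32-bit: the shift happens in unsigned long, so frames
             above 4GB lose their top bits before the widening to paddr_t.
             return pfn << PAGE_SHIFT;                                      */

          return (paddr_t)pfn << PAGE_SHIFT;   /* widen first, then shift */
      }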
* x86-64: use MFNs for linking together pages on lists (Keir Fraser, 2009-01-30, 1 file, -1/+1)
  Unless more than 16Tb are going to ever be supported in Xen, this will
  allow reducing the linked list entries in struct page_info from 16 to 8
  bytes. This doesn't modify struct shadow_page_info yet, so in order to
  meet the constraints of that 'mirror' structure the list entry gets
  artificially forced to be 16 bytes in size. That workaround will be
  removed in a subsequent patch.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
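  A sketch of the resulting layout (simplified, names illustrative):

      /* With at most 16TB of memory an MFN fits in 32 bits, so the list
         links can be stored as MFNs instead of 64-bit pointers: 8 bytes
         per entry instead of 16. */
      struct page_list_entry_sketch {
          uint32_t next, prev;   /* MFNs of neighbouring pages, or a sentinel */
      };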
* x86: debug key prints memory node info of each domain (Keir Fraser, 2008-08-05, 1 file, -0/+26)
  This patch collects the memory locality of each domain (how many pages the
  domain has on each node) and displays it when the debug key is pressed.
  Signed-off-by: Zhou Ting <ting.g.zhou@intel.com>
* x86: Make apicid 32 bits in preparation for x2APIC support. (Keir Fraser, 2008-05-01, 1 file, -1/+1)
  Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
* x86 numa: Fix the overflow of physical addresses. (Keir Fraser, 2008-03-17, 1 file, -3/+3)
  If a memory address is above 4G, it will overflow in some NUMA code when
  "unsigned long" is used to represent a physical address on a PAE arch.
  Replace "unsigned long" with paddr_t to avoid the overflow.
  Signed-off-by: Duan Ronghui <ronghui.duan@intel.com>
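  A small illustration of the truncation (values chosen from the >4G range):

      /* On a 32-bit PAE build "unsigned long" is 32 bits wide, so a physical
         address above 4GB silently loses its top bits; paddr_t (64-bit)
         preserves the full value. */
      unsigned long truncated = 0x130000000ULL;   /* becomes 0x30000000  */
      paddr_t       preserved = 0x130000000ULL;   /* stays 0x130000000   */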
* x86: fix NUMA code for 32bit (kfraser@localhost.localdomain, 2007-09-14, 1 file, -9/+9)
  I don't know how significant this is (most of the NUMA node data seems
  unused at this point), but anyway: enable proper operation of NUMA
  emulation and the fake NUMA node in case there's no SRAT table on x86-32.
  This will at least make the "Faking node ..." message not print confusing
  information anymore.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* [XEN] Use cpumask macros to update numa node masks. (kfraser@localhost.localdomain, 2006-12-13, 1 file, -1/+1)
  Signed-off-by: Keir Fraser <keir@xensource.com>
* [POWERPC][XEN] Fix build break by using bitmap field in cpumask_t. (Hollis Blanchard, 2006-12-12, 1 file, -1/+1)
  Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
* [XEN] Clean up NUMA stuff and disable by default ('numa=on' enables it). (kfraser@localhost.localdomain, 2006-10-25, 1 file, -21/+27)
  Signed-off-by: Keir Fraser <keir@xensource.com>
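  Roughly how such a boot option is wired up in numa.c (a sketch only; the
  exact parsing and the handler signature have varied across revisions):

      #include <xen/errno.h>
      #include <xen/init.h>
      #include <xen/string.h>

      /* Sketch: NUMA support stays off unless "numa=on" is passed. */
      static int numa_off_sketch = 1;

      static int __init numa_setup_sketch(const char *opt)
      {
          if ( !strcmp(opt, "on") )
              numa_off_sketch = 0;
          else if ( !strcmp(opt, "off") )
              numa_off_sketch = 1;
          else
              return -EINVAL;
          return 0;
      }
      custom_param("numa", numa_setup_sketch);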
* [XEN] Add basic NUMA/SRAT support to Xen from Linux 2.6.16.29. (kfraser@localhost.localdomain, 2006-10-25, 1 file, -0/+302)
  Signed-off-by: Ryan Harper <ryanh@us.ibm.com>