| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
By showing the number of free pages on each node.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use cpumask_copy() instead of direct variable assignments for copying
CPU masks. While direct assignments are not a problem when both sides
are variables actually defined as cpumask_t (except for possibly
copying *much* more than would actually need to be copied), they must
not happen when the original variable is of type cpumask_var_t (which
may have lass space allocated to it than a full cpumask_t). Eliminate
as many of such assignments as possible (in several cases it's even
possible to collapse two operations [copy then clear one bit] into one
[cpumask_andnot()]), and thus set the way for reducing the allocation
size in alloc_cpumask_var().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS,
where necessary, get adjusted accordingly), while the latter is for the
sole use of determining the allocation size when dynamically allocating
CPU masks (done later in this series).
Adjust accessors to use either of the two to bound their bitmap
operations - which one gets used depends on whether accessing the bits
in the gap between nr_cpu_ids and nr_cpumask_bits is benign but more
efficient.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
The specifier only needs to be added to the function's definition.
At the same time, fix init_cpu_to_node() to be __init rather than
__devinit (it is only called at boot time).
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
| |
This also includes the removal of some entirely unused functions.
The patch builds upon the makefile adjustments done in the earlier
sent patch titled "move more kernel decompression bits to .init.*
sections".
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a few places in Xen where we walk a domain's page lists
without holding the page_alloc lock. They race with updates to the
page lists, which are normally rare but can be quite common under PoD
when the domain is close to its memory limit and the PoD reclaimer is
busy. This patch protects those places by taking the page_alloc lock.
I think this is OK for the two debug-key printouts - they don't run
from irq context and look deadlock-free. The tboot change seems safe
too unless tboot shutdown functions are called from irq context or
with the page_alloc lock held. The p2m one is the scariest but there
are already code paths in PoD that take the page_alloc lock with the
p2m lock held so it's no worse than existing code.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
|
|
|
|
|
|
| |
...and fix up the ensuing fall-out of implicit dependencies
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
c/s 20599 caused the hash shift to become significantly smaller on
systems with an SRAT like this
(XEN) SRAT: Node 0 PXM 0 0-a0000
(XEN) SRAT: Node 0 PXM 0 100000-80000000
(XEN) SRAT: Node 1 PXM 1 80000000-d0000000
(XEN) SRAT: Node 1 PXM 1 100000000-130000000
Comined with the static size of the memnodemap[] array, NUMA got
therefore disabled on such systems. The backport from Linux was really
incomplete, as Linux much earlier had already introduced a dynamcially
allocated memnodemap[].
Further, doing to/from pdx translations on addresses just past a valid
range is not correct, as it may strip/fail to insert non-zero bits in
this case.
Finally, using 63 as the cover-it-all shift value is invalid on 32bit,
since pdx values are unsigned long.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In changeset 20599, the node that has no memory populated is marked
parsed, but not online. However, if there are CPU populated in this
node, the corresponding CPU mapping (i.e. the cpu_to_node) is still
setup to the offline node, this will cause trouble for memory
allocation.
This patch changes the init_cpu_to_node() and srant_detect_node(), to
considering the node is offlined situation.
Now the apicid_to_node is only used to keep the mapping between
cpu/node provided by BIOS, and should not be used for memory
allocation anymore.
One thing left is to update the cpu_to_node mapping after memory
populated by memory hot-add.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
This is a reintroduction of 20726:ddb8c5e798f9, which I incorrectly
reverted in 20745:d3215a968db9
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
| |
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In changeset 20599, the node that has no memory populated is marked
parsed, but not online. However, if there are CPU populated in this
node, the corresponding CPU mapping (i.e. the cpu_to_node) is still
setup to the offline node, this will cause trouble for memory
allocation.
This patch changes the init_cpu_to_node() and srant_detect_node(), to
considering the node is offlined situation.
Now the apicid_to_node is only used to keep the mapping between
cpu/node provided by BIOS, and should not be used for memory
allocation anymore.
One thing left is to update the cpu_to_node mapping after memory
populated by memory hot-add.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently xen hypervisor use nodes to keep start/end address of
node. It assume memory among nodes has no overlap, this is not always
true, especially if we have memory hotplug support in the system.
This patch backport Linux kernel's memblks to support overlapping
among node. The memblks will be used both for checking conflict, and
caculate memnode_shift.
Also, currently if there is no memory populated in a node when system
booting, the node will be unparsed later, and the corresponding CPU's
numa information will be removed also. This patch will keep the CPU
information.
One thing need notice is, currently we caculate memnode_shift with all
memory, including un-populated ones. This should work if the smallest
chuck is not so small. Other option can be flags in the page_info
structure, etc.
The memnodemap is changed from paddr to pdx, both to save space, and
also because currently most access is from pfn.
A flag is mem_hotplug added if there is hotplug memory range.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I did some benchmark runs (lmbench & kernel compile) with a number of
guests running in parallel to compare the performance of numa=on vs.
numa=off. As soon as one starts to load the machine, the performance
goes down in the numa=off case. The tests were done on an 8-node
machine (4 cores each). lmbench (actually copying large amounts of
memory) shows a dramatic dropdown, but I even noticed significant
performance decrease for a tmpfs based Linux kernel compile. Here a
summary of the data:
lmbench's rd benchmark (normalized to native Linux (=100)):
guests numa=off numa=on avg increase
min avg max min avg max
1 78.0 102.3
7 37.4 45.6 62.0 90.6 102.3 110.9 124.4%
15 21.0 25.8 31.7 41.7 48.7 54.1 88.2%
23 13.4 17.5 23.2 25.0 28.0 30.1 60.2%
kernel compile in tmpfs, 1 VCPU, 2GB RAM, average of elapsed time:
guests numa=off numa=on increase
1 480.610 464.320 3.4%
7 482.109 461.721 4.2%
15 515.297 477.669 7.3%
23 548.427 495.180 9.7%
again with 2 VCPUs and make -j2:
1 264.580 261.690 1.1%
7 279.763 258.907 7.7%
15 330.385 272.762 17.4%
23 463.510 390.547 15.7% (46 VCPUs on 32pCPUs)
Selected tests on a 4-node machine showed similar behavior (7.9 %
increase with 6 parallel guests on the 2 VCPU kernel compile
benchmark).
Note that this does not affect non-NUMA machines at all, since NUMA
will be turned off again by the code if no NUMA topology is detected.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch add CPU hot-add in system.
a) It mark all CPU as possible when booting, if CONFIG_HOTPLUG_CPU is
set. BTW, this will increase per_cpu area.
b) When a CPU is added through hypercall, the CPU will be marked as
present and offline, and the numa information is setup if numa is
supported. The CPU will be brought to online by dom0 online explicitly.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
| |
Make various data items const or __read_mostly where
possible/reasonable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
| |
Add a new keyhandler that triggers all the side-effect-free
keyhandlers. This lets automated tests (and users) log the full set
of keyhandlers without having to be aware of which ones might reboot
the host.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
| |
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unless more than 16Tb are going to ever be supported in Xen, this will
allow reducing the linked list entries in struct page_info from 16 to
8 bytes.
This doesn't modify struct shadow_page_info, yet, so in order to meet
the constraints of that 'mirror' structure the list entry gets
artificially forced to be 16 bytes in size. That workaround will be
removed in a subsequent patch.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
This patch will collect memory location (the domain has how many pages
in different node) of each domain and display if you input debug key.
Signed-off-by: Zhou Ting <ting.g.zhou@intel.com>
|
|
|
|
| |
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
|
|
|
|
|
|
|
|
|
| |
If memory address >4G, the address will overflow in some NUMA code if
using unsigned long to statement a physical address in PAE arch.
Replace "unsigned long" with paddr_t to avoid overflow.
Signed-off-by: Duan Ronghui <ronghui.duan@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
I don't know how significant this is (most of the NUMA node data seems
unused at this point), but anyway: enable proper operation of NUMA
emulation and the fake NUMA node in case there's no SRAT table on
x86-32. This will at least make the "Faking node ..." message not
print confusing information anymore.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
|
|
| |
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xensource.com>
|
|
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
|