| Commit message | Author | Age | Files | Lines |
|
Now that the direct map area can extend all the way up to almost the
end of address space, this is wasteful.
Also fold two almost redundant messages in SRAT parsing into one.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
Signed-off-by: Keir Fraser <keir@xen.org>
|
This logic came with the other NUMA logic from Linux 2.6.16 in c/s
11893:f312c2d01d8b. It appears that the Xen memory management
subsystem does not suffer from the expressed problems. Furthermore,
NUMA nodes with no memory are now quite easy to find, and are not BIOS
bugs in the SRAT ACPI table.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
|
It is now quite easy to buy servers with incorrectly populated DIMMs,
especially with AMD Magny-Cours and Interlagos systems which have two
NUMA nodes per socket.
Currently, Xen assigns all CPUs on nodes without memory to node 0,
which yields wrong NUMA information and causes NUMA-aware
functionality such as alloc_domheap_pages() to get things very
wrong.
This patch splits the current logic to accept NUMA nodes without
memory, which corrects the accounting of CPUs to online NUMA nodes.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
|
Use their proper counterparts in include/acpi/actbl*.h instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS,
where necessary, get adjusted accordingly), while the latter is for the
sole use of determining the allocation size when dynamically allocating
CPU masks (done later in this series).
Adjust accessors to use either of the two to bound their bitmap
operations - which one gets used depends on whether accessing the bits
in the gap between nr_cpu_ids and nr_cpumask_bits is benign but more
efficient.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
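
The distinction between the two bounds can be illustrated with a minimal
sketch. It assumes that the mask size is the CPU count rounded up to a whole
number of longs; the helper names (mask_alloc_bits, mask_clear, mask_weight)
are hypothetical and are not the accessors touched by this change.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Assumed relation: mask bits = CPU ids rounded up to whole longs. */
static size_t mask_alloc_bits(size_t nr_ids)
{
    return BITS_TO_LONGS(nr_ids) * BITS_PER_LONG;
}

/*
 * Touching the gap bits is harmless here, so working on whole words
 * (bounded by the mask size) is the cheaper choice.
 */
static void mask_clear(unsigned long *mask, size_t nr_mask_bits)
{
    for (size_t i = 0; i < nr_mask_bits / BITS_PER_LONG; i++)
        mask[i] = 0;
}

/*
 * Counting must not see stray bits in the gap, so this one is bounded
 * by the exact number of CPU ids.
 */
static size_t mask_weight(const unsigned long *mask, size_t nr_ids)
{
    size_t weight = 0;

    for (size_t cpu = 0; cpu < nr_ids; cpu++)
        if ((mask[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL)
            weight++;
    return weight;
}
```

With six CPU ids, mask_alloc_bits(6) yields one full long, and mask_weight()
ignores any bits set at positions 6 and above.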
|
Signed-off-by: Keir Fraser <keir@xen.org>
|
All logical processors with APIC ID values of 255 and greater will
have their APIC reported through Processor X2APIC structure (type-9
entry type) and all logical processors with APIC ID less than 255 will
have their APIC reported through legacy Processor Local APIC (type-0
entry type) only. The same split applies to NMI structure
reporting.
The Processor X2APIC Affinity structure provides the association
between the X2APIC ID of a logical processor and the proximity domain
to which the logical processor belongs.
This patch adds two new subtables to MADT and one new subtable to SRAT.
It also changes x86_acpiid_to_apicid from u8 to u32 to hold x2APIC
IDs, and changes mp_register_lapic to accept a 32-bit ID. Some
hard-coded 8-bit APIC ID values and assumptions remain in the Xen code
and need to be fixed in future.
Signed-off-by: Weidong Han <weidong.han@intel.com>
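
The reporting rule above can be sketched as follows. The enum and function
names are hypothetical and chosen for illustration only; the numeric values
0 and 9 are the MADT entry types named in the message.

```c
#include <stdint.h>

/* MADT entry types referred to above (local names, for illustration). */
enum madt_entry_type {
    MADT_LOCAL_APIC   = 0,  /* legacy Processor Local APIC */
    MADT_LOCAL_X2APIC = 9   /* Processor X2APIC */
};

/* APIC IDs of 255 and greater are reported through the X2APIC entry. */
static enum madt_entry_type madt_type_for_apic_id(uint32_t apic_id)
{
    return apic_id >= 255 ? MADT_LOCAL_X2APIC : MADT_LOCAL_APIC;
}

/*
 * Shows why an 8-bit field is not enough: storing an x2APIC ID in a
 * u8 silently drops the upper bits.
 */
static uint8_t truncate_to_u8(uint32_t apic_id)
{
    return (uint8_t)apic_id;
}
```

For example, an ID of 300 stored in a u8 becomes 44, which is the kind of
aliasing the widening to u32 avoids.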
|
Otherwise, pass-through code may call memory allocation functions with
invalid node IDs, causing the allocations to fail.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
|
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
Currently the Xen hypervisor uses nodes to keep the start/end address of
each node. It assumes that the memory of different nodes does not
overlap, which is not always true, especially if the system supports
memory hotplug.
This patch backports the Linux kernel's memblks to support overlapping
node ranges. The memblks are used both for checking conflicts and for
calculating memnode_shift.
Also, currently if a node has no memory populated at boot, the node is
unparsed later and the corresponding CPUs' NUMA information is removed
as well. This patch keeps the CPU information.
Note that memnode_shift is currently calculated over all memory,
including unpopulated ranges. This should work as long as the smallest
chunk is not too small. Another option would be flags in the page_info
structure, etc.
The memnodemap is changed from paddr to pdx, both to save space and
because most accesses currently start from a pfn.
A mem_hotplug flag is added, indicating whether there is a hotplug
memory range.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
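
The two uses of the memory blocks mentioned above can be sketched as below.
This is a simplified illustration, not the backported code: struct memblk,
memblk_conflict() and memnode_shift_from() are hypothetical names, ranges
are half-open [start, end), and the shift is derived from block start
addresses only.

```c
#include <stddef.h>
#include <stdint.h>

/* One memory block belonging to a node; the range is [start, end). */
struct memblk {
    uint64_t start, end;
    unsigned int node;
};

/* Return the index of the first block overlapping [start, end), or -1. */
static int memblk_conflict(const struct memblk *blk, size_t nr,
                           uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < nr; i++)
        if (start < blk[i].end && blk[i].start < end)
            return (int)i;
    return -1;
}

/*
 * Pick the largest shift such that every block start is aligned to
 * 1 << shift, so that no map slot is shared by two blocks.
 * Returns 63 when no start address constrains the choice.
 */
static unsigned int memnode_shift_from(const struct memblk *blk, size_t nr)
{
    uint64_t bits = 0;
    unsigned int shift = 0;

    for (size_t i = 0; i < nr; i++)
        bits |= blk[i].start;
    if (!bits)
        return 63;
    while (!(bits & 1)) {
        bits >>= 1;
        shift++;
    }
    return shift;
}
```

A small block start address lowers the shift for the whole map, which is why
the message notes that the scheme depends on the smallest chunk not being too
small.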
|
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
This patch adds CPU hot-add support to the system.
a) It marks all CPUs as possible at boot if CONFIG_HOTPLUG_CPU is
set. Note that this increases the per_cpu area.
b) When a CPU is added through a hypercall, it is marked as present
and offline, and its NUMA information is set up if NUMA is supported.
The CPU is then brought online explicitly by dom0.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
Make various data items const or __read_mostly where
possible/reasonable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
Introduces a virtual-space-conserving transformation on the MFN thus
far used to index the 1:1 mapping and the frame table: the largest
range of contiguous bits (below the most significant one) which are
zero for all valid MFNs is removed from the MFN representation used to
index into those arrays, thereby cutting the virtual range these
tables must cover approximately in half with each bit removed.
Since this should account for hotpluggable memory (in order not to
require a rewrite when that gets supported), the determination of
which bits are candidates for removal must not be based on the E820
information, but instead has to use the SRAT. That in turn requires a
change to the ordering of steps done during early boot.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
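
The transformation described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the code from this
change: struct pdx_layout and all function names are hypothetical, and the
set of possibly-used MFN bits is assumed to be accumulated from inclusive
[first, last] MFN ranges such as those derived from the SRAT.

```c
#include <stdint.h>

/* Hypothetical description of the removed bit range. */
struct pdx_layout {
    unsigned int bottom_shift; /* position of the lowest removed bit */
    unsigned int hole_shift;   /* number of removed bits */
};

/* Bits that can be set in some MFN within [first, last] (inclusive). */
static uint64_t range_used_bits(uint64_t first, uint64_t last)
{
    uint64_t fill = first ^ last;

    /* Smear the highest differing bit down to bit 0. */
    fill |= fill >> 1;
    fill |= fill >> 2;
    fill |= fill >> 4;
    fill |= fill >> 8;
    fill |= fill >> 16;
    fill |= fill >> 32;
    return first | fill;
}

/* Find the longest run of always-zero bits below the top used bit. */
static struct pdx_layout pdx_find_hole(uint64_t used_bits)
{
    struct pdx_layout l = { 0, 0 };
    unsigned int top = 0, run = 0;

    for (unsigned int bit = 0; bit < 64; bit++)
        if (used_bits & (UINT64_C(1) << bit))
            top = bit;

    for (unsigned int bit = 0; bit < top; bit++) {
        if (used_bits & (UINT64_C(1) << bit)) {
            run = 0;
        } else if (++run > l.hole_shift) {
            l.hole_shift = run;
            l.bottom_shift = bit + 1 - run;
        }
    }
    return l;
}

/* Drop the hole: keep the low bits, shift the high bits down. */
static uint64_t mfn_to_pdx(struct pdx_layout l, uint64_t mfn)
{
    uint64_t low_mask = (UINT64_C(1) << l.bottom_shift) - 1;

    return (mfn & low_mask) | ((mfn >> l.hole_shift) & ~low_mask);
}

/* Inverse: reinsert the zero bits of the hole. */
static uint64_t pdx_to_mfn(struct pdx_layout l, uint64_t pdx)
{
    uint64_t low_mask = (UINT64_C(1) << l.bottom_shift) - 1;

    return (pdx & low_mask) | ((pdx & ~low_mask) << l.hole_shift);
}
```

With two ranges, MFNs 0 to 0xFFFF and 0x1000000 to 0x100FFFF, bits 16 to 23
are never set, so eight bits are removed and MFN 0x1000123 maps to index
0x10123. The arrays then need to cover 1/256 of the original index range.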
|
That change converted the (wrong) assumption of contiguous nodes'
memory to a similarly wrong one of assuming discontiguous memory (i.e.
each node having separate E820 table entries). The code ought to be
able to deal with both, though, and I hope this change makes it so.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alex Williamson <alex.williamson@hp.com>
|
We currently compare the sum of the pages found in the SRAT table to
the address of the highest memory page found via the e820 table to
validate the SRAT. This is completely bogus if there's any kind of
discontiguous memory, where the sum of the pages could be much smaller
than the address of the highest page. I think all that's necessary is
to validate that each usable memory range in the e820 is covered by an
SRAT entry. This might not be the most efficient way to do it, but
there are usually a relatively small number of entries on each side.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
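
The coverage check proposed above can be sketched as a simple nested scan.
The names (struct range, ranges_cover) are hypothetical, ranges are
half-open [start, end), and, as an assumption beyond what the message
states, several adjacent SRAT entries are allowed to jointly cover one
usable E820 range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct range {
    uint64_t start, end;
};

/*
 * Return true if every usable RAM range is fully covered by the union
 * of the SRAT ranges. Quadratic in the worst case, which is acceptable
 * for the small tables involved.
 */
static bool ranges_cover(const struct range *srat, size_t nr_srat,
                         const struct range *ram, size_t nr_ram)
{
    for (size_t i = 0; i < nr_ram; i++) {
        uint64_t pos = ram[i].start;
        bool progress = true;

        /* Advance pos through SRAT entries until nothing extends it. */
        while (pos < ram[i].end && progress) {
            progress = false;
            for (size_t j = 0; j < nr_srat; j++) {
                if (srat[j].start <= pos && pos < srat[j].end) {
                    pos = srat[j].end;
                    progress = true;
                }
            }
        }
        if (pos < ram[i].end)
            return false; /* part of this RAM range is not in the SRAT */
    }
    return true;
}
```

Unlike comparing a page count against the highest address, this check is
unaffected by holes between the ranges.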
|
memory range
A node's future-hotplug memory range normally starts at a very high
address, e.g. 1TB, and is not contiguous with its existing memory
range. It should not be covered by the global variable 'nodes', which
assumes that a node's memory is contiguous. Otherwise the nodes'
memory ranges can become very large and overlap, and
populate_memnodemap() fails.
We can ignore the future-hotplug memory range for now. Future physical
memory hotplug support will handle it.
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
|
This is to properly handle SRAT rev 2 extended proximity domain
values.
Also a first step to eliminate the redundant definitions of
ACPI provided table structures (Linux eliminated all of the duplicates
from include/linux/acpi.h in 2.6.21).
Portions based on a Linux patch from Kurt Garloff <garloff@suse.de>
and Alexey Starikovskiy <astarikovskiy@suse.de>.
IA64 build tested only.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
Signed-off-by: Keir Fraser <keir@xensource.com>
|
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
|