| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
... as there doesn't really exists any valid mapping for them.
Particularly in the case of do_page_walk() this also avoids returning
non-NULL for such invalid input.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the direct map area can extend all the way up to almost the
end of address space, this is wasteful.
Also fold two almost redundant messages in SRAT parsing into one.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While boot time calls don't need this, run time uses of the function
which may result in L2 page tables getting populated need to be
serialized to avoid two CPUs populating the same L2 (or L3) entry,
overwriting each other's results.
This is expected to fix what would seem to be a regression from commit
b0581b92 ("x86: make map_domain_page_global() a simple wrapper around
vmap()"), albeit that change only made more readily visible the already
existing issue.
This patch intentionally does not
- add locking to the page table de-allocation logic in
destroy_xen_mappings() (the only user having potential races here,
msix_put_fixmap(), gets converted to use __set_fixmap() instead)
- avoid races between super page splitting and reconstruction in
map_pages_to_xen() (no such uses exist; races between multiple
splitting attempts or between multiple reconstruction attempts are
being taken care of)
If we wanted to take care of these, we'd need to alter the behavior
of virt_to_xen_l?e() - they would need to return with the lock held
then.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
| |
With vcpu->domain->arch.perdomain_l3_pg no longer getting set up for
the idle domain, this creates an invalid L4 entry (due to translating
a NULL struct page_info pointer to a physical address).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... using the new per-domain mapping management functions, adding
destroy_perdomain_mapping() to the previously introduced pair.
Rather than using an order-1 Xen heap allocation, use (currently 2)
individual domain heap pages to populate space in the per-domain
mapping area.
Also fix a benign off-by-one mistake in is_compat_arg_xlat_range().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
The emacs variable to set the C style from a local variable block is
c-file-style, not c-set-style.
Signed-off-by: David Vrabel <david.vrabel@citrix.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This mainly involves adjusting the number of L4 entries needing copying
between page tables (which is now different between PV and HVM/idle
domains), and changing the cutoff point and method when more than the
supported amount of memory is found in a system.
Since TMEM doesn't currently cope with the full 1:1 map not always
being visible, it gets forcefully disabled in that case.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
This involves no longer storing virtual addresses of the per-domain
mapping L2 and L3 page tables.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
| |
... to allow frames for up to 16Tb.
At the same time, add the super page frame table coordinates to the
comment describing the address space layout.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... noticed while putting together the 16Tb support patches for x86.
Briefly, this (in order of the changes below)
- fixes an inefficiency in x86's context switch code (translations to/
from struct page are more involved than to/from MFNs)
- drop unnecessary MFM-to-page conversions
- drop a redundant call to destroy_xen_mappings() (an indentical call
is being made a few lines up)
- simplify a VA-to-MFN translation
- drop dead code (several occurrences)
- add a missing __init annotation
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
- use the variants not validating the VA range when writing back
structures/fields to the same space that they were previously read
from
- when only a single field of a structure actually changed, copy back
just that field where possible
- consolidate copying back results in a few places
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note: these changes don't make any difference on x86.
Replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when it is used as
an hypercall argument.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
|
|
|
|
|
|
|
|
| |
... rather than at boot time, removing unnecessary redundancy between
EFI and legacy boot code.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
... rather than at boot time, removing unnecessary redundancy between
EFI and legacy boot code.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
As a prerequisite for adding an EHCI debug port based console
implementation, set up the page tables needed for (a sub-portion of)
the fixmaps together with other boot time page table construction.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
table
The epfn is being compared to (RDWR_COMPAT_MPT_VIRT_END -
RDWR_COMPAT_MPT_VIRT_START) without a 2 bit shift, resulting in the
epfn being compared to the size of the RDWR_COMPAT_MPT table in bytes
instead of the maximum page frame number that the RDWR_COMPAT_MPT
table can map.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
| |
Use it to replace x86-specific early_boot boolean variable.
Also use it to detect suspend/resume case during cpu offline/online
to avoid unnecessarily breaking vcpu and cpupool affinities.
Signed-off-by: Keir Fraser <keir@xen.org>
Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove costly mem_sharing audits from the inline path, and instead make them
callable as a memop.
Have the audit function return the number of errors detected.
Update memshrtool to be able to trigger audits.
Set sharing audits as enabled by default.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scannell.ca>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Tim Deegan <tim@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Per page operations in the paging, sharing, and access tracking subsystems are
all implemented with domctls (e.g. a domctl to evict one page, or to share one
page).
Under heavy load, the domctl path reveals a lack of scalability. The domctl
lock serializes dom0's vcpus in the hypervisor. When performing thousands of
per-page operations on dozens of domains, these vcpus will spin in the
hypervisor. Beyond the aggressive locking, an added inefficiency of blocking
vcpus in the domctl lock is that dom0 is prevented from re-scheduling any of
its other work-starved processes.
We retain the domctl interface for setting up and tearing down
paging/sharing/mem access for a domain. But we migrate all the per page
operations to use the memory_op hypercalls (e.g XENMEM_*).
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla>
Signed-off-by: Adin Scannell <adin@scannell.ca>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Tim Deegan <tim@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch also moves the existing sharing-related memory op to the
correct location, and adds logic to the audit() method that uses the
new information.
This patch only provides the Xen implementation of the domctls.
Signed-off-by: Andres Lagar-Cavilla <andres@scannell.ca>
Signed-off-by: Adin Scannell <adin@scannell.ca>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
|
|
|
|
|
|
|
|
| |
It's not really any more useful than explicitly setting new M2P
entries to the invalid value.
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to eliminate an initial hack in the EFI boot code (where
memory for the trampoline was just "claimed" instead of properly
allocated), the trampoline code must no longer make assumption on the
address at which it would be located. For the time being, the fixed
address is being retained for the traditional multiboot path.
As an additional benefit (at least from my pov) it allows confining
the visibility of the BOOT_TRAMPOLINE definition to just the boot
code.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Besides introducing the relevant code paralleling parts of what is
under xen/arch/x86/boot/, this adjusts the build logic so that with a
single compilation two images (gzip-compressed ELF and EFI
application)
can get created. The EFI part of this depends on a new enough compiler
(supposedly gcc 4.4.x and above, but so far only tested to work with
4.5.x) and a properly configured linker (must support the i386pep
emulation). If either functionality is found to not be available, the
EFI part of the build will simply be skipped.
The patch adds all code to allow Xen and the (accordingly enabled)
Dom0 kernel to boot, but doesn't allow Dom0 to make use of EFI
runtime calls (this will be the subject of the next patch).
Parts of the code were lifted from an earlier never published OS
project of ours - whether respective license information needs to be
added to the respective source file is unclear to me (I was told
internally that adding a GPLv2 license header can be done if needed by
the community).
Open issues (not preventing this from being committed imo):
The trampoline allocation and initialization isn't really nice. This
is due to the trampoline needing to be placed at a fixed address, and
hence making the trampoline relocatable would seem desirable here (as
well as for BIOS-based booting, where the trampoline location needed
to be adjusted a number of time already in the past, due to it
colliding with firmware data).
By excluding mem.S, edd.S, and video.S from copied trampoline (i.e.
moving up wakeup.S? and making sure none of the symbols are used from
EFI code), the effective trampoline size could at least be reduced.
Should the mappings of [__XEN_VIRT_START, mbi.mem_upper) and
[_end, __XEN_VIRT_START+BOOTSTRAP_MAP_BASE) be destroyed, despite
non-EFI code also keeping them?
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
At once, move the common (between 32- and 64-bit) definition of
machine_to_phys_mapping_valid to a common location.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
This is a prerequisite for calling set_gpfn_from_mfn() unconditionally
from free_heap_pages().
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
| |
Fixes the build under gcc-4.6 -Werror=unused-but-set-variable
Signed-off-by: Olaf Hering <olaf@aepfle.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is accomplished by splitting the guest_context member, which by
itself is larger than a page on x86-64. Quite a number of fields of
this structure is completely meaningless for HVM guests, and thus a
new struct pv_vcpu gets introduced, which is being overlaid with
struct hvm_vcpu in struct arch_vcpu. The one member that is mostly
responsible for the large size is trap_ctxt, which now gets allocated
separately (unless fitting on the same page as struct arch_vcpu, as is
currently the case for x86-32), and only for non-hvm, non-idle
domains.
This change pointed out a latent problem in arch_set_info_guest(),
which is permitted to be called on already initialized vCPU-s, but
so far copied the new state into struct arch_vcpu without (in this
case) actually going through all the necessary accounting/validation
steps. The logic gets changed so that the pieces that bypass
accounting
will at least be verified to be no different from the currently active
bits, and the whole change will fail in case they are. The logic does
*not* get adjusted here to do full error recovery, that is, partially
modified state continues to not get unrolled in case of failure.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
| |
Most users can use _PAGE_NX_BIT directly.
The few genuine users in mm.c can do the cpu_has_nx check more clearly
in other ways.
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Again, (out-of-memory) errors must not cause hypervisor crashes, and
hence ought to be propagated.
This also adjusts the cache attribute changing loop in
get_page_from_l1e() to not go through an unnecessary iteration. While
this could be considered mere cleanup, it is actually a requirement
for the subsequent now necessary error recovery path.
Also make a few functions static, easing the check for potential
callers needing adjustment.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
Instead, propagate the condition to the caller, all of which also get
adjusted to check for that situation.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... thus allowing to make the entries half their current size. Rather
than adjusting all instances to the new layout, abstract the
construction the table entries via a macro (paralleling a similar one
in recent Linux).
Also change the name of the section (to allow easier detection of
missed cases) and merge the final resulting output sections into
.data.read_mostly.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
| |
This is a prerequisite for allowing guest descheduling within a
hypercall.
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By doing so, we're no longer restricted to be able to place all boot
loader modules into the low 1Gb/4Gb (32-/64-bit) of memory, nor is
there a dependency anymore on where the boot loader places the
modules.
We're also no longer restricted to copy the modules into a place below
4Gb, nor to put them all together into a single piece of memory.
Further it allows even the 32-bit Dom0 kernel to be loaded anywhere in
physical memory (except if it doesn't support PAE-above-4G).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
| |
contiguous ranges of page_info structs. This allows page-pointer
arithmetic in places like our buddy allocator.
This restriction was already implicitly guaranteed, but it is good to
make it explicit in the pdx-related initialisation.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
|
|
|
|
| |
The former strips dom0 of its usual 1:1 mapping of all memory, and
only provides it with mappings of its own memory, like any other
domain. The latter is a new consistent name for iommu=passthrough.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
| |
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has two advantages:
(a) We can move the allocations to a context where we can handle
failure.
(b) We can implement matching deallocations on CPU offline.
Only the idle vcpu structure is now not freed on CPU offline. This
probably does not really matter.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
|
|
|
|
| |
- handle only when memory hotplug regions were actually found
- fix off-by-one error in fault handler's sanity checking
- use first L4 table entry
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Current, memory hot-add will fail if the new added memory is bigger
than current max_pages. This is really a stupid checking, considering
user may hot-add the biggest address riser card firstly.
This patch fix this issue. It check if all new added memory is
unpopulated, if yes, then it is ok.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In changeset 20599, the node that has no memory populated is marked
parsed, but not online. However, if there are CPU populated in this
node, the corresponding CPU mapping (i.e. the cpu_to_node) is still
setup to the offline node, this will cause trouble for memory
allocation.
This patch changes the init_cpu_to_node() and srant_detect_node(), to
considering the node is offlined situation.
Now the apicid_to_node is only used to keep the mapping between
cpu/node provided by BIOS, and should not be used for memory
allocation anymore.
One thing left is to update the cpu_to_node mapping after memory
populated by memory hot-add.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
This is a reintroduction of 20726:ddb8c5e798f9, which I incorrectly
reverted in 20745:d3215a968db9
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
|
|
|
| |
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In changeset 20599, the node that has no memory populated is marked
parsed, but not online. However, if there are CPU populated in this
node, the corresponding CPU mapping (i.e. the cpu_to_node) is still
setup to the offline node, this will cause trouble for memory
allocation.
This patch changes the init_cpu_to_node() and srant_detect_node(), to
considering the node is offlined situation.
Now the apicid_to_node is only used to keep the mapping between
cpu/node provided by BIOS, and should not be used for memory
allocation anymore.
One thing left is to update the cpu_to_node mapping after memory
populated by memory hot-add.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
|
| |
As we allocate m2p/compat m2p/frametable page tables from new added
memory, we want to make sure the new range can hold up the new page
tables, this is because m2p/frametable need be aligned and cover more
than the new-added range.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
| |
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
| |
Move the range checking to mem_hotadd_check.
Add more error handling, to restore the node information, unmap iommu
page tables, destroy xen mapping when error happens.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The basic work flow to handle the memory hotadd is:
Update node information
Map new pages to xen 1:1 mapping
Setup frametable for new memory range
Setup m2p table for new memory range
Put the new pages to domheap
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
memory hotplug is supported.
Hot-added memory may need a new L4 entry for 1:1 mapping. This patch
setup all L4 entry for 1:1 mapping if memory hotadd is needed, so that
we don't need sync the guest page table in page fault handler.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
|