The emacs variable to set the C style from a local variable block is
c-file-style, not c-set-style.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
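For illustration, a local variable block of the kind this message refers to might look like the sketch below. The particular values ("BSD", an offset of 4) are assumptions chosen for the example and are not taken from the commit itself; the point is only that the style is selected through the c-file-style variable.

```c
/*
 * Local variables:
 * mode: C
 * c-file-style: "BSD"
 * c-basic-offset: 4
 * indent-tabs-mode: nil
 * End:
 */
```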
|
This adds a pair of XSM hooks for tmem operations: xsm_tmem_op which
controls any use of tmem, and xsm_tmem_control which allows use of the
TMEM_CONTROL operations. By default, all domains can use tmem while
only IS_PRIV domains can use control operations.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Committed-by: Keir Fraser <keir@xen.org>
|
If the client / pool IDs given to tmemc_save_get_next_page are invalid,
the calculation of pagesize will dereference NULL.
Fix this by moving the calculation below the appropriate NULL check.
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
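The reordering described above can be illustrated with a minimal sketch. All names here (struct pool, find_pool, next_page_size) are hypothetical stand-ins and not the actual tmem code; the sketch only shows a size calculation placed after the NULL check rather than before it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tmem pool. */
struct pool {
    unsigned int pageshift;   /* log2 of the pool's page size */
};

#define MAX_POOLS 4

static struct pool *pools[MAX_POOLS];  /* NULL means "no such pool" */

/* Look up a pool; returns NULL for an out-of-range or unused ID. */
static struct pool *find_pool(unsigned int pool_id)
{
    if (pool_id >= MAX_POOLS)
        return NULL;
    return pools[pool_id];
}

/*
 * Returns the page size for the pool, or -1 if the ID is invalid.
 * The size is computed only after the NULL check has passed.
 */
static long next_page_size(unsigned int pool_id)
{
    struct pool *pool = find_pool(pool_id);

    if (pool == NULL)
        return -1;

    assert(pool->pageshift < 31);
    return 1L << pool->pageshift;
}
```

Computing the size before the check would dereference pool while it may still be NULL, which is the failure mode the message describes.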
|
Note: these changes don't make any difference on x86.
Replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when it is used as
a hypercall argument.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
|
Restore fails when tmem is enabled both in hypervisor and guest. This
is due to spec version mismatch when restoring a pool.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
|
Signed-off-by: Keir Fraser <keir@xen.org>
|
- one more case of checking for a specific rather than any error
- drop no longer needed first parameter from cli_put_page()
- drop a redundant cast
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
20918:a3fa6d444b25 "Fix domain reference leaks" (in Feb 2010, by Jan)
does some cleanup in addition to the leak fixes. Unfortunately, that
cleanup inadvertently resulted in an incorrect fallthrough in a switch
statement which breaks tmem save/restore.
That broken patch was apparently applied to 4.0-testing and 4.1-testing
so those are broken as well.
What is the process now for requesting back-patches to 4.0 and 4.1?
(Side note: This does not by itself entirely fix save/restore in 4.2.)
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
|
Otherwise they can be used by a guest to spam the hypervisor log with
all settings at their defaults.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
|
Reported-by: Tim Deegan <tim@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
Also remove a bogus assertion.
Reported-by: Tim Deegan <tim@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
This implies adjusting callers to deal with errors other than -EFAULT
and removing some comments which would otherwise become stale.
Reported-by: Tim Deegan <tim@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
This is not permitted, not even for buffers coming from Dom0 (and it
would also break the moment Dom0 runs in HVM mode). An implication from
the changes here is that tmh_copy_page() can't be used anymore for
control operations calling tmh_copy_{from,to}_client() (as those pass
the buffer by virtual address rather than MFN).
Note that tmemc_save_get_next_page() previously didn't set the returned
handle's pool_id field, while the new code does. It needs to be
confirmed that this is not a problem (otherwise the copy-out operation
will require further tmh_...() abstractions to be added).
Further note that the patch removes (rather than adjusts) an invalid
call to unmap_domain_page() (no matching map_domain_page()) from
tmh_compress_from_client() and adds a missing one to an error return
path in tmh_copy_from_client().
Finally note that the patch adds a previously missing return statement
to cli_get_page() (without which that function could de-reference a
NULL pointer, triggerable from guest mode).
This is part of XSA-15 / CVE-2012-3497.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
This is part of XSA-15 / CVE-2012-3497.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
|
This is part of XSA-15 / CVE-2012-3497.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
|
Treating it as an int could allow a malicious guest to provide a
negative pool_id, bypassing the MAX_POOLS_PER_DOMAIN limit check and
allowing access to negative offsets of the pool array.
This is part of XSA-15 / CVE-2012-3497.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
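A minimal sketch of the signedness problem follows. The function and array names are hypothetical and the limit value is an assumption for the example; only the MAX_POOLS_PER_DOMAIN name comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_POOLS_PER_DOMAIN 16   /* value assumed for illustration */

/* Flawed check: a signed ID only gets an upper-bound test, so -1 passes. */
static bool pool_id_ok_signed(int pool_id)
{
    return pool_id < MAX_POOLS_PER_DOMAIN;
}

/* With an unsigned ID, a "negative" value becomes huge and is rejected. */
static bool pool_id_ok_unsigned(unsigned int pool_id)
{
    return pool_id < MAX_POOLS_PER_DOMAIN;
}

static int pool_slots[MAX_POOLS_PER_DOMAIN];

/* Index the array only with an ID that passed the unsigned check. */
static int *pool_slot(unsigned int pool_id)
{
    assert(pool_id_ok_unsigned(pool_id));
    return &pool_slots[pool_id];
}
```

The two checks are textually identical; only the parameter type differs, which is why treating the ID as unsigned closes the hole without adding a separate lower-bound test.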
|
This is part of XSA-15 / CVE-2012-3497.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
|
A 4.2 changeset forces a preempt_disable/enable with
every lock/unlock.
Tmem has dynamically allocated "objects" that contain a
lock. The lock is held when the object is destroyed.
No reason to unlock something that's about to be destroyed!
But with the preempt_enable/disable in the generic locking code,
and the fact that do_softirq ASSERTs that preempt_count
must be zero, a crash occurs soon after any object is
destroyed.
So force lock to be released before destroying objects.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Committed-by: Keir Fraser <keir@xen.org>
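The interaction described above can be modelled with a small sketch. The counter, the struct, and the function names are all hypothetical; the real locking primitives differ. The sketch assumes only what the message states: taking a lock raises a preempt count, releasing it lowers the count, and the count must be back at zero afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: lock/unlock adjust a global preempt count. */
static int preempt_count;

struct obj {
    int locked;
};

static void obj_lock(struct obj *o)
{
    preempt_count++;
    o->locked = 1;
}

static void obj_unlock(struct obj *o)
{
    o->locked = 0;
    preempt_count--;
}

/* Destroy an object that the caller holds locked. */
static void obj_destroy(struct obj *o)
{
    assert(o->locked);
    obj_unlock(o);   /* release first so preempt_count returns to zero */
    free(o);
}
```

Freeing the object without the obj_unlock() call would leave preempt_count at 1, which is the state that later trips the zero-count assertion.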
|
Include a few missing header files; introduce defined(CONFIG_ARM) where
required.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
|
We still leave behind features we don't need such as tagged nodes.
Other changes:
- Allow callers to define their own node alloc routines.
- Only allocate per-node rcu_head when using the default RCU-safe
alloc routines.
- Keep our own radix_tree_destroy().
In future it may also be worth getting rid of the complex and
pointless special-casing of radix-tree height==0, in which a single
data item can be stored directly in radix_tree_root. It introduces a
whole lot of special cases and complicates RCU handling. If we get rid
of it we can reclaim the 'indirect pointer' tag in bit 0 of every slot
entry.
Signed-off-by: Keir Fraser <keir@xen.org>
|
Fails current lock checking mechanism in spinlock.c in debug=y builds.
Signed-off-by: Keir Fraser <keir@xen.org>
|
It would seem possible to fold the two trees into one (e.g. by storing
the emuirq bits in the upper half of the pointer), but I'm not
certain that's worth it as it would make deletion of entries more
cumbersome. Unless pirq-s and emuirq-s were mutually exclusive...
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
These two bugs apparently complement each other enough that
they escaped problems in my testing, but eventually gum
up the works and are obviously horribly wrong.
Found while developing tmem for native Linux.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
Rename to 'creat', which does not conflict.
Signed-off-by: Keir Fraser <keir@xen.org>
|
After a great deal of discussion and review with Linux
kernel developers, it appears there are "next-generation"
filesystems (such as btrfs, xfs, Lustre) that will not
be able to use tmem due to an ABI limitation... a field
that represents a unique file identifier is 64 bits in
the tmem ABI and may need to be as large as 192 bits.
So to support these guest filesystems, the tmem ABI must be
revised, from "v0" to "v1".
I *think* it is still the case that tmem is experimental
and is not used anywhere yet in production.
The tmem ABI is designed to support multiple revisions,
so the Xen tmem implementation could be updated to
handle both v0 and v1. However this is a bit
messy and would require data structures for both v0
and v1 to appear in public Xen header files.
I am inclined to update the Xen tmem implementation
to only support v1 and gracefully fail v0. This would
result in only a performance loss (as if tmem were
disabled) for newly launched tmem-v0-enabled guests,
but live-migration between old tmem-v0 Xen and new
tmem-v1 Xen machines would fail, and saved tmem-v0
guests will not be able to be restored on a tmem-v1
Xen machine. I would plan to update both pre-4.0.2
and unstable (future 4.1) to only support v1.
I believe these restrictions are reasonable at this
point in the tmem lifecycle, though they may not
be reasonable in the near future; should the tmem
ABI need to be revised from v1 to v2, I understand
backwards compatibility will be required.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
must still be tmem-enabled to use this functionality (e.g.
won't work for Windows), but upstream Linux tmem (aka
cleancache and frontswap) patches apply cleanly on top
of PV on HVM patches.
Also, fix up some ASSERTS and code used only when bad guest
mfns are passed to tmem. Previous code could crash Xen
if a buggy/malicious guest passes bad gmfns.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
Obtaining a domain reference count is neither necessary nor
sufficient. Instead we simply check whether a domain is already dying
when it first becomes a client of tmem. If it is not then we will
correctly clean up later via tmem_destroy() called from domain_kill().
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
which claims a ref once when the tmem client is first associated
with the domain, and puts it once when the tmem client is
destroyed.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
The significant remaining culprits for x86 are credit2, hpet, and
percpu-area subsystems. To be dealt with in a separate patch.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
trailing-zero-elimination
Add "page deduplication" capability (with optional compression
and trailing-zero elimination) to Xen's tmem.
(Transparent to tmem-enabled guests.) Ephemeral pages
that have the exact same content are "combined" so that only
one page frame is needed. Since ephemeral pages are essentially
read-only, no C-O-W (and thus no equivalent of swapping) is
necessary. Deduplication can be combined with compression
or "trailing zero elimination" for even more space savings.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
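As a rough illustration of "trailing zero elimination", the sketch below computes how many leading bytes of a page need to be stored so that the rest can be rebuilt by zero-filling. The function name, the byte-wise scan, and the 4096-byte page size are assumptions made for the example; the actual tmem implementation may work at a different granularity.

```c
#include <assert.h>
#include <stddef.h>

#define TZE_PAGE_SIZE 4096   /* page size assumed for illustration */

/*
 * Return the number of leading bytes that must be kept; everything
 * after that offset is zero and can be recreated on retrieval.
 */
static size_t tze_length(const unsigned char *page)
{
    size_t len = TZE_PAGE_SIZE;

    assert(page != NULL);
    while (len > 0 && page[len - 1] == 0)
        len--;
    return len;
}
```

A page that is entirely zero yields a length of 0, and a page whose last byte is non-zero yields the full page size, so the technique only saves space for pages with a zero tail.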
|
This classic typo in tmem would result in a false positive
report on a tmem "put" operation if an (unfragmented) page
of memory is completely unavailable.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
Besides two unlikely/rarely hit ones in x86 code, the main offender
was tmh_client_from_cli_id(), which didn't even have a counterpart
(albeit it had a comment correctly saying that it causes d->refcnt to
get incremented). Unfortunately(?) this required a bit of code
restructuring (as I needed to change the code anyway, I also fixed
a couple of missing bounds checks which would sooner or later be
reported as security vulnerabilities), so I would hope Dan could give
it his blessing before it gets applied.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
Reduce tmem complaints per Jan's concerns in this thread
http://lists.xensource.com/archives/html/xen-devel/2010-01/msg00155.html
Now complains only if tmem HAS memory to relinquish and
memory request has order>0.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
When a single domain is using most/all of tmem memory
for ephemeral pages belonging to the same object, e.g.
when copying a single huge file larger than ephemeral
memory, long lists are traversed looking for a page to
evict that doesn't belong to this object (as pages in
the object for which a page is currently being inserted
are locked and cannot be evicted). This is essentially
a livelock.
Avoid this by proactively ensuring there is a margin
of available memory (1MB) before locks are taken on
the object.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
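The idea of securing a memory margin before taking object locks can be sketched as follows. The counters and function names are hypothetical, and a 4 KiB page size is assumed so that the 1MB margin corresponds to 256 pages; the real eviction logic is considerably more involved.

```c
#include <assert.h>

#define PAGE_SHIFT_4K 12                               /* assumed page size */
#define MARGIN_BYTES  (1UL << 20)                      /* 1MB margin */
#define MARGIN_PAGES  (MARGIN_BYTES >> PAGE_SHIFT_4K)  /* 256 pages */

/* Hypothetical global accounting. */
static unsigned long free_pages;
static unsigned long evictable_pages;

/* Evict one ephemeral page if any is available; returns 1 on success. */
static int evict_one(void)
{
    if (evictable_pages == 0)
        return 0;
    evictable_pages--;
    free_pages++;
    return 1;
}

/*
 * Called before any object lock is taken, so every page is still a
 * candidate for eviction. Returns 1 once the margin is available.
 */
static int ensure_margin(void)
{
    assert(MARGIN_PAGES > 0);
    while (free_pages < MARGIN_PAGES)
        if (!evict_one())
            return 0;
    return 1;
}
```

Because the margin is established while no object is locked, the later insertion does not have to search for a victim page outside the locked object.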
|
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
Tmem double-frees a high-level data structure causing memory
corruption under certain circumstances.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
Tmem fails to put_domain so a dying domain never gets
properly shut down. Also, fix race condition when
domain is dying by not allowing any new ops to succeed.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
Expose tmem "freeable" memory for use by management tools.
Management tools looking for a machine with available
memory often look at free_memory to determine if there
is enough physical memory to house a new or migrating
guest. Since tmem absorbs much or all free memory,
and since "ephemeral" tmem memory can be synchronously
freed, management tools need more data -- not only how
much memory is "free" but also how much memory is
"freeable" by tmem if tmem is told (via an already
existing tmem hypercall) to relinquish freeable memory.
This patch provides that extra piece of data (in MB).
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
Attached patch implements save/restore/migration/livemigration
for transcendent memory ("tmem"). Without this patch, domains
using tmem may in some cases lose data when doing save/restore
or migrate/livemigrate. Also included in this patch is
support for a new (privileged) hypercall for authorizing
domains to share pools; this provides the foundation to
accommodate upstream Linux requests for security for shared
pools.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
This patch collects a few additional valuable per-domain
performance stats.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
This implicitly required converting the tmem_op structure from
anonymous to standard struct/union sub-fields, and extending the
get-fields.sh helper script to deal with typedef-ed guest handles used
as types of translated compound type fields.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
Tmem can share clean page cache pages for Linux domains
in a virtual cluster (currently only the ocfs2 filesystem
has a patch on the Linux side). So when one domain
"puts" (evicts) a page, any domain in the cluster can
"get" it, thus saving disk reads. This functionality
is already present; these are only bug fixes.
- fix bugs when an SE pool is destroyed
- fixes in parsing tool for xm tmem-list output for SE pools
- incorrect locking in one case for destroying an SE pool
- clearer verbosity for transfer when an SE pool is destroyed
- minor cleanup: merge routines that are mostly duplicate
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|
When a tmem-enabled domain is destroyed, if the domain was
using a persistent pool, the page scrubbing done by the domain
destruction process races with tmem's attempts to gracefully dismantle
data structures. Move tmem_destroy earlier in the domain
destruction process.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
|
Tmem, when called from a tmem-capable (paravirtualized) guest, makes
use of otherwise unutilized ("fallow") memory to create and manage
pools of pages that can be accessed from the guest either as
"ephemeral" pages or as "persistent" pages. In either case, the pages
are not directly addressable by the guest, only copied to and fro via
the tmem interface. Ephemeral pages are a nice place for a guest to
put recently evicted clean pages that it might need again; these pages
can be reclaimed synchronously by Xen for other guests or other uses.
Persistent pages are a nice place for a guest to put "swap" pages to
avoid sending them to disk. These pages retain data as long as the
guest lives, but count against the guest memory allocation.
Tmem pages may optionally be compressed and, in certain cases, can be
shared between guests. Tmem also handles concurrency nicely and
provides limited QoS settings to combat malicious DoS attempts.
Save/restore and live migration support is not yet provided.
Tmem is primarily targeted for an x86 64-bit hypervisor. On a 32-bit
x86 hypervisor, it has limited functionality and testing due to
limitations of the xen heap. Nearly all of tmem is
architecture-independent; three routines remain to be ported to ia64
and it should work on that architecture too. It is also structured to
be portable to non-Xen environments.
Tmem defaults off (for now) and must be enabled with a "tmem" xen boot
option (and does nothing unless a tmem-capable guest is running). The
"tmem_compress" boot option enables compression which takes about 10x
more CPU but approximately doubles the number of pages that can be
stored.
Tmem can be controlled via several "xm" commands and many interesting
tmem statistics can be obtained. A README and internal specification
will follow, but lots of useful prose about tmem, as well as Linux
patches, can be found at http://oss.oracle.com/projects/tmem .
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
|