path: root/xen/common/tmem.c
Commit message, author, age, files changed, lines changed (-removed/+added)
* Fix emacs local variable block to use correct C style variable.
  David Vrabel, 2013-02-21, 1 file, -1/+1
  The emacs variable to set the C style from a local variable block is c-file-style, not c-set-style.
  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* tmem: add XSM hooks
  Daniel De Graaf, 2013-01-11, 1 file, -0/+3
  This adds a pair of XSM hooks for tmem operations: xsm_tmem_op, which controls any use of tmem, and xsm_tmem_control, which allows use of the TMEM_CONTROL operations. By default, all domains can use tmem while only IS_PRIV domains can use control operations.
  Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
  Committed-by: Keir Fraser <keir@xen.org>
* tmem: Prevent NULL dereference on error case
  Matthew Daley, 2012-11-12, 1 file, -1/+3
  If the client / pool IDs given to tmemc_save_get_next_page are invalid, the calculation of pagesize will dereference NULL. Fix this by moving the calculation below the appropriate NULL check.
  Signed-off-by: Matthew Daley <mattjd@gmail.com>
  Committed-by: Jan Beulich <jbeulich@suse.com>
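A minimal sketch of the reordering. It assumes simplified, hypothetical `struct client` and `struct pool` types, a hypothetical `get_pool()` lookup and an assumed `MAX_POOLS` limit, none of which are taken from tmem.c. The page size is derived from the pool only after the lookup result has been checked for NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_POOLS 16 /* hypothetical limit for this sketch */

/* Hypothetical, simplified stand-ins for tmem's client and pool types. */
struct pool { unsigned int pageshift; };
struct client { struct pool *pools[MAX_POOLS]; };

/* Returns NULL for an invalid client or pool ID. */
static struct pool *get_pool(struct client *c, uint32_t pool_id)
{
    if (c == NULL || pool_id >= MAX_POOLS)
        return NULL;
    return c->pools[pool_id];
}

/* Returns the pool's page size in bytes, or -1 if the IDs are invalid. */
static long save_get_pagesize(struct client *c, uint32_t pool_id)
{
    struct pool *p = get_pool(c, pool_id);

    if (p == NULL)               /* validate first ... */
        return -1;
    return 1L << p->pageshift;   /* ... and only then dereference */
}
```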
* xen: replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when appropriate
  Stefano Stabellini, 2012-10-17, 1 file, -20/+25
  Note: these changes don't make any difference on x86. Replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when it is used as a hypercall argument.
  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Committed-by: Ian Campbell <ian.campbell@citrix.com>
* tmem: bump pool version to 1 to fix restore issue when tmem enabled
  Zhenzhong Duan, 2012-09-19, 1 file, -1/+2
  Restore fails when tmem is enabled both in hypervisor and guest. This is due to spec version mismatch when restoring a pool.
  Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
  Committed-by: Jan Beulich <jbeulich@suse.com>
* xen: Remove x86_32 build target.
  Keir Fraser, 2012-09-12, 1 file, -14/+1
  Signed-off-by: Keir Fraser <keir@xen.org>
* tmem: cleanup
  Jan Beulich, 2012-09-11, 1 file, -1/+1
  - one more case of checking for a specific rather than any error
  - drop no longer needed first parameter from cli_put_page()
  - drop a redundant cast
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: fixup 2010 cleanup patch that breaks tmem save/restore
  Dan Magenheimer, 2012-09-11, 1 file, -0/+2
  20918:a3fa6d444b25 "Fix domain reference leaks" (in Feb 2010, by Jan) does some cleanup in addition to the leak fixes. Unfortunately, that cleanup inadvertently resulted in an incorrect fallthrough in a switch statement which breaks tmem save/restore.
  That broken patch was apparently applied to 4.0-testing and 4.1-testing so those are broken as well. What is the process now for requesting back-patches to 4.0 and 4.1?
  (Side note: This does not by itself entirely fix save/restore in 4.2.)
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Committed-by: Jan Beulich <jbeulich@suse.com>
* tmem: reduce severity of log messages
  Jan Beulich, 2012-09-11, 1 file, -41/+47
  Otherwise they can be used by a guest to spam the hypervisor log with all settings at their defaults.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
* tmem: properly drop lock on error path in do_tmem_op()
  Jan Beulich, 2012-09-11, 1 file, -1/+7
  Reported-by: Tim Deegan <tim@xen.org>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: properly drop lock on error path in do_tmem_get()
  Jan Beulich, 2012-09-11, 1 file, -1/+2
  Also remove a bogus assertion.
  Reported-by: Tim Deegan <tim@xen.org>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
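Both lock fixes above follow the usual single-exit pattern. The sketch below is hypothetical. The lock is modeled as a depth counter so that balance can be checked, and `do_op()` and its error codes are illustrative, not tmem's actual code. Once the lock is taken, every path leaves through the `out` label, so the lock is always released.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lock modeled as a depth counter so the sketch can check balance. */
static int tmem_lock_depth;

static void tmem_lock(void)   { tmem_lock_depth++; }
static void tmem_unlock(void) { tmem_lock_depth--; }

/* Hypothetical op handler: after the lock is taken, every path exits via "out". */
static int do_op(int cmd, int have_client)
{
    int rc;

    tmem_lock();
    if (!have_client) {
        rc = -EINVAL;   /* error path: jump to the unlock, never return directly */
        goto out;
    }
    switch (cmd) {
    case 0:
        rc = 0;
        break;
    default:
        rc = -ENOSYS;   /* unknown sub-op */
        break;
    }
out:
    tmem_unlock();
    return rc;
}
```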
* tmem: detect arithmetic overflow in tmh_copy_{from,to}_client()
  Jan Beulich, 2012-09-11, 1 file, -16/+11
  This implies adjusting callers to deal with errors other than -EFAULT and removing some comments which would otherwise become stale.
  Reported-by: Tim Deegan <tim@xen.org>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
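A common way to detect this kind of overflow is to compare `len` against `PAGE_SIZE - off` rather than computing `off + len`, because that sum can wrap around when the length is guest-supplied. The sketch is hypothetical. It assumes a 4096-byte `PAGE_SIZE`, and the real checks in tmh_copy_{from,to}_client() may differ. The naive check is included for contrast.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Naive check: off + len can wrap around to a small value and be accepted. */
static int naive_check(size_t off, size_t len)
{
    return off + len <= PAGE_SIZE ? 0 : -EINVAL;
}

/* Overflow-safe check: never computes off + len. */
static int check_copy_range(size_t off, size_t len)
{
    if (off > PAGE_SIZE || len > PAGE_SIZE - off)
        return -EINVAL;
    return 0;
}
```

With `off = 1` and `len = SIZE_MAX`, the naive sum wraps to 0 and the range is wrongly accepted. The safe version rejects it.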
* tmem: don't access guest memory without using the accessors intended for this
  Jan Beulich, 2012-09-11, 1 file, -41/+49
  This is not permitted, not even for buffers coming from Dom0 (and it would also break the moment Dom0 runs in HVM mode).
  An implication from the changes here is that tmh_copy_page() can't be used anymore for control operations calling tmh_copy_{from,to}_client() (as those pass the buffer by virtual address rather than MFN).
  Note that tmemc_save_get_next_page() previously didn't set the returned handle's pool_id field, while the new code does. It needs to be confirmed that this is not a problem (otherwise the copy-out operation will require further tmh_...() abstractions to be added).
  Further note that the patch removes (rather than adjusts) an invalid call to unmap_domain_page() (no matching map_domain_page()) from tmh_compress_from_client() and adds a missing one to an error return path in tmh_copy_from_client().
  Finally note that the patch adds a previously missing return statement to cli_get_page() (without which that function could dereference a NULL pointer, triggerable from guest mode).
  This is part of XSA-15 / CVE-2012-3497.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: check for a valid client ("domain") in the save subops
  Ian Campbell, 2012-09-11, 1 file, -0/+8
  This is part of XSA-15 / CVE-2012-3497.
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
  Committed-by: Jan Beulich <jbeulich@suse.com>
* tmem: check the pool_id is valid when destroying a tmem pool
  Ian Campbell, 2012-09-11, 1 file, -0/+2
  This is part of XSA-15 / CVE-2012-3497.
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Committed-by: Jan Beulich <jbeulich@suse.com>
* tmem: consistently make pool_id a uint32_t
  Ian Campbell, 2012-09-11, 1 file, -3/+3
  Treating it as an int could allow a malicious guest to provide a negative pool_id, bypassing the MAX_POOLS_PER_DOMAIN limit check and allowing access to the negative offsets of the pool array.
  This is part of XSA-15 / CVE-2012-3497.
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Committed-by: Jan Beulich <jbeulich@suse.com>
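The signedness problem can be reproduced in isolation. This sketch assumes `MAX_POOLS_PER_DOMAIN` is 16 and uses hypothetical check functions. It shows only why an `int` index gets past a check that tests just the upper bound, while a `uint32_t` index does not.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_POOLS_PER_DOMAIN 16 /* value assumed for this sketch */

/* Buggy: a negative int is "less than" the limit and passes. */
static int pool_id_ok_signed(int pool_id)
{
    return pool_id < MAX_POOLS_PER_DOMAIN;
}

/* Fixed: as uint32_t, a "negative" ID wraps to a huge value and is rejected. */
static int pool_id_ok_unsigned(uint32_t pool_id)
{
    return pool_id < MAX_POOLS_PER_DOMAIN;
}
```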
* tmem: only allow tmem control operations from privileged domains
  Ian Campbell, 2012-09-11, 1 file, -4/+2
  This is part of XSA-15 / CVE-2012-3497.
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Committed-by: Jan Beulich <jbeulich@suse.com>
* tmem: add matching unlock for an about-to-be-destroyed object
  Dan Magenheimer, 2012-08-31, 1 file, -0/+1
  A 4.2 changeset forces a preempt_disable/enable with every lock/unlock.
  Tmem has dynamically allocated "objects" that contain a lock. The lock is held when the object is destroyed. No reason to unlock something that's about to be destroyed! But with the preempt_enable/disable in the generic locking code, and the fact that do_softirq ASSERTs that preempt_count must be zero, a crash occurs soon after any object is destroyed.
  So force lock to be released before destroying objects.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
  Committed-by: Keir Fraser <keir@xen.org>
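A hypothetical model of the failure. It assumes each lock acquisition increments a preemption counter and each release decrements it. The names are illustrative, not Xen's. Destroying a still-locked object would leave the counter nonzero and trip the softirq check, so the object is unlocked before it is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: lock/unlock also disable/enable preemption. */
static int preempt_count;

struct obj { int locked; };

static void obj_lock(struct obj *o)   { preempt_count++; o->locked = 1; }
static void obj_unlock(struct obj *o) { o->locked = 0; preempt_count--; }

/* Stand-in for the preempt_count == 0 assertion in do_softirq(). */
static int softirq_would_assert(void) { return preempt_count != 0; }

/* Destroy an object whose lock the caller holds: release the lock first so
 * preempt_count stays balanced even though the object is about to be freed. */
static void obj_destroy(struct obj *o)
{
    obj_unlock(o);
    free(o);
}
```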
* arm: compile tmem
  Stefano Stabellini, 2012-02-09, 1 file, -1/+2
  Include a few missing header files; introduce defined(CONFIG_ARM) where required.
  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Committed-by: Ian Campbell <ian.campbell@citrix.com>
* Update radix-tree.[ch] from upstream Linux to gain RCU awareness.
  Keir Fraser, 2011-05-09, 1 file, -13/+7
  We still leave behind features we don't need such as tagged nodes.
  Other changes:
  - Allow callers to define their own node alloc routines.
  - Only allocate per-node rcu_head when using the default RCU-safe alloc routines.
  - Keep our own radix_tree_destroy().
  In future it may also be worth getting rid of the complex and pointless special-casing of radix-tree height==0, in which a single data item can be stored directly in radix_tree_root. It introduces a whole lot of special cases and complicates RCU handling. If we get rid of it we can reclaim the 'indirect pointer' tag in bit 0 of every slot entry.
  Signed-off-by: Keir Fraser <keir@xen.org>
* Revert 23295:4891f1f41ba5 and 23296:24346f749826
  Keir Fraser, 2011-05-02, 1 file, -0/+1
  Fails current lock checking mechanism in spinlock.c in debug=y builds.
  Signed-off-by: Keir Fraser <keir@xen.org>
* x86: replace nr_irqs sized per-domain arrays with radix trees
  Jan Beulich, 2011-05-01, 1 file, -1/+0
  It would seem possible to fold the two trees into one (making e.g. the emuirq bits stored in the upper half of the pointer), but I'm not certain that's worth it as it would make deletion of entries more cumbersome. Unless pirq-s and emuirq-s were mutually exclusive...
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* tmem: two wrongs (or three lefts and a wrong) make a right
  Keir Fraser, 2010-12-15, 1 file, -4/+4
  These two bugs apparently complement each other enough that they escaped problems in my testing, but eventually gum up the works and are obviously horribly wrong. Found while developing tmem for native Linux.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: Use of 'new' clashes with C++ reserved namespace.
  Keir Fraser, 2010-12-10, 1 file, -6/+6
  Rename to 'creat', which does not conflict.
  Signed-off-by: Keir Fraser <keir@xen.org>
* tmem (hv): move to new ABI version to handle long object-ids
  Keir Fraser, 2010-09-13, 1 file, -123/+186
  After a great deal of discussion and review with linux kernel developers, it appears there are "next-generation" filesystems (such as btrfs, xfs, Lustre) that will not be able to use tmem due to an ABI limitation... a field that represents a unique file identifier is 64-bits in the tmem ABI and may need to be as large as 192-bits. So to support these guest filesystems, the tmem ABI must be revised, from "v0" to "v1".
  I *think* it is still the case that tmem is experimental and is not used anywhere yet in production.
  The tmem ABI is designed to support multiple revisions, so the Xen tmem implementation could be updated to handle both v0 and v1. However this is a bit messy and would require data structures for both v0 and v1 to appear in public Xen header files.
  I am inclined to update the Xen tmem implementation to only support v1 and gracefully fail v0. This would result in only a performance loss (as if tmem were disabled) for newly launched tmem-v0-enabled guests, but live-migration between old tmem-v0 Xen and new tmem-v1 Xen machines would fail, and saved tmem-v0 guests will not be able to be restored on a tmem-v1 Xen machine. I would plan to update both pre-4.0.2 and unstable (future 4.1) to only support v1.
  I believe these restrictions are reasonable at this point in the tmem lifecycle, though they may not be reasonable in the near future; should the tmem ABI need to be revised from v1 to v2, I understand backwards compatibility will be required.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
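For illustration only, a 192-bit object id can be held in three 64-bit words with a total order so it works as a search key. The `struct oid` layout and `oid_compare()` below are assumptions for this sketch, not the definitive v1 ABI definition. The choice to treat `w[2]` as the most significant word is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 192-bit object id: three 64-bit words, w[2] most significant. */
struct oid { uint64_t w[3]; };

/* Total order over object ids: returns -1, 0 or 1. */
static int oid_compare(const struct oid *a, const struct oid *b)
{
    for (int i = 2; i >= 0; i--) {
        if (a->w[i] < b->w[i])
            return -1;
        if (a->w[i] > b->w[i])
            return 1;
    }
    return 0;
}
```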
* tmem: skip special case in alloc_heap_pages() if tmem holds no pages
  Keir Fraser, 2010-06-23, 1 file, -0/+5
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* Enable tmem functionality for PV on HVM guests.
  Keir Fraser, 2010-06-21, 1 file, -12/+12
  Guest kernel must still be tmem-enabled to use this functionality (e.g. won't work for Windows), but upstream Linux tmem (aka cleancache and frontswap) patches apply cleanly on top of PV on HVM patches.
  Also, fix up some ASSERTS and code used only when bad guest mfns are passed to tmem. Previous code could crash Xen if a buggy/malicious guest passes bad gmfns.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: Fix domain lifecycle synchronisation.
  Keir Fraser, 2010-06-10, 1 file, -5/+11
  Obtaining a domain reference count is neither necessary nor sufficient. Instead we simply check whether a domain is already dying when it first becomes a client of tmem. If it is not then we will correctly clean up later via tmem_destroy() called from domain_kill().
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Tmem: fix domain refcount leak by returning to simpler model
  Keir Fraser, 2010-06-10, 1 file, -39/+7
  The simpler model claims a ref once when the tmem client is first associated with the domain, and puts it once when the tmem client is destroyed.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* Remove many uses of cpu_possible_map and iterators over NR_CPUS.
  Keir Fraser, 2010-05-14, 1 file, -2/+5
  The significant remaining culprits for x86 are credit2, hpet, and percpu-area subsystems. To be dealt with in a separate patch.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* tmem: add page deduplication with optional compression or trailing-zero-elimination
  Keir Fraser, 2010-04-06, 1 file, -74/+398
  Add "page deduplication" capability (with optional compression and trailing-zero elimination) to Xen's tmem. (Transparent to tmem-enabled guests.) Ephemeral pages that have the exact same content are "combined" so that only one page frame is needed. Since ephemeral pages are essentially read-only, no C-O-W (and thus no equivalent of swapping) is necessary. Deduplication can be combined with compression or "trailing zero elimination" for even more space savings.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
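The two byte-level checks involved can be sketched as follows. The sketch assumes a 4096-byte page, and the function names are hypothetical. It leaves out how tmem actually indexes candidate pages and reference-counts the shared frames. Two pages can share a frame only if their contents are identical, and trailing-zero elimination stores only the bytes up to the last nonzero one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096 /* assumed page size for this sketch */

/* Trailing-zero elimination: page length once trailing zero bytes are dropped. */
static size_t tze_len(const unsigned char *page)
{
    size_t len = PAGE_SIZE;

    while (len > 0 && page[len - 1] == 0)
        len--;
    return len;
}

/* Deduplication candidate test: byte-for-byte identical pages. */
static int pages_dedupable(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, PAGE_SIZE) == 0;
}
```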
* tmem: typo causes incorrect return on out-of-memory
  Keir Fraser, 2010-03-09, 1 file, -1/+1
  This classic typo in tmem would result in a false positive report on a tmem "put" operation if a (unfragmented) page of memory is completely unavailable.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: Quieten noisy printk in non-debug build
  Keir Fraser, 2010-02-22, 1 file, -0/+2
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Fix domain reference leaks
  Keir Fraser, 2010-02-10, 1 file, -59/+105
  Besides two unlikely/rarely hit ones in x86 code, the main offender was tmh_client_from_cli_id(), which didn't even have a counterpart (albeit it had a comment correctly saying that it causes d->refcnt to get incremented). Unfortunately(?) this required a bit of code restructuring (as I needed to change the code anyway, I also fixed a couple of missing bounds checks which would sooner or later be reported as security vulnerabilities), so I would hope Dan could give it his blessing before it gets applied.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* tmem: Reduce verbosity on failed memory allocations.
  Keir Fraser, 2010-01-08, 1 file, -2/+2
  Reduce tmem complaints per Jan's concerns in this thread: http://lists.xensource.com/archives/html/xen-devel/2010-01/msg00155.html
  Now complains only if tmem HAS memory to relinquish and memory request has order>0.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: reclaim minimal memory proactively
  Keir Fraser, 2009-12-09, 1 file, -0/+20
  When a single domain is using most/all of tmem memory for ephemeral pages belonging to the same object, e.g. when copying a single huge file larger than ephemeral memory, long lists are traversed looking for a page to evict that doesn't belong to this object (as pages in the object for which a page is currently being inserted are locked and cannot be evicted). This is essentially a livelock.
  Avoid this by proactively ensuring there is a margin of available memory (1MB) before locks are taken on the object.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
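A hypothetical sketch of the margin check. It assumes 4 KiB pages, so the 1MB margin is 256 pages, and uses simple global counters in place of tmem's real bookkeeping. Eviction happens before the object lock is taken, so the later put does not have to hunt for an evictable page while holding the lock.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT   12                          /* assumed 4 KiB pages */
#define MARGIN_PAGES ((1UL << 20) >> PAGE_SHIFT) /* 1 MB margin = 256 pages */

/* Hypothetical counters standing in for tmem's real accounting. */
static unsigned long free_pages;
static unsigned long ephemeral_pages;

/* Evict one ephemeral page; returns false when nothing is left to evict. */
static bool evict_one(void)
{
    if (ephemeral_pages == 0)
        return false;
    ephemeral_pages--;
    free_pages++;
    return true;
}

/* Call BEFORE locking the object: top free memory up to the margin. */
static void ensure_margin(void)
{
    while (free_pages < MARGIN_PAGES && evict_one())
        ;
}
```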
* tmem: Fix another race in tmem on domain destroy.
  Keir Fraser, 2009-11-24, 1 file, -1/+9
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: fix double-free bug
  Keir Fraser, 2009-11-23, 1 file, -1/+1
  Tmem double-frees a high-level data structure causing memory corruption under certain circumstances.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: fix domain shutdown problem/race
  Keir Fraser, 2009-11-14, 1 file, -0/+12
  Tmem fails to put_domain so a dying domain never gets properly shut down. Also, fix race condition when domain is dying by not allowing any new ops to succeed.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: expose freeable memory
  Keir Fraser, 2009-08-10, 1 file, -4/+7
  Expose tmem "freeable" memory for use by management tools. Management tools looking for a machine with available memory often look at free_memory to determine if there is enough physical memory to house a new or migrating guest. Since tmem absorbs much or all free memory, and since "ephemeral" tmem memory can be synchronously freed, management tools need more data: not only how much memory is "free" but also how much memory is "freeable" by tmem if tmem is told (via an already existing tmem hypercall) to relinquish freeable memory. This patch provides that extra piece of data (in MB).
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
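Reporting a page count in MB comes down to a shift. This sketch assumes 4 KiB pages and a hypothetical `pages_to_mb()` helper, and it rounds down. Which pages tmem counts as freeable is not modeled here.

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* Convert a page count to whole megabytes (1 MB = 2^20 bytes), rounding down. */
static unsigned long pages_to_mb(unsigned long pages)
{
    return pages >> (20 - PAGE_SHIFT);
}
```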
* tmem: Remove bogus variable decl, fixing build.
  Keir Fraser, 2009-08-06, 1 file, -1/+0
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* tmem: save/restore/migrate/livemigrate and shared pool authentication
  Keir Fraser, 2009-08-06, 1 file, -67/+467
  Attached patch implements save/restore/migration/livemigration for transcendent memory ("tmem"). Without this patch, domains using tmem may in some cases lose data when doing save/restore or migrate/livemigrate. Also included in this patch is support for a new (privileged) hypercall for authorizing domains to share pools; this provides the foundation to accommodate upstream linux requests for security for shared pools.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: No noise when disabled and not configured
  Keir Fraser, 2009-07-16, 1 file, -6/+0
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* tmem: extra stats
  Keir Fraser, 2009-06-27, 1 file, -10/+24
  This patch collects a few additional valuable per-domain performance stats.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: fix 32-on-64 support
  Keir Fraser, 2009-06-17, 1 file, -11/+18
  This implicitly required converting the tmem_op structure from anonymous to standard struct/union sub-fields, and extending the get-fields.sh helper script to deal with typedef-ed guest handles used as types of translated compound type fields.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* tmem: shared ephemeral (SE) pool (clustering) fixes
  Keir Fraser, 2009-06-01, 1 file, -44/+25
  Tmem can share clean page cache pages for Linux domains in a virtual cluster (currently only the ocfs2 filesystem has a patch on the Linux side). So when one domain "puts" (evicts) a page, any domain in the cluster can "get" it, thus saving disk reads. This functionality is already present; these are only bug fixes.
  - fix bugs when an SE pool is destroyed
  - fixes in parsing tool for xm tmem-list output for SE pools
  - incorrect locking in one case for destroying an SE pool
  - clearer verbosity for transfer when an SE pool is destroyed
  - minor cleanup: merge routines that are mostly duplicate
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
* tmem: fix corner case crash on forcible domain destruction
  Keir Fraser, 2009-06-01, 1 file, -10/+6
  When a tmem-enabled domain that was using a persistent pool is destroyed, the page scrubbing done by the domain destruction process races with tmem's attempts to gracefully dismantle its data structures. Move tmem_destroy earlier in the domain destruction process.
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* tmem: build fix
  Keir Fraser, 2009-05-27, 1 file, -10/+13
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Transcendent memory ("tmem") for Xen.
  Keir Fraser, 2009-05-26, 1 file, -0/+2109
  Tmem, when called from a tmem-capable (paravirtualized) guest, makes use of otherwise unutilized ("fallow") memory to create and manage pools of pages that can be accessed from the guest either as "ephemeral" pages or as "persistent" pages. In either case, the pages are not directly addressable by the guest, only copied to and fro via the tmem interface.
  Ephemeral pages are a nice place for a guest to put recently evicted clean pages that it might need again; these pages can be reclaimed synchronously by Xen for other guests or other uses. Persistent pages are a nice place for a guest to put "swap" pages to avoid sending them to disk. These pages retain data as long as the guest lives, but count against the guest memory allocation.
  Tmem pages may optionally be compressed and, in certain cases, can be shared between guests. Tmem also handles concurrency nicely and provides limited QoS settings to combat malicious DoS attempts. Save/restore and live migration support is not yet provided.
  Tmem is primarily targeted for an x86 64-bit hypervisor. On a 32-bit x86 hypervisor, it has limited functionality and testing due to limitations of the xen heap. Nearly all of tmem is architecture-independent; three routines remain to be ported to ia64 and it should work on that architecture too. It is also structured to be portable to non-Xen environments.
  Tmem defaults off (for now) and must be enabled with a "tmem" xen boot option (and does nothing unless a tmem-capable guest is running). The "tmem_compress" boot option enables compression, which takes about 10x more CPU but approximately doubles the number of pages that can be stored.
  Tmem can be controlled via several "xm" commands and many interesting tmem statistics can be obtained. A README and internal specification will follow, but lots of useful prose about tmem, as well as Linux patches, can be found at http://oss.oracle.com/projects/tmem .
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>