aboutsummaryrefslogtreecommitdiffstats
path: root/xen/common/domain.c
Commit message (Collapse)AuthorAgeFilesLines
* console: buffer and show origin of guest PV writesDaniel De Graaf2013-09-101-0/+8
| | | | | | | | | | | | | Guests other than domain 0 using the console output have previously been controlled by the VERBOSE #define, but with no designation of which guest's output was on the console. This patch converts the HVM output buffering to be used by all domains except the hardware domain (dom0): stripping non-printable characters, line buffering the output, and prefixing it with the domain ID. This is especially useful for debugging stub domains during early boot. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Keir Fraser <keir@xen.org>
* xen: remove evtchn_upcall_mask from interface on ARMIan Campbell2013-08-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | On ARM event-channel upcalls are masked using the hardware's interrupt mask bit and not by a software bit. Leaving this field present in the interface has caused some confusion already and is liable to mean it gets inadvertently used in the future. So arrange for this field to be turned into a padding field on ARM by introducing a XEN_HAVE_PV_UPCALL_MASK define. This bit is also unused for x86 PV-on-HVM guests, but we can't realistically distinguish those from x86 PV guests in the headers. Add a per-arch vcpu_event_delivery_is_enabled function to replace an open coded use of evtchn_upcall_mask in common code (in a debug keyhandler). The existing local_event_delivery_is_enabled, which operates only on current, was unimplemented on ARM and unused on x86, so remove it. ifdef the use of evtchn_upcall_mask when setting up a new vcpu info page. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
* use SMP barrier in common code dealing with shared memory protocolsIan Campbell2013-07-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Xen currently makes no strong distinction between the SMP barriers (smp_mb etc) and the regular barrier (mb etc). In Linux, where we inherited these names from having imported Linux code which uses them, the SMP barriers are intended to be sufficient for implementing shared-memory protocols between processors in an SMP system while the standard barriers are useful for MMIO etc. On x86 with the stronger ordering model there is not much practical difference here but ARM has weaker barriers available which are suitable for use as SMP barriers. Therefore ensure that common code uses the SMP barriers when that is all which is required. On both ARM and x86 both types of barrier are currently identical so there is no actual change. A future patch will change smp_mb to a weaker barrier on ARM. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
* x86: make map_domain_page_global() a simple wrapper around vmap()Jan Beulich2013-07-041-1/+2
| | | | | | | | | | | | | | | This is in order to reduce the number of fundamental mapping mechanisms as well as to reduce the amount of code to be maintained. In the course of this the virtual space available to vmap() is being grown from 16Gb to 64Gb. Note that this requires callers of unmap_domain_page_global() to no longer pass misaligned pointers - map_domain_page_global() returns page size aligned pointers, so unmappinmg should be done accordingly. unmap_vcpu_info() violated this and is being adjusted here. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* xen: move VCPUOP_register_runstate_memory_area to common codeStefano Stabellini2013-05-081-0/+28
| | | | | Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org>
* xen: move VCPUOP_register_vcpu_info to common codeStefano Stabellini2013-05-081-0/+111
| | | | | | | | | | | | | | | | Move the implementation of VCPUOP_register_vcpu_info from x86 specific to commmon code. Move vcpu_info_mfn from an arch specific vcpu sub-field to the common vcpu struct. Move the initialization of vcpu_info_mfn to common code. Move unmap_vcpu_info and the call to unmap_vcpu_info at domain destruction time to common code. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
* common: remove rcu_lock_target_domain_by_idDaniel De Graaf2013-05-071-34/+0
| | | | | | | | | | This function (and rcu_lock_remote_target_domain_by_id) has no remaining users, having been replaced with XSM hooks and the other rcu_lock_* functions. Remove it. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> (for 4.3 release) Acked-by: Keir Fraser <keir@xen.org>
* x86: make arch_set_info_guest() preemptibleJan Beulich2013-05-021-0/+5
| | | | | | | | | | .. as the root page table validation (and the dropping of an eventual old one) can require meaningful amounts of time. This is part of CVE-2013-1918 / XSA-45. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
* x86: make vcpu_reset() preemptibleJan Beulich2013-05-021-2/+10
| | | | | | | | | | ... as dropping the old page tables may take significant amounts of time. This is part of CVE-2013-1918 / XSA-45. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
* xen: allow for explicitly specifying node-affinityDario Faggioli2013-04-171-4/+56
| | | | | | | | | | | | | | | Make it possible to pass the node-affinity of a domain to the hypervisor from the upper layers, instead of always being computed automatically. Note that this also required generalizing the Flask hooks for setting and getting the affinity, so that they now deal with both vcpu and node affinity. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com> Acked-by: Keir Fraser <keir@xen.org>
* x86/S3: Restore broken vcpu affinity on resumeBen Guthro2013-04-021-0/+2
| | | | | | | | | | | | | When in SYS_STATE_suspend, and going through the cpu_disable_scheduler path, save a copy of the current cpu affinity, and mark a flag to restore it later. Later, in the resume process, when enabling nonboot cpus restore these affinities. Signed-off-by: Ben Guthro <benjamin.guthro@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org>
* mmu: Introduce XENMEM_claim_pages (subop of memory ops)Dan Magenheimer2013-03-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When guests memory consumption is volatile (multiple guests ballooning up/down) we are presented with the problem of being able to determine exactly how much memory there is for allocation of new guests without negatively impacting existing guests. Note that the existing models (xapi, xend) drive the memory consumption from the tool-stack and assume that the guest will eventually hit the memory target. Other models, such as the dynamic memory utilized by tmem, do this differently - the guest drivers the memory consumption (up to the d->max_pages ceiling). With dynamic memory model, the guest frequently can balloon up and down as it sees fit. This presents the problem to the toolstack that it does not know atomically how much free memory there is (as the information gets stale the moment the d->tot_pages information is provided to the tool-stack), and hence when starting a guest can fail during the memory creation process. Especially if the process is done in parallel. In a nutshell what we need is a atomic value of all domains tot_pages during the allocation of guests. Naturally holding a lock for such a long time is unacceptable. Hence the goal of this hypercall is to attempt to atomically and very quickly determine if there are sufficient pages available in the system and, if so, "set aside" that quantity of pages for future allocations by that domain. Unlike an existing hypercall such as increase_reservation or populate_physmap, specific physical pageframes are not assigned to the domain because this cannot be done sufficiently quickly (especially for very large allocations in an arbitrarily fragmented system) and so the existing mechanisms result in classic time-of-check-time-of-use (TOCTOU) races. One can think of claiming as similar to a "lazy" allocation, but subsequent hypercalls are required to do the actual physical pageframe allocation. Note that one of effects of this hypercall is that from the perspective of other running guests - suddenly there is a new guest occupying X amount of pages. This means that when we try to balloon up they will hit the system-wide ceiling of available free memory (if the total sum of the existing d->max_pages >= host memory). This is OK - as that is part of the overcommit. What we DO NOT want to do is dictate their ceiling should be (d->max_pages) as that is risky and can lead to guests OOM-ing. It is something the guest needs to figure out. In order for a toolstack to "get" information about whether a domain has a claim and, if so, how large, and also for the toolstack to measure the total system-wide claim, a second subop has been added and exposed through domctl and libxl (see "xen: XENMEM_claim_pages: xc"). == Alternative solutions == There has been a variety of discussion whether the problem hypercall is solving can be done in user-space, such as: - For all the existing guest, set their d->max_pages temporarily to d->tot_pages and create the domain. This forces those domains to stay at their current consumption level (fyi, this is what the tmem freeze call is doing). The disadvantage of this is that needlessly forces the guests to stay at the memory usage instead of allowing it to decide the optimal target. - Account only using d->max_pages of how much free memory there is. This ignores ballooning changes and any over-commit scenario. This is similar to the scenario where the sum of all d->max_pages (and the one to be allocated now) on the host is smaller than the available free memory. As such it ignores the over-commit problem. - Provide a ring/FIFO along with event channel to notify an userspace daemon of guests memory consumption. This daemon can then provide up-to-date information to the toolstack of how much free memory there is. This duplicates what the hypervisor is already doing and introduced latency issues and catching breath for the toolstack as there might be millions of these updates on heavily used machine. There might not be any quiescent state ever and the toolstack will heavily consume CPU cycles and not ever provide up-to-date information. It has been noted that this claim mechanism solves the underlying problem (slow failure of domain creation) for a large class of domains but not all, specifically not handling (but also not making the problem worse for) PV domains that specify the "superpages" flag, and 32-bit PV domains on large RAM systems. These will be addressed at a later time. Code overview: Though the hypercall simply does arithmetic within locks, some of the semantics in the code may be a bit subtle. The key variables (d->unclaimed_pages and total_unclaimed_pages) starts at zero if no claim has yet been staked for any domain. (Perhaps a better name is "claimed_but_not_yet_possessed" but that's a bit unwieldy.) If no claim hypercalls are executed, there should be no impact on existing usage. When a claim is successfully staked by a domain, it is like a watermark but there is no record kept of the size of the claim. Instead, d->unclaimed_pages is set to the difference between d->tot_pages and the claim. When d->tot_pages increases or decreases, d->unclaimed_pages atomically decreases or increases. Once d->unclaimed_pages reaches zero, the claim is satisfied and d->unclaimed pages stays at zero -- unless a new claim is subsequently staked. The systemwide variable total_unclaimed_pages is always the sum of d->unclaimed_pages, across all domains. A non-domain- specific heap allocation will fail if total_unclaimed_pages exceeds free (plus, on tmem enabled systems, freeable) pages. Claim semantics could be modified by flags. The initial implementation had three flag, which discerns whether the caller would like tmem freeable pages to be considered in determining whether or not the claim can be successfully staked. This in later patches was removed and there are no flags. A claim can be cancelled by requesting a claim with the number of pages being zero. A second subop returns the total outstanding claimed pages systemwide. Note: Save/restore/migrate may need to be modified, else it can be documented that all claims are cancelled. This patch of the proposed XENMEM_claim_pages hypercall/subop, takes into account review feedback from Jan and Keir and IanC and Matthew Daley, plus some fixes found via runtime debugging. Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
* Fix emacs local variable block to use correct C style variable.David Vrabel2013-02-211-1/+1
| | | | | | | The emacs variable to set the C style from a local variable block is c-file-style, not c-set-style. Signed-off-by: David Vrabel <david.vrabel@citrix.com
* xen/xsm: Add xsm_default parameter to XSM hooksDaniel De Graaf2013-01-111-1/+1
| | | | | | | | | | | | | | Include the default XSM hook action as the first argument of the hook to facilitate quick understanding of how the call site is expected to be used (dom0-only, arbitrary guest, or device model). This argument does not solely define how a given hook is interpreted, since any changes to the hook's default action need to be made identically to all callers of a hook (if there are multiple callers; most hooks only have one), and may also require changing the arguments of the hook. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Keir Fraser <keir@xen.org>
* arch/x86: Add missing mem_sharing XSM hooksDaniel De Graaf2013-01-111-0/+15
| | | | | | | | | | | | This patch adds splits up the mem_sharing and mem_event XSM hooks to better cover what the code is doing. It also changes the utility function get_mem_event_op_target to rcu_lock_live_remote_domain_by_id because there is no mm-specific logic in there. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Committed-by: Keir Fraser <keir@xen.org>
* xen: remove nr_irqs_gsi from generic codeIan Campbell2012-12-191-2/+2
| | | | | | | | | | | | | | | | The concept is X86 specific. AFAICT the generic concept here is the number of static physical IRQs which the current hardware has, so call this nr_static_irqs. Also using "defined NR_IRQS" as a standin for x86 might have made sense at one point but its just cleaner to push the necessary definitions into asm/irq.h. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
* VCPU/timers: Prevent overflow in calculations, leading to DoS vulnerabilityIan Jackson2012-11-141-0/+3
| | | | | | | | | | | | | | The timer action for a vcpu periodic timer is to calculate the next expiry time, and to reinsert itself into the timer queue. If the deadline ends up in the past, Xen never leaves __do_softirq(). The affected PCPU will stay in an infinite loop until Xen is killed by the watchdog (if enabled). This is a security problem, XSA-20 / CVE-2012-4535. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
* xen: replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when appropriateStefano Stabellini2012-10-171-1/+1
| | | | | | | | | | | | Note: these changes don't make any difference on x86. Replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when it is used as an hypercall argument. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Ian Campbell <ian.campbell@citrix.com>
* xen: Add versions of rcu_lock_*_domain without IS_PRIVDaniel De Graaf2012-10-151-0/+21
| | | | | | | | | | These functions will be used to avoid duplication of IS_PRIV calls that will be introduced in XSM hooks. This also fixes a build error with XSM enabled introduced by 25925:d1c3375c3f11 which depends on this patch. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
* make domain_create() return a proper error codeJan Beulich2012-09-031-11/+13
| | | | | | | | | | | | | | While triggered by the XSA-9 fix, this really is of more general use; that fix just pointed out very sharply that the current situation with all domain creation failures reported to user (tools) space as -ENOMEM is very unfortunate (actively misleading users _and_ support personnel). Pull over the pointer <-> error code conversion infrastructure from Linux, and use it in domain_create() and all it callers. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* arm: fix build after c/s 25477:e12e0b038219Jan Beulich2012-06-281-0/+2
| | | | | | | | | | | Only x86 currently has a struct vcpu field arch.gdbsx_vcpu_event. But as the whole function domain_pause_for_debugger() is pointless to be compiled when there's no arch support, simply introduce another HAS_* macro, enabled only on x86. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
* gdbsx: send virq to guest if gdbsx_vcpu_event is not activeMukesh Rathor2012-06-111-1/+3
| | | | | | | | gdbsx got broken along the way. During domain pause, don't send VIRQ_DEBUGGER to guest if gdbsx is active on that guest. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Committed-by: Keir Fraser <keir@xen.org>
* x86/mm: Clean up mem event structures on domain destructionTim Deegan2012-03-081-0/+3
| | | | | | | | | Otherwise we wind up with zombie domains, still holding onto refs to the mem event ring pages. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Tim Deegan <tim@xen.org>
* xen: allow global VIRQ handlers to be delegated to other domainsDaniel De Graaf2012-01-281-4/+4
| | | | | | | | | | | | | | | | | This patch sends global VIRQs to a domain designated as the VIRQ handler instead of sending all global VIRQ events to dom0. This is required in order to run xenstored in a stubdom, because VIRQ_DOM_EXC must be sent to xenstored for domain destruction to work properly. This patch was inspired by the xenstored stubdomain patch series sent to xen-devel by Alex Zeffertt in 2009. Signed-off-by: Diego Ongaro <diego.ongaro@citrix.com> Signed-off-by: Alex Zeffertt <alex.zeffertt@eu.citrix.com> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
* reflect cpupool in numa node affinityJuergen Gross2012-01-241-1/+15
| | | | | | | | | In order to prefer node local memory for a domain the numa node locality info must be built according to the cpus belonging to the cpupool of the domain. Signed-off-by: juergen.gross@ts.fujitsu.com Committed-by: Keir Fraser <keir@xen.org>
* switch to dynamically allocated cpumask in domain_update_node_affinity()Juergen Gross2012-01-241-4/+9
| | | | | | | cpumasks should rather be allocated dynamically. Signed-off-by: juergen.gross@ts.fujitsu.com Committed-by: Keir Fraser <keir@xen.org>
* A collection of fixes to Xen common filesStefano Stabellini2012-01-231-0/+2
| | | | | | | | | | | | | | | | - call free_xenoprof_pages only ifdef CONFIG_XENOPROF; - define PRI_stime as PRId64 in an header file; - respect boundaries in is_kernel_*; - implement is_kernel_rodata; - guest_physmap_add_page should be ((void)0). - fix guest_physmap_add_page; - introduce CONFIG_XENOPROF; - define _srodata and _erodata as const char*. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
* Move cpufreq option parsing to cpufreq.cStefano Stabellini2012-01-231-33/+2
| | | | | | | | Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Committed-by: Keir Fraser <keir@xen.org>
* x86/hvm: No need to arch_set_info_guest() before restoring per-vcpu HVM state.Keir Fraser2012-01-221-13/+3
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* xen: Simplify callers of boot_vcpu(). In VCPUOP_up, checkKeir Fraser2012-01-201-12/+15
| | | | | | is_initialised under the per-domain lock. Signed-off-by: Keir Fraser <keir@xen.org>
* Free d->mem_event on domain destruction.Keir Fraser2011-11-301-0/+2
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* mem_event: move mem_event_domain out of struct domainOlaf Hering2011-11-301-0/+5
| | | | | | | | | | | | An upcoming change may increase the size of mem_event_domain. The result is a build failure because struct domain gets larger than a page. Allocate the room for the three mem_event_domain members at runtime. v2: - remove mem_ prefix from members of new struct Signed-off-by: Olaf Hering <olaf@aepfle.de> Committed-by: Keir Fraser <keir@xen.org>
* eliminate cpus_xyz()Jan Beulich2011-11-081-1/+1
| | | | | | Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* eliminate cpumask accessors referencing NR_CPUSJan Beulich2011-10-211-1/+2
| | | | | | | ... in favor of using the new, nr_cpumask_bits-based ones. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* introduce and use nr_cpu_ids and nr_cpumask_bitsJan Beulich2011-10-211-1/+1
| | | | | | | | | | | | | | | The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS, where necessary, get adjusted accordingly), while the latter is for the sole use of determining the allocation size when dynamically allocating CPU masks (done later in this series). Adjust accessors to use either of the two to bound their bitmap operations - which one gets used depends on whether accessing the bits in the gap between nr_cpu_ids and nr_cpumask_bits is benign but more efficient. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* replace d->nr_pirqs sized arrays with radix treeJan Beulich2011-06-231-11/+32
| | | | | | | | | | | | | | | With this it is questionable whether retaining struct domain's nr_pirqs is actually necessary - the value now only serves for bounds checking, and this boundary could easily be nr_irqs. Note that ia64, the build of which is broken currently anyway, is only being partially fixed up. v2: adjustments for split setup/teardown of translation data v3: re-sync with radix tree implementation changes Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Simplify preempt.h dependencies by moving in_atomic() to preempt.cKeir Fraser2011-06-231-0/+1
| | | | | | ..clean up the ensuing fallout. Signed-off-by: Keir Fraser <keir@xen.org>
* xen: Include headers that are actually needed, drop everything else.Christoph Egger2011-05-201-0/+2
| | | | Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
* Revert 23295:4891f1f41ba5 and 23296:24346f749826Keir Fraser2011-05-021-17/+11
| | | | | | Fails current lock checking mechanism in spinlock.c in debug=y builds. Signed-off-by: Keir Fraser <keir@xen.org>
* replace d->nr_pirqs sized arrays with radix treeJan Beulich2011-05-011-11/+17
| | | | | | | | | | | | | | | With this it is questionable whether retaining struct domain's nr_pirqs is actually necessary - the value now only serves for bounds checking, and this boundary could easily be nr_irqs. Another thing to consider is whether it's worth storing the pirq number in struct pirq, to avoid passing the number and a pointer to quite a number of functions. Note that ia64, the build of which is broken currently anyway, is only partially fixed up. Signed-off-by: Jan Beulich <jbeulich@novell.com>
* x86: introduce alloc_vcpu_guest_context()Jan Beulich2011-04-051-3/+3
| | | | | | | | | | | | This is necessary because on x86-64 struct vcpu_guest_context is larger than PAGE_SIZE, and hence not suitable for a general purpose runtime allocation. On x86-32, FIX_PAE_HIGHMEM_* fixmap entries are being re-used, whiule on x86-64 new per-CPU fixmap entries get introduced. The implication of using per-CPU fixmaps is that these allocations have to happen from non-preemptable hypercall context (which they all do). Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Remove direct cpumask_t members from struct vcpu and struct domainJan Beulich2011-04-051-7/+23
| | | | | | | | | | | | | | | The CPU masks embedded in these structures prevent NR_CPUS-independent sizing of these structures. Basic concept (in xen/include/cpumask.h) taken from recent Linux. For scalability purposes, many other uses of cpumask_t should be replaced by cpumask_var_t, particularly local variables of functions. This implies that no functions should have by-value cpumask_t parameters, and that the whole old cpumask interface (cpus_...()) should go away in favor of the new (cpumask_...()) one. Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Remove unmaintained Access Control Module (ACM) from hypervisor.Keir Fraser2011-03-251-1/+1
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* Introduce rcu_lock_remote_target_domain_by_id().Keir Fraser2011-02-071-0/+14
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* Use bool_t for various boolean variablesKeir Fraser2010-12-241-1/+1
| | | | | | | | | | | ... decreasing cache footprint. As a prerequisite this requires making cmdline_parse() a little more flexible. Also remove a few variables altogether, and adjust sections annotations for several others. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Keir Fraser <keir@xen.org>
* rcu_lock(current->domain) does not need to disable preemption.Keir Fraser2010-11-181-3/+6
| | | | | | | | | | If the guest sleeps in hypervisor context, it should not be destroyed until execution reaches a safe point (i.e., guest context). This is not implemented yet. :-) But the next patch will rely on it, to allow an HVM guest to execute hypercalls that indirectly invoke __hvm_copy() within an rcu_lock_current_domain() region. Signed-off-by: Keir Fraser <keir@xen.org>
* Wait queues, allowing conditional sleep in hypervisor context.Keir Fraser2010-11-171-0/+5
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* Make multicall state per-vcpu rather than per-cpuKeir Fraser2010-11-161-13/+0
| | | | | | | This is a prerequisite for allowing guest descheduling within a hypercall. Signed-off-by: Keir Fraser <keir@xen.org>
* x86: eliminate bogus IRQ restrictionsKeir Fraser2010-08-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | As pointed out in http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00077.html the limits introduced in c/s 20072 are at least questionable. Eliminate them in favor of a more dynamic approach: There's no real need for an upper limit on nr_irqs (as anything beyond nr_irqs_gsi isn't visible to domains anyway), and the split point (and hence ratio) between GSI and MSI/MSI-X IRQs doesn't need to be hard coded, but can instead be controlled on the command line in case there are *very* many GSIs. The default used for nr_irqs will be rather large with this patch, so it may not be acceptable without also switching to a sparse irq_desc[] as was done not so lomg ago in Linux. The added capping of any domain's nr_pirqs is based on the observation that no domain can possibly have more than the system wide number of IRQs. The opposite case may in fact also require some adjustment: Defaulting the number of non-GSI IRQs available (namely to Dom0) to a fixed value may not be the best choice going forward, since if there indeed are very many non-GSI interrupt sources, it won't be possible for the kernel to make use of them without giving "extra_guest_irqs=" on the command line (but the goal should be to allow things to work right by default even on large systems). Signed-off-by: Jan Beulich <jbeulich@novell.com>
* numa: Small tweaks to domain_update_node_affinity() and its callers.Keir Fraser2010-08-041-4/+0
| | | | | From: Andrew Jones <drjones@redhat.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>