path: root/xen/common/schedule.c
Commit history (newest first); each entry shows the commit message, author, date, and the files/lines changed.
* evtchn: use a per-domain variable for the max number of event channels
  David Vrabel, 2013-10-14 (1 file changed, -1/+1)

  Use d->max_evtchns instead of the MAX_EVTCHNS(d) macro. This avoids having to
  repeatedly check the ABI type.

  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Reviewed-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* evtchn: refactor low-level event channel port ops
  David Vrabel, 2013-10-14 (1 file changed, -1/+2)

  Use functions for the low-level event channel port operations (set/clear
  pending, unmask, is_pending and is_masked). Group these functions into a
  struct evtchn_port_op so they can be replaced by alternate implementations
  (for different ABIs) on a per-domain basis.

  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Reviewed-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

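A minimal sketch of the kind of per-domain operation table this describes; the type, field names and signatures below are illustrative, not copied from Xen's actual struct:

```c
#include <stdbool.h>

struct domain;   /* opaque here */
struct evtchn;   /* opaque here */

/* One table of function pointers per event-channel ABI; each domain keeps
 * a pointer to the table matching the ABI it uses. */
struct evtchn_port_ops_sketch {
    void (*set_pending)(struct domain *d, struct evtchn *chn);
    void (*clear_pending)(struct domain *d, struct evtchn *chn);
    void (*unmask)(struct domain *d, struct evtchn *chn);
    bool (*is_pending)(const struct domain *d, const struct evtchn *chn);
    bool (*is_masked)(const struct domain *d, const struct evtchn *chn);
};
```

Callers then indirect through the domain's table (e.g. something like d->evtchn_port_ops->set_pending(d, chn)) instead of open-coding the 2-level ABI's bitmap layout, which is what makes alternate ABIs pluggable per domain.
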
* sched: fix race between sched_move_domain() and vcpu_wake()
  David Vrabel, 2013-10-14 (1 file changed, -0/+11)

  From: David Vrabel <david.vrabel@citrix.com>

  sched_move_domain() changes v->processor for all the domain's VCPUs. If
  another domain, softirq etc. triggers a simultaneous call to vcpu_wake()
  (e.g., by setting an event channel as pending), then vcpu_wake() may lock one
  schedule lock and try to unlock another.

  vcpu_schedule_lock() attempts to handle this but only does so for the window
  between reading the schedule_lock from the per-CPU data and the spin_lock()
  call. This does not help with sched_move_domain() changing v->processor
  between the calls to vcpu_schedule_lock() and vcpu_schedule_unlock().

  Fix the race by taking the schedule_lock for v->processor in
  sched_move_domain().

  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>

  Use vcpu_schedule_lock_irq() (which now returns the lock) to properly retry
  the locking should the lock to be used have changed in the course of
  acquiring it (issue pointed out by George Dunlap). Add a comment explaining
  the state after the v->processor adjustment.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

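A simplified sketch of the locking pattern this race is about: the schedule lock protecting a vcpu is looked up via v->processor, so the lookup has to be re-checked once the lock is actually held. All names below are stand-ins for Xen's real primitives, not its API:

```c
typedef struct spinlock spinlock_t;            /* stand-in type */
void spin_lock(spinlock_t *lock);              /* assumed primitive */
void spin_unlock(spinlock_t *lock);            /* assumed primitive */

struct vcpu { unsigned int processor; };
spinlock_t *sched_lock_of(unsigned int cpu);   /* assumed per-CPU lookup */

static spinlock_t *vcpu_sched_lock_sketch(struct vcpu *v)
{
    for ( ; ; )
    {
        spinlock_t *lock = sched_lock_of(v->processor);

        spin_lock(lock);
        /* sched_move_domain() may have changed v->processor while we were
         * spinning; if so, drop the stale lock and try again. */
        if ( lock == sched_lock_of(v->processor) )
            return lock;
        spin_unlock(lock);
    }
}
```

The fix described above makes sched_move_domain() itself hold this kind of lock while updating v->processor, so concurrent wake-ups cannot observe a half-updated assignment.
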
* scheduler: adjust internal locking interface
  Jan Beulich, 2013-10-14 (1 file changed, -30/+31)

  Make the locking functions return the lock pointers, so they can be passed to
  the unlocking functions (which in turn can check that the lock is still
  actually providing the intended protection, i.e. the parameters determining
  which lock is the right one didn't change). Further, use proper spin lock
  primitives rather than open-coded local_irq_...() constructs, so that
  interrupts can be re-enabled as appropriate while spinning.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

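A sketch of what the reworked interface shape buys the callers (illustrative names; the acquire-and-retry side is as in the sketch after the previous entry):

```c
#include <assert.h>

/* Types and helpers as in the previous sketch. */
typedef struct spinlock spinlock_t;            /* stand-in type */
struct vcpu { unsigned int processor; };
spinlock_t *sched_lock_of(unsigned int cpu);   /* assumed per-CPU lookup */
void spin_unlock_irq(spinlock_t *lock);        /* assumed primitive */

/* The lock routine hands back the lock it actually took ... */
spinlock_t *vcpu_schedule_lock_irq_sketch(struct vcpu *v);

/* ... so the unlock routine can verify it still guards v before releasing. */
static void vcpu_schedule_unlock_irq_sketch(spinlock_t *lock, struct vcpu *v)
{
    assert(lock == sched_lock_of(v->processor));
    spin_unlock_irq(lock);
}
```
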
* x86/idle: Fix get_cpu_idle_time()'s interaction with offline pcpus
  Andrew Cooper, 2013-10-04 (1 file changed, -5/+4)

  Checking for "idle_vcpu[cpu] != NULL" is insufficient protection against
  offline pcpus. From a hypercall, vcpu_runstate_get() will determine
  "v != current", and try to take the vcpu_schedule_lock(). This will try to
  look up per_cpu(schedule_data, v->processor) and promptly suffer a NULL
  structure dereference, as v->processor's __per_cpu_offset is
  INVALID_PERCPU_AREA.

  One example might look like this:

    ...
    Xen call trace:
      [<ffff82c4c0126ddb>] vcpu_runstate_get+0x50/0x113
      [<ffff82c4c0126ec6>] get_cpu_idle_time+0x28/0x2e
      [<ffff82c4c012b5cb>] do_sysctl+0x3db/0xeb8
      [<ffff82c4c023280d>] compat_hypercall+0xbd/0x116

    Pagetable walk from 0000000000000040:
      L4[0x000] = 0000000186df8027 0000000000028207
      L3[0x000] = 0000000188e36027 00000000000261c9
      L2[0x000] = 0000000000000000 ffffffffffffffff

    ****************************************
    Panic on CPU 11:
    ...

  get_cpu_idle_time() has been updated to correctly deal with offline pcpus
  itself by returning 0, in the same way as it would if it were missing the
  idle_vcpu[] pointer. In doing so, XENPF_getidletime needed updating to
  correctly retain its described behaviour of clearing bits in the cpumap for
  offline pcpus.

  As this crash can only be triggered with toolstack hypercalls, it is not a
  security issue, just a simple bug.

  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

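A sketch of the shape of the guard described above; the real function also has to fold in the runstate bookkeeping, and the helper and accessor names here are illustrative only:

```c
#include <stddef.h>
#include <stdint.h>

struct vcpu;                                        /* opaque here */
extern struct vcpu *idle_vcpu[];                    /* per-pCPU idle vcpus */
int cpu_online(unsigned int cpu);                   /* assumed helper */
uint64_t vcpu_running_time(const struct vcpu *v);   /* illustrative accessor */

uint64_t get_cpu_idle_time_sketch(unsigned int cpu)
{
    /* An offline pCPU has no usable per-CPU scheduler data, so report 0
     * instead of chasing pointers into an invalid per-CPU area. */
    if ( !cpu_online(cpu) || idle_vcpu[cpu] == NULL )
        return 0;

    return vcpu_running_time(idle_vcpu[cpu]);
}
```
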
* x86: re-enable VCPUOP_register_vcpu_time_memory_area
  Jan Beulich, 2013-05-27 (1 file changed, -2/+0)

  By moving the call to update_vcpu_system_time() out of schedule() into
  arch-specific context switch code, the original problem of the function
  accessing the wrong domain's address space goes away (obvious even from patch
  context, as update_runstate_area() does similar copying).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* xen: introduce vcpu_block
  Stefano Stabellini, 2013-04-30 (1 file changed, -5/+8)

  Rename do_block to vcpu_block. Move the call to local_event_delivery_enable
  out of vcpu_block, to a new static function called vcpu_block_enable_events.
  Use vcpu_block_enable_events instead of do_block throughout schedule.c.

  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

* xen: allow for explicitly specifying node-affinity
  Dario Faggioli, 2013-04-17 (1 file changed, -0/+5)

  Make it possible to pass the node-affinity of a domain to the hypervisor from
  the upper layers, instead of always being computed automatically. Note that
  this also required generalizing the Flask hooks for setting and getting the
  affinity, so that they now deal with both vcpu and node affinity.

  Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
  Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
  Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
  Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
  Acked-by: Keir Fraser <keir@xen.org>

* x86/S3: Restore broken vcpu affinity on resume
  Ben Guthro, 2013-04-02 (1 file changed, -3/+42)

  When in SYS_STATE_suspend, and going through the cpu_disable_scheduler path,
  save a copy of the current cpu affinity, and mark a flag to restore it later.
  Later, in the resume process, when enabling nonboot cpus restore these
  affinities.

  Signed-off-by: Ben Guthro <benjamin.guthro@citrix.com>
  Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>

* sched: always ask the scheduler to re-place the vcpu when the affinity changes
  George Dunlap, 2013-03-08 (1 file changed, -3/+4)

  It's probably a good idea to re-evaluate placement whenever the affinity
  changes. This additionally has the benefit of removing scheduler-specific
  exceptions introduced in git c/s e6a6fd63. The conditionals surrounding
  vcpu_migrate() are left pending a re-work of the logic to avoid the common
  case calling vcpu_migrate() twice (once here, and once in context_saved()).

  Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

* change arguments of do_kexec_op and compat_set_timer_op prototypes
  Robbie VanVossen, 2013-03-06 (1 file changed, -0/+1)

  ... to match the actual functions.

  Signed-off-by: Robbie VanVossen <robert.vanvossen@dornerworks.com>

  Also make sure the source files defining these symbols include the header
  declaring them (had we done so, the problem would have been noticed long
  ago).

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* SEDF: avoid gathering vCPU-s on pCPU0
  Jan Beulich, 2013-03-04 (1 file changed, -1/+2)

  The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
  incompatible with the SEDF scheduler: Any vCPU using
  VCPUOP_stop_periodic_timer (e.g. any vCPU of halfway modern PV Linux guests)
  ends up on pCPU0 after that call. Obviously, running all PV guests' (and
  namely Dom0's) vCPU-s on pCPU0 causes problems for those guests rather sooner
  than later.

  So the main thing that was clearly wrong (and bogus from the beginning) was
  the use of cpumask_first() in sedf_pick_cpu(). It is being replaced by a
  construct that prefers to put back the vCPU on the pCPU that it got launched
  on.

  However, there's one more glitch: When reducing the affinity of a vCPU
  temporarily, and then widening it again to a set that includes the pCPU that
  the vCPU was last running on, the generic scheduler code would not force a
  migration of that vCPU, and hence it would forever stay on the pCPU it last
  ran on. Since that can again create a load imbalance, the SEDF scheduler
  wants a migration to happen regardless of it being apparently unnecessary.

  Of course, an alternative to checking for SEDF explicitly in
  vcpu_set_affinity() would be to introduce a flags field in struct scheduler,
  and have SEDF set an "always-migrate-on-affinity-change" flag.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

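A sketch of the pick-CPU idea in the second paragraph above: keep the vCPU where it already is when the affinity mask allows it, rather than always returning cpumask_first(), which funnels every vCPU onto pCPU0. Names are illustrative and the real heuristic in sedf_pick_cpu() differs in detail:

```c
typedef struct cpumask cpumask_t;                           /* stand-in type */
int cpumask_test_cpu(unsigned int cpu, const cpumask_t *m); /* assumed helper */
unsigned int cpumask_any(const cpumask_t *m);               /* assumed helper */

struct vcpu { unsigned int processor; };

static unsigned int sedf_pick_cpu_sketch(const struct vcpu *v,
                                         const cpumask_t *allowed)
{
    /* Prefer the pCPU the vCPU is currently associated with. */
    if ( cpumask_test_cpu(v->processor, allowed) )
        return v->processor;

    /* Otherwise fall back to any permitted pCPU. */
    return cpumask_any(allowed);
}
```
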
* Avoid stale pointer when moving domain to another cpupool
  Juergen Gross, 2013-02-28 (1 file changed, -6/+14)

  When a domain is moved to another cpupool, the scheduler private data
  pointers in the vcpu and domain structures must never point to an already
  freed memory area. While at it, simplify sched_init_vcpu() by using DOM2OP
  instead of VCPU2OP.

  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>

* Fix emacs local variable block to use correct C style variable.
  David Vrabel, 2013-02-21 (1 file changed, -1/+1)

  The emacs variable to set the C style from a local variable block is
  c-file-style, not c-set-style.

  Signed-off-by: David Vrabel <david.vrabel@citrix.com>

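For reference, this refers to the Emacs file-local variable block found at the bottom of Xen source files; the corrected variable is c-file-style. The block below is a representative example, not a verbatim copy of schedule.c's footer:

```c
/*
 * Local variables:
 * mode: C
 * c-file-style: "BSD"
 * c-basic-offset: 4
 * indent-tabs-mode: nil
 * End:
 */
```
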
* xen: Introduce ASSERT_NOT_IN_ATOMIC() to give more info on in_atomic() crash.
  Keir Fraser, 2013-01-14 (1 file changed, -1/+1)

  Signed-off-by: Keir Fraser <keir@xen.org>

* xen/xsm: distinguish scheduler get/set operations
  Daniel De Graaf, 2013-01-11 (1 file changed, -1/+9)

  Add getscheduler and setscheduler permissions to replace the monolithic
  scheduler permission in the scheduler_op domctl and sysctl.

  Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
  Committed-by: Keir Fraser <keir@xen.org>

* xen/xsm: Add xsm_default parameter to XSM hooks
  Daniel De Graaf, 2013-01-11 (1 file changed, -1/+1)

  Include the default XSM hook action as the first argument of the hook to
  facilitate quick understanding of how the call site is expected to be used
  (dom0-only, arbitrary guest, or device model). This argument does not solely
  define how a given hook is interpreted, since any changes to the hook's
  default action need to be made identically to all callers of a hook (if there
  are multiple callers; most hooks only have one), and may also require
  changing the arguments of the hook.

  Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
  Acked-by: Tim Deegan <tim@xen.org>
  Committed-by: Keir Fraser <keir@xen.org>

* xen: use XSM instead of IS_PRIV where duplicated
  Daniel De Graaf, 2013-01-11 (1 file changed, -6/+0)

  The Xen hypervisor has two basic access control function calls: IS_PRIV and
  the xsm_* functions. Most privileged operations currently require that both
  checks succeed, and many times the checks are at different locations in the
  code. This patch eliminates the explicit and implicit IS_PRIV checks that are
  duplicated in XSM hooks.

  When XSM_ENABLE is not defined or when the dummy XSM module is used, this
  patch should not change any functionality. Because the locations of privilege
  checks have sometimes moved below argument validation, error returns of some
  functions may change from EPERM to EINVAL or ESRCH if called with invalid
  arguments and from a domain without permission to perform the operation.

  Some checks are removed due to non-obvious duplicates in their callers:
  * acpi_enter_sleep is checked in XENPF_enter_acpi_sleep
  * map_domain_pirq has IS_PRIV_FOR checked in its callers:
  * physdev_map_pirq checks when acquiring the RCU lock
  * ioapic_guest_write is checked in PHYSDEVOP_apic_write
  * PHYSDEVOP_{manage_pci_add,manage_pci_add_ext,pci_device_add} are checked by
    xsm_resource_plug_pci in pci_add_device
  * PHYSDEVOP_manage_pci_remove is checked by xsm_resource_unplug_pci in
    pci_remove_device
  * PHYSDEVOP_{restore_msi,restore_msi_ext} are checked by
    xsm_resource_setup_pci in pci_restore_msi_state
  * do_console_io has changed to IS_PRIV from an explicit domid==0

  Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
  Acked-by: Jan Beulich <jbeulich@suse.com>
  Committed-by: Keir Fraser <keir@xen.org>

* scheduler: fix rate limit range checking
  Jan Beulich, 2012-12-10 (1 file changed, -0/+12)

  For one, neither of the two checks permitted the documented value of zero
  (disabling the functionality altogether). Second, the range checking of the
  command line parameter was done by the credit scheduler's initialization
  code, despite it being a generic scheduler option.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

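A sketch of the corrected validation: zero (rate limiting disabled) is accepted, and any non-zero value must fall within the supported range. The macro names and bounds below are placeholders, not Xen's actual definitions:

```c
#include <stdbool.h>

#define SCHED_RATELIMIT_MIN_US  100      /* placeholder bound */
#define SCHED_RATELIMIT_MAX_US  500000   /* placeholder bound */

/* Generic scheduler option, so the check belongs in common scheduler code,
 * not in one particular scheduler's initialization. */
static bool sched_ratelimit_valid(int us)
{
    return us == 0 ||
           (us >= SCHED_RATELIMIT_MIN_US && us <= SCHED_RATELIMIT_MAX_US);
}
```
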
* xen: sched: generalize scheduling related perfcounter macros
  Dario Faggioli, 2012-10-23 (1 file changed, -3/+5)

  Moving some of them from sched_credit.c to generic scheduler code. This also
  allows the other schedulers to use perf counters equally easily. This change
  is mainly preparatory work for what is stated above. In fact, it mostly does
  s/CSCHED_STAT/SCHED_STAT/ and, in general, implies no functional changes.

  Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
  Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
  Committed-by: Keir Fraser <keir@xen.org>

* xen: replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when appropriate
  Stefano Stabellini, 2012-10-17 (1 file changed, -1/+1)

  Note: these changes don't make any difference on x86.

  Replace XEN_GUEST_HANDLE with XEN_GUEST_HANDLE_PARAM when it is used as a
  hypercall argument.

  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Committed-by: Ian Campbell <ian.campbell@citrix.com>

* make domain_create() return a proper error code
  Jan Beulich, 2012-09-03 (1 file changed, -2/+2)

  While triggered by the XSA-9 fix, this really is of more general use; that
  fix just pointed out very sharply that the current situation, with all domain
  creation failures reported to user (tools) space as -ENOMEM, is very
  unfortunate (actively misleading users _and_ support personnel). Pull over
  the pointer <-> error code conversion infrastructure from Linux, and use it
  in domain_create() and all its callers.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

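The pointer <-> error code helpers referred to are the familiar Linux-style ones; a condensed sketch (Xen keeps its own copies in a common header):

```c
#include <stdbool.h>

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)error;                /* encode -errno in a pointer */
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;                    /* recover the -errno value */
}

static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```

With this, domain_create() can hand back e.g. ERR_PTR(-EINVAL) on bad input, and callers test with IS_ERR() and propagate PTR_ERR() instead of collapsing every failure into -ENOMEM.
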
* xen, cpupools: Fix cpupool-move to make more consistent
  George Dunlap, 2012-04-10 (1 file changed, -8/+13)

  The full order for creating new private data structures when moving from one
  pool to another is now:
  * Allocate all new structures
    - Allocate a new private domain structure (but don't point there yet)
    - Allocate per-vcpu data structures (but don't point there yet)
  * Remove old structures
    - Remove each vcpu, freeing the associated data structure
    - Free the domain data structure
  * Switch to the new structures
    - Set the domain to the new cpupool, with the new private domain structure
    - Set each vcpu to the respective new structure, and insert

  This is in line with a (fairly reasonable) assumption in credit2 that the
  private structure of the domain will be the private structure pointed to by
  the per-vcpu private structure.

  Also fix a bug, in which insert_vcpu was called with the *old* vcpu ops
  rather than the new ones.

  Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
  Committed-by: Keir Fraser <keir@xen.org>

* xen: Fix schedule()'s grabbing of the schedule lock
  George Dunlap, 2012-04-10 (1 file changed, -3/+4)

  Because the location of the lock can change between the time you read it and
  the time you grab it, the per-cpu schedule locks need to check after lock
  acquisition that the lock location hasn't changed, and release and re-try if
  so. This change was effected throughout the source code, but one very
  important place was apparently missed: in schedule() itself.

  Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
  Committed-by: Keir Fraser <keir@xen.org>

* Introduce system_state variable.
  Keir Fraser, 2012-03-22 (1 file changed, -1/+1)

  Use it to replace the x86-specific early_boot boolean variable. Also use it
  to detect the suspend/resume case during cpu offline/online, to avoid
  unnecessarily breaking vcpu and cpupool affinities.

  Signed-off-by: Keir Fraser <keir@xen.org>
  Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>

* scheduler: Add a #define for the default ratelimit
  George Dunlap, 2012-02-23 (1 file changed, -1/+1)

  Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
  Committed-by: Keir Fraser <keir@xen.org>

* reflect cpupool in numa node affinity
  Juergen Gross, 2012-01-24 (1 file changed, -7/+3)

  In order to prefer node-local memory for a domain, the numa node locality
  info must be built according to the cpus belonging to the cpupool of the
  domain.

  Signed-off-by: juergen.gross@ts.fujitsu.com
  Committed-by: Keir Fraser <keir@xen.org>

* introduce and use common macros for selecting cpupool based cpumasks
  Juergen Gross, 2012-01-24 (1 file changed, -4/+2)

  There are several instances of the same construct finding the cpumask for a
  cpupool. Use macros instead.

  Signed-off-by: juergen.gross@ts.fujitsu.com
  Committed-by: Keir Fraser <keir@xen.org>

* sched_credit: Use delay to control scheduling frequency
  Hui Lv, 2012-01-17 (1 file changed, -0/+5)

  This patch can improve Xen performance:
  1. Basically, the "delay method" can achieve an 11% overall performance boost
     for SPECvirt over the original credit scheduler.
  2. We have tried 1ms delay and 10ms delay; there is no big difference between
     these two configurations. (1ms is enough to achieve good performance.)
  3. We have compared response time/latency at different load levels (low,
     high, peak); the "delay method" didn't bring much of a response time
     increase.
  4. A 1ms delay can reduce context switches by 30% at peak performance, which
     is where the benefit comes from. (int sched_ratelimit_us = 1000 is the
     recommended setting.)

  Signed-off-by: Hui Lv <hui.lv@intel.com>
  Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
  Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
  Committed-by: Keir Fraser <keir@xen.org>

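A sketch of what the "delay method" boils down to in the scheduler's decision path (names are illustrative; the real check lives in the credit scheduler and uses Xen's time accounting):

```c
#include <stdbool.h>
#include <stdint.h>

int sched_ratelimit_us = 1000;   /* value recommended in the commit above */

/* Do not preempt the currently running vcpu if it is still runnable and has
 * run for less than the rate limit; this trades a little latency for far
 * fewer context switches under load. */
static bool should_preempt(bool current_runnable, uint64_t ran_ns)
{
    if ( current_runnable && ran_ns < (uint64_t)sched_ratelimit_us * 1000 )
        return false;
    return true;
}
```
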
* Rework locking for sched_adjust.
  Dario Faggioli, 2012-01-04 (1 file changed, -32/+2)

  The main idea is to move (as much as possible) locking logic from generic
  code to the various pluggable schedulers. While at it, the following is also
  accomplished:
  - pausing all the non-current VCPUs of a domain while changing its scheduling
    parameters is not effective in avoiding races and it is prone to deadlock,
    so that is removed.
  - sedf needs a global lock for preventing races while adjusting domains'
    scheduling parameters (as it is for credit and credit2), so that is added.

  Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
  Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
  Committed-by: Keir Fraser <keir@xen.org>

* sched_sedf: Avoid panic when adjusting sedf parameters
  Juergen Gross, 2011-11-18 (1 file changed, -4/+1)

  When using the sedf scheduler in a cpupool, the system might panic when
  setting sedf scheduling parameters for a domain. Introduce the
  for_each_domain_in_cpupool macro, as it is now usable in 4 places. Add
  appropriate locking in cpupool_unassign_cpu().

  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
  Committed-by: Keir Fraser <keir@xen.org>

* eliminate first_cpu() etc
  Jan Beulich, 2011-11-08 (1 file changed, -1/+1)

  This includes the conversion from for_each_cpu_mask() to for_each_cpu().

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

* eliminate cpus_xyz()
  Jan Beulich, 2011-11-08 (1 file changed, -1/+1)

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

* cpupools: allocate CPU masks dynamically
  Jan Beulich, 2011-10-21 (1 file changed, -7/+7)

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* eliminate cpumask accessors referencing NR_CPUS
  Jan Beulich, 2011-10-21 (1 file changed, -2/+2)

  ... in favor of using the new, nr_cpumask_bits-based ones.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* introduce and use nr_cpu_ids and nr_cpumask_bits
  Jan Beulich, 2011-10-21 (1 file changed, -1/+1)

  The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS, where
  necessary, get adjusted accordingly), while the latter is for the sole use of
  determining the allocation size when dynamically allocating CPU masks (done
  later in this series). Adjust accessors to use either of the two to bound
  their bitmap operations - which one gets used depends on whether accessing
  the bits in the gap between nr_cpu_ids and nr_cpumask_bits is benign but more
  efficient.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* Return -EINVAL when trying to kick/kill a nonexistent domain watchdog
  Keir Fraser, 2011-10-14 (1 file changed, -2/+2)

  ... to be more in line with the NR_DOMAIN_WATCHDOG_TIMERS check at the top of
  domain_watchdog(), and also to follow the timer_(delete|settime) POSIX API's
  EINVAL return value.

  Signed-off-by: Laszlo Ersek <lersek@redhat.com>

  Also, replace EEXIST with ENOSPC when failing to allocate a new domain
  watchdog.

  Signed-off-by: Keir Fraser <keir@xen.org>
  Committed-by: Keir Fraser <keir@xen.org>

* constify vcpu_set_affinity()'s second parameter
  Jan Beulich, 2011-10-13 (1 file changed, -4/+2)

  None of the callers actually make use of the function's returning of the old
  affinity through its second parameter, and eliminating this capability allows
  some callers to no longer use a local variable here, reducing their stack
  footprint significantly when building with large NR_CPUS.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* use xzalloc in common code
  Jan Beulich, 2011-10-04 (1 file changed, -2/+1)

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>

* Avoid race in schedule() when switching schedulers
  Juergen Gross, 2011-09-17 (1 file changed, -1/+2)

  Selecting the scheduler to call must be done under lock. Otherwise a race
  might occur when switching schedulers in a cpupool.

  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
  Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

* xen: remove more declarations from C files.
  Tim Deegan, 2011-05-27 (1 file changed, -4/+0)

  This patch moves some more, mostly data, extern declarations into header
  files. I haven't been as strict as I was with functions; in particular there
  are a number of declarations of assembler labels that are only used in one
  place. I've also left a few compat-mode tricks, and all the magic in
  symbols.c.

  Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

* credit2: add a callback to migrate to a new cpu
  George Dunlap, 2011-04-27 (1 file changed, -1/+5)

  In credit2, there needs to be a strong correlation between v->processor and
  the runqueue to which a vcpu is assigned; much of the code relies on this
  invariant. Allow credit2 to manage the actual migration itself.

  This fixes the most recent credit2 bug reported on the list (Xen BUG at
  sched_credit2.c:1606) in Xen 4.1, as well as the bug at sched_credit2.c:811
  in -unstable (which catches the same condition earlier).

  Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

* Remove direct cpumask_t members from struct vcpu and struct domain
  Jan Beulich, 2011-04-05 (1 file changed, -11/+11)

  The CPU masks embedded in these structures prevent NR_CPUS-independent sizing
  of these structures. Basic concept (in xen/include/cpumask.h) taken from
  recent Linux.

  For scalability purposes, many other uses of cpumask_t should be replaced by
  cpumask_var_t, particularly local variables of functions. This implies that
  no functions should have by-value cpumask_t parameters, and that the whole
  old cpumask interface (cpus_...()) should go away in favor of the new
  (cpumask_...()) one.

  Signed-off-by: Jan Beulich <jbeulich@novell.com>

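A sketch of the cpumask_var_t usage pattern the second paragraph argues for (the allocation helpers follow the Linux-derived interface; the exact Xen spellings and return types may differ):

```c
#include <errno.h>

typedef struct cpumask *cpumask_var_t;                       /* pointer-style mask */
int zalloc_cpumask_var(cpumask_var_t *mask);                 /* assumed: 0 on failure */
void free_cpumask_var(cpumask_var_t mask);                   /* assumed helper */
void cpumask_set_cpu(unsigned int cpu, cpumask_var_t mask);  /* assumed helper */

static int track_cpu_sketch(unsigned int cpu)
{
    cpumask_var_t mask;

    /* Sized by nr_cpumask_bits at run time instead of NR_CPUS at build time,
     * so large-NR_CPUS builds don't bloat every structure and stack frame. */
    if ( !zalloc_cpumask_var(&mask) )
        return -ENOMEM;

    cpumask_set_cpu(cpu, mask);
    /* ... use mask ... */
    free_cpumask_var(mask);
    return 0;
}
```
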
* move register_cpu_notifier() into .init.text
  Jan Beulich, 2011-04-02 (1 file changed, -9/+16)

  With no modular drivers, all CPU notifier setup is supposed to happen during
  boot. There also is a respective comment in the function.

  Signed-off-by: Jan Beulich <jbeulich@novell.com>

* Avoid endless loop for vcpu migration.
  Juergen Gross, 2011-03-15 (1 file changed, -1/+21)

  Only call SCHED_OP(pick_cpu) if a previously picked cpu is not suitable on
  the current iteration of the retry loop.

  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
  Signed-off-by: Keir Fraser <keir@xen.org>

* Avoid possible live-lock in vcpu_migrate
  Juergen Gross, 2011-02-28 (1 file changed, -27/+44)

  If vcpu_migrate is called for two vcpus active on different cpus, resulting
  in swapping the cpus, a live-lock could occur as both instances try to take
  the scheduling locks of the physical cpus in opposite order. To avoid this
  problem, the locks are always taken in the same order (sorted by the address
  of the lock).

  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>

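A sketch of the ordering rule described above: when two pCPUs' schedule locks must be held at once, take them in a fixed (address-sorted) order so two concurrent migrations cannot each hold one lock and wait forever for the other. Names are stand-ins, not Xen's API:

```c
typedef struct spinlock spinlock_t;      /* stand-in type */
void spin_lock(spinlock_t *lock);        /* assumed primitive */

static void lock_both_sketch(spinlock_t *a, spinlock_t *b)
{
    if ( a == b )
    {
        spin_lock(a);                    /* same pCPU: only one lock to take */
    }
    else if ( a < b )
    {
        spin_lock(a);
        spin_lock(b);
    }
    else
    {
        spin_lock(b);
        spin_lock(a);
    }
}
```
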
* cpupool: Avoid race when moving cpu between cpupools
  Juergen Gross, 2011-02-25 (1 file changed, -7/+36)

  Moving cpus between cpupools is done under the schedule lock of the moved
  cpu. When checking whether a cpu is a member of a cpupool, this must be done
  with the lock of that cpu held. Hot-unplugging of physical cpus might
  encounter the same problems, but this should happen only very rarely.

  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
  Acked-by: Andre Przywara <andre.przywara@amd.com>
  Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

* Cpupools: vcpu affinity handling
  Juergen Gross, 2011-02-10 (1 file changed, -1/+3)

  If a vcpu is pinned to multiple physical cpus, the pinning is not removed if
  all those physical cpus are removed from the cpupool. When disabling the
  scheduler on a cpu, the affinity mask must be checked against the cpumask of
  the cpupool.

  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>

* cpupool: Check for memory allocation failure on switching schedulers
  Keir Fraser, 2011-02-06 (1 file changed, -2/+11)

  When switching schedulers on a physical cpu due to a cpupool operation, check
  for a potential memory allocation failure and stop the operation gracefully.

  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>

* Use bool_t for various boolean variables
  Keir Fraser, 2010-12-24 (1 file changed, -1/+1)

  ... decreasing cache footprint. As a prerequisite this requires making
  cmdline_parse() a little more flexible. Also remove a few variables
  altogether, and adjust section annotations for several others.

  Signed-off-by: Jan Beulich <jbeulich@novell.com>
  Signed-off-by: Keir Fraser <keir@xen.org>