path: root/xen/include/xen/sched.h
Commit message (Author, Age, Files, Lines)
...
* relax vCPU pinned checks (Keir Fraser, 2011-01-05, 1 file, -0/+2)
  Both writing certain MSRs and VCPUOP_get_physid also make sense for dynamically (perhaps temporarily) pinned vCPUs. A couple of other MSR writes (MSR_K8_HWCR, MSR_AMD64_NB_CFG, MSR_FAM10H_MMIO_CONF_BASE) would likely also make sense to restrict with an is_pinned() check, and possibly some MSR reads as well.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Use bool_t for various boolean variables (Keir Fraser, 2010-12-24, 1 file, -1/+1)
  ... decreasing cache footprint. As a prerequisite, cmdline_parse() had to be made a little more flexible. Also remove a few variables altogether, and adjust section annotations for several others.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
  Signed-off-by: Keir Fraser <keir@xen.org>
* Move IDLE_DOMAIN_ID defn to public header, and change DOMID_INVALID to fix clash. (Keir Fraser, 2010-12-09, 1 file, -2/+1)
  Signed-off-by: Keir Fraser <keir@xen.org>
* rcu_lock(current->domain) does not need to disable preemption. (Keir Fraser, 2010-11-18, 1 file, -3/+5)
  If the guest sleeps in hypervisor context, it should not be destroyed until execution reaches a safe point (i.e., guest context). This is not implemented yet. :-) But the next patch will rely on it, to allow an HVM guest to execute hypercalls that indirectly invoke __hvm_copy() within an rcu_lock_current_domain() region.
  Signed-off-by: Keir Fraser <keir@xen.org>
* Wait queues, allowing conditional sleep in hypervisor context. (Keir Fraser, 2010-11-17, 1 file, -0/+4)
  Signed-off-by: Keir Fraser <keir@xen.org>
* Make multicall state per-vcpu rather than per-cpu (Keir Fraser, 2010-11-16, 1 file, -0/+4)
  This is a prerequisite for allowing guest descheduling within a hypercall.
  Signed-off-by: Keir Fraser <keir@xen.org>
* mem_event: Clean up and remove over-sized paused_vcpus[] array. (Keir Fraser, 2010-09-15, 1 file, -9/+3)
  This cuts the size of the domain structure by around 30kB! It is now a little over a page in size.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* numa: Attempt more efficient NUMA allocation in hypervisor by default. (Keir Fraser, 2010-08-04, 1 file, -0/+9)
  1. Try to allocate from nodes containing CPUs on which the guest can be scheduled.
  2. Remember which node we allocated from last, and round-robin allocations among the above-mentioned nodes.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
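The two-step NUMA policy in the entry above can be sketched in C as follows. This is a minimal illustration, not Xen's allocator: the `struct dom_numa` type, its `node_affinity` bitmask (one bit per node that holds a CPU the guest may run on) and the `next_alloc_node()` helper are hypothetical names, and the sketch assumes at most 64 nodes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-domain state; not the real struct domain fields. */
struct dom_numa {
    uint64_t node_affinity;       /* bit n set => guest may run on a CPU in node n */
    unsigned int last_alloc_node; /* node used for the previous allocation */
};

/*
 * Step 1: only consider nodes in node_affinity.
 * Step 2: start searching just after last_alloc_node, wrapping around,
 * so successive allocations round-robin among the eligible nodes.
 * Returns the chosen node, or -1 if no node is eligible.
 */
static int next_alloc_node(struct dom_numa *d, unsigned int nr_nodes)
{
    assert(nr_nodes > 0 && nr_nodes <= 64);
    for (unsigned int i = 1; i <= nr_nodes; i++) {
        unsigned int node = (d->last_alloc_node + i) % nr_nodes;
        if (d->node_affinity & (UINT64_C(1) << node)) {
            d->last_alloc_node = node;
            return (int)node;
        }
    }
    return -1;
}
```

With eligible nodes 0 and 2 on a four-node machine, repeated calls alternate between node 2 and node 0.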
* xen: use s_time_t for periodic timer deadlines. (Keir Fraser, 2010-07-08, 1 file, -2/+2)
  Otherwise vcpu_periodic_timer_work() can think the next timer is in the future (and re-issue it unchanged) while timer_softirq_action() thinks it's in the past (and fires it immediately), leading to livelock.
  Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
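One plausible way two code paths can disagree about the same deadline, as in the entry above, is a signed/unsigned mismatch. The sketch below assumes only that s_time_t is a signed 64-bit nanosecond count. The two helper functions are hypothetical stand-ins, not the actual Xen functions named in the commit. A deadline that has gone "negative" looks huge, and therefore in the future, when held in an unsigned variable, but is correctly in the past when compared as signed time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_t;   /* signed 64-bit system time in nanoseconds */

/* Hypothetical path A: deadline held in an unsigned 64-bit variable. */
static bool in_future_unsigned(uint64_t deadline, s_time_t now)
{
    assert(now >= 0);                 /* system time counts up from boot */
    return deadline > (uint64_t)now;  /* a wrapped "negative" value looks huge */
}

/* Hypothetical path B: deadline compared as signed system time. */
static bool in_past_signed(s_time_t deadline, s_time_t now)
{
    return deadline <= now;
}
```

If one path re-arms the timer unchanged because the deadline is "in the future" while the other fires it because it is "in the past", neither makes progress. Using s_time_t on both sides removes the disagreement.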
* Fix sched_adjust_global() and clean up surrounding code. (Keir Fraser, 2010-06-17, 1 file, -3/+4)
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* x86: IRQ affinity should track vCPU affinity (Keir Fraser, 2010-06-17, 1 file, -1/+8)
  With IRQs bound to the CPU on which the binding vCPU currently runs, quite a bit of extra cross-CPU traffic can result as soon as that vCPU moves to a different pCPU. Likewise, when a domain re-binds an event channel associated with a pIRQ, that IRQ's affinity should also be adjusted. The open issue is how to break ties for interrupts shared by multiple domains; currently, the last request (at any point in time) is honored.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* rcu: Update all rcu_read_lock() users to implement a dummy RCU read lock. (Keir Fraser, 2010-06-10, 1 file, -4/+4)
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Watchdog timers for domains (Keir Fraser, 2010-06-04, 1 file, -1/+10)
  Each domain is allowed to set, reset and disable its timers; when any timer runs out, the domain is killed.
  Patch from Christian Limpach <Christian.Limpach@citrix.com>
  Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
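The watchdog semantics in the entry above (set, reset, disable, and kill the domain when any timer runs out) can be modelled with a small per-domain table of deadlines. This is a hedged sketch only. `struct watchdog_domain`, `NR_WATCHDOGS`, `watchdog_set()` and `watchdog_check()` are hypothetical names, and the periodic polling check is a simplification. A hypervisor implementation would more likely arm one timer per watchdog with an expiry callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_WATCHDOGS 2   /* hypothetical per-domain limit */

struct watchdog_domain {
    int64_t expiry[NR_WATCHDOGS];  /* absolute deadline in ns; 0 = disabled */
    bool killed;
};

/* Set or reset watchdog id to fire timeout_ns after now; a timeout of 0 disables it. */
static void watchdog_set(struct watchdog_domain *d, unsigned int id,
                         int64_t now, int64_t timeout_ns)
{
    assert(id < NR_WATCHDOGS && now >= 0 && timeout_ns >= 0);
    d->expiry[id] = timeout_ns ? now + timeout_ns : 0;
}

/* Polled periodically here: if any enabled watchdog has run out, kill the domain. */
static void watchdog_check(struct watchdog_domain *d, int64_t now)
{
    for (unsigned int i = 0; i < NR_WATCHDOGS; i++)
        if (d->expiry[i] != 0 && now >= d->expiry[i])
            d->killed = true;
}
```

A guest that keeps calling watchdog_set() before the deadline (the "reset" case) is never killed. A guest that stops doing so is killed on the first check after expiry.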
* notify_via_xen_event_channel() takes explicit domain parameter. (Keir Fraser, 2010-06-04, 1 file, -2/+0)
  Also remove pointless tasklet from mem_event notify path.
  Signed-off-by: John Byrne <john.l.byrne@hp.com>
* cpupool: Clean up unused prototype and duplicate trace call. (Keir Fraser, 2010-05-26, 1 file, -1/+0)
  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
* cpupool: Fix CPU hotplug after recent changes. (Keir Fraser, 2010-05-17, 1 file, -2/+2)
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* x86: Implement cpu hotplug notifiers. Use them. (Keir Fraser, 2010-05-14, 1 file, -3/+0)
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* tasklet: Improve scheduler interaction. (Keir Fraser, 2010-05-11, 1 file, -2/+4)
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* scheduler: Add a global parameter adjustment to the switchable scheduler interface (Keir Fraser, 2010-05-04, 1 file, -0/+1)
  ...along with a new sysctl to call it directly. This is in order to support DornerWorks' new ARINC653 scheduler.
  Based on code from Josh Holtrop and Kathy Hadley at DornerWorks, Ltd
  Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
  Signed-off-by: Keir Fraser <keir.fraser@eu.citrix.com>
* cpupool: Control interface should be a sysctl rather than a domctl. (Keir Fraser, 2010-05-04, 1 file, -1/+2)
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* cpupools [1/6]: hypervisor changes (Keir Fraser, 2010-04-21, 1 file, -1/+21)
  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
* stopmachine: Implement using tasklets rather than a softirq. (Keir Fraser, 2010-04-19, 1 file, -0/+3)
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Implement tasklets as running in VCPU context (specifically, idle-VCPU context) (Keir Fraser, 2010-04-19, 1 file, -0/+7)
  ...rather than in softirq context. This is expected to avoid a lot of subtle deadlocks relating to the fact that softirqs can interrupt a scheduled vcpu.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Move tasklet implementation into its own source files. (Keir Fraser, 2010-04-19, 1 file, -0/+1)
  This is preparation for implementing tasklets in vcpu context rather than softirq context. There is no change to the implementation of tasklets in this patch.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Improvements and bug fixes to continue_hypercall_on_cpu(). (Keir Fraser, 2010-04-14, 1 file, -8/+0)
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Architecture-independent, and tasklet-based, continue_hypercall_on_cpu(). (Keir Fraser, 2010-04-14, 1 file, -6/+4)
  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* cpuidle: do not enter deep C state if there is urgent VCPU (Keir Fraser, 2010-02-16, 1 file, -0/+2)
  When a VCPU is polling on an event channel, it usually has an urgent task running (e.g. a spin_lock); in this case it is better for the cpuidle driver not to enter a deep C state. This patch fixes an issue where a SLES 11 SP1 domain0 hangs on boxes with a large number of CPUs (>= 64).
  Signed-off-by: Yu Ke <ke.yu@intel.com>
  Signed-off-by: Tian Kevin <kevin.tian@intel.com>
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
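A simple way to express the cpuidle rule in the entry above is a per-pCPU count of "urgent" vCPUs (those polling on an event channel), which the idle driver consults before choosing a C state. The names below (`urgent_count`, `vcpu_mark_urgent()`, `max_cstate()`) and the fixed four-CPU array are hypothetical. The sketch shows the policy only, not Xen's actual bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4   /* hypothetical, for illustration only */

/* Per-pCPU number of vCPUs currently polling on an event channel. */
static unsigned int urgent_count[NR_CPUS_SKETCH];

/* Called when a vCPU on this pCPU starts (urgent=true) or stops polling. */
static void vcpu_mark_urgent(unsigned int cpu, bool urgent)
{
    assert(cpu < NR_CPUS_SKETCH);
    if (urgent) {
        urgent_count[cpu]++;
    } else {
        assert(urgent_count[cpu] > 0);
        urgent_count[cpu]--;
    }
}

/* Deepest C state the idle driver may use on this pCPU (1 = shallow C1). */
static unsigned int max_cstate(unsigned int cpu, unsigned int deepest)
{
    assert(cpu < NR_CPUS_SKETCH);
    return urgent_count[cpu] ? 1 : deepest;
}
```

Capping the C state rather than skipping idle entirely keeps wake-up latency low for the polling vCPU while still saving some power.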
* evtchn: Do not free d->poll_mask until domain is being deallocated. (Keir Fraser, 2009-12-24, 1 file, -2/+3)
  Avoids crash on dereference of poll_mask after domain_kill().
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Domctls defined for all relevant memory sharing operations. (Keir Fraser, 2009-12-17, 1 file, -0/+1)
  Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
* Base domain structure and public interface to support memory events. (Keir Fraser, 2009-12-17, 1 file, -0/+30)
  Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
* x86: fix MCE/NMI injection (Keir Fraser, 2009-12-01, 1 file, -13/+9)
  This attempts to address all the concerns raised in http://lists.xensource.com/archives/html/xen-devel/2009-11/msg01195.html, but I'm nevertheless still not convinced that all aspects of the injection handling really work reliably.
  In particular, while the patch here, on top of the fixes for the problems mentioned in the referenced mail, also adds code to keep send_guest_trap() from injecting multiple events at a time, I don't think this is the right mechanism; it should be possible to handle NMI/MCE nested within each other.
  Another fix on top of the ones for the earlier described problems is that the vCPU affinity restore logic didn't account for software-injected NMIs: these never set cpu_affinity_tmp, but because that most likely differed from cpu_affinity, the affinity would have been restored (to a potentially random value) nevertheless.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Fix nomigrate option implementation so that Xen builds. (Keir Fraser, 2009-10-20, 1 file, -0/+3)
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Per-domain switch to disable oos shadow page tables (Keir Fraser, 2009-10-19, 1 file, -0/+3)
  Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
* Spinlock profiling (enable in build with lock_profile=y) (Keir Fraser, 2009-10-14, 1 file, -0/+2)
  Adds new tool xenlockprof to run from dom0.
  From: Juergen Gross <juergen.gross@ts.fujitsu.com>
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* [IOMMU] dynamic VTd page table for HVM guest (Keir Fraser, 2009-09-03, 1 file, -1/+1)
  This patch makes the HVM guest's VT-d page table dynamic, as is already done for PV guests, so as to avoid the overhead of maintaining the page table until a PCI device is actually assigned to the HVM guest.
  Signed-Off-By: Zhai, Edwin <edwin.zhai@intel.com>
* x86_64: allow more vCPU-s per guest (Keir Fraser, 2009-06-18, 1 file, -2/+8)
  Since the shared info layout is fixed, guests are required to use VCPUOP_register_vcpu_info prior to booting any vCPU beyond the traditional limit of 32.
  MAX_VIRT_CPUS, being an implementation detail of the hypervisor, is no longer exposed in the public headers.
  The tools changes are clearly incomplete (and done only so things would build again), and the current state of the tools (using scalar variables all over the place to represent vCPU bitmaps) very likely doesn't permit booting DomU-s with more than the traditional number of vCPU-s. Testing of the extended functionality was done with Dom0 (96 vCPU-s, as well as 128 vCPU-s out of which the kernel elected, by way of a simple kernel-side patch, to use only some, resulting in a sparse bitmap).
  The ia64 changes are only to make things build, and are build-tested only (and the tools part only as far as the build would go without encountering unrelated problems in the blktap code).
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
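The rule in the entry above can be stated as a small predicate. vCPUs with IDs below the legacy limit of 32 have a vcpu_info slot inside the fixed-layout shared info page. Any vCPU beyond that limit can only be booted after the guest has registered a vcpu_info area for it via VCPUOP_register_vcpu_info. The struct and function below are hypothetical illustrations, not Xen's struct vcpu or its actual check.

```c
#include <assert.h>
#include <stdbool.h>

/* vcpu_info slots embedded in the fixed-layout shared info page. */
#define LEGACY_VCPU_INFO_SLOTS 32

/* Hypothetical per-vCPU state. */
struct vcpu_sketch {
    unsigned int vcpu_id;
    bool vcpu_info_registered;   /* guest issued VCPUOP_register_vcpu_info */
};

/*
 * May this vCPU be brought up? IDs below the legacy limit can fall back to
 * their slot in shared info; higher IDs need a guest-registered vcpu_info.
 */
static bool vcpu_can_boot(const struct vcpu_sketch *v, unsigned int max_vcpus)
{
    assert(v->vcpu_id < max_vcpus);
    if (v->vcpu_id < LEGACY_VCPU_INFO_SLOTS)
        return true;
    return v->vcpu_info_registered;
}
```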
* x86 hvm: move dirty_vram into struct hvm_domain (Keir Fraser, 2009-06-05, 1 file, -3/+0)
  Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
* x86: eliminate hard-coded NR_IRQS (Keir Fraser, 2009-05-27, 1 file, -2/+3)
  ... splitting it into global nr_irqs (determined at boot time) and per-domain nr_pirqs (derived from nr_irqs and a possibly command-line-specified value, which probably should later become a per-domain config setting).
  This has the (desirable, imo) side effect of reducing the size of struct hvm_irq_dpci from requiring an order-3 page to order-2 (on x86-64), which nevertheless is still too large.
  However, there is now a variable-size bit array on the stack in pt_irq_time_out(). While for the moment this is probably okay, it certainly doesn't look nice. Replacing it with a static (pre-)allocation also seems less than ideal, though, because that would require at least min(d->nr_pirqs, NR_VECTORS) bit arrays of d->nr_pirqs bits, since this bit array is used outside of the serialized code region in that function, and keeping the domain's event lock acquired across pirq_guest_eoi() doesn't look like a good idea either.
  The IRQ- and vector-indexed arrays hanging off struct hvm_irq_dpci could in fact be changed further to dynamically use the smaller of the two ranges for indexing, since there are other assumptions about a one-to-one relationship between IRQs and vectors here and elsewhere.
  Additionally, it seems to me that struct hvm_mirq_dpci_mapping's digl_list and gmsi fields could really be overlaid, which would yield significant savings since this structure is always instantiated in the form of d->nr_pirqs-dimensioned arrays (as per the above, this could also be the smaller of that and NR_VECTORS).
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Transcendent memory ("tmem") for Xen. (Keir Fraser, 2009-05-26, 1 file, -0/+3)
  Tmem, when called from a tmem-capable (paravirtualized) guest, makes use of otherwise unutilized ("fallow") memory to create and manage pools of pages that can be accessed from the guest either as "ephemeral" pages or as "persistent" pages. In either case, the pages are not directly addressable by the guest, only copied to and fro via the tmem interface. Ephemeral pages are a nice place for a guest to put recently evicted clean pages that it might need again; these pages can be reclaimed synchronously by Xen for other guests or other uses. Persistent pages are a nice place for a guest to put "swap" pages to avoid sending them to disk. These pages retain data as long as the guest lives, but count against the guest memory allocation.
  Tmem pages may optionally be compressed and, in certain cases, can be shared between guests. Tmem also handles concurrency nicely and provides limited QoS settings to combat malicious DoS attempts. Save/restore and live migration support is not yet provided.
  Tmem is primarily targeted for an x86 64-bit hypervisor. On a 32-bit x86 hypervisor, it has limited functionality and testing due to limitations of the xen heap. Nearly all of tmem is architecture-independent; three routines remain to be ported to ia64, after which it should work on that architecture too. It is also structured to be portable to non-Xen environments.
  Tmem defaults off (for now) and must be enabled with a "tmem" xen boot option (and does nothing unless a tmem-capable guest is running). The "tmem_compress" boot option enables compression, which takes about 10x more CPU but approximately doubles the number of pages that can be stored.
  Tmem can be controlled via several "xm" commands, and many interesting tmem statistics can be obtained. A README and internal specification will follow, but lots of useful prose about tmem, as well as Linux patches, can be found at http://oss.oracle.com/projects/tmem .
  Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
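The ephemeral/persistent distinction described in the tmem entry above can be modelled with a toy pool in which the guest only ever copies pages in and out, and only ephemeral pools may have pages reclaimed by the hypervisor. Everything here is hypothetical and heavily simplified: the tiny page size, the fixed slot count, and the names `pool_put()`, `pool_get()` and `pool_reclaim()`. Real tmem pools are keyed by object and index, can refuse puts, and may compress or share pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SZ    64   /* tiny "page" for illustration */
#define POOL_SLOTS 4

enum pool_kind { POOL_EPHEMERAL, POOL_PERSISTENT };

struct tmem_pool_sketch {
    enum pool_kind kind;
    bool used[POOL_SLOTS];
    unsigned char data[POOL_SLOTS][PAGE_SZ];
};

/* Copy a page into the pool; the guest never maps pool memory directly. */
static void pool_put(struct tmem_pool_sketch *p, unsigned int idx,
                     const unsigned char *src)
{
    assert(idx < POOL_SLOTS);
    memcpy(p->data[idx], src, PAGE_SZ);
    p->used[idx] = true;
}

/* Copy a page back out; fails if never stored or already reclaimed. */
static bool pool_get(const struct tmem_pool_sketch *p, unsigned int idx,
                     unsigned char *dst)
{
    assert(idx < POOL_SLOTS);
    if (!p->used[idx])
        return false;
    memcpy(dst, p->data[idx], PAGE_SZ);
    return true;
}

/* Hypervisor reclaims memory: only ephemeral pools may lose pages. */
static unsigned int pool_reclaim(struct tmem_pool_sketch *p)
{
    unsigned int freed = 0;
    if (p->kind != POOL_EPHEMERAL)
        return 0;
    for (unsigned int i = 0; i < POOL_SLOTS; i++) {
        if (p->used[i]) {
            p->used[i] = false;
            freed++;
        }
    }
    return freed;
}
```

A guest must therefore treat a failed get from an ephemeral pool as a normal cache miss, whereas a persistent pool behaves like swap that is guaranteed to hold the data while the guest lives.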
* xenpm: Set scheduler vcpu_migration_delay by xenpm (Keir Fraser, 2009-04-06, 1 file, -0/+3)
  Signed-off-by: Yu Ke <ke.yu@intel.com>
  Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
* cpuidle: suspend/resume scheduler tick timer during cpu idle state entry/exit (Keir Fraser, 2009-03-31, 1 file, -0/+2)
  cpuidle can collaborate with the scheduler to reduce unnecessary timer interrupts. For example, the credit scheduler's accounting timer doesn't need to be active at idle time, so it can be stopped at cpuidle entry and resumed at cpuidle exit. This patch implements this by adding two ops to the scheduler, tick_suspend and tick_resume, and implementing them for the credit scheduler.
  Signed-off-by: Yu Ke <ke.yu@intel.com>
  Signed-off-by: Tian Kevin <kevin.tian@intel.com>
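The hook mechanism in the entry above can be sketched as an ops table with two optional function pointers that the idle path calls around C-state entry and exit. The types and functions below (`struct sched_cpu`, `struct scheduler_ops_sketch`, `idle_entry()`, `idle_exit()`) are hypothetical. Only the hook names tick_suspend and tick_resume come from the commit, and the "credit" callbacks just toggle a flag standing in for the accounting timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-pCPU scheduler state. */
struct sched_cpu {
    bool acct_timer_active;   /* is the accounting tick armed? */
};

/* Only the two hooks added by the commit are modelled; both are optional. */
struct scheduler_ops_sketch {
    void (*tick_suspend)(struct sched_cpu *);
    void (*tick_resume)(struct sched_cpu *);
};

static void credit_tick_suspend(struct sched_cpu *c) { c->acct_timer_active = false; }
static void credit_tick_resume(struct sched_cpu *c)  { c->acct_timer_active = true; }

static const struct scheduler_ops_sketch credit_ops = {
    .tick_suspend = credit_tick_suspend,
    .tick_resume  = credit_tick_resume,
};

/* Called by the idle driver just before entering a C state. */
static void idle_entry(const struct scheduler_ops_sketch *ops, struct sched_cpu *c)
{
    assert(ops != NULL && c != NULL);
    if (ops->tick_suspend)
        ops->tick_suspend(c);
}

/* Called by the idle driver after waking from the C state. */
static void idle_exit(const struct scheduler_ops_sketch *ops, struct sched_cpu *c)
{
    assert(ops != NULL && c != NULL);
    if (ops->tick_resume)
        ops->tick_resume(c);
}
```

Keeping the hooks optional lets schedulers without a periodic tick simply leave them NULL.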
* xenpm: Add a small scheduler knob "sched_smt_power_savings" (Keir Fraser, 2009-03-20, 1 file, -0/+2)
  The current scheduler only cares about performance, and thus always picks a pCPU from the most idle package. This knob provides another option, picking a pCPU from the least idle package, for users who want a balance between performance and power.
  Signed-off-by: Yu Ke <ke.yu@intel.com>
  Signed-off-by: Tian Kevin <kevin.tian@intel.com>
* Fix MAX_EVTCHNS() definition. (Keir Fraser, 2009-03-17, 1 file, -1/+1)
  Pointed out by Jan Beulich.
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* Improve vcpu_migration_delay handling. (Keir Fraser, 2009-03-11, 1 file, -0/+3)
  Signed-off-by: Xiaowei Yang <xiaowei.yang@intel.com>
* x86: Fix event-channel access for 32-bit HVM guests. (Keir Fraser, 2009-03-03, 1 file, -8/+3)
  Based on a patch by Joe Jin <joe.jin@oracle.com>
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* txt: Xen per-domain S3 integrity config (Keir Fraser, 2009-03-03, 1 file, -6/+10)
  This patch adds a per-domain flag to specify whether a domain will be S3 integrity protected when Xen is launched using tboot/TXT. The tools now support an integer domain configuration parameter called 's3_integrity', which defaults to 1, to enable S3 integrity protection. The arch_domain structure has been extended with an 's3_integrity' field that represents this setting.
  Signed-off-by: Shane Wang <shane.wang@intel.com>
  Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
* ia64: fix compilation error (Keir Fraser, 2009-02-03, 1 file, -0/+1)
  This patch fixes the following compilation error. Since struct page_list_head is defined in mm.h, sched.h needs mm.h. Other circular inclusions are sorted out.
  > In file included from xen/include/asm-ia64/linux-xen/asm/smp.h:50,
  >                  from xen/include/linux/smp.h:5,
  >                  from xen/include/asm-ia64/linux/topology.h:33,
  >                  from xen/include/asm-ia64/linux-xen/linux/gfp.h:6,
  >                  from xen/include/asm/mm.h:11,
  >                  from xen/include/xen/mm.h:90,
  >                  from viosapic.c:35:
  > xen/include/xen/sched.h:174: error: field page_list has incomplete type
  > xen/include/xen/sched.h:175: error: field xenpage_list has incomplete type
  Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
* x86-64: use MFNs for linking together pages on lists (Keir Fraser, 2009-01-30, 1 file, -2/+2)
  Unless more than 16TB are ever going to be supported in Xen, this will allow reducing the linked list entries in struct page_info from 16 to 8 bytes.
  This doesn't modify struct shadow_page_info yet, so in order to meet the constraints of that 'mirror' structure the list entry gets artificially forced to be 16 bytes in size. That workaround will be removed in a subsequent patch.
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
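The arithmetic behind "16 to 8 bytes" and the 16TB bound in the entry above can be checked directly. Two 64-bit pointers take 16 bytes, while two 32-bit machine frame numbers take 8. With 4 KiB pages, a 32-bit MFN can name 2^32 frames, that is 2^32 × 2^12 = 2^44 bytes = 16 TiB of machine memory. The struct layouts below are hypothetical illustrations, not the actual definitions in Xen's struct page_info.

```c
#include <assert.h>
#include <stdint.h>

/* Pointer-linked list entry: two pointers (16 bytes on x86-64). */
struct ptr_list_entry {
    struct ptr_list_entry *next, *prev;
};

/* Hypothetical MFN-linked entry: two 32-bit machine frame numbers. */
struct mfn_list_entry {
    uint32_t next_mfn, prev_mfn;
};

static_assert(sizeof(struct mfn_list_entry) == 8, "MFN links fit in 8 bytes");

#define PAGE_SHIFT_SKETCH 12   /* 4 KiB pages */

/* Bytes of machine memory addressable with 32-bit MFNs: 2^44 = 16 TiB. */
static uint64_t max_addressable_bytes(void)
{
    return (UINT64_C(1) << 32) << PAGE_SHIFT_SKETCH;
}
```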
* Fix ia64 build. (Keir Fraser, 2009-01-23, 1 file, -7/+0)
  Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
* x86: Support booting a bzImage format domain 0 kernel. (Keir Fraser, 2009-01-22, 1 file, -1/+2)
  This requires a bzImage v2.08 or later kernel. xen/common/inflate.c is taken unmodified from Linux v2.6.28.
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
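The "v2.08 or later" requirement in the entry above refers to the Linux x86 boot protocol version stored in the kernel's setup header. According to the Linux boot protocol documentation, the header carries the magic "HdrS" at offset 0x202 and a little-endian 16-bit protocol version at offset 0x206. Version 2.08 is the one that introduced the payload_offset/payload_length fields for locating the compressed payload. The check below is a hedged sketch of such a version test. `bzimage_usable()` is a hypothetical helper, not Xen's domain-builder code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Setup-header offsets from the Linux x86 boot protocol. */
#define HDR_MAGIC_OFF   0x202   /* "HdrS" */
#define HDR_VERSION_OFF 0x206   /* protocol version, little-endian u16 */

/* True if img looks like a bzImage using boot protocol 2.08 or later. */
static bool bzimage_usable(const uint8_t *img, size_t len)
{
    assert(img != NULL || len == 0);
    if (len < HDR_VERSION_OFF + 2)
        return false;
    if (memcmp(img + HDR_MAGIC_OFF, "HdrS", 4) != 0)
        return false;
    uint16_t version = (uint16_t)(img[HDR_VERSION_OFF] |
                                  (img[HDR_VERSION_OFF + 1] << 8));
    return version >= 0x0208;
}
```

A loader would run a check like this before trying to locate and decompress the payload, and would reject older bzImages with a clear error.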