path: root/xen/include/xen/sched.h
Commit message | Author | Age | Files | Lines
* xen/evtchn: Fix build on ARM | Julien Grall | 2013-10-15 | 1 | -0/+1
    The recent event channel changes introduced by commit a77eb86 and before... break the compilation on Xen ARM. This commit adds missing includes in common/event_fifo.c and include/xen/sched.h.

    Signed-off-by: Julien Grall <julien.grall@linaro.org>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>
* Add DOMCTL to limit the number of event channels a domain may use | David Vrabel | 2013-10-14 | 1 | -0/+1
    Add XEN_DOMCTL_set_max_evtchn which may be used during domain creation to set the maximum event channel port a domain may use. This may be used to limit the amount of Xen resources (global mapping space and xenheap) that a domain may use for event channels.

    A domain that does not have a limit set may use all the event channels supported by the event channel ABI in use.

    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
    Acked-by: Keir Fraser <keir@xen.org>
* evtchn: add FIFO-based event channel hypercalls and port ops | David Vrabel | 2013-10-14 | 1 | -1/+5
    Add the implementation for the FIFO-based event channel ABI. This covers the new hypercall sub-ops (EVTCHNOP_init_control, EVTCHNOP_expand_array) and the required evtchn_ops (set_pending, unmask, etc.).

    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
* evtchn: add FIFO-based event channel ABI | David Vrabel | 2013-10-14 | 1 | -1/+1
    Add the event channel hypercall sub-ops and the definitions for the shared data structures for the FIFO-based event channel ABI.

    The design document for this new ABI is available here:

        http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf

    In summary, events are reported using a per-domain shared event array of event words. Each event word has PENDING, LINKED and MASKED bits and a LINK field for pointing to the next event in the event queue. There are 16 event queues (with different priorities) per-VCPU.

    Key advantages of this new ABI include:

    - Support for over 100,000 events (2^17).
    - 16 different event priorities.
    - Improved fairness in event latency through the use of FIFOs.

    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
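The event word described in the commit above can be pictured as a 32-bit value with three flag bits and a LINK field. The following is a minimal sketch, not the ABI definition: the bit positions (31, 30, 29) and the `ew_*` helper names are assumptions made for illustration, and only the 17-bit LINK width is taken from the 2^17 port count stated in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of one 32-bit event word (illustrative, not authoritative). */
#define EW_PENDING_BIT 31u
#define EW_MASKED_BIT  30u
#define EW_LINKED_BIT  29u
#define EW_LINK_BITS   17u                       /* 2^17 ports, per the commit */
#define EW_LINK_MASK   ((1u << EW_LINK_BITS) - 1u)

/* Test a single flag bit. */
static inline bool ew_test(uint32_t word, unsigned int bit)
{
    return (word >> bit) & 1u;
}

/* Return the word with one flag bit set. */
static inline uint32_t ew_set(uint32_t word, unsigned int bit)
{
    return word | (1u << bit);
}

/* Port number of the next event in the queue (meaningful when LINKED is set). */
static inline uint32_t ew_link(uint32_t word)
{
    return word & EW_LINK_MASK;
}

/* Store a new LINK value while leaving the flag bits untouched. */
static inline uint32_t ew_set_link(uint32_t word, uint32_t next_port)
{
    return (word & ~EW_LINK_MASK) | (next_port & EW_LINK_MASK);
}

/* An event is deliverable when it is pending and not masked. */
static inline bool ew_deliverable(uint32_t word)
{
    return ew_test(word, EW_PENDING_BIT) && !ew_test(word, EW_MASKED_BIT);
}
```

Keeping the flags and the link in one word lets a single atomic operation update both in a real implementation; the sketch above uses plain arithmetic and does not model atomicity.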
* evtchn: allow many more evtchn objects to be allocated per domain | David Vrabel | 2013-10-14 | 1 | -3/+18
    Expand the number of event channels that can be supported internally by altering how struct evtchn's are allocated.

    The objects are indexed using a two level scheme of groups and buckets (instead of only buckets). Each group is a page of bucket pointers. Each bucket is a page-sized array of struct evtchn's.

    The optimal number of evtchns per bucket is calculated at compile time.

    If XSM is not enabled, struct evtchn is 16 bytes and each bucket contains 256, requiring only 1 group of 512 pointers for 2^17 (131,072) event channels. With XSM enabled, struct evtchn is 24 bytes, each bucket contains 128 and 2 groups are required.

    For the common case of a domain with only a few event channels, instead of requiring an additional allocation for the group page, the first bucket is indexed directly.

    As a consequence of this, struct domain shrinks by at least 232 bytes as 32 bucket pointers are replaced with 1 bucket pointer and (at most) 2 group pointers.

    [ Based on a patch from Wei Liu with improvements from Malcolm Crossley. ]

    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
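The bucket and group arithmetic in the commit above can be checked with a short sketch. It assumes 4096-byte pages, 8-byte pointers, and that the per-bucket count is rounded down to a power of two (inferred from the 24-byte case giving 128 rather than 170). The names `evtchns_per_bucket`, `groups_needed` and `locate_port` are hypothetical, and the computation runs at run time here, whereas the commit says the real value is calculated at compile time.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES  4096u                    /* assumed page size */
#define MAX_EVTCHNS      (1u << 17)               /* 131,072 ports */
#define PTRS_PER_GROUP   (PAGE_SIZE_BYTES / 8u)   /* assumes 8-byte pointers: 512 */

/* Largest power of two that is <= n (returns 1 for n == 0 or 1). */
static inline unsigned int round_down_pow2(unsigned int n)
{
    unsigned int p = 1;
    while (p * 2 <= n)
        p *= 2;
    return p;
}

/* Number of struct evtchn that fit in one page-sized bucket. */
static inline unsigned int evtchns_per_bucket(unsigned int evtchn_size)
{
    return round_down_pow2(PAGE_SIZE_BYTES / evtchn_size);
}

/* Number of group pages needed to reach every bucket. */
static inline unsigned int groups_needed(unsigned int evtchn_size)
{
    unsigned int buckets = MAX_EVTCHNS / evtchns_per_bucket(evtchn_size);
    return (buckets + PTRS_PER_GROUP - 1) / PTRS_PER_GROUP;
}

struct port_index {
    unsigned int group;   /* which group page */
    unsigned int bucket;  /* which pointer within that group */
    unsigned int slot;    /* which struct evtchn within the bucket */
};

/* Split a port number into its group, bucket and slot coordinates. */
static inline struct port_index locate_port(unsigned int port,
                                            unsigned int evtchn_size)
{
    unsigned int per_bucket = evtchns_per_bucket(evtchn_size);
    unsigned int bucket_no = port / per_bucket;
    struct port_index idx = {
        .group  = bucket_no / PTRS_PER_GROUP,
        .bucket = bucket_no % PTRS_PER_GROUP,
        .slot   = port % per_bucket,
    };
    return idx;
}
```

With a 16-byte struct this yields 256 entries per bucket and one group; with a 24-byte struct it yields 128 entries per bucket and two groups, matching the figures quoted in the message.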
* evtchn: use a per-domain variable for the max number of event channels | David Vrabel | 2013-10-14 | 1 | -1/+1
    Use d->max_evtchns instead of the MAX_EVTCHNS(d) macro. This avoids having to repeatedly check the ABI type.

    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
* evtchn: refactor low-level event channel port ops | David Vrabel | 2013-10-14 | 1 | -0/+4
    Use functions for the low-level event channel port operations (set/clear pending, unmask, is_pending and is_masked).

    Group these functions into a struct evtchn_port_op so they can be replaced by alternate implementations (for different ABIs) on a per-domain basis.

    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
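The idea of grouping port operations into a per-domain table of function pointers can be sketched as follows. This is an illustration under stated assumptions, not the Xen code: `toy_domain`, `toy_port_ops`, the 64-port bitmap layout and the `port_*` wrappers are all hypothetical names, and only the list of operations comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_domain;

/* One function pointer per low-level port operation named in the commit. */
struct toy_port_ops {
    void (*set_pending)(struct toy_domain *d, unsigned int port);
    void (*clear_pending)(struct toy_domain *d, unsigned int port);
    void (*unmask)(struct toy_domain *d, unsigned int port);
    bool (*is_pending)(const struct toy_domain *d, unsigned int port);
    bool (*is_masked)(const struct toy_domain *d, unsigned int port);
};

/* Toy domain: a 64-port bitmap layout stands in for one ABI. */
struct toy_domain {
    uint64_t pending;
    uint64_t masked;
    const struct toy_port_ops *port_ops;   /* selected per domain */
};

static void bitmap_set_pending(struct toy_domain *d, unsigned int port)
{
    d->pending |= UINT64_C(1) << port;
}

static void bitmap_clear_pending(struct toy_domain *d, unsigned int port)
{
    d->pending &= ~(UINT64_C(1) << port);
}

static void bitmap_unmask(struct toy_domain *d, unsigned int port)
{
    d->masked &= ~(UINT64_C(1) << port);
}

static bool bitmap_is_pending(const struct toy_domain *d, unsigned int port)
{
    return (d->pending >> port) & 1u;
}

static bool bitmap_is_masked(const struct toy_domain *d, unsigned int port)
{
    return (d->masked >> port) & 1u;
}

static const struct toy_port_ops bitmap_port_ops = {
    .set_pending   = bitmap_set_pending,
    .clear_pending = bitmap_clear_pending,
    .unmask        = bitmap_unmask,
    .is_pending    = bitmap_is_pending,
    .is_masked     = bitmap_is_masked,
};

/* Generic code calls through the table, so another ABI only swaps the pointer. */
static inline void port_set_pending(struct toy_domain *d, unsigned int port)
{
    d->port_ops->set_pending(d, port);
}

static inline void port_clear_pending(struct toy_domain *d, unsigned int port)
{
    d->port_ops->clear_pending(d, port);
}

static inline void port_unmask(struct toy_domain *d, unsigned int port)
{
    d->port_ops->unmask(d, port);
}

static inline bool port_is_pending(const struct toy_domain *d, unsigned int port)
{
    return d->port_ops->is_pending(d, port);
}

static inline bool port_is_masked(const struct toy_domain *d, unsigned int port)
{
    return d->port_ops->is_masked(d, port);
}
```

A second ABI would supply its own table with the same five entries, and the callers of the `port_*` wrappers would not need to change.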
* sched: Correct function prototypes | Andrew Cooper | 2013-10-14 | 1 | -3/+3
    struct vcpu pointers are traditionally v rather than d.

    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: Keir Fraser <keir@xen.org>
* x86: Improve information from domain_crash_synchronous | Andrew Cooper | 2013-10-04 | 1 | -0/+7
    As it currently stands, the string "domain_crash_sync called from entry.S" is not helpful at identifying why the domain was crashed, and a debug build of Xen doesn't help the matter.

    This patch improves the information printed, by pointing to where the crash decision was made.

    Specific improvements include:
      * Moving the ascii string "domain_crash_sync called from entry.S\n" away from some semi-hot code cache lines.
      * Moving the printk into C code (especially as this_cpu() is miserable to use in assembly code).
      * Undoing the previous confusing situation of having domain_crash_synchronous() as a macro in C code, yet a global symbol in assembly code.

    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: Keir Fraser <keir@xen.org>
* console: buffer and show origin of guest PV writes | Daniel De Graaf | 2013-09-10 | 1 | -0/+6
    Guests other than domain 0 using the console output have previously been controlled by the VERBOSE #define, but with no designation of which guest's output was on the console. This patch converts the HVM output buffering to be used by all domains except the hardware domain (dom0): stripping non-printable characters, line buffering the output, and prefixing it with the domain ID. This is especially useful for debugging stub domains during early boot.

    Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
    Acked-by: Keir Fraser <keir@xen.org>
* xen: move VCPUOP_register_vcpu_info to common code | Stefano Stabellini | 2013-05-08 | 1 | -0/+3
    Move the implementation of VCPUOP_register_vcpu_info from x86 specific to common code.

    Move vcpu_info_mfn from an arch specific vcpu sub-field to the common vcpu struct.
    Move the initialization of vcpu_info_mfn to common code.

    Move unmap_vcpu_info and the call to unmap_vcpu_info at domain destruction time to common code.

    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>
    Acked-by: Keir Fraser <keir@xen.org>
* rename IS_PRIV to is_hardware_domain | Daniel De Graaf | 2013-05-07 | 1 | -2/+10
    Since the remaining uses of IS_PRIV are actually concerned with the domain having control of the hardware (i.e. being the initial domain), clarify this by renaming IS_PRIV to is_hardware_domain. This also removes IS_PRIV_FOR since the only remaining user was xsm/dummy.h.

    Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
    Acked-by: George Dunlap <george.dunlap@eu.citrix.com> (for 4.3 release)
    Acked-by: Keir Fraser <keir@xen.org>
* common: remove rcu_lock_target_domain_by_id | Daniel De Graaf | 2013-05-07 | 1 | -14/+0
    This function (and rcu_lock_remote_target_domain_by_id) has no remaining users, having been replaced with XSM hooks and the other rcu_lock_* functions. Remove it.

    Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
    Acked-by: George Dunlap <george.dunlap@eu.citrix.com> (for 4.3 release)
    Acked-by: Keir Fraser <keir@xen.org>
* x86: make vcpu_reset() preemptible | Jan Beulich | 2013-05-02 | 1 | -0/+3
    ... as dropping the old page tables may take significant amounts of time.

    This is part of CVE-2013-1918 / XSA-45.

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Tim Deegan <tim@xen.org>
* xen: introduce vcpu_block | Stefano Stabellini | 2013-04-30 | 1 | -0/+1
    Rename do_block to vcpu_block.

    Move the call to local_event_delivery_enable out of vcpu_block, to a new static function called vcpu_block_enable_events.

    Use vcpu_block_enable_events instead of do_block throughout schedule.c.

    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Acked-by: Keir Fraser <keir@xen.org>
* xen: allow for explicitly specifying node-affinity | Dario Faggioli | 2013-04-17 | 1 | -1/+8
    Make it possible to pass the node-affinity of a domain to the hypervisor from the upper layers, instead of always being computed automatically.

    Note that this also required generalizing the Flask hooks for setting and getting the affinity, so that they now deal with both vcpu and node affinity.

    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
    Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
    Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
    Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
    Acked-by: Keir Fraser <keir@xen.org>
* x86/S3: Restore broken vcpu affinity on resume | Ben Guthro | 2013-04-02 | 1 | -0/+6
    When in SYS_STATE_suspend, and going through the cpu_disable_scheduler path, save a copy of the current cpu affinity, and mark a flag to restore it later.

    Later, in the resume process, when enabling nonboot cpus, restore these affinities.

    Signed-off-by: Ben Guthro <benjamin.guthro@citrix.com>
    Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
    Acked-by: Keir Fraser <keir@xen.org>
* xen: correct BITS_PER_EVTCHN_WORD on arm | Ian Campbell | 2013-03-12 | 1 | -2/+2
    This is always 64-bit on ARM, not BITS_PER_LONG

    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Acked-by: Keir Fraser <keir@xen.org>
* mmu: Introduce XENMEM_claim_pages (subop of memory ops) | Dan Magenheimer | 2013-03-11 | 1 | -0/+1
    When guests' memory consumption is volatile (multiple guests ballooning up/down) we are presented with the problem of being able to determine exactly how much memory there is for allocation of new guests without negatively impacting existing guests. Note that the existing models (xapi, xend) drive the memory consumption from the tool-stack and assume that the guest will eventually hit the memory target. Other models, such as the dynamic memory utilized by tmem, do this differently - the guest drives the memory consumption (up to the d->max_pages ceiling). With the dynamic memory model, the guest frequently can balloon up and down as it sees fit. This presents the problem to the toolstack that it does not know atomically how much free memory there is (as the information gets stale the moment the d->tot_pages information is provided to the tool-stack), and hence starting a guest can fail during the memory creation process, especially if the process is done in parallel. In a nutshell, what we need is an atomic value of all domains' tot_pages during the allocation of guests. Naturally holding a lock for such a long time is unacceptable. Hence the goal of this hypercall is to attempt to atomically and very quickly determine if there are sufficient pages available in the system and, if so, "set aside" that quantity of pages for future allocations by that domain. Unlike an existing hypercall such as increase_reservation or populate_physmap, specific physical pageframes are not assigned to the domain because this cannot be done sufficiently quickly (especially for very large allocations in an arbitrarily fragmented system) and so the existing mechanisms result in classic time-of-check-time-of-use (TOCTOU) races. One can think of claiming as similar to a "lazy" allocation, but subsequent hypercalls are required to do the actual physical pageframe allocation.

    Note that one of the effects of this hypercall is that, from the perspective of other running guests, suddenly there is a new guest occupying X amount of pages. This means that when they try to balloon up they will hit the system-wide ceiling of available free memory (if the total sum of the existing d->max_pages >= host memory). This is OK - as that is part of the overcommit. What we DO NOT want to do is dictate what their ceiling should be (d->max_pages) as that is risky and can lead to guests OOM-ing. It is something the guest needs to figure out.

    In order for a toolstack to "get" information about whether a domain has a claim and, if so, how large, and also for the toolstack to measure the total system-wide claim, a second subop has been added and exposed through domctl and libxl (see "xen: XENMEM_claim_pages: xc").

    == Alternative solutions ==

    There has been a variety of discussion about whether the problem this hypercall is solving can be solved in user-space, such as:

    - For all the existing guests, set their d->max_pages temporarily to d->tot_pages and create the domain. This forces those domains to stay at their current consumption level (fyi, this is what the tmem freeze call is doing). The disadvantage of this is that it needlessly forces the guests to stay at their current memory usage instead of allowing them to decide the optimal target.

    - Account only using d->max_pages of how much free memory there is. This ignores ballooning changes and any over-commit scenario. This is similar to the scenario where the sum of all d->max_pages (and the one to be allocated now) on the host is smaller than the available free memory. As such it ignores the over-commit problem.

    - Provide a ring/FIFO along with an event channel to notify a userspace daemon of guests' memory consumption. This daemon can then provide up-to-date information to the toolstack of how much free memory there is. This duplicates what the hypervisor is already doing and introduces latency issues and catching breath for the toolstack, as there might be millions of these updates on a heavily used machine. There might not be any quiescent state ever and the toolstack will heavily consume CPU cycles and not ever provide up-to-date information.

    It has been noted that this claim mechanism solves the underlying problem (slow failure of domain creation) for a large class of domains but not all, specifically not handling (but also not making the problem worse for) PV domains that specify the "superpages" flag, and 32-bit PV domains on large RAM systems. These will be addressed at a later time.

    Code overview:

    Though the hypercall simply does arithmetic within locks, some of the semantics in the code may be a bit subtle.

    The key variables (d->unclaimed_pages and total_unclaimed_pages) start at zero if no claim has yet been staked for any domain. (Perhaps a better name is "claimed_but_not_yet_possessed" but that's a bit unwieldy.) If no claim hypercalls are executed, there should be no impact on existing usage.

    When a claim is successfully staked by a domain, it is like a watermark but there is no record kept of the size of the claim. Instead, d->unclaimed_pages is set to the difference between d->tot_pages and the claim. When d->tot_pages increases or decreases, d->unclaimed_pages atomically decreases or increases. Once d->unclaimed_pages reaches zero, the claim is satisfied and d->unclaimed_pages stays at zero -- unless a new claim is subsequently staked.

    The systemwide variable total_unclaimed_pages is always the sum of d->unclaimed_pages, across all domains. A non-domain-specific heap allocation will fail if total_unclaimed_pages exceeds free (plus, on tmem enabled systems, freeable) pages.

    Claim semantics could be modified by flags. The initial implementation had three flags, which discern whether the caller would like tmem freeable pages to be considered in determining whether or not the claim can be successfully staked. This was removed in later patches and there are no flags.

    A claim can be cancelled by requesting a claim with the number of pages being zero.

    A second subop returns the total outstanding claimed pages systemwide.

    Note: Save/restore/migrate may need to be modified, else it can be documented that all claims are cancelled.

    This patch of the proposed XENMEM_claim_pages hypercall/subop takes into account review feedback from Jan and Keir and IanC and Matthew Daley, plus some fixes found via runtime debugging.

    Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Acked-by: Tim Deegan <tim@xen.org>
    Acked-by: Keir Fraser <keir@xen.org>
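The claim bookkeeping described in the commit above can be sketched as plain arithmetic. This is a single-threaded toy under stated assumptions, not the hypervisor code: `toy_host`, `toy_dom`, `toy_claim_pages`, `toy_adjust_tot_pages` and `toy_heap_alloc_allowed` are hypothetical names, locking and tmem freeable pages are not modeled, and the allocation check is one reading of "will fail if total_unclaimed_pages exceeds free pages".

```c
#include <stdbool.h>

struct toy_host {
    unsigned long free_pages;             /* pages not owned by any domain */
    unsigned long total_unclaimed_pages;  /* sum of all per-domain values */
};

struct toy_dom {
    unsigned long tot_pages;        /* pages currently owned */
    unsigned long unclaimed_pages;  /* claimed but not yet possessed */
};

/* Stake a claim for `pages` total pages; pages == 0 cancels the claim.
 * Returns 0 on success, -1 if the host cannot cover the claim. */
static inline int toy_claim_pages(struct toy_host *h, struct toy_dom *d,
                                  unsigned long pages)
{
    /* Claims held by every other domain. */
    unsigned long others = h->total_unclaimed_pages - d->unclaimed_pages;
    /* Only the part not yet possessed needs to be set aside. */
    unsigned long need = pages > d->tot_pages ? pages - d->tot_pages : 0;

    if (pages != 0 && (others > h->free_pages || need > h->free_pages - others))
        return -1;

    d->unclaimed_pages = need;
    h->total_unclaimed_pages = others + need;
    return 0;
}

/* Give `delta` pages to the domain (delta > 0) or take them back (delta < 0). */
static inline void toy_adjust_tot_pages(struct toy_host *h, struct toy_dom *d,
                                        long delta)
{
    if (delta >= 0) {
        unsigned long got = (unsigned long)delta;
        unsigned long used = got < d->unclaimed_pages ? got : d->unclaimed_pages;

        d->tot_pages += got;
        h->free_pages -= got;
        d->unclaimed_pages -= used;           /* claim shrinks as it is met */
        h->total_unclaimed_pages -= used;
    } else {
        unsigned long lost = (unsigned long)(-delta);

        d->tot_pages -= lost;
        h->free_pages += lost;
        if (d->unclaimed_pages != 0) {        /* a satisfied claim stays at zero */
            d->unclaimed_pages += lost;
            h->total_unclaimed_pages += lost;
        }
    }
}

/* An allocation without a claim must leave the claimed pages available. */
static inline bool toy_heap_alloc_allowed(const struct toy_host *h,
                                          unsigned long request)
{
    return h->total_unclaimed_pages <= h->free_pages &&
           request <= h->free_pages - h->total_unclaimed_pages;
}
```

Because only the outstanding difference is stored, cancelling a claim is the same operation as staking a claim of zero pages, which is how the message describes cancellation.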
* Fix emacs local variable block to use correct C style variable. | David Vrabel | 2013-02-21 | 1 | -1/+1
    The emacs variable to set the C style from a local variable block is c-file-style, not c-set-style.

    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* xen: move XEN_SYSCTL_physinfo, XEN_SYSCTL_numainfo and XEN_SYSCTL_topologyinfo to common code | Stefano Stabellini | 2013-02-15 | 1 | -0/+2
    Move XEN_SYSCTL_physinfo, XEN_SYSCTL_numainfo and XEN_SYSCTL_topologyinfo from x86/sysctl.c to common/sysctl.c.

    The implementation of XEN_SYSCTL_physinfo is mostly generic but needs to fill in a few arch specific details: introduce arch_do_physinfo to do that.

    The implementation of XEN_SYSCTL_physinfo relies on two global variables: total_pages and cpu_khz. Make them available on ARM.

    Implement node_spanned_pages and __node_distance on ARM, assuming 1 numa node for now.

    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Acked-by: Keir Fraser <keir@xen.org>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>
    Committed-by: Ian Campbell <ian.campbell@citrix.com>
* arch/x86: Add missing mem_sharing XSM hooks | Daniel De Graaf | 2013-01-11 | 1 | -0/+6
    This patch splits up the mem_sharing and mem_event XSM hooks to better cover what the code is doing. It also changes the utility function get_mem_event_op_target to rcu_lock_live_remote_domain_by_id because there is no mm-specific logic in there.

    Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
    Acked-by: Tim Deegan <tim@xen.org>
    Acked-by: Jan Beulich <jbeulich@suse.com>
    Committed-by: Keir Fraser <keir@xen.org>
* xen: sched: generalize scheduling related perfcounter macros | Dario Faggioli | 2012-10-23 | 1 | -0/+13
    Move some of them from sched_credit.c to generic scheduler code. This also allows the other schedulers to use perf counters equally easily.

    This change is mainly preparatory work for what is stated above. In fact, it mostly does s/CSCHED_STAT/SCHED_STAT/, and, in general, it implies no functional changes.

    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
    Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
    Committed-by: Keir Fraser <keir@xen.org>
* xen: Add versions of rcu_lock_*_domain without IS_PRIV | Daniel De Graaf | 2012-10-15 | 1 | -0/+11
    These functions will be used to avoid duplication of IS_PRIV calls that will be introduced in XSM hooks. This also fixes a build error with XSM enabled introduced by 25925:d1c3375c3f11 which depends on this patch.

    Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
    Committed-by: Keir Fraser <keir@xen.org>
* x86/mm: wire up sharing ring | Tim Deegan | 2012-03-08 | 1 | -0/+3
    Now that we have an interface close to finalizing, do the necessary plumbing to set up a ring for reporting failed allocations in the unshare path.

    Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
    Acked-by: Tim Deegan <tim@xen.org>
    Committed-by: Tim Deegan <tim@xen.org>
* Use a reserved pfn in the guest address space to store mem event rings | Tim Deegan | 2012-03-08 | 1 | -0/+1
    This solves a long-standing issue in which the pages backing these rings were pages belonging to dom0 user-space processes. Thus, if the process would die unexpectedly, Xen would keep posting events to a page now belonging to some other process.

    We update all API-consumers in tree (xenpaging and xen-access).

    This is an API/ABI change, so please speak up if it breaks your assumptions.

    The patch touches tools, hypervisor x86/hvm bits, and hypervisor x86/mm bits.

    Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
    Acked-by: Tim Deegan <tim@xen.org>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>
    Committed-by: Tim Deegan <tim@xen.org>
* Tools: Remove shared page from mem_event/access/paging interfaces | Tim Deegan | 2012-03-08 | 1 | -2/+0
    Don't use the superfluous shared page, return the event channel directly as part of the domctl struct, instead.

    In-tree consumers (xenpaging, xen-access) updated. This is an ABI/API change, so please voice any concerns.

    Known pending issues:
    - pager could die and its ring page could be used by some other process, yet Xen retains the mapping to it.
    - use a saner interface for the paging_load buffer.

    This change also affects the x86/mm bits in the hypervisor that process the mem_event setup domctl.

    Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
    Acked-by: Tim Deegan <tim@xen.org>
    Acked-by: Olaf Hering <olaf@aepfle.de>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>
    Committed-by: Tim Deegan <tim@xen.org>
* xen: make need_iommu == 0 if !HAS_PASSTHROUGH | Ian Campbell | 2012-02-15 | 1 | -0/+6
    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Acked-by: keir@xen.org
    Committed-by: Ian Campbell <Ian.Campbell@citrix.com>
* Include some header files that are not automatically included on all archs | Stefano Stabellini | 2012-01-23 | 1 | -0/+4
    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
    Committed-by: Keir Fraser <keir@xen.org>
* x86/mm: Improve ring management for memory events. Do not lose guest events | Andres Lagar-Cavilla | 2012-01-19 | 1 | -4/+18
    This patch is an amalgamation of the work done by Olaf Hering <olaf@aepfle.de> and our work.

    It combines logic changes to simplify the memory event API, as well as leveraging wait queues to deal with extreme conditions in which too many events are generated by a guest vcpu.

    In order to generate a new event, a slot in the ring is claimed. If a guest vcpu is generating the event and there is no space, it is put on a wait queue. If a foreign vcpu is generating the event and there is no space, the vcpu is expected to retry its operation. If an error happens later, the function returns the claimed slot via a cancel operation.

    Thus, the API has only four calls: claim slot, cancel claimed slot, put request in the ring, consume the response.

    With all these mechanisms, no guest events are lost. Our testing includes:
    1. ballooning down 512 MiBs;
    2. using mem access n2rwx, in which every page access in a four-vCPU guest results in an event, with no vCPU pausing, and the four vCPUs touching all RAM.
    No guest events were lost in either case, and qemu-dm had no mapping problems.

    Signed-off-by: Adin Scannell <adin@scannell.ca>
    Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
    Signed-off-by: Olaf Hering <olaf@aepfle.de>
    Signed-off-by: Tim Deegan <tim@xen.org>
    Committed-by: Tim Deegan <tim@xen.org>
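The four-call ring API described in the commit above (claim slot, cancel claimed slot, put request, consume response) can be modeled with two counters. This is a minimal sketch assuming a fixed ring size; `toy_ring`, `RING_SLOTS` and the `ring_*` functions are hypothetical names, and the wait queue is represented only by a return code rather than by actually blocking a vCPU.

```c
#include <stdbool.h>

#define RING_SLOTS 4u   /* assumed capacity, for illustration */

struct toy_ring {
    unsigned int claimed;    /* slots reserved but not yet filled */
    unsigned int in_flight;  /* requests placed, awaiting a response */
};

enum claim_result {
    CLAIM_OK,     /* a slot was reserved for the caller */
    CLAIM_WAIT,   /* guest vCPU: would be put on a wait queue */
    CLAIM_RETRY   /* foreign vCPU: expected to retry its operation */
};

/* Reserve a slot before producing an event. */
static inline enum claim_result ring_claim_slot(struct toy_ring *r,
                                                bool guest_vcpu)
{
    if (r->claimed + r->in_flight < RING_SLOTS) {
        r->claimed++;
        return CLAIM_OK;
    }
    return guest_vcpu ? CLAIM_WAIT : CLAIM_RETRY;
}

/* Give back a reserved slot when an error prevents the request. */
static inline void ring_cancel_slot(struct toy_ring *r)
{
    if (r->claimed > 0)
        r->claimed--;
}

/* Turn a reserved slot into a request on the ring. */
static inline bool ring_put_request(struct toy_ring *r)
{
    if (r->claimed == 0)
        return false;   /* nothing was claimed by the caller */
    r->claimed--;
    r->in_flight++;
    return true;
}

/* Consume one response, which frees its slot for new claims. */
static inline bool ring_get_response(struct toy_ring *r)
{
    if (r->in_flight == 0)
        return false;
    r->in_flight--;
    return true;
}
```

Counting claimed and in-flight slots together is what keeps a producer from placing a request when no slot remains, which is the property the message relies on for not losing guest events.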
* force inclusion of xen/config.h through compiler option | Jan Beulich | 2012-01-13 | 1 | -1/+0
    As we expect all source files to include the header as the first thing anyway, stop doing this by repeating the inclusion in each and every source file (and in many headers), but rather enforce this uniformly through the compiler command line.

    As a first cleanup step, remove the explicit inclusion from all common headers. Further cleanup can be done incrementally.

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
* Create a generic callback mechanism for Xen-bound event channels | Andres Lagar-Cavilla | 2011-12-06 | 1 | -1/+1
    For event channels for which Xen is the consumer, there currently is a single action. With this patch, we allow event channel creators to specify a generic callback (or no callback). Because the expectation is that there will be few callbacks, they are stored in a small table.

    Signed-off-by: Adin Scannell <adin@scannell.ca>
    Signed-off-by: Keir Fraser <keir@xen.org>
    Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
    Committed-by: Tim Deegan <tim@xen.org>
* mem_event: move mem_event_domain out of struct domain | Olaf Hering | 2011-11-30 | 1 | -6/+12
    An upcoming change may increase the size of mem_event_domain. The result is a build failure because struct domain gets larger than a page. Allocate the room for the three mem_event_domain members at runtime.

    v2:
    - remove mem_ prefix from members of new struct

    Signed-off-by: Olaf Hering <olaf@aepfle.de>
    Committed-by: Keir Fraser <keir@xen.org>
* sched_sedf: Avoid panic when adjusting sedf parameters | Juergen Gross | 2011-11-18 | 1 | -0/+20
    When using sedf scheduler in a cpupool the system might panic when setting sedf scheduling parameters for a domain. Introduces for_each_domain_in_cpupool macro as it is usable 4 times now. Add appropriate locking in cpupool_unassign_cpu().

    Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
    Committed-by: Keir Fraser <keir@xen.org>
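An iteration macro of the kind the commit above introduces can be sketched over a simple linked list. This is only an illustration of the pattern, assuming a singly linked domain list: `toy_domain`, `toy_cpupool`, `pool_skip` and `for_each_toy_domain_in_pool` are hypothetical names, and the locking the commit mentions is not modeled.

```c
#include <stddef.h>

struct toy_cpupool {
    int id;
};

struct toy_domain {
    int domain_id;
    struct toy_cpupool *pool;   /* pool this domain belongs to */
    struct toy_domain *next;    /* next domain in the global list */
};

/* Advance to the first domain at or after `d` that belongs to pool `c`. */
static inline struct toy_domain *pool_skip(struct toy_domain *d,
                                           const struct toy_cpupool *c)
{
    while (d != NULL && d->pool != c)
        d = d->next;
    return d;
}

/* Visit only the domains of one pool, hiding the filtering from callers. */
#define for_each_toy_domain_in_pool(d, head, c)          \
    for ((d) = pool_skip((head), (c)); (d) != NULL;      \
         (d) = pool_skip((d)->next, (c)))

/* Example user: count the domains assigned to a pool. */
static inline int count_domains_in_pool(struct toy_domain *head,
                                        const struct toy_cpupool *c)
{
    struct toy_domain *d;
    int n = 0;

    for_each_toy_domain_in_pool(d, head, c)
        n++;
    return n;
}
```

Wrapping the filter in a macro means each of the several call sites reads as an ordinary loop, which is the motivation the message gives for introducing it.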
* Hypercall continuation cancelation in compat mode for XENMEM_get/set_pod_target | Jean Guyader | 2011-11-11 | 1 | -0/+1
    If copy_to_guest failed in the compat code after a continuation has been done in the native code, we need to cancel it so that we don't re-execute the hypercall but instead return from it with the appropriate error.

    Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com>
    Acked-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
    Committed-by: Jan Beulich <jbeulich@suse.com>
* cpupools: allocate CPU masks dynamically | Jan Beulich | 2011-10-21 | 1 | -1/+1
    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
* constify vcpu_set_affinity()'s second parameter | Jan Beulich | 2011-10-13 | 1 | -1/+1
    None of the callers actually make use of the function's returning of the old affinity through its second parameter, and eliminating this capability allows some callers to no longer use a local variable here, reducing their stack footprint significantly when building with large NR_CPUS.

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>
* xenpaging: track number of paged pages in struct domain | Olaf Hering | 2011-09-26 | 1 | -0/+1
    The toolstack should know how many pages are paged-out at a given point in time so it could make smarter decisions about how many pages should be paged or ballooned.

    Add a new member to xen_domctl_getdomaininfo and bump interface version. Use the new member in xc_dominfo_t. The SONAME of libxc should be changed if this patch gets applied.

    Signed-off-by: Olaf Hering <olaf@aepfle.de>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>
    Acked-by: Tim Deegan <tim@xen.org>
    Committed-by: Tim Deegan <tim@xen.org>
* mem_event: use different ringbuffers for share, paging and access | Olaf Hering | 2011-09-16 | 1 | -1/+5
    Up to now a single ring buffer was used for mem_share, xenpaging and xen-access. Each helper would have to cooperate and pull only its own requests from the ring. Unfortunately this was not implemented. And even if it was, it would make the whole concept fragile because a crash or early exit of one helper would stall the others.

    What happened up to now is that active xenpaging + memory_sharing would push memsharing requests in the buffer. xenpaging is not prepared for such requests.

    This patch creates an independent ring buffer for mem_share, xenpaging and xen-access and also adds new functions to enable xenpaging and xen-access. The xc_mem_event_enable/xc_mem_event_disable functions will be removed. The various XEN_DOMCTL_MEM_EVENT_* macros were cleaned up. Due to the removal the API changed, so the SONAME will be changed too.

    Signed-off-by: Olaf Hering <olaf@aepfle.de>
    Acked-by: Tim Deegan <tim@xen.org>
    Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
    Committed-by: Tim Deegan <tim@xen.org>
* mem_event: add ref counting for free requestslots | Olaf Hering | 2011-09-05 | 1 | -0/+1
    If mem_event_check_ring() is called by many vcpus at the same time before any of them has also called mem_event_put_request(), all of the callers must assume there are enough free slots available in the ring.

    Record the number of request producers in mem_event_check_ring() to keep track of available free slots.

    Add a new mem_event_put_req_producers() function to release a request attempt made in mem_event_check_ring(). It's required for p2m_mem_paging_populate() because that function can only modify the p2m type if there are free request slots. But in some cases p2m_mem_paging_populate() does not actually have to produce another request when it is known that the same request was already made earlier by a different vcpu.

    mem_event_check_ring() cannot return a reference to a free request slot because there could be multiple references for different vcpus and the order of mem_event_put_request() calls is not known. As a result, incomplete requests could be consumed by the ring user.

    Signed-off-by: Olaf Hering <olaf@aepfle.de>
* replace d->nr_pirqs sized arrays with radix tree | Jan Beulich | 2011-06-23 | 1 | -5/+4
    With this it is questionable whether retaining struct domain's nr_pirqs is actually necessary - the value now only serves for bounds checking, and this boundary could easily be nr_irqs.

    Note that ia64, the build of which is broken currently anyway, is only being partially fixed up.

    v2: adjustments for split setup/teardown of translation data

    v3: re-sync with radix tree implementation changes

    Signed-off-by: Jan Beulich <jbeulich@novell.com>
* xen: remove extern function declarations from C files. | Tim Deegan | 2011-05-26 | 1 | -0/+6
    Move all extern declarations into appropriate header files. This also fixes up a few places where the caller and the definition had different signatures.

    Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
* xen: Include headers that are actually needed, drop everything else. | Christoph Egger | 2011-05-20 | 1 | -12/+5
    Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
* Revert 23295:4891f1f41ba5 and 23296:24346f749826 | Keir Fraser | 2011-05-02 | 1 | -4/+5
    Fails current lock checking mechanism in spinlock.c in debug=y builds.

    Signed-off-by: Keir Fraser <keir@xen.org>
* replace d->nr_pirqs sized arrays with radix tree | Jan Beulich | 2011-05-01 | 1 | -5/+4
    With this it is questionable whether retaining struct domain's nr_pirqs is actually necessary - the value now only serves for bounds checking, and this boundary could easily be nr_irqs.

    Another thing to consider is whether it's worth storing the pirq number in struct pirq, to avoid passing the number and a pointer to quite a number of functions.

    Note that ia64, the build of which is broken currently anyway, is only partially fixed up.

    Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Remove direct cpumask_t members from struct vcpu and struct domain | Jan Beulich | 2011-04-05 | 1 | -5/+5
    The CPU masks embedded in these structures prevent NR_CPUS-independent sizing of these structures.

    Basic concept (in xen/include/cpumask.h) taken from recent Linux.

    For scalability purposes, many other uses of cpumask_t should be replaced by cpumask_var_t, particularly local variables of functions. This implies that no functions should have by-value cpumask_t parameters, and that the whole old cpumask interface (cpus_...()) should go away in favor of the new (cpumask_...()) one.

    Signed-off-by: Jan Beulich <jbeulich@novell.com>
* Remove unmaintained Access Control Module (ACM) from hypervisor. | Keir Fraser | 2011-03-25 | 1 | -2/+1
    Signed-off-by: Keir Fraser <keir@xen.org>
* Introduce rcu_lock_remote_target_domain_by_id(). | Keir Fraser | 2011-02-07 | 1 | -0/+7
    Signed-off-by: Keir Fraser <keir@xen.org>
* cpupool: Check for memory allocation failure on switching schedulers | Keir Fraser | 2011-02-06 | 1 | -1/+1
    When switching schedulers on a physical cpu due to a cpupool operation check for a potential memory allocation failure and stop the operation gracefully.

    Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
* PoD: Fix two code comments | Liu, Jinsong | 2011-01-14 | 1 | -1/+1
    Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
    Acked-by: George Dunlap <george.dunlap@citrix.com>