xen/xen - xen

	Commit message	Author	Age	Files	Lines
*	Add DOMCTL to limit the number of event channels a domain may use	David Vrabel	2013-10-14	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add XEN_DOMCTL_set_max_evtchn which may be used during domain creation to set the maximum event channel port a domain may use. This may be used to limit the amount of Xen resources (global mapping space and xenheap) that a domain may use for event channels. A domain that does not have a limit set may use all the event channels supported by the event channel ABI in use. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Keir Fraser <keir@xen.org>
*	evtchn: add FIFO-based event channel ABI	David Vrabel	2013-10-14	2	-2/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the event channel hypercall sub-ops and the definitions for the shared data structures for the FIFO-based event channel ABI. The design document for this new ABI is available here: http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf In summary, events are reported using a per-domain shared event array of event words. Each event word has PENDING, LINKED and MASKED bits and a LINK field for pointing to the next event in the event queue. There are 16 event queues (with different priorities) per-VCPU. Key advantages of this new ABI include: - Support for over 100,000 events (2^17). - 16 different event priorities. - Improved fairness in event latency through the use of FIFOs. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
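The event word layout summarised above can be sketched in C. This is a minimal illustration, not the public header: the `ew_`/`EW_` names are hypothetical, and the bit positions are assumptions chosen to fit a 32-bit word with a 17-bit LINK field (enough for the 2^17 events the message mentions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One 32-bit event word per event channel port (width is an assumption). */
typedef uint32_t event_word_t;

/* Assumed bit positions for the three flag bits named in the message. */
#define EW_PENDING_BIT 31u
#define EW_MASKED_BIT  30u
#define EW_LINKED_BIT  29u

/* LINK field: 17 bits, so it can name any of 2^17 ports. */
#define EW_LINK_BITS   17u
#define EW_LINK_MASK   ((UINT32_C(1) << EW_LINK_BITS) - 1)

static inline bool ew_test(event_word_t w, unsigned int bit)
{
    return (w >> bit) & 1u;
}

static inline event_word_t ew_set(event_word_t w, unsigned int bit)
{
    return w | (UINT32_C(1) << bit);
}

/* Port number of the next event in the same queue. */
static inline uint32_t ew_get_link(event_word_t w)
{
    return w & EW_LINK_MASK;
}

static inline event_word_t ew_set_link(event_word_t w, uint32_t next_port)
{
    assert(next_port <= EW_LINK_MASK);
    return (w & ~EW_LINK_MASK) | next_port;
}

/* In this sketch an event is deliverable when it is pending and not masked. */
static inline bool ew_deliverable(event_word_t w)
{
    return ew_test(w, EW_PENDING_BIT) && !ew_test(w, EW_MASKED_BIT);
}
```

A real implementation would update these words with atomic operations, since guest and hypervisor share them; the sketch leaves that out.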
*	public/hvm_xs_strings.h: Fix ABI regression for OEM SMBios strings	Andrew Cooper	2013-08-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The old code for OEM SMBios strings was: char path[20] = "bios-strings/oem-XX"; path[(sizeof path) - 3] = '0' + ((i < 10) ? i : i / 10); path[(sizeof path) - 2] = (i < 10) ? '\0' : '0' + (i % 10); Where oem-1 thru 9 specifically had no leading 0. However, the definition of HVM_XS_OEM_STRINGS specifically requires leading 0s. This regression was introduced by the combination of c/s 4d23036e709627 and e64c3f71ceb662. I realise that this patch causes a change to the public headers. However I feel it is justified as: * All toolstacks used to have to embed the magic string (and almost certainly still do) * If by some miracle a new toolstack has started using the new define, it will continue to work. * The only in-tree consumer of the define is hvmloader itself. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
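The mismatch described above can be reproduced with a short sketch. `oem_path_legacy` wraps the hand-rolled code quoted in the message; the two `snprintf` helpers use `"%d"` and `"%02d"` as assumed stand-ins for an unpadded and a zero-padded format string. The message does not quote the actual text of HVM_XS_OEM_STRINGS, so both formats and all function names here are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OEM_PATH_LEN 20

/* The hand-rolled formatting quoted in the commit message, wrapped in a function. */
static void oem_path_legacy(char path[OEM_PATH_LEN], int i)
{
    static const char tmpl[OEM_PATH_LEN] = "bios-strings/oem-XX";

    assert(i >= 1 && i <= 99);
    memcpy(path, tmpl, OEM_PATH_LEN);
    path[OEM_PATH_LEN - 3] = '0' + ((i < 10) ? i : i / 10);
    path[OEM_PATH_LEN - 2] = (i < 10) ? '\0' : '0' + (i % 10);
}

/* Assumed unpadded format: matches the legacy output for every index. */
static void oem_path_unpadded(char path[OEM_PATH_LEN], int i)
{
    assert(i >= 1 && i <= 99);
    snprintf(path, OEM_PATH_LEN, "bios-strings/oem-%d", i);
}

/* Assumed zero-padded format: differs from the legacy output for 1..9. */
static void oem_path_padded(char path[OEM_PATH_LEN], int i)
{
    assert(i >= 1 && i <= 99);
    snprintf(path, OEM_PATH_LEN, "bios-strings/oem-%02d", i);
}
```

For index 5 the legacy and unpadded helpers both yield `bios-strings/oem-5`, while the padded helper yields `bios-strings/oem-05`. From index 10 upwards all three agree.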
*	x86: correct public header's documentation of PAT MSR settings	Jan Beulich	2013-08-23	1	-9/+9
\| \| \| \| \| \| \| \|	The first (PAT6) column was wrong across the board, and the column for PAT7 was missing altogether. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	xen: remove evtchn_upcall_mask from interface on ARM	Ian Campbell	2013-08-20	2	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On ARM event-channel upcalls are masked using the hardware's interrupt mask bit and not by a software bit. Leaving this field present in the interface has caused some confusion already and is liable to mean it gets inadvertently used in the future. So arrange for this field to be turned into a padding field on ARM by introducing a XEN_HAVE_PV_UPCALL_MASK define. This bit is also unused for x86 PV-on-HVM guests, but we can't realistically distinguish those from x86 PV guests in the headers. Add a per-arch vcpu_event_delivery_is_enabled function to replace an open coded use of evtchn_upcall_mask in common code (in a debug keyhandler). The existing local_event_delivery_is_enabled, which operates only on current, was unimplemented on ARM and unused on x86, so remove it. ifdef the use of evtchn_upcall_mask when setting up a new vcpu info page. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
*	xen: arm: include public/xen.h in foreign interface checking	Ian Campbell	2013-08-20	2	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mkheader.py doesn't cope with struct foo { }; so add a newline. Define unsigned long and long to a non-existent type on ARM so as to catch their use. Teach mkheader.py to cope with structs which are ifdef'd. This cannot cope with #defines between the #ifdef and the struct definitions, so move MAX_GUEST_CMDLINE to be next to its only usage. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
*	xen: only expose start_info on architectures which have a PV boot path	Ian Campbell	2013-08-20	2	-1/+4
\| \| \| \| \| \| \| \| \|	Most of this struct is PV MMU specific and it is not used on ARM at all. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jan Beulich <JBeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
*	xen: arm: document which hypercalls (and subops) are supported on ARM	Ian Campbell	2013-08-08	1	-2/+64
\| \| \| \| \| \| \| \| \| \| \|	There are many hypercalls which make no sense or which are not supported on ARM systems but it's not all that obvious which ones we do support. So let's try and document the hypercalls which are useful on ARM. I'm not sure this is the best way to go about this, I'm open to other ideas. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
*	pciif: add multi-vector-MSI command	Jan Beulich	2013-08-08	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	The requested vector count is to be passed in struct xen_pci_op's info field. Upon failure, if a smaller vector count might work, the backend will pass that smaller count in the value field (which so far is always being set to zero in the error path). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86: enable multi-vector MSI	Jan Beulich	2013-08-08	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implies - extending the public interface to have a way to request a block of MSIs - allocating a block of contiguous pIRQ-s for the target domain (but note that the Xen IRQs allocated have no need of being contiguous) - repeating certain operations for all involved IRQs - fixing multi_msi_enable() - adjusting the mask bit accesses for maskable MSIs Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
*	xen: arm: Handle SMC from 64-bit guests	Ian Campbell	2013-07-29	1	-0/+3
\| \| \| \| \| \| \| \|	Similarly to arm32 guests handle it by injecting an undefined instruction trap. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
*	xen: arm: support building a 64-bit dom0 domain	Ian Campbell	2013-07-29	1	-2/+0
\| \| \| \| \|	Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
*	netif.h: fix typo, BLKIF -> NETIF	Wei Liu	2013-07-08	1	-1/+1
\| \| \| \| \|	Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
*	libxl,hvmloader: Don't relocate memory for MMIO hole	George Dunlap	2013-06-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At the moment, qemu-xen can't handle memory being relocated by hvmloader. This may happen if a device with a large enough memory region is passed through to the guest. At the moment, if this happens, then at some point in the future qemu will crash and the domain will hang. (qemu-traditional is fine.) It's too late in the release to do a proper fix, so we try to do damage control. hvmloader already has mechanisms to relocate memory to 64-bit space if it can't make a big enough MMIO hole. By default this is 2GiB; if we just refuse to make the hole bigger if it will overlap with guest memory, then the relocation will happen by default. v5: - Update comment to not refer to "this series". v4: - Wrap long line in libxl_dm.c - Fix comment v3: - Fix polarity of comparison - Move diagnostic messages to another patch - Tested with xen platform pci device hacked to have different BAR sizes {256MiB, 1GiB} x {qemu-xen, qemu-traditional} x various memory configurations - Add comment explaining why we default to "allow" - Remove cast to bool v2: - style fixes - fix and expand comment on the MMIO hole loop - use "%d" rather than "%s" -> (...)?"1":"0" - use bool instead of uint8_t - Move 64-bit bar relocate detection to another patch - Add more diagnostic messages Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
*	tpmif: fix identifier prefixes	Jan Beulich	2013-06-21	1	-15/+15
\| \| \| \| \| \| \| \| \| \|	The definitions here shouldn't use vtpm_ or VTPM_ as their prefixes; the interface should instead make use of tpmif_ and TPMIF_. This fixes a build failure after syncing the public headers to linux-2.6.18-xen.hg (where a struct vtpm_state already exists). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
*	io/ring.h: drop unused and broken *_RING_ATTACH() macros	Jan Beulich	2013-06-12	1	-15/+0
\| \| \| \| \| \| \| \| \| \|	Initializing r_prod_pvt and r_cons from independent shared ring fields is broken, as other macros in this header rely on them being coupled. Furthermore using the backend variant would also imply a security vulnerability. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	io/ring.h: new macro to detect whether there are too many requests on the ring	Jan Beulich	2013-06-12	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backends may need to protect themselves against an insane number of produced requests stored by a frontend, in case they iterate over requests until reaching the req_prod value. There can't be more requests on the ring than the difference between produced requests and produced (but possibly not yet published) responses. This is a more strict alternative to a patch previously posted by Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
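The bound stated above (no more requests on the ring than the gap between produced requests and produced responses) can be sketched as a single unsigned comparison. The struct and function below are hypothetical stand-ins written for this illustration; they are not the macro the patch adds to io/ring.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Free-running ring index; wraps modulo 2^32. */
typedef uint32_t ring_idx_t;

/* Hypothetical cut-down view of a backend's private ring state. */
struct demo_back_ring {
    ring_idx_t   rsp_prod_pvt; /* responses produced, possibly not yet published */
    unsigned int nr_ents;      /* ring capacity in slots */
};

/*
 * Returns true when the frontend-supplied request producer index claims
 * more outstanding requests than the ring can hold. Unsigned subtraction
 * keeps the comparison correct across index wrap-around.
 */
static inline bool demo_req_prod_overflow(const struct demo_back_ring *r,
                                          ring_idx_t req_prod)
{
    return (ring_idx_t)(req_prod - r->rsp_prod_pvt) > r->nr_ents;
}
```

A backend would run such a check on the `req_prod` value it has just read, before looping over requests, and treat a `true` result as a misbehaving frontend.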
*	gnttab: drop unused GNTCOPY_can_fail [4.3.0-rc4]	Jan Beulich	2013-06-10	1	-2/+0
\| \| \| \| \|	Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	blkif.h: Document the physical-sector-size extension	Stefan Bader	2013-05-31	1	-4/+10
\| \| \| \|	Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
*	netif: document feature-split-event-channel	Wei Liu	2013-05-22	1	-0/+12
\| \| \| \| \| \| \| \| \|	This is a new feature to separate TX and RX notification. Document it in the canonical header for future reference. For a reference implementation, please see the Xen network driver in the Linux kernel. Signed-off-by: Wei Liu <wei.liu2@citrix.com>
*	hypervisor/xen/tools: Remove the XENMEM_get_outstanding_pages and provide the data via xc_phys_info	Konrad Rzeszutek Wilk	2013-05-14	2	-8/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During the review of the patches it was noticed that there exists a race wherein the 'free_memory' value consists of information from two hypercalls, namely XEN_SYSCTL_physinfo and XENMEM_get_outstanding_pages. The free memory the host has available for guests is the difference between 'free_pages' (from XEN_SYSCTL_physinfo) and 'outstanding_pages'. As they are two hypercalls, many things can happen between their execution. This patch resolves this by eliminating the XENMEM_get_outstanding_pages hypercall and providing the free_pages and outstanding_pages information via the xc_phys_info structure. It also removes the XSM hooks and adds locking as needed. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir.xen@gmail.com>
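The arithmetic above is a single subtraction once both values come from the same snapshot. A minimal sketch, assuming a hypothetical `demo_physinfo` struct that carries only the two fields named in the message (the real xc_phys_info layout is not shown here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot holding both values from one hypercall. */
struct demo_physinfo {
    uint64_t free_pages;        /* pages currently free on the host */
    uint64_t outstanding_pages; /* pages claimed but not yet allocated */
};

/* Pages actually available for new guests, clamped at zero. */
static inline uint64_t demo_free_for_guests(const struct demo_physinfo *info)
{
    assert(info != NULL);
    return info->free_pages > info->outstanding_pages
               ? info->free_pages - info->outstanding_pages
               : 0;
}
```

Reading both fields from one structure is what removes the window in which the two numbers could drift apart.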
*	xen/arm: basic PSCI support, implement cpu_on and cpu_off	Stefano Stabellini	2013-05-08	1	-0/+2
\| \| \| \| \| \| \| \|	Implement support for the ARM Power State Coordination Interface, PSCI for short. Support only HVC calls. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
*	xen/arm: trap SMC instructions and inject an UND exception	Ian Campbell	2013-05-08	1	-0/+1
\| \| \| \| \| \| \| \|	Currently only handles 32 bit guests. The 64-bit exception model is considerably different. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
*	netif: define XEN_NETIF_NR_SLOTS_MIN in public header	Wei Liu	2013-05-07	1	-0/+18
\| \| \| \| \| \| \| \| \| \|	The Xen network protocol has an implicit dependency on MAX_SKB_FRAGS. In order to remove this dependency, we derive a constant from the historical MAX_SKB_FRAGS for future reference. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
*	netif: define XEN_NETIF_MAX_TX_SIZE in public header	Wei Liu	2013-05-07	1	-0/+1
\| \| \| \| \| \| \| \| \|	This is the maximum supported size of a packet. It comes from the size of netif_tx_request.size. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
*	xen: arm: correct platform detection in public header.	Ian Campbell	2013-04-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	These headers cannot use the CONFIG_FOO defines provided when building Xen (since they aren't provided when building tools or by external components) and need to use the compiler provided architecture defines. This manifested itself as a failure to build xenctx.c on ARM64 due to the missing symbols contains . Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
*	x86/EFI: pass boot services variable info to runtime code	Jan Beulich	2013-04-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	EFI variables can be flagged as being accessible only within boot services. This makes it awkward for us to figure out how much space they use at runtime. In theory we could figure this out by simply comparing the results from QueryVariableInfo() to the space used by all of our variables, but that fails if the platform doesn't garbage collect on every boot. Thankfully, calling QueryVariableInfo() while still inside boot services gives a more reliable answer. This patch passes that information from the EFI boot stub up to the efi platform code. Based on a similarly named Linux patch by Matthew Garrett <matthew.garrett@nebula.com>. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
*	xen: allow for explicitly specifying node-affinity	Dario Faggioli	2013-04-17	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make it possible to pass the node-affinity of a domain to the hypervisor from the upper layers, instead of always being computed automatically. Note that this also required generalizing the Flask hooks for setting and getting the affinity, so that they now deal with both vcpu and node affinity. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com> Acked-by: Keir Fraser <keir@xen.org>
*	xen, libxc: rename xenctl_cpumap to xenctl_bitmap	Dario Faggioli	2013-04-17	4	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	More specifically: 1. replaces xenctl_cpumap with xenctl_bitmap; 2. provides bitmap_to_xenctl_bitmap and the reverse; 3. re-implements cpumask_to_xenctl_bitmap with bitmap_to_xenctl_bitmap and the reverse. Other than #3, no functional changes. Interface only slightly affected. This is in preparation of introducing NUMA node-affinity maps. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com> Acked-by: Keir Fraser <keir@xen.org>
*	mini-os/tpm{back, front}: Change shared page ABI	Daniel De Graaf	2013-04-12	1	-0/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This changes the vTPM shared page ABI from a copy of the Xen network interface to a single-page interface that better reflects the expected behavior of a TPM: only a single request packet can be sent at any given time, and every packet sent generates a single response packet. This protocol change should also increase efficiency as it avoids mapping and unmapping grants when possible. The vtpm xenbus device now requires a feature-protocol-v2 node in xenstore to avoid conflicts with existing (xen-patched) kernels supporting the old interface. While the contents of the shared page have been defined to allow packets larger than a single page (actually 4088 bytes) by allowing the client to add extra grant references, the mapping of these extra references has not been implemented; a feature node in xenstore may be used in the future to indicate full support for the multi-page protocol. Most uses of the TPM should not require this feature. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Cc: Jan Beulich <JBeulich@suse.com>
*	xen: arm: remove PSR_MODE_MASK from public interface.	Ian Campbell	2013-04-11	1	-3/+0
\| \| \| \| \| \| \| \| \| \|	This is also defined in sys/ptrace.h on arm64 which breaks the tools build due to multiple definitions. I expect this is really a bug in the kernel and/or glibc but we don't really need this symbol in the public headers, at least not right now, so move it into include/asm instead. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
*	xen: arm64 uses the same I/O ABI as arm32	Ian Campbell	2013-04-11	1	-1/+1
\| \| \| \| \|	Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
*	xen: arm: define 64-bit guest hypercall calling convention.	Ian Campbell	2013-04-11	1	-8/+19
\| \| \| \| \| \| \| \| \| \| \| \|	As well as using x<N> rather than r<N> registers for passing arguments/results, also mandate the use of x16 as the hypercall number. Add some pedantry about struct alignment layout referencing the ARM Procedure Calling Standard to avoid confusion with the previous "OABI" convention. While at it also mandate that hypercall argument structs are always little endian. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
*	docs: Document the XenBus structure.	Konrad Rzeszutek Wilk	2013-03-26	1	-1/+4
\| \| \| \| \| \| \|	Mark-up for inclusion of generated docs. Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
*	docs: Document start_info changes in Xen 4.2.	Konrad Rzeszutek Wilk	2013-03-26	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Changeset 25833:bb85bbccb1c9, "x86/32-on-64: adjust Dom0 initial page table layout", fixes a bug in the reported value of pt_base versus where the page tables actually start. This documents that in the start-of-the-world header note. It also clarifies the implied understanding that the page table space is pointed to by pt_base. As in it is ".. implied that the range of page-tables is the range [pt_base, pt_base + nr_pt_frames), whereas that range here indeed is [pt_base - 2, pt_base - 2 + nr_pt_frames)" (Jan Beulich). Also make it crystal clear that pt_base == %cr3. Acked-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
*	docs: Document the dom0_vga_console_info structure.	Konrad Rzeszutek Wilk	2013-03-26	1	-1/+8
\| \| \| \| \| \| \| \|	Mark-up for inclusion of generated docs. Acked-by: Ian Campbell <ian.campbell@citrix.com> [v2: s/dom9/dom0/] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
*	docs: Document the shared structure.	Konrad Rzeszutek Wilk	2013-03-26	1	-0/+1
\| \| \| \| \| \| \|	Mark-up for inclusion of generated docs. Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
*	docs: Add some extra details to the ELF note.	Konrad Rzeszutek Wilk	2013-03-26	1	-0/+6
\| \| \| \| \| \| \|	Such as how the string values MUST be NULL terminated ASCII. Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
*	docs: Document the ELF_FEATURES entry	Konrad Rzeszutek Wilk	2013-03-26	1	-0/+14
\| \| \| \| \| \| \|	Mark-up for inclusion of generated docs. Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
*	docs: Document the ELF notes	Konrad Rzeszutek Wilk	2013-03-26	1	-0/+2
\| \| \| \| \| \| \|	Mark-up for inclusion of generated docs. Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
*	mmu: Introduce XENMEM_claim_pages (subop of memory ops)	Dan Magenheimer	2013-03-11	2	-3/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When guests memory consumption is volatile (multiple guests ballooning up/down) we are presented with the problem of being able to determine exactly how much memory there is for allocation of new guests without negatively impacting existing guests. Note that the existing models (xapi, xend) drive the memory consumption from the tool-stack and assume that the guest will eventually hit the memory target. Other models, such as the dynamic memory utilized by tmem, do this differently - the guest drivers the memory consumption (up to the d->max_pages ceiling). With dynamic memory model, the guest frequently can balloon up and down as it sees fit. This presents the problem to the toolstack that it does not know atomically how much free memory there is (as the information gets stale the moment the d->tot_pages information is provided to the tool-stack), and hence when starting a guest can fail during the memory creation process. Especially if the process is done in parallel. In a nutshell what we need is a atomic value of all domains tot_pages during the allocation of guests. Naturally holding a lock for such a long time is unacceptable. Hence the goal of this hypercall is to attempt to atomically and very quickly determine if there are sufficient pages available in the system and, if so, "set aside" that quantity of pages for future allocations by that domain. 
Unlike an existing hypercall such as increase_reservation or populate_physmap, specific physical pageframes are not assigned to the domain because this cannot be done sufficiently quickly (especially for very large allocations in an arbitrarily fragmented system) and so the existing mechanisms result in classic time-of-check-time-of-use (TOCTOU) races. One can think of claiming as similar to a "lazy" allocation, but subsequent hypercalls are required to do the actual physical pageframe allocation. Note that one of the effects of this hypercall is that from the perspective of other running guests - suddenly there is a new guest occupying X amount of pages. This means that when they try to balloon up they will hit the system-wide ceiling of available free memory (if the total sum of the existing d->max_pages >= host memory). This is OK - as that is part of the overcommit. What we DO NOT want to do is dictate what their ceiling should be (d->max_pages) as that is risky and can lead to guests OOM-ing. It is something the guest needs to figure out. In order for a toolstack to "get" information about whether a domain has a claim and, if so, how large, and also for the toolstack to measure the total system-wide claim, a second subop has been added and exposed through domctl and libxl (see "xen: XENMEM_claim_pages: xc"). == Alternative solutions == There has been a variety of discussion whether the problem this hypercall is solving can be done in user-space, such as: - For all the existing guests, set their d->max_pages temporarily to d->tot_pages and create the domain. This forces those domains to stay at their current consumption level (fyi, this is what the tmem freeze call is doing). The disadvantage of this is that it needlessly forces the guests to stay at their current memory usage instead of allowing them to decide the optimal target. - Account only using d->max_pages of how much free memory there is. This ignores ballooning changes and any over-commit scenario. 
This is similar to the scenario where the sum of all d->max_pages (and the one to be allocated now) on the host is smaller than the available free memory. As such it ignores the over-commit problem. - Provide a ring/FIFO along with an event channel to notify a userspace daemon of guests' memory consumption. This daemon can then provide up-to-date information to the toolstack of how much free memory there is. This duplicates what the hypervisor is already doing, introduces latency issues and leaves the toolstack catching its breath, as there might be millions of these updates on a heavily used machine. There might not be any quiescent state ever and the toolstack will heavily consume CPU cycles and not ever provide up-to-date information. It has been noted that this claim mechanism solves the underlying problem (slow failure of domain creation) for a large class of domains but not all, specifically not handling (but also not making the problem worse for) PV domains that specify the "superpages" flag, and 32-bit PV domains on large RAM systems. These will be addressed at a later time. Code overview: Though the hypercall simply does arithmetic within locks, some of the semantics in the code may be a bit subtle. The key variables (d->unclaimed_pages and total_unclaimed_pages) start at zero if no claim has yet been staked for any domain. (Perhaps a better name is "claimed_but_not_yet_possessed" but that's a bit unwieldy.) If no claim hypercalls are executed, there should be no impact on existing usage. When a claim is successfully staked by a domain, it is like a watermark but there is no record kept of the size of the claim. Instead, d->unclaimed_pages is set to the difference between d->tot_pages and the claim. When d->tot_pages increases or decreases, d->unclaimed_pages atomically decreases or increases. Once d->unclaimed_pages reaches zero, the claim is satisfied and d->unclaimed_pages stays at zero -- unless a new claim is subsequently staked. 
The systemwide variable total_unclaimed_pages is always the sum of d->unclaimed_pages, across all domains. A non-domain-specific heap allocation will fail if total_unclaimed_pages exceeds free (plus, on tmem enabled systems, freeable) pages. Claim semantics could be modified by flags. The initial implementation had three flags, which discerned whether the caller would like tmem freeable pages to be considered in determining whether or not the claim can be successfully staked. This was removed in later patches and there are now no flags. A claim can be cancelled by requesting a claim with the number of pages being zero. A second subop returns the total outstanding claimed pages systemwide. Note: Save/restore/migrate may need to be modified, else it can be documented that all claims are cancelled. This patch for the proposed XENMEM_claim_pages hypercall/subop takes into account review feedback from Jan and Keir and IanC and Matthew Daley, plus some fixes found via runtime debugging. Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
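The claim bookkeeping described in the code overview can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the hypervisor code: the `demo_` structs and functions are hypothetical, locking is omitted, tmem freeable pages are ignored, only increases of tot_pages are modelled, and the caller is assumed to keep free_pages at or above total_unclaimed_pages. Rejecting a claim that does not exceed tot_pages is also a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_domain {
    uint64_t tot_pages;       /* pages the domain currently owns */
    uint64_t unclaimed_pages; /* claimed but not yet possessed */
};

struct demo_host {
    uint64_t free_pages;            /* free pages in the heap */
    uint64_t total_unclaimed_pages; /* sum of unclaimed_pages over all domains */
};

/* Stake a claim of `pages` total pages for d, or cancel it with pages == 0. */
static bool demo_claim(struct demo_host *h, struct demo_domain *d, uint64_t pages)
{
    /* Claims held by other domains; d's own old claim is replaced. */
    uint64_t others = h->total_unclaimed_pages - d->unclaimed_pages;
    uint64_t need = 0;

    if (pages != 0) {
        if (pages <= d->tot_pages)
            return false; /* nothing left to claim (sketch-specific choice) */
        need = pages - d->tot_pages;
        if (others > h->free_pages || need > h->free_pages - others)
            return false; /* not enough unreserved memory */
    }
    h->total_unclaimed_pages = others + need;
    d->unclaimed_pages = need;
    return true;
}

/* Give d another n pages, consuming its own claim first. */
static bool demo_alloc(struct demo_host *h, struct demo_domain *d, uint64_t n)
{
    uint64_t from_claim = n < d->unclaimed_pages ? n : d->unclaimed_pages;
    uint64_t unreserved = h->free_pages - h->total_unclaimed_pages;

    if (n - from_claim > unreserved)
        return false; /* would eat into pages claimed by other domains */
    h->free_pages -= n;
    h->total_unclaimed_pages -= from_claim;
    d->unclaimed_pages -= from_claim;
    d->tot_pages += n;
    return true;
}
```

With 1000 free pages, a claim of 600 by one domain leaves 400 unreserved, so a second domain can neither claim nor allocate 500 pages until the first claim is satisfied or cancelled.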
*	x86/MSI: add mechanism to fully protect MSI-X table from PV guest accesses	Jan Beulich	2013-03-08	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds two new physdev operations for Dom0 to invoke when resource allocation for devices is known to be complete, so that the hypervisor can arrange for the respective MMIO ranges to be marked read-only before an eventual guest getting such a device assigned even gets started, such that it won't be able to set up writable mappings for these MMIO ranges before Xen has a chance to protect them. This also addresses another issue with the code being modified here, in that so far write protection for the address ranges in question got set up only once during the lifetime of a device (i.e. until either system shutdown or device hot removal), while teardown happened when the last interrupt was disposed of by the guest (which at least allowed the tables to be writable when the device got assigned to a second guest [instance] after the first terminated). Signed-off-by: Jan Beulich <jbeulich@suse.com>
*	xen: arm: separate guest user regs from internal guest state.	Ian Campbell	2013-02-22	1	-45/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	struct cpu_user_regs is currently used as both internal state (specifically at the base of the stack) and a guest/toolstack visible API (via struct vcpu_guest_context used by XEN_DOMCTL_{g,s}etvcpucontext and VCPUOP_initialise). This causes problems when we want to make the API 64-bit clean since we don't really want to change the size of the on-stack struct. So split into vcpu_guest_core_regs which is the API facing struct and keep cpu_user_regs purely internal, translate between the two. In the user API arrange for both 64- and 32-bit registers to be included in a layout which does not differ depending on toolstack architecture. Also switch to using the more formal banked register names (e.g. with the _usr suffix) for clarity. This is an ABI change. Note that the kernel doesn't currently use this data structure so it affects the tools interface only. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
*	xen: arm64: initial build + config changes, start of day code	Ian Campbell	2013-02-22	3	-2/+16
\| \| \| \| \|	Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
*	xen: event channel arrays are xen_ulong_t and not unsigned long	Ian Campbell	2013-02-22	1	-4/+4
\| \| \| \| \| \| \| \| \| \|	On ARM we want these to be the same size on 32- and 64-bit. This is an ABI change on ARM. X86 does not change. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
*	ACPI: support v5 (reduced HW) sleep interface	Jan Beulich	2013-02-22	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \|	Note that this also fixes a broken input check in acpi_enter_sleep() (previously validating the sleep->pm1[ab]_cnt_val relationship based on acpi_sinfo.pm1b_cnt_val, which however gets set only subsequently). Also adjust a few minor issues with the pre-v5 handling in acpi_fadt_parse_sleep_info(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	gcov: Implement code to read coverage information	Frediano Ziglio	2013-02-21	2	-0/+153
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Operations are handled by a sysctl specific operation. Implement 4 operations - check if coverage is compiled in - read size of coverage information - read coverage information - reset coverage counters The information is stored in a single blob in a Xen-specific format designed to make the code that generates coverage as small as possible. This patch adds a public header with the structure of the blob exported by Xen. This avoids problems with distributing a header which is GPL2. Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
*	Fix emacs local variable block to use correct C style variable.	David Vrabel	2013-02-21	38	-38/+38
\| \| \| \| \| \| \|	The emacs variable to set the C style from a local variable block is c-file-style, not c-set-style. Signed-off-by: David Vrabel <david.vrabel@citrix.com>
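For orientation, a local variable block of the kind this commit touches might look like the following at the end of a C source file. Only the `c-file-style` variable name comes from the message; the style value and the other settings are assumptions shown for illustration.

```c
/*
 * Local variables:
 * mode: C
 * c-file-style: "BSD"
 * c-basic-offset: 4
 * indent-tabs-mode: nil
 * End:
 */
```

Emacs reads this block when it opens the file and applies the named style to the buffer.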
*	hvm: Allow triple fault to imply crash rather than reboot	Andrew Cooper	2013-02-15	2	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	While the triple fault action on native hardware will result in a system reset, any modern operating system can and will make use of less violent reboot methods. As a result, the most likely cause of a triple fault is a fatal software bug. This patch allows the toolstack to indicate that a triple fault should mean a crash rather than a reboot. The default of reboot still remains the same. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
*	x86/EFI: simplify PCI option ROM retrieval	Jan Beulich	2013-02-04	1	-2/+2
\| \| \| \| \| \| \| \| \|	While putting together the kernel side of this I realized that c/s 26397:d9c7b82aa7b1 went a little too far in requiring a buffer for the option ROM contents - all that is really needed is handing Dom0 physical address and size of the data block. Signed-off-by: Jan Beulich <jbeulich@suse.com>