xen/xen - xen

	Commit message	Author	Age	Files	Lines
*	x86: don't allow Dom0 access to the HT address range	Jan Beulich	2013-08-27	1	-0/+4
\| \| \| \| \| \|	In particular, MMIO assignments should not be done using this area. Signed-off-by: Jan Beulich <jbeulich@suse.com>
*	x86: don't allow Dom0 access to the MSI address range	Jan Beulich	2013-08-27	1	-0/+4
\| \| \| \| \| \| \|	In particular, MMIO assignments should not be done using this area. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
*	libelf: Make all callers call elf_check_broken	Ian Jackson	2013-06-14	1	-7/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This arranges that if the new pointer reference error checking tripped, we actually get a message about it. In this patch these messages do not change the actual return values from the various functions: so pointer reference errors do not prevent loading. This is for fear that some existing kernels might cause the code to make these wild references, which would then break, which is not a good thing in a security patch. In xen/arch/x86/domain_build.c we have to introduce an "out" label and change all of the "return rc" beyond the relevant point into "goto out". This is part of the fix to a security issue, XSA-55. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> v5: Fix two whitespace errors. v3.1: Add error check to xc_dom_parse_elf_kernel. Move check in xc_hvm_build_x86.c:setup_guest to right place. v2 was Acked-by: Ian Campbell <ian.campbell@citrix.com> v2 was Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> v2: Style fixes.
*	libelf: check all pointer accesses	Ian Jackson	2013-06-14	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We change the ELF_PTRVAL and ELF_HANDLE types and associated macros: * PTRVAL becomes a uintptr_t, for which we provide a typedef elf_ptrval. This means no arithmetic done on it can overflow so the compiler cannot do any malicious invalid pointer arithmetic "optimisations". It also means that any places where we dereference one of these pointers without using the appropriate macros or functions become a compilation error. So we can be sure that we won't miss any memory accesses. All the PTRVAL variables were previously void* or char, so the actual address calculations are unchanged. ELF_HANDLE becomes a union, one half of which keeps the pointer value and the other half of which is just there to record the type. The new type is not a pointer type so there can be no address calculations on it whose meaning would change. Every assignment or access has to go through one of our macros. * The distinction between const and non-const pointers and chars and voids in libelf goes away. This was not important (and anyway libelf tended to cast away const in various places). * The fields elf->image and elf->dest are renamed. That proves that we haven't missed any unchecked uses of these actual pointer values. * The caller may fill in elf->caller_xdest_base and _size to specify another range of memory which is safe for libelf to access, besides the input and output images. * When accesses fail due to being out of range, we mark the elf "broken". This will be checked and used for diagnostics in a following patch. We do not check for write accesses to the input image. This is because libelf actually does this in a number of places. 
So we simply permit that. * Each caller of libelf which used to set dest now sets dest_base and dest_size. * In xc_dom_load_elf_symtab we provide a new actual-pointer value hdr_ptr which we get from mapping the guest's kernel area and use (checking carefully) as the caller_xdest area. * The STAR(h) macro in libelf-dominfo.c now uses elf_access_unsigned. * elf-init uses the new elf_uval_3264 accessor to access the 32-bit fields, rather than an unchecked field access (ie, unchecked pointer access). * elf_uval has been reworked to use elf_uval_3264. Both of these macros are essentially new in this patch (although they are derived from the old elf_uval) and need careful review. * ELF_ADVANCE_DEST is now safe in the sense that you can use it to chop parts off the front of the dest area but if you chop more than is available, the dest area is simply set to be empty, preventing future accesses. * We introduce some #defines for memcpy, memset, memmove and strcpy: - We provide elf_memcpy_safe and elf_memset_safe which take PTRVALs and do checking on the supplied pointers. - Users inside libelf must all be changed to either elf_mem_unchecked (which are just like mem), or elf_mem_safe (which take PTRVALs) and are checked. Any unchanged call sites become compilation errors. We do _not_ at this time fix elf_access_unsigned so that it doesn't make unaligned accesses. We hope that unaligned accesses are OK on every supported architecture. But it does check the supplied pointer for validity. This is part of the fix to a security issue, XSA-55. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> v7: Remove a spurious whitespace change. v5: Use allow_size value from xc_dom_vaddr_to_ptr to set xdest_size correctly. If ELF_ADVANCE_DEST advances past the end, mark the elf broken. Always regard NULL allowable region pointers (e.g. dest_base) as invalid (since NULL pointers don't point anywhere). 
v4: Fix ELF_UNSAFE_PTR to work on 32-bit even when provided 64-bit values. Fix xc_dom_load_elf_symtab not to call XC_DOM_PAGE_SIZE unnecessarily if load is false. This was a regression. v3.1: Introduce a change to elf_store_field to undo the effects of the v3.1 change to the previous patch (the definition there is not compatible with the new types). v3: Fix a whitespace error. v2 was Acked-by: Ian Campbell <ian.campbell@citrix.com> v2: BUGFIX: elf_strval: Fix loop termination condition to actually work. BUGFIX: elf_strval: Fix return value to not always be totally wild. BUGFIX: xc_dom_load_elf_symtab: do proper check for small header size. xc_dom_load_elf_symtab: narrow scope of `hdr_ptr'. xc_dom_load_elf_symtab: split out uninit'd symtab.class ref fix. More comments on the lifetime/validity of elf-> dest ptrs etc. libelf.h: write "obsolete" out in full libelf.h: rename "dontuse" to "typeonly" and add doc comment elf_ptrval_in_range: Document trustedness of arguments. Style and commit message fixes.
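The checked-access idea described in this commit message can be illustrated with a small sketch. This is not libelf's code: `ptrval_t`, `struct region`, `struct image`, `in_range` and `read_u32` are hypothetical names chosen for illustration. It assumes pointer values are carried as `uintptr_t`, that a NULL region base is never valid, and that a failed check records a "broken" reason and yields 0 instead of touching memory. Unlike the accessor described above, the sketch copies with `memcpy`, which also sidesteps unaligned loads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an integer-typed pointer value. */
typedef uintptr_t ptrval_t;

/* A range of memory the loader is allowed to touch. */
struct region {
    const unsigned char *base;
    size_t size;
};

struct image {
    struct region input;  /* memory the loader may read */
    const char *broken;   /* first failure reason, or NULL if none so far */
};

/* True if [ptr, ptr + len) lies entirely inside the region. */
static bool in_range(const struct region *r, ptrval_t ptr, size_t len)
{
    ptrval_t start = (ptrval_t)r->base;
    ptrval_t off;

    if (r->base == NULL || ptr < start)
        return false;
    off = ptr - start;
    /* Written so that neither comparison can overflow. */
    return off <= r->size && len <= r->size - off;
}

/* Checked 32-bit read: out-of-range accesses mark the image broken. */
static uint32_t read_u32(struct image *img, ptrval_t ptr)
{
    uint32_t v;

    assert(img != NULL);
    if (!in_range(&img->input, ptr, sizeof(v))) {
        if (img->broken == NULL)
            img->broken = "out of range access";
        return 0;
    }
    memcpy(&v, (const void *)ptr, sizeof(v));
    return v;
}
```

A caller would keep loading after a failed read and inspect `broken` once at the end, which mirrors the "check and report later" split between this commit and the following one.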
*	x86: allow Dom0 read-only access to IO-APICs	Jan Beulich	2013-05-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are BIOSes that want to map the IO-APIC MMIO region from some ACPI method(s), and there is at least one BIOS flavor that wants to use this mapping to clear an RTE's mask bit. While we can't allow the latter, we can permit reads and simply drop write attempts, leveraging the already existing infrastructure introduced for dealing with AMD IOMMUs' representation as PCI devices. This fixes an interrupt setup problem on a system where _CRS evaluation involved the above described BIOS/ACPI behavior, and is expected to also deal with a boot time crash of pv-ops Linux upon encountering the same kind of system. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86: fix various issues with handling guest IRQs	Jan Beulich	2013-04-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	- properly revoke IRQ access in map_domain_pirq() error path - don't permit replacing an in use IRQ - don't accept inputs in the GSI range for MAP_PIRQ_TYPE_MSI - track IRQ access permission in host IRQ terms, not guest IRQ ones (and with that, also disallow Dom0 access to IRQ0) This is CVE-2013-1919 / XSA-46. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
*	Fix emacs local variable block to use correct C style variable.	David Vrabel	2013-02-21	1	-1/+1
\| \| \| \| \| \| \|	The emacs variable to set the C style from a local variable block is c-file-style, not c-set-style. Signed-off-by: David Vrabel <david.vrabel@citrix.com>
*	x86: consolidate initialization of PV guest L4 page tables	Jan Beulich	2013-01-23	1	-7/+1
\| \| \| \| \| \| \|	So far this has been repeated in 3 places, requiring to remember to update all of them if a change is being made. Signed-off-by: Jan Beulich <jbeulich@suse.com>
*	x86: properly use map_domain_page() when building Dom0	Jan Beulich	2013-01-23	1	-25/+56
\| \| \| \| \| \| \| \| \|	This requires a minor hack to allow the correct page tables to be used while running on Dom0's page tables (as they can't be determined from "current" at that time). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	printk: prefer %#x et al. over 0x%x	Jan Beulich	2012-09-21	1	-2/+2
\| \| \| \| \| \| \| \| \|	Performance is not an issue with printk(), so let the function do minimally more work and instead save a byte per affected format specifier. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86: We can assume CONFIG_PAGING_LEVELS==4.	Keir Fraser	2012-09-12	1	-25/+9
\| \| \| \|	Signed-off-by: Keir Fraser <keir@xen.org>
*	xen: Remove x86_32 build target.	Keir Fraser	2012-09-12	1	-171/+3
\| \| \| \|	Signed-off-by: Keir Fraser <keir@xen.org>
*	x86: make the dom0_max_vcpus option more flexible	David Vrabel	2012-09-11	1	-12/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The dom0_max_vcpus command line option only allows the exact number of VCPUs for dom0 to be set. It is not possible to say "up to N VCPUs but no more than the number physically present." Allow a range for the option to set a minimum number of VCPUs, and a maximum which does not exceed the number of PCPUs. For example, with "dom0_max_vcpus=4-8", the number of Dom0 VCPUs by PCPU count is: 2 PCPUs -> 4 VCPUs, 4 -> 4, 6 -> 6, 8 -> 8, 10 -> 8. Existing command lines with "dom0_max_vcpus=N" still work as before (and are equivalent to dom0_max_vcpus=N-N). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Committed-by: Jan Beulich <jbeulich@suse.com>
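The range semantics in this message amount to clamping the PCPU count into [min, max]. A minimal sketch follows, assuming that reading; it is not Xen's parser. `struct vcpu_range`, `parse_uint`, `parse_vcpu_range` and `dom0_vcpus` are hypothetical names, and rejecting a range whose minimum exceeds its maximum is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical holder for a parsed "N" or "MIN-MAX" option value. */
struct vcpu_range {
    unsigned int min, max;
};

/* Parse one unsigned decimal number and advance *s past it. */
static int parse_uint(const char **s, unsigned int *out)
{
    char *end;
    unsigned long v;

    if (!isdigit((unsigned char)**s))
        return -1;
    v = strtoul(*s, &end, 10);
    if (v > UINT_MAX)
        return -1;
    *out = (unsigned int)v;
    *s = end;
    return 0;
}

/* Accept "N" (treated as N-N) or "MIN-MAX"; return 0 on success. */
static int parse_vcpu_range(const char *s, struct vcpu_range *r)
{
    if (parse_uint(&s, &r->min))
        return -1;
    r->max = r->min;
    if (*s == '-') {
        s++;
        if (parse_uint(&s, &r->max))
            return -1;
    }
    /* Assumption: trailing characters and inverted ranges are errors. */
    if (*s != '\0' || r->min > r->max)
        return -1;
    return 0;
}

/* Clamp the physical CPU count into the configured range. */
static unsigned int dom0_vcpus(const struct vcpu_range *r, unsigned int pcpus)
{
    assert(r->min <= r->max);
    if (pcpus < r->min)
        return r->min;
    if (pcpus > r->max)
        return r->max;
    return pcpus;
}
```

With "4-8" this reproduces the example mapping from the message: 2, 4, 6, 8 and 10 PCPUs give 4, 4, 6, 8 and 8 VCPUs respectively.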
*	x86/32-on-64: adjust Dom0 initial page table layout	Jan Beulich	2012-09-07	1	-5/+11
\| \| \| \| \| \| \| \| \| \|	Drop the unnecessary reservation of the L4 page for 32on64 Dom0, and allocate its L3 first (to match behavior when running identical bit-width hypervisor and Dom0 kernel). Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	libelf-loader: introduce elf_load_image	Stefano Stabellini	2012-01-23	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement a new function, called elf_load_image, to perform the actual copy of the elf image and clearing the padding. The function is implemented as memcpy and memset when the library is built as part of the tools, but it is implemented as raw_copy_to_guest and raw_clear_guest when built as part of Xen, so that it can be safely called with an HVM style dom0. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
*	cpupools: allocate CPU masks dynamically	Jan Beulich	2011-10-21	1	-2/+2
\| \| \| \| \|	Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	use xzalloc in x86 code	Jan Beulich	2011-10-04	1	-2/+1
\| \| \| \| \| \| \|	This includes the removal of a redundant memset() from microcode_amd.c. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
*	x86: don't limit dom0's maximum reservation by the available memory	David Vrabel	2011-08-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Set dom0's initial maximum reservation using the max value supplied in the dom0_mem command line option without limiting it by the available memory. This allows dom0 to make use of any hotplugged memory without having to also adjust the maximum reservation. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Jan Beulich <jbeulich@novell.com>
*	x86: use 'dom0_mem' to limit the number of pages for dom0	David Vrabel	2011-08-22	1	-0/+2
\| \| \| \| \| \| \| \| \|	Use the 'dom0_mem' command line option to set the maximum number of pages for dom0. dom0 can then use the XENMEM_maximum_reservation memory op to automatically find this limit and reduce the size of any page tables etc. Signed-off-by: David Vrabel <david.vrabel@citrix.com>
*	fix regression from c/s 23735:537918f518ee	Jan Beulich	2011-07-25	1	-1/+1
\| \| \| \| \| \| \| \|	This was checking presence of the wrong (old) ELF note. I don't really understand how this failed consistently only for one of the xen-boot tests... Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	add privileged (dom0) kernel feature indication	Jan Beulich	2011-07-23	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With our switching away from supporting 32-bit Dom0 operation, users complained that attempts (perhaps due to lack of knowledge of that change) to boot the no longer privileged kernel in Dom0 resulted in apparently silent failure. To make the mismatch explicit and visible, add a dom0 feature flag that the kernel can set to indicate operation as dom0 is supported. Due to the way elf_xen_parse_features() worked up to now (getting fixed here), adding feature indications to the old, string based ELF note would make the respective kernel unusable on older hypervisors. For that reason, a new ELF Note is being introduced that allows specifying supported features as a bit array instead (with features unknown to the hypervisor simply ignored, as now also done by elf_xen_parse_features(), whereas here unknown kernel-required features still keep the kernel [and hence VM] from booting). Introduce and use elf_note_numeric_array() to be forward compatible (or else an old hypervisor wouldn't be able to parse kernel specified features occupying more than 64 bits - thanks, Ian!). Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	Revert 23664:3e3c0a8be9f9	Keir Fraser	2011-07-08	1	-8/+0
\|
*	add privileged/unprivileged kernel feature indication	Jan Beulich	2011-07-08	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	With our switching away from supporting 32-bit Dom0 operation, users complained that attempts (perhaps due to lack of knowledge of that change) to boot the no longer privileged kernel in Dom0 resulted in apparently silent failure. To make the mismatch explicit and visible, add feature flags that the kernel can set to indicate operation in what modes it supports. For backward compatibility, absence of both feature flags is taken to indicate a kernel that may be capable of operating in both modes. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86: split struct vcpu	Jan Beulich	2011-04-05	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is accomplished by splitting the guest_context member, which by itself is larger than a page on x86-64. Quite a number of fields of this structure are completely meaningless for HVM guests, and thus a new struct pv_vcpu gets introduced, which is being overlaid with struct hvm_vcpu in struct arch_vcpu. The one member that is mostly responsible for the large size is trap_ctxt, which now gets allocated separately (unless fitting on the same page as struct arch_vcpu, as is currently the case for x86-32), and only for non-hvm, non-idle domains. This change pointed out a latent problem in arch_set_info_guest(), which is permitted to be called on already initialized vCPU-s, but so far copied the new state into struct arch_vcpu without (in this case) actually going through all the necessary accounting/validation steps. The logic gets changed so that the pieces that bypass accounting will at least be verified to be no different from the currently active bits, and the whole change will fail in case they are. The logic does not get adjusted here to do full error recovery, that is, partially modified state continues to not get unrolled in case of failure. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	Define new <pfn.h> header for PFN_{DOWN,UP} macros.	Keir Fraser	2011-03-23	1	-0/+1
\| \| \| \|	Signed-off-by: Keir Fraser <keir@xen.org>
*	Use bool_t for various boolean variables	Keir Fraser	2010-12-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	... decreasing cache footprint. As a prerequisite this requires making cmdline_parse() a little more flexible. Also remove a few variables altogether, and adjust section annotations for several others. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Keir Fraser <keir@xen.org>
*	x86/iommu: account for necessary allocations when calculating Dom0's initial allocation size	Keir Fraser	2010-12-14	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As of c/s 21812:e382656e4dcc, IOMMU related allocations for Dom0 happen only after it got all of its memory allocated, and hence the reserve (mainly for setting up its swiotlb) may get exhausted without accounting for the necessary allocations up front. While not precise, the estimate has been found to be within a couple of pages for the systems it got tested on. For the calculation to be reasonably correct, this depends on the patch titled "x86/iommu: don't map RAM holes above 4G" sent out yesterday. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86_64: Fix booting 32-bit dom0	Keir Fraser	2010-11-17	1	-0/+2
\| \| \| \| \| \|	dom0/vcpu0 was not getting allocated a hypercall xlat area. Signed-off-by: Keir Fraser <keir@xen.org>
*	x86: allow passing initrd to kernel without exposing it through the initial mapping	Keir Fraser	2010-11-09	1	-27/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The (Dom0 only for now) kernel can indicate that it doesn't need its initrd mapped through a newly added ELF note - it gets passed the PFN of the initrd in this case instead of the virtual address. Even for kernels not making use of the new feature, the initrd will no longer get copied into the initial mapping, but the memory it lives in will get assigned to and mapped for the guest instead. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86: do away with the boot time low-memory 1:1 mapping	Keir Fraser	2010-11-09	1	-35/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By doing so, we're no longer restricted to be able to place all boot loader modules into the low 1Gb/4Gb (32-/64-bit) of memory, nor is there a dependency anymore on where the boot loader places the modules. We're also no longer restricted to copy the modules into a place below 4Gb, nor to put them all together into a single piece of memory. Further it allows even the 32-bit Dom0 kernel to be loaded anywhere in physical memory (except if it doesn't support PAE-above-4G). Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	iommu: Map dom0 initial allocation in 'dom0-strict' iommu mode.	Keir Fraser	2010-07-16	1	-0/+2
\| \| \| \|	Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
*	x86: put_superpage() must also work for !opt_allow_superpage	Keir Fraser	2010-06-15	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	This is because the P2M table, when placed at a kernel specified location, gets populated with large pages, which the domain must have a way to unmap/recycle. Additionally when allowing Dom0 to use superpages, they ought to be tracked accordingly in the superpage frame table. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86: Do not include apic.h/io_apic.h from asm/smp.h	Keir Fraser	2010-06-11	1	-0/+1
\| \| \| \| \| \|	...and fix up the ensuing fall-out of implicit dependencies Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
*	x86: fix Dom0 booting time regression	Keir Fraser	2010-05-04	1	-12/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unfortunately the changes in c/s 21035 caused boot time to go up significantly on certain large systems. To rectify this without going back to the old behavior, introduce a new memory allocation flag so that Dom0 allocations can exhaust non-DMA memory before starting to consume DMA memory. For the latter, the behavior introduced in aforementioned c/s gets retained, while for the former we can now even try larger chunks first. This builds on the fact that alloc_chunk() gets called with non-increasing 'max_pages' arguments, and hence it can store locally the allocation order last used (as larger order allocations can't succeed during subsequent invocations if they failed once). Signed-off-by: Jan Beulich <jbeulich@novell.com>
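The "remember the last order used" observation can be sketched in isolation. This is not Xen's alloc_chunk(): `alloc_chunk_sketch`, `try_alloc_fn`, `MAX_ORDER_SKETCH`, `demo_try_alloc` and `attempts` are hypothetical, the allocator is a mock that only succeeds for small orders, and the DMA versus non-DMA flag from the message is left out. It assumes, as the message states, that callers pass non-increasing `max_pages`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator hook: nonzero if a 2^order page block is available. */
typedef int (*try_alloc_fn)(unsigned int order);

#define MAX_ORDER_SKETCH 9

/* Order that last succeeded; larger orders already failed once. */
static unsigned int last_order = MAX_ORDER_SKETCH;

/* Returns the number of pages obtained, or 0 if nothing could be allocated. */
static unsigned long alloc_chunk_sketch(unsigned long max_pages,
                                        try_alloc_fn try_alloc)
{
    unsigned int order = last_order;

    assert(try_alloc != NULL);
    if (max_pages == 0)
        return 0;

    /* Never hand out more pages than were asked for. */
    while (order > 0 && (1UL << order) > max_pages)
        order--;

    for (;;) {
        if (try_alloc(order)) {
            last_order = order;  /* skip the larger orders next time */
            return 1UL << order;
        }
        if (order == 0)
            return 0;
        order--;
    }
}

/* Mock allocator for demonstration: only orders up to 3 ever succeed. */
static unsigned int attempts;

static int demo_try_alloc(unsigned int order)
{
    attempts++;
    return order <= 3;
}
```

With the mock, a first call walks down from order 9 to order 3, and later calls start directly at order 3, which is where the boot-time saving would come from under these assumptions.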
*	cpupools [1/6]: hypervisor changes	Keir Fraser	2010-04-21	1	-3/+8
\| \| \| \|	Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
*	x86: adjust Dom0 initial memory allocation strategy	Keir Fraser	2010-03-15	1	-1/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simply trying order-9 allocations until they won't succeed anymore may consume unnecessarily much memory from the DMA zone (since the page allocator will try to fulfill the request by using memory from that zone when only lower order memory blocks are left in all other zones). To avoid using DMA zone memory, make alloc_chunk() try to allocate a second smaller chunk and use that one in favor of the first one if it came from a higher addressed memory. This way, all memory outside the DMA zone will be consumed before eating into that zone. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86: adjust available memory calculation for Dom0 construction	Keir Fraser	2010-03-11	1	-20/+18
\| \| \| \| \| \| \| \| \| \| \| \|	With a large number of CPUs, the amount of memory needed to construct the vCPU structures for Dom0 becomes significant and hence should be accounted for when calculating the amount of memory to pass to Dom0. Signed-off-by: Jan Beulich <jbeulich@novell.com> Add code comments and clean up compute_dom0_nr_pages() invocation. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
*	Replace process_pending_timers() with process_pending_softirqs().	Keir Fraser	2009-12-22	1	-3/+3
\| \| \| \| \| \| \| \| \|	This ensures that any critical softirqs are handled in a timely manner (e.g., TIME_CALIBRATE_SOFTIRQ) while still avoiding being preempted by the scheduler (by SCHEDULE_SOFTIRQ), which is the reason for avoiding use of do_softirq() directly. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
*	M2P translation cannot be handled through flat table with only one slot per MFN	Keir Fraser	2009-12-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	when an MFN is shared. However, all existing calls can either infer the GFN (for example p2m table destructor) or will not need to know GFN for shared pages. This patch identifies and fixes all the M2P accessors, either by removing the translation altogether or by making the relevant modifications. Shared MFNs have a special value of SHARED_M2P_ENTRY stored in their M2P table slot. Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
*	x86: deny access to the ACPI PM timer I/O port range for Dom0	Keir Fraser	2009-10-28	1	-0/+4
\| \| \| \| \| \|	Also move the declaration of pmtmr_ioport to a suitable header file. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	Scattered code arrangement cleanups.	Keir Fraser	2009-10-07	1	-6/+1
\| \| \| \| \| \| \| \|	- remove redundant declarations - add/move prototypes to headers - move things where they belong to Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
*	properly __initdata-annotate command line option string buffers	Keir Fraser	2009-08-31	1	-2/+2
\| \| \| \|	Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86: run timers when populating Dom0's P2M table	Keir Fraser	2009-08-24	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	When booting Dom0 with huge amounts of memory, and/or memory accesses being sufficiently slow (due to NUMA effects), and the ACPI PM timer or a high frequency HPET being used, the time it takes to populate the M2P table may significantly exceed the overflow time of the platform timer, screwing up time management to the point where Dom0 boot fails. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86-64: do not pass unmanageable amounts of memory to Dom0	Keir Fraser	2009-06-18	1	-18/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to address space restrictions it is not possible to successfully pass more than about 500Gb to a Linux Dom0 unless its kernel specifies a non-default phys-to-machine map location via XEN_ELFNOTE_INIT_P2M. For non-Linux Dom0 kernels I can't say whether the limit could be set to close to 1Tb, but since passing such huge amounts of memory isn't very useful anyway (and can be enforced via dom0_mem=), the patch doesn't attempt to guess the kernel type and restricts the memory amount in all cases. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86_64: allow more vCPU-s per guest	Keir Fraser	2009-06-18	1	-6/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the shared info layout is fixed, guests are required to use VCPUOP_register_vcpu_info prior to booting any vCPU beyond the traditional limit of 32. MAX_VIRT_CPUS, being an implementation detail of the hypervisor, is no longer being exposed in the public headers. The tools changes are clearly incomplete (and done only so things would build again), and the current state of the tools (using scalar variables all over the place to represent vCPU bitmaps) very likely doesn't permit booting DomU-s with more than the traditional number of vCPU-s. Testing of the extended functionality was done with Dom0 (96 vCPU-s, as well as 128 vCPU-s out of which the kernel elected - by way of a simple kernel side patch - to use only some, resulting in a sparse bitmap). ia64 changes only to make things build, and build-tested only (and the tools part only as far as the build would go without encountering unrelated problems in the blktap code). Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86_64: don't allocate L1 per-domain page table pages in a single chunk	Keir Fraser	2009-06-18	1	-2/+1
\| \| \| \| \| \| \| \| \| \|	Instead, allocate them on demand, and adjust the consumer to no longer assume the allocated space is contiguous. This is another prerequisite to extending the number of vCPU-s the hypervisor can support per guest. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86: eliminate hard-coded NR_IRQS	Keir Fraser	2009-05-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... splitting it into global nr_irqs (determined at boot time) and per-domain nr_pirqs (derived from nr_irqs and a possibly command line specified value, which probably should later become a per-domain config setting). This has the (desirable imo) side effect of reducing the size of struct hvm_irq_dpci from requiring an order-3 page to order-2 (on x86-64), which nevertheless still is too large. However, there is now a variable size bit array on the stack in pt_irq_time_out() - while for the moment this probably is okay, it certainly doesn't look nice. However, replacing this with a static (pre-)allocation also seems less than ideal, because that would require at least min(d->nr_pirqs, NR_VECTORS) bit arrays of d->nr_pirqs bits, since this bit array is used outside of the serialized code region in that function, and keeping the domain's event lock acquired across pirq_guest_eoi() doesn't look like a good idea either. The IRQ- and vector-indexed arrays hanging off struct hvm_irq_dpci could in fact be changed further to dynamically use the smaller of the two ranges for indexing, since there are other assumptions about a one-to-one relationship between IRQs and vectors here and elsewhere. Additionally, it seems to me that struct hvm_mirq_dpci_mapping's digl_list and gmsi fields could really be overlaid, which would yield significant savings since this structure gets always instantiated in form of d->nr_pirqs (as per the above could also be the smaller of this and NR_VECTORS) dimensioned arrays. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86-64: use MFNs for linking together pages on lists	Keir Fraser	2009-01-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Unless more than 16Tb are going to ever be supported in Xen, this will allow reducing the linked list entries in struct page_info from 16 to 8 bytes. This doesn't modify struct shadow_page_info, yet, so in order to meet the constraints of that 'mirror' structure the list entry gets artificially forced to be 16 bytes in size. That workaround will be removed in a subsequent patch. Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86: use alloc_domheap_page() consistently in dom0 building	Keir Fraser	2009-01-27	1	-4/+4
\| \| \| \|	Signed-off-by: Jan Beulich <jbeulich@novell.com>
*	x86: Support booting a bzImage format domain 0 kernel.	Keir Fraser	2009-01-22	1	-1/+10
\| \| \| \| \| \| \| \|	This requires a bzImage v2.08 or later kernel. xen/common/inflate.c is taken unmodified from Linux v2.6.28. Signed-off-by: Ian Campbell <ian.campbell@citrix.com>