- Sep 22, 2017
-
-
Peter Xu authored
We had a per-chardev cache for context, then we don't need this parameter to be passed in every time when chr_update_read_handler() called. As long as we are calling chr_update_read_handler() using qemu_chr_be_update_read_handlers() we'll be fine. Signed-off-by:
Peter Xu <peterx@redhat.com> Message-Id: <1505975754-21555-5-git-send-email-peterx@redhat.com> Reviewed-by:
Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Peter Xu authored
It was only passed in by chr_update_read_handlers(). However when reconnect, we'll lose that context information. So if a chardev was running on another context (rather than the default context, the NULL pointer), it'll switch back to the default context if reconnection happens. But, it should really stick to the old context. Convert all the callers of io_add_watch_poll() to use the internally cached gcontext. Then the context should be able to survive even after reconnections. Signed-off-by:
Peter Xu <peterx@redhat.com> Message-Id: <1505975754-21555-4-git-send-email-peterx@redhat.com> Reviewed-by:
Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Peter Xu authored
It caches the gcontext that is used to poll the chardev IO. Before this patch, we only passed it in via chr_update_read_handlers(). However that may not be enough if the char backend is disconnected and reconnected afterward. There are chardev codes that still assumed the context be NULL (which is the main context). Will fix that up in following up patches. Signed-off-by:
Peter Xu <peterx@redhat.com> Message-Id: <1505975754-21555-3-git-send-email-peterx@redhat.com> Reviewed-by:
Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Peter Xu authored
Add a wrapper for the chr_update_read_handler(). Signed-off-by:
Peter Xu <peterx@redhat.com> Message-Id: <1505975754-21555-2-git-send-email-peterx@redhat.com> Reviewed-by:
Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
This adds a concrete subclass of pr-manager that talks to qemu-pr-helper. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Proper support of persistent reservation for multipath devices requires communication with the multipath daemon, so that the reservation is registered and applied when a path comes up. The device mapper utilities provide a library to do so; this patch makes qemu-pr-helper.c detect multipath devices and, when one is found, delegate the operation to libmpathpersist. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Introduce a privileged helper to run persistent reservation commands. This lets virtual machines send persistent reservations without using CAP_SYS_RAWIO or out-of-tree patches. The helper uses Unix permissions and SCM_RIGHTS to restrict access to processes that can access its socket and prove that they have an open file descriptor for a raw SCSI device. The next patch will also correct the usage of persistent reservations with multipath devices. It would also be possible to support for Linux's IOC_PR_* ioctls in the future, to support NVMe devices. For now, however, only SCSI is supported. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Sep 21, 2017
-
-
Paolo Bonzini authored
It is a common requirement for virtual machine to send persistent reservations, but this currently requires either running QEMU with CAP_SYS_RAWIO, or using out-of-tree patches that let an unprivileged QEMU bypass Linux's filter on SG_IO commands. As an alternative mechanism, the next patches will introduce a privileged helper to run persistent reservation commands without expanding QEMU's attack surface unnecessarily. The helper is invoked through a "pr-manager" QOM object, to which file-posix.c passes SG_IO requests for PERSISTENT RESERVE OUT and PERSISTENT RESERVE IN commands. For example: $ qemu-system-x86_64 -device virtio-scsi \ -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0 -device scsi-block,drive=hd or: $ qemu-system-x86_64 -device virtio-scsi \ -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 -device scsi-block,drive=hd Multiple pr-manager implementations are conceivable and possible, though only one is implemented right now. For example, a pr-manager could: - talk directly to the multipath daemon from a privileged QEMU (i.e. QEMU links to libmpathpersist); this makes reservation work properly with multipath, but still requires CAP_SYS_RAWIO - use the Linux IOC_PR_* ioctls (they require CAP_SYS_ADMIN though) - more interestingly, implement reservations directly in QEMU through file system locks or a shared database (e.g. sqlite) Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
This shares an cached empty FlatView among address spaces. The empty FV is used every time when a root MR renders into a FV without memory sections which happens when MR or its children are not enabled or zero-sized. The empty_view is not NULL to keep the rest of memory API intact; it also has a dispatch tree for the same reason. On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this halves the amount of FlatView's in use (557 -> 260) and dispatch tables (~800000 -> ~370000). In an unrelated experiment with 112 non-virtio devices on x86 ("-M pc"), only 4 FlatViews are alive, and about ~2000 are created at startup. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-16-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
A container can be used instead of an alias to allow switching between multiple subregions. In this case we cannot directly share the subregions (since they only belong to a single parent), but if the subregions are aliases we can in turn walk those. This is not enough to remove all source of quadratic FlatView creation, but it enables sharing of the PCI bus master FlatViews (and their AddressSpaceDispatch structures) across all PCI devices. For 112 virtio-net-pci devices, boot time is reduced from 25 to 10 seconds and memory consumption from 1.4 to 1 G. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
This avoids usual memory_region_transaction_commit() which rebuilds all FVs. On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this brings down the boot time from 25s to 20s and reduces the amount of temporary FVs allocated during machine constructon (~800000 -> ~640000) and amount of temporary dispatch trees (~370000 -> ~300000), the total memory footprint goes down (18G -> 17G). Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-18-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
Since FlatViews are shared now and ASes not, this gets rid of address_space_init_shareable(). This should cause no behavioural change. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-17-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
This adds a new "-d" switch to "info mtree" to print dispatch tree internals. This changes the way "-f" is handled - it prints now flat views and associated address spaces. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-15-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
This creates a new AS object without any FlatView as memory_region_transaction_commit() may want to reuse the empty FV. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-14-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
This allows sharing flat views between address spaces (AS) when the same root memory region is used when creating a new address space. This is done by walking through all ASes and caching one FlatView per a physical root MR (i.e. not aliased). This removes search for duplicates from address_space_init_shareable() as FlatViews are shared elsewhere and keeping as::ref_count correct seems an unnecessary and useless complication. This should cause no change and memory use or boot time yet. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-13-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
So it is called (twice) from the same function. This is to make the next patches a bit simpler. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-12-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
This is to make next patches simpler. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-11-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
Address spaces get to keep a root MR (alias or not) but FlatView stores the actual MR as this is going to be used later on to decide whether to share a particular FlatView or not. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-10-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
This renames some helpers to reflect better what they do. This should cause no behavioural change. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-9-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
We store AddressSpaceDispatch* in FlatView anyway so there is no need to carry it from mem_add() to register_subpage/register_multipage. This should cause no behavioural change. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-8-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
FlatView's will be shared between AddressSpace's and subpage_t and MemoryRegionSection cannot store AS anymore, hence this change. In particular, for: typedef struct subpage_t { MemoryRegion iomem; - AddressSpace *as; + FlatView *fv; hwaddr base; uint16_t sub_section[]; } subpage_t; struct MemoryRegionSection { MemoryRegion *mr; - AddressSpace *address_space; + FlatView *fv; hwaddr offset_within_region; Int128 size; hwaddr offset_within_address_space; bool readonly; }; This should cause no behavioural change. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-7-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
AS in ASD is only used to pass AS from mem_begin() to register_subpage() to store it in MemoryRegionSection, we can do this directly now. This should cause no behavioural change. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-6-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
As we are going to share FlatView's between AddressSpace's, and AddressSpaceDispatch is a structure to perform quick lookup in FlatView, this moves ASD to FlatView. After previosly open coded ASD rendering, we can also remove as->next_dispatch as the new FlatView pointer is stored on a stack and set to an AS atomically. flatview_destroy() is executed under RCU instead of address_space_dispatch_free() now. This makes mem_begin/mem_commit to work with ASD and mem_add with FV as later on mem_add will be taking FV as an argument anyway. This should cause no behavioural change. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-5-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
This moves a FlatView allocation and initialization to a helper. While we are nere, replace g_new with g_new0 to not to bother if we add new fields in the future. This should cause no behavioural change. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-4-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
We are going to share FlatView's between AddressSpace's and per-AS memory listeners won't suit the purpose anymore so open code the dispatch tree rendering. Since there is a good chance that dispatch_listener was the only listener, this avoids address_space_update_topology_pass() if there is no registered listeners; this should improve starting time. This should cause no behavioural change. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-3-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alexey Kardashevskiy authored
This adds an AS** parameter to address_space_do_translate() to make it easier for the next patch to share FlatViews. This should cause no behavioural change. Signed-off-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-2-aik@ozlabs.ru> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
It's possible for address_space_get_flatview() as it currently stands to cause a use-after-free for the returned FlatView, if the reference count is incremented after the FlatView has been replaced by a writer: thread 1 thread 2 RCU thread ------------------------------------------------------------- rcu_read_lock read as->current_map set as->current_map flatview_unref '--> call_rcu flatview_ref [ref=1] rcu_read_unlock flatview_destroy <badness> Since FlatViews are not updated very often, we can just detect the situation using a new atomic op atomic_fetch_inc_nonzero, similar to Linux's atomic_inc_not_zero, which performs the refcount increment only if it hasn't already hit zero. This is similar to Linux commit de09a9771a53 ("CRED: Fix get_task_cred() and task_state() to not resurrect dead credentials", 2010-07-29). Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
KONRAD Frederic authored
This avoids a name clash with the access macro on windows 64: make CHK version_gen.h CC aarch64-softmmu/memory.o /home/konrad/qemu/memory.c: In function 'access_with_adjusted_size': /home/konrad/qemu/memory.c:591:73: error: macro "access" passed 7 arguments, \ but takes just 2 (size - access_size - i) * 8, access_mask, attrs); ^ Signed-off-by:
KONRAD Frederic <frederic.konrad@adacore.com> Message-Id: <1505988260-8483-1-git-send-email-frederic.konrad@adacore.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
David Hildenbrand authored
pflash toggles mr->romd_mode. So this assert does not always hold. 1) a device was added with !mr->romd_mode, therefore effectively not creating a kvm slot as we want to trap every access (add = false). 2) mr->romd_mode was toggled on before remove it. There is now actually no slot to remove and the assert is wrong. So let's just drop the assert. Reported-by:
Gerd Hoffmann <kraxel@redhat.com> Signed-off-by:
David Hildenbrand <david@redhat.com> Message-Id: <20170920145025.19403-1-david@redhat.com> Tested-by:
Gerd Hoffmann <kraxel@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Pavel Butsykin authored
We should guarantee that RAM will not be modified while VM has a stopped state, otherwise it can lead to negative consequences during post-copy migration. In RUN_STATE_FINISH_MIGRATE step, it's expected that RAM on source side will not be modified as this could lead to non-consistent vm state on the destination side. Also RAM access during postcopy-ram migration with enabled release-ram capability can lead to sad consequences. Let's add enable_backend() callback to avoid undesirable virtioqueue changes in the guest memory. Signed-off-by:
Pavel Butsykin <pbutsykin@virtuozzo.com> Message-Id: <20170919120733.22020-1-pbutsykin@virtuozzo.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Sep 20, 2017
-
-
Peter Maydell authored
These patches fix regressions in 2.10 # gpg: Signature made Wed 20 Sep 2017 07:51:07 BST # gpg: using DSA key 0x02FC3AEB0101DBC2 # gpg: Good signature from "Greg Kurz <groug@kaod.org>" # gpg: aka "Greg Kurz <groug@free.fr>" # gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>" # gpg: aka "Gregory Kurz (Groug) <groug@free.fr>" # gpg: aka "[jpeg image of size 3330]" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2 * remotes/gkurz/tags/for-upstream: 9pfs: check the size of transport buffer before marshaling 9pfs: fix name_to_path assertion in v9fs_complete_rename() 9pfs: fix readdir() for 9p2000.u Signed-off-by:
Peter Maydell <peter.maydell@linaro.org>
-
Peter Maydell authored
Machine/CPU/NUMA queue, 2017-09-19 # gpg: Signature made Tue 19 Sep 2017 21:17:01 BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/machine-next-pull-request: MAINTAINERS: Update git URLs for my trees hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM NUMA: Replace MAX_NODES with nb_numa_nodes in for loop numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed arm: drop intermediate cpu_model -> cpu type parsing and use cpu type directly pc: use generic cpu_model parsing vl.c: convert cpu_model to cpu type and set of global properties before machine_init() cpu: make cpu_generic_init() abort QEMU on error qom: cpus: split cpu_generic_init() on feature parsing and cpu creation parts hostmem-file: Add "discard-data" option osdep: Define QEMU_MADV_REMOVE vl: Clean up user-creatable objects when exiting Signed-off-by:
Peter Maydell <peter.maydell@linaro.org>
-
Jan Dakinevich authored
v9fs_do_readdir_with_stat() should check for a maximum buffer size before an attempt to marshal gathered data. Otherwise, buffers assumed as misconfigured and the transport would be broken. The patch brings v9fs_do_readdir_with_stat() in conformity with v9fs_do_readdir() behavior. Signed-off-by:
Jan Dakinevich <jan.dakinevich@gmail.com> [groug, regression caused my commit 8d37de41 # 2.10] Signed-off-by:
Greg Kurz <groug@kaod.org>
-
Jan Dakinevich authored
The third parameter of v9fs_co_name_to_path() must not contain `/' character. The issue is most likely related to 9p2000.u protocol only. Signed-off-by:
Jan Dakinevich <jan.dakinevich@gmail.com> [groug, regression caused by commit f57f5878 # 2.10] Signed-off-by:
Greg Kurz <groug@kaod.org>
-
Jan Dakinevich authored
If the client is using 9p2000.u, the following occurs: $ cd ${virtfs_shared_dir} $ mkdir -p a/b/c $ ls a/b ls: cannot access 'a/b/a': No such file or directory ls: cannot access 'a/b/b': No such file or directory a b c instead of the expected: $ ls a/b c This is a regression introduced by commit f57f5878; local_name_to_path() now resolves ".." and "." in paths, and v9fs_do_readdir_with_stat()->stat_to_v9stat() then copies the basename of the resulting path to the response. With the example above, this means that "." and ".." are turned into "b" and "a" respectively... stat_to_v9stat() currently assumes it is passed a full canonicalized path and uses it to do two different things: 1) to pass it to v9fs_co_readlink() in case the file is a symbolic link 2) to set the name field of the V9fsStat structure to the basename part of the given path It only has two users: v9fs_stat() and v9fs_do_readdir_with_stat(). v9fs_stat() really needs 1) and 2) to be performed since it starts with the full canonicalized path stored in the fid. It is different for v9fs_do_readdir_with_stat() though because the name we want to put into the V9fsStat structure is the d_name field of the dirent actually (ie, we want to keep the "." and ".." special names). So, we only need 1) in this case. This patch hence adds a basename argument to stat_to_v9stat(), to be used to set the name field of the V9fsStat structure, and moves the basename logic to v9fs_stat(). Signed-off-by:
Jan Dakinevich <jan.dakinevich@gmail.com> (groug, renamed old name argument to path and updated changelog) Signed-off-by:
Greg Kurz <groug@kaod.org>
-
- Sep 19, 2017
-
-
Eduardo Habkost authored
List the branches where I queue patches for Machine Core, NUMA, Memory Backends, and X86. Update the NUMA section to list the "machine-next" branch instead of "numa". Signed-off-by:
Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170901153928.17058-1-ehabkost@redhat.com> Signed-off-by:
Eduardo Habkost <ehabkost@redhat.com>
-
Eduardo Habkost authored
Currently, Using the fisrt node without memory on the machine makes QEMU unhappy. With this example command line: ... \ -m 1024M,slots=4,maxmem=32G \ -numa node,nodeid=0 \ -numa node,mem=1024M,nodeid=1 \ -numa node,nodeid=2 \ -numa node,nodeid=3 \ Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds ACPI SRAT, it regards node 0 as the default node to deal with the memory hole(640K-1M). this means the node0 must have some memory(>1M), but, actually it can have no memory. Fix this problem by cut out the 640K hole in the same way the PCI 4G hole does. Signed-off-by:
Eduardo Habkost <ehabkost@redhat.com> Signed-off-by:
Dou Liyang <douly.fnst@cn.fujitsu.com> Message-Id: <1504231805-30957-2-git-send-email-douly.fnst@cn.fujitsu.com> Signed-off-by:
Eduardo Habkost <ehabkost@redhat.com>
-
Dou Liyang authored
In QEMU, the number of the NUMA nodes is determined by parse_numa_opts(). Then, QEMU uses it for iteration, for example: for (i = 0; i < nb_numa_nodes; i++) However, in memory_region_allocate_system_memory(), it uses MAX_NODES not nb_numa_nodes. So, replace MAX_NODES with nb_numa_nodes to keep code consistency and reduce the loop times. Signed-off-by:
Dou Liyang <douly.fnst@cn.fujitsu.com> Message-Id: <1503387936-3483-1-git-send-email-douly.fnst@cn.fujitsu.com> Reviewed-by:
Igor Mammedov <imammedo@redhat.com> Reviewed-by:
Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by:
Eduardo Habkost <ehabkost@redhat.com>
-