Commits · 86b246534b488f1882ac545ffc76fe896185efd0 · Anton / libtcg

May 08, 2018

Paolo Bonzini authored 6 years ago


The "id" property is unnecessary and can be replaced simply with
object_get_canonical_path_component.  This patch mostly undoes commit
e1ff3c67 ("monitor: fix qmp/hmp query-memdev not reporting IDs of
memory backends", 2017-01-12).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

29de4ec1

May 07, 2018

pc-dimm: factor out MemoryDevice interface · 2cc0e2e8

David Hildenbrand authored 6 years ago


On the qmp level, we already have the concept of memory devices:
    "query-memory-devices"
Right now, we only support NVDIMM and PCDIMM.

We want to map other devices later into the address space of the guest.
Such devices could be, e.g., virtio devices. These devices will have a
guest memory range assigned but won't be exposed via e.g. ACPI. We want
to make them look like memory devices without gluing them to pc-dimm.

In particular, it will not always be possible to have TYPE_PC_DIMM as a parent
class (e.g. virtio devices). Let's use an interface instead. As a first
part, convert handling of
- qmp_pc_dimm_device_list
- get_plugged_memory_size
to our new model. plug/unplug stuff etc. will follow later.

A memory device will have to provide the following functions:
- get_addr(): Necessary because the "addr" property cannot be used for,
              e.g., virtio devices (it is already defined there).
- get_plugged_size(): The amount this device offers to the guest as of
                      now.
- get_region_size(): Because this can later on be bigger than the
                     plugged size.
- fill_device_info(): Fill MemoryDeviceInfo, e.g. for qmp.
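
The callback set listed above can be sketched as a small table of function pointers. This is only an illustration under stated assumptions: the names MemDev, MemDevOps, toy_ops and total_plugged_size() are hypothetical and not taken from the patch, and fill_device_info() is left out to keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the interface described above. */
typedef struct MemDev MemDev;

typedef struct MemDevOps {
    uint64_t (*get_addr)(const MemDev *md);          /* start address in guest memory */
    uint64_t (*get_plugged_size)(const MemDev *md);  /* amount offered to the guest now */
    uint64_t (*get_region_size)(const MemDev *md);   /* may exceed the plugged size */
} MemDevOps;

struct MemDev {
    const MemDevOps *ops;
    uint64_t addr;
    uint64_t plugged;
    uint64_t region;
};

/* A toy device that simply returns its stored fields. */
static uint64_t toy_get_addr(const MemDev *md) { return md->addr; }
static uint64_t toy_get_plugged_size(const MemDev *md) { return md->plugged; }
static uint64_t toy_get_region_size(const MemDev *md) { return md->region; }

static const MemDevOps toy_ops = {
    toy_get_addr, toy_get_plugged_size, toy_get_region_size
};

/* Sum the plugged sizes using only the interface, not the device type. */
uint64_t total_plugged_size(const MemDev *devs, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += devs[i].ops->get_plugged_size(&devs[i]);
    }
    return total;
}
```

The point of the design is that the caller never needs to know whether a device derives from TYPE_PC_DIMM; it only needs the callbacks.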

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-2-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

2cc0e2e8

Apr 27, 2018

Clear mem_path if we fall back to anonymous RAM allocation · 6233b679

David Gibson authored 6 years ago

If the -mem-path option is set, we attempt to map the guest's RAM from a
file in the given path; it's usually used to back guest RAM with hugepages.
If we're unable to (e.g. not enough free hugepages) then we fall back to
allocating normal anonymous pages. This behaviour can be surprising, but a
comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour
we can't change.

What really isn't ok, though, is that in this case we leave mem_path set.
That means functions which attempt to determine the pagesize of main RAM
can erroneously think it is hugepage based on the requested path, even
though it's not.

This is particularly bad for the pseries machine type. KVM HV limitations
mean the guest can't use pagesizes larger than the host page size used to
back RAM. That means that such a fallback, rather than merely giving
poorer performance than expected, will cause the guest to freeze up early in
boot as it attempts to use large page mappings that can't work.

This patch addresses the problem by clearing the mem_path variable when we
fall back to anonymous pages, meaning that subsequent attempts to
determine the RAM page size will get an accurate result.
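
A minimal sketch of the idea, assuming a simplified model in which a NULL path means anonymous RAM. RamConfig, allocate_ram() and ram_page_size() are hypothetical names for illustration, not the functions touched by the patch, and the file_alloc_ok flag merely simulates whether the file-backed mapping succeeded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a NULL mem_path means "anonymous RAM". */
typedef struct RamConfig {
    const char *mem_path;   /* requested backing path, or NULL */
    long base_page_size;    /* page size of anonymous memory */
    long huge_page_size;    /* page size of the file-backed mapping */
} RamConfig;

/* Stand-in for the file-backed allocation attempt; 'ok' simulates success. */
static bool try_file_backed_alloc(const RamConfig *cfg, bool ok)
{
    return cfg->mem_path != NULL && ok;
}

void allocate_ram(RamConfig *cfg, bool file_alloc_ok)
{
    if (cfg->mem_path && !try_file_backed_alloc(cfg, file_alloc_ok)) {
        /* Falling back to anonymous pages: forget the path so that later
         * page-size queries do not assume hugepages. */
        cfg->mem_path = NULL;
    }
}

long ram_page_size(const RamConfig *cfg)
{
    return cfg->mem_path ? cfg->huge_page_size : cfg->base_page_size;
}
```

Without the reset in allocate_ram(), ram_page_size() would keep reporting the hugepage size after a fallback, which is the misreport described above.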

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

6233b679

Mar 20, 2018

qmp: distinguish PC-DIMM and NVDIMM in MemoryDeviceInfoList · 6388e18d

Haozhong Zhang authored 7 years ago


Callers may need to treat PC-DIMM and NVDIMM differently, e.g., when
deciding whether the non-volatile flag bit is needed in SRAT memory
affinity structures.

A new field 'nvdimm' is added to the union type MemoryDeviceInfo for
such purpose. Its type is currently PCDIMMDeviceInfo and will be
updated when necessary in the future.

It also fixes "info memory-devices"/query-memory-devices, which
currently show nvdimm devices as dimm devices because
object_dynamic_cast(obj, TYPE_PC_DIMM) happily casts nvdimm to
TYPE_PC_DIMM, the type it inherits from.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

6388e18d

pc-dimm: make qmp_pc_dimm_device_list() sort devices by address · 52c95cae

Haozhong Zhang authored 7 years ago


Make qmp_pc_dimm_device_list() return a list of devices sorted by
start address, so that it can be reused in places that
need a sorted list*. Reuse the existing pc_dimm_built_list()
to get the sorted list.

While at it hide recursive callbacks from callers, so that:

  qmp_pc_dimm_device_list(qdev_get_machine(), &list);

could be replaced with simpler:

  list = qmp_pc_dimm_device_list();

* follow up patch will use it in build_srat()
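
Sorting by start address can be illustrated with a plain array and qsort(). This is a hedged sketch only: DimmInfo and sort_dimms_by_addr() are hypothetical names, and the patch itself reuses an existing list helper rather than qsort().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: only the fields needed for ordering. */
typedef struct DimmInfo {
    const char *id;
    uint64_t addr;   /* start address in guest physical memory */
} DimmInfo;

/* Compare by start address without risking overflow from subtraction. */
static int dimm_addr_cmp(const void *a, const void *b)
{
    const DimmInfo *x = a;
    const DimmInfo *y = b;
    return (x->addr > y->addr) - (x->addr < y->addr);
}

void sort_dimms_by_addr(DimmInfo *dimms, size_t n)
{
    qsort(dimms, n, sizeof(dimms[0]), dimm_addr_cmp);
}
```

A consumer that emits address ranges in ascending order can then walk the array front to back.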

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au> for ppc part
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

52c95cae

Mar 08, 2018

numa: we don't implement NUMA for s390x · 81ce6aa5

David Hildenbrand authored 7 years ago


Right now it is possible to crash QEMU for s390x by providing e.g.
    -numa node,nodeid=0,cpus=0-1

The problem is that numa.c uses mc->cpu_index_to_instance_props as an
indicator whether NUMA is supported by a machine type. We don't
implement NUMA for s390x ("topology") yet. However we need
mc->cpu_index_to_instance_props for query-cpus.

So let's fix this case by also checking for mc->get_default_cpu_node_id,
which will be needed by machine_set_cpu_numa_node().

qemu-system-s390x: -numa node,nodeid=0,cpus=0-1: NUMA is not supported by
                   this machine-type

While at it, make s390_cpu_index_to_props() look like on other
architectures.
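
The stricter check can be sketched as follows. MachineOps, the toy callbacks and machine_supports_numa() are hypothetical simplifications; the real callbacks have different signatures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of the two machine-class callbacks. */
typedef struct MachineOps {
    int (*cpu_index_to_instance_props)(unsigned cpu_index);
    int (*get_default_cpu_node_id)(unsigned cpu_index);
} MachineOps;

/* Toy callbacks used only to populate the table. */
static int toy_props(unsigned cpu_index) { return (int)cpu_index; }
static int toy_node_id(unsigned cpu_index) { (void)cpu_index; return 0; }

/* NUMA counts as supported only if both callbacks are provided. */
bool machine_supports_numa(const MachineOps *mc)
{
    return mc->cpu_index_to_instance_props != NULL &&
           mc->get_default_cpu_node_id != NULL;
}
```

A machine that provides only the first callback (as needed for query-cpus) is then rejected cleanly instead of crashing later.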

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180227110255.20999-1-david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>

81ce6aa5

Mar 02, 2018

qapi: Empty out qapi-schema.json · 112ed241

Markus Armbruster authored 7 years ago


The previous commit improved compile time by including less of the
generated QAPI headers.  This is impossible for stuff defined directly
in qapi-schema.json, because that ends up in headers that pull in
everything.

Move everything but include directives from qapi-schema.json to new
sub-module qapi/misc.json, then include just the "misc" shard where
possible.

It's possible everywhere, except:

* monitor.c needs qmp-command.h to get qmp_init_marshal()

* monitor.c, ui/vnc.c and the generated qapi-event-FOO.c need
  qapi-event.h to get enum QAPIEvent

Perhaps we'll get rid of those some other day.

Adding a type to qapi/migration.json now recompiles some 120 instead
of 2300 out of 5100 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-25-armbru@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>

112ed241

Feb 09, 2018

Include qapi/error.h exactly where needed · e688df6b

Markus Armbruster authored 7 years ago


This cleanup makes the number of objects depending on qapi/error.h
drop from 1910 (out of 4743) to 1612 in my "build everything" tree.

While there, separate #include from file comment with a blank line,
and drop a useless comment on why qemu/osdep.h is included first.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-5-armbru@redhat.com>
[Semantic conflict with commit 34e304e9 resolved, OSX breakage fixed]

e688df6b

Feb 05, 2018

qemu: improve hugepage allocation failure message · e85687ff

Marcelo Tosatti authored 7 years ago


Improve hugepage allocation failure message, indicating
what is happening to the user.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Message-Id: <20180115201700.GA4439@amt.cnet>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e85687ff

Jan 19, 2018

hostmem-file: add "align" option · 98376843

Haozhong Zhang authored 7 years ago


When mmap(2)ing the backend files, QEMU uses the host page size
(getpagesize(2)) by default as the alignment of the mapping address.
However, some backends may require alignments different from the page
size. For example, mmapping a device DAX (e.g., /dev/dax0.0) on Linux
kernel 4.13 to an address that is 4K-aligned but not 2M-aligned
fails with a kernel message like

[617494.969768] dax dax0.0: qemu-system-x86: dax_mmap: fail, unaligned vma (0x7fa37c579000 - 0x7fa43c579000, 0x1fffff)

Because there is no common way to discover such an alignment requirement,
we add the 'align' option to 'memory-backend-file', so that users or
management utils that have enough knowledge about the backend can
specify a proper alignment via this option.
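
The alignment condition behind the kernel message can be checked with simple mask arithmetic. The helpers below are an illustrative sketch (is_aligned() and align_up() are hypothetical names) and assume the alignment is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
bool is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Round addr up to the next multiple of align (power of two). */
uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

For the start address in the log line above, 0x7fa37c579000, the 4K check passes but the 2M check (mask 0x1fffff) fails, which matches the reported failure.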

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Message-Id: <20171211072806.2812-2-haozhong.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[ehabkost: fixed typo, fixed error_setg() format string]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

98376843

Dec 18, 2017

numa: remove unused #include · 1330f1e2

Philippe Mathieu-Daudé authored 7 years ago


Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

1330f1e2

Dec 14, 2017

spapr: replace numa_get_node() with lookup in pc-dimm list · f47bd1c8

Igor Mammedov authored 7 years ago


SPAPR is the last user of numa_get_node() and of a bunch of
supporting code that maintains the numa_info[x].addr list.

Get the LMB node id from the pc-dimm list instead, which allows
removing ~80 LOC that maintain the dynamic address range
lookup list.

It also removes the pc-dimm dependency on numa_[un]set_mem_node_id(),
makes pc-dimms the sole source of information about which
node they belong to, and removes duplicate data from the global
numa_info.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

f47bd1c8

Nov 16, 2017

NUMA: Enable adding NUMA node implicitly · 7b8be49d

Dou Liyang authored 7 years ago


Linux and Windows need the ACPI SRAT table to make memory hotplug work properly,
however QEMU currently doesn't create the SRAT table if numa options aren't present
on the CLI.

This breaks both Linux and Windows guests in certain conditions:
 * Windows: won't enable memory hotplug without an SRAT table at all
 * Linux: if QEMU is started with initial memory all below 4GB and no SRAT table
   present, the guest kernel will use nommu DMA ops, which breaks 32bit hw drivers
   when memory is hotplugged and the guest tries to use it with those drivers.

Fix the above issues by automatically creating a numa node when QEMU is started with
memory hotplug enabled but without '-numa' options on the CLI.
(PS: auto-create the numa node only for new machine types so as not to break migration).

This provides an SRAT table to guests without explicit -numa options on the CLI
and allows:
 * Windows: to enable memory hotplug
 * Linux: to switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated
   buffers that legacy drivers/hw can handle.

[Rewritten by Igor]

Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Alistair Francis <alistair23@gmail.com>
Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
Cc: Izumi Taku <izumi.taku@jp.fujitsu.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

7b8be49d

Oct 27, 2017

numa: fixup parsed NumaNodeOptions earlier · cc001888

Igor Mammedov authored 7 years ago


The numa 'mem' option with or without a suffix is possible
only on CLI/HMP. Instead of fixing up the special suffix-less
CLI case deep in parse_numa_node(), do it earlier, right
after the option is parsed into NumaNodeOptions with OptsVisitor,
so that the rest of the code uses valid values in
NumaNodeOptions and doesn't have to reparse QemuOpts.

It will help to isolate CLI/HMP parts in parse_numa() and
split out parsed NumaNodeOptions processing into separate
function that could be reused by QMP handler where we have
only NumaNodeOptions and don't need any fixups.

While at it reuse qemu_strtosz_MiB() instead of manually
checking for suffixes.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1507801198-98182-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

cc001888

Sep 19, 2017

NUMA: Replace MAX_NODES with nb_numa_nodes in for loop · f51878ba

Dou Liyang authored 7 years ago


In QEMU, the number of the NUMA nodes is determined by parse_numa_opts().
Then, QEMU uses it for iteration, for example:
  for (i = 0; i < nb_numa_nodes; i++)

However, in memory_region_allocate_system_memory(), it uses MAX_NODES
not nb_numa_nodes.

So, replace MAX_NODES with nb_numa_nodes to keep the code consistent and
reduce the number of loop iterations.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Message-Id: <1503387936-3483-1-git-send-email-douly.fnst@cn.fujitsu.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

f51878ba

Sep 14, 2017

hmp: extend "info numa" with hotplugged memory information · 31959e82

Vadim Galitsyn authored 7 years ago

Report amount of hotplugged memory in addition to total
amount per NUMA node.

Signed-off-by: Vadim Galitsyn <vadim.galitsyn@profitbricks.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: qemu-devel@nongnu.org
Message-Id: <20170829153022.27004-2-vadim.galitsyn@profitbricks.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

31959e82

Jul 14, 2017

memory: Rename memory_region_init_ram() to memory_region_init_ram_nomigrate() · 1cfe48c1

Peter Maydell authored 7 years ago


Rename memory_region_init_ram() to memory_region_init_ram_nomigrate().
This leaves the way clear for us to provide a memory_region_init_ram()
which does handle migration.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-4-git-send-email-peter.maydell@linaro.org

1cfe48c1

Jun 20, 2017

numa: use get_uint() for "size" property · 61d7c144

Marc-André Lureau authored 7 years ago


"size" is a property of TYPE_MEMORY_BACKEND.
host_memory_backend_get_size() and host_memory_backend_set_size() use
visit_type_size().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20170607163635.17635-39-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

61d7c144

Jun 05, 2017

numa: make sure that all cpus have has_node_id set if numa is enabled · d41f3e75

Igor Mammedov authored 7 years ago


It fixes/adds the missing _PXM object for non-mapped CPUs (x86)
and the missing fdt node (virt-arm).

It ensures that possible_cpus contains a complete mapping if
numa is enabled by the time machine_init() is executed.

As a result:
 1) incompletely mapped CPUs appear in ACPI/fdt blobs
 2) the QMP query-hotpluggable-cpus command shows bound nodes for such CPUs
 3) checks for has_node_id can be dropped in numa-only code,
    reducing the number of invariants an incomplete mapping could produce
 4) the fixup/implicit node init moves from runtime numa_cpu_pre_plug()
    (when the CPU object is created) to machine_numa_finish_init(), which
    helps to fix [1, 2] and makes possible_cpus a complete source
    of numa mapping available even before CPUs are created.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496161442-96665-4-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

d41f3e75

numa: move default mapping init to machine · 60bed6a3

Igor Mammedov authored 7 years ago


There is no need to use cpu_index_to_instance_props() for setting
the default cpu -> node mapping. Generic machine code can do it
without cpu_index by just enabling the already preset defaults
in possible_cpus.

PS:
As a bonus, there is one less user of cpu_index_to_instance_props().

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496161442-96665-3-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

60bed6a3

numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr · a0ceb640

Igor Mammedov authored 7 years ago


Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1496161442-96665-2-git-send-email-imammedo@redhat.com>
[ehabkost: Fix indentation]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

a0ceb640

May 30, 2017

numa: Fix format string for "Invalid node" message · f892291e

Eduardo Habkost authored 7 years ago


Some compilers complain about the PRIu16 format string with the
MAX(src, dst) and MAX_NODES arguments.  Example output from Apple LLVM
version 7.3.0 (clang-703.0.31):

  numa.c:236:20: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat]
                     MAX(src, dst), MAX_NODES);
  ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
  include/qapi/error.h:163:35: note: expanded from macro 'error_setg'
                          (fmt), ## __VA_ARGS__)
                                    ^~~~~~~~~~~
  glib/2.52.2/include/glib-2.0/glib/gmacros.h:288:20: note: expanded from macro 'MAX'
  #define MAX(a, b)  (((a) > (b)) ? (a) : (b))
                     ^~~~~~~~~~~~~~~~~~~~~~~~~
  numa.c:236:35: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat]
                     MAX(src, dst), MAX_NODES);
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
  include/qapi/error.h:163:35: note: expanded from macro 'error_setg'
                          (fmt), ## __VA_ARGS__)
                                    ^~~~~~~~~~~
  include/sysemu/sysemu.h:165:19: note: expanded from macro 'MAX_NODES'
  #define MAX_NODES 128
                    ^~~

MAX(src, dst) promotes the src and dst arguments to int, and MAX_NODES
is an int.  Use %d to silence those warnings.
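
The promotion can be reproduced in isolation. In this hedged sketch the macro definitions mirror the ones quoted in the compiler output above, while format_node_error() and its message text are hypothetical and not the actual error string.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX(a, b)  (((a) > (b)) ? (a) : (b))
#define MAX_NODES 128

/* Hypothetical helper: formats a message about an out-of-range node. */
int format_node_error(char *buf, size_t len, uint16_t src, uint16_t dst)
{
    /* Both uint16_t operands of ?: undergo integer promotion, so the
     * expression MAX(src, dst) has type int and %d is the matching format. */
    return snprintf(buf, len, "node %d out of range (max %d)",
                    MAX(src, dst), MAX_NODES);
}
```

On platforms where int is wider than 16 bits, sizeof(MAX(src, dst)) equals sizeof(int), which is why a PRIu16 format draws a warning from stricter compilers.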

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170530184013.31044-1-ehabkost@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

f892291e

May 11, 2017

numa: add '-numa cpu,...' option for property based node mapping · 419fcdec

Igor Mammedov authored 7 years ago


The legacy cpu to node mapping uses cpu index values to map
a VCPU to a node with the help of the '-numa node,nodeid=node,cpus=x[-y]'
option. However, cpu index is an internal concept and QEMU users
have to guess /reimplement qemu's logic/ to map it to
a concrete cpu socket/core/thread to make a sane CPU
placement across numa nodes.

This patch allows mapping cpu objects to numa nodes using
the same properties as used for cpus with -device/device_add
(socket-id/core-id/thread-id/node-id).

At present, valid properties/values to address CPUs can be
fetched using the hotpluggable-cpus monitor/qmp command. This
requires the user to start qemu twice when creating a domain: first
to fetch the possible CPUs for a machine type/-smp layout, and
then a second time with the explicit numa mapping for actual
usage. The results of the first step can be saved and reused to
set/change the mapping later as long as the machine type/-smp stays
the same.

The proposed implementation supports exact and wildcard matching to
simplify the CLI and allows setting the mapping for a specific cpu
or a group of cpu objects specified by matched properties.

For example:

   # exact mapping x86
   -numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n

   # exact mapping SPAPR
   -numa cpu,node-id=x,core-id=y

   # wildcard mapping, all cpu objects that match socket-id=y
   # are mapped to node-id=x
   -numa cpu,node-id=x,socket-id=y
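
Exact and wildcard matching can be sketched as a pattern whose unset properties match anything. CpuSlot, CpuPattern and map_cpus_to_node() are hypothetical names for illustration; the real code works on the machine's possible_cpus list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU slot with its topology ids and assigned node. */
typedef struct CpuSlot {
    int socket_id, core_id, thread_id;
    int node_id;   /* -1 while unassigned */
} CpuSlot;

/* Pattern from the command line: only properties with has_* set are compared. */
typedef struct CpuPattern {
    bool has_socket_id, has_core_id, has_thread_id;
    int socket_id, core_id, thread_id;
} CpuPattern;

static bool pattern_matches(const CpuPattern *p, const CpuSlot *c)
{
    if (p->has_socket_id && p->socket_id != c->socket_id) return false;
    if (p->has_core_id && p->core_id != c->core_id) return false;
    if (p->has_thread_id && p->thread_id != c->thread_id) return false;
    return true;   /* unset properties act as wildcards */
}

/* Assign node_id to every matching slot; returns the number of matches. */
size_t map_cpus_to_node(CpuSlot *cpus, size_t n, const CpuPattern *p, int node_id)
{
    size_t matched = 0;
    for (size_t i = 0; i < n; i++) {
        if (pattern_matches(p, &cpus[i])) {
            cpus[i].node_id = node_id;
            matched++;
        }
    }
    return matched;
}
```

A pattern with only socket-id set therefore maps every core and thread of that socket in one option, while a fully specified pattern addresses a single CPU.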

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1494415802-227633-18-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

419fcdec

numa: remove node_cpu bitmaps as they are no longer used · 1171ae9a

Igor Mammedov authored 7 years ago


The post-factum "CPU(s) present in multiple NUMA nodes" check
was the last user of node_cpu bitmaps, but it's not needed,
as machine_set_cpu_numa_node() does a similar check at
the time the mapping is set for cpus (i.e. when -numa cpus=
is parsed) and ensures that a cpu can be mapped only to
one node.

Remove duplicate check based on node_cpu bitmaps and
since the last user is gone remove node_cpu as well,
which completes internal transition from legacy bitmap
based mapping storage to possible_cpus storage.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-17-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

1171ae9a

numa: use possible_cpus for not mapped CPUs check · ec78f811

Igor Mammedov authored 7 years ago


and remove corresponding part in numa.c that uses
node_cpu bitmaps.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-16-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

ec78f811

numa: remove no longer needed numa_post_machine_init() · 3b8a8557

Igor Mammedov authored 7 years ago


CPUState::numa_node is still in use, but now it's set by the
board when it creates CPU objects. So there isn't any
need to set it again after all CPUs are created,
since it has already been set.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-14-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

3b8a8557

tests: numa: add case for QMP command query-cpus · 6accfb78

Igor Mammedov authored 7 years ago


Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-13-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

6accfb78

numa: do default mapping based on possible_cpus instead of node_cpu bitmaps · af9b20e8

Igor Mammedov authored 7 years ago


Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-8-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

af9b20e8

numa: mirror cpu to node mapping in MachineState::possible_cpus · 7c88e65d

Igor Mammedov authored 7 years ago


Introduce machine_set_cpu_numa_node() helper that stores
node mapping for CPU in MachineState::possible_cpus.
CPU and node it belongs to is specified by 'props' argument.

The patch doesn't remove the old way of storing the mapping in
numa_info[X].node_cpu, as removing it at the same time
would make the patch rather big. Instead it just mirrors the mapping
in possible_cpus; follow-up per-target patches will
switch to possible_cpus, and numa_info[X].node_cpu will
be removed once there are no users left.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-7-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

7c88e65d

numa: add check that board supports cpu_index to node mapping · 64c2a8f6

Igor Mammedov authored 7 years ago


Default node mapping initialization already checks that board
supports cpu_index to node mapping and refuses to start if
it's not supported. Do the same for explicitly provided
mapping "-numa node,cpus=..."

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-6-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

64c2a8f6

numa: move source of default CPUs to NUMA node mapping into boards · ea089eeb

Igor Mammedov authored 7 years ago


Originally CPU threads were by default assigned in
round-robin fashion. However it was causing issues in
guest since CPU threads from the same socket/core could
be placed on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node
introducing cpu_index_to_socket_id() callback and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused callback to fix similar issues for SPAPR machine
even though socket doesn't make much sense there.

As a result QEMU ended up having 3 default distribution rules
used by 3 targets /virt-arm, spapr, pc/.

In an effort to move NUMA mapping for CPUs into possible_cpus,
generalize the default mapping in numa.c by making boards decide
on the default mapping and let them explicitly tell generic
numa code which node a CPU thread belongs to, by replacing
cpu_index_to_socket_id() with cpu_index_to_instance_props(),
which provides the default node_id assigned by the board to the specified
cpu_index.
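
The difference between the original round-robin rule and a socket-grouped rule can be shown with two small functions. This is a sketch under assumptions: the function names are hypothetical, and the grouping formula (threads of one socket share a node, sockets are spread round-robin) is an illustrative guess, not the exact rule of any board.

```c
/* Original behaviour: consecutive CPU threads go to consecutive nodes. */
unsigned node_round_robin(unsigned cpu_index, unsigned nb_nodes)
{
    return cpu_index % nb_nodes;
}

/* Assumed grouped behaviour: all threads of a socket land on the same node. */
unsigned node_by_socket(unsigned cpu_index, unsigned threads_per_socket,
                        unsigned nb_nodes)
{
    unsigned socket_id = cpu_index / threads_per_socket;
    return socket_id % nb_nodes;
}
```

With two threads per socket and two nodes, round-robin splits threads 0 and 1 of the same socket across nodes 0 and 1, while the grouped rule keeps both on node 0.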

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

ea089eeb

numa: equally distribute memory on nodes · 3bfe5716

Laurent Vivier authored 7 years ago

When there is not enough memory to give every node the minimum
allowed amount, all the memory is put on the last node.

This is because we put (ram_size / nb_numa_nodes) &
~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this
case the value is 0. This is particularly true with pseries,
as the memory must be aligned to 256MB.

To avoid this problem, this patch uses an error diffusion algorithm [1]
to distribute the memory evenly across nodes.

We introduce numa_auto_assign_ram() function in MachineClass
to keep compatibility between machine type versions.
The legacy function is used with pseries-2.9, pc-q35-2.9 and
pc-i440fx-2.9 (and previous), the new one with all others.

Example:

qemu-system-ppc64 -S -nographic  -nodefaults -monitor stdio -m 1G -smp 8 \
                  -numa node -numa node -numa node \
                  -numa node -numa node -numa node

Before:

(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 0 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 0 MB
node 4 cpus: 4
node 4 size: 0 MB
node 5 cpus: 5
node 5 size: 1024 MB

After:
(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 256 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 256 MB
node 4 cpus: 4
node 4 size: 256 MB
node 5 cpus: 5
node 5 size: 256 MB

[1] https://en.wikipedia.org/wiki/Error_diffusion
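
A minimal sketch of error diffusion applied to this problem: each node gets its ideal share plus the rounding error carried over from the previous nodes, rounded down to the alignment. The function name and signature are hypothetical, and the sketch assumes nb_nodes >= 1 and a power-of-two alignment given as a shift.

```c
#include <stdint.h>

/* Distribute 'size' bytes over nb_nodes entries of node_mem[], each aligned
 * to (1 << align_shift), carrying the rounding error forward. */
void assign_ram_error_diffusion(uint64_t *node_mem, int nb_nodes,
                                uint64_t size, unsigned align_shift)
{
    uint64_t mask = ~((UINT64_C(1) << align_shift) - 1);
    uint64_t granularity = size / nb_nodes;   /* ideal share per node */
    uint64_t propagate = 0;                   /* error carried forward */
    uint64_t used = 0;
    int i;

    for (i = 0; i < nb_nodes - 1; i++) {
        uint64_t mem = (granularity + propagate) & mask;
        propagate = granularity + propagate - mem;
        node_mem[i] = mem;
        used += mem;
    }
    node_mem[i] = size - used;                /* last node takes the rest */
}
```

For 1G over 6 nodes with 256MB alignment (align_shift = 28) this yields 0, 256, 0, 256, 256 and 256 MB, the same distribution as the "After" listing above.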



Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170502162955.1610-2-lvivier@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

3bfe5716

numa: Allow setting NUMA distance for different NUMA nodes · 0f203430

He Chen authored 7 years ago


This patch adds SLIT table support to QEMU and provides an
additional option `dist` for the `-numa` command to let the user set vNUMA
distances on the QEMU command line.

With this patch, when a user wants to create a guest that contains
several vNUMA nodes and also wants to set the distances among those nodes,
the QEMU command would look like:

```
-numa node,nodeid=0,cpus=0 \
-numa node,nodeid=1,cpus=1 \
-numa node,nodeid=2,cpus=2 \
-numa node,nodeid=3,cpus=3 \
-numa dist,src=0,dst=1,val=21 \
-numa dist,src=0,dst=2,val=31 \
-numa dist,src=0,dst=3,val=41 \
-numa dist,src=1,dst=2,val=21 \
-numa dist,src=1,dst=3,val=31 \
-numa dist,src=2,dst=3,val=21 \
```
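
The options above give each pair of nodes in one direction only. A hedged sketch of how such input could be turned into a full matrix follows; it assumes that a missing reverse direction mirrors the given one (an assumption about the intended semantics, not something the message states) and uses the ACPI SLIT convention that the distance from a node to itself is 10. All names are hypothetical.

```c
#include <stdint.h>

#define NB_NODES 4
#define DIST_LOCAL 10   /* SLIT distance from a node to itself */

typedef uint8_t DistMatrix[NB_NODES][NB_NODES];

/* Diagonal gets the local distance; 0 marks "not set yet". */
void dist_init(DistMatrix d)
{
    for (int i = 0; i < NB_NODES; i++) {
        for (int j = 0; j < NB_NODES; j++) {
            d[i][j] = (i == j) ? DIST_LOCAL : 0;
        }
    }
}

/* Corresponds to one "-numa dist,src=..,dst=..,val=.." option. */
void dist_set(DistMatrix d, int src, int dst, uint8_t val)
{
    d[src][dst] = val;
}

/* Assumption: an unset direction takes the value of the opposite one. */
void dist_mirror_missing(DistMatrix d)
{
    for (int i = 0; i < NB_NODES; i++) {
        for (int j = 0; j < NB_NODES; j++) {
            if (d[i][j] == 0 && d[j][i] != 0) {
                d[i][j] = d[j][i];
            }
        }
    }
}
```

Feeding the six dist options from the example through dist_set() and then calling dist_mirror_missing() gives a symmetric 4x4 matrix with 10 on the diagonal.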

Signed-off-by: He Chen <he.chen@linux.intel.com>
Message-Id: <1493260558-20728-1-git-send-email-he.chen@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

0f203430

May 07, 2017

Remove redundant qemu: from error functions · d0e31a10

Ishani Chugh authored 7 years ago

This patch removes redundant "qemu:" from error functions. The link to the bitesized task is:
http://wiki.qemu-project.org/Contribute/BiteSizedTasks#Error_checking

Signed-off-by: Ishani Chugh <chugh.ishani@research.iiit.ac.in>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

d0e31a10

Mar 22, 2017

numa,spapr: align default numa node memory size to 256MB · 55641213

Laurent Vivier authored 8 years ago


Since commit 224245bf ("spapr: Add LMB DR connectors"), NUMA node
memory size must be aligned to 256MB (SPAPR_MEMORY_BLOCK_SIZE).

But when the "-numa" option is provided without the "mem" parameter,
the memory is divided equally between nodes, but only 8MB aligned.
This may not be valid for pseries.

In that case we can have:
$ ./ppc64-softmmu/qemu-system-ppc64 -m 4G -numa node -numa node -numa node
qemu-system-ppc64: Node 0 memory size 0x55000000 is not aligned to 256 MiB

With this patch, we have:
(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 1280 MB
node 1 cpus:
node 1 size: 1280 MB
node 2 cpus:
node 2 size: 1536 MB
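
The numbers above follow from rounding each node's equal share down to the alignment and giving the remainder to the last node. A minimal sketch of that arithmetic, with a hypothetical function name and the alignment passed as a shift (23 for 8MB, 28 for 256MB), assuming nb_nodes >= 1:

```c
#include <stdint.h>

/* Give every node but the last an aligned equal share; the last node
 * receives whatever is left. */
void assign_ram_aligned(uint64_t *node_mem, int nb_nodes,
                        uint64_t size, unsigned align_shift)
{
    uint64_t mask = ~((UINT64_C(1) << align_shift) - 1);
    uint64_t used = 0;
    int i;

    for (i = 0; i < nb_nodes - 1; i++) {
        node_mem[i] = (size / nb_nodes) & mask;
        used += node_mem[i];
    }
    node_mem[i] = size - used;
}
```

For 4G over 3 nodes, an 8MB alignment gives node 0 the size 0x55000000 from the error message, while a 256MB alignment gives 1280, 1280 and 1536 MB.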

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

55641213

Feb 22, 2017

numa: Flatten simple union NumaOptions · d081a49a

Markus Armbruster authored 8 years ago


Simple unions are simpler than flat unions in the schema, but more
complicated in C and on the QMP wire: there's extra indirection in C
and extra nesting on the wire, both pointless.  They're best avoided
in new code.

NumaOptions isn't new, but it's only used internally, not in QMP.
Convert it to a flat union.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487709988-14322-2-git-send-email-armbru@redhat.com>

d081a49a

Jan 16, 2017

ramblock-notifier: new · 0987d735

Paolo Bonzini authored 8 years ago


This adds a notifier interface for RAM block additions and removals.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0987d735

Jan 12, 2017

numa: make -numa parser dynamically allocate CPUs masks · cdda2018

Igor Mammedov authored 8 years ago


so that it doesn't impose an additional limit on the max_cpus values
supported by different targets.

It removes the global MAX_CPUMASK_BITS constant and the need to
bump it up whenever max_cpus is increased for
a target above the MAX_CPUMASK_BITS value.

Use the runtime max_cpus value instead to allocate sufficiently
sized node_cpu bitmasks in the numa parser.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1479466974-249781-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: Added asserts to ensure cpu_index < max_cpus]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

cdda2018

monitor: fix qmp/hmp query-memdev not reporting IDs of memory backends · e1ff3c67

Igor Mammedov authored 8 years ago


Considering that 'id' is mandatory for user_creatable objects/backends
and that user_creatable_add_type() always has it as an argument,
regardless of where it is called from (CLI/monitor or QMP),
fix the issue by adding an 'id' property to hostmem backends and
setting it in user_creatable_add_type() for every object that
implements the 'id' property. Then later, at query-memdev time,
get 'id' from the object directly.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1484052795-158195-4-git-send-email-imammedo@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

e1ff3c67

Oct 09, 2016

numa: reduce code duplication by adding helper numa_get_node_for_cpu() · 6bea1ddf

Igor Mammedov authored 8 years ago


Replace repeated pattern

    for (i = 0; i < nb_numa_nodes; i++) {
        if (test_bit(idx, numa_info[i].node_cpu)) {
           ...
           break;

with a helper function to look up the numa node index for a cpu.
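
A self-contained sketch of such a helper, assuming a fixed-size bitmap per node. NodeInfo, the bit helpers and the "return nb_nodes when not found" convention are simplifications chosen for illustration and may differ from the actual code.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 64
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS ((MAX_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical per-node record holding the CPU bitmap. */
typedef struct NodeInfo {
    unsigned long node_cpu[BITMAP_LONGS];
} NodeInfo;

static void bitmap_set(unsigned idx, unsigned long *map)
{
    map[idx / BITS_PER_LONG] |= 1UL << (idx % BITS_PER_LONG);
}

static bool bitmap_test(unsigned idx, const unsigned long *map)
{
    return (map[idx / BITS_PER_LONG] >> (idx % BITS_PER_LONG)) & 1UL;
}

/* Returns the index of the node containing CPU idx, or nb_nodes if none. */
int numa_get_node_for_cpu(const NodeInfo *info, int nb_nodes, unsigned idx)
{
    int i;
    for (i = 0; i < nb_nodes; i++) {
        if (bitmap_test(idx, info[i].node_cpu)) {
            break;
        }
    }
    return i;
}
```

Callers then replace the open-coded loop with a single call and compare the result against the node count to detect an unmapped CPU.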

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

6bea1ddf