  1. Mar 06, 2022
  2. Mar 04, 2022
  3. Feb 21, 2022
  4. Feb 16, 2022
    • seccomp: block setns, unshare and execveat syscalls · 46380571
      Daniel P. Berrangé authored
      
      setns/unshare are used to change namespaces, which is not something QEMU
      needs to be able to do.
      
      execveat is a newer variant of execve, so it should be blocked just like
      execve already is.
      
      Acked-by: Eduardo Otubo <otubo@redhat.com>
      Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
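As a rough illustration of the deny-list idea, the sketch below checks syscall numbers from `<sys/syscall.h>` against a fixed table. It is not QEMU's seccomp table and not the libseccomp API: `denied_syscalls` and `syscall_is_denied()` are hypothetical names, and `SYS_execveat` is guarded because older C library headers may not define it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Hypothetical deny-list: namespace changes plus both execve variants. */
static const long denied_syscalls[] = {
    SYS_setns,
    SYS_unshare,
    SYS_execve,
#ifdef SYS_execveat
    SYS_execveat,   /* may be missing from older headers */
#endif
};

/* Return true if the syscall number appears in the deny-list. */
static bool syscall_is_denied(long nr)
{
    for (size_t i = 0; i < sizeof(denied_syscalls) / sizeof(denied_syscalls[0]); i++) {
        if (denied_syscalls[i] == nr) {
            return true;
        }
    }
    return false;
}
```

The point of the commit is visible in the table: once execve is listed, its newer sibling execveat has to be listed too, or the restriction can be bypassed.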
    • seccomp: block use of clone3 syscall · c542b302
      Daniel P. Berrangé authored
      
      Modern glibc will use clone3 instead of clone, when it detects that it
      is available. We need to compare flags in order to decide whether to
      allow clone (thread create vs process fork), but in clone3 the flags
      are hidden inside a struct. Seccomp can't currently match on data inside
      a struct, so our only option is to block clone3 entirely. If we use
      ENOSYS to block it, then glibc transparently falls back to clone.
      
      This may need to be revisited if Linux adds a new architecture in the
      future that only provides clone3, without clone.
      
      Acked-by: Eduardo Otubo <otubo@redhat.com>
      Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
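The transparent fallback described above can be modelled as follows. This is a sketch of the pattern only, not glibc's code: `spawn_with_fallback()` and the stub functions are hypothetical stand-ins for raw `clone3`/`clone` wrappers, which by kernel convention report failure as a negative errno value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical raw-syscall wrapper type: returns a non-negative id on
 * success or -errno on failure. */
typedef long (*spawn_fn)(void);

/* Try clone3 first; only ENOSYS ("not implemented") triggers a retry with
 * the older clone.  Success or any other error is returned unchanged. */
static long spawn_with_fallback(spawn_fn try_clone3, spawn_fn try_clone)
{
    long ret = try_clone3();
    if (ret == -ENOSYS) {
        ret = try_clone();
    }
    return ret;
}

/* Stubs used to exercise the logic. */
static long clone3_blocked(void) { return -ENOSYS; }
static long clone3_denied(void)  { return -EPERM; }
static long clone_ok(void)       { return 1234; }
```

In this sketch an `EPERM` from clone3 is passed straight back to the caller, which is why `ENOSYS` is the error code that makes the fallback invisible.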
    • seccomp: fix blocking of process spawning · 5a2f693f
      Daniel P. Berrangé authored
      
      When '-sandbox on,spawn=deny' is given, we are supposed to block the
      ability to spawn processes. We naively blocked the 'fork' syscall,
      forgetting that any modern libc will use the 'clone' syscall instead.
      
      We can't simply block the 'clone' syscall though, as that will break
      thread creation. We thus list the set of flags used to create threads
      and block anything that doesn't match this exactly.
      
      Acked-by: Eduardo Otubo <otubo@redhat.com>
      Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
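The exact-match rule for clone flags can be sketched as a plain predicate. The flag set below is an assumption about what glibc's pthread_create() passes to clone() and should be checked against the libc in use; `THREAD_CLONE_FLAGS` and `clone_allowed_when_spawn_denied()` are hypothetical names, not QEMU's.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdbool.h>

/* Assumed thread-creation flag set (verify against your libc). */
#define THREAD_CLONE_FLAGS                                  \
    (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |    \
     CLONE_THREAD | CLONE_SYSVSEM | CLONE_SETTLS |          \
     CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID)

/* With spawning denied, clone() is allowed only for exactly the thread
 * flag set; a fork-style call (e.g. just SIGCHLD) is refused. */
static bool clone_allowed_when_spawn_denied(unsigned long flags)
{
    return flags == (unsigned long)THREAD_CLONE_FLAGS;
}
```

Using an exact comparison rather than a mask test means any extra flag, such as `CLONE_NEWNS`, also causes the call to be refused.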
    • seccomp: allow action to be customized per syscall · 8f46f562
      Daniel P. Berrangé authored
      
      We currently choose between killing the process and returning EPERM
      based on the syscall set. This is not flexible enough for future
      requirements, where we also need to be able to apply a variety of
      actions at per-syscall granularity.
      
      Acked-by: Eduardo Otubo <otubo@redhat.com>
      Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
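A per-syscall action can be represented by storing the action (and, where relevant, the errno) in each table entry instead of deriving it from the set. The sketch below is hypothetical: `enum sandbox_action`, `struct syscall_rule` and the example entries are illustrations, not QEMU's actual lists or types.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical action kinds, loosely mirroring seccomp's choices. */
enum sandbox_action { ACT_KILL_PROCESS, ACT_ERRNO };

/* One rule per syscall: the action and, for ACT_ERRNO, the errno value. */
struct syscall_rule {
    const char *name;
    enum sandbox_action action;
    int err;
};

/* Example entries only. */
static const struct syscall_rule rules[] = {
    { "reboot", ACT_KILL_PROCESS, 0 },
    { "setns",  ACT_ERRNO,        EPERM },
    { "clone3", ACT_ERRNO,        ENOSYS },
};

/* Return the rule for a syscall name, or NULL if none applies. */
static const struct syscall_rule *find_rule(const char *name)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        if (strcmp(rules[i].name, name) == 0) {
            return &rules[i];
        }
    }
    return NULL;
}
```

This layout lets a single set mix actions, for example returning `ENOSYS` for clone3 while other entries in the same set return `EPERM`.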
    • memory: Fix qemu crash on starting dirty log twice with stopped VM · a5c90c61
      Peter Xu authored
      
      QEMU can easily crash when two consecutive migrations are carried out:
      
      (qemu) migrate -d exec:cat>out
      (qemu) migrate_cancel
      (qemu) migrate -d exec:cat>out
      [crash] ../softmmu/memory.c:2782: memory_global_dirty_log_start: Assertion
      `!(global_dirty_tracking & flags)' failed.
      
      This happens because the memory API provides a way to postpone stopping
      the dirty log while the VM is stopped; the stop is then deferred until
      the next VM start. This was added in 2017 with commit 19310760
      ("migration: optimize the downtime", 2017-08-01).
      
      However, the recent work that turned dirty tracking into a bitmask broke
      it, namely commit 63b41db4 ("memory: make global_dirty_tracking a
      bitmask", 2021-11-01).
      
      The fix proposed in this patch contains two things:
      
        (1) Instead of passing the flags along when postponing the stop of dirty
            tracking, we add a global variable (alongside the existing
            vmstate_change variable) to record which flags to stop dirty
            tracking for.
      
        (2) When starting dirty tracking, instead of removing the vmstate hook
            directly, we also execute the postponed stop process, so that all
            starts and stops are guaranteed to be paired.
      
      This step was overlooked when dirty tracking was turned into a bitmask
      in 2021.
      
      Cc: Hyman Huang <huangy81@chinatelecom.cn>
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044818
      Fixes: 63b41db4 ("memory: make global_dirty_tracking a bitmask")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220207123019.27223-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
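The two-part fix can be modelled with a few globals. This is a simplified sketch, not the code in softmmu/memory.c: `dirty_tracking`, `postponed_stop`, `vm_running` and the functions are hypothetical names, and a real implementation also notifies memory listeners and uses a vmstate change handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified state. */
static unsigned int dirty_tracking;   /* bitmask of active trackers */
static unsigned int postponed_stop;   /* stop requests deferred while stopped */
static bool vm_running;

static void do_stop(unsigned int flags)
{
    assert((dirty_tracking & flags) == flags);  /* only stop what was started */
    dirty_tracking &= ~flags;
}

/* Part (1): record which flags to stop instead of losing them. */
void dirty_log_stop(unsigned int flags)
{
    if (!vm_running) {
        postponed_stop |= flags;
        return;
    }
    do_stop(flags);
}

/* Run any deferred stop so that starts and stops stay paired. */
static void flush_postponed_stop(void)
{
    if (postponed_stop) {
        do_stop(postponed_stop);
        postponed_stop = 0;
    }
}

/* Part (2): a new start first executes the postponed stop. */
void dirty_log_start(unsigned int flags)
{
    flush_postponed_stop();
    assert(!(dirty_tracking & flags));  /* analogous to the assertion that fired */
    dirty_tracking |= flags;
}

void vm_start(void)
{
    vm_running = true;
    flush_postponed_stop();
}
```

With the VM stopped, start, stop and start again now leaves exactly one active tracker instead of tripping the assertion, which mirrors the `migrate` / `migrate_cancel` / `migrate` sequence in the report.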
  5. Feb 08, 2022
  6. Jan 28, 2022
    • rtc: Move RTC function prototypes to their own header · 2f93d8b0
      Peter Maydell authored
      
      softmmu/rtc.c defines two public functions: qemu_get_timedate() and
      qemu_timedate_diff().  Currently we keep the prototypes for these in
      qemu-common.h, but most files don't need them.  Move them to their
      own header, a new include/sysemu/rtc.h.
      
      Since the C files using these two functions did not need to include
      qemu-common.h for any other reason, we can remove those include lines
      when we add the include of the new rtc.h.
      
      The license for the .h file follows that of softmmu/rtc.c,
      where both functions are defined.
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
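The new header might look roughly like the sketch below. The prototypes are an assumption about the signatures at the time (an `int` offset for `qemu_get_timedate()` and an `int` return for `qemu_timedate_diff()`); check include/sysemu/rtc.h in the tree before relying on them.

```c
/* Sketch of include/sysemu/rtc.h; signatures are assumed. */
#ifndef SYSEMU_RTC_H
#define SYSEMU_RTC_H

#include <time.h>   /* struct tm; only so the sketch compiles standalone */

/* Fill *tm with the current RTC time, adjusted by offset seconds. */
void qemu_get_timedate(struct tm *tm, int offset);

/* Return the difference, in seconds, between *tm and the RTC time. */
int qemu_timedate_diff(struct tm *tm);

#endif /* SYSEMU_RTC_H */
```

QEMU headers normally get system includes such as `<time.h>` through qemu/osdep.h, so the explicit include is there only to keep the sketch self-contained.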
  7. Jan 21, 2022
  8. Jan 20, 2022
  9. Jan 18, 2022
    • hw/dma: Let dma_buf_read() / dma_buf_write() propagate MemTxResult · f02b664a
      Philippe Mathieu-Daudé authored
      
      Since commit 292e1314, dma_buf_rw() returns a MemTxResult type.
      Do not discard it; return it to the caller. Pass the previously
      returned value (the QEMUSGList residual size, which was rarely used)
      as an optional argument.
      
      With this new API, SCSIRequest::residual might now be accessed via
      a pointer. Since the size_t type does not have the same size on
      32- and 64-bit host architectures, convert it to a uint64_t, which
      is big enough to hold the residual size and has the same size on
      both 32- and 64-bit hosts.
      
      Update the few dma_buf_read() / dma_buf_write() callers to the new
      API.
      
      Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
      Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Acked-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220117125130.131828-1-f4bug@amsat.org>
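The new calling convention (propagate the transaction result, report the residual through an optional pointer) can be sketched over a flattened scatter-gather list. `Segment` and `sg_buf_read()` are hypothetical, and the `MemTxResult` bits are assumed to mirror QEMU's; this is not the real dma_buf_read() signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Result bits modelled on QEMU's MemTxResult (values assumed). */
typedef uint32_t MemTxResult;
#define MEMTX_OK           0
#define MEMTX_DECODE_ERROR (1U << 1)

/* Hypothetical flattened scatter-gather segment. */
typedef struct {
    const uint8_t *base;
    uint64_t len;
    bool mapped;          /* false: the access fails to decode */
} Segment;

/* Copy up to len bytes from the list into dst.  Errors are accumulated and
 * returned rather than discarded; the bytes of the list left untransferred
 * are reported through the optional residual pointer (may be NULL). */
static MemTxResult sg_buf_read(uint8_t *dst, uint64_t len,
                               const Segment *sg, size_t nsg,
                               uint64_t *residual)
{
    MemTxResult res = MEMTX_OK;
    uint64_t resid = 0;

    for (size_t i = 0; i < nsg; i++) {
        resid += sg[i].len;               /* total size of the list */
    }
    for (size_t i = 0; i < nsg && len > 0; i++) {
        uint64_t xfer = sg[i].len < len ? sg[i].len : len;
        if (sg[i].mapped) {
            memcpy(dst, sg[i].base, xfer);
        } else {
            res |= MEMTX_DECODE_ERROR;    /* record failure, keep going */
        }
        dst += xfer;
        len -= xfer;
        resid -= xfer;
    }
    if (residual) {
        *residual = resid;
    }
    return res;
}
```

Callers that never looked at the residual can simply pass `NULL`, which matches the commit's description of it as an optional argument.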
    • hw/dma: Use dma_addr_t type definition when relevant · bfa30f39
      Philippe Mathieu-Daudé authored
      
      Update the obvious places where dma_addr_t should be used
      (instead of uint64_t, hwaddr, size_t, int32_t types).
      
      This makes the use of the dma_addr_t type portable across 32- and
      64-bit hosts.
      
      Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Message-Id: <20220111184309.28637-11-f4bug@amsat.org>
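A small sketch of why a fixed-width type matters here, assuming `dma_addr_t` is a typedef for `uint64_t`; `dma_end()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: 64 bits regardless of the host word size. */
typedef uint64_t dma_addr_t;

/* End of a DMA transfer.  Computed in size_t, this would wrap on a 32-bit
 * host for guest addresses above 4 GiB; dma_addr_t does not wrap. */
static dma_addr_t dma_end(dma_addr_t addr, dma_addr_t len)
{
    return addr + len;
}
```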
    • hw/scsi: Rename SCSIRequest::resid as 'residual' · 5f412602
      Philippe Mathieu-Daudé authored
      
      The 'resid' field is slightly confusing and could be
      interpreted as some ID. Rename it to 'residual', which
      is clearer to review. No logical change.
      
      Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20220111184309.28637-8-f4bug@amsat.org>
      Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
    • softmmu: Provide a clue as to why device tree loading failed · d4fae97d
      Bernhard Beschow authored
      
      fdt_open_into() obligingly returns an error code in case the operation
      failed. So be obliging as well and use it in the error message.
      
      Signed-off-by: Bernhard Beschow <shentey@gmail.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
      Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Message-Id: <20220116114649.40859-1-shentey@gmail.com>
      Signed-off-by: Laurent Vivier <laurent@vivier.eu>
    • memory: Fix incorrect calls of log_global_start/stop · 7b0538ed
      Peter Xu authored
      
      We should only call log_global_start/stop when the global dirty
      tracking bitmask changes between zero and non-zero.
      
      No real issue has been reported for this yet, probably because there
      is no immediate user that enables both dirty rate measurement and
      migration at the same time. However, it is good to be prepared for it.
      
      Fixes: 63b41db4 ("memory: make global_dirty_tracking a bitmask")
      Cc: qemu-stable@nongnu.org
      Cc: Hyman Huang <huangy81@chinatelecom.cn>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: Juan Quintela <quintela@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20211130080028.6474-1-peterx@redhat.com>
      Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
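The zero/non-zero transition rule can be sketched with counters standing in for the listener callbacks. The flag names are modelled on QEMU's tracker bits but their values are assumed here, and the functions and counters are hypothetical.

```c
#include <assert.h>

/* Tracker bits; names modelled on QEMU's, values assumed. */
#define GLOBAL_DIRTY_MIGRATION  (1U << 0)
#define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)

static unsigned int global_dirty_tracking;
static int log_global_start_calls, log_global_stop_calls;  /* listener hooks */

void global_dirty_log_start(unsigned int flags)
{
    unsigned int old = global_dirty_tracking;
    global_dirty_tracking |= flags;
    if (!old && global_dirty_tracking) {   /* zero -> non-zero */
        log_global_start_calls++;
    }
}

void global_dirty_log_stop(unsigned int flags)
{
    unsigned int old = global_dirty_tracking;
    global_dirty_tracking &= ~flags;
    if (old && !global_dirty_tracking) {   /* non-zero -> zero */
        log_global_stop_calls++;
    }
}
```

With migration and dirty rate measurement overlapping, the hooks fire once when the first tracker starts and once when the last one stops.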
    • memory: Make memory_region_is_mapped() succeed when mapped via an alias · 5ead6218
      David Hildenbrand authored
      
      memory_region_is_mapped() currently does not return "true" when a memory
      region is mapped via an alias.
      
      Assuming we have:
          alias (A0) -> alias (A1) -> region (R0)
      Mapping A0 would currently only make memory_region_is_mapped() succeed
      on A0, but not on A1 and R0.
      
      Let's fix that by adding a "mapped_via_alias" counter to memory regions and
      updating it accordingly when an alias gets (un)mapped.
      
      I am not aware of actual issues; this is rather a cleanup to make it
      consistent.
      
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20211102164317.45658-3-david@redhat.com>
      Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
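The "mapped_via_alias" counter can be sketched on a trimmed-down region type. `Region`, `region_map()`, `region_unmap()` and `region_is_mapped()` are hypothetical; QEMU's MemoryRegion carries far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical region: only the fields the counter needs. */
typedef struct Region {
    struct Region *alias;          /* region this alias points at, or NULL */
    bool in_container;             /* mapped directly into a container */
    unsigned int mapped_via_alias; /* number of mapped aliases reaching us */
} Region;

/* Map mr and bump the counter on every region along its alias chain. */
static void region_map(Region *mr)
{
    mr->in_container = true;
    for (Region *a = mr->alias; a; a = a->alias) {
        a->mapped_via_alias++;
    }
}

static void region_unmap(Region *mr)
{
    mr->in_container = false;
    for (Region *a = mr->alias; a; a = a->alias) {
        assert(a->mapped_via_alias > 0);
        a->mapped_via_alias--;
    }
}

static bool region_is_mapped(const Region *mr)
{
    return mr->in_container || mr->mapped_via_alias > 0;
}
```

For the chain A0 -> A1 -> R0, mapping A0 now makes all three report as mapped, and unmapping A0 clears A1 and R0 again.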
    • memory: Have 'info mtree' remove duplicated Address Space information · 7bdbf99a
      Philippe Mathieu-Daudé authored
      
      Per Peter Maydell [*]:
      
        'info mtree' monitor command was designed on the assumption that
        there's really only one or two interesting address spaces, and
        with more recent developments that's just not the case any more.
      
      Similarly to how the FlatViews are sorted using a GHashTable,
      sort the AddressSpace objects to remove the duplicates (ASes
      using the same root MemoryRegion).
      
      This drastically reduces the output of 'info mtree' on some boards.
      
      Before:
      
        $ (echo info mtree; echo q) \
          | qemu-system-aarch64 -S -monitor stdio -M raspi3b \
          | wc -l
        423
      
      After:
      
        $ (echo info mtree; echo q) \
          | qemu-system-aarch64 -S -monitor stdio -M raspi3b \
          | wc -l
        106
      
        (qemu) info mtree
        address-space: I/O
          0000000000000000-000000000000ffff (prio 0, i/o): io
      
        address-space: cpu-memory-0
        address-space: cpu-memory-1
        address-space: cpu-memory-2
        address-space: cpu-memory-3
        address-space: cpu-secure-memory-0
        address-space: cpu-secure-memory-1
        address-space: cpu-secure-memory-2
        address-space: cpu-secure-memory-3
        address-space: memory
          0000000000000000-ffffffffffffffff (prio 0, i/o): system
            0000000000000000-000000003fffffff (prio 0, ram): ram
            000000003f000000-000000003fffffff (prio 1, i/o): bcm2835-peripherals
              000000003f003000-000000003f00301f (prio 0, i/o): bcm2835-sys-timer
              000000003f004000-000000003f004fff (prio -1000, i/o): bcm2835-txp
              000000003f006000-000000003f006fff (prio 0, i/o): mphi
              000000003f007000-000000003f007fff (prio 0, i/o): bcm2835-dma
              000000003f00b200-000000003f00b3ff (prio 0, i/o): bcm2835-ic
              000000003f00b400-000000003f00b43f (prio -1000, i/o): bcm2835-sp804
              000000003f00b800-000000003f00bbff (prio 0, i/o): bcm2835-mbox
              000000003f100000-000000003f1001ff (prio 0, i/o): bcm2835-powermgt
              000000003f101000-000000003f102fff (prio 0, i/o): bcm2835-cprman
              000000003f104000-000000003f10400f (prio 0, i/o): bcm2835-rng
              000000003f200000-000000003f200fff (prio 0, i/o): bcm2835_gpio
              000000003f201000-000000003f201fff (prio 0, i/o): pl011
              000000003f202000-000000003f202fff (prio 0, i/o): bcm2835-sdhost
              000000003f203000-000000003f2030ff (prio -1000, i/o): bcm2835-i2s
              000000003f204000-000000003f20401f (prio -1000, i/o): bcm2835-spi0
              000000003f205000-000000003f20501f (prio -1000, i/o): bcm2835-i2c0
              000000003f20f000-000000003f20f07f (prio -1000, i/o): bcm2835-otp
              000000003f212000-000000003f212007 (prio 0, i/o): bcm2835-thermal
              000000003f214000-000000003f2140ff (prio -1000, i/o): bcm2835-spis
              000000003f215000-000000003f2150ff (prio 0, i/o): bcm2835-aux
              000000003f300000-000000003f3000ff (prio 0, i/o): sdhci
              000000003f600000-000000003f6000ff (prio -1000, i/o): bcm2835-smi
              000000003f804000-000000003f80401f (prio -1000, i/o): bcm2835-i2c1
              000000003f805000-000000003f80501f (prio -1000, i/o): bcm2835-i2c2
              000000003f900000-000000003f907fff (prio -1000, i/o): bcm2835-dbus
              000000003f910000-000000003f917fff (prio -1000, i/o): bcm2835-ave0
              000000003f980000-000000003f990fff (prio 0, i/o): dwc2
                000000003f980000-000000003f980fff (prio 0, i/o): dwc2-io
                000000003f981000-000000003f990fff (prio 0, i/o): dwc2-fifo
              000000003fc00000-000000003fc00fff (prio -1000, i/o): bcm2835-v3d
              000000003fe00000-000000003fe000ff (prio -1000, i/o): bcm2835-sdramc
              000000003fe05000-000000003fe050ff (prio 0, i/o): bcm2835-dma-chan15
            0000000040000000-00000000400000ff (prio 0, i/o): bcm2836-control
      
        address-space: bcm2835-dma-memory
        address-space: bcm2835-fb-memory
        address-space: bcm2835-property-memory
        address-space: dwc2
          0000000000000000-00000000ffffffff (prio 0, i/o): bcm2835-gpu
            0000000000000000-000000003fffffff (prio 0, ram): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff
            0000000040000000-000000007fffffff (prio 0, ram): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff
            000000007e000000-000000007effffff (prio 1, i/o): alias bcm2835-peripherals @bcm2835-peripherals 0000000000000000-0000000000ffffff
            0000000080000000-00000000bfffffff (prio 0, ram): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff
            00000000c0000000-00000000ffffffff (prio 0, ram): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff
      
        address-space: bcm2835-mbox-memory
          0000000000000000-000000000000008f (prio 0, i/o): bcm2835-mbox
            0000000000000010-000000000000001f (prio 0, i/o): bcm2835-fb
            0000000000000080-000000000000008f (prio 0, i/o): bcm2835-property
      
        memory-region: ram
          0000000000000000-000000003fffffff (prio 0, ram): ram
      
        memory-region: bcm2835-peripherals
          000000003f000000-000000003fffffff (prio 1, i/o): bcm2835-peripherals
            000000003f003000-000000003f00301f (prio 0, i/o): bcm2835-sys-timer
            000000003f004000-000000003f004fff (prio -1000, i/o): bcm2835-txp
            000000003f006000-000000003f006fff (prio 0, i/o): mphi
            000000003f007000-000000003f007fff (prio 0, i/o): bcm2835-dma
            000000003f00b200-000000003f00b3ff (prio 0, i/o): bcm2835-ic
            000000003f00b400-000000003f00b43f (prio -1000, i/o): bcm2835-sp804
            000000003f00b800-000000003f00bbff (prio 0, i/o): bcm2835-mbox
            000000003f100000-000000003f1001ff (prio 0, i/o): bcm2835-powermgt
            000000003f101000-000000003f102fff (prio 0, i/o): bcm2835-cprman
            000000003f104000-000000003f10400f (prio 0, i/o): bcm2835-rng
            000000003f200000-000000003f200fff (prio 0, i/o): bcm2835_gpio
            000000003f201000-000000003f201fff (prio 0, i/o): pl011
            000000003f202000-000000003f202fff (prio 0, i/o): bcm2835-sdhost
            000000003f203000-000000003f2030ff (prio -1000, i/o): bcm2835-i2s
            000000003f204000-000000003f20401f (prio -1000, i/o): bcm2835-spi0
            000000003f205000-000000003f20501f (prio -1000, i/o): bcm2835-i2c0
            000000003f20f000-000000003f20f07f (prio -1000, i/o): bcm2835-otp
            000000003f212000-000000003f212007 (prio 0, i/o): bcm2835-thermal
            000000003f214000-000000003f2140ff (prio -1000, i/o): bcm2835-spis
            000000003f215000-000000003f2150ff (prio 0, i/o): bcm2835-aux
            000000003f300000-000000003f3000ff (prio 0, i/o): sdhci
            000000003f600000-000000003f6000ff (prio -1000, i/o): bcm2835-smi
            000000003f804000-000000003f80401f (prio -1000, i/o): bcm2835-i2c1
            000000003f805000-000000003f80501f (prio -1000, i/o): bcm2835-i2c2
            000000003f900000-000000003f907fff (prio -1000, i/o): bcm2835-dbus
            000000003f910000-000000003f917fff (prio -1000, i/o): bcm2835-ave0
            000000003f980000-000000003f990fff (prio 0, i/o): dwc2
              000000003f980000-000000003f980fff (prio 0, i/o): dwc2-io
              000000003f981000-000000003f990fff (prio 0, i/o): dwc2-fifo
            000000003fc00000-000000003fc00fff (prio -1000, i/o): bcm2835-v3d
            000000003fe00000-000000003fe000ff (prio -1000, i/o): bcm2835-sdramc
            000000003fe05000-000000003fe050ff (prio 0, i/o): bcm2835-dma-chan15
      
        (qemu) q
      
      [*] https://www.mail-archive.com/qemu-devel@nongnu.org/msg829821.html
      
      Suggested-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Message-Id: <20210904231101.1071929-2-philmd@redhat.com>
      Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
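The grouping behind the shorter output can be sketched as: print every address-space name that shares a root, then print that root's tree once. The commit sorts with a GHashTable; this sketch swaps in a quadratic scan to stay dependency-free, and `AddrSpace` and `print_grouped()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in: an address space is a name plus a root region. */
typedef struct {
    const char *name;
    const void *root;
} AddrSpace;

/* True if an earlier entry already uses the same root. */
static bool root_seen_before(const AddrSpace *as, size_t i)
{
    for (size_t j = 0; j < i; j++) {
        if (as[j].root == as[i].root) {
            return true;
        }
    }
    return false;
}

/* Print each distinct root once, preceded by every AS name sharing it.
 * Returns the number of trees printed. */
static size_t print_grouped(const AddrSpace *as, size_t n, FILE *out)
{
    size_t trees = 0;
    for (size_t i = 0; i < n; i++) {
        if (root_seen_before(as, i)) {
            continue;
        }
        for (size_t j = i; j < n; j++) {
            if (as[j].root == as[i].root) {
                fprintf(out, "address-space: %s\n", as[j].name);
            }
        }
        fprintf(out, "  <memory tree of %s>\n\n", as[i].name);
        trees++;
    }
    return trees;
}
```

On the raspi3b example, this is why the eight `cpu-memory-*` and `cpu-secure-memory-*` names appear as a block directly above the single `memory` tree.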
    • memory: Split mtree_info() as mtree_info_flatview() + mtree_info_as() · 670c0780
      Philippe Mathieu-Daudé authored
      
      mtree_info() handles both the AS and the FlatView cases, but
      the two cases share basically no code. Split mtree_info()
      into mtree_info_flatview() + mtree_info_as() to simplify.
      
      Suggested-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Message-Id: <20210904231101.1071929-2-philmd@redhat.com>
      Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
    • memory: Directly dispatch alias accesses on origin memory region · 1a59bdba
      Philippe Mathieu-Daudé authored
      
      Since commit 2cdfcf27 ("memory: assign MemoryRegionOps to all
      regions"), all newly created regions are assigned
      unassigned_mem_ops (which might then be overwritten).
      
      When using aliased container regions, and there is no region mapped
      at address 0 in the container, the memory_region_dispatch_read()
      and memory_region_dispatch_write() calls incorrectly end up in the
      container's unassigned_mem_ops, because the alias offset is not used.
      
      Consider the following setup:
      
          +--------------------+ < - - - - - - - - - - - +
          |     Container      |  mr
          |  (unassigned_mem)  |                         |
          |                    |
          |                    |                         |
          |                    |  alias_offset
          +                    + <- - - - - - +----------+---------+
          | +----------------+ |              |                    |
          | |  MemoryRegion0 | |              |                    |
          | +----------------+ |              |       Alias        |  addr1
          | |  MemoryRegion1 | | <~ ~  ~  ~ ~ |                    | <~~~~~~
          | +----------------+ |              |                    |
          |                    |              +--------------------+
          |                    |
          |                    |
          |                    |
          |                    |
          | +----------------+ |
          | |  MemoryRegionX | |
          | +----------------+ |
          | |  MemoryRegionY | |
          | +----------------+ |
          | |  MemoryRegionZ | |
          | +----------------+ |
          +--------------------+
      
      The memory_region_init_alias() flow is:
      
        memory_region_init_alias()
        -> memory_region_init()
           -> object_initialize(TYPE_MEMORY_REGION)
              -> memory_region_initfn()
                 -> mr->ops = &unassigned_mem_ops;
      
      Later when accessing offset=addr1 via the alias, we expect to hit
      MemoryRegion1. The memory_region_dispatch_read() flow is:
      
        memory_region_dispatch_read(addr1)
        -> memory_region_access_valid(mr)   <- addr1 offset is ignored
           -> mr->ops->valid.accepts()
              -> unassigned_mem_accepts()
              <- false
           <- false
        <- MEMTX_DECODE_ERROR
      
      The caller gets a MEMTX_DECODE_ERROR although the access is valid.
      
      Fix by dispatching aliases recursively, accessing the origin region
      after adding the alias offset.
      
      Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20210418055708.820980-1-f4bug@amsat.org>
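The recursive alias dispatch can be modelled as below. This is a toy: `Region`, `Leaf` and `dispatch_read()` are hypothetical, the `MemTxResult` bits are assumed, and the subregion lookup inside the dispatcher is a simplification of this sketch rather than a claim about how QEMU decodes container contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t MemTxResult;
#define MEMTX_OK           0
#define MEMTX_DECODE_ERROR (1U << 1)

/* Hypothetical device mapped inside a container. */
typedef struct { uint64_t offset, size, value; } Leaf;

typedef struct Region {
    const struct Region *alias;  /* non-NULL for an alias region */
    uint64_t alias_offset;
    const Leaf *leaves;          /* container contents (toy) */
    size_t nleaves;
} Region;

/* Follow an alias to its origin region, adding alias_offset, before the
 * access is decoded; otherwise the container's own "unassigned" handling
 * would see the un-offset address. */
static MemTxResult dispatch_read(const Region *mr, uint64_t addr, uint64_t *data)
{
    if (mr->alias) {
        return dispatch_read(mr->alias, addr + mr->alias_offset, data);
    }
    for (size_t i = 0; i < mr->nleaves; i++) {
        const Leaf *l = &mr->leaves[i];
        if (addr >= l->offset && addr - l->offset < l->size) {
            *data = l->value;
            return MEMTX_OK;
        }
    }
    return MEMTX_DECODE_ERROR;   /* nothing mapped at this address */
}
```

Reading offset 0x10 through an alias with `alias_offset` 0x1000 reaches the device at 0x1010 in the container, whereas reading 0x10 on the container directly still decodes to nothing.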
  10. Jan 14, 2022
  11. Jan 08, 2022
  12. Jan 05, 2022
  13. Dec 31, 2021
    • hw/core/machine: Introduce CPU cluster topology support · 864c3b5c
      Yanan Wang authored
      
      The new Cluster-Aware Scheduling support has landed in Linux 5.16,
      and has been shown to benefit scheduling performance (e.g. load
      balancing and the wake_affine strategy) on both x86_64 and AArch64.
      
      So Linux 5.16 now has a four-level, arch-neutral CPU topology
      definition, shown below, and a new scheduler level for clusters:
      
        struct cpu_topology {
            int thread_id;
            int core_id;
            int cluster_id;
            int package_id;
            int llc_id;
            cpumask_t thread_sibling;
            cpumask_t core_sibling;
            cpumask_t cluster_sibling;
            cpumask_t llc_sibling;
        };
      
      A cluster generally means a group of CPU cores that share an L2 cache
      or other mid-level resources, and it is these shared resources that
      are used to improve the scheduler's behavior. In terms of size, a
      cluster sits between a CPU die and a CPU core. For example, some
      ARM64 Kunpeng servers have 6 clusters in each NUMA node and 4 CPU
      cores in each cluster. The 4 CPU cores share a separate L2 cache
      and an L3 cache tag, which brings a cache affinity advantage.
      
      In virtualization, on hosts that have pClusters (physical
      clusters), if we design a vCPU topology with a cluster level for
      the guest kernel and use dedicated vCPU pinning, a cluster-aware
      guest kernel can also make use of the cache affinity of CPU
      clusters to gain similar scheduling performance.
      
      This patch adds infrastructure for CPU cluster-level topology
      configuration and parsing, so that users can specify the cluster
      parameter if their machine type supports it.
      
      Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
      Message-Id: <20211228092221.21068-3-wangyanan55@huawei.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      [PMD: Added '(since 7.0)' to @clusters in qapi/machine.json]
      Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
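How a cluster level slots into CPU numbering can be sketched by decomposing a linear CPU index. The enumeration order (threads varying fastest, then cores, clusters, sockets) and the omission of dies are assumptions of this sketch; `CpuTopology`, `CpuIds` and the helpers are hypothetical, not QEMU's machine code.

```c
#include <assert.h>

/* Hypothetical -smp style topology (dies omitted for brevity). */
typedef struct {
    unsigned int sockets, clusters, cores, threads;
} CpuTopology;

typedef struct {
    unsigned int socket_id, cluster_id, core_id, thread_id;
} CpuIds;

static unsigned int topo_max_cpus(const CpuTopology *t)
{
    return t->sockets * t->clusters * t->cores * t->threads;
}

/* Decompose a linear CPU index, threads varying fastest (assumed order). */
static CpuIds topo_ids(const CpuTopology *t, unsigned int idx)
{
    CpuIds ids;

    assert(idx < topo_max_cpus(t));
    ids.thread_id = idx % t->threads;
    idx /= t->threads;
    ids.core_id = idx % t->cores;
    idx /= t->cores;
    ids.cluster_id = idx % t->clusters;
    idx /= t->clusters;
    ids.socket_id = idx;
    return ids;
}
```

Using the Kunpeng example above as one socket with 6 clusters of 4 cores, CPU 5 lands in cluster 1, core 1. On the command line such a layout would be requested with something like `-smp 24,sockets=1,clusters=6,cores=4,threads=1`, assuming QEMU 7.0 or later and a machine type that accepts the `clusters` parameter.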