  1. Sep 20, 2023
  2. Sep 19, 2023
    • Merge tag 'mem-2023-09-19' of https://github.com/davidhildenbrand/qemu into staging · 49076448
      Stefan Hajnoczi authored
      Hi,
      
      "Host Memory Backends" and "Memory devices" queue ("mem"):
      - Support and document VM templating with R/O files using a new "rom"
        parameter for memory-backend-file
      - Some cleanups and fixes around NVDIMMs and R/O file handling for guest
        RAM
      - Optimize ioeventfd updates by skipping address spaces that are not
        applicable
      
      # -----BEGIN PGP SIGNATURE-----
      #
      # iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmUJdykRHGRhdmlkQHJl
      # ZGhhdC5jb20ACgkQTd4Q9wD/g1pf2w//akOUoYMuamySGjXtKLVyMKZkjIys+Ama
      # k2C0xzsWAHBP572ezwHi8uxf5j9kzAjsw6GxDZ7FAamD9MhiohkEvkecloBx6f/c
      # q3fVHblBNkG7v2urtf4+6PJtJvhzOST2SFXfWeYhO/vaA04AYCDgexv82JN3gA6B
      # OS8WyOX62b8wILPSY2GLZ8IqpE9XnOYZwzVBn6YB1yo7ZkYEfXO6cA8nykNuNcOE
      # vppqDo7uVIX6317FWj8ygxmzFfOaj0WT2MT2XFzEIDfg8BInQN8HC4mTn0hcVKMa
      # N1y+eZH733CQKT+uNBRZ5YOeljOi4d6gEEyvkkA/L7e5D3Qg9hIdvHb4uryCFSWX
      # Vt07OP1XLBwCZFobOC6sg+2gtTZJxxYK89e6ZzEd0454S24w5bnEteRAaCGOP0XL
      # ww9xYULqhtZs55UC4rvZHJwdUAk1fIY4VqynwkeQXegvz6BxedNeEkJiiEU0Tizx
      # N2VpsxAJ7H/LLSFeZoCRESo4azrH6U4n7S/eS1tkCniFqibfe2yIQCDoJVfb42ec
      # gfg/vThCrDwHkIHzkMmoV8NndA7Q7SIkyMfYeEEBeZMeg8JzYll4DJEw/jQCacxh
      # KRUa+AZvGlTJUq0mkvyOVfLki+iaehoIUuY1yvMrmdWijPO8n3YybmP9Ljhr8VdR
      # 9MSYZe+I2v8=
      # =iraT
      # -----END PGP SIGNATURE-----
      # gpg: Signature made Tue 19 Sep 2023 06:25:45 EDT
      # gpg:                using RSA key 1BD9CAAD735C4C3A460DFCCA4DDE10F700FF835A
      # gpg:                issuer "david@redhat.com"
      # gpg: Good signature from "David Hildenbrand <david@redhat.com>" [unknown]
      # gpg:                 aka "David Hildenbrand <davidhildenbrand@gmail.com>" [full]
      # gpg:                 aka "David Hildenbrand <hildenbr@in.tum.de>" [unknown]
      # gpg: WARNING: The key's User ID is not certified with a trusted signature!
      # gpg:          There is no indication that the signature belongs to the owner.
      # Primary key fingerprint: 1BD9 CAAD 735C 4C3A 460D  FCCA 4DDE 10F7 00FF 835A
      
      * tag 'mem-2023-09-19' of https://github.com/davidhildenbrand/qemu
      
      :
        memory: avoid updating ioeventfds for some address_space
        machine: Improve error message when using default RAM backend id
        softmmu/physmem: Hint that "readonly=on,rom=off" exists when opening file R/W for private mapping fails
        docs: Start documenting VM templating
        docs: Don't mention "-mem-path" in multi-process.rst
        softmmu/physmem: Never return directories from file_ram_open()
        softmmu/physmem: Fail creation of new files in file_ram_open() with readonly=true
        softmmu/physmem: Bail out early in ram_block_discard_range() with readonly files
        softmmu/physmem: Remap with proper protection in qemu_ram_remap()
        backends/hostmem-file: Add "rom" property to support VM templating with R/O files
        softmmu/physmem: Distinguish between file access mode and mmap protection
        nvdimm: Reject writing label data to ROM instead of crashing QEMU
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      49076448
    • Merge tag 'firmware/edk2-20230918-pull-request' of https://gitlab.com/kraxel/qemu into staging · 1361bba5
      Stefan Hajnoczi authored
      edk2: update to edk2-stable202308
      
      v2: include acpi test data updates
      
      # -----BEGIN PGP SIGNATURE-----
      #
      # iQIzBAABCgAdFiEEoDKM/7k6F6eZAf59TLbY7tPocTgFAmUIUYUACgkQTLbY7tPo
      # cTiPgQ/9Hfn4ooawA2k7i4KB5mAdNMhG1TYmR05hjIPur8S+UBhfHx3Qdv/lojzr
      # 9hRkXsi3CpV8E/t7sA/ZUVbc17ukBrJvL2VbW1nGqPZytiNqmU/2HOZEd88WByyg
      # O1UYg9FZ1JbrqVbFkrE7Y0CHJmrr4EDWRxEGd7ITPDbR4UEuiQUm7+TeHIbQFCll
      # T5vNxkCBP6smY9n/OEMZHX964D7906pBflHSjzpLPV/mXBrlM/rDNtPXA6dcIquh
      # cCOndACPpenM8ngtgbW2gvDkkflXv4gtLozJR8XE8O434HmCviUjcxGW6L7nelcZ
      # +madon48CZ/5AJUvC09R3xuzWHOBuLOn21O3ooprnCBFWAgCtaMEDWwNbgf1Pig3
      # PgwOd1HeiQTKRuNCFDwNX1GJRN7Cyq6tY+ALQal3glDmWEMiyihUHViSsqux3c01
      # RAkyyOJAMOZ6+MbZ4HMWNVI9pKRTYY7IDxg3NWSvlCD3KmDuDt8YBuQftZMN+T8X
      # yMSa1wQda7ATlrsjUZL5LsEYO3qkho4ybffiFFDVz8QR/sO0TQg9uw6mggIghLAh
      # GsSUE9SpVZmu+1lZYV/+/KomGeyNlhfchgIVPApMLQS3j0kDgVeNsrsjfbDgCqsn
      # q3Ame+Roul54cv437F02ugt6JoxP76gNXXn8KdZPIDqOHWxMeS0=
      # =Grjx
      # -----END PGP SIGNATURE-----
      # gpg: Signature made Mon 18 Sep 2023 09:32:53 EDT
      # gpg:                using RSA key A0328CFFB93A17A79901FE7D4CB6D8EED3E87138
      # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full]
      # gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>" [full]
      # gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full]
      # Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138
      
      * tag 'firmware/edk2-20230918-pull-request' of https://gitlab.com/kraxel/qemu
      
      :
        tests/acpi: disallow virt/SSDT.memhp updates
        tests/acpi: update virt/SSDT.memhp
        edk2: update binaries to edk2-stable202308
        edk2: update submodule to edk2-stable202308
        edk2: workaround edk-stable202308 bug
        edk2: update build config
        edk2: update build script
        tests/acpi: allow virt/SSDT.memhp updates
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      1361bba5
    • Merge tag 'pull-ppc-20230918' of https://gitlab.com/danielhb/qemu into staging · 6a0eddb3
      Stefan Hajnoczi authored
      ppc patch queue for 2023-09-18:
      
      In this short queue we're making two important changes:
      
      - Nicholas Piggin is now the qemu-ppc maintainer. Cédric Le Goater and
      Daniel Barboza will act as backup during Nick's transition to this new
      role.
      
      - Support for NVIDIA V100 GPU with NVLink2 is dropped from qemu-ppc.
      Linux removed the same support back in 5.13, we're following suit now.
      
      A xive Coverity fix is also included.
      
      # -----BEGIN PGP SIGNATURE-----
      #
      # iIwEABYKADQWIQQX6/+ZI9AYAK8oOBk82cqW3gMxZAUCZQhPnBYcZGFuaWVsaGI0
      # MTNAZ21haWwuY29tAAoJEDzZypbeAzFk5QUBAJJNnCtv/SPP6bQVNGMgtfI9sz2z
      # MEttDa7SINyLCiVxAP0Y9z8ZHEj6vhztTX0AAv2QubCKWIVbJZbPV5RWrHCEBQ==
      # =y3nh
      # -----END PGP SIGNATURE-----
      # gpg: Signature made Mon 18 Sep 2023 09:24:44 EDT
      # gpg:                using EDDSA key 17EBFF9923D01800AF2838193CD9CA96DE033164
      # gpg:                issuer "danielhb413@gmail.com"
      # gpg: Good signature from "Daniel Henrique Barboza <danielhb413@gmail.com>" [unknown]
      # gpg: WARNING: The key's User ID is not certified with a trusted signature!
      # gpg:          There is no indication that the signature belongs to the owner.
      # Primary key fingerprint: 17EB FF99 23D0 1800 AF28  3819 3CD9 CA96 DE03 3164
      
      * tag 'pull-ppc-20230918' of https://gitlab.com/danielhb/qemu
      
      :
        spapr: Remove support for NVIDIA V100 GPU with NVLink2
        ppc/xive: Fix uint32_t overflow
        MAINTAINERS: Nick Piggin PPC maintainer, other PPC changes
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      6a0eddb3
    • Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging · dd0c8498
      Stefan Hajnoczi authored
      # -----BEGIN PGP SIGNATURE-----
      # Version: GnuPG v1
      #
      # iQEcBAABAgAGBQJlB/SLAAoJEO8Ells5jWIR7EQH/1kAbxHcSGJXDOgQAXJ/rOZi
      # UKn3ugJzD0Hxd4Xz8cvdVLM+9/JoEEOK1uB+NIG7Ask/gA5D7eUYzaLtp1OJ8VNO
      # mamfKmn3EIBWJoLSHH19TKzfW2tGMJHQ0Nj+sbDQRkK5f2c7hwLTRXa1EmlJd4dB
      # VoVzX4OiJtrQyv4OVmpP/PSETXJDvYYX/DNcRl9/3ccKtQW/wVDI3YzrMzXrsgyc
      # w9ItJi8k+19mVH6RgQwciqRvTbVMdzkOxqvU//LY0TxnjsHfbyHr+KlNAa2WTY2N
      # QgpAlMZhHqUG6/XXAs0o2VEtA66zmw932Xfy/CZUEcdGWfkG/9CEVfbuT4CKGY4=
      # =tF7K
      # -----END PGP SIGNATURE-----
      # gpg: Signature made Mon 18 Sep 2023 02:56:11 EDT
      # gpg:                using RSA key EF04965B398D6211
      # gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [full]
      # Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211
      
      * tag 'net-pull-request' of https://github.com/jasowang/qemu
      
      :
        net/tap: Avoid variable-length array
        net/dump: Avoid variable length array
        hw/net/rocker: Avoid variable length array
        hw/net/fsl_etsec/rings.c: Avoid variable length array
        net: add initial support for AF_XDP network backend
        tests: bump libvirt-ci for libasan and libxdp
        e1000e: rename e1000e_ba_state and e1000e_write_hdr_to_rx_buffers
        igb: packet-split descriptors support
        igb: add IPv6 extended headers traffic detection
        igb: RX payload guest writting refactoring
        igb: RX descriptors guest writting refactoring
        igb: rename E1000E_RingInfo_st
        igb: remove TCP ACK detection
        virtio-net: Add support for USO features
        virtio-net: Add USO flags to vhost support.
        tap: Add check for USO features
        tap: Add USO support to tap device.
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      dd0c8498
    • Merge tag 'pull-tcg-20230915-2' of https://gitlab.com/rth7680/qemu into staging · d7754940
      Stefan Hajnoczi authored
      *: Delete checks for old host definitions
      tcg/loongarch64: Generate LSX instructions
      fpu: Add conversions between bfloat16 and [u]int8
      fpu: Handle m68k extended precision denormals properly
      accel/tcg: Improve cputlb i/o organization
      accel/tcg: Simplify tlb_plugin_lookup
      accel/tcg: Remove false-negative halted assertion
      tcg: Add gvec compare with immediate and scalar operand
      tcg/aarch64: Emit BTI insns at jump landing pads
      
      [Resolved conflict between CPUINFO_PMULL and CPUINFO_BTI.
      --Stefan]
      
      * tag 'pull-tcg-20230915-2' of https://gitlab.com/rth7680/qemu
      
      : (39 commits)
        tcg: Map code_gen_buffer with PROT_BTI
        tcg/aarch64: Emit BTI insns at jump landing pads
        util/cpuinfo-aarch64: Add CPUINFO_BTI
        tcg: Add tcg_out_tb_start backend hook
        fpu: Handle m68k extended precision denormals properly
        fpu: Add conversions between bfloat16 and [u]int8
        accel/tcg: Introduce do_st16_mmio_leN
        accel/tcg: Introduce do_ld16_mmio_beN
        accel/tcg: Merge io_writex into do_st_mmio_leN
        accel/tcg: Merge io_readx into do_ld_mmio_beN
        accel/tcg: Replace direct use of io_readx/io_writex in do_{ld,st}_1
        accel/tcg: Merge cpu_transaction_failed into io_failed
        plugin: Simplify struct qemu_plugin_hwaddr
        accel/tcg: Use CPUTLBEntryFull.phys_addr in io_failed
        accel/tcg: Split out io_prepare and io_failed
        accel/tcg: Simplify tlb_plugin_lookup
        target/arm: Use tcg_gen_gvec_cmpi for compare vs 0
        tcg: Add gvec compare with immediate and scalar operand
        tcg/loongarch64: Implement 128-bit load & store
        tcg/loongarch64: Lower rotli_vec to vrotri
        ...
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      d7754940
    • memory: avoid updating ioeventfds for some address_space · 544cff46
      hongmianquan authored
      
      When updating ioeventfds, we iterate over all address spaces, but some
      address spaces have no listener that registered eventfd_add|del callbacks
      via memory_listener_register(), so updating their ioeventfds does nothing.
      We can therefore skip these address spaces in
      address_space_update_ioeventfds().
      
      This significantly reduces the overhead of
      memory_region_transaction_commit(). For example, for a VM with 8 vhost net
      devices with 64 vectors each, the time spent in
      memory_region_transaction_commit() drops by 20%.
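
The skip described above can be sketched in C as follows. This is a minimal illustration and not QEMU's code: the Listener and AddrSpace types, needs_ioeventfd_update(), count_ioeventfd_updates() and example_eventfd_cb() are hypothetical names. It assumes an address space only needs an ioeventfd update when at least one of its listeners registered an eventfd_add or eventfd_del callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory listener: either callback may be NULL. */
typedef struct Listener {
    void (*eventfd_add)(void);
    void (*eventfd_del)(void);
} Listener;

/* Hypothetical stand-in for an address space and its registered listeners. */
typedef struct AddrSpace {
    const Listener *listeners;
    size_t n_listeners;
} AddrSpace;

/* Example callback, used only to mark a listener as eventfd-aware. */
void example_eventfd_cb(void)
{
}

/* True if any listener of this address space cares about ioeventfds. */
bool needs_ioeventfd_update(const AddrSpace *as)
{
    for (size_t i = 0; i < as->n_listeners; i++) {
        if (as->listeners[i].eventfd_add || as->listeners[i].eventfd_del) {
            return true;
        }
    }
    return false;
}

/* Walk all address spaces and count those that would actually be updated. */
size_t count_ioeventfd_updates(const AddrSpace *spaces, size_t n)
{
    size_t updated = 0;

    for (size_t i = 0; i < n; i++) {
        if (!needs_ioeventfd_update(&spaces[i])) {
            continue; /* nothing to do for this address space: skip it */
        }
        updated++; /* the real (expensive) update work would happen here */
    }
    return updated;
}
```

With one address space whose listener has no eventfd callbacks and one whose listener has both, only the second is counted as updated.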
      
      Message-ID: <20230830032906.12488-1-hongmianquan@bytedance.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      544cff46
    • machine: Improve error message when using default RAM backend id · 41ddcd23
      David Hildenbrand authored
      
      For migration purposes, users might want to reuse the default RAM
      backend id, but specify a different memory backend.
      
      For example, to reuse "pc.ram" on q35, one has to set
          -machine q35,memory-backend=pc.ram
      Only then, can a memory backend with the id "pc.ram" be created
      manually.
      
      Let's improve the error message by improving the hint. Use
      error_append_hint() -- which in turn requires ERRP_GUARD().
      
      Message-ID: <20230906120503.359863-12-david@redhat.com>
      Suggested-by: ThinerLogoer <logoerthiner1@163.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Tested-by: Mario Casquero <mcasquer@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      41ddcd23
    • softmmu/physmem: Hint that "readonly=on,rom=off" exists when opening file R/W... · 6da4b1c2
      David Hildenbrand authored
      softmmu/physmem: Hint that "readonly=on,rom=off" exists when opening file R/W for private mapping fails
      
      It's easy to miss that memory-backend-file with "share=off" (default)
      will always try opening the file R/W as default, and fail if we don't
      have write permissions to the file.
      
      In that case, the user has to explicitly specify "readonly=on,rom=off" to
      get usable RAM, for example, for VM templating.
      
      Let's hint that '-object memory-backend-file,readonly=on,rom=off,...'
      exists to consume R/O files in a private mapping to create writable RAM,
      but only if we have permissions to open the file read-only.
      
      Message-ID: <20230906120503.359863-11-david@redhat.com>
      Suggested-by: ThinerLogoer <logoerthiner1@163.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      6da4b1c2
    • docs: Start documenting VM templating · 9cd9313f
      David Hildenbrand authored
      
      Let's add some details about VM templating, focusing on the VM memory
      configuration only.
      
      There is much more to VM templating (VM state? block devices?), but I leave
      that as future work.
      
      Message-ID: <20230906120503.359863-10-david@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      9cd9313f
    • docs: Don't mention "-mem-path" in multi-process.rst · 9e6180d2
      David Hildenbrand authored
      
      "-mem-path" corresponds to "memory-backend-file,share=off" and,
      therefore, creates a private COW mapping of the file. For multi-process
      QEMU, we need proper shared file-backed memory.
      
      Let's make that clearer.
      
      Message-ID: <20230906120503.359863-9-david@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      9e6180d2
    • softmmu/physmem: Never return directories from file_ram_open() · ca01f1b8
      David Hildenbrand authored
      
      open() does not fail on directories when opening them readonly (O_RDONLY).
      
      Currently, we succeed opening such directories and fail later during
      mmap(), resulting in a misleading error message.
      
      $ ./qemu-system-x86_64 \
          -object memory-backend-file,id=ram0,mem-path=tmp,readonly=true,size=1g
       qemu-system-x86_64: unable to map backing store for guest RAM: No such device
      
      To identify directories and handle them accordingly in file_ram_open()
      also when readonly=true was specified, detect if we just opened a directory
      using fstat() instead. Then, fail file_ram_open() right away, similarly
      to how we now fail if the file does not exist and we want to open the
      file readonly.
      
      With this change, we get a nicer error message:
       qemu-system-x86_64: can't open backing store tmp for guest RAM: Is a directory
      
      Note that only memory-backend-file will end up calling
      memory_region_init_ram_from_file() -> qemu_ram_alloc_from_file() ->
      file_ram_open().
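
The directory check described above can be sketched in C as follows. This is a minimal illustration and not the actual file_ram_open() code: open_backing_file_ro() is a hypothetical helper. It relies only on the POSIX behavior that open() with O_RDONLY succeeds on a directory, so the directory has to be detected afterwards with fstat().

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open 'path' read-only and refuse directories.
 * Returns a file descriptor on success or a negative errno value.
 */
int open_backing_file_ro(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        return -errno; /* e.g. -ENOENT if the file does not exist */
    }
    if (fstat(fd, &st) < 0) {
        int err = errno;

        close(fd);
        return -err;
    }
    if (S_ISDIR(st.st_mode)) {
        /* open() succeeded, but a directory cannot back guest RAM. */
        close(fd);
        return -EISDIR;
    }
    return fd;
}
```

Failing with EISDIR at open time is what yields the "Is a directory" message instead of a confusing mmap() failure later.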
      
      Message-ID: <20230906120503.359863-8-david@redhat.com>
      Reported-by: Thiner Logoer <logoerthiner1@163.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Tested-by: Mario Casquero <mcasquer@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      ca01f1b8
    • softmmu/physmem: Fail creation of new files in file_ram_open() with readonly=true · 4d6b23f7
      David Hildenbrand authored
      
      Currently, if a file does not exist yet, file_ram_open() will create a new
      empty file and open it writable. However, it even does that when
      readonly=true was specified.
      
      Specifying O_RDONLY instead to create a new readonly file would
      theoretically work, however, ftruncate() will refuse to resize the new
      empty file and we'll get a warning:
          ftruncate: Invalid argument
      More problems would eventually follow when actually mmap'ing that file and
      accessing it.
      
      If someone intends to let QEMU open+mmap a file read-only, better
      create+resize+fill that file ahead of time outside of QEMU context.
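
The ftruncate() failure mentioned above can be reproduced with a small C sketch. The helper name resize_via_readonly_fd() is hypothetical. The sketch assumes a POSIX system, where ftruncate() on a descriptor that is not open for writing fails (Linux reports EINVAL, as in the warning quoted above; POSIX also permits EBADF).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an existing file read-only and try to resize it.
 * Returns 0 if ftruncate() unexpectedly succeeded, otherwise the errno value.
 */
int resize_via_readonly_fd(const char *path, off_t size)
{
    int ret;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        return errno;
    }
    /* The descriptor is not open for writing, so resizing is refused. */
    ret = ftruncate(fd, size) < 0 ? errno : 0;
    close(fd);
    return ret;
}
```

This is why a file meant to be opened read-only has to be created, resized and filled before QEMU opens it.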
      
      We'll now fail with:
      ./qemu-system-x86_64 \
          -object memory-backend-file,id=ram0,mem-path=tmp,readonly=true,size=1g
      qemu-system-x86_64: can't open backing store tmp for guest RAM: No such file or directory
      
      All use cases of readonly files (R/O NVDIMMs, VM templating) work on
      existing files, so silently creating new files might just hide user
      errors when accidentally specifying a non-existent file.
      
      Note that only memory-backend-file will end up calling
      memory_region_init_ram_from_file() -> qemu_ram_alloc_from_file() ->
      file_ram_open().
      
      Move error reporting to the single caller.
      
      Message-ID: <20230906120503.359863-7-david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      4d6b23f7
    • softmmu/physmem: Bail out early in ram_block_discard_range() with readonly files · b2cccb52
      David Hildenbrand authored
      
      fallocate() will fail, let's print a nicer error message.
      
      Message-ID: <20230906120503.359863-6-david@redhat.com>
      Suggested-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      b2cccb52
    • softmmu/physmem: Remap with proper protection in qemu_ram_remap() · 9e6b9f37
      David Hildenbrand authored
      
      Let's remap with the proper protection that we can derive from
      RAM_READONLY.
      
      Message-ID: <20230906120503.359863-5-david@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      9e6b9f37
    • backends/hostmem-file: Add "rom" property to support VM templating with R/O files · e92666b0
      David Hildenbrand authored
      
      For now, "share=off,readonly=on" would always result in us opening the
      file R/O and mmap'ing the opened file MAP_PRIVATE R/O -- effectively
      turning it into ROM.
      
      Especially for VM templating, "share=off" is a common use case. However,
      that use case is impossible with files that lack write permissions,
      because "share=off,readonly=on" will not give us writable RAM.
      
      The sole user of ROM via memory-backend-file are R/O NVDIMMs, but as we
      have users (Kata Containers) that rely on the existing behavior --
      malicious VMs should not be able to consume COW memory for R/O NVDIMMs --
      we cannot change the semantics of "share=off,readonly=on".
      
      So let's add a new "rom" property with on/off/auto values. "auto" is
      the default and what most people will use: for historical reasons, to not
      change the old semantics, it defaults to the value of the "readonly"
      property.
      
      For VM templating, one can now use:
          -object memory-backend-file,share=off,readonly=on,rom=off,...
      
      But we'll disallow:
          -object memory-backend-file,share=on,readonly=on,rom=off,...
      because we would otherwise get an error when trying to mmap the R/O file
      shared and writable. An explicit error message is cleaner.
      
      We will also disallow for now:
          -object memory-backend-file,share=off,readonly=off,rom=on,...
          -object memory-backend-file,share=on,readonly=off,rom=on,...
      It's not harmful, but also not really required for now.
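
The allowed and disallowed combinations listed above can be summarized in a short C sketch. This is an illustration of the rules as stated in this message, not the code in backends/hostmem-file: the RomMode enum and resolve_rom() are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical tri-state for the "rom" property. */
typedef enum { ROM_AUTO, ROM_ON, ROM_OFF } RomMode;

/*
 * Hypothetical helper: validate a share/readonly/rom combination.
 * On success, returns 0 and stores in *is_rom whether the backend behaves
 * as ROM. Returns -1 for combinations that are rejected.
 */
int resolve_rom(bool share, bool readonly, RomMode rom, bool *is_rom)
{
    if (readonly && share && rom == ROM_OFF) {
        /* An R/O file cannot be mmap'ed shared and writable. */
        return -1;
    }
    if (!readonly && rom == ROM_ON) {
        /* Not harmful, but disallowed for now. */
        return -1;
    }
    /* "auto" keeps the old semantics: follow the "readonly" property. */
    *is_rom = (rom == ROM_AUTO) ? readonly : (rom == ROM_ON);
    return 0;
}
```

For the VM templating case, share=off,readonly=on,rom=off resolves to writable RAM (is_rom is false), while share=off,readonly=on with the default "auto" still resolves to ROM.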
      
      Alternatives that were abandoned:
      * Make "unarmed=on" for the NVDIMM set the memory region container
        readonly. We would still see a change of ROM->RAM and possibly run
        into memslot limits with vhost-user. Further, there might be use cases
        for "unarmed=on" that should still allow writing to that memory
        (temporary files, system RAM, ...).
      * Add a new "readonly=on/off/auto" parameter for NVDIMMs. Similar issues
        as with "unarmed=on".
      * Make "readonly" consume "on/off/file" instead of being a 'bool' type.
        This would slightly change the behavior of the "readonly" parameter:
        values like true/false (as accepted by a 'bool' type) would no longer be
        accepted.
      
      Message-ID: <20230906120503.359863-4-david@redhat.com>
      Acked-by: Markus Armbruster <armbru@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      e92666b0
    • softmmu/physmem: Distinguish between file access mode and mmap protection · 5c52a219
      David Hildenbrand authored
      
      There is a difference between how we open a file and how we mmap it,
      and we want to support writable private mappings of readonly files. Let's
      define RAM_READONLY and RAM_READONLY_FD flags, to replace the single
      "readonly" parameter for file-related functions.
      
      In memory_region_init_ram_from_fd() and memory_region_init_ram_from_file(),
      initialize mr->readonly based on the new RAM_READONLY flag.
      
      While at it, add some RAM_* flags we missed to add to the list of accepted
      flags in the documentation of some functions.
      
      No change in functionality intended. We'll make use of both flags next
      and start setting them independently for memory-backend-file.
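
The distinction between file access mode and mmap protection can be illustrated with plain POSIX calls. The sketch below is not QEMU code and map_private_writable() is a hypothetical helper. It shows that a file opened with O_RDONLY can still be mapped with PROT_WRITE as long as the mapping is MAP_PRIVATE, because writes go to copy-on-write pages and never reach the file.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map 'len' bytes of a read-only file as private,
 * writable memory. Returns NULL on failure.
 */
void *map_private_writable(const char *path, size_t len)
{
    void *p;
    int fd = open(path, O_RDONLY); /* file access mode: read-only */

    if (fd < 0) {
        return NULL;
    }
    /* mmap protection: read/write, but private (copy-on-write). */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after closing the descriptor */
    return p == MAP_FAILED ? NULL : p;
}
```

Modifying the returned memory changes only the process-private copy, and the file content stays untouched, which is the property VM templating depends on.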
      
      Message-ID: <20230906120503.359863-3-david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      5c52a219
    • nvdimm: Reject writing label data to ROM instead of crashing QEMU · 3a125839
      David Hildenbrand authored
      
      Currently, when using a true R/O NVDIMM (ROM memory backend) with a label
      area, the VM can easily crash QEMU by trying to write to the label area,
      because the ROM memory is mmap'ed without PROT_WRITE.
      
          [root@vm-0 ~]# ndctl disable-region region0
          disabled 1 region
          [root@vm-0 ~]# ndctl zero-labels nmem0
          -> QEMU segfaults
      
      Let's remember whether we have a ROM memory backend and properly
      reject the write request:
      
          [root@vm-0 ~]# ndctl disable-region region0
          disabled 1 region
          [root@vm-0 ~]# ndctl zero-labels nmem0
          zeroed 0 nmem
      
      In comparison, on a system with a R/W NVDIMM:
      
          [root@vm-0 ~]# ndctl disable-region region0
          disabled 1 region
          [root@vm-0 ~]# ndctl zero-labels nmem0
          zeroed 1 nmem
      
      For ACPI, just return "unsupported", like if no label exists. For spapr,
      return "H_P2", similar to when no label area exists.
      
      Could we rely on the "unarmed" property? Maybe, but it looks cleaner to
      only disallow what certainly cannot work.
      
      After all "unarmed=on" primarily means: cannot accept persistent writes. In
      theory, there might be setups where devices with "unarmed=on" set could
      be used to host non-persistent data (temporary files, system RAM, ...); for
      example, in Linux, admins can overwrite the "readonly" setting and still
      write to the device -- which will work as long as we're not using ROM.
      Allowing writing label data in such configurations can make sense.
      
      Message-ID: <20230906120503.359863-2-david@redhat.com>
      Fixes: dbd730e8 ("nvdimm: check -object memory-backend-file, readonly=on option")
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      3a125839
  3. Sep 18, 2023
    • Merge tag 'pull-crypto-20230915' of https://gitlab.com/rth7680/qemu into staging · 13d6b160
      Stefan Hajnoczi authored
      Unify implementation of carry-less multiply.
      Accelerate carry-less multiply for 64x64->128.
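
For readers unfamiliar with the operation, a generic 64x64->128 carry-less multiply can be sketched in C as below. This is a plain bit-by-bit illustration, not the routine added by this series: the U128 struct and clmul_64_generic() are hypothetical names. Carry-less multiplication is ordinary shift-and-add multiplication with XOR in place of addition.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type, split into two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} U128;

/* Bit-by-bit carry-less multiply: XOR shifted copies of 'a' per set bit of 'b'. */
U128 clmul_64_generic(uint64_t a, uint64_t b)
{
    U128 r = { 0, 0 };

    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i > 0) {
                /* Bits shifted out of the low half land in the high half. */
                r.hi ^= a >> (64 - i);
            }
        }
    }
    return r;
}
```

For example, 3 times 3 yields 5 rather than 9, because (x + 1)^2 = x^2 + 1 when coefficients are added modulo 2.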
      
      # -----BEGIN PGP SIGNATURE-----
      #
      # iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmUEiPodHHJpY2hhcmQu
      # aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV/akgf/XkiIeErWJr1YXSbS
      # YPQtCsDAfIrqn3RiyQ2uwSn2eeuwVqTFFPGER04YegRDK8dyO874JBfvOwmBT70J
      # I/aU8Z4BbRyNu9nfaCtFMlXQH9KArAKcAds1PnshfcnI5T2yBloZ1sAU97IuJFZk
      # Uuz96H60+ohc4wzaUiPqPhXQStgZeSYwwAJB0s25DhCckdea0udRCAJ1tQTVpxkM
      # wIFef1SHPoM6DtMzFKHLLUH6VivSlHjqx8GqFusa7pVqfQyDzNBfwvDl1F/bkE07
      # yTocQEkV3QnZvIplhqUxAaZXIFZr9BNk7bDimMjHW6z3pNPN3T8zRn4trNjxbgPV
      # jqzAtg==
      # =8nnk
      # -----END PGP SIGNATURE-----
      # gpg: Signature made Fri 15 Sep 2023 12:40:26 EDT
      # gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
      # gpg:                issuer "richard.henderson@linaro.org"
      # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
      # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F
      
      * tag 'pull-crypto-20230915' of https://gitlab.com/rth7680/qemu
      
      :
        host/include/aarch64: Implement clmul.h
        host/include/i386: Implement clmul.h
        target/ppc: Use clmul_64
        target/s390x: Use clmul_64
        target/i386: Use clmul_64
        target/arm: Use clmul_64
        crypto: Add generic 64-bit carry-less multiply routine
        target/ppc: Use clmul_32* routines
        target/s390x: Use clmul_32* routines
        target/arm: Use clmul_32* routines
        crypto: Add generic 32-bit carry-less multiply routines
        target/ppc: Use clmul_16* routines
        target/s390x: Use clmul_16* routines
        target/arm: Use clmul_16* routines
        crypto: Add generic 16-bit carry-less multiply routines
        target/ppc: Use clmul_8* routines
        target/s390x: Use clmul_8* routines
        target/arm: Use clmul_8* routines
        crypto: Add generic 8-bit carry-less multiply routines
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      13d6b160
    • Gerd Hoffmann's avatar
      0ec0767e
    • tests/acpi: update virt/SSDT.memhp · 5f88dd43
      Gerd Hoffmann authored
      
      The edk2 update caused an address change:
      
       DefinitionBlock ("", "SSDT", 1, "BOCHS ", "NVDIMM", 0x00000001)
       {
           Scope (\_SB)
           {
               Device (NVDR)
               {
                   Name (_HID, "ACPI0012" /* NVDIMM Root Device */)  // _HID: Hardware ID
                   [ ... ]
               }
           }
      
      -    Name (MEMA, 0x43D10000)
      +    Name (MEMA, 0x43C90000)
       }
      
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      5f88dd43
    • Gerd Hoffmann's avatar
      91e01270
    • edk2: update submodule to edk2-stable202308 · 241f9939
      Gerd Hoffmann authored
      
      New stable release was tagged in August 2023,
      update the edk2 submodule to it.
      
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      241f9939
    • Gerd Hoffmann's avatar
    • edk2: update build config · b0494f13
      Gerd Hoffmann authored
      
      risc-v switched to use split code/vars images like the other archs.
      
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      b0494f13
    • edk2: update build script · c28a2891
      Gerd Hoffmann authored
      
      Sync with latest version from gitlab.com/kraxel/edk2-build-config
      
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      c28a2891
    • Gerd Hoffmann's avatar
      3808a058
    • spapr: Remove support for NVIDIA V100 GPU with NVLink2 · 44fa20c9
      Cédric Le Goater authored
      
      NVLink2 support was removed from the PPC PowerNV platform and VFIO in
      Linux 5.13 with commits:
      
        562d1e207d32 ("powerpc/powernv: remove the nvlink support")
        b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2")
      
      This was 2.5 years ago. Do the same in QEMU with a revert of commit
      ec132efa ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some
      adjustments are required on the NUMA part.
      
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Cédric Le Goater <clg@redhat.com>
      Message-ID: <20230918091717.149950-1-clg@kaod.org>
      Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
      44fa20c9
    • ppc/xive: Fix uint32_t overflow · 527b2383
      Cédric Le Goater authored
      
      As reported by Coverity, "idx << xive->pc_shift" is evaluated using
      32-bit arithmetic, and then used in a context expecting a "uint64_t".
      Add a uint64_t cast.
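
The class of bug described above can be shown with a small C sketch. The function names and the sample values are hypothetical and only assume that the index is a 32-bit unsigned value. Without the cast, the shift is evaluated in 32 bits and the high bits are lost before the result is widened to uint64_t.

```c
#include <stdint.h>

/* Buggy form: the shift happens in 32-bit arithmetic, then gets widened. */
uint64_t offset_32bit_shift(uint32_t idx, unsigned shift)
{
    return idx << shift;
}

/* Fixed form: widen first, so the shift is evaluated in 64 bits. */
uint64_t offset_64bit_shift(uint32_t idx, unsigned shift)
{
    return (uint64_t)idx << shift;
}
```

With idx = 0x10000 and shift = 16, the first form returns 0, while the second returns 0x100000000.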
      
      Fixes: Coverity CID 1519049
      Fixes: b68147b7 ("ppc/xive: Add support for the PC MMIOs")
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
      Message-ID: <20230914154650.222111-1-clg@kaod.org>
      Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
      527b2383
    • MAINTAINERS: Nick Piggin PPC maintainer, other PPC changes · 0cbc34dc
      Daniel Henrique Barboza authored
      
      Update all relevant PowerPC entries as follows:
      
      - Nick Piggin is promoted to Maintainer in all qemu-ppc subsystems.
        Nick has been a solid contributor for the last couple of years and
        has the required knowledge and motivation to drive the boat.
      
      - Greg Kurz is being removed from all qemu-ppc entries. Greg has moved
        to other areas of interest and will retire from qemu-ppc.  Thanks Mr
        Kurz for all the years of service.
      
      - David Gibson was removed as 'Reviewer' from PowerPC TCG CPUs and PPC
        KVM CPUs. Change done per his request.
      
      - Daniel Barboza downgraded from 'Maintainer' to 'Reviewer' in sPAPR and
        PPC KVM CPUs. It has been a long time since I last touched those areas and
        it's not justified to be kept as maintainer in them.
      
      - Cedric Le Goater and Daniel Barboza removed as 'Reviewer' in VOF. We
        don't have the required knowledge to justify it.
      
      - VOF support downgraded from 'Maintained' to 'Odd Fixes' since it
        better reflects the current state of the subsystem.
      
      Acked-by: Cédric Le Goater <clg@kaod.org>
      Acked-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
      Acked-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Nicholas Piggin <npiggin@gmail.com>
      Message-ID: <20230915110507.194762-1-danielhb413@gmail.com>
      Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
      0cbc34dc
    • Peter Maydell's avatar
      net/tap: Avoid variable-length array · 6d7a53e9
      Peter Maydell authored
      
      Use a heap allocation instead of a variable length array in
      tap_receive_iov().
      
      The codebase has very few VLAs, and if we can get rid of them all we
      can make the compiler error on new additions.  This is a defensive
      measure against security bugs where an on-stack dynamic allocation
      isn't correctly size-checked (e.g.  CVE-2021-3527).
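
      The general shape of such a conversion can be sketched as follows.
      This is not the tap_receive_iov() code: the helper name is
      hypothetical, and plain calloc()/free() stand in for whatever
      allocation helpers the actual patch uses.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: build a copy of the caller's iovec array with a
 * header element prepended, and return the total byte count.
 *
 * A VLA version would declare
 *     struct iovec copy[iovcnt + 1];
 * on the stack, with a size controlled by the caller. Here the array is
 * heap-allocated, and the allocation is checked before use.
 */
static ssize_t total_len_with_header(const struct iovec *iov, int iovcnt,
                                     void *hdr, size_t hdr_len)
{
    struct iovec *copy;
    ssize_t total = 0;

    if (iovcnt < 0) {
        return -1;
    }
    copy = calloc((size_t)iovcnt + 1, sizeof(*copy));
    if (!copy) {
        return -1;
    }

    copy[0].iov_base = hdr;
    copy[0].iov_len = hdr_len;
    if (iovcnt > 0) {
        memcpy(&copy[1], iov, (size_t)iovcnt * sizeof(*copy));
    }

    for (int i = 0; i <= iovcnt; i++) {
        total += (ssize_t)copy[i].iov_len;
    }

    free(copy);
    return total;
}
```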
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      6d7a53e9
    • Peter Maydell's avatar
      net/dump: Avoid variable length array · c4cf6819
      Peter Maydell authored
      
      Use a g_autofree heap allocation instead of a variable length
      array in dump_receive_iov().
      
      The codebase has very few VLAs, and if we can get rid of them all we
      can make the compiler error on new additions.  This is a defensive
      measure against security bugs where an on-stack dynamic allocation
      isn't correctly size-checked (e.g.  CVE-2021-3527).
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      c4cf6819
    • Peter Maydell's avatar
      hw/net/rocker: Avoid variable length array · 12570657
      Peter Maydell authored
      
      Replace an on-stack variable length array in of_dpa_ig() with
      a g_autofree heap allocation.
      
      The codebase has very few VLAs, and if we can get rid of them all we
      can make the compiler error on new additions.  This is a defensive
      measure against security bugs where an on-stack dynamic allocation
      isn't correctly size-checked (e.g.  CVE-2021-3527).
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      12570657
    • Peter Maydell's avatar
      hw/net/fsl_etsec/rings.c: Avoid variable length array · 2a6cb383
      Peter Maydell authored
      
      In fill_rx_bd() we create a variable length array of size
      etsec->rx_padding. In fact we know that this will never be
      larger than 64 bytes, because rx_padding is set in rx_init_frame()
      in a way that ensures it is only that large. Use a fixed sized
      array and assert that it is big enough.
      
      Since padd[] is now potentially rather larger than the actual
      padding required, adjust the memset() we do on it to match the
      size that we write with cpu_physical_memory_write(), rather than
      clearing the entire array.
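
      A minimal sketch of this pattern, assuming the 64-byte bound stated
      above. The function name is hypothetical, and memcpy() into a
      caller-supplied buffer stands in for the guest-memory write; this is
      not the fill_rx_bd() code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RX_PADDING 64   /* upper bound taken from the text above */

/* Hypothetical stand-in for writing the padding bytes to the guest. */
static size_t write_padding(unsigned char *dst, size_t rx_padding)
{
    unsigned char padd[MAX_RX_PADDING];   /* fixed size, not a VLA */

    /* The fixed array must be large enough for the requested padding. */
    assert(rx_padding <= sizeof(padd));

    /* Clear only the bytes that are actually written out. */
    memset(padd, 0, rx_padding);
    memcpy(dst, padd, rx_padding);
    return rx_padding;
}
```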
      
      The codebase has very few VLAs, and if we can get rid of them all we
      can make the compiler error on new additions.  This is a defensive
      measure against security bugs where an on-stack dynamic allocation
      isn't correctly size-checked (e.g.  CVE-2021-3527).
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      2a6cb383
    • Ilya Maximets's avatar
      net: add initial support for AF_XDP network backend · cb039ef3
      Ilya Maximets authored
      
      AF_XDP is a network socket family that allows communication directly
      with the network device driver in the kernel, bypassing most or all
      of the kernel networking stack.  In essence, the technology is
      pretty similar to netmap.  But, unlike netmap, AF_XDP is Linux-native
      and works with any network interfaces without driver modifications.
      Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
      require access to character devices or unix sockets.  Only access to
      the network interface itself is necessary.
      
      This patch implements a network backend that communicates with the
      kernel by creating an AF_XDP socket.  A chunk of userspace memory
      is shared between QEMU and the host kernel.  4 ring buffers (Tx, Rx,
      Fill and Completion) are placed in that memory along with a pool of
      memory buffers for the packet data.  Data transmission is done by
      allocating one of the buffers, copying packet data into it and
      placing the pointer into Tx ring.  After transmission, device will
      return the buffer via Completion ring.  On Rx, device will take
      a buffer from a pre-populated Fill ring, write the packet data into
      it and place the buffer into Rx ring.
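
      The Tx half of this buffer life cycle can be modelled with a toy,
      single-threaded simulation. Everything below is hypothetical and
      simplified for illustration: it does not use the kernel or libxdp
      interfaces, the ring and frame sizes are arbitrary, and real
      descriptors carry more than a bare buffer offset.

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE  4    /* power of two, illustrative only */
#define FRAME_SIZE 64
#define NUM_FRAMES 4

struct ring {
    uint32_t prod, cons;        /* free-running producer/consumer counters */
    uint64_t desc[RING_SIZE];   /* buffer offsets into the shared area */
};

static int ring_push(struct ring *r, uint64_t addr)
{
    if (r->prod - r->cons == RING_SIZE) {
        return -1;              /* ring is full */
    }
    r->desc[r->prod++ & (RING_SIZE - 1)] = addr;
    return 0;
}

static int ring_pop(struct ring *r, uint64_t *addr)
{
    if (r->prod == r->cons) {
        return -1;              /* ring is empty */
    }
    *addr = r->desc[r->cons++ & (RING_SIZE - 1)];
    return 0;
}

struct toy_xsk {
    unsigned char umem[NUM_FRAMES * FRAME_SIZE];  /* shared packet buffers */
    struct ring tx, comp;
};

/* Sender side: copy the packet into a buffer and post it on the Tx ring. */
static int toy_send(struct toy_xsk *x, uint64_t frame,
                    const void *pkt, size_t len)
{
    if (len > FRAME_SIZE || frame >= NUM_FRAMES) {
        return -1;
    }
    memcpy(x->umem + frame * FRAME_SIZE, pkt, len);
    return ring_push(&x->tx, frame * FRAME_SIZE);
}

/* "Device" side: take one Tx entry and hand the buffer back via Completion. */
static int toy_device_transmit(struct toy_xsk *x)
{
    uint64_t addr;

    if (ring_pop(&x->tx, &addr) < 0) {
        return -1;
    }
    return ring_push(&x->comp, addr);
}
```

      The Rx direction follows the same pattern with the Fill and Rx rings
      swapped in for Tx and Completion.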
      
      AF_XDP network backend takes on the communication with the host
      kernel and the network interface and forwards packets to/from the
      peer device in QEMU.
      
      Usage example:
      
        -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
        -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1
      
      XDP program bridges the socket with a network interface.  It can be
      attached to the interface in 2 different modes:
      
      1. skb - this mode should work for any interface and doesn't require
               driver support, with the caveat of lower performance.
      
      2. native - this does require support from the driver and allows
                  bypassing skb allocation in the kernel and potentially
                  using zero-copy while getting packets in/out of userspace.
      
      By default, QEMU will try to use native mode and fall back to skb.
      Mode can be forced via 'mode' option.  To force 'copy' even in native
      mode, use 'force-copy=on' option.  This might be useful if there is
      some issue with the driver.
      
      Option 'queues=N' specifies how many device queues should
      be open.  Note that all the queues that are not open are still
      functional and can receive traffic, but it will not be delivered to
      QEMU.  So, the number of device queues should generally match the
      QEMU configuration, unless the device is shared with something
      else and the traffic re-direction to appropriate queues is correctly
      configured on a device level (e.g. with ethtool -N).
      'start-queue=M' option can be used to specify from which queue id
      QEMU should start configuring 'N' queues.  It might also be necessary
      to use this option with certain NICs, e.g. MLX5 NICs.  See the docs
      for examples.
      
      In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
      or CAP_BPF capabilities in order to load default XSK/XDP programs to
      the network interface and configure BPF maps.  It is possible, however,
      to run with no capabilities.  For that to work, an external process
      with enough capabilities will need to pre-load default XSK program,
      create AF_XDP sockets and pass their file descriptors to QEMU process
      on startup via 'sock-fds' option.  Network backend will need to be
      configured with 'inhibit=on' to avoid loading of the program.
      QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue
      or CAP_IPC_LOCK.
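
      The locked-memory requirement can be checked ahead of time.  The
      sketch below is not part of the patch: the function names are
      hypothetical, the 32 MB figure is read as 32 MiB, and the check
      looks only at the soft RLIMIT_MEMLOCK value, ignoring CAP_IPC_LOCK.

```c
#include <stdint.h>
#include <sys/resource.h>

/* Per-queue figure from the text above, interpreted here as 32 MiB. */
#define XSK_LOCKED_BYTES_PER_QUEUE (32ULL * 1024 * 1024)

static uint64_t locked_bytes_needed(unsigned int queues)
{
    return (uint64_t)queues * XSK_LOCKED_BYTES_PER_QUEUE;
}

/*
 * Returns 1 if the current soft RLIMIT_MEMLOCK covers the given number
 * of queues, 0 if it does not, and -1 if the limit cannot be read.
 */
static int memlock_limit_sufficient(unsigned int queues)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return 1;
    }
    return (uint64_t)rl.rlim_cur >= locked_bytes_needed(queues);
}
```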
      
      There are a few performance challenges with the current network backends.
      
      First is that they do not support IO threads.  This means that data
      path is handled by the main thread in QEMU and may slow down other
      work or may be slowed down by some other work.  This also means that
      taking advantage of multi-queue is generally not possible today.
      
      Another thing is that data path is going through the device emulation
      code, which is not really optimized for performance.  The fastest
      "frontend" device is virtio-net.  But it's not optimized for heavy
      traffic either, because it expects such use-cases to be handled via
      some implementation of vhost (user, kernel, vdpa).  In practice, we
      have virtio notifications and rcu lock/unlock on a per-packet basis
      and not very efficient accesses to the guest memory.  Communication
      channels between backend and frontend devices do not allow passing
      more than one packet at a time as well.
      
      Some of these challenges can be avoided in the future by adding better
      batching into device emulation or by implementing vhost-af-xdp variant.
      
      There are also a few kernel limitations.  AF_XDP sockets do not
      support any kinds of checksum or segmentation offloading.  Buffers
      are limited to a page size (4K), i.e. MTU is limited.  Multi-buffer
      support implementation for AF_XDP is in progress, but not ready yet.
      Also, transmission in all non-zero-copy modes is synchronous, i.e.
      done in a syscall.  That doesn't allow high packet rates on virtual
      interfaces.
      
      However, keeping in mind all of these challenges, the current
      implementation of the AF_XDP backend shows decent performance while
      running on top of a physical NIC with zero-copy support.
      
      Test setup:
      
      2 VMs running on 2 physical hosts connected via ConnectX6-Dx card.
      Network backend is configured to open the NIC directly in native mode.
      The driver supports zero-copy.  NIC is configured to use 1 queue.
      
      Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd
      for PPS testing.
      
      iperf3 result:
       TCP stream      : 19.1 Gbps
      
      dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
       Tx only         : 3.4 Mpps
       Rx only         : 2.0 Mpps
       L2 FWD Loopback : 1.5 Mpps
      
      In skb mode the same setup shows much lower performance, similar to
      the setup where the pair of physical NICs is replaced with a veth pair:
      
      iperf3 result:
        TCP stream      : 9 Gbps
      
      dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
        Tx only         : 1.2 Mpps
        Rx only         : 1.0 Mpps
        L2 FWD Loopback : 0.7 Mpps
      
      Results in skb mode or over the veth are close to results of a tap
      backend with vhost=on and disabled segmentation offloading bridged
      with a NIC.
      
      Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool)
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      cb039ef3