  1. Feb 09, 2024
  2. Nov 07, 2023
  3. Nov 06, 2023
    • Add Hyper-V Dynamic Memory Protocol driver (hv-balloon) base · 0d9e8c0b
      Maciej S. Szmigiero authored
      
      This driver is like virtio-balloon on steroids: it allows both changing the
      guest memory allocation via ballooning and (in the next patch) inserting
      pieces of extra RAM into it on demand from a provided memory backend.
      
      The actual resizing is done via the ballooning interface (for example, via
      the "balloon" HMP command).
      This includes resizing the guest past its boot size - that is, hot-adding
      additional memory at a granularity limited only by the guest alignment
      requirements, as provided by the next patch.
      
      In contrast with ACPI DIMM hotplug, where one can only request to unplug a
      whole DIMM stick, this driver allows removing memory from the guest in
      single-page (4k) units via ballooning.
      
      After a VM reboot the guest is back to its original (boot) size.
      
      In the future, the guest boot memory size might be changed on reboot
      instead, taking into account the effective size that VM had before that
      reboot (much like Hyper-V does).
      
      For performance reasons, the guest-released memory is tracked in a few
      range trees, as a series of (start, count) ranges.
      Each time a new page range is inserted into such a tree, its neighbors are
      checked as candidates for possible merging with it.
      
      Besides performance reasons, the Dynamic Memory protocol itself uses page
      ranges as the data structure in its messages, so relevant pages need to be
      merged into such ranges anyway.
      
      One has to be careful when tracking the guest-released pages, since the
      guest can maliciously report returning pages outside its current address
      space, which could later clash with the address range of newly added
      memory. Similarly, the guest can report freeing the same page twice.
      
      The above design results in much better ballooning performance than when
      using virtio-balloon with the same guest: 230 GB / minute with this driver
      versus 70 GB / minute with virtio-balloon.
      
      During a ballooning operation most of the time is spent waiting for the
      guest to come up with newly freed page ranges; processing the received
      ranges on the host side (in QEMU and KVM) is nearly instantaneous.
      
      The unballoon operation is also pretty much instantaneous:
      thanks to the merging of the ballooned-out page ranges, 200 GB of memory
      can be returned to the guest in about 1 second.
      With virtio-balloon this operation takes about 2.5 minutes.
      
      These tests were done against a Windows Server 2019 guest running on a
      Xeon E5-2699, after dirtying the whole memory inside the guest before each
      balloon operation.
      
      Using a range tree instead of a bitmap to track the removed memory also
      means that the solution scales well with the guest size: even a 1 TB range
      takes just a few bytes of such metadata.
      
      Since the required GTree operations aren't present in every Glib version,
      a check for them was added to the meson build script, together with new
      "--enable-hv-balloon" and "--disable-hv-balloon" configure arguments.
      If these GTree operations are missing from the system's Glib version,
      this driver is skipped during the QEMU build.
      
      An optional "status-report=on" device parameter requests memory status
      events from the guest (typically sent every second), which let the host
      learn both the guest's available-memory and in-use-memory counts.
      
      Following commits will add support for their external emission as
      "HV_BALLOON_STATUS_REPORT" QMP events.
      
      The driver is named hv-balloon since the Linux kernel client driver for
      the Dynamic Memory Protocol is named as such and to follow the naming
      pattern established by the virtio-balloon driver.
      The whole protocol runs over Hyper-V VMBus.
      
      The driver was tested against Windows Server 2012 R2, Windows Server 2016
      and Windows Server 2019 guests and obeys the guest alignment requirements
      reported to the host via DM_CAPABILITIES_REPORT message.
      
      Acked-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
  4. Oct 18, 2023
    • configure, meson: use command line options to configure qemu-ga · e20d68aa
      Paolo Bonzini authored
      
      Preserve the functionality of the environment variables, but
      allow using the command line instead.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • meson, cutils: allow non-relocatable installs · 655e2a77
      Paolo Bonzini authored
      Say QEMU is configured with bindir = "/usr/bin" and a firmware path
      that starts with "/usr/share/qemu".  Ever since QEMU 5.2, QEMU's
      install has been relocatable: if you move qemu-system-x86_64 from
      /usr/bin to /home/username/bin, it will start looking for firmware in
      /home/username/share/qemu.  Previously, you would get a non-relocatable
      install where the moved QEMU will keep looking for firmware in
      /usr/share/qemu.
      
      Windows almost always wants relocatable installs, and in fact that
      is why QEMU 5.2 introduced relocatability in the first place.
      However, newfangled distribution mechanisms such as AppImage
      (https://docs.appimage.org/reference/best-practices.html), and
      possibly NixOS, also dislike using at runtime the absolute paths
      that were established at build time.
      
      On POSIX systems you almost never care; if you do, your use case
      dictates which one is desirable, so there's no single answer.
      Obviously relocatability works fine most of the time, because not many
      people have complained about QEMU's switch to relocatable install,
      and that's why until now there was no way to disable relocatability.
      
      But a non-relocatable, non-modular binary can help if you want to do
      experiments with old firmware and new QEMU or vice versa (because you
      can just upgrade/downgrade the firmware package, and use rpm2cpio or
      similar to extract the QEMU binaries outside /usr), so allow both.
      This patch allows one to build a non-relocatable install using a new
      option to configure.  Why?  Because it's not too hard, and because
      it helps the user double check the relocatability of their install.
      
      Note that the same code that handles relocation also lets you run QEMU
      from the build tree and pick e.g. firmware files from the source tree
      transparently.  Therefore that part remains active with this patch,
      even if you configure with --disable-relocatable.
      
      Suggested-by: Michael Tokarev <mjt@tls.msk.ru>
      Reviewed-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. Oct 16, 2023
  6. Oct 04, 2023
  7. Sep 25, 2023
    • meson.build: Make keyutils independent from keyring · c64023b0
      Thomas Huth authored
      Commit 0db0fbb5 ("Add conditional dependency for libkeyutils")
      tried to provide a possibility for the user to disable keyutils
      if not required by making it depend on the keyring feature. This
      looked reasonable at first glance (the unit test in tests/unit/
      needs both), but the condition in meson.build fails if the feature
      is meant to be detected automatically, and there is also another
      spot in backends/meson.build where keyutils is used independently
      of keyring. So let's remove the dependency on keyring again and
      introduce a proper meson build option instead.
      
      Cc: qemu-stable@nongnu.org
      Fixes: 0db0fbb5 ("Add conditional dependency for libkeyutils")
      Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1842
      
      
      Message-ID: <20230824094208.255279-1-thuth@redhat.com>
      Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
  8. Sep 18, 2023
    • net: add initial support for AF_XDP network backend · cb039ef3
      Ilya Maximets authored
      
      AF_XDP is a network socket family that allows communication directly
      with the network device driver in the kernel, bypassing most or all
      of the kernel networking stack.  In essence, the technology is
      pretty similar to netmap.  But, unlike netmap, AF_XDP is Linux-native
      and works with any network interface without driver modifications.
      Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
      require access to character devices or unix sockets.  Only access to
      the network interface itself is necessary.
      
      This patch implements a network backend that communicates with the
      kernel by creating an AF_XDP socket.  A chunk of userspace memory
      is shared between QEMU and the host kernel.  Four ring buffers (Tx, Rx,
      Fill and Completion) are placed in that memory along with a pool of
      memory buffers for the packet data.  Data transmission is done by
      allocating one of the buffers, copying packet data into it and
      placing the pointer into the Tx ring.  After transmission, the device
      will return the buffer via the Completion ring.  On Rx, the device will
      take a buffer from the pre-populated Fill ring, write the packet data
      into it and place the buffer into the Rx ring.
      
      The AF_XDP network backend handles the communication with the host
      kernel and the network interface, and forwards packets to/from the
      peer device in QEMU.
      
      Usage example:
      
        -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
        -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1
      
      An XDP program bridges the socket with a network interface.  It can be
      attached to the interface in two different modes:

      1. skb - this mode should work for any interface and doesn't require
               driver support, with the caveat of lower performance.

      2. native - this mode requires support from the driver and allows
                  bypassing skb allocation in the kernel and potentially using
                  zero-copy while getting packets in/out of userspace.
      
      By default, QEMU will try to use native mode and fall back to skb.
      The mode can be forced via the 'mode' option.  To force copying even
      in native mode, use the 'force-copy=on' option.  This might be useful
      if there is some issue with the driver.
      
      The 'queues=N' option allows specifying how many device queues should
      be opened.  Note that all the queues that are not opened are still
      functional and can receive traffic, but it will not be delivered to
      QEMU.  So, the number of device queues should generally match the
      QEMU configuration, unless the device is shared with something
      else and the traffic redirection to the appropriate queues is correctly
      configured at the device level (e.g. with ethtool -N).
      The 'start-queue=M' option can be used to specify from which queue id
      QEMU should start configuring 'N' queues.  It might also be necessary
      to use this option with certain NICs, e.g. MLX5 NICs.  See the docs
      for examples.
      
      In the general case, QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
      or CAP_BPF capabilities in order to load the default XSK/XDP programs
      onto the network interface and configure BPF maps.  It is possible,
      however, to run with no capabilities.  For that to work, an external
      process with enough capabilities will need to pre-load the default XSK
      program, create the AF_XDP sockets and pass their file descriptors to
      the QEMU process on startup via the 'sock-fds' option.  The network
      backend will then need to be configured with 'inhibit=on' to avoid
      loading the program.  QEMU will also need 32 MB of locked memory
      (RLIMIT_MEMLOCK) per queue, or CAP_IPC_LOCK.
      
      There are a few performance challenges with the current network backends.
      
      First, they do not support IO threads.  This means that the data
      path is handled by the main thread in QEMU and may slow down other
      work or be slowed down by other work.  This also means that
      taking advantage of multi-queue is generally not possible today.
      
      Another issue is that the data path goes through the device emulation
      code, which is not really optimized for performance.  The fastest
      "frontend" device is virtio-net.  But it's not optimized for heavy
      traffic either, because it expects such use cases to be handled via
      some implementation of vhost (user, kernel, vdpa).  In practice, we
      have virtio notifications and RCU lock/unlock on a per-packet basis
      and not very efficient accesses to the guest memory.  The communication
      channels between backend and frontend devices also do not allow passing
      more than one packet at a time.
      
      Some of these challenges can be avoided in the future by adding better
      batching into device emulation or by implementing vhost-af-xdp variant.
      
      There are also a few kernel limitations.  AF_XDP sockets do not
      support any kind of checksum or segmentation offloading.  Buffers
      are limited to a page size (4K), i.e. the MTU is limited.  Multi-buffer
      support for AF_XDP is in progress, but not ready yet.
      Also, transmission in all non-zero-copy modes is synchronous, i.e.
      done in a syscall.  That doesn't allow high packet rates on virtual
      interfaces.
      
      However, keeping in mind all of these challenges, the current
      implementation of the AF_XDP backend shows decent performance while
      running on top of a physical NIC with zero-copy support.
      
      Test setup:
      
      2 VMs running on 2 physical hosts connected via ConnectX6-Dx card.
      Network backend is configured to open the NIC directly in native mode.
      The driver supports zero-copy.  NIC is configured to use 1 queue.
      
      Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd
      for PPS testing.
      
      iperf3 result:
       TCP stream      : 19.1 Gbps
      
      dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
       Tx only         : 3.4 Mpps
       Rx only         : 2.0 Mpps
       L2 FWD Loopback : 1.5 Mpps
      
      In skb mode the same setup shows much lower performance, similar to
      the setup where the pair of physical NICs is replaced with a veth pair:
      
      iperf3 result:
        TCP stream      : 9 Gbps
      
      dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
        Tx only         : 1.2 Mpps
        Rx only         : 1.0 Mpps
        L2 FWD Loopback : 0.7 Mpps
      
      Results in skb mode or over the veth pair are close to the results of
      a tap backend with vhost=on and segmentation offloading disabled,
      bridged with a NIC.
      
      Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool)
      Signed-off-by: Jason Wang <jasowang@redhat.com>
  9. Sep 07, 2023
  10. Aug 31, 2023
  11. Jul 17, 2023
  12. Jun 26, 2023
  13. May 18, 2023
  14. May 16, 2023
  15. May 10, 2023
  16. May 05, 2023
    • audio/pwaudio.c: Add Pipewire audio backend for QEMU · c2d3d1c2
      Dorinda Bassey authored
      
      This commit adds a new audiodev backend that allows QEMU to use Pipewire
      as both an audio sink and source. This backend is available on most
      systems.

      It adds Pipewire entry points for the QEMU Pipewire audio backend, plus
      wrappers for them in qpw_pcm_ops(). The qpw_write function returns the
      current state of the stream to pwaudio and writes some data to the
      server for playback streams, using the pipewire spa_ringbuffer
      implementation. The qpw_read function returns the current state of the
      stream to pwaudio and reads some data from the server for capture
      streams, also via spa_ringbuffer. These two functions are called during
      playback and capture.
      It also adds functions that convert pw audio formats to the QEMU audio
      format and vice versa, which are needed by the pipewire audio sink and
      source functions qpw_init_in() and qpw_init_out(). These methods, which
      implement playback and recording, create streams for playback and
      capture that start processing and cause the on_process callbacks to be
      invoked.
      A connection to the Pipewire sound system server is established in the
      qpw_audio_init() method.
      
      Signed-off-by: Dorinda Bassey <dbassey@redhat.com>
      Reviewed-by: Volker Rümelin <vr_qemu@t-online.de>
      Message-Id: <20230417105654.32328-1-dbassey@redhat.com>
      Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
  17. Apr 21, 2023
  18. Feb 27, 2023
    • meson: stop looking for 'sphinx-build-3' · 1b1be8d3
      John Snow authored
      
      Once upon a time, "sphinx-build" on certain RPM platforms invoked
      specifically a Python 2.x version, while "sphinx-build-3" was a distro
      shim for the Python 3.x version.
      
      These days, none of our supported platforms utilize a 2.x version, and
      those that still have 'sphinx-build-3' make it a symbolic link to
      'sphinx-build'.  Not searching for 'sphinx-build-3' will prefer
      pip/venv installed versions of sphinx if they're available.
      
      This makes it extremely convenient to test documentation building in
      QEMU across multiple versions of Sphinx for the purposes of
      compatibility testing.
      
      Signed-off-by: John Snow <jsnow@redhat.com>
      Message-Id: <20230221012456.2607692-6-jsnow@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  19. Feb 16, 2023
  20. Feb 14, 2023
  21. Feb 11, 2023
  22. Dec 16, 2022
  23. Nov 23, 2022
  24. Oct 26, 2022
    • blkio: add libblkio block driver · fd66dbd4
      Stefan Hajnoczi authored
      libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
      high-performance disk I/O. It currently supports io_uring,
      virtio-blk-vhost-user, and virtio-blk-vhost-vdpa with additional drivers
      under development.
      
      One of the reasons for developing libblkio is that other applications
      besides QEMU can use it. This will be particularly useful for
      virtio-blk-vhost-user, which applications may wish to use for connecting
      to qemu-storage-daemon.
      
      libblkio also gives us an opportunity to develop in Rust behind a C API
      that is easy to consume from QEMU.
      
      This commit adds io_uring, nvme-io_uring, virtio-blk-vhost-user, and
      virtio-blk-vhost-vdpa BlockDrivers to QEMU using libblkio. It will be
      easy to add other libblkio drivers since they will share the majority of
      code.
      
      For now I/O buffers are copied through bounce buffers if the libblkio
      driver requires it. Later commits add an optimization for
      pre-registering guest RAM to avoid bounce buffers.
      
      The syntax is:
      
        --blockdev io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off
      
        --blockdev nvme-io_uring,node-name=drive0,filename=/dev/ng0n1,readonly=on|off,cache.direct=on
      
        --blockdev virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off,cache.direct=on
      
        --blockdev virtio-blk-vhost-user,node-name=drive0,path=vhost-user-blk.sock,readonly=on|off,cache.direct=on
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Acked-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Message-id: 20221013185908.1297568-3-stefanha@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  25. Sep 27, 2022
  26. Sep 26, 2022
    • Remove the slirp submodule (i.e. compile only with an external libslirp) · 5890258a
      Thomas Huth authored
      
      Since QEMU 7.1 we don't support Ubuntu 18.04 anymore, so the last big
      important Linux distro that did not have a pre-packaged libslirp has
      been dropped. All other major distros already seem to have a libslirp
      package in their distribution - according to repology.org:
      
                Fedora 35: 4.6.1
        CentOS 8 (RHEL-8): 4.4.0
                Debian 11: 4.4.0
       OpenSUSE Leap 15.3: 4.3.1
         Ubuntu LTS 20.04: 4.1.0
            FreeBSD Ports: 4.7.0
            NetBSD pkgsrc: 4.7.0
                 Homebrew: 4.7.0
              MSYS2 mingw: 4.7.0
      
      The only one that was still missing a libslirp package is OpenBSD - but
      the next version (OpenBSD 7.2 which will be shipped in October) is going
      to include a libslirp package. Since QEMU 7.2 will be published after
      OpenBSD 7.2, we should be fine there, too.
      
      So there is no real urgent need for keeping the slirp submodule in
      the QEMU tree anymore. Thus let's drop the slirp submodule now and
      rely on the libslirp packages from the distributions instead.
      
      Message-Id: <20220824151122.704946-7-thuth@redhat.com>
      Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
  27. Sep 01, 2022
  28. Jul 13, 2022
  29. Jul 12, 2022
  30. Jun 24, 2022
    • vduse-blk: Implement vduse-blk export · 2a2359b8
      Xie Yongji authored
      
      This implements a VDUSE block export backend based on
      the libvduse library. We can use it to export BDSes
      for both VM and container (host) usage.
      
      The new command-line syntax is:
      
      $ qemu-storage-daemon \
          --blockdev file,node-name=drive0,filename=test.img \
          --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on
      
      After qemu-storage-daemon has started, we need to use
      the "vdpa" command to attach the device to the vDPA bus:
      
      $ vdpa dev add name vduse-export0 mgmtdev vduse
      
      The device must also be removed via the "vdpa" command
      before stopping qemu-storage-daemon.
      
      Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Message-Id: <20220523084611.91-7-xieyongji@bytedance.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>