  Oct 16, 2016
    • spapr: Improved placement of PCI host bridges in guest memory map · 357d1e3b
      David Gibson authored
      
      Currently, the MMIO space for accessing PCI on pseries guests begins at
      1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
      chunk of address space in which it places its outbound PIO and 32-bit and
      64-bit MMIO windows.
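
      As a rough sketch of that arithmetic (the helper and macro names here are
      illustrative, not the actual QEMU identifiers), the old placement amounts
      to a simple linear formula:

          #include <stdint.h>

          /* Old scheme: PHB regions start at 1 TiB, 64 GiB apiece. */
          #define OLD_PCI_BASE    (1ULL << 40)    /* 1 TiB  */
          #define OLD_PHB_SPACING (64ULL << 30)   /* 64 GiB */

          static uint64_t old_phb_base(unsigned index)
          {
              return OLD_PCI_BASE + (uint64_t)index * OLD_PHB_SPACING;
          }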
      
      This scheme has several problems:
        - It limits guest RAM to 1 TiB (though we have a limited fix for this
          now)
        - It limits the total MMIO window to 64 GiB.  This is not always enough
          for some of the large nVidia GPGPU cards
        - Putting all the windows into a single 64 GiB area means that naturally
          aligning things within it wastes more address space.
      In addition there was a miscalculation in some of the defaults, which meant
      that the MMIO windows for each PHB actually slightly overran the 64 GiB
      region for that PHB.  We got away without nasty consequences because
      the overrun fit within an unused area at the beginning of the next PHB's
      region, but it's not pretty.
      
      This patch implements a new scheme which addresses those problems, and is
      also closer to what bare metal hardware and pHyp guests generally use.
      
      Because some guest versions (including most current distro kernels) can't
      access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
      64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB contains the
      PIO (64 KiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
      subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
      one PHB each.
      
      This reduces the number of allowed PHBs (without full manual configuration
      of all the windows) from 256 to 31, but this should still be plenty in
      practice.
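
      The new arithmetic is sketched below with illustrative names rather than
      the actual QEMU identifiers (the real code also lays out the PIO and
      32-bit MMIO sub-windows inside the first TiB chunk):

          #include <stdint.h>

          #define PCI_AREA_BASE   (32ULL << 40)   /* 32 TiB */
          #define PCI_CHUNK_SIZE  (1ULL << 40)    /* 1 TiB  */

          /* Chunk 0 holds the PIO and 32-bit MMIO windows for every PHB;
           * chunk (index + 1) holds PHB <index>'s naturally aligned 64-bit
           * MMIO window.  Chunks 1..31 below 64 TiB give the 31 PHB limit. */
          static uint64_t phb_mmio64_base(unsigned index)
          {
              return PCI_AREA_BASE + (uint64_t)(index + 1) * PCI_CHUNK_SIZE;
          }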
      
      We also change some of the default window sizes for manually configured
      PHBs to saner values.
      
      Finally, we adjust some tests and libqos so that they correctly use the
      new default locations.  Ideally libqos would parse the device tree given
      to the guest, but that's a more complex problem for another time.
      
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr_pci: Add a 64-bit MMIO window · daa23699
      David Gibson authored
      
      On real hardware, and under pHyp, the PCI host bridges on Power machines
      typically advertise two outbound MMIO windows from the guest's physical
      memory space to PCI memory space:
        - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
        - A 64-bit window which maps onto a large region somewhere high in PCI
          address space (traditionally this used an identity mapping from guest
          physical address to PCI address, but that's not always the case)
      
      The qemu implementation in spapr-pci-host-bridge, however, only supports a
      single outbound MMIO window.  At least some Linux versions expect both
      windows, so we arranged this window to map onto the PCI memory space from
      2 GiB..~64 GiB, then advertised it as two contiguous windows: the "32-bit"
      window from 2G..4G and the "64-bit" window from 4G..~64G.
      
      This approach means, however, that the 64G window is not naturally aligned.
      In turn this limits the size of the largest BAR we can map (which does have
      to be naturally aligned) to roughly half of the total window.  With some
      large nVidia GPGPU cards which have huge memory BARs, this is starting to
      be a problem.
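
      To see why, note that a BAR of size S must sit at an address that is a
      multiple of S.  The sketch below (not QEMU code) finds the largest
      power-of-two BAR that fits naturally aligned in a given window; for the
      2 GiB..64 GiB window above it returns 32 GiB, placed at PCI address
      32 GiB, i.e. roughly half of the ~62 GiB window:

          #include <stdint.h>

          static uint64_t largest_aligned_bar(uint64_t lo, uint64_t hi)
          {
              for (uint64_t size = 1ULL << 62; size; size >>= 1) {
                  uint64_t start = (lo + size - 1) & ~(size - 1); /* align up */
                  if (start < hi && hi - start >= size) {
                      return size;
                  }
              }
              return 0;
          }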
      
      This patch adds true support for separate 32-bit and 64-bit outbound MMIO
      windows to the spapr-pci-host-bridge implementation, each of which can
      be independently configured.  The 32-bit window always maps to 2G.. in PCI
      space, but the PCI address of the 64-bit window can be configured (it
      defaults to the same as the guest physical address).
      
      So as not to break possible existing configurations, a single large window
      can still be specified as long as no 64-bit window is given explicitly.
      This appears to the guest the same way as the old approach, although it is
      now implemented by two contiguous memory regions rather than a single one.
      
      For now, this only adds the possibility of 64-bit windows.  The default
      configuration still uses the legacy mode.
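
      A sketch of that compatibility decision, with made-up field names rather
      than the patch's actual property names:

          #include <stdbool.h>
          #include <stdint.h>

          #define GIB (1ULL << 30)

          /* The 32-bit window always covers PCI 2 GiB..4 GiB.  The 64-bit
           * window is either taken from explicit configuration or, in legacy
           * mode, carved out of the single big window so that the two parts
           * stay contiguous. */
          typedef struct {
              uint64_t win64_pciaddr;
              uint64_t win64_size;
          } win64_cfg;

          static win64_cfg pick_win64(bool have_win64, uint64_t win64_pciaddr,
                                      uint64_t win64_size,
                                      uint64_t legacy_win_size)
          {
              win64_cfg w;

              if (have_win64) {
                  w.win64_pciaddr = win64_pciaddr;
                  w.win64_size = win64_size;
              } else {
                  /* Legacy: the old window covered 2 GiB..(2 GiB + size);
                   * advertise everything above 4 GiB as the "64-bit" part. */
                  w.win64_pciaddr = 4 * GIB;
                  w.win64_size = legacy_win_size - 2 * GIB;
              }
              return w;
          }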
      
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM · 2efff1c0
      David Gibson authored
      
      Currently the default PCI host bridge for the 'pseries' machine type is
      constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
      guest memory space.  This means that if > 1TiB of guest RAM is specified,
      the RAM will collide with the PCI IO windows, causing serious problems.
      
      Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
      there's a little unused space at the bottom of the area reserved for PCI,
      but essentially this means that > 1TiB of RAM has never worked with the
      pseries machine type.
      
      This patch fixes this by altering the placement of PHBs on large-RAM VMs.
      Instead of always placing the first PHB at 1TiB, it is placed at the next
      1 TiB boundary after the maximum RAM address.
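
      A minimal sketch of the new default, assuming "maximum RAM address" means
      the top of the (possibly hotpluggable) memory area and using illustrative
      names:

          #include <stdint.h>

          #define TIB (1ULL << 40)

          static uint64_t phb_area_base(uint64_t max_ram_addr)
          {
              /* Round up to the next 1 TiB boundary, never below the old
               * 1 TiB starting point. */
              uint64_t base = (max_ram_addr + TIB - 1) & ~(TIB - 1);
              return base < TIB ? TIB : base;
          }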
      
      Technically, this changes behaviour in a migration-breaking way for
      existing machines with > 1TiB maximum memory, but since having > 1 TiB
      memory was broken anyway, this seems like a reasonable trade-off.
      
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr_pci: Delegate placement of PCI host bridges to machine type · 6737d9ad
      David Gibson authored
      
      The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
      for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
      and PAPR guests) to have numerous independent PHBs, each controlling a
      separate PCI domain.
      
      There are two ways of configuring the spapr-pci-host-bridge device: first
      it can be done fully manually, specifying the locations and sizes of all
      the IO windows.  This gives the most control, but is very awkward with 6
      mandatory parameters.  Alternatively, just an "index" can be specified,
      which essentially selects from an array of predefined PHB locations.
      The PHB at index 0 is automatically created as the default PHB.
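
      For example, a second PHB using the index form looks something like this
      on the command line (other options elided; the fully manual form with
      its window properties is not spelled out here):

          qemu-system-ppc64 -machine pseries ... \
              -device spapr-pci-host-bridge,index=1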
      
      The current set of default locations causes some problems for guests with
      large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
      GPGPU cards via VFIO).  For migration compatibility, however, we can only
      change the locations on a new machine type.
      
      This is awkward, because the placement is currently decided within the
      spapr-pci-host-bridge code, so it breaks abstraction to look inside the
      machine type version.
      
      So, this patch delegates the "default mode" PHB placement from the
      spapr-pci-host-bridge device back to the machine type via a public method
      in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
      can do.
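
      Conceptually the hook looks something like the sketch below; the
      signature and type names are illustrative, not the exact QEMU ones:

          #include <stdint.h>

          typedef uint64_t hwaddr_ish;        /* stand-in for QEMU's hwaddr */

          /* The machine class, rather than the PHB device, decides where an
           * index-configured PHB's windows live. */
          typedef struct sPAPRMachineClassSketch {
              void (*phb_placement)(uint32_t index,
                                    hwaddr_ish *pio_base,
                                    hwaddr_ish *mmio_base);
          } sPAPRMachineClassSketch;

          /* The PHB realize code would then call something like
           *     smc->phb_placement(index, &pio_base, &mmio_base);
           * allowing a new machine type to change the layout while older
           * versioned machine types keep the legacy placement. */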
      
      For now, this just changes where the calculation is done.  It doesn't
      change the actual location of the host bridges, or any other behaviour.
      
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>