    hostmem: Honor multiple preferred nodes if possible · 6bb613f0
    Michal Privoznik authored
    
    If a memory-backend is configured with mode
    HOST_MEM_POLICY_PREFERRED then
    host_memory_backend_memory_complete() calls mbind() as:
    
      mbind(..., MPOL_PREFERRED, nodemask, ...);
    
    Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
    to the .host-nodes attribute. Therefore, there can be multiple
    nodes specified. However, the documentation for MPOL_PREFERRED
    says:
    
      MPOL_PREFERRED
        This mode sets the preferred node for allocation. ...
        If nodemask specifies more than one node ID, the first node
        in the mask will be selected as the preferred node.
    
    Therefore, only the first node is honored and the rest are
    silently ignored. With recent changes to the kernel and
    numactl, we can do better.
    
    Linux v5.15 added support for MPOL_PREFERRED_MANY via commit
    cfcaa66f8032 ("mm/hugetlb: add support for mempolicy
    MPOL_PREFERRED_MANY"). This mode accepts multiple preferred
    NUMA nodes instead of just one.
    
    Then, the numa_has_preferred_many() API was introduced in numactl
    (v2.0.15~26), allowing applications to query kernel support.
    
    Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
    mbind() call instead and stop silently ignoring the additional
    nodes.
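
The mode selection described above can be sketched roughly as follows. This is only an illustration, not the code from this commit: the `sketch_` names are hypothetical, the policy numbers are copied from Linux's <linux/mempolicy.h> so that the example builds without kernel or libnuma headers, and the `has_preferred_many` flag is assumed to carry the result of a numa_has_preferred_many() query.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Policy numbers as defined in Linux <linux/mempolicy.h>, copied under
 * hypothetical SKETCH_ names so that no kernel headers are required.
 */
enum {
    SKETCH_MPOL_PREFERRED = 1,
    SKETCH_MPOL_PREFERRED_MANY = 5,
};

/*
 * Index of the lowest set bit in the mask: per the documentation quoted
 * above, this is the only node that plain MPOL_PREFERRED honors.
 */
int sketch_first_node(unsigned long nodemask)
{
    int node = 0;

    assert(nodemask != 0);
    while (!(nodemask & 1UL)) {
        nodemask >>= 1;
        node++;
    }
    return node;
}

/*
 * Hypothetical helper: upgrade MPOL_PREFERRED to MPOL_PREFERRED_MANY when
 * the caller reports that the mode is supported. Any other mode, or an
 * unsupported kernel, leaves the requested mode unchanged.
 */
int sketch_pick_mode(int mode, bool has_preferred_many)
{
    if (mode == SKETCH_MPOL_PREFERRED && has_preferred_many) {
        return SKETCH_MPOL_PREFERRED_MANY;
    }
    return mode;
}
```

With a mask covering nodes 1 and 2 (0x6), sketch_first_node() yields node 1, which is all that MPOL_PREFERRED would use. sketch_pick_mode() returns the upgraded mode only when support is reported, so older kernels keep the previous behavior.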
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
    Message-Id: <a0b4adce1af5bd2344c2218eb4a04b3ff7bcfdb4.1671097918.git.mprivozn@redhat.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: David Hildenbrand <david@redhat.com>