Intel® MPI Library Developer Reference for Linux* OS


NIC Pinning

Use this feature to control the assignment of NICs to ranks on machines with multiple NICs per node.

To enable NIC pinning, set I_MPI_MULTIRAIL=1, which enables the use of multiple NICs on the machine. The NIC pinning information is printed in the Intel(R) MPI debug output with I_MPI_DEBUG=3.
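For example, a minimal launch sketch; the application name ./myapp and the rank counts are placeholders:

$ export I_MPI_MULTIRAIL=1     # enable the use of multiple NICs (and NIC pinning)
$ export I_MPI_DEBUG=3         # print the NIC pinning table at startup
$ mpirun -n 4 -ppn 4 ./myapp   # ./myapp is a placeholder application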

Default Settings

By default, when multi-rail is enabled, the available NICs are distributed among the MPI ranks as equally as possible, based on the hardware topology. The NIC closest to the pinned CPU and to the GPU (when GPU support is enabled) is preferred, because that NIC has the fewest PCIe hops to the CPU/GPU and is therefore the most effective choice.

Examples

The following examples represent a machine configuration with two NUMA nodes and two NICs.

Figure 1. Four MPI Ranks

Debug output with I_MPI_DEBUG=3:

[0] MPI startup(): Number of NICs: 2
[0] MPI startup(): ===== NIC pinning on host1 =====
[0] MPI startup(): Rank Pin nic NIC Id
[0] MPI startup(): 0    nic0    0
[0] MPI startup(): 1    nic0    0
[0] MPI startup(): 2    nic1    1
[0] MPI startup(): 3    nic1    1

Figure 2. Three MPI Ranks

Debug output with I_MPI_DEBUG=3:

[0] MPI startup(): ===== NIC pinning on host1 =====
[0] MPI startup(): Rank Pin nic NIC Id
[0] MPI startup(): 0    nic0    0
[0] MPI startup(): 1    nic0    0
[0] MPI startup(): 2    nic1    1

I_MPI_OFI_NIC_AFFINITY

Control the NIC selection strategy used for NIC affinity when Intel(R) GPUs are used.

Syntax

I_MPI_OFI_NIC_AFFINITY=<strategy>

Arguments

Value        Description
<strategy>   Specify the NIC selection strategy.
cpu          Prefer the NIC closest to the pinned CPU, then the NIC closest to the GPU.
gpu          Prefer the NIC closest to the GPU, then the NIC closest to the pinned CPU.

Description

Set this environment variable to control the selection strategy for the NIC affinity of ranks.

NOTE:
The GPU is factored into the NIC affinity only when GPU support is enabled with I_MPI_OFFLOAD=1 and Intel(R) GPUs are used.

In most cases, the default NIC-selection logic performs best. Use this environment variable only to override the default NIC-selection strategy.
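For example, a launch sketch that prefers GPU proximity; the application name ./myapp and the rank counts are placeholders:

$ export I_MPI_MULTIRAIL=1             # enable the use of multiple NICs
$ export I_MPI_OFFLOAD=1               # enable GPU support so the GPU is factored into NIC affinity
$ export I_MPI_OFI_NIC_AFFINITY=gpu    # prefer the NIC closest to the GPU
$ export I_MPI_DEBUG=3                 # print the NIC pinning table at startup
$ mpirun -n 4 -ppn 4 ./myapp           # ./myapp is a placeholder application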

Examples

The following examples represent a machine configuration with two NUMA nodes and two NICs.

Figure 3. Four MPI Ranks, I_MPI_OFFLOAD_CELL_LIST=2,3,0,1 I_MPI_OFI_NIC_AFFINITY=cpu

Debug output with I_MPI_DEBUG=3:

[0] MPI startup(): Number of NICs: 2
[0] MPI startup(): ===== NIC pinning on host1 =====
[0] MPI startup(): Rank Pin nic NIC Id
[0] MPI startup(): 0    nic0    0
[0] MPI startup(): 1    nic0    0
[0] MPI startup(): 2    nic1    1
[0] MPI startup(): 3    nic1    1

Figure 4. Four MPI Ranks, I_MPI_OFFLOAD_CELL_LIST=2,3,0,1 I_MPI_OFI_NIC_AFFINITY=gpu

Debug output with I_MPI_DEBUG=3:

[0] MPI startup(): Number of NICs: 2
[0] MPI startup(): ===== NIC pinning on host1 =====
[0] MPI startup(): Rank Pin nic NIC Id
[0] MPI startup(): 0    nic1    1
[0] MPI startup(): 1    nic1    1
[0] MPI startup(): 2    nic0    0
[0] MPI startup(): 3    nic0    0

I_MPI_OFI_NIC_LIST

Override the default NIC selection with an explicit list of NICs.

Syntax

I_MPI_OFI_NIC_LIST=<niclist>

Arguments

Value         Description
<niclist>     A comma-separated list of NIC ids and/or NIC id ranges.
<l>-<m>       NICs with ids from l through m.
<k>,<l>-<m>   NIC with id k, followed by NICs with ids from l through m.

Description

Set this environment variable to explicitly control the NIC selection.

Define a list of NIC ids to map local ranks to NICs. The list must contain at least as many entries as there are local ranks. Each value must be between 0 and the highest logical NIC id on the node. The process with the i-th local rank is pinned to the i-th NIC in the list.

The NIC id is not the absolute NIC number but the logical NIC index assigned to the NIC by Intel(R) MPI. You can view the logical indices for available NICs in the NIC pinning output with I_MPI_DEBUG=3.

For example, if a node has three NICs, which are registered on the machine as cxi2, cxi1, and cxi0, in that order, they are assigned logical ids 0 (for cxi2), 1 (for cxi1), and 2 (for cxi0), respectively. Setting the environment variable to 0,2,1 for a 3-process run results in Rank 0 using cxi2, Rank 1 using cxi0, and Rank 2 using cxi1.
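A launch sketch of that three-process run; the application name ./myapp is a placeholder:

$ export I_MPI_MULTIRAIL=1           # the NIC list is honored only with multi-rail enabled
$ export I_MPI_OFI_NIC_LIST=0,2,1    # rank 0 -> NIC id 0, rank 1 -> NIC id 2, rank 2 -> NIC id 1
$ export I_MPI_DEBUG=3               # verify the resulting pinning in the debug output
$ mpirun -n 3 -ppn 3 ./myapp         # ./myapp is a placeholder application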

This environment variable is only relevant when you enable the multi-rail capability with I_MPI_MULTIRAIL=1.

In most cases, the default NIC-selection logic performs best. Use this environment variable only to override the default NIC-selection list.

NOTE:
If both I_MPI_OFI_NIC_AFFINITY and I_MPI_OFI_NIC_LIST are specified, I_MPI_OFI_NIC_LIST is used for the NIC selection.

Examples

Figure 5. Four MPI Ranks, I_MPI_OFI_NIC_LIST=1,0,0,1

Debug output with I_MPI_DEBUG=3:

[0] MPI startup(): Number of NICs: 2
[0] MPI startup(): ===== NIC pinning on host1 =====
[0] MPI startup(): Rank Pin nic NIC Id
[0] MPI startup(): 0    nic1    1
[0] MPI startup(): 1    nic0    0
[0] MPI startup(): 2    nic0    0
[0] MPI startup(): 3    nic1    1

Figure 6. Four MPI Ranks, I_MPI_OFI_NIC_LIST=0-1,0-1

Debug output with I_MPI_DEBUG=3:

[0] MPI startup(): Number of NICs: 2
[0] MPI startup(): ===== NIC pinning on host1 =====
[0] MPI startup(): Rank Pin nic NIC Id
[0] MPI startup(): 0    nic0    0
[0] MPI startup(): 1    nic1    1
[0] MPI startup(): 2    nic0    0
[0] MPI startup(): 3    nic1    1