dgl.distributed.node_split

dgl.distributed.node_split(nodes, partition_book=None, ntype='_N', rank=None, force_even=True, node_trainer_ids=None)

Split nodes and return a subset for the local rank.

This function splits the input nodes based on the partition book and returns the subset of nodes assigned to the local rank. It is used to divide the workload among processes in distributed training.

The input nodes are given as a mask vector whose length equals the number of nodes in the graph; a value of 1 at position i indicates that node i is part of the input set.
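As a minimal illustration of the mask representation, the sketch below converts a mask into the node IDs it selects. It uses a plain Python list in place of a tensor or DistTensor, and the helper name `mask_to_ids` is hypothetical, not part of DGL.

```python
def mask_to_ids(mask):
    """Return the positions whose mask entry is truthy (hypothetical helper)."""
    return [node_id for node_id, flag in enumerate(mask) if flag]


# A graph with 5 nodes; nodes 0, 2 and 3 are selected as input nodes.
mask = [1, 0, 1, 1, 0]
node_ids = mask_to_ids(mask)
print(node_ids)
```

The mask always has one entry per node, so its length does not depend on how many nodes are selected.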

There are two strategies for splitting the nodes. When force_even is False, the nodes are split to maximize data locality: each process receives all of the input nodes that belong to it. When force_even is True (the default), the nodes are split evenly so that each process gets almost the same number of nodes.

When force_even is True, data locality is still largely preserved if the graph is partitioned with Metis and the node/edge IDs are shuffled. In this case, the majority of the nodes returned for a process are ones that belong to that process. If the node/edge IDs are not shuffled, data locality is not guaranteed.
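The difference between the two strategies can be sketched in pure Python. This is an illustration under simplifying assumptions, not DGL's implementation: the function `split_nodes_sketch` and the `part_ranges` argument are hypothetical, one process per partition is assumed, and each partition is assumed to own a contiguous range of node IDs (as is the case when IDs are shuffled during partitioning).

```python
def split_nodes_sketch(mask, part_ranges, rank, force_even=True):
    """Illustrative split of masked nodes across ranks (hypothetical helper).

    mask        -- list of 0/1 flags, one per node in the graph
    part_ranges -- list of (start, end) node ID ranges, one per rank (assumed)
    rank        -- the rank whose share of nodes is returned
    """
    selected = [node_id for node_id, flag in enumerate(mask) if flag]
    num_parts = len(part_ranges)

    if not force_even:
        # Locality-first: return every selected node owned by this rank.
        start, end = part_ranges[rank]
        return [node_id for node_id in selected if start <= node_id < end]

    # Even split: cut the selected IDs into contiguous, near-equal chunks.
    base, extra = divmod(len(selected), num_parts)
    begin = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return selected[begin:begin + size]


# 10 nodes, 7 of them selected; rank 0 owns IDs 0-5, rank 1 owns IDs 6-9.
mask = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
part_ranges = [(0, 6), (6, 10)]

local_0 = split_nodes_sketch(mask, part_ranges, 0, force_even=False)
local_1 = split_nodes_sketch(mask, part_ranges, 1, force_even=False)
even_0 = split_nodes_sketch(mask, part_ranges, 0, force_even=True)
even_1 = split_nodes_sketch(mask, part_ranges, 1, force_even=True)
```

In this example the locality-first split is unbalanced (6 nodes versus 1), while the even split gives 4 and 3 nodes. Rank 1 then receives nodes 4 and 5, which it does not own, so its workload is balanced at the cost of some remote accesses.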

Parameters:
  • nodes (1D tensor or DistTensor) – A boolean mask vector that indicates input nodes.

  • partition_book (GraphPartitionBook, optional) – The graph partition book.

  • ntype (str, optional) – The node type of the input nodes.

  • rank (int, optional) – The rank of a process. If not given, the rank of the current process is used.

  • force_even (bool, optional) – Whether to force the nodes to be split evenly.

  • node_trainer_ids (1D tensor or DistTensor, optional) – If not None, split the nodes among the trainers on the same machine according to the trainer ID assigned to each node. Otherwise, split randomly.

Returns:

The vector of node IDs that belong to the rank.

Return type:

1D tensor