dgl.distributed.partition.partition_graph¶

dgl.distributed.partition.partition_graph(g, graph_name, num_parts, out_path, num_hops=1, part_method='metis', reshuffle=True, balance_ntypes=None, balance_edges=False, return_mapping=False, num_trainers_per_machine=1)[source]¶

Partition a graph for distributed training and store the partitions on files.

The partitioning occurs in three steps: 1) run a partition algorithm (e.g., Metis) to assign nodes to partitions; 2) construct partition graph structure based on the node assignment; 3) split the node features and edge features based on the partition result.

When a graph is partitioned, each partition can contain HALO nodes, which are assigned to other partitions but are included in this partition for efficiency purpose. In this document, local nodes/edges refers to the nodes and edges that truly belong to a partition. The rest are “HALO nodes/edges”.

The partitioned data is stored into multiple files organized as follows:

data_root_dir/
  |-- graph_name.json     # partition configuration file in JSON
  |-- node_map.npy        # partition id of each node stored in a numpy array (optional)
  |-- edge_map.npy        # partition id of each edge stored in a numpy array (optional)
  |-- part0/              # data for partition 0
      |-- node_feats.dgl  # node features stored in binary format
      |-- edge_feats.dgl  # edge features stored in binary format
      |-- graph.dgl       # graph structure of this partition stored in binary format
  |-- part1/              # data for partition 1
      |-- node_feats.dgl
      |-- edge_feats.dgl
      |-- graph.dgl

First, the metadata of the original graph and the partitioning is stored in a JSON file named after graph_name. This JSON file contains the information of the original graph as well as the path of the files that store each partition. Below show an example.

{
   "graph_name" : "test",
   "part_method" : "metis",
   "num_parts" : 2,
   "halo_hops" : 1,
   "node_map": {
       "_U": [ [ 0, 1261310 ],
               [ 1261310, 2449029 ] ]
   },
   "edge_map": {
       "_V": [ [ 0, 62539528 ],
               [ 62539528, 123718280 ] ]
   },
   "etypes": { "_V": 0 },
   "ntypes": { "_U": 0 },
   "num_nodes" : 1000000,
   "num_edges" : 52000000,
   "part-0" : {
     "node_feats" : "data_root_dir/part0/node_feats.dgl",
     "edge_feats" : "data_root_dir/part0/edge_feats.dgl",
     "part_graph" : "data_root_dir/part0/graph.dgl",
   },
   "part-1" : {
     "node_feats" : "data_root_dir/part1/node_feats.dgl",
     "edge_feats" : "data_root_dir/part1/edge_feats.dgl",
     "part_graph" : "data_root_dir/part1/graph.dgl",
   },
}

Here are the definition of the fields in the partition configuration file:

graph_name is the name of the graph given by a user.
part_method is the method used to assign nodes to partitions. Currently, it supports “random” and “metis”.
num_parts is the number of partitions.
halo_hops is the number of hops of nodes we include in a partition as HALO nodes.
node_map is the node assignment map, which tells the partition ID a node is assigned to. The format of node_map is described below.
edge_map is the edge assignment map, which tells the partition ID an edge is assigned to.
num_nodes is the number of nodes in the global graph.
num_edges is the number of edges in the global graph.
part-* stores the data of a partition.

If reshuffle=False, node IDs and edge IDs of a partition do not fall into contiguous ID ranges. In this case, DGL stores node/edge mappings (from node/edge IDs to partition IDs) in separate files (node_map.npy and edge_map.npy). The node/edge mappings are stored in numpy files.

Warning

this format is deprecated and will not be supported by the next release. In other words, the future release will always shuffle node IDs and edge IDs when partitioning a graph.

If reshuffle=True, node_map and edge_map contains the information for mapping between global node/edge IDs to partition-local node/edge IDs. For heterogeneous graphs, the information in node_map and edge_map can also be used to compute node types and edge types. The format of the data in node_map and edge_map is as follows:

{
    "node_type": [ [ part1_start, part1_end ],
                   [ part2_start, part2_end ],
                   ... ],
    ...
},

Essentially, node_map and edge_map are dictionaries. The keys are node/edge types. The values are lists of pairs containing the start and end of the ID range for the corresponding types in a partition. The length of the list is the number of partitions; each element in the list is a tuple that stores the start and the end of an ID range for a particular node/edge type in the partition.

The graph structure of a partition is stored in a file with the DGLGraph format. Nodes in each partition is relabeled to always start with zero. We call the node ID in the original graph, global ID, while the relabeled ID in each partition, local ID. Each partition graph has an integer node data tensor stored under name dgl.NID and each value is the node’s global ID. Similarly, edges are relabeled too and the mapping from local ID to global ID is stored as an integer edge data tensor under name dgl.EID. For a heterogeneous graph, the DGLGraph also contains a node data dgl.NTYPE for node type and an edge data dgl.ETYPE for the edge type.

The partition graph contains additional node data (“inner_node” and “orig_id”) and edge data (“inner_edge”):

“inner_node” indicates whether a node belongs to a partition.
“inner_edge” indicates whether an edge belongs to a partition.
“orig_id” exists when reshuffle=True. It indicates the original node IDs in the original graph before reshuffling.

Node and edge features are splitted and stored together with each graph partition. All node/edge features in a partition are stored in a file with DGL format. The node/edge features are stored in dictionaries, in which the key is the node/edge data name and the value is a tensor. We do not store features of HALO nodes and edges.

When performing Metis partitioning, we can put some constraint on the partitioning. Current, it supports two constrants to balance the partitioning. By default, Metis always tries to balance the number of nodes in each partition.

balance_ntypes balances the number of nodes of different types in each partition.
balance_edges balances the number of edges in each partition.

To balance the node types, a user needs to pass a vector of N elements to indicate the type of each node. N is the number of nodes in the input graph.

Parameters

g (DGLGraph) – The input graph to partition
graph_name (str) – The name of the graph. The name will be used to construct DistGraph().
num_parts (int) – The number of partitions
out_path (str) – The path to store the files for all partitioned data.
num_hops (int, optional) – The number of hops of HALO nodes we construct on a partition graph structure. The default value is 1.
part_method (str, optional) – The partition method. It supports “random” and “metis”. The default value is “metis”.
reshuffle (bool, optional) – Reshuffle nodes and edges so that nodes and edges in a partition are in contiguous ID range. The default value is True. The argument is deprecated and will be removed in the next release.
balance_ntypes (tensor, optional) – Node type of each node. This is a 1D-array of integers. Its values indicates the node type of each node. This argument is used by Metis partition. When the argument is specified, the Metis algorithm will try to partition the input graph into partitions where each partition has roughly the same number of nodes for each node type. The default value is None, which means Metis partitions the graph to only balance the number of nodes.
balance_edges (bool) – Indicate whether to balance the edges in each partition. This argument is used by the Metis algorithm.
return_mapping (bool) – If reshuffle=True, this indicates to return the mapping between shuffled node/edge IDs and the original node/edge IDs.
num_trainers_per_machine (int, optional) – The number of trainers per machine. If is not 1, the whole graph will be first partitioned to each trainer, that is num_parts*num_trainers_per_machine parts. And the trainer ids of each node will be stored in the node feature ‘trainer_id’. Then the partitions of trainers on the same machine will be coalesced into one larger partition. The final number of partitions is num_part.

Returns

Tensor or dict of tensors, optional – If return_mapping=True, return a 1D tensor that indicates the mapping between shuffled node IDs and the original node IDs for a homogeneous graph; return a dict of 1D tensors whose key is the node type and value is a 1D tensor mapping between shuffled node IDs and the original node IDs for each node type for a heterogeneous graph.
Tensor or dict of tensors, optional – If return_mapping=True, return a 1D tensor that indicates the mapping between shuffled edge IDs and the original edge IDs for a homogeneous graph; return a dict of 1D tensors whose key is the edge type and value is a 1D tensor mapping between shuffled edge IDs and the original edge IDs for each edge type for a heterogeneous graph.

Examples

>>> dgl.distributed.partition_graph(g, 'test', 4, num_hops=1, part_method='metis',
...                                 out_path='output/', reshuffle=True,
...                                 balance_ntypes=g.ndata['train_mask'],
...                                 balance_edges=True)
>>> g, node_feats, edge_feats, gpb, graph_name = dgl.distributed.load_partition(
...                                 'output/test.json', 0)