dgl.distributed.sample_etype_neighbors

dgl.distributed.sample_etype_neighbors(g, nodes, etype_field, fanout, edge_dir='in', prob=None, replace=False, etype_sorted=True)

Sample from the neighbors of the given nodes from a distributed graph.

For each node, a number of inbound (or outbound when edge_dir == 'out') edges will be randomly chosen. The returned graph will contain all the nodes in the original graph, but only the sampled edges.

Node/edge features are not preserved. The original IDs of the sampled edges are stored as the dgl.EID feature in the returned graph.
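Because features are dropped, a caller usually looks them up again through the stored original edge IDs. The following is a minimal sketch of that lookup using plain Python lists rather than DGL tensors; the feature values and the sampled ID list are hypothetical stand-ins, and `sampled_eids` only plays the role that the dgl.EID feature plays in the returned graph.

```python
# Hypothetical stand-in: one feature value per edge of the original graph,
# indexed by original edge ID.
original_edge_weight = [0.5, 1.5, 2.0, 0.1, 0.9]

# Hypothetical original edge IDs of a sampled subgraph (the role dgl.EID plays).
sampled_eids = [4, 0, 2]

# Entry i of the result belongs to edge i of the sampled subgraph.
sampled_edge_weight = [original_edge_weight[eid] for eid in sampled_eids]
print(sampled_edge_weight)
```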

This function assumes the input is a homogeneous DGLGraph with the true edge type of each edge stored as edge data in etype_field. The sampled subgraph is also stored in the homogeneous graph format. That is, every node and edge is assigned a unique ID (in contrast, a node or an edge in a DGLGraph is typically identified by a type name plus a node/edge ID). We refer to IDs of this kind as homogeneous IDs. Users can call dgl.distributed.GraphPartitionBook.map_to_per_ntype() and dgl.distributed.GraphPartitionBook.map_to_per_etype() to recover the node/edge type and the per-type node/edge ID from a homogeneous ID.
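To make the homogeneous-ID convention concrete, here is a minimal pure-Python sketch. It is not DGL's implementation: it assumes, purely for illustration, that each node type owns a single contiguous range of homogeneous IDs, and the type names, offsets, and the helper `map_to_per_type` are all hypothetical. With a real DistGraph, use the partition book methods named above instead.

```python
from bisect import bisect_right

# Hypothetical layout: each node type owns one contiguous block of homogeneous IDs.
NTYPES = ["user", "item", "tag"]
# OFFSETS[i] is the first homogeneous ID of NTYPES[i]; the last entry is the total count.
OFFSETS = [0, 4, 10, 13]


def map_to_per_type(homo_ids):
    """Split homogeneous IDs into (type name, per-type ID) pairs."""
    result = []
    for hid in homo_ids:
        if not 0 <= hid < OFFSETS[-1]:
            raise ValueError(f"homogeneous ID {hid} is out of range")
        # Index of the last range start that is <= hid.
        type_idx = bisect_right(OFFSETS, hid) - 1
        result.append((NTYPES[type_idx], hid - OFFSETS[type_idx]))
    return result


pairs = map_to_per_type([0, 3, 4, 9, 12])
print(pairs)
```

Under this assumed layout, homogeneous ID 9 falls in the second range and maps to per-type ID 5 of the "item" type.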

Parameters
  • g (DistGraph) – The distributed graph.

  • nodes (tensor or dict) – Node IDs to sample neighbors from. If it’s a dict, it should contain only one key-value pair to make this API consistent with dgl.sampling.sample_neighbors.

  • etype_field (string) – The field in g.edata storing the edge type.

  • fanout (int or dict[etype, int]) –

    The number of edges to be sampled for each node per edge type. If an integer is given, DGL assumes that the same fanout is applied to every edge type.

    If -1 is given, all of the neighbors will be selected.

  • edge_dir (str, optional) –

    Determines whether to sample inbound or outbound edges.

    Can take either 'in' for inbound edges or 'out' for outbound edges.

  • prob (str, optional) –

    Feature name used as the (unnormalized) probabilities associated with each neighboring edge of a node. The feature must have only one element for each edge.

    The features must be non-negative floats, and the sum of the features of inbound/outbound edges for every node must be positive (though they don’t have to sum up to one). Otherwise, the result will be undefined.

  • replace (bool, optional) –

    If True, sample with replacement.

    When sampling with replacement, the sampled subgraph could have parallel edges.

    When sampling without replacement, if fanout exceeds the number of neighbors, all of the neighbors are sampled. If fanout == -1, all neighbors are collected.

  • etype_sorted (bool, optional) – Indicates whether etypes are sorted.
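The way fanout, edge_dir, prob, and replace interact can be illustrated with a small single-machine sketch. This is a toy model of the semantics described above, not DGL's distributed implementation. The function `sample_etype_neighbors_sketch`, the edge-list representation, and the string edge types are hypothetical. Unlike the real API, `prob` here is a list of per-edge weights rather than a feature name, and a dict fanout is assumed to list every edge type.

```python
import random
from collections import defaultdict


def sample_etype_neighbors_sketch(edges, nodes, fanout, edge_dir="in",
                                  prob=None, replace=False, seed=None):
    """Toy per-edge-type neighbor sampling over a list of (src, dst, etype) edges.

    Returns the sorted IDs (list indices) of the sampled edges.
    """
    if edge_dir not in ("in", "out"):
        raise ValueError("edge_dir must be 'in' or 'out'")
    rng = random.Random(seed)
    node_set = set(nodes)
    # Group candidate edge IDs by (seed node, edge type).
    groups = defaultdict(list)
    for eid, (src, dst, etype) in enumerate(edges):
        anchor = dst if edge_dir == "in" else src
        if anchor in node_set:
            groups[(anchor, etype)].append(eid)

    sampled = []
    for (_, etype), eids in groups.items():
        # An int fanout applies to every edge type; a dict gives one per type.
        k = fanout[etype] if isinstance(fanout, dict) else fanout
        if k == -1 or (not replace and k >= len(eids)):
            sampled.extend(eids)  # take every neighbor of this type
        elif replace:
            weights = None if prob is None else [prob[e] for e in eids]
            sampled.extend(rng.choices(eids, weights=weights, k=k))
        elif prob is None:
            sampled.extend(rng.sample(eids, k))
        else:
            # Weighted sampling without replacement: draw one edge at a time.
            # Fails if the remaining weights are all zero (an undefined case).
            pool = list(eids)
            for _ in range(k):
                pick = rng.choices(pool, weights=[prob[e] for e in pool], k=1)[0]
                pool.remove(pick)
                sampled.append(pick)
    return sorted(sampled)


edges = [
    (1, 0, "follows"), (2, 0, "follows"), (3, 0, "follows"),  # edge IDs 0-2
    (4, 0, "likes"), (5, 0, "likes"),                         # edge IDs 3-4
    (0, 1, "follows"),                                        # edge ID 5
]
picked = sample_etype_neighbors_sketch(
    edges, nodes=[0], fanout={"follows": 2, "likes": -1}, seed=0)
everything = sample_etype_neighbors_sketch(edges, nodes=[0], fanout=-1)
with_repl = sample_etype_neighbors_sketch(
    edges, nodes=[0], fanout=5, replace=True, seed=0)
weighted = sample_etype_neighbors_sketch(
    edges, nodes=[0], fanout=1, prob=[0.0, 0.0, 1.0, 1.0, 0.0, 1.0], seed=0)
print(picked, everything, with_repl, weighted)
```

In this toy graph, `picked` holds two of the three inbound "follows" edges of node 0 plus both "likes" edges, while `with_repl` holds five draws per edge type and may therefore repeat edge IDs, which corresponds to parallel edges in the sampled subgraph.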

Returns

A sampled subgraph containing only the sampled neighboring edges. It is on CPU.

Return type

DGLGraph