dgl.sampling.sample_labors

dgl.sampling.sample_labors(g, nodes, fanout, edge_dir='in', prob=None, importance_sampling=0, random_seed=None, seed2_contribution=0, copy_ndata=True, copy_edata=True, exclude_edges=None, output_device=None)

Sampler that builds the computational dependency of node representations via labor sampling for multilayer GNNs, from the paper (LA)yer-neigh(BOR) Sampling: Defusing Neighborhood Explosion in GNNs <https://arxiv.org/abs/2210.13339>.

This sampler makes every node gather messages from a fixed number of neighbors per edge type. With the default parameters, the neighbors are picked uniformly. Each vertex t that is a candidate for sampling is assigned a single random variate r_t.

For each node, a number of inbound (or outbound when edge_dir == 'out') edges will be randomly chosen. The graph returned will then contain all the nodes in the original graph, but only the sampled edges.
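The role of the shared variate r_t can be illustrated with a small pure-Python sketch. It assumes the LABOR-0 rule as described in the cited paper, where an edge from t to a seed s with d_s neighbors is kept when r_t <= fanout / d_s. The helper names (vertex_variate, labor0_sample) and the way r_t is derived from the seed are hypothetical and are not part of DGL; in this simplified version the number of kept neighbors equals the fanout only on average.

```python
import random


def vertex_variate(seed, t):
    """Return one uniform variate r_t in [0, 1), fixed by the pair (seed, t)."""
    # Hypothetical derivation of r_t; DGL's actual generator is not shown here.
    return random.Random(seed * 1_000_003 + t).random()


def labor0_sample(in_neighbors, seeds, fanout, seed=0):
    """Return {s: sorted kept in-neighbors of s} under the assumed LABOR-0 rule."""
    sampled = {}
    for s in seeds:
        nbrs = in_neighbors[s]
        # A fanout of -1, or one that is at least the degree, keeps every neighbor.
        if fanout < 0 or fanout >= len(nbrs):
            sampled[s] = sorted(nbrs)
            continue
        threshold = fanout / len(nbrs)
        # The same r_t is reused for every seed that has t as a neighbor.
        sampled[s] = sorted(
            t for t in nbrs if vertex_variate(seed, t) <= threshold
        )
    return sampled


# In-neighbor lists of the graph used in the examples on this page
# (edges src -> dst: 0->1, 0->2, 1->0, 1->1, 2->2, 2->0).
in_neighbors = {0: [1, 2], 1: [0, 1], 2: [0, 2]}
print(labor0_sample(in_neighbors, seeds=[0, 1], fanout=1, seed=42))
```

Because vertex 1 is a neighbor of both seeds and both seeds have the same degree, it is either kept for both or for neither, which is what lets the seeds share sampled vertices.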

Node and edge features are copied to the returned graph only when copy_ndata and copy_edata are True, respectively. The original IDs of the sampled edges are stored as the dgl.EID feature in the returned graph.

Parameters

g (DGLGraph) – The graph, allowed to have multiple node or edge types. Can be either on CPU or GPU.
nodes (tensor or dict) –
Node IDs to sample neighbors from.

This argument can take a single ID tensor or a dictionary of node types and ID tensors. If a single tensor is given, the graph must only have one type of nodes.
fanout (int or dict[etype, int]) –
The number of edges to be sampled for each node on each edge type.

This argument can take a single int or a dictionary of edge types and ints. If a single int is given, DGL will sample this number of edges for each node for every edge type.

If -1 is given for a single edge type, all the neighboring edges with that edge type will be selected.
edge_dir (str, optional) –
Determines whether to sample inbound or outbound edges.

Can be either 'in' for inbound edges or 'out' for outbound edges. (Default: 'in')
prob (str, optional) –
Feature name used as the (unnormalized) probabilities associated with each neighboring edge of a node. The feature must have only one element for each edge.

The features must be non-negative floats, and the sum of the features of inbound/outbound edges for every node must be positive (though they don’t have to sum up to one). Otherwise, the result will be undefined.

If prob is not None, GPU sampling is not supported.
importance_sampling (int, optional) – Selects between uniform sampling and importance sampling. A positive value runs that many optimization steps on the importance sampling probabilities, while a negative value optimizes them until convergence. If the value is i, the LABOR-i variant is used; the default of 0 gives uniform sampling.
random_seed (tensor) –
An int64 tensor with one element.

The passed random_seed ensures that, for any seed vertex s and its neighbor t, the random variate r_t is the same across all calls to this function with the same random seed. When sampling as part of the same batch, use identical seeds so that LABOR can sample globally. For example, for heterogeneous graphs a single random seed is passed for every edge type, which samples far fewer vertices than using a unique random seed for each edge type. Calling this function separately for each edge type of a heterogeneous graph with different random seeds would run LABOR locally for each edge type, resulting in a larger number of sampled vertices.

If this function is called without a random_seed, DGL generates the seed from a random number. Pass an identical random_seed to every call when multiple calls to this function are used to sample as part of a single batch.
seed2_contribution (float, optional) – A float in [0, 1) that determines how much the second random seed contributes to the random variates generated for the LABOR sampling algorithm.
copy_ndata (bool, optional) –
If True, the node features of the new graph are copied from the original graph. If False, the new graph will not have any node features.

(Default: True)
copy_edata (bool, optional) –
If True, the edge features of the new graph are copied from the original graph. If False, the new graph will not have any edge features.

(Default: True)
exclude_edges (tensor or dict) –
Edge IDs to exclude during sampling neighbors for the seed nodes.

This argument can take a single ID tensor or a dictionary of edge types and ID tensors. If a single tensor is given, the graph must only have one type of edges.
output_device (Framework-specific device context object, optional) – The output device. Default is the same as the input graph.
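
The effect of sharing random_seed across edge types, as described for that parameter, can be sketched in pure Python. This is only an illustration under the assumed LABOR-0 rule (keep neighbor t of a seed with d neighbors when r_t <= fanout / d); the toy graph, the helper names and the way r_t is derived from the seed are hypothetical and do not reflect DGL internals.

```python
import random


def variate(seed, t):
    """Uniform variate r_t in [0, 1), fixed by (seed, t). Hypothetical helper."""
    return random.Random(seed * 1_000_003 + t).random()


def sampled_sources(in_neighbors, seeds, fanout, seed):
    """Union of the source vertices kept for the given seeds."""
    kept = set()
    for s in seeds:
        nbrs = in_neighbors[s]
        threshold = min(1.0, fanout / len(nbrs))
        kept.update(t for t in nbrs if variate(seed, t) <= threshold)
    return kept


def batch_vertices(etypes, seeds, fanout, random_seeds):
    """Union of kept source vertices over all edge types of one batch."""
    kept = set()
    for name, in_neighbors in etypes.items():
        kept |= sampled_sources(in_neighbors, seeds, fanout, random_seeds[name])
    return kept


# Toy data: two edge types over the same 200 source vertices;
# each of the 20 seeds has 50 in-neighbors per edge type.
rng = random.Random(0)
etypes = {
    name: {s: rng.sample(range(200), 50) for s in range(20)}
    for name in ("interacts", "treats")
}

shared = batch_vertices(etypes, range(20), 5, {"interacts": 1, "treats": 1})
distinct = batch_vertices(etypes, range(20), 5, {"interacts": 1, "treats": 2})
print(len(shared), len(distinct))
```

With a shared seed, a vertex can only be kept if its single variate falls under the threshold, so both edge types tend to pick the same vertices. With distinct seeds each edge type draws its own variates, and the union is typically larger.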

Returns

A sampled subgraph containing only the sampled neighboring edges, together with the edge weights.

Return type

tuple(DGLGraph, list[Tensor])

Notes

If copy_ndata or copy_edata is True, the same tensors are used as the node or edge features of both the original graph and the new graph. As a result, users should avoid performing in-place operations on the node or edge features of the new graph to avoid feature corruption.
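
The hazard described in this note can be illustrated with plain Python lists standing in for feature tensors. This is an analogy only: the function names are hypothetical, and real features are framework tensors (with PyTorch, for instance, the safe variant would clone the tensor before modifying it).

```python
import copy


def share_features(ndata):
    """Mimic the sharing described in the note: the new dict reuses the same storage."""
    return dict(ndata)


def private_features(ndata):
    """Give the new graph its own copy of every feature before any in-place write."""
    return {name: copy.copy(values) for name, values in ndata.items()}


original = {"h": [1.0, 2.0, 3.0]}
shared = share_features(original)
shared["h"][0] = -1.0      # in-place write through the new graph's features
print(original["h"])       # the original features changed as well

original = {"h": [1.0, 2.0, 3.0]}
private = private_features(original)
private["h"][0] = -1.0     # in-place write on a private copy
print(original["h"])       # the original features are unchanged
```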

Examples

Assume that you have the following graph:

>>> import dgl
>>> import torch
>>> g = dgl.graph(([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0]))

And the weights:

>>> g.edata['prob'] = torch.FloatTensor([0., 1., 0., 1., 0., 1.])

To sample one inbound edge for node 0 and node 1:

>>> sg, weights = dgl.sampling.sample_labors(g, [0, 1], 1)
>>> sg.edges(order='eid')
(tensor([1, 0]), tensor([0, 1]))
>>> sg.edata[dgl.EID]
tensor([2, 0])

To sample one inbound edge for node 0 and node 1 with probability in edge feature prob:

>>> sg, weights = dgl.sampling.sample_labors(g, [0, 1], 1, prob='prob')
>>> sg.edges(order='eid')
(tensor([2, 1]), tensor([0, 1]))

With fanout greater than the number of actual neighbors and without replacement, DGL will take all neighbors instead:

>>> sg, weights = dgl.sampling.sample_labors(g, [0, 1], 3)
>>> sg.edges(order='eid')
(tensor([1, 2, 0, 1]), tensor([0, 0, 1, 1]))

To exclude certain edge IDs when sampling for the seed nodes:

>>> g = dgl.graph(([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0]))
>>> g_edges = g.all_edges(form='all')
(tensor([0, 0, 1, 1, 2, 2]), tensor([1, 2, 0, 1, 2, 0]), tensor([0, 1, 2, 3, 4, 5]))
>>> sg, weights = dgl.sampling.sample_labors(g, [0, 1], 3, exclude_edges=[0, 1, 2])
>>> sg.all_edges(form='all')
(tensor([2, 1]), tensor([0, 1]), tensor([0, 1]))
>>> sg.has_edges_between(g_edges[0][:3], g_edges[1][:3])
tensor([False, False, False])

For a heterogeneous graph, exclude_edges is a dictionary keyed by edge type:

>>> g = dgl.heterograph({
...   ('drug', 'interacts', 'drug'): ([0, 0, 1, 1, 3, 2], [1, 2, 0, 1, 2, 0]),
...   ('drug', 'interacts', 'gene'): ([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0]),
...   ('drug', 'treats', 'disease'): ([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0])})
>>> g_edges = g.all_edges(form='all', etype=('drug', 'interacts', 'drug'))
(tensor([0, 0, 1, 1, 3, 2]), tensor([1, 2, 0, 1, 2, 0]), tensor([0, 1, 2, 3, 4, 5]))
>>> excluded_edges = {('drug', 'interacts', 'drug'): g_edges[2][:3]}
>>> sg, weights = dgl.sampling.sample_labors(g, {'drug': [0, 1]}, 3, exclude_edges=excluded_edges)
>>> sg.all_edges(form='all', etype=('drug', 'interacts', 'drug'))
(tensor([2, 1]), tensor([0, 1]), tensor([0, 1]))
>>> sg.has_edges_between(g_edges[0][:3], g_edges[1][:3], etype=('drug', 'interacts', 'drug'))
tensor([False, False, False])