LayerNeighborSampler

class dgl.graphbolt.LayerNeighborSampler(datapipe, graph, fanouts, replace=False, prob_name=None, deduplicate=True, layer_dependency=False, batch_dependency=1)[source]

Bases: NeighborSamplerImpl

Sample layer neighbor edges from a graph and return a subgraph.

Functional name: sample_layer_neighbor.

Sampler that builds the computational dependencies of node representations via labor sampling for multilayer GNNs, as described in the NeurIPS 2023 paper Layer-Neighbor Sampling – Defusing Neighborhood Explosion in GNNs.

The Layer-Neighbor sampler is responsible for sampling a subgraph from the given data. It returns an induced subgraph along with compacted information. In the context of a node classification task, the neighbor sampler directly uses the provided nodes as seed nodes. In link prediction scenarios, however, an additional pre-processing step is needed: gathering the unique nodes from the given node pairs, encompassing both positive and negative pairs, and employing them as the seed nodes for subsequent steps.
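
The link-prediction pre-processing described above can be sketched in plain Python. This is an illustrative toy, not graphbolt's implementation; `compact_seeds` is a hypothetical helper name:

```python
# Sketch (not graphbolt code): gather the unique nodes from positive and
# negative node pairs, use them as seed nodes, and re-express every pair
# in terms of compacted (seed-local) ids.

def compact_seeds(node_pairs):
    """Map node pairs to unique seed nodes plus compacted pair indices."""
    seed_index = {}  # original node id -> position in the seeds list
    compacted = []
    for u, v in node_pairs:
        pair = []
        for node in (u, v):
            if node not in seed_index:
                seed_index[node] = len(seed_index)
            pair.append(seed_index[node])
        compacted.append(tuple(pair))
    seeds = list(seed_index)  # insertion order matches compacted ids
    return seeds, compacted

# One positive pair (0, 1) plus two negative pairs (0, 5) and (0, 3).
seeds, compacted = compact_seeds([(0, 1), (0, 5), (0, 3)])
print(seeds)      # [0, 1, 5, 3] -- unique seed nodes in first-seen order
print(compacted)  # [(0, 1), (0, 2), (0, 3)] -- pairs in compacted ids
```

Note how the compacted pairs match the shape of `compacted_seeds` in the example output at the bottom of this page.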

Implements the approach described in Appendix A.3 of the paper. Similar to dgl.dataloading.LaborSampler, but this class uses sequential Poisson sampling instead of Poisson sampling to keep the count of sampled edges per vertex deterministic, like NeighborSampler. Thus, it is a drop-in replacement for NeighborSampler. However, unlike NeighborSampler, it samples fewer vertices and edges in the multilayer GNN scenario without harming convergence speed with respect to training iterations.
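
The key difference can be illustrated with a minimal pure-Python sketch (an assumption-laden toy, not the actual implementation): in sequential Poisson sampling, each neighbor t draws a random variate u_t, is ranked by u_t / w_t (with w_t its unnormalized weight), and the `fanout` smallest ranks are kept, so the number of sampled edges per vertex is deterministic:

```python
import random

# Sketch (not graphbolt code) of sequential Poisson sampling, a form of
# order sampling: rank neighbors by u_t / w_t and keep the smallest ranks,
# so exactly min(fanout, degree) neighbors are selected every time.

def sequential_poisson_sample(neighbors, fanout, weights=None, seed=0):
    """Keep `fanout` neighbors with the smallest u_t / w_t ranks."""
    rng = random.Random(seed)
    if weights is None:
        weights = [1.0] * len(neighbors)
    # One uniform random variate per neighbor; weights skew the ranking.
    keys = [rng.random() / w for w in weights]
    order = sorted(range(len(neighbors)), key=keys.__getitem__)
    return [neighbors[i] for i in order[:fanout]]

picked = sequential_poisson_sample(list(range(10)), fanout=4)
print(len(picked))  # 4 -- deterministic count, unlike plain Poisson sampling
```

Plain Poisson sampling would instead keep every neighbor whose variate falls below a threshold, making the per-vertex sample count random; the deterministic count is what makes this sampler a drop-in replacement for NeighborSampler.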

Parameters:
  • datapipe (DataPipe) – The datapipe.

  • graph (FusedCSCSamplingGraph) – The graph on which to perform subgraph sampling.

  • fanouts (list[torch.Tensor]) – The number of edges to be sampled for each node, with or without considering edge types. The length of this list determines the number of sampling layers.

  • replace (bool) – Boolean indicating whether the sampling is performed with or without replacement. If True, a value can be selected multiple times. Otherwise, each value can be selected only once.

  • prob_name (str, optional) – The name of an edge attribute used as the weights of sampling for each node. This attribute tensor should contain (unnormalized) probabilities corresponding to each neighboring edge of a node. It must be a 1D floating-point or boolean tensor, with the number of elements equalling the total number of edges.

  • deduplicate (bool) – Boolean indicating whether seeds between hops will be deduplicated. If True, duplicate elements in the seeds are reduced to a single instance. Otherwise, duplicates are retained.

  • layer_dependency (bool) – Boolean indicating whether different layers should use the same random variates. This reduces the number of nodes sampled and turns LayerNeighborSampler into a subgraph sampling method. Later layers are guaranteed to sample neighbors that overlap with those sampled by previous layers.

  • batch_dependency (int) – Specifies the degree to which consecutive minibatches should use similar random variates, resulting in higher temporal access locality of sampled nodes and edges. Setting it to \(\kappa\) slows down the change in the random variates proportionally to \(\frac{1}{\kappa}\). Implements the dependent minibatching approach in arXiv:2310.12403.
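
The effect of layer_dependency can be demonstrated with a small sketch under simplifying assumptions (a hypothetical `labor_pick` helper, fixed per-neighbor variates, no weights; this is not graphbolt's code): when every layer reuses the same random variate per neighbor, the sample for a smaller fanout is nested inside the sample for a larger fanout, so layers are guaranteed to pick overlapping neighbors:

```python
import random

# Sketch (not graphbolt code): with a single fixed random variate per
# neighbor shared across layers, taking the `fanout` smallest variates
# makes a smaller-fanout sample a subset of any larger-fanout sample.

def labor_pick(neighbors, fanout, variates):
    """Pick the `fanout` neighbors with the smallest fixed variates."""
    return set(sorted(neighbors, key=variates.__getitem__)[:fanout])

rng = random.Random(42)
neighbors = list(range(20))
variates = {t: rng.random() for t in neighbors}  # shared across layers

layer1 = labor_pick(neighbors, fanout=5, variates=variates)
layer2 = labor_pick(neighbors, fanout=10, variates=variates)
print(layer1 <= layer2)  # True -- the fanout-5 sample nests in the fanout-10 one
```

Without layer_dependency, each layer would draw fresh variates and the samples would overlap only by chance.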

Examples

>>> import dgl.graphbolt as gb
>>> import torch
>>> indptr = torch.LongTensor([0, 2, 4, 5, 6, 7, 8])
>>> indices = torch.LongTensor([1, 2, 0, 3, 5, 4, 3, 5])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices)
>>> seeds = torch.LongTensor([[0, 1], [1, 2]])
>>> item_set = gb.ItemSet(seeds, names="seeds")
>>> item_sampler = gb.ItemSampler(item_set, batch_size=1)
>>> neg_sampler = gb.UniformNegativeSampler(item_sampler, graph, 2)
>>> fanouts = [torch.LongTensor([5]),
...     torch.LongTensor([10]),torch.LongTensor([15])]
>>> subgraph_sampler = gb.LayerNeighborSampler(neg_sampler, graph, fanouts)
>>> next(iter(subgraph_sampler)).sampled_subgraphs
[SampledSubgraphImpl(sampled_csc=CSCFormatBase(
        indptr=tensor([0, 2, 4, 5, 6, 7, 8]),
        indices=tensor([1, 3, 0, 4, 2, 2, 5, 4]),
    ),
    original_row_node_ids=tensor([0, 1, 5, 2, 3, 4]),
    original_edge_ids=None,
    original_column_node_ids=tensor([0, 1, 5, 2, 3, 4]),
),
SampledSubgraphImpl(sampled_csc=CSCFormatBase(
        indptr=tensor([0, 2, 4, 5, 6, 7]),
        indices=tensor([1, 3, 0, 4, 2, 2, 5]),
    ),
    original_row_node_ids=tensor([0, 1, 5, 2, 3, 4]),
    original_edge_ids=None,
    original_column_node_ids=tensor([0, 1, 5, 2, 3]),
),
SampledSubgraphImpl(sampled_csc=CSCFormatBase(
        indptr=tensor([0, 2, 4, 5, 6]),
        indices=tensor([1, 3, 0, 4, 2, 2]),
    ),
    original_row_node_ids=tensor([0, 1, 5, 2, 3]),
    original_edge_ids=None,
    original_column_node_ids=tensor([0, 1, 5, 2]),
)]
>>> next(iter(subgraph_sampler)).compacted_seeds
tensor([[0, 1], [0, 2], [0, 3]])
>>> next(iter(subgraph_sampler)).labels
tensor([1., 0., 0.])
>>> next(iter(subgraph_sampler)).indexes
tensor([0, 0, 0])