SampledSubgraph

class dgl.graphbolt.SampledSubgraph[source]

Bases: object

An abstract class for sampled subgraphs. For a heterogeneous graph, each field should be a Dict keyed by node or edge type. For a homogeneous graph, each field should hold its respective value type directly.

exclude_edges(edges: Dict[str, Tensor] | Tensor, assume_num_node_within_int32: bool = True)[source]

Exclude edges from the sampled subgraph.

This function can be used with sampled subgraphs, regardless of whether they have compacted row/column nodes or not. If the original subgraph has compacted row or column nodes, the corresponding row or column nodes in the returned subgraph will also be compacted.

Parameters:
  • self (SampledSubgraph) – The sampled subgraph.

  • edges (Union[torch.Tensor, Dict[str, torch.Tensor]]) – Edges to exclude. If the sampled subgraph is homogeneous, edges should be an N×2 tensor in which each row is one edge to exclude. If the sampled subgraph is heterogeneous, edges should be a dictionary mapping each edge type to the corresponding tensor of edges to exclude.

  • assume_num_node_within_int32 (bool) – If True, assumes that the node IDs in the provided edges fall within the int32 range, which can significantly speed up computation. Default: True

Returns:

An instance of a class that inherits from SampledSubgraph.

Return type:

SampledSubgraph

Examples

>>> import dgl.graphbolt as gb
>>> import torch
>>> sampled_csc = {"A:relation:B": gb.CSCFormatBase(
...     indptr=torch.tensor([0, 1, 2, 3]),
...     indices=torch.tensor([0, 1, 2]))}
>>> original_column_node_ids = {"B": torch.tensor([10, 11, 12])}
>>> original_row_node_ids = {"A": torch.tensor([13, 14, 15])}
>>> original_edge_ids = {"A:relation:B": torch.tensor([19, 20, 21])}
>>> subgraph = gb.SampledSubgraphImpl(
...     sampled_csc=sampled_csc,
...     original_column_node_ids=original_column_node_ids,
...     original_row_node_ids=original_row_node_ids,
...     original_edge_ids=original_edge_ids
... )
>>> edges_to_exclude = {"A:relation:B": torch.tensor([[14, 11], [15, 12]])}
>>> result = subgraph.exclude_edges(edges_to_exclude)
>>> print(result.sampled_csc)
{'A:relation:B': CSCFormatBase(indptr=tensor([0, 1, 1, 1]),
            indices=tensor([0]),
)}
>>> print(result.original_column_node_ids)
{'B': tensor([10, 11, 12])}
>>> print(result.original_row_node_ids)
{'A': tensor([13, 14, 15])}
>>> print(result.original_edge_ids)
{'A:relation:B': tensor([19])}
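The filtering in the example above can be traced by hand. The following is a minimal plain-Python sketch of the idea, not DGL's implementation: it uses lists instead of tensors, handles a single edge type, and the helper name exclude_edges_sketch is hypothetical. It assumes the row and column IDs are compacted, so each position is translated back to original IDs before it is compared with the edges to exclude.

```python
# Plain-Python sketch of edge exclusion on a CSC structure (no DGL or torch).
def exclude_edges_sketch(indptr, indices, row_ids, col_ids, edge_ids, excluded):
    """Drop every edge whose original (row, column) pair is in `excluded`."""
    excluded = {tuple(pair) for pair in excluded}
    new_indptr, new_indices, new_edge_ids = [0], [], []
    for col in range(len(indptr) - 1):
        for pos in range(indptr[col], indptr[col + 1]):
            row = indices[pos]
            # Translate compacted IDs back to original IDs before comparing.
            if (row_ids[row], col_ids[col]) not in excluded:
                new_indices.append(row)
                new_edge_ids.append(edge_ids[pos])
        # Each column's end offset is the number of edges kept so far.
        new_indptr.append(len(new_indices))
    return new_indptr, new_indices, new_edge_ids


result = exclude_edges_sketch(
    indptr=[0, 1, 2, 3],
    indices=[0, 1, 2],
    row_ids=[13, 14, 15],
    col_ids=[10, 11, 12],
    edge_ids=[19, 20, 21],
    excluded=[[14, 11], [15, 12]],
)
print(result)  # ([0, 1, 1, 1], [0], [19])
```

The row and column ID lists are returned unchanged in the documented example, which is why the sketch only rebuilds indptr, indices and the edge IDs.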

to(device: device) → None[source]

Copy SampledSubgraph to the specified device using reflection.
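"Using reflection" here means that the fields to copy are discovered at run time rather than listed one by one. The sketch below shows one way such a copy could work; it is an illustration under stated assumptions, not necessarily how DGL implements it. FakeTensor, move and SubgraphSketch are hypothetical stand-ins so that the example runs without torch or a GPU.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor; it only tracks its device."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


def move(value, device):
    """Recursively move tensors, including those nested in dictionaries."""
    if isinstance(value, dict):
        return {key: move(item, device) for key, item in value.items()}
    # Anything without a `to` method is left untouched.
    return value.to(device) if hasattr(value, "to") else value


class SubgraphSketch:
    """Hypothetical container with fields loosely modeled on SampledSubgraph."""

    def __init__(self):
        self.original_edge_ids = {"A:relation:B": FakeTensor()}
        self.original_row_node_ids = FakeTensor()
        self.label = "not a tensor"

    def to(self, device):
        # Reflection: enumerate instance attributes instead of naming each one.
        for name, value in list(vars(self).items()):
            setattr(self, name, move(value, device))


sub = SubgraphSketch()
sub.to("cuda:0")
print(sub.original_row_node_ids.device)  # cuda:0
```

The benefit of this design is that a subclass which adds new tensor fields does not need to override to.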

property original_column_node_ids: Tensor | Dict[str, Tensor]

Returns the ids of the column nodes in the original graph, that is, the reverse mapping from column ids in the sampled subgraph back to the original graph. A graph structure can be treated as a coordinated row and column pair, and these are the mapped ids of the columns.

  • If original_column_node_ids is a tensor: It represents the original node ids.

  • If original_column_node_ids is a dictionary: The keys should be node types and the values should be the corresponding original heterogeneous node ids.

If present, the column IDs are compacted, and the column IDs in sampled_csc are the compacted IDs that index into original_column_node_ids.

property original_edge_ids: Tensor | Dict[str, Tensor]

Returns the ids of the sampled edges in the original graph, that is, the reverse mapping from edges in the sampled subgraph back to the original graph. This is useful when edge features are needed.

  • If original_edge_ids is a tensor: It represents the original edge ids.

  • If original_edge_ids is a dictionary: The keys should be edge types and the values should be the corresponding original heterogeneous edge ids.
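As a rough illustration of why the original edge ids matter for edge features, the sketch below looks up features for the sampled edges of each edge type. The edge_features store and its values are hypothetical, and plain lists stand in for tensors.

```python
# Hypothetical feature store: original edge id -> feature vector.
edge_features = {19: [0.1, 0.2], 20: [0.3, 0.4], 21: [0.5, 0.6]}

# Heterogeneous form: edge type -> original ids of the sampled edges.
original_edge_ids = {"A:relation:B": [19, 21]}

# Gather one feature vector per sampled edge, keeping the sampled order.
sampled_features = {
    etype: [edge_features[eid] for eid in eids]
    for etype, eids in original_edge_ids.items()
}
print(sampled_features)  # {'A:relation:B': [[0.1, 0.2], [0.5, 0.6]]}
```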

property original_row_node_ids: Tensor | Dict[str, Tensor]

Returns the ids of the row nodes in the original graph, that is, the reverse mapping from row ids in the sampled subgraph back to the original graph. A graph structure can be treated as a coordinated row and column pair, and these are the mapped ids of the rows.

  • If original_row_node_ids is a tensor: It represents the original node ids.

  • If original_row_node_ids is a dictionary: The keys should be node types and the values should be the corresponding original heterogeneous node ids.

If present, the row IDs are compacted, and the row IDs in sampled_csc are the compacted IDs that index into original_row_node_ids.
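To make the compaction concrete, the sketch below recovers original (row, column) node id pairs for a heterogeneous subgraph. It is a plain-Python illustration with lists and dictionaries in place of tensors and CSCFormatBase. It assumes the edge type string has the form "source:relation:destination", with rows holding source-type nodes and columns holding destination-type nodes, as in the exclude_edges example.

```python
# Compacted CSC structure per edge type (dicts stand in for CSCFormatBase).
sampled_csc = {"A:relation:B": {"indptr": [0, 1, 2, 3], "indices": [0, 1, 2]}}
original_row_node_ids = {"A": [13, 14, 15]}
original_column_node_ids = {"B": [10, 11, 12]}

original_edges = {}
for etype, csc in sampled_csc.items():
    # Assumption: rows are source-type nodes, columns are destination-type nodes.
    src_type, _, dst_type = etype.split(":")
    rows = original_row_node_ids[src_type]
    cols = original_column_node_ids[dst_type]
    pairs = []
    for col in range(len(csc["indptr"]) - 1):
        for pos in range(csc["indptr"][col], csc["indptr"][col + 1]):
            # Compacted ids are positions in the original-id lists.
            pairs.append((rows[csc["indices"][pos]], cols[col]))
    original_edges[etype] = pairs

print(original_edges)  # {'A:relation:B': [(13, 10), (14, 11), (15, 12)]}
```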

property sampled_csc: CSCFormatBase | Dict[str, CSCFormatBase]

Returns the node pairs representing edges in CSC format.

  • If sampled_csc is a CSCFormatBase: It should be in the CSC format. indptr stores the position in indices where each column starts. indices stores the row indices of the non-zero elements.

  • If sampled_csc is a dictionary: The keys should be edge types and the values should be the corresponding node pairs. The ids inside are heterogeneous ids.

Examples

  1. Homogeneous graph.

>>> import dgl.graphbolt as gb
>>> import torch
>>> sampled_csc = gb.CSCFormatBase(
...     indptr=torch.tensor([0, 1, 2, 3]),
...     indices=torch.tensor([0, 1, 2]))
>>> print(sampled_csc)
CSCFormatBase(indptr=tensor([0, 1, 2, 3]),
            indices=tensor([0, 1, 2]),
)

  2. Heterogeneous graph.

>>> sampled_csc = {"A:relation:B": gb.CSCFormatBase(
...     indptr=torch.tensor([0, 1, 2, 3]),
...     indices=torch.tensor([0, 1, 2]))}
>>> print(sampled_csc)
{'A:relation:B': CSCFormatBase(indptr=tensor([0, 1, 2, 3]),
            indices=tensor([0, 1, 2]),
)}
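To read a CSC structure as a list of edges, walk the columns and slice indices using consecutive indptr entries. The sketch below does this in plain Python for a structure with uneven column degrees. The helper name csc_to_pairs is hypothetical, and lists stand in for the tensors held by CSCFormatBase.

```python
def csc_to_pairs(indptr, indices):
    """Expand a CSC structure into a list of (row, column) pairs."""
    pairs = []
    for col in range(len(indptr) - 1):
        # Entries indptr[col] .. indptr[col + 1] - 1 belong to column `col`.
        for pos in range(indptr[col], indptr[col + 1]):
            pairs.append((indices[pos], col))
    return pairs


# Column 0 has two edges, column 1 has none, column 2 has one.
pairs = csc_to_pairs(indptr=[0, 2, 2, 3], indices=[1, 2, 0])
print(pairs)  # [(1, 0), (2, 0), (0, 2)]
```

The number of columns is always len(indptr) - 1, and the last indptr entry equals the total number of edges.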