ItemSet

class dgl.graphbolt.ItemSet(items: int | Tensor | Iterable | Tuple[Iterable], names: str | Tuple[str] | None = None)

Bases: object

A wrapper of iterable data or tuple of iterable data.

All itemsets that represent an iterable of items should subclass this class. This form of itemset is particularly useful when items come from a stream. Each input is expected to be iterable; a single integer is converted to a range, as described under Parameters.

Parameters:
  • items (Union[int, torch.Tensor, Iterable, Tuple[Iterable]]) – The items to be iterated over. If it is a single integer, a range() object is created and iterated over. If it is a multi-dimensional iterable such as torch.Tensor, it is iterated over its first dimension. If it is a tuple, each element of the tuple is an iterable of items.

  • names (Union[str, Tuple[str]], optional) – The names of the items. If it is a tuple, each name corresponds to the element at the same position in the items tuple. The naming is arbitrary, but in practice the names should be chosen from ['seed_nodes', 'node_pairs', 'labels', 'seeds', 'negative_srcs', 'negative_dsts'] to align with the attributes of the dgl.graphbolt.MiniBatch class.
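The three input forms described above can be pictured with a small plain-Python sketch. This is not the library's implementation: `iterate_items` is a hypothetical helper, and plain lists stand in for tensors, so it only illustrates the documented iteration rules (integer as a range, iterable over its first dimension, tuple walked in lockstep).

```python
from typing import Any, Iterable, Iterator, Tuple, Union


def iterate_items(
    items: Union[int, Iterable, Tuple[Iterable, ...]]
) -> Iterator[Any]:
    """Hypothetical helper mimicking the documented iteration rules."""
    if isinstance(items, int):
        # A single integer is treated as range(items).
        yield from range(items)
    elif isinstance(items, tuple):
        # A tuple of iterables is walked in lockstep, one element from each.
        yield from zip(*items)
    else:
        # Any other iterable is walked along its first dimension.
        yield from items


print(list(iterate_items(3)))                        # [0, 1, 2]
print(list(iterate_items([[0, 1], [2, 3]])))         # [[0, 1], [2, 3]]
print(list(iterate_items(([0, 1, 2], [5, 6, 7]))))   # [(0, 5), (1, 6), (2, 7)]
```

The real class additionally returns tensors and supports indexing and slicing, as the examples in this section show.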

Examples

>>> import torch
>>> from dgl import graphbolt as gb
  1. Integer: number of nodes.

>>> num = 10
>>> item_set = gb.ItemSet(num, names="seed_nodes")
>>> list(item_set)
[tensor(0), tensor(1), tensor(2), tensor(3), tensor(4), tensor(5),
 tensor(6), tensor(7), tensor(8), tensor(9)]
>>> item_set[:]
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> item_set.names
('seed_nodes',)
  2. Torch scalar: number of nodes. Unlike a plain integer, the dtype is customizable.

>>> num = torch.tensor(10, dtype=torch.int32)
>>> item_set = gb.ItemSet(num, names="seed_nodes")
>>> list(item_set)
[tensor(0, dtype=torch.int32), tensor(1, dtype=torch.int32),
 tensor(2, dtype=torch.int32), tensor(3, dtype=torch.int32),
 tensor(4, dtype=torch.int32), tensor(5, dtype=torch.int32),
 tensor(6, dtype=torch.int32), tensor(7, dtype=torch.int32),
 tensor(8, dtype=torch.int32), tensor(9, dtype=torch.int32)]
>>> item_set[:]
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=torch.int32)
>>> item_set.names
('seed_nodes',)
  3. Single iterable: seed nodes.

>>> node_ids = torch.arange(0, 5)
>>> item_set = gb.ItemSet(node_ids, names="seed_nodes")
>>> list(item_set)
[tensor(0), tensor(1), tensor(2), tensor(3), tensor(4)]
>>> item_set[:]
tensor([0, 1, 2, 3, 4])
>>> item_set.names
('seed_nodes',)
  4. Tuple of iterables with the same shape: seed nodes and labels.

>>> node_ids = torch.arange(0, 5)
>>> labels = torch.arange(5, 10)
>>> item_set = gb.ItemSet(
...     (node_ids, labels), names=("seed_nodes", "labels"))
>>> list(item_set)
[(tensor(0), tensor(5)), (tensor(1), tensor(6)), (tensor(2), tensor(7)),
 (tensor(3), tensor(8)), (tensor(4), tensor(9))]
>>> item_set[:]
(tensor([0, 1, 2, 3, 4]), tensor([5, 6, 7, 8, 9]))
>>> item_set.names
('seed_nodes', 'labels')
  5. Tuple of iterables with different shapes: node pairs and negative dsts.

>>> node_pairs = torch.arange(0, 10).reshape(-1, 2)
>>> neg_dsts = torch.arange(10, 25).reshape(-1, 3)
>>> item_set = gb.ItemSet(
...     (node_pairs, neg_dsts), names=("node_pairs", "negative_dsts"))
>>> list(item_set)
[(tensor([0, 1]), tensor([10, 11, 12])),
 (tensor([2, 3]), tensor([13, 14, 15])),
 (tensor([4, 5]), tensor([16, 17, 18])),
 (tensor([6, 7]), tensor([19, 20, 21])),
 (tensor([8, 9]), tensor([22, 23, 24]))]
>>> item_set[:]
(tensor([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]),
 tensor([[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21],
    [22, 23, 24]]))
>>> item_set.names
('node_pairs', 'negative_dsts')
property names: Tuple[str]

Return the names of the items.
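To make the positional correspondence between names and tuple elements concrete, here is a minimal plain-Python sketch. `label_items` is a hypothetical helper, not part of dgl.graphbolt, and plain lists stand in for tensors; it only shows how the i-th name lines up with the i-th iterable.

```python
from typing import Dict, Iterable, List, Tuple


def label_items(
    items: Tuple[Iterable, ...], names: Tuple[str, ...]
) -> List[Dict[str, object]]:
    """Hypothetical helper: attach each name to the matching tuple element."""
    if len(items) != len(names):
        raise ValueError("expected one name per iterable")
    # zip(*items) yields one row per item; zip(names, row) pairs by position.
    return [dict(zip(names, row)) for row in zip(*items)]


rows = label_items(([0, 1, 2], [5, 6, 7]), ("seed_nodes", "labels"))
print(rows[0])  # {'seed_nodes': 0, 'labels': 5}
```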