LegacyTUDataset
class dgl.data.LegacyTUDataset(name, use_pandas=False, hidden_size=10, max_allow_node=None, raw_dir=None, force_reload=False, verbose=False, transform=None)
Bases: dgl.data.dgl_dataset.DGLBuiltinDataset
LegacyTUDataset contains many graph kernel datasets for graph classification.
Parameters
- name (str) – Dataset name, such as ENZYMES, DD, COLLAB, or MUTAG. It can be any dataset name listed at https://chrsmrrs.github.io/datasets/docs/datasets/.
- use_pandas (bool) – NumPy's file-reading function is slow on large files; reading them with pandas can be faster. Default: False
- hidden_size (int) – Some datasets do not contain node features. For these, constant node features of dimension hidden_size are used instead. Default: 10
- max_allow_node (int) – Remove graphs that contain more nodes than max_allow_node. Default: None
- transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
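The max_allow_node filter can be pictured as a single pass over the graphs. The sketch below is illustrative only and is not DGL's implementation: each graph is represented by a hypothetical (num_nodes, label) tuple, and the helper name filter_by_max_nodes is made up for this example. As with the real parameter, None keeps every graph.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a graph: (number of nodes, graph label).
ToyGraph = Tuple[int, int]


def filter_by_max_nodes(graphs: List[ToyGraph], max_allow_node: Optional[int]) -> List[ToyGraph]:
    """Drop graphs with more than max_allow_node nodes; None keeps all of them."""
    if max_allow_node is None:
        return list(graphs)
    # A graph with exactly max_allow_node nodes is kept; only larger ones are removed.
    return [g for g in graphs if g[0] <= max_allow_node]


toy = [(12, 0), (88, 1), (250, 1), (40, 0)]
print(filter_by_max_nodes(toy, 88))    # [(12, 0), (88, 1), (40, 0)]
print(filter_by_max_nodes(toy, None))  # all four graphs
```

Filtering happens once, when the dataset is built, so the removed graphs never appear in len(data) or through indexing.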
num_labels
(DEPRECATED, use num_classes instead) Number of classes.
Type
numpy.int64
Notes
By default, LegacyTUDataset uses the provided node features. If no features are provided, it uses one-hot encoded node labels instead. If neither node features nor node labels are provided, it uses constant node features.
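The fallback order above can be sketched in plain Python. This is a hypothetical helper, not DGL's code: features are plain lists, node labels are assumed to be 0-based integers, and the constant value 1.0 is an assumption (the dataset only promises "constant" features of width hidden_size).

```python
from typing import List, Optional


def choose_node_features(
    num_nodes: int,
    node_attrs: Optional[List[List[float]]] = None,
    node_labels: Optional[List[int]] = None,
    hidden_size: int = 10,
    num_label_values: Optional[int] = None,
) -> List[List[float]]:
    """Hypothetical helper: pick node features using the documented fallback order."""
    # 1. Provided node features win.
    if node_attrs is not None:
        return node_attrs
    # 2. Otherwise one-hot encode the node labels (assumed 0-based ints).
    #    A real dataset would size the one-hot vector over all graphs;
    #    here it defaults to the largest label seen in this call.
    if node_labels is not None:
        width = num_label_values if num_label_values is not None else max(node_labels, default=-1) + 1
        return [[1.0 if j == label else 0.0 for j in range(width)] for label in node_labels]
    # 3. Otherwise use constant features of width hidden_size (value 1.0 assumed).
    return [[1.0] * hidden_size for _ in range(num_nodes)]


print(choose_node_features(3, node_labels=[0, 2, 1]))
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(choose_node_features(2, hidden_size=4))
# [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
```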
The dataset sorts graphs by their labels, so shuffle the samples before manually splitting them into training and validation sets.
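Because the graphs are sorted by label, taking the first 80% as training data would leave the validation set dominated by the last classes. A minimal sketch of shuffling indices before splitting, using only the standard library; the helper name shuffled_split and the 80/20 ratio are assumptions for illustration:

```python
import random
from typing import List, Tuple


def shuffled_split(num_samples: int, train_frac: float = 0.8, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: return (train_indices, val_indices) after shuffling."""
    indices = list(range(num_samples))
    # Shuffle first: without this, a prefix/suffix split of label-sorted
    # data would put whole classes on one side of the split.
    random.Random(seed).shuffle(indices)
    num_train = int(num_samples * train_frac)
    return indices[:num_train], indices[num_train:]


# Toy labels sorted the way LegacyTUDataset sorts its graphs.
labels = [0] * 50 + [1] * 50
train_idx, val_idx = shuffled_split(len(labels), train_frac=0.8, seed=42)
print(len(train_idx), len(val_idx))  # 80 20
print("class-1 graphs in validation:", sum(labels[i] for i in val_idx))
```

With a real dataset, the index lists could then be wrapped with torch.utils.data.Subset(data, train_idx) and Subset(data, val_idx).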
Examples
>>> import dgl
>>> import torch
>>> from dgl.data import LegacyTUDataset
>>> data = LegacyTUDataset('DD')
The dataset instance is iterable and supports indexing.
>>> len(data)
1178
>>> g, label = data[1024]
>>> g
Graph(num_nodes=88, num_edges=410,
      ndata_schemes={'feat': Scheme(shape=(89,), dtype=torch.float32), '_ID': Scheme(shape=(), dtype=torch.int64)}
      edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64)})
>>> label
tensor(1)
Batch the graphs and labels for mini-batch training
>>> graphs, labels = zip(*[data[i] for i in range(16)])
>>> batched_graphs = dgl.batch(graphs)
>>> batched_labels = torch.tensor(labels)
>>> batched_graphs
Graph(num_nodes=9539, num_edges=47382,
      ndata_schemes={'feat': Scheme(shape=(89,), dtype=torch.float32), '_ID': Scheme(shape=(), dtype=torch.int64)}
      edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64)})
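dgl.batch merges the 16 graphs into one graph made of disconnected components, which is why num_nodes and num_edges of the batched graph are sums over the inputs. The pure-Python sketch below illustrates only the node-ID offsetting idea on hypothetical edge-list graphs; it is not DGL's implementation and ignores node and edge features. The returned sizes list plays the role of the per-graph node counts that DGL exposes through batched_graphs.batch_num_nodes().

```python
from typing import List, Tuple

# Hypothetical toy graph: (num_nodes, list of (src, dst) edges).
ToyGraph = Tuple[int, List[Tuple[int, int]]]


def batch_toy_graphs(graphs: List[ToyGraph]) -> Tuple[int, List[Tuple[int, int]], List[int]]:
    """Merge graphs into one disjoint-union graph by offsetting node IDs."""
    total_nodes = 0
    edges: List[Tuple[int, int]] = []
    batch_num_nodes: List[int] = []
    for num_nodes, graph_edges in graphs:
        # Shift this graph's node IDs past all nodes of the earlier graphs.
        edges.extend((src + total_nodes, dst + total_nodes) for src, dst in graph_edges)
        batch_num_nodes.append(num_nodes)
        total_nodes += num_nodes
    return total_nodes, edges, batch_num_nodes


g1 = (3, [(0, 1), (1, 2)])
g2 = (2, [(0, 1), (1, 0)])
num_nodes, edges, sizes = batch_toy_graphs([g1, g2])
print(num_nodes)  # 5
print(edges)      # [(0, 1), (1, 2), (3, 4), (4, 3)]
print(sizes)      # [3, 2]
```

Since no edge crosses from one component to another, message passing on the batched graph stays within each original graph.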
__getitem__(idx)
Get the idx-th sample.
Parameters
- idx (int) – The sample index.
Returns
The graph, with node features stored in the feat field and node labels stored in the node_label field if available, and its label.
Return type
(dgl.DGLGraph, Tensor)
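The (graph, label) access pattern, together with the transform parameter that runs before every access, can be mimicked by a small sequence-style class. This is a hypothetical sketch, not LegacyTUDataset's source: graphs are arbitrary objects (dicts in the usage), labels are plain ints instead of tensors, and the class name ToyGraphDataset is made up.

```python
from typing import Any, Callable, List, Optional, Tuple


class ToyGraphDataset:
    """Hypothetical dataset mirroring the (graph, label) access pattern."""

    def __init__(self, graphs: List[Any], labels: List[int],
                 transform: Optional[Callable[[Any], Any]] = None) -> None:
        if len(graphs) != len(labels):
            raise ValueError("graphs and labels must have the same length")
        self.graphs = graphs
        self.labels = labels
        self.transform = transform

    def __len__(self) -> int:
        return len(self.graphs)

    def __getitem__(self, idx: int) -> Tuple[Any, int]:
        g = self.graphs[idx]  # raises IndexError past the end, which ends iteration
        # The transform is applied on every access; its result is not written back.
        if self.transform is not None:
            g = self.transform(g)
        return g, self.labels[idx]


ds = ToyGraphDataset([{"num_nodes": 3}, {"num_nodes": 5}], [0, 1],
                     transform=lambda g: {**g, "normalized": True})
print(ds[1])                # ({'num_nodes': 5, 'normalized': True}, 1)
print([y for _, y in ds])   # [0, 1]
```

Applying the transform lazily keeps the stored graphs unchanged, so a randomized transform can produce a different result on each access.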