PPIDataset

class dgl.data.PPIDataset(mode='train', raw_dir=None, force_reload=False, verbose=False, transform=None)[source]

Bases: DGLBuiltinDataset

Protein-Protein Interaction dataset for inductive node classification

A toy Protein-Protein Interaction network dataset. The dataset contains 24 graphs, with an average of 2372 nodes per graph. Each node has 50 features and 121 labels. Of the 24 graphs, 20 are used for training, 2 for validation, and 2 for testing.

Reference: http://snap.stanford.edu/graphsage/

Statistics:

Train examples: 20
Valid examples: 2
Test examples: 2
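The split sizes above can be checked by loading each mode and calling len() on it. The following is a minimal sketch, not part of the DGL API: split_sizes is a hypothetical helper, and it takes the dataset class as an argument so that it can be exercised without downloading the data.

```python
def split_sizes(dataset_cls, modes=('train', 'valid', 'test')):
    """Hypothetical helper: map each mode to the number of graphs it loads.

    dataset_cls is assumed to accept a ``mode`` keyword and to support
    len(), as PPIDataset does.
    """
    return {mode: len(dataset_cls(mode=mode)) for mode in modes}


# With DGL installed (the data is downloaded on first use):
#   from dgl.data import PPIDataset
#   print(split_sizes(PPIDataset))
```

If the statistics listed above hold for the installed version, the result should map 'train' to 20 and both 'valid' and 'test' to 2.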

Parameters:

mode (str) – Must be one of (‘train’, ‘valid’, ‘test’). Default: ‘train’
raw_dir (str) – Directory that the input data is downloaded to, or that already contains it. Default: ~/.dgl/
force_reload (bool) – Whether to reload the dataset. Default: False
verbose (bool) – Whether to print out progress information. Default: False
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
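As an illustration of the transform parameter, here is a minimal sketch of a user-defined callable. The names add_in_degree, load_with_transform, and the ndata key 'in_deg' are hypothetical and chosen for this example only. The sketch relies on DGLGraph.in_degrees() and on ndata accepting new entries.

```python
def add_in_degree(g):
    """Hypothetical transform: store each node's in-degree in ndata['in_deg'].

    Takes a graph and returns the same graph, as the transform
    parameter expects.
    """
    g.ndata['in_deg'] = g.in_degrees()
    return g


def load_with_transform(mode='train'):
    """Build a PPIDataset whose graphs pass through add_in_degree on access."""
    # Imported here so that add_in_degree can be used without DGL installed.
    from dgl.data import PPIDataset
    return PPIDataset(mode=mode, transform=add_in_degree)
```

Because the transform runs on every access, it is safest to keep it idempotent, as here: applying it twice to the same graph writes the same values again.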

num_labels

Number of labels for each node

Type: int

labels

Node labels

Type: Tensor

features

Node features

Type: Tensor

Examples

>>> from dgl.data import PPIDataset
>>> dataset = PPIDataset(mode='valid')
>>> num_classes = dataset.num_classes
>>> for g in dataset:
...     feat = g.ndata['feat']
...     label = g.ndata['label']
...     # your code here

__getitem__(item)[source]

Get the item-th sample.

Parameters:

item (int) – The sample index.

Returns:

The graph structure, with node features and node labels stored in ndata:

ndata['feat']: node features
ndata['label']: node labels

Return type:

dgl.DGLGraph
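To make the return value concrete, the sketch below indexes a dataset and reports the shapes of the two ndata entries. describe_sample is a hypothetical helper, not part of DGL. It assumes only that the returned graph provides num_nodes() and that the ndata values expose a shape attribute, as tensors do.

```python
def describe_sample(dataset, index=0):
    """Hypothetical helper: summarize the graph returned by dataset[index]."""
    g = dataset[index]  # invokes __getitem__
    return {
        'num_nodes': g.num_nodes(),
        'feat_shape': tuple(g.ndata['feat'].shape),
        'label_shape': tuple(g.ndata['label'].shape),
    }


# With DGL installed:
#   from dgl.data import PPIDataset
#   print(describe_sample(PPIDataset(mode='valid'), 0))
```

Given the description at the top of this page, the second dimension of feat_shape should be 50 and that of label_shape should be 121.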

__len__()[source]

Return the number of samples in this dataset.