TolokersDataset
class dgl.data.TolokersDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)
Bases: dgl.data.heterophilous_graphs.HeterophilousGraphDataset
Tolokers dataset from the "A Critical Look at the Evaluation of GNNs under Heterophily: Are We Really Making Progress?" paper (https://arxiv.org/abs/2302.11640).
This dataset is based on data from the Toloka crowdsourcing platform. The nodes represent tolokers (workers). An edge connects two tolokers if they have worked on the same task. The goal is to predict which tolokers have been banned in one of the projects. Node features are based on the worker's profile information and task performance statistics.
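The edge rule above (two tolokers are linked if they worked on the same task) can be sketched in plain Python. This is a hypothetical illustration on made-up (worker, task) pairs: `build_coworker_edges` is a name introduced here, not part of DGL, and the dataset's actual preprocessing may differ.

```python
from collections import defaultdict
from itertools import combinations


def build_coworker_edges(assignments):
    """Build an undirected edge set from (worker_id, task_id) pairs.

    Hypothetical sketch: two workers are connected if they appear
    together on at least one task. Duplicate assignments are ignored.
    """
    workers_by_task = defaultdict(set)
    for worker, task in assignments:
        workers_by_task[task].add(worker)

    edges = set()
    for workers in workers_by_task.values():
        # Every pair of workers sharing this task gets one undirected edge,
        # stored as (smaller_id, larger_id) so each pair appears once.
        for u, v in combinations(sorted(workers), 2):
            edges.add((u, v))
    return edges


# Toy data: task "t1" has workers 0, 1, 2; task "t2" has workers 2, 3.
assignments = [(0, "t1"), (1, "t1"), (2, "t1"), (2, "t2"), (3, "t2"), (0, "t1")]
edges = build_coworker_edges(assignments)
print(sorted(edges))  # [(0, 1), (0, 2), (1, 2), (2, 3)]
```

Because every pair on a shared task is linked, a single large task produces a dense clique, which is one reason co-worker graphs like this can have a high average degree.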
Statistics:
Nodes: 11758
Edges: 1038000
Classes: 2
Node features: 10
10 train/val/test splits
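A quick worked calculation on the figures above. The halving step assumes that each undirected co-worker link is stored as two directed edges, which is common for DGL graphs but is not stated on this page.

```python
# Figures copied from the statistics list above.
num_nodes = 11758
num_edges = 1038000  # edge count as listed

# Average degree: edges per node.
avg_degree = num_edges / num_nodes
print(round(avg_degree, 2))  # 88.28

# Assumption: each undirected link is stored in both directions,
# so the number of distinct undirected links is half the listed count.
undirected_edges = num_edges // 2
print(undirected_edges)  # 519000

# Density: fraction of all ordered node pairs (excluding self-pairs) that are edges.
density = num_edges / (num_nodes * (num_nodes - 1))
print(f"{density:.4f}")  # 0.0075
```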
Parameters
raw_dir (str, optional): Directory in which the raw and processed data are stored. Default: ~/.dgl/
force_reload (bool, optional): Whether to re-download the data source. Default: False
verbose (bool, optional): Whether to print progress information. Default: True
transform (callable, optional): A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access. Default: None
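The transform semantics described above (applied on every access, leaving the stored graph as loaded) can be illustrated without DGL. The sketch below is hypothetical: `SingleGraphDataset` and `add_self_loops` are names introduced for illustration, and a plain dict stands in for a `DGLGraph`. It also mirrors the `__getitem__`/`__len__` pair documented for this class, assuming the dataset holds a single graph at index 0, as the `dataset[0]` example suggests.

```python
class SingleGraphDataset:
    """Hypothetical one-graph dataset with an optional per-access transform."""

    def __init__(self, graph, transform=None):
        self._graph = graph
        self._transform = transform

    def __getitem__(self, idx):
        if idx != 0:
            raise IndexError("this dataset holds a single graph")
        if self._transform is None:
            return self._graph
        # The transform runs on every access; the stored graph is not replaced.
        return self._transform(self._graph)

    def __len__(self):
        return 1


calls = []


def add_self_loops(graph):
    """Return a new graph dict with one self-loop per node; the input is left untouched."""
    calls.append(1)
    loops = [(n, n) for n in range(graph["num_nodes"])]
    return {"num_nodes": graph["num_nodes"], "edges": graph["edges"] + loops}


raw = {"num_nodes": 3, "edges": [(0, 1), (1, 2)]}
dataset = SingleGraphDataset(raw, transform=add_self_loops)
g1 = dataset[0]
g2 = dataset[0]
print(len(g1["edges"]), len(raw["edges"]), len(calls))  # 5 2 2
```

With DGL itself, a built-in transform object such as `dgl.AddSelfLoop()` would be passed as `transform=`. Since it runs on each `dataset[0]` call, an expensive transform may be better applied once and the result kept.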
Examples
>>> from dgl.data import TolokersDataset
>>> dataset = TolokersDataset()
>>> g = dataset[0]
>>> num_classes = dataset.num_classes

>>> # get node features
>>> feat = g.ndata["feat"]

>>> # get the first data split
>>> train_mask = g.ndata["train_mask"][:, 0]
>>> val_mask = g.ndata["val_mask"][:, 0]
>>> test_mask = g.ndata["test_mask"][:, 0]

>>> # get labels
>>> label = g.ndata["label"]
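Each mask in the example has one column per split, so `[:, 0]` selects the first of the 10 splits. Results are usually averaged over all splits, and the cited paper reports ROC AUC rather than accuracy for this binary task. The pure-Python sketch below shows that loop on toy data. Nested lists stand in for the mask and score tensors, one score vector per split is assumed (one trained model per split), and the helper names are hypothetical.

```python
def split_indices(mask_rows, split):
    """Return node indices whose mask is True in the given split column."""
    return [i for i, row in enumerate(mask_rows) if row[split]]


def roc_auc(labels, scores):
    """Binary ROC AUC by pairwise comparison; tied scores count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc_over_splits(test_mask, labels, scores_per_split):
    """Average test ROC AUC over all split columns."""
    aucs = []
    for split, scores in enumerate(scores_per_split):
        idx = split_indices(test_mask, split)
        aucs.append(roc_auc([labels[i] for i in idx], [scores[i] for i in idx]))
    return sum(aucs) / len(aucs)


# Toy data: 4 nodes, 2 splits (the real masks have 10 columns).
labels = [0, 1, 0, 1]
test_mask = [[True, False], [True, True], [True, True], [False, True]]
scores_per_split = [[0.1, 0.9, 0.4, 0.5], [0.2, 0.3, 0.6, 0.8]]
print(mean_auc_over_splits(test_mask, labels, scores_per_split))  # 0.75
```

The pairwise loop is quadratic in the number of test nodes and is meant only to make the metric explicit; a library implementation would normally be used on the full graph.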
__getitem__(idx)
Gets the data object at index idx.

__len__()
The number of examples in the dataset.