ZINCDataset
class dgl.data.ZINCDataset(mode='train', raw_dir=None, force_reload=False, verbose=False, transform=None)
Bases: dgl.data.dgl_dataset.DGLBuiltinDataset
ZINC dataset for the graph regression task.
A 12K-graph subset of the ZINC dataset of 250K molecular graphs is used to regress a molecular property known as constrained solubility. In each molecular graph, the node features are the types of the heavy atoms, and the edge features are the types of the bonds between them. Each graph contains 9-37 nodes and 16-84 edges.
Reference: https://arxiv.org/pdf/2003.00982.pdf
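Because the node and edge features are integer type IDs rather than vectors, a model usually has to turn them into dense inputs first. The following is a minimal pure-Python sketch of one-hot encoding, assuming atom-type IDs lie in [0, 28) and bond-type IDs in [0, 4) (the counts listed under Statistics below). The helper name one_hot is hypothetical and not part of DGL; a learned embedding table is a common alternative in practice.

```python
from typing import List

NUM_ATOM_TYPES = 28  # number of atom types listed under Statistics
NUM_BOND_TYPES = 4   # number of bond types listed under Statistics


def one_hot(type_ids: List[int], num_types: int) -> List[List[int]]:
    """Hypothetical helper: map each integer type ID to a one-hot row."""
    rows = []
    for t in type_ids:
        if not 0 <= t < num_types:
            raise ValueError(f"type ID {t} outside [0, {num_types})")
        row = [0] * num_types
        row[t] = 1  # set the single active position for this type
        rows.append(row)
    return rows


# Illustrative IDs only, not taken from a real ZINC graph.
atom_feats = one_hot([0, 2, 0], NUM_ATOM_TYPES)  # one row per node
bond_feats = one_hot([1, 1], NUM_BOND_TYPES)     # one row per edge
```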
Statistics:
Train examples: 10,000
Valid examples: 1,000
Test examples: 1,000
Average number of nodes: 23.16
Average number of edges: 39.83
Number of atom types: 28
Number of bond types: 4
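The three splits add up to the 12K subset described above. The following small sketch maps each mode value to its documented split size, which can serve as a sanity check after loading (for example, comparing len(dataset) against it). expected_split_size is a hypothetical helper, not a DGL function.

```python
# Split sizes as listed under Statistics.
SPLIT_SIZES = {"train": 10_000, "valid": 1_000, "test": 1_000}


def expected_split_size(mode: str) -> int:
    """Hypothetical helper: return the documented number of graphs for a mode."""
    if mode not in SPLIT_SIZES:
        raise ValueError(f"mode must be one of {sorted(SPLIT_SIZES)}, got {mode!r}")
    return SPLIT_SIZES[mode]


total = sum(SPLIT_SIZES.values())  # size of the whole subset
```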
Parameters
mode (str, optional) – One of 'train', 'valid' or 'test'. Default: 'train'.
raw_dir (str) – Directory to which the raw data is downloaded, or which already contains the input data. Default: '~/.dgl/'.
force_reload (bool) – Whether to reload the dataset. Default: False.
verbose (bool) – Whether to print out progress information. Default: False.
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
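The transform argument is applied lazily. The stored graph is left untouched, and the callable runs each time an item is accessed. The following sketch mimics that behaviour with plain Python objects so the control flow is visible. The class LazyTransformDataset is hypothetical and stands in for the DGL dataset. A dict stands in for a DGLGraph, and the label is a placeholder.

```python
from typing import Any, Callable, List, Optional, Tuple


class LazyTransformDataset:
    """Hypothetical stand-in for a dataset that applies `transform` on access."""

    def __init__(self, items: List[Tuple[Any, float]],
                 transform: Optional[Callable[[Any], Any]] = None):
        self._items = items  # stored (graph, label) pairs, never modified
        self.transform = transform

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, idx: int) -> Tuple[Any, float]:
        graph, label = self._items[idx]
        if self.transform is not None:
            graph = self.transform(graph)  # runs again on every access
        return graph, label


calls = []


def add_self_loops(g: dict) -> dict:
    # Illustrative transform: mimics the edge-count effect of adding one
    # self-loop per node, returning a new dict instead of mutating the input.
    calls.append(1)
    return {**g, "num_edges": g["num_edges"] + g["num_nodes"]}


# Node/edge counts borrowed from the example graph; 0.0 is a placeholder label.
ds = LazyTransformDataset([({"num_nodes": 29, "num_edges": 64}, 0.0)],
                          transform=add_self_loops)
g1, _ = ds[0]
g2, _ = ds[0]
```

Because the transform reruns on each access, randomized transforms yield a fresh result every time. The trade-off is that expensive transforms are recomputed each time as well.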
Examples
>>> from dgl.data import ZINCDataset
>>> training_set = ZINCDataset(mode="train")
>>> training_set.num_atom_types
28
>>> len(training_set)
10000
>>> graph, label = training_set[0]
>>> graph
Graph(num_nodes=29, num_edges=64,
      ndata_schemes={'feat': Scheme(shape=(), dtype=torch.int64)}
      edata_schemes={'feat': Scheme(shape=(), dtype=torch.int64)})
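The labels are real-valued solubility scores, so results on ZINC are commonly reported as mean absolute error (MAE), as in the benchmark cited above. The following is a minimal, framework-free sketch of that metric over a split. It assumes that predictions and labels have already been collected into Python lists of floats. mean_absolute_error is a hypothetical helper, and the numbers are made up for illustration.

```python
from typing import Sequence


def mean_absolute_error(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Average of |prediction - label| over all graphs in a split."""
    if len(preds) != len(labels) or not preds:
        raise ValueError("preds and labels must be non-empty and equally long")
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(preds)


# Made-up values, not real ZINC labels.
mae = mean_absolute_error([0.5, -1.0, 2.0], [0.0, -1.5, 2.0])
```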