QM9Dataset

class dgl.data.QM9Dataset(label_keys, cutoff=5.0, raw_dir=None, force_reload=False, verbose=False, transform=None)

Bases: dgl.data.dgl_dataset.DGLDataset

QM9 dataset for graph property prediction (regression)

This dataset consists of 130,831 molecules with 12 regression targets. Nodes correspond to atoms and edges correspond to close atom pairs.

This dataset differs from QM9EdgeDataset in the following aspects:
  1. Edges in this dataset are purely distance-based.

  2. It provides only atoms’ coordinates and atomic numbers as node features.

  3. It only provides 12 regression targets.

Reference:

Statistics:

  • Number of graphs: 130,831

  • Number of regression targets: 12

| Keys | Property | Description | Unit |
|---|---|---|---|
| mu | \(\mu\) | Dipole moment | \(\textrm{D}\) |
| alpha | \(\alpha\) | Isotropic polarizability | \({a_0}^3\) |
| homo | \(\epsilon_{\textrm{HOMO}}\) | Highest occupied molecular orbital energy | \(\textrm{eV}\) |
| lumo | \(\epsilon_{\textrm{LUMO}}\) | Lowest unoccupied molecular orbital energy | \(\textrm{eV}\) |
| gap | \(\Delta \epsilon\) | Gap between \(\epsilon_{\textrm{HOMO}}\) and \(\epsilon_{\textrm{LUMO}}\) | \(\textrm{eV}\) |
| r2 | \(\langle R^2 \rangle\) | Electronic spatial extent | \({a_0}^2\) |
| zpve | \(\textrm{ZPVE}\) | Zero point vibrational energy | \(\textrm{eV}\) |
| U0 | \(U_0\) | Internal energy at 0 K | \(\textrm{eV}\) |
| U | \(U\) | Internal energy at 298.15 K | \(\textrm{eV}\) |
| H | \(H\) | Enthalpy at 298.15 K | \(\textrm{eV}\) |
| G | \(G\) | Free energy at 298.15 K | \(\textrm{eV}\) |
| Cv | \(c_{\textrm{v}}\) | Heat capacity at 298.15 K | \(\frac{\textrm{cal}}{\textrm{mol K}}\) |
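For scripts that build label_keys programmatically, the keys and units listed above can be kept in a small lookup so that typos are caught before the dataset is constructed. This is only a sketch: QM9_UNITS and check_label_keys are hypothetical names, not part of DGL, and the unit strings are plain-text transcriptions of the units above.

```python
# Hypothetical helper, not part of DGL.
# Keys and units are transcribed from the property list above.
QM9_UNITS = {
    "mu": "D",
    "alpha": "a_0^3",
    "homo": "eV",
    "lumo": "eV",
    "gap": "eV",
    "r2": "a_0^2",
    "zpve": "eV",
    "U0": "eV",
    "U": "eV",
    "H": "eV",
    "G": "eV",
    "Cv": "cal/(mol K)",
}


def check_label_keys(label_keys):
    """Return label_keys as a list, raising ValueError on any unknown key."""
    unknown = [key for key in label_keys if key not in QM9_UNITS]
    if unknown:
        raise ValueError(f"Unknown QM9 label keys: {unknown}")
    return list(label_keys)
```

The keys are case-sensitive as listed, so check_label_keys(['mu', 'GAP']) raises ValueError while check_label_keys(['mu', 'gap']) passes the list through unchanged.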

Parameters
  • label_keys (list) – Names of the regression properties to use as labels, which should be a subset of the keys in the table above.

  • cutoff (float) – Cutoff distance for interatomic interactions, i.e. two atoms are connected in the corresponding graph if the distance between them is no larger than this. Default: 5.0 Angstrom

  • raw_dir (str) – Directory that the raw input data is downloaded to, or that already contains it. Default: ~/.dgl/

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: False

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
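The cutoff rule can be illustrated without DGL. The sketch below is not the library's implementation. It assumes coordinates in Angstrom, adds an edge in both directions for every pair of distinct atoms whose distance is no larger than the cutoff, and omits self-loops (an assumption). cutoff_edges is a hypothetical name.

```python
import math


def cutoff_edges(coords, cutoff=5.0):
    """Return (src, dst) lists of directed edges between distinct atoms
    whose Euclidean distance is no larger than `cutoff`.
    """
    src, dst = [], []
    for i, ri in enumerate(coords):
        for j, rj in enumerate(coords):
            # Skip self-pairs (assumption: the graph has no self-loops).
            if i != j and math.dist(ri, rj) <= cutoff:
                src.append(i)
                dst.append(j)
    return src, dst


# Three atoms on a line: atoms 0 and 1 are 1.0 apart, atom 2 is 10.0 from atom 0.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
src, dst = cutoff_edges(coords, cutoff=5.0)
```

With the default cutoff only atoms 0 and 1 are connected. A larger cutoff yields denser graphs, so the value trades neighbourhood coverage against edge count.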

num_labels

Number of labels for each graph, i.e. number of prediction tasks

Type

int

Raises

UserWarning – If the raw data has been changed on the remote server by the author.

Examples

>>> from dgl.data import QM9Dataset
>>> data = QM9Dataset(label_keys=['mu', 'gap'], cutoff=5.0)
>>> data.num_labels
2
>>>
>>> # iterate over the dataset
>>> for g, label in data:
...     R = g.ndata['R'] # get coordinates of each atom
...     Z = g.ndata['Z'] # get atomic numbers of each atom
...     # your code here...
>>>
__getitem__(idx)

Get graph and label by index

Parameters

idx (int) – Item index

Returns

  • dgl.DGLGraph – The graph contains:

    • ndata['R']: the coordinates of each atom

    • ndata['Z']: the atomic number

  • Tensor – Values of the requested properties for this molecular graph

__len__()

Number of graphs in the dataset.

Returns

Number of graphs in the dataset

Return type

int
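Because the dataset exposes __len__ and integer indexing, a reproducible train/validation/test split can be built from shuffled indices alone. The sketch below is a generic pattern rather than a DGL API. split_indices, the split fractions, and the seed are illustrative assumptions.

```python
import random


def split_indices(n, frac_train=0.8, frac_val=0.1, seed=0):
    """Shuffle range(n) with a fixed seed and cut it into
    train, validation, and test index lists.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * frac_train)
    n_val = int(n * frac_val)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    # Whatever remains after train and validation goes to test.
    test = indices[n_train + n_val:]
    return train, val, test
```

With a dataset object data, split_indices(len(data)) yields three disjoint index lists, and each index i can then be used as data[i] to fetch one graph and its label.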