QM9EdgeDataset

class dgl.data.QM9EdgeDataset(label_keys=None, raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: DGLDataset

QM9Edge dataset for graph property prediction (regression)

This dataset consists of 130,831 molecules with 19 regression targets. Nodes correspond to atoms and edges correspond to bonds.

This dataset differs from QM9Dataset in the following aspects:
  1. Its graph edges are the chemical bonds of each molecule, whereas the edges in QM9Dataset are purely distance-based.

  2. It provides edge features, as well as node features beyond the atoms’ coordinates and atomic numbers.

  3. It provides 7 additional regression tasks, bringing the total from 12 to 19.

This class is built on a preprocessed version of the dataset, and we provide the preprocessing details here.

Reference:

Statistics:

  • Number of graphs: 130,831.

  • Number of regression targets: 19.

Node attributes:

  • pos: the 3D coordinates of each atom.

  • attr: the 11D atom features.

Edge attributes:

  • edge_attr: the 4D bond features.

Regression targets:

| Keys | Property | Description | Unit |
|---|---|---|---|
| mu | \(\mu\) | Dipole moment | \(\textrm{D}\) |
| alpha | \(\alpha\) | Isotropic polarizability | \({a_0}^3\) |
| homo | \(\epsilon_{\textrm{HOMO}}\) | Highest occupied molecular orbital energy | \(\textrm{eV}\) |
| lumo | \(\epsilon_{\textrm{LUMO}}\) | Lowest unoccupied molecular orbital energy | \(\textrm{eV}\) |
| gap | \(\Delta \epsilon\) | Gap between \(\epsilon_{\textrm{HOMO}}\) and \(\epsilon_{\textrm{LUMO}}\) | \(\textrm{eV}\) |
| r2 | \(\langle R^2 \rangle\) | Electronic spatial extent | \({a_0}^2\) |
| zpve | \(\textrm{ZPVE}\) | Zero point vibrational energy | \(\textrm{eV}\) |
| U0 | \(U_0\) | Internal energy at 0K | \(\textrm{eV}\) |
| U | \(U\) | Internal energy at 298.15K | \(\textrm{eV}\) |
| H | \(H\) | Enthalpy at 298.15K | \(\textrm{eV}\) |
| G | \(G\) | Free energy at 298.15K | \(\textrm{eV}\) |
| Cv | \(c_{\textrm{v}}\) | Heat capacity at 298.15K | \(\frac{\textrm{cal}}{\textrm{mol K}}\) |
| U0_atom | \(U_0^{\textrm{ATOM}}\) | Atomization energy at 0K | \(\textrm{eV}\) |
| U_atom | \(U^{\textrm{ATOM}}\) | Atomization energy at 298.15K | \(\textrm{eV}\) |
| H_atom | \(H^{\textrm{ATOM}}\) | Atomization enthalpy at 298.15K | \(\textrm{eV}\) |
| G_atom | \(G^{\textrm{ATOM}}\) | Atomization free energy at 298.15K | \(\textrm{eV}\) |
| A | \(A\) | Rotational constant | \(\textrm{GHz}\) |
| B | \(B\) | Rotational constant | \(\textrm{GHz}\) |
| C | \(C\) | Rotational constant | \(\textrm{GHz}\) |
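
The keys in the first column are the strings that `label_keys` is expected to contain. The following is a minimal sketch of checking a selection before constructing the dataset. `QM9_EDGE_KEYS` and `check_label_keys` are hypothetical helpers written for this illustration and are not part of DGL.

```python
# Keys copied from the regression-target table, in table order.
# QM9_EDGE_KEYS and check_label_keys are illustrative names, not DGL APIs.
QM9_EDGE_KEYS = [
    "mu", "alpha", "homo", "lumo", "gap", "r2", "zpve",
    "U0", "U", "H", "G", "Cv",
    "U0_atom", "U_atom", "H_atom", "G_atom",
    "A", "B", "C",
]


def check_label_keys(label_keys):
    """Return label_keys as a list, or raise if any key is not in the table."""
    unknown = [key for key in label_keys if key not in QM9_EDGE_KEYS]
    if unknown:
        raise ValueError(f"Unknown QM9Edge label keys: {unknown}")
    return list(label_keys)


keys = check_label_keys(["mu", "alpha"])
# With DGL installed, the checked keys could then be passed on:
#   from dgl.data import QM9EdgeDataset
#   data = QM9EdgeDataset(label_keys=keys)
```

Checking the keys up front gives a readable error message before any download or preprocessing starts.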

Parameters:
  • label_keys (list) – Names of the regression properties to load, which should be a subset of the keys in the table above. If not provided, all labels are loaded.

  • raw_dir (str) – Directory into which the raw data is downloaded, or which already contains the input data directory. Default: ~/.dgl/

  • force_reload (bool) – Whether to reload the dataset. Default: False.

  • verbose (bool) – Whether to print out progress information. Default: True.

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
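
As an illustration of the `transform` argument, the sketch below defines a callable that shifts each molecule's coordinates so that their mean sits at the origin. `CenterPositions` and `load_centered` are hypothetical names, and the arithmetic assumes that `ndata['pos']` is a PyTorch tensor (the default DGL backend); other backends would need different calls.

```python
class CenterPositions:
    """Hypothetical transform: shift atom coordinates so their mean is the origin."""

    def __call__(self, graph):
        pos = graph.ndata["pos"]  # shape (num_atoms, 3)
        # Assumes a PyTorch tensor: mean over atoms, kept as shape (1, 3).
        graph.ndata["pos"] = pos - pos.mean(dim=0, keepdim=True)
        return graph  # a transform returns the transformed graph


def load_centered(label_keys=None):
    """Build the dataset with the transform attached (needs DGL and a download)."""
    # Imported lazily so CenterPositions can be used without DGL installed.
    from dgl.data import QM9EdgeDataset

    return QM9EdgeDataset(label_keys=label_keys, transform=CenterPositions())
```

Since the transform is applied on every access, it is best kept cheap and deterministic.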

num_tasks

Number of prediction tasks

Type:

int

num_labels

(DEPRECATED, use num_tasks instead) Number of prediction tasks

Type:

int

Raises:

UserWarning – If the raw data on the remote server has been changed by its author.

Examples

>>> from dgl.data import QM9EdgeDataset
>>> data = QM9EdgeDataset(label_keys=['mu', 'alpha'])
>>> data.num_tasks
2
>>> # iterate over the dataset
>>> for graph, labels in data:
...     print(graph)   # get information of each graph
...     print(labels)  # get labels of the corresponding graph
...     # your code here...

__getitem__(idx)

Get graph and label by index

Parameters:

idx (int) – Item index

Returns:

  • dgl.DGLGraph – The graph contains:

    • ndata['pos']: the coordinates of each atom

    • ndata['attr']: the features of each atom

    • edata['edge_attr']: the features of each bond

  • Tensor – Property values of the molecular graph
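
To make the returned pair concrete, here is a minimal sketch that collects the shapes of the features listed above for one sample. `summarize_sample` and `show_first_sample` are hypothetical helpers; the expected sizes in the comments follow the 3D, 11D and 4D descriptions on this page.

```python
def summarize_sample(graph, labels):
    """Collect the shapes of the features attached to one (graph, labels) pair."""
    return {
        "pos": tuple(graph.ndata["pos"].shape),              # expected (num_atoms, 3)
        "attr": tuple(graph.ndata["attr"].shape),            # expected (num_atoms, 11)
        "edge_attr": tuple(graph.edata["edge_attr"].shape),  # expected (num_edges, 4)
        "labels": tuple(labels.shape),
    }


def show_first_sample():
    """Print the summary for the first molecule (needs DGL and a download)."""
    from dgl.data import QM9EdgeDataset  # imported lazily; not needed by the helper above

    data = QM9EdgeDataset(label_keys=["mu", "alpha"])
    graph, labels = data[0]
    print(summarize_sample(graph, labels))
```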

__len__()

Number of graphs in the dataset.

Return type:

int
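
Because the dataset supports `len()` and integer indexing, index-based splits are straightforward to build. The sketch below shows one possible random split. `split_indices` and the 80/10/10 fractions are illustrative assumptions, not something this class prescribes.

```python
import random


def split_indices(num_graphs, frac_train=0.8, frac_val=0.1, seed=0):
    """Shuffle range(num_graphs) and cut it into train/val/test index lists."""
    indices = list(range(num_graphs))
    random.Random(seed).shuffle(indices)  # seeded for a reproducible split
    n_train = int(frac_train * num_graphs)
    n_val = int(frac_val * num_graphs)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


# 130,831 is the graph count stated on this page; with DGL installed,
# len(data) would supply the same number.
train_idx, val_idx, test_idx = split_indices(130831)
# Samples are then fetched by index, e.g. graph, labels = data[train_idx[0]]
```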