Stochastic Training of GNN for Link Prediction

This tutorial will show how to train a multi-layer GraphSAGE model for link prediction on ogbn-arxiv, provided by Open Graph Benchmark (OGB). The dataset contains around 170 thousand nodes and 1 million edges.
By the end of this tutorial, you will be able to train a GNN model for link prediction on a single GPU with DGL's neighbor sampling components.

This tutorial assumes that you have read the Introduction of Neighbor Sampling for GNN Training and Neighbor Sampling for Node Classification.
Link Prediction Overview

Link prediction requires the model to predict the probability that an edge exists between two given nodes. This tutorial does so by computing a dot product between the representations of both incident nodes:

\hat{y}_{u\sim v} = \sigma\left(h_u^\top h_v\right)

It then minimizes the following binary cross entropy loss, where \mathcal{D} is the set of positive and negative node pairs and y_{u\sim v} indicates whether the pair is connected:

\mathcal{L} = -\sum_{u\sim v \in \mathcal{D}} \left( y_{u\sim v} \log \hat{y}_{u\sim v} + (1 - y_{u\sim v}) \log\left(1 - \hat{y}_{u\sim v}\right) \right)

This is identical to the link prediction formulation in the previous tutorial on link prediction.
Loading Dataset

This tutorial loads the dataset from the ogb package as in the previous tutorial.
import os
os.environ["DGLBACKEND"] = "pytorch"
import dgl
import numpy as np
import torch
from ogb.nodeproppred import DglNodePropPredDataset
dataset = DglNodePropPredDataset("ogbn-arxiv")
device = "cpu" # change to 'cuda' for GPU
graph, node_labels = dataset[0]
# Add reverse edges since ogbn-arxiv is unidirectional.
graph = dgl.add_reverse_edges(graph)
print(graph)
print(node_labels)
node_features = graph.ndata["feat"]
node_labels = node_labels[:, 0]
num_features = node_features.shape[1]
num_classes = (node_labels.max() + 1).item()
print("Number of classes:", num_classes)
idx_split = dataset.get_idx_split()
train_nids = idx_split["train"]
valid_nids = idx_split["valid"]
test_nids = idx_split["test"]
Out:
Graph(num_nodes=169343, num_edges=2332486,
ndata_schemes={'year': Scheme(shape=(1,), dtype=torch.int64), 'feat': Scheme(shape=(128,), dtype=torch.float32)}
edata_schemes={})
tensor([[ 4],
[ 5],
[28],
...,
[10],
[ 4],
[ 1]])
Number of classes: 40
Defining Neighbor Sampler and Data Loader in DGL

Different from the full-graph link prediction tutorial, a common practice for training GNNs on large graphs is to iterate over the edges in minibatches, since computing the scores of all edges is usually infeasible. For each minibatch of edges, you compute the output representations of their incident nodes using neighbor sampling and a GNN, in a fashion similar to the large-scale node classification tutorial.

DGL provides dgl.dataloading.as_edge_prediction_sampler to iterate over edges for edge classification or link prediction tasks.

To perform link prediction, you need to specify a negative sampler. DGL provides builtin negative samplers such as dgl.dataloading.negative_sampler.Uniform. Here this tutorial uniformly draws 5 negative examples per positive example.
negative_sampler = dgl.dataloading.negative_sampler.Uniform(5)
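For each positive edge (u, v), Uniform keeps the source u and draws the given number of destination nodes uniformly at random as negative endpoints. A minimal sketch on a toy graph (illustrative only; it assumes the sampler's callable interface, which takes a graph and a tensor of edge IDs):

# Toy graph with edges 0->1 and 1->2; draw negatives for edge 0 (0->1).
tiny_g = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])))
neg_src, neg_dst = negative_sampler(tiny_g, torch.tensor([0]))
print(neg_src)  # the source node 0, repeated 5 times
print(neg_dst)  # 5 destination nodes drawn uniformly at random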
After defining the negative sampler, one can then define the edge data loader with neighbor sampling. To create a DataLoader for link prediction, provide a neighbor sampler object as well as the negative sampler object created above.
sampler = dgl.dataloading.NeighborSampler([4, 4])
sampler = dgl.dataloading.as_edge_prediction_sampler(
    sampler, negative_sampler=negative_sampler
)
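Optionally, you can also exclude each minibatch's seed edges, together with the reverse edges that add_reverse_edges created, from the sampled computation graph to avoid information leakage. A sketch using the exclude argument of as_edge_prediction_sampler; it relies on add_reverse_edges placing the reverse of edge i at position i + E:

E = graph.num_edges() // 2  # number of original (pre-reverse) edges
reverse_eids = torch.cat([torch.arange(E, 2 * E), torch.arange(0, E)])
sampler_with_exclusion = dgl.dataloading.as_edge_prediction_sampler(
    dgl.dataloading.NeighborSampler([4, 4]),
    exclude="reverse_id",  # also drop each seed edge's reverse
    reverse_eids=reverse_eids,
    negative_sampler=negative_sampler,
)

The rest of this tutorial keeps the simpler sampler defined above.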
train_dataloader = dgl.dataloading.DataLoader(
    # The following arguments are specific to DataLoader.
    graph,  # The graph
    torch.arange(graph.num_edges()),  # The edges to iterate over
    sampler,  # The neighbor sampler
    device=device,  # Put the MFGs on CPU or GPU
    # The following arguments are inherited from PyTorch DataLoader.
    batch_size=1024,  # Batch size
    shuffle=True,  # Whether to shuffle the edges for every epoch
    drop_last=False,  # Whether to drop the last incomplete batch
    num_workers=0,  # Number of sampler processes
)
You can peek one minibatch from train_dataloader and see what it will give you.
input_nodes, pos_graph, neg_graph, mfgs = next(iter(train_dataloader))
print("Number of input nodes:", len(input_nodes))
print(
"Positive graph # nodes:",
pos_graph.num_nodes(),
"# edges:",
pos_graph.num_edges(),
)
print(
"Negative graph # nodes:",
neg_graph.num_nodes(),
"# edges:",
neg_graph.num_edges(),
)
print(mfgs)
Out:
Number of input nodes: 57279
Positive graph # nodes: 6890 # edges: 1024
Negative graph # nodes: 6890 # edges: 5120
[Block(num_src_nodes=57279, num_dst_nodes=23975, num_edges=89511), Block(num_src_nodes=23975, num_dst_nodes=6890, num_edges=24215)]
The example minibatch consists of four elements.
The first element is an ID tensor for the input nodes, i.e., nodes whose input features are needed on the first GNN layer for this minibatch.
The second and third elements are the positive graph and the negative graph for this minibatch. The concept of positive and negative graphs has been introduced in the full-graph link prediction tutorial. In minibatch training, the positive graph and the negative graph only contain the nodes necessary for computing the pair-wise scores of positive and negative examples in the current minibatch.
The last element is a list of MFGs storing the computation dependencies for each GNN layer. The MFGs are used to compute the GNN outputs of the nodes involved in the positive/negative graphs.
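Note that the positive and negative graphs are compacted to share one node set, which coincides with the destination nodes of the last MFG, so a single GNN forward pass serves both scoring graphs. You can verify this on the minibatch above; it matches the printed sizes:

assert pos_graph.num_nodes() == mfgs[-1].num_dst_nodes()
assert neg_graph.num_nodes() == mfgs[-1].num_dst_nodes()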
Defining Model for Node Representation
The model is almost identical to the one in the node classification tutorial. The only difference is that since you are doing link prediction, the output dimension will not be the number of classes in the dataset.
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import SAGEConv
class Model(nn.Module):
    def __init__(self, in_feats, h_feats):
        super(Model, self).__init__()
        self.conv1 = SAGEConv(in_feats, h_feats, aggregator_type="mean")
        self.conv2 = SAGEConv(h_feats, h_feats, aggregator_type="mean")
        self.h_feats = h_feats

    def forward(self, mfgs, x):
        h_dst = x[: mfgs[0].num_dst_nodes()]
        h = self.conv1(mfgs[0], (x, h_dst))
        h = F.relu(h)
        h_dst = h[: mfgs[1].num_dst_nodes()]
        h = self.conv2(mfgs[1], (h, h_dst))
        return h
model = Model(num_features, 128).to(device)
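As an optional sanity check, you can run the freshly initialized model on the minibatch peeked above. The output has one row per destination node of the last MFG and 128 columns, the hidden size chosen here:

with torch.no_grad():
    out = model(mfgs, mfgs[0].srcdata["feat"])
print(out.shape)  # e.g. torch.Size([6890, 128]) for the minibatch printed above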
Defining the Score Predictor for Edges

After obtaining the node representations for the minibatch, the last thing to do is to predict the scores of the edges (positive examples) and the non-existent edges (negative examples) in the sampled minibatch.
The following score predictor, copied from the link prediction tutorial, takes a dot product between the incident nodes’ representations.
import dgl.function as fn
class DotPredictor(nn.Module):
    def forward(self, g, h):
        with g.local_scope():
            g.ndata["h"] = h
            # Compute a new edge feature named 'score' by a dot-product
            # between the source node feature 'h' and destination node
            # feature 'h'.
            g.apply_edges(fn.u_dot_v("h", "h", "score"))
            # u_dot_v returns a 1-element vector for each edge so you need
            # to squeeze it.
            return g.edata["score"][:, 0]
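To see concretely what u_dot_v computes, here is a small equivalence check on a toy graph (illustrative only): gathering both endpoints' representations and taking a row-wise dot product yields the same scores.

toy_g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))
toy_h = torch.randn(3, 4)
src, dst = toy_g.edges()
manual_scores = (toy_h[src] * toy_h[dst]).sum(1)
assert torch.allclose(DotPredictor()(toy_g, toy_h), manual_scores)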
Evaluating Performance with Unsupervised Learning (Optional)

There are various ways to evaluate the performance of link prediction. This tutorial follows the practice of the GraphSAGE paper: it first trains a GNN via link prediction to obtain an embedding for each node, then trains a downstream classifier on top of this embedding and computes the accuracy as an assessment of the embedding quality.
To obtain the representations of all the nodes, this tutorial uses neighbor sampling as introduced in the node classification tutorial.
Note
If you would like to obtain node representations without neighbor sampling during inference, please refer to this user guide.
def inference(model, graph, node_features):
    with torch.no_grad():
        sampler = dgl.dataloading.NeighborSampler([4, 4])
        dataloader = dgl.dataloading.DataLoader(
            graph,
            torch.arange(graph.num_nodes()),
            sampler,
            batch_size=1024,
            shuffle=False,
            drop_last=False,
            num_workers=4,
            device=device,
        )

        result = []
        for input_nodes, output_nodes, mfgs in dataloader:
            # Feature copy from CPU to GPU takes place here.
            inputs = mfgs[0].srcdata["feat"]
            result.append(model(mfgs, inputs))
        return torch.cat(result)
import sklearn.metrics
def evaluate(emb, label, train_nids, valid_nids, test_nids):
    classifier = nn.Linear(emb.shape[1], num_classes).to(device)
    opt = torch.optim.LBFGS(classifier.parameters())

    def compute_loss():
        pred = classifier(emb[train_nids].to(device))
        loss = F.cross_entropy(pred, label[train_nids].to(device))
        return loss

    def closure():
        loss = compute_loss()
        opt.zero_grad()
        loss.backward()
        return loss

    prev_loss = float("inf")
    for i in range(1000):
        opt.step(closure)
        with torch.no_grad():
            loss = compute_loss().item()
            if np.abs(loss - prev_loss) < 1e-4:
                print("Converges at iteration", i)
                break
            else:
                prev_loss = loss

    with torch.no_grad():
        pred = classifier(emb.to(device)).cpu()
        valid_acc = sklearn.metrics.accuracy_score(
            label[valid_nids].numpy(), pred[valid_nids].numpy().argmax(1)
        )
        test_acc = sklearn.metrics.accuracy_score(
            label[test_nids].numpy(), pred[test_nids].numpy().argmax(1)
        )
        return valid_acc, test_acc
Defining Training Loop

The following initializes the score predictor and defines the optimizer; a minimal sketch, assuming (as in the node classification tutorial) an Adam optimizer over both the model's and the predictor's parameters:
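predictor = DotPredictor().to(device)
# DotPredictor has no parameters of its own, but optimizing the union of
# both parameter lists also covers predictors that do (e.g. an MLP).
opt = torch.optim.Adam(list(model.parameters()) + list(predictor.parameters()))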
The following is the training loop for link prediction and evaluation. It also saves the model that performs best on the validation set:
import tqdm
best_accuracy = 0
best_model_path = "model.pt"
for epoch in range(1):
    with tqdm.tqdm(train_dataloader) as tq:
        for step, (input_nodes, pos_graph, neg_graph, mfgs) in enumerate(tq):
            # Feature copy from CPU to GPU takes place here.
            inputs = mfgs[0].srcdata["feat"]

            outputs = model(mfgs, inputs)
            pos_score = predictor(pos_graph, outputs)
            neg_score = predictor(neg_graph, outputs)

            score = torch.cat([pos_score, neg_score])
            label = torch.cat(
                [torch.ones_like(pos_score), torch.zeros_like(neg_score)]
            )
            loss = F.binary_cross_entropy_with_logits(score, label)

            opt.zero_grad()
            loss.backward()
            opt.step()

            tq.set_postfix({"loss": "%.03f" % loss.item()}, refresh=False)

            if (step + 1) % 500 == 0:
                model.eval()
                emb = inference(model, graph, node_features)
                valid_acc, test_acc = evaluate(
                    emb, node_labels, train_nids, valid_nids, test_nids
                )
                print(
                    "Epoch {} Validation Accuracy {} Test Accuracy {}".format(
                        epoch, valid_acc, test_acc
                    )
                )
                if best_accuracy < valid_acc:
                    best_accuracy = valid_acc
                    torch.save(model.state_dict(), best_model_path)
                model.train()

                # Note that this tutorial does not train the whole model to the end.
                break
Out:
  0%|          | 0/2278 [00:00<?, ?it/s]
  0%|          | 1/2278 [00:00<12:30, 3.03it/s, loss=45.239]
  0%|          | 2/2278 [00:00<09:07, 4.16it/s, loss=34.076]
  0%|          | 3/2278 [00:00<08:02, 4.72it/s, loss=25.293]
  ...
 22%|##1       | 499/2278 [01:28<05:22, 5.51it/s, loss=0.659]
Converges at iteration 10
Epoch 0 Validation Accuracy 0.07651263465216954 Test Accuracy 0.05884410427339876
 22%|##1       | 499/2278 [01:54<06:48, 4.35it/s, loss=0.657]
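The loop above saved the checkpoint with the best validation accuracy so far. To restore it later, for instance before the evaluation below, a minimal sketch:

model.load_state_dict(torch.load(best_model_path))
model.eval()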
Evaluating Performance with Link Prediction (Optional)

In practice, it is more common to evaluate the link prediction model on whether it can predict new edges. There are different evaluation metrics such as AUC or various metrics from information retrieval. Ultimately, they all require the model to predict one scalar score for each node pair in a set of node pairs.
Assume that you have the following test set with labels, where test_pos_src and test_pos_dst are ground-truth node pairs with edges in between (the positive pairs), and test_neg_src and test_neg_dst are ground-truth node pairs without edges in between (the negative pairs).
# Positive pairs
# These are randomly generated as an example. You will need to
# replace them with your own ground truth.
n_test_pos = 1000
test_pos_src, test_pos_dst = (
torch.randint(0, graph.num_nodes(), (n_test_pos,)),
torch.randint(0, graph.num_nodes(), (n_test_pos,)),
)
# Negative pairs. Likewise, you will need to replace them with your
# own ground truth.
test_neg_src = test_pos_src
test_neg_dst = torch.randint(0, graph.num_nodes(), (n_test_pos,))
First you need to compute the node representations for all the nodes with the inference method above:
node_reprs = inference(model, graph, node_features)
Since the predictor is a dot product, you can now easily compute the scores of the positive and negative test pairs, and from them metrics such as AUC:
h_pos_src = node_reprs[test_pos_src]
h_pos_dst = node_reprs[test_pos_dst]
h_neg_src = node_reprs[test_neg_src]
h_neg_dst = node_reprs[test_neg_dst]
score_pos = (h_pos_src * h_pos_dst).sum(1)
score_neg = (h_neg_src * h_neg_dst).sum(1)
test_preds = torch.cat([score_pos, score_neg]).cpu().numpy()
test_labels = (
torch.cat([torch.ones_like(score_pos), torch.zeros_like(score_neg)])
.cpu()
.numpy()
)
auc = sklearn.metrics.roc_auc_score(test_labels, test_preds)
print("Link Prediction AUC:", auc)
Out:
Link Prediction AUC: 0.5023085
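Besides AUC, retrieval-style metrics are also common. As an illustration (not part of the original script), here is a minimal mean reciprocal rank (MRR) sketch that ranks each positive pair's score against all negative pairs' scores:

# Rank of each positive score among all negative scores (1 = best).
ranks = (score_neg.unsqueeze(0) >= score_pos.unsqueeze(1)).sum(1) + 1
mrr = (1.0 / ranks.float()).mean().item()
print("MRR:", mrr)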
Conclusion
In this tutorial, you have learned how to train a multi-layer GraphSAGE for link prediction with neighbor sampling.