dgl.knn_graph

dgl.knn_graph(x, k, algorithm='bruteforce-blas', dist='euclidean')[source]

Construct a graph from a set of points according to k-nearest-neighbor (KNN) and return.

The function transforms the coordinates/features of a point set into a directed homogeneous graph. The coordinates of the point set is specified as a matrix whose rows correspond to points and columns correspond to coordinate/feature dimensions.

The nodes of the returned graph correspond to the points, where the predecessors of each point are its k-nearest neighbors measured by the chosen distance.

If x is a 3D tensor, then each submatrix will be transformed into a separate graph. DGL then composes the graphs into a large graph of multiple connected components.

See the benchmark for a complete benchmark result.

Parameters
  • x (Tensor) –

    The point coordinates. It can be either on CPU or GPU.

    • If is 2D, x[i] corresponds to the i-th node in the KNN graph.

    • If is 3D, x[i] corresponds to the i-th KNN graph and x[i][j] corresponds to the j-th node in the i-th KNN graph.

  • k (int) – The number of nearest neighbors per node.

  • algorithm (str, optional) –

    Algorithm used to compute the k-nearest neighbors.

    • ’bruteforce-blas’ will first compute the distance matrix using BLAS matrix multiplication operation provided by backend frameworks. Then use topk algorithm to get k-nearest neighbors. This method is fast when the point set is small but has \(O(N^2)\) memory complexity where \(N\) is the number of points.

    • ’bruteforce’ will compute distances pair by pair and directly select the k-nearest neighbors during distance computation. This method is slower than ‘bruteforce-blas’ but has less memory overhead (i.e., \(O(Nk)\) where \(N\) is the number of points, \(k\) is the number of nearest neighbors per node) since we do not need to store all distances.

    • ’bruteforce-sharemem’ (CUDA only) is similar to ‘bruteforce’ but use shared memory in CUDA devices for buffer. This method is faster than ‘bruteforce’ when the dimension of input points is not large. This method is only available on CUDA device.

    • ’kd-tree’ will use the kd-tree algorithm (CPU only). This method is suitable for low-dimensional data (e.g. 3D point clouds)

    • ’nn-descent’ is an approximate approach from paper Efficient k-nearest neighbor graph construction for generic similarity measures. This method will search for nearest neighbor candidates in “neighbors’ neighbors”.

    (default: ‘bruteforce-blas’)

  • dist (str, optional) – The distance metric used to compute distance between points. It can be the following metrics: * ‘euclidean’: Use Euclidean distance (L2 norm) \(\sqrt{\sum_{i} (x_{i} - y_{i})^{2}}\). * ‘cosine’: Use cosine distance. (default: ‘euclidean’)

Returns

The constructed graph. The node IDs are in the same order as x.

Return type

DGLGraph

Examples

The following examples use PyTorch backend.

>>> import dgl
>>> import torch

When x is a 2D tensor, a single KNN graph is constructed.

>>> x = torch.tensor([[0.0, 0.0, 1.0],
...                   [1.0, 0.5, 0.5],
...                   [0.5, 0.2, 0.2],
...                   [0.3, 0.2, 0.4]])
>>> knn_g = dgl.knn_graph(x, 2)  # Each node has two predecessors
>>> knn_g.edges()
(tensor([0, 1, 2, 2, 2, 3, 3, 3]), tensor([0, 1, 1, 2, 3, 0, 2, 3]))

When x is a 3D tensor, DGL constructs multiple KNN graphs and and then composes them into a graph of multiple connected components.

>>> x1 = torch.tensor([[0.0, 0.0, 1.0],
...                    [1.0, 0.5, 0.5],
...                    [0.5, 0.2, 0.2],
...                    [0.3, 0.2, 0.4]])
>>> x2 = torch.tensor([[0.0, 1.0, 1.0],
...                    [0.3, 0.3, 0.3],
...                    [0.4, 0.4, 1.0],
...                    [0.3, 0.8, 0.2]])
>>> x = torch.stack([x1, x2], dim=0)
>>> knn_g = dgl.knn_graph(x, 2)  # Each node has two predecessors
>>> knn_g.edges()
(tensor([0, 1, 2, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 7, 7]),
 tensor([0, 1, 1, 2, 3, 0, 2, 3, 4, 5, 6, 7, 4, 6, 5, 7]))