DataLoader

class dgl.graphbolt.DataLoader(datapipe, num_workers=0, persistent_workers=True, overlap_feature_fetch=True, overlap_graph_fetch=False, max_uva_threads=6144)

Bases: DataLoader

Multiprocessing DataLoader.

Iterates over the data pipeline with everything before feature fetching (i.e. dgl.graphbolt.FeatureFetcher) in subprocesses, and everything after feature fetching in the main process. The datapipe is modified in-place as a result.

When the copy_to operation is placed earlier in the data pipeline, so that stages which would otherwise run in worker subprocesses use CUDA, num_workers must be 0, because using CUDA in multiple worker processes is not supported.
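The placement rule above can be sketched as follows. This is a minimal illustration, not the library's own code: the helper `build_loader`, its `copy_to_first` flag, the batch size, the fanouts, and the feature name "feat" are all hypothetical, and the explicit `ValueError` guard is an assumption added here to make the constraint visible. The stage names (`ItemSampler`, `sample_neighbor`, `fetch_feature`, `copy_to`) are assumed to be available from `dgl.graphbolt`.

```python
def build_loader(graph, features, itemset, device, num_workers=0, copy_to_first=False):
    """Hypothetical helper: assemble a GraphBolt pipeline and wrap it in a DataLoader.

    graph, features, and itemset are assumed to come from an already loaded
    GraphBolt dataset. The guard below is this sketch's own check, not
    necessarily how the library reports the problem.
    """
    # CUDA cannot be used in multiple worker processes, so an early copy_to
    # is only combined with num_workers=0.
    if copy_to_first and num_workers != 0:
        raise ValueError("copy_to placed early in the pipeline requires num_workers=0")

    # Imported lazily so the guard above can be exercised without dgl installed.
    import dgl.graphbolt as gb

    datapipe = gb.ItemSampler(itemset, batch_size=1024, shuffle=True)
    if copy_to_first:
        # Sampling and feature fetching then run on `device`.
        datapipe = datapipe.copy_to(device)
    datapipe = datapipe.sample_neighbor(graph, [10, 10])  # assumed 2-layer fanouts
    datapipe = datapipe.fetch_feature(features, node_feature_keys=["feat"])
    if not copy_to_first:
        # Only the finished minibatch is moved to `device`.
        datapipe = datapipe.copy_to(device)

    return gb.DataLoader(datapipe, num_workers=num_workers)
```

With `copy_to_first=False`, the stages before feature fetching can run in worker subprocesses, so a positive `num_workers` is acceptable; with `copy_to_first=True`, the sketch insists on `num_workers=0`.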

Parameters:
  • datapipe (DataPipe) – The data pipeline.

  • num_workers (int, optional) – Number of worker processes. Default is 0.

  • persistent_workers (bool, optional) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once, which keeps the worker instances alive between epochs. Default is True.

  • overlap_feature_fetch (bool, optional) – If True, the data loader will overlap the UVA feature fetching operations with the rest of the operations by using a separate CUDA stream. Default is True.

  • overlap_graph_fetch (bool, optional) – If True, the data loader will overlap the UVA graph fetching operations with the rest of the operations by using a separate CUDA stream. Default is False.

  • max_uva_threads (int, optional) – Limits the number of CUDA threads used for UVA copies so that the rest of the computation can run concurrently with them. Setting it too high limits the amount of overlap, while setting it too low may leave the PCI-e bandwidth underutilized. The manually tuned default is 6144, which corresponds to around 3-4 Streaming Multiprocessors.
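One way to read the "3-4 Streaming Multiprocessors" figure is as the thread budget divided by the maximum number of resident threads per SM. That interpretation is an assumption, as are the per-SM limits used below (2048 on some NVIDIA architectures, 1536 on others); the helper name `sms_occupied` is hypothetical.

```python
MAX_UVA_THREADS = 6144  # documented default of max_uva_threads

# Assumed resident-thread limits per SM; the real value depends on the GPU architecture.
ASSUMED_THREADS_PER_SM = (2048, 1536)


def sms_occupied(max_uva_threads, threads_per_sm):
    """Number of SMs' worth of resident threads covered by the UVA thread budget."""
    return max_uva_threads / threads_per_sm


for threads_per_sm in ASSUMED_THREADS_PER_SM:
    print(threads_per_sm, sms_occupied(MAX_UVA_THREADS, threads_per_sm))
# prints:
# 2048 3.0
# 1536 4.0
```

Under these assumptions, a GPU with more SMs leaves a larger share of the device free for sampling and training kernels at the same max_uva_threads setting.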