GDELTDataset

class dgl.data.GDELTDataset(mode='train', raw_dir=None, force_reload=False, verbose=False, transform=None)[source]

Bases: dgl.data.dgl_dataset.DGLBuiltinDataset

GDELT dataset for event-based temporal graphs.

The Global Database of Events, Language, and Tone (GDELT) dataset. It contains events that happened all over the world. Events are aggregated: for example, every protest held anywhere in Russia on a given day is collapsed into a single entry. This dataset consists of events collected from 1/1/2018 to 1/31/2018, with a 15-minute time granularity.

Reference:

Statistics:

  • Train examples: 2,304

  • Valid examples: 288

  • Test examples: 384
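The split sizes line up with the 15-minute granularity described above: a day has 24 × 60 / 15 = 96 intervals, and each count is a whole number of days. The check below assumes one example per 15-minute interval; that mapping is inferred from the numbers and is not stated in the source.

```python
# Split sizes as listed under "Statistics".
SPLITS = {"train": 2304, "valid": 288, "test": 384}

# A 15-minute granularity gives 24 * 60 // 15 = 96 intervals per day.
INTERVALS_PER_DAY = 24 * 60 // 15

# Assumption: one example per 15-minute interval, so every split
# should cover a whole number of days.
for name, n in SPLITS.items():
    assert n % INTERVALS_PER_DAY == 0, name

days = {name: n // INTERVALS_PER_DAY for name, n in SPLITS.items()}
total_days = sum(days.values())
print(days, total_days)
```

Under that assumption the splits cover 24, 3, and 4 days, which adds up to the 31 days of January 2018.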

Parameters
  • mode (str) – Must be one of (‘train’, ‘valid’, ‘test’). Default: ‘train’

  • raw_dir (str) – Directory to which the raw data is downloaded, or which already contains the input data directory. Default: ~/.dgl/

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: False

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
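The transform parameter is applied on every access rather than once at load time. The sketch below illustrates that contract with a hypothetical stand-in class (ToySnapshotDataset, not part of DGL) that stores plain dictionaries in place of DGLGraph objects; it is not DGL's implementation.

```python
from typing import Callable, List, Optional


class ToySnapshotDataset:
    """Hypothetical stand-in for a dataset that accepts a `transform`."""

    def __init__(self, graphs: List[dict],
                 transform: Optional[Callable[[dict], dict]] = None):
        self._graphs = graphs
        self._transform = transform

    def __len__(self) -> int:
        return len(self._graphs)

    def __getitem__(self, t: int) -> dict:
        g = self._graphs[t]
        # In this sketch the transform runs on every access and the
        # stored graph itself is never modified.
        return g if self._transform is None else self._transform(g)


calls = []


def tag_graph(g: dict) -> dict:
    # Hypothetical transform: record the call and return a tagged copy.
    calls.append(g["id"])
    return {**g, "tagged": True}


ds = ToySnapshotDataset([{"id": 0}, {"id": 1}], transform=tag_graph)
first = ds[0]
again = ds[0]  # the transform runs a second time
```

With the real dataset, the same idea corresponds to passing any callable that maps a DGLGraph to a DGLGraph, for example GDELTDataset(mode='valid', transform=my_fn), where my_fn is your own function.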

start_time

Start time of the temporal graph

Type

int

end_time

End time of the temporal graph

Type

int

is_temporal

Whether the dataset contains temporal graphs.

Type

bool

Examples

>>> from dgl.data import GDELTDataset
>>>
>>> # load the train, valid, and test splits
>>> train_data = GDELTDataset()
>>> valid_data = GDELTDataset(mode='valid')
>>> test_data = GDELTDataset(mode='test')
>>>
>>> # length of the train set
>>> train_size = len(train_data)
>>>
>>> for g in train_data:
...     e_feat = g.edata['rel_type']
...     # your code here
...
>>>
__getitem__(t)[source]

Get the graph formed by the events before time t + self.start_time.

Parameters

t (int) – Time index; must be in the range [0, self.end_time - self.start_time].

Returns

The graph contains:

  • edata['rel_type']: edge type

Return type

dgl.DGLGraph
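The snapshot returned for t is cumulative: it contains all events up to time t + self.start_time, so later snapshots contain the edges of earlier ones. The sketch below reproduces that filtering on a hypothetical toy event array with NumPy. The (src, rel_type, dst, time) column layout and the helper name snapshot are assumptions for illustration. The docstring says "before" without settling the boundary; this sketch includes events at exactly t + start_time so that t = 0 is non-empty, which is also an assumption.

```python
import numpy as np

# Hypothetical toy events; each row is (src, rel_type, dst, time).
events = np.array([
    [0, 5, 1, 10],
    [1, 2, 2, 10],
    [2, 7, 0, 11],
    [0, 3, 2, 12],
], dtype=np.int64)

start_time = int(events[:, 3].min())  # earliest time index in the toy data
end_time = int(events[:, 3].max())    # latest time index in the toy data


def snapshot(t: int):
    """Return (src, dst, rel_type) arrays for the cumulative graph at t.

    Assumption: events at exactly t + start_time are included.
    """
    if not 0 <= t <= end_time - start_time:
        raise IndexError(f"t={t} is outside [0, {end_time - start_time}]")
    mask = events[:, 3] <= t + start_time
    sub = events[mask]
    return sub[:, 0], sub[:, 2], sub[:, 1]


src, dst, rel = snapshot(0)
```

In DGL terms, src and dst are what dgl.graph((src, dst)) takes, and rel plays the role of edata['rel_type'].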

__len__()[source]

Number of graphs in the dataset.

Returns

Return type

int
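__getitem__ accepts every integer t in the closed range [0, self.end_time - self.start_time]. Assuming the dataset holds exactly one graph per such t, the length follows directly. The helper names below are hypothetical and only illustrate that arithmetic.

```python
def num_snapshots(start_time: int, end_time: int) -> int:
    # One graph per integer t in the closed range [0, end_time - start_time].
    return end_time - start_time + 1


def valid_time_indices(start_time: int, end_time: int) -> range:
    # Every t that __getitem__ is documented to accept.
    return range(num_snapshots(start_time, end_time))


indices = valid_time_indices(10, 12)
```

With a loaded dataset, the same range appears as for t in range(len(train_data)): g = train_data[t].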