create_dataset

selfeeg.utils.utils.create_dataset(folder_name: str = 'Simulated_EEG', Sample_range: list = [512, 1025], Chans: int = 8, p: list = 0.8, return_labels: bool = False, seed: int = 1234) → ndarray | None

Creates a simulated EEG dataset for normal vs. abnormal binary classification.

Samples have random length within a given range.

When called, the function generates 1000 files in a new directory. Each file is named ‘A_B_C_D.pickle’, where:

  1. A = dataset ID

  2. B = subject ID

  3. C = session ID

  4. D = trial ID.
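
The naming scheme above can be decoded with a few lines of Python. The sketch below is illustrative only: `SampleID` and `parse_sample_name` are hypothetical helpers, not part of selfeeg, and the code assumes that each of the four fields is a plain integer.

```python
from pathlib import Path
from typing import NamedTuple


class SampleID(NamedTuple):
    """IDs encoded in a file name of the form 'A_B_C_D.pickle' (hypothetical helper)."""
    dataset: int
    subject: int
    session: int
    trial: int


def parse_sample_name(path: str) -> SampleID:
    """Split 'A_B_C_D.pickle' into its four IDs, assuming each field is an integer."""
    stem = Path(path).stem  # drop the directory and the '.pickle' extension
    fields = stem.split("_")
    if len(fields) != 4:
        raise ValueError(f"expected 4 underscore-separated fields, got {stem!r}")
    return SampleID(*(int(f) for f in fields))


sample_id = parse_sample_name("Simulated_EEG/3_12_4_1.pickle")
print(sample_id)
```

Only the file stem is split, so underscores in the folder name (as in the default ‘Simulated_EEG’) do not interfere with the parsing.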

In total, create_dataset generates files associated with:

  1. 5 datasets (200 files per dataset)

  2. 40 subjects per dataset

  3. 5 sessions per subject

  4. 1 trial per session.
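
The counts above multiply out to the 1000 generated files. A minimal check, using only the numbers stated in the list (the variable names are illustrative):

```python
# Counts as stated in the list above.
n_datasets = 5
subjects_per_dataset = 40
sessions_per_subject = 5
trials_per_session = 1

files_per_dataset = subjects_per_dataset * sessions_per_subject * trials_per_session
total_files = n_datasets * files_per_dataset

print(files_per_dataset)  # 200
print(total_files)        # 1000
```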

All files will store a dictionary with two keys:

  1. ‘data’ = the array with random length and given channels (channels in column dimension)

  2. ‘label’ = an integer with a random binary label (0=normal, 1=abnormal).

EEG values are expressed in uV and lie within [-550, 550] uV.
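
A generated file can be read back with the standard `pickle` module. The sketch below is a hedged illustration: `load_sample` is a hypothetical helper, not a selfeeg function, and it runs on a small stand-in file written to a temporary directory (with nested lists in place of the real array) so that it works without calling create_dataset. It checks the two keys, the binary label, and the amplitude bound described above.

```python
import pickle
import tempfile
from pathlib import Path


def load_sample(path):
    """Load one pickled sample and check it against the format described above."""
    with open(path, "rb") as f:
        sample = pickle.load(f)
    if set(sample) != {"data", "label"}:
        raise ValueError(f"unexpected keys: {sorted(sample)}")
    if sample["label"] not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {sample['label']!r}")
    # Row-wise iteration works for a 2D NumPy array as well as for nested lists.
    peak = max(abs(v) for row in sample["data"] for v in row)
    if peak > 550:
        raise ValueError(f"amplitude {peak} uV exceeds the documented 550 uV bound")
    return sample


# Stand-in file for demonstration; real files come from create_dataset.
with tempfile.TemporaryDirectory() as tmp:
    mock_path = Path(tmp) / "1_1_1_1.pickle"
    mock = {"data": [[-120.5, 30.0, 549.9], [12.0, -550.0, 0.0]], "label": 0}
    with open(mock_path, "wb") as f:
        pickle.dump(mock, f)
    loaded = load_sample(mock_path)

print(loaded["label"], len(loaded["data"]))
```

As with any pickle file, only load files from a source you trust, since unpickling can execute arbitrary code.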

Parameters:
  • folder_name (str, optional) –

    A string with the optional name of the subdirectory to store the generated files.

    Default = ‘Simulated_EEG’

  • Sample_range (list, optional) –

    A two-element list with the minimum and maximum possible length of the generated EEGs.

    Default = [512, 1025]

  • Chans (int, optional) –

    An integer defining the number of channels each EEG must have.

    Default = 8

  • p (float, optional) –

    A scalar in range [0, 1] with the probability of a sample being normal.

    Default = 0.8

  • return_labels (bool, optional) –

    Whether to return the array with the generated labels.

    Default = False

  • seed (int, optional) –

    A seed to set for reproducibility.

    Default = 1234
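
To make the roles of p and seed concrete, the sketch below draws binary labels with the standard `random` module. It is not the library's implementation: `draw_labels` is a hypothetical helper, create_dataset may use a different random generator, and the values drawn here will not match those stored in the generated files.

```python
import random


def draw_labels(n, p=0.8, seed=1234):
    """Draw n binary labels: 0 (normal) with probability p, else 1 (abnormal)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = random.Random(seed)  # local generator, so the global state is untouched
    return [0 if rng.random() < p else 1 for _ in range(n)]


labels = draw_labels(1000)
print(sum(label == 0 for label in labels) / len(labels))  # close to 0.8

# The same seed reproduces the same labels; p=1.0 yields only normal samples.
assert draw_labels(1000) == labels
assert set(draw_labels(50, p=1.0)) == {0}
```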

Returns:

classes (ArrayLike) – An array with the generated labels, returned only if return_labels is True. Index association follows the files sorted by name.
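
The index association can be sketched as follows, under the assumption (not confirmed here) that "sorted by name" means plain lexicographic string order. The file names and labels below are hypothetical stand-ins, not output of create_dataset.

```python
# Hypothetical file names and labels, for illustration only.
names = ["1_2_1_1.pickle", "1_10_1_1.pickle", "1_1_1_1.pickle"]

# Plain string sorting is lexicographic, not numeric:
# '1_10_...' sorts before both '1_1_...' and '1_2_...'.
sorted_names = sorted(names)
print(sorted_names)

classes = [0, 1, 0]  # stand-in for the returned array, one label per sorted file
label_of = dict(zip(sorted_names, classes))
print(label_of["1_10_1_1.pickle"])
```

If this assumption holds, pairing the labels with a numerically sorted file list would misalign them, so it is worth checking the order on a few files before relying on it.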

Example

>>> from selfeeg import utils
>>> import glob
>>> utils.create_dataset()
>>> print(len(glob.glob('Simulated_EEG/*')) == 1000)  # should return True