create_dataset

selfeeg.utils.utils.create_dataset(folder_name: str = 'Simulated_EEG', Sample_range: list = [512, 1025], Chans: int = 8, p: float = 0.8, return_labels: bool = False, seed: int = 1234) → ndarray | None

Creates a simulated EEG dataset for binary classification (normal vs. abnormal).

Samples have random length within a given range.

Once called, the function generates 1000 files in a new directory. Each file is named ‘A_B_C_D.pickle’, where:

A = dataset ID

B = subject ID

C = session ID

D = trial ID.
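The naming scheme above can be decoded with plain string handling. The following is a minimal sketch, assuming the four IDs are integers separated by underscores; `parse_sample_name` and `SampleID` are hypothetical names introduced for illustration and are not part of selfeeg.

```python
from pathlib import Path
from typing import NamedTuple


class SampleID(NamedTuple):
    """Hypothetical container for the four IDs encoded in a file name."""
    dataset: int
    subject: int
    session: int
    trial: int


def parse_sample_name(path: str) -> SampleID:
    """Split an 'A_B_C_D.pickle' file name into its four integer IDs."""
    stem = Path(path).stem  # drops the directory and the '.pickle' suffix
    parts = stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated IDs, got {stem!r}")
    return SampleID(*(int(p) for p in parts))


ids = parse_sample_name("Simulated_EEG/3_17_2_1.pickle")
print(ids)
```

Taking the stem first keeps the underscore in the default folder name ‘Simulated_EEG’ from being mistaken for an ID separator.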

In total, create_dataset will generate files associated with:

5 datasets (200 files per dataset)

40 subjects per dataset

5 sessions per subject

1 trial per session.
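These counts multiply out to the 1000 files mentioned earlier. The sketch below enumerates the file names one would expect from them. That every ID starts at 1 and that subject IDs restart within each dataset are assumptions made for illustration; the documentation does not state the actual numbering.

```python
from itertools import product

N_DATASETS, N_SUBJECTS, N_SESSIONS, N_TRIALS = 5, 40, 5, 1

# Assumption: every ID is 1-based and subject IDs restart in each dataset.
expected_names = [
    f"{a}_{b}_{c}_{d}.pickle"
    for a, b, c, d in product(
        range(1, N_DATASETS + 1),
        range(1, N_SUBJECTS + 1),
        range(1, N_SESSIONS + 1),
        range(1, N_TRIALS + 1),
    )
]

# 40 subjects * 5 sessions * 1 trial = 200 files per dataset
files_per_dataset = N_SUBJECTS * N_SESSIONS * N_TRIALS

print(len(expected_names), files_per_dataset)
```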

All files will store a dictionary with two keys:

‘data’ = the array with random length and given channels (channels in column dimension)

‘label’ = an integer with a random binary label (0=normal, 1=abnormal).

EEG values are expressed in uV and are bounded within [-550, 550] uV.
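Reading a sample back is a matter of unpickling the file and indexing the two keys. The sketch below is self-contained, so it first writes a mock file with the same two-key structure and then loads it. It assumes the files are written with Python's standard pickle module (suggested by the ‘.pickle’ extension but not confirmed here), and the mock's nested list and 8 x 600 layout are placeholders rather than the real array type or orientation.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for one generated file; the nested-list "array" and its
# 8 x 600 layout are placeholders chosen only for this mock.
mock_sample = {"data": [[0.0] * 600 for _ in range(8)], "label": 1}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "1_1_1_1.pickle"
    with open(path, "wb") as f:
        pickle.dump(mock_sample, f)

    # Reading a sample back; the same pattern should apply to real files
    # if they were written with the standard pickle module.
    with open(path, "rb") as f:
        sample = pickle.load(f)

data, label = sample["data"], sample["label"]
print(sorted(sample.keys()), label)
```

As with any pickle file, only load files from a source you trust, since unpickling can execute arbitrary code.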

Parameters:

folder_name (str, optional) –
A string with the optional name of the subdirectory to store the generated files.

Default = ‘Simulated_EEG’
Sample_range (list, optional) –
A length 2 list with the possible minimum and maximum length of the generated EEGs.

Default = [512, 1025]
Chans (int, optional) –
An integer defining the number of channels each EEG must have.

Default = 8
p (float, optional) –
A scalar in range [0, 1] with the probability of a sample being normal.

Default = 0.8
return_labels (bool, optional) –
Whether to return the array of generated labels.

Default = False
seed (int, optional) –
A seed to set for reproducibility.

Default = 1234
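To illustrate how p and seed interact, the following sketch draws binary labels with the standard-library random module. It is not selfeeg's generator, so its labels will not match those produced by create_dataset; `draw_labels` is a hypothetical helper introduced only for this example.

```python
import random


def draw_labels(n: int, p: float, seed: int) -> list[int]:
    """Draw n labels: 0 (normal) with probability p, else 1 (abnormal)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = random.Random(seed)  # local generator, leaves global state untouched
    return [0 if rng.random() < p else 1 for _ in range(n)]


labels = draw_labels(n=1000, p=0.8, seed=1234)
normal_fraction = labels.count(0) / len(labels)
print(f"fraction labelled normal: {normal_fraction:.3f}")  # close to 0.8, not exact

# Same seed, same labels: this is what a fixed seed buys in reproducibility.
assert labels == draw_labels(n=1000, p=0.8, seed=1234)
```

Since p is a probability and not a fixed quota, the share of normal samples in a finite dataset will only approximate it.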

Returns:

classes (ArrayLike) – An array with the generated labels, returned only if return_labels is True. Index i corresponds to the i-th file when the files are sorted by name.
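Because the returned labels are matched to files by sorted name, it matters which sort is meant: plain string sorting and numeric sorting of the IDs give different orders once an ID has two digits. The documentation does not say which one applies, so the sketch below only shows the difference; when in doubt, the ‘label’ stored inside each file can be used to check the alignment. `numeric_key` is a hypothetical helper.

```python
names = ["1_2_1_1.pickle", "1_10_1_1.pickle", "1_1_1_1.pickle"]


def numeric_key(name: str) -> tuple[int, ...]:
    """Sort key that compares the four IDs as integers, not as text."""
    return tuple(int(part) for part in name.removesuffix(".pickle").split("_"))


lexicographic = sorted(names)            # compares character by character
numeric = sorted(names, key=numeric_key)  # compares (A, B, C, D) as numbers

print(lexicographic)
print(numeric)
```

In the string sort, ‘1_10_1_1.pickle’ comes before ‘1_1_1_1.pickle’ because the character ‘0’ precedes ‘_’ in ASCII, which is easy to overlook when pairing labels with files by hand.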

Example

>>> from selfeeg import utils
>>> import glob
>>> utils.create_dataset()
>>> print(len(glob.glob('Simulated_EEG/*')) == 1000)  # should return True