datasets package

Submodules

datasets.eye_fundus module

datasets.eye_scans module

class datasets.eye_scans.EyeScans(batch_size, real_word_data, ratio=[0.8, 0.1], seed=42, train_data_dir=None, val_data_dir=None, test_dataset_dir=None, num_workers=4, transforms=None)[source]

Bases: LightningDataModule

Parameters:
  • batch_size (int)

  • real_word_data (bool)

  • ratio (list[float])

  • seed (int)

  • train_data_dir (str | None)

  • val_data_dir (str | None)

  • test_dataset_dir (str | None)

  • num_workers (int)

  • transforms (Compose)

setup(stage=None)[source]

Set up the eye scans dataset for training, validation, and testing.

Parameters:

stage (str, optional) – The stage of the dataset setup. Can be “fit” for training, “test” for testing, or None for both. Defaults to None.

train_dataloader()[source]

An iterable or collection of iterables specifying training samples.

For more information about multiple dataloaders, see the Lightning documentation.

The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

Do not assign state in prepare_data().

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
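The prepare_data()/setup() split above can be sketched with a minimal, self-contained class. This is a hypothetical stand-in, not the real EyeScans implementation: Lightning is not imported, only the hook structure and the ratio-based split described in the class parameters are mirrored, and the `*.png` glob is an assumption.

```python
from pathlib import Path
import random

class EyeScansSketch:
    """Hypothetical sketch of the prepare_data()/setup() pattern.

    Mirrors the LightningDataModule hooks without importing Lightning.
    """

    def __init__(self, data_dir, ratio=(0.8, 0.1), seed=42):
        self.data_dir = Path(data_dir)
        self.ratio = ratio
        self.seed = seed

    def prepare_data(self):
        # Download/extract only -- runs on a single process, so no
        # state should be assigned here (see the warning above).
        pass

    def setup(self, stage=None):
        # Process and split -- runs on every process.
        paths = sorted(self.data_dir.glob("*.png"))
        rng = random.Random(self.seed)  # reproducible shuffle
        rng.shuffle(paths)
        n_train = int(len(paths) * self.ratio[0])
        n_val = int(len(paths) * self.ratio[1])
        if stage in ("fit", None):
            self.train_paths = paths[:n_train]
            self.val_paths = paths[n_train:n_train + n_val]
        if stage in ("test", None):
            self.test_paths = paths[n_train + n_val:]
```

With the default ratio of [0.8, 0.1], ten files split into eight training, one validation, and one test sample.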

val_dataloader()[source]

An iterable or collection of iterables specifying validation samples.

For more information about multiple dataloaders, see the Lightning documentation.

The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data().

  • fit()

  • validate()

  • prepare_data()

  • setup()

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

test_dataloader()[source]

An iterable or collection of iterables specifying test samples.

For more information about multiple dataloaders, see the Lightning documentation.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

Do not assign state in prepare_data().

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

If you don’t need a test dataset and a test_step(), you don’t need to implement this method.

datasets.opthal_anonymized module

datasets.opthal_anonymized.get_csv_dataset(filepath, val_size=0.1, seed=42)[source]

Load a dataset from a CSV file and split it into train, validation, and test sets, starting from a previously split train/test dataset.

Parameters:
  • filepath (str or Path) – The path to the CSV file containing the dataset.

  • val_size (float or None) – The proportion of the dataset to be used for validation. Default is 0.1 (10%).

  • seed (int) – The random seed for reproducible train/validation split. Default is 42.

Returns:

A dictionary containing the train, validation, and test sets as pandas DataFrames. If val_size is None, only the train and test sets are returned.

Return type:

dict[str, pd.DataFrame]
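The behavior of get_csv_dataset() can be illustrated with a stdlib-only sketch. This is an assumption-laden stand-in: it uses the csv module instead of pandas, and the `split` column marking rows as "train" or "test" is a hypothetical representation of the "previously split" dataset.

```python
import csv
import random

def get_csv_dataset_sketch(filepath, val_size=0.1, seed=42):
    """Illustrative stand-in for get_csv_dataset().

    Assumes a hypothetical 'split' column that marks each row as
    'train' or 'test'.
    """
    with open(filepath, newline="") as f:
        rows = list(csv.DictReader(f))
    train = [r for r in rows if r["split"] == "train"]
    test = [r for r in rows if r["split"] == "test"]
    if val_size is None:
        # Mirror the documented behavior: no validation set requested.
        return {"train": train, "test": test}
    # Carve the validation set out of the existing train split,
    # reproducibly via the seed.
    random.Random(seed).shuffle(train)
    n_val = int(len(train) * val_size)
    return {"train": train[n_val:], "val": train[:n_val], "test": test}
```

With val_size=0.1, twenty training rows yield an eighteen/two train/validation split; passing val_size=None returns only the train and test sets, matching the documented return value.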

class datasets.opthal_anonymized.OpthalAnonymizedDataset(diagnosis, df, images_dir, image_size=(256, 512), transform=None, convert_image_to=None, seed=None)[source]

Bases: Dataset

Dataset class for OpthalAnonymizedDataset.

Parameters:
  • diagnosis (Literal["precancerous", "fluid", "benign", "reference"]) – The diagnosis category for the dataset.

  • df (pd.DataFrame) – The DataFrame containing the dataset information.

  • images_dir (str | Path) – The directory path where the images are stored.

  • image_size (tuple[int, int], optional) – The desired size of the images. Defaults to (256, 512).

  • transform (nn.Module, optional) – The transformation to apply to the images. Defaults to None.

  • convert_image_to (Any, optional) – The function or transformation to convert the images to a specific format. Defaults to None.

  • seed (int, optional) – The random seed for shuffling the dataset. Defaults to None.
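OpthalAnonymizedDataset is a map-style dataset, so its core contract is the `__len__`/`__getitem__` protocol. The structural sketch below is hypothetical: it does not import torch, the record field names (`diagnosis`, `filename`) are assumptions, and real image decoding is elided.

```python
from pathlib import Path

class OpthalAnonymizedSketch:
    """Structural sketch of a map-style dataset.

    Hypothetical stand-in: torch.utils.data.Dataset is not imported;
    only the __len__/__getitem__ protocol it relies on is shown.
    """

    def __init__(self, diagnosis, records, images_dir, transform=None):
        # Keep only rows matching the requested diagnosis category.
        self.records = [r for r in records if r["diagnosis"] == diagnosis]
        self.images_dir = Path(images_dir)
        self.transform = transform

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        record = self.records[idx]
        path = self.images_dir / record["filename"]
        image = path  # a real implementation would open and decode the file
        if self.transform is not None:
            image = self.transform(image)
        return image
```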

datasets.prepare_dataset module

datasets.prepare_dataset.get_patients_paths(dataset_dir)[source]

Returns a list of paths to the patient directories in the given dataset directory.

Parameters:

dataset_dir (Path) – The directory containing the dataset.

Returns:

A list of Path objects representing the paths to the patient directories.

Return type:

list[Path]

datasets.prepare_dataset.get_images_paths(images_dir)[source]

Returns a list of paths to images in the given directory.

Parameters:

images_dir (Path) – The directory containing the images.

Returns:

A list of paths to images.

Return type:

Optional[list[Path]]
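Both path helpers above reduce to simple pathlib traversals. The sketch below is an assumption-based stand-in: the rule that each immediate subdirectory is one patient, the set of image extensions, and the choice to return None for an empty directory (suggested by the Optional return type) are all inferred, not confirmed by the source.

```python
from pathlib import Path

# Assumed set of image extensions; the real code may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

def get_patients_paths_sketch(dataset_dir):
    """Each immediate subdirectory is assumed to be one patient."""
    return sorted(p for p in Path(dataset_dir).iterdir() if p.is_dir())

def get_images_paths_sketch(images_dir):
    """Return the image files in a directory, or None if there are none."""
    paths = sorted(p for p in Path(images_dir).iterdir()
                   if p.suffix.lower() in IMAGE_SUFFIXES)
    return paths or None
```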

datasets.prepare_dataset.get_lesion_eyes_paths(dataset_dir)[source]

Returns a list of paths to all lesion eye images in the dataset directory.

Parameters:

dataset_dir (Path) – The directory path of the dataset.

Returns:

A list of paths to the lesion eye images.

Return type:

Optional[list[Path]]

datasets.prepare_dataset.get_reference_eyes_paths(dataset_dir)[source]

Returns a list of paths to all reference (second, healthy) eye images in the dataset directory.

Parameters:

dataset_dir (Path) – The directory path of the dataset.

Returns:

A list of paths to images of healthy eyes.

Return type:

Optional[list[Path]]

datasets.prepare_dataset.resize_images_and_save(images_paths, output_dir_path, size, max_images=None)[source]

Resizes the images at the given paths and saves them to the given output directory.

Parameters:
  • images_paths (list[Path]) – List of paths to the images.

  • output_dir_path (str) – Path to the output directory.

  • size (tuple[int, int]) – The desired size of the images after resizing.

  • max_images (int | None) – The maximum number of images to resize and save. If None, all images will be processed.

Return type:

None
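A sketch of the resize-and-save helper, assuming Pillow is the imaging backend (an assumption; the original may use a different library). The function signature mirrors the documented one; the behavior of overwriting with the same filename is also assumed.

```python
from pathlib import Path

from PIL import Image

def resize_images_and_save_sketch(images_paths, output_dir_path, size,
                                  max_images=None):
    """Resize images and save them under their original names.

    Hypothetical sketch assuming Pillow; size is (width, height).
    """
    output_dir = Path(output_dir_path)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Slicing with max_images=None processes the whole list.
    for path in images_paths[:max_images]:
        with Image.open(path) as image:
            image.resize(size).save(output_dir / path.name)
```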

datasets.prepare_dataset.copy_images_to_dir(images_paths, destination_dir)[source]

Copies the images at the given paths to the given destination directory.

Parameters:
  • images_paths (list[Path]) – List of paths to the images.

  • destination_dir (Path) – Path to the destination directory.

Returns:

None

Return type:

None
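The copy helper is a thin wrapper over the standard library. A minimal sketch, with the detail that the destination is created if missing being an assumption:

```python
import shutil
from pathlib import Path

def copy_images_to_dir_sketch(images_paths, destination_dir):
    """Copy each file into destination_dir, creating it if needed.

    shutil.copy2 also preserves file metadata such as timestamps.
    """
    destination_dir = Path(destination_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    for path in images_paths:
        shutil.copy2(path, destination_dir / path.name)
```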

datasets.prepare_opthal_anonym module

datasets.preprocess module

datasets.preprocess.crop_image(image, new_size, orientation='center')[source]

Crop image to new_size with specified orientation.

Parameters:
  • image (Image) – The image to be cropped.

  • new_size (tuple[int, int]) – The new size of the cropped image.

  • orientation (Literal["center", "left", "right"], optional) – The orientation of the cropped image. Valid values are “center”, “left”, or “right”.

Returns:

The cropped image.

Return type:

Image

Raises:

ValueError – If the specified orientation is invalid.
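The crop arithmetic behind the three orientations can be shown as a pure function that computes the (left, upper, right, lower) box a PIL-style crop would use. The box convention and the vertically centered crop are assumptions; only the horizontal placement follows directly from the documented orientations.

```python
def crop_box(image_size, new_size, orientation="center"):
    """Compute a (left, upper, right, lower) crop box.

    Hypothetical helper illustrating crop_image()'s orientations;
    the vertical crop is assumed to be centered.
    """
    width, height = image_size
    new_width, new_height = new_size
    upper = (height - new_height) // 2
    if orientation == "center":
        left = (width - new_width) // 2
    elif orientation == "left":
        left = 0
    elif orientation == "right":
        left = width - new_width
    else:
        # Mirrors the documented ValueError for invalid orientations.
        raise ValueError(f"invalid orientation: {orientation!r}")
    return (left, upper, left + new_width, upper + new_height)
```

For a 100x50 image cropped to 40x20, "center" yields (30, 15, 70, 35), "left" yields (0, 15, 40, 35), and "right" yields (60, 15, 100, 35).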

datasets.preprocess.load_dataset_csv_file(file_path)[source]

Load a dataset from a CSV file and add a column with the file path of each image.

Parameters:

file_path (str or Path) – The path to the CSV file.

Returns:

The loaded dataset with an additional “file_path” column.

Return type:

pd.DataFrame
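Adding a file-path column while loading a CSV can be sketched with the stdlib csv module instead of pandas. The rule for building the path (the CSV's own directory joined with a `filename` column) is an assumption about the real implementation.

```python
import csv
from pathlib import Path

def load_dataset_csv_sketch(file_path):
    """Read a CSV and add a 'file_path' column to each row.

    Hypothetical sketch: assumes a 'filename' column and resolves
    paths relative to the CSV's directory.
    """
    file_path = Path(file_path)
    with open(file_path, newline="") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["file_path"] = str(file_path.parent / row["filename"])
    return rows
```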

Module contents

This module provides datasets and data loading methods related to eye scans and ophthalmology.

class datasets.EyeScans(batch_size, real_word_data, ratio=[0.8, 0.1], seed=42, train_data_dir=None, val_data_dir=None, test_dataset_dir=None, num_workers=4, transforms=None)[source]

Bases: LightningDataModule

Parameters:
  • batch_size (int)

  • real_word_data (bool)

  • ratio (list[float])

  • seed (int)

  • train_data_dir (str | None)

  • val_data_dir (str | None)

  • test_dataset_dir (str | None)

  • num_workers (int)

  • transforms (Compose)

setup(stage=None)[source]

Set up the eye scans dataset for training, validation, and testing.

Parameters:

stage (str, optional) – The stage of the dataset setup. Can be “fit” for training, “test” for testing, or None for both. Defaults to None.

train_dataloader()[source]

An iterable or collection of iterables specifying training samples.

For more information about multiple dataloaders, see the Lightning documentation.

The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

Do not assign state in prepare_data().

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

val_dataloader()[source]

An iterable or collection of iterables specifying validation samples.

For more information about multiple dataloaders, see the Lightning documentation.

The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data().

  • fit()

  • validate()

  • prepare_data()

  • setup()

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

test_dataloader()[source]

An iterable or collection of iterables specifying test samples.

For more information about multiple dataloaders, see the Lightning documentation.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

Do not assign state in prepare_data().

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

If you don’t need a test dataset and a test_step(), you don’t need to implement this method.

class datasets.OpthalAnonymizedDataset(diagnosis, df, images_dir, image_size=(256, 512), transform=None, convert_image_to=None, seed=None)[source]

Bases: Dataset

Dataset class for OpthalAnonymizedDataset.

Parameters:
  • diagnosis (Literal["precancerous", "fluid", "benign", "reference"]) – The diagnosis category for the dataset.

  • df (pd.DataFrame) – The DataFrame containing the dataset information.

  • images_dir (str | Path) – The directory path where the images are stored.

  • image_size (tuple[int, int], optional) – The desired size of the images. Defaults to (256, 512).

  • transform (nn.Module, optional) – The transformation to apply to the images. Defaults to None.

  • convert_image_to (Any, optional) – The function or transformation to convert the images to a specific format. Defaults to None.

  • seed (int, optional) – The random seed for shuffling the dataset. Defaults to None.

datasets.get_csv_dataset(filepath, val_size=0.1, seed=42)[source]

Load a dataset from a CSV file and split it into train, validation, and test sets, starting from a previously split train/test dataset.

Parameters:
  • filepath (str or Path) – The path to the CSV file containing the dataset.

  • val_size (float or None) – The proportion of the dataset to be used for validation. Default is 0.1 (10%).

  • seed (int) – The random seed for reproducible train/validation split. Default is 42.

Returns:

A dictionary containing the train, validation, and test sets as pandas DataFrames. If val_size is None, only the train and test sets are returned.

Return type:

dict[str, pd.DataFrame]