datasets package
Submodules
datasets.eye_fundus module
datasets.eye_scans module
- class datasets.eye_scans.EyeScans(batch_size, real_word_data, ratio=[0.8, 0.1], seed=42, train_data_dir=None, val_data_dir=None, test_dataset_dir=None, num_workers=4, transforms=None)[source]
Bases: LightningDataModule
- setup(stage=None)[source]
Set up the eye scans dataset for training, validation, and testing.
- Parameters:
stage (str, optional) – The stage of the dataset setup. Can be “fit” for training, “test” for testing, or None for both. Defaults to None.
- train_dataloader()[source]
An iterable or collection of iterables specifying training samples.
For more information about multiple dataloaders, see the Lightning documentation.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
Warning
Do not assign state in prepare_data().
Related hooks, called in order: fit(), prepare_data(), setup().
Note
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- val_dataloader()[source]
An iterable or collection of iterables specifying validation samples.
For more information about multiple dataloaders, see the Lightning documentation.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
It’s recommended that all data downloads and preparation happen in prepare_data().
Related hooks, called in order: fit(), validate(), prepare_data(), setup().
Note
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Note
If you don’t need a validation dataset and a
validation_step(), you don’t need to implement this method.
- test_dataloader()[source]
An iterable or collection of iterables specifying test samples.
For more information about multiple dataloaders, see the Lightning documentation.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
Warning
Do not assign state in prepare_data().
Related hooks, called in order: test(), prepare_data(), setup().
Note
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Note
If you don’t need a test dataset and a
test_step(), you don’t need to implement this method.
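The ratio argument in the class signature appears to control how the real-world data is divided among the splits. A minimal sketch of one plausible interpretation, where ratio=[0.8, 0.1] means 80% train, 10% validation, and the remainder test (the helper name split_sizes and this reading of the argument are assumptions, not confirmed by the source):

```python
def split_sizes(n_samples: int, ratio=(0.8, 0.1)) -> tuple[int, int, int]:
    # Interpret ratio as (train fraction, validation fraction);
    # the test split receives whatever samples remain, so the
    # three sizes always sum to n_samples.
    n_train = int(n_samples * ratio[0])
    n_val = int(n_samples * ratio[1])
    n_test = n_samples - n_train - n_val
    return n_train, n_val, n_test
```

Assigning the remainder to the test split avoids rounding losses when the fractions do not divide the sample count evenly.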
datasets.opthal_anonymized module
- datasets.opthal_anonymized.get_csv_dataset(filepath, val_size=0.1, seed=42)[source]
Load a dataset from a CSV file and split it into train, validation, and test sets from a previously split train/test dataset.
- Returns:
A dictionary containing the train, validation, and test sets as pandas DataFrames. If val_size is None, only the train and test sets are returned.
- Return type:
dict[str, pd.DataFrame]
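A hedged sketch of the validation carve-out this function describes: shuffle the previously split train portion with the given seed, then slice off the val_size fraction. The helper name split_off_validation and the assumption that validation rows come out of the train portion are mine:

```python
import pandas as pd

def split_off_validation(train_df: pd.DataFrame, val_size: float = 0.1, seed: int = 42) -> dict:
    # Shuffle reproducibly with the seed, then carve the validation
    # fraction off the front of the shuffled frame.
    shuffled = train_df.sample(frac=1, random_state=seed).reset_index(drop=True)
    n_val = int(len(shuffled) * val_size)
    return {"train": shuffled.iloc[n_val:], "validation": shuffled.iloc[:n_val]}
```

Shuffling before slicing keeps the validation set representative even if the CSV rows are ordered by patient or diagnosis.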
- class datasets.opthal_anonymized.OpthalAnonymizedDataset(diagnosis, df, images_dir, image_size=(256, 512), transform=None, convert_image_to=None, seed=None)[source]
Bases: Dataset
Dataset class for OpthalAnonymizedDataset.
- Parameters:
diagnosis (Literal["precancerous", "fluid", "benign", "reference"]) – The diagnosis category for the dataset.
df (pd.DataFrame) – The DataFrame containing the dataset information.
images_dir (str | Path) – The directory path where the images are stored.
image_size (tuple[int, int], optional) – The desired size of the images. Defaults to (256, 512).
transform (nn.Module, optional) – The transformation to apply to the images. Defaults to None.
convert_image_to (Any, optional) – The function or transformation to convert the images to a specific format. Defaults to None.
seed (int, optional) – The random seed for shuffling the dataset. Defaults to None.
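The class above filters a DataFrame down to one diagnosis category and serves images from images_dir. A framework-free sketch of that map-style dataset protocol (torch.utils.data.Dataset only requires __len__ and __getitem__); the record fields "diagnosis" and "filename" are assumptions about the DataFrame schema, and plain dicts stand in for its rows:

```python
from pathlib import Path

class OpthalSketchDataset:
    # Sketch of the map-style dataset contract: __len__ plus __getitem__.
    def __init__(self, diagnosis: str, records: list[dict], images_dir):
        # Keep only rows matching the requested diagnosis category.
        self.rows = [r for r in records if r["diagnosis"] == diagnosis]
        self.images_dir = Path(images_dir)

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Path:
        # The real class would load and transform the image here;
        # the sketch just resolves its path.
        return self.images_dir / self.rows[idx]["filename"]
```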
datasets.prepare_dataset module
- datasets.prepare_dataset.get_patients_paths(dataset_dir)[source]
Returns a list of paths to patient directories in the given directory.
- Parameters:
dataset_dir (Path) – The directory containing the dataset.
- Returns:
A list of Path objects representing the paths to patient directories.
- Return type:
list[Path]
- datasets.prepare_dataset.get_images_paths(images_dir)[source]
Returns a list of paths to images in the given directory.
- Parameters:
images_dir (Path) – The directory containing the images.
- Returns:
A list of paths to images.
- Return type:
Optional[list[Path]]
- datasets.prepare_dataset.get_lesion_eyes_paths(dataset_dir)[source]
Returns a list of paths to all lesion eye images in the dataset directory.
- Parameters:
dataset_dir (Path) – The directory path of the dataset.
- Returns:
A list of paths to the lesion eye images.
- Return type:
Optional[list[Path]]
- datasets.prepare_dataset.get_reference_eyes_paths(dataset_dir)[source]
Returns a list of paths to all reference (second, healthy) eye images in the dataset directory.
- Parameters:
dataset_dir (Path) – The directory path of the dataset.
- Returns:
A list of paths to the reference eye images.
- Return type:
Optional[list[Path]]
datasets.prepare_opthal_anonym module
datasets.preprocess module
- datasets.preprocess.crop_image(image, new_size, orientation='center')[source]
Crop an image to new_size with the specified orientation.
- Parameters:
image – The image to crop.
new_size – The target size to crop to.
orientation (str, optional) – The crop anchor. Defaults to “center”.
- Returns:
The cropped image.
- Return type:
Image
- Raises:
ValueError – If the specified orientation is invalid.
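The crop arithmetic can be sketched as a pure function that computes the crop box. Only “center” and the ValueError for invalid orientations are confirmed by the docs above; the “top” and “bottom” anchors are assumptions about which orientations the function accepts:

```python
def crop_box(width: int, height: int, new_size: tuple[int, int],
             orientation: str = "center") -> tuple[int, int, int, int]:
    # Return a (left, upper, right, lower) box, as PIL's Image.crop expects.
    new_w, new_h = new_size
    left = (width - new_w) // 2  # horizontally centred for every anchor
    if orientation == "center":
        upper = (height - new_h) // 2
    elif orientation == "top":       # assumed anchor
        upper = 0
    elif orientation == "bottom":    # assumed anchor
        upper = height - new_h
    else:
        raise ValueError(f"Invalid orientation: {orientation!r}")
    return (left, upper, left + new_w, upper + new_h)
```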
- datasets.preprocess.load_dataset_csv_file(file_path)[source]
Load a dataset from a CSV file and add a column with the file path of each image.
- Parameters:
file_path (str or Path) – The path to the CSV file.
- Returns:
The loaded dataset with an additional “file_path” column.
- Return type:
pd.DataFrame
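A minimal sketch of the behaviour described above, reading the CSV and deriving a “file_path” column. The “filename” source column and the images-beside-the-CSV layout are assumptions about the dataset schema, not confirmed by the source:

```python
from pathlib import Path
import pandas as pd

def load_csv_with_paths(csv_path) -> pd.DataFrame:
    # Read the CSV, then resolve each image path relative to the CSV's
    # own directory and store it in a new "file_path" column.
    csv_path = Path(csv_path)
    df = pd.read_csv(csv_path)
    df["file_path"] = [str(csv_path.parent / name) for name in df["filename"]]
    return df
```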
Module contents
This module provides datasets and data loading methods related to eye scans and ophthalmology.
- class datasets.EyeScans(batch_size, real_word_data, ratio=[0.8, 0.1], seed=42, train_data_dir=None, val_data_dir=None, test_dataset_dir=None, num_workers=4, transforms=None)[source]
Bases: LightningDataModule
- setup(stage=None)[source]
Set up the eye scans dataset for training, validation, and testing.
- Parameters:
stage (str, optional) – The stage of the dataset setup. Can be “fit” for training, “test” for testing, or None for both. Defaults to None.
- train_dataloader()[source]
An iterable or collection of iterables specifying training samples.
For more information about multiple dataloaders, see the Lightning documentation.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
Warning
Do not assign state in prepare_data().
Related hooks, called in order: fit(), prepare_data(), setup().
Note
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- val_dataloader()[source]
An iterable or collection of iterables specifying validation samples.
For more information about multiple dataloaders, see the Lightning documentation.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
It’s recommended that all data downloads and preparation happen in prepare_data().
Related hooks, called in order: fit(), validate(), prepare_data(), setup().
Note
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Note
If you don’t need a validation dataset and a
validation_step(), you don’t need to implement this method.
- test_dataloader()[source]
An iterable or collection of iterables specifying test samples.
For more information about multiple dataloaders, see the Lightning documentation.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
Warning
Do not assign state in prepare_data().
Related hooks, called in order: test(), prepare_data(), setup().
Note
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Note
If you don’t need a test dataset and a
test_step(), you don’t need to implement this method.
- class datasets.OpthalAnonymizedDataset(diagnosis, df, images_dir, image_size=(256, 512), transform=None, convert_image_to=None, seed=None)[source]
Bases: Dataset
Dataset class for OpthalAnonymizedDataset.
- Parameters:
diagnosis (Literal["precancerous", "fluid", "benign", "reference"]) – The diagnosis category for the dataset.
df (pd.DataFrame) – The DataFrame containing the dataset information.
images_dir (str | Path) – The directory path where the images are stored.
image_size (tuple[int, int], optional) – The desired size of the images. Defaults to (256, 512).
transform (nn.Module, optional) – The transformation to apply to the images. Defaults to None.
convert_image_to (Any, optional) – The function or transformation to convert the images to a specific format. Defaults to None.
seed (int, optional) – The random seed for shuffling the dataset. Defaults to None.
- datasets.get_csv_dataset(filepath, val_size=0.1, seed=42)[source]
Load a dataset from a CSV file and split it into train, validation, and test sets from a previously split train/test dataset.
- Returns:
A dictionary containing the train, validation, and test sets as pandas DataFrames. If val_size is None, only the train and test sets are returned.
- Return type: