Usage Guide
Running Medicraft Project
Medicraft is designed to facilitate medical imaging analysis by defining pipelines in configuration files. You can run the project by specifying the necessary parameters and steps in a configuration file.
Note
This project requires a dataset that is private and contains sensitive data. To request access, contact the author. Permission is required before the data can be made available, and each request is considered on an individual basis.
To run the project, you need to create a config.yml file and use the following command:
python src/main.py -f config.yml
The configuration file is divided into several sections:
general: Contains general settings like image size and model parameters.
data: Specifies data paths, validation split, and seed for data splitting.
experiment: Defines the sequence of steps (loop) for training and validation.
output: Specifies the directory for saving results.
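As a minimal sketch of how these four sections could be checked before a run, the snippet below inspects an already-parsed configuration (for example, the dictionary returned by PyYAML's yaml.safe_load). The names REQUIRED_SECTIONS and missing_sections are hypothetical helpers introduced for illustration; they are not part of Medicraft.

```python
# Hypothetical helper, not part of Medicraft: checks that a parsed
# configuration contains the four top-level sections listed above.
REQUIRED_SECTIONS = ("general", "data", "experiment", "output")


def missing_sections(config: dict) -> list:
    """Return the required top-level sections that are absent from config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


# A parsed config with the "output" section left out on purpose.
config = {
    "general": {"image_size": [256, 512]},
    "data": {"split_seed": 42},
    "experiment": {"loop": []},
}

print(missing_sections(config))  # ['output']
```

Checking for missing sections up front gives a clear error message before any training starts, rather than a KeyError partway through the pipeline.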
Example Configuration Structure
Here is an example structure of the configuration file:
general:
  total_steps: 50
  image_size: [256, 512]
  models:
    unet:
      dim: 64
      dim_mults: [1, 2, 4, 8]
      channels: 1
    diffusion:
      timesteps: 1000
    classifier:
      architecture: resnet34
      pretrained: False
data:
  validation_split: 0.1 / 0.9
  csv_file_path: <dataset-csv-file-path>
  split_seed: 42
experiment:
  loop:
    - name: train_generator
      diagnosis: reference
      lr: 1e-4
      num_steps: 10
      batch_size: 32
      save_and_sample_every: 2000
    - name: generate_samples
      relative_dataset_results_dir: <relative-dataset-results-directory>
      num_samples: 1000
      batch_size: 8
      base_on: reference
      model_version: latest
      wandb: false
    - name: validate
      repeat: false
      classification:
        loss_fn: cross_entropy
        epochs: 15
        lr: 1e-4
        train_data_type: real
        train_dataset_dir: <train-dataset-directory>
        val_dataset_dir: <validation-dataset-directory>
        test_dataset_dir: <test-dataset-directory>
        logger_experiment_name: <experiment-name>
output:
  results_dir: .results
  copy_results_to: <path-to-external-storage>
The entire process consists of 50 total steps (total_steps: 50). Each pass through the loop runs 10 training steps (num_steps: 10), so the loop iterates 50 / 10 = 5 times to complete all steps. The sequence of actions in each iteration is as follows:
Train the Generator: In the initial phase of each iteration, the generator model is trained. This step adjusts the model parameters to improve the quality of generated samples.
Generate Samples: After training the generator, the next step is to produce new samples using the updated generator model. These samples are used to evaluate the performance and progression of the training.
Validate the Model: Finally, the model is validated to assess its performance on a validation dataset. Unlike the training and sample generation steps, the validation step is executed only once, because its repeat parameter is set to false.
By following this sequence, the process ensures that the generator is progressively improved, new samples are evaluated, and the model’s performance is validated in a systematic manner.
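The iteration arithmetic and the effect of repeat can be sketched as follows. This is an illustrative model under stated assumptions, not Medicraft's actual scheduler: plan_schedule is a hypothetical function, the iteration count is assumed to be total_steps divided by the training step's num_steps, and a step with repeat set to false is assumed to run in the first iteration only (the text above says only that it runs once, not in which iteration).

```python
def plan_schedule(total_steps: int, loop: list) -> list:
    """Expand a loop definition into the ordered list of step names to run."""
    # Assumption: iterations = total_steps // num_steps of the training step.
    num_steps = next(step["num_steps"] for step in loop if "num_steps" in step)
    iterations = total_steps // num_steps

    schedule = []
    for iteration in range(iterations):
        for step in loop:
            # Assumption: steps repeat by default; a step with repeat set to
            # false is run in the first iteration and skipped afterwards.
            if iteration > 0 and not step.get("repeat", True):
                continue
            schedule.append(step["name"])
    return schedule


# Loop definition mirroring the example configuration.
loop = [
    {"name": "train_generator", "num_steps": 10},
    {"name": "generate_samples"},
    {"name": "validate", "repeat": False},
]

schedule = plan_schedule(total_steps=50, loop=loop)
print(schedule.count("train_generator"))  # 5
print(schedule.count("validate"))         # 1
```

With the example values, training and sample generation each appear five times in the schedule, while validation appears once.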
For additional examples and further details, refer to the Config examples section.