Usage Guide

Running Medicraft Project

Medicraft is designed to facilitate medical imaging analysis by defining pipelines in configuration files. You can run the project by specifying the necessary parameters and steps in a configuration file.

Note

This project requires a dataset that is private and contains sensitive data. To obtain access, contact the author. Permission is required before the data can be shared, and requests are considered on an individual basis.

To run the project, you need to create a config.yml file and use the following command:

python src/main.py -f config.yml

The configuration file is divided into several sections:

general: Contains general settings like image size and model parameters.
data: Specifies data paths, validation split, and seed for data splitting.
experiment: Defines the sequence of steps (loop) for training and validation.
output: Specifies the directory for saving results.
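
As a quick sanity check before launching a run, the four sections listed above can be verified on an already parsed configuration. The snippet below is a minimal sketch, not part of Medicraft: the helper name missing_sections is hypothetical, it assumes the parsed config is a plain Python dict (for example, the result of a YAML loader), and it assumes all four sections are mandatory, which the project itself may not enforce.

```python
# Hypothetical helper, not part of Medicraft's code base.
# Assumption: all four top-level sections are required.
REQUIRED_SECTIONS = ("general", "data", "experiment", "output")


def missing_sections(config):
    """Return the required top-level sections that are absent from a parsed config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


# A parsed config is assumed to be a plain dict; this one lacks "experiment".
partial_config = {"general": {}, "data": {}, "output": {}}
print(missing_sections(partial_config))  # ['experiment']
```

An empty list means every expected section is present; the check says nothing about the keys inside each section.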

Example Configuration Structure

Here is an example structure of the configuration file:

general:
  total_steps: 50
  image_size: [256,512]
  models:
    unet:
      dim: 64
      dim_mults: [1, 2, 4, 8]
      channels: 1
    diffusion:
      timesteps: 1000
    classifier:
      architecture: resnet34
      pretrained: False

data:
  validation_split: 0.1 / 0.9
  csv_file_path: <dataset-csv-file-path>
  split_seed: 42

experiment:
  loop:
    - name: train_generator
      diagnosis: reference
      lr: 1e-4
      num_steps: 10
      batch_size: 32
      save_and_sample_every: 2000
    - name: generate_samples
      relative_dataset_results_dir: <relative-dataset-results-directory>
      num_samples: 1000
      batch_size: 8
      base_on: reference
      model_version: latest
      wandb: false
    - name: validate
      repeat: false
      classification:
        loss_fn: cross_entropy
        epochs: 15
        lr: 1e-4
      train_data_type: real
      train_dataset_dir: <train-dataset-directory>
      val_dataset_dir: <validation-dataset-directory>
      test_dataset_dir: <test-dataset-directory>
      logger_experiment_name: <experiment-name>

output:
  results_dir: .results
  copy_results_to: <path-to-external-storage>

In this example, total_steps is 50 and the train_generator step runs for num_steps: 10, so the loop iterates 50 / 10 = 5 times to complete all steps. The sequence of actions in each iteration is as follows:

Train the Generator: In the initial phase of each iteration, the generator model is trained. This step adjusts the model parameters to improve the quality of generated samples.
Generate Samples: After training the generator, the next step is to produce new samples using the updated generator model. These samples are used to evaluate the performance and progression of the training.
Validate the Model: Finally, the model is validated to assess its performance on a validation dataset. Unlike the training and sample generation steps, the validation step is executed only once, because its repeat parameter is set to false.

By following this sequence, the process ensures that the generator is progressively improved, new samples are evaluated, and the model’s performance is validated in a systematic manner.
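
The iteration arithmetic described above can be checked with a short sketch. This is an illustration under stated assumptions, not Medicraft's actual scheduler: the function count_step_runs is hypothetical, it assumes the number of iterations is total_steps divided by the num_steps of train_generator, and it assumes a step repeats in every iteration unless repeat is explicitly false. It only counts how often each step runs and makes no claim about the iteration in which a non-repeating step executes.

```python
# Hypothetical sketch, not Medicraft's scheduler.
def count_step_runs(total_steps, loop):
    """Return (iterations, {step name: number of runs}) for a loop definition."""
    train = next(step for step in loop if step["name"] == "train_generator")
    iterations = total_steps // train["num_steps"]
    runs = {}
    for step in loop:
        # Assumption: a step without an explicit "repeat" key runs every iteration.
        repeat = step.get("repeat", True)
        runs[step["name"]] = iterations if repeat else 1
    return iterations, runs


# Reduced version of the example loop, keeping only the keys that affect scheduling.
loop = [
    {"name": "train_generator", "num_steps": 10},
    {"name": "generate_samples"},
    {"name": "validate", "repeat": False},
]
iterations, runs = count_step_runs(50, loop)
print(iterations)  # 5
print(runs)        # {'train_generator': 5, 'generate_samples': 5, 'validate': 1}
```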

For additional examples and further details, refer to the Config examples section.