User Guide#
This guide assumes that you have installed WSInfer. If you have not, please see Installing and getting started.
It also assumes that you have a directory with at least one whole slide image. If you do not, you can download a sample image from https://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/.
The rest of this page assumes that slides are saved to the directory slides.
Citation#
If you find our work useful, please cite our paper:
Kaczmarzyk, J.R., O’Callaghan, A., Inglis, F. et al. Open and reusable deep learning for pathology with WSInfer and QuPath. npj Precis. Onc. 8, 9 (2024). https://doi.org/10.1038/s41698-024-00499-9
Video walkthrough#
In this video, Jakub demonstrates from beginning to end how to use WSInfer to detect tumor regions in a whole slide image. He shows how to install PyTorch and WSInfer, use the wsinfer command-line tool, and visualize results in QuPath.
Getting help#
If you read the documentation but still have questions, need help, have feedback, found a bug, or just want to chat, please submit a new issue on our GitHub repo!
Get help on the command line#
Most command-line tools on macOS and Linux provide help via the --help flag.
For example:
wsinfer --help
and
wsinfer run --help
These commands show the available subcommands, options, and expected inputs.
List available models#
WSInfer includes a Zoo of pretrained models. List them with the wsinfer-zoo command-line tool, which is installed automatically with WSInfer. Please note the difference in the names wsinfer-zoo and wsinfer.
wsinfer-zoo ls
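If the list is long, you can narrow it down with standard shell tools. A minimal sketch, assuming the command prints plain text with one model per line:
# Show only models whose names mention "tumor" (assumes plain-text output)
wsinfer-zoo ls | grep -i tumor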
Run model inference#
The model inference workflow will separate each slide into patches and run model inference on all patches. The results directory will include the model outputs, patch coordinates, and metadata about the run.
To list available --model options, use wsinfer-zoo ls.
Here is an example of the minimal command line for wsinfer run (with the arguments --wsi-dir, --results-dir, and --model):
wsinfer run \
--wsi-dir slides/ \
--results-dir results/ \
--model breast-tumor-resnet34.tcga-brca
See wsinfer run --help for a list of options and how to use them.
The option --wsi-dir is a directory containing only whole slide images. The option --results-dir is the path in which outputs are saved. The option --model is the name of a model available in WSInfer. The model weights and configuration are downloaded from HuggingFace Hub. If you would like to use your own model, see Use your own model.
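If you want to confirm where the weights landed, you can inspect the HuggingFace cache. A minimal sketch, assuming the default cache location described later in this guide (~/.cache/huggingface/):
# Downloaded models live in the HuggingFace Hub cache
ls ~/.cache/huggingface/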
Outputs of model inference#
The results directory will have several directories in it. We’ll go over them now.
results
├── masks
│ ├── TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.jpg
│ └── TCGA-3L-AA1B-01Z-00-DX1.8923A151-A690-40B7-9E5A-FCBEDFC2394F.jpg
├── model-outputs-csv
│ ├── TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.csv
│ └── TCGA-3L-AA1B-01Z-00-DX1.8923A151-A690-40B7-9E5A-FCBEDFC2394F.csv
├── model-outputs-geojson
│ ├── TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.json
│ └── TCGA-3L-AA1B-01Z-00-DX1.8923A151-A690-40B7-9E5A-FCBEDFC2394F.json
├── patches
│ ├── TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.h5
│ └── TCGA-3L-AA1B-01Z-00-DX1.8923A151-A690-40B7-9E5A-FCBEDFC2394F.h5
└── run_metadata_20231110T235210.json
This hierarchy is inspired by CLAM’s outputs. The masks directory contains JPEG thumbnails of the slides with contours of the tissue and holes. The directory model-outputs-csv contains one CSV per slide, and each CSV contains the patchwise model outputs; each row is a different patch.
Here are the first few rows of a sample CSV:
minx,miny,width,height,prob_Tumor
4200,27300,2100,2100,6.4415544e-05
4200,29400,2100,2100,9.763688e-05
4200,31500,2100,2100,0.03654445
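Because each row is one patch, quick summaries are possible with standard shell tools. A minimal sketch, assuming the column layout shown above (slide.csv is a placeholder for one of your CSV files):
# Count patches whose tumor probability exceeds 0.5 (column 5 is prob_Tumor)
awk -F, 'NR > 1 && $5 > 0.5' results/model-outputs-csv/slide.csv | wc -l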
The directory model-outputs-geojson contains the same information as the CSVs but in GeoJSON format. GeoJSON is well suited for spatial data, and QuPath can read it! Just drag and drop the GeoJSON file into the QuPath window, and all of the patches and their model outputs will appear. The directory patches contains HDF5 files of the patch coordinates. Last, there is a JSON file containing metadata about this run. Its filename includes a timestamp in case you run inference multiple times with the same results directory.
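To peek at these files without writing any code, command-line utilities work well. A minimal sketch, assuming jq and the HDF5 tools (h5ls) are installed; the filenames echo the example tree above:
# Pretty-print the run metadata
jq . results/run_metadata_20231110T235210.json
# List the datasets stored in a patch-coordinates file
h5ls -r results/patches/TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.h5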
Run model inference in containers#
See https://hub.docker.com/r/kaczmarj/wsinfer/tags for all available containers.
The “base” image kaczmarj/wsinfer includes wsinfer and all of its runtime dependencies. It does not, however, include the downloaded model weights. Running a model will automatically download the weights, but these weights will be removed once the container is stopped.
Note
The image kaczmarj/wsinfer does not include downloaded models. The models are downloaded automatically to ~/.cache but will be lost when the container is stopped if ~/.cache is not mounted.
Apptainer/Singularity#
We use apptainer in this example. You can replace that name with singularity if you do not have apptainer.
Pull the container:
apptainer pull docker://kaczmarj/wsinfer:latest
Run inference:
apptainer run \
--nv \
--bind $(pwd) \
--env CUDA_VISIBLE_DEVICES=0 \
wsinfer_latest.sif run \
--wsi-dir slides/ \
--results-dir results/ \
--model breast-tumor-resnet34.tcga-brca
Docker#
This requires Docker >=19.03 and the program nvidia-container-runtime-hook. Please see the Docker documentation for more information. If you do not have a GPU installed, you can run on the CPU by removing --gpus all from the command.
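Before running inference, it can help to confirm that Docker can see the GPU at all. A minimal sketch, assuming the NVIDIA container toolkit is set up (the CUDA image tag is an example; any recent tag works):
# If GPU access is configured correctly, this prints the usual nvidia-smi table
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi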
We use --user $(id -u):$(id -g) to run the container as a non-root user (as ourselves). This way, the output files are owned by us. Without this option, the output files would be owned by the root user.
When mounting data, keep in mind that the working directory in the Docker container is /work (one can override this with --workdir). Relative paths must be relative to the workdir.
You should mount your $HOME directory into the container. The registry of trained models (a JSON file) is downloaded to ~/.wsinfer-zoo-registry.json, and trained models are downloaded to ~/.cache/huggingface/.
Note
Mount $HOME into the container.
Note
Using --num_workers > 0 will require a --shm-size > 256mb. If the shm size is too low, a “bus error” will be thrown.
Pull the Docker image:
docker pull kaczmarj/wsinfer:latest
Run inference:
docker run --rm -it \
--user $(id -u):$(id -g) \
--mount type=bind,source=$HOME,target=$HOME \
--mount type=bind,source=$(pwd),target=/work/ \
--gpus all \
--env CUDA_VISIBLE_DEVICES=0 \
--env HOME=$HOME \
--shm-size 512m \
kaczmarj/wsinfer:latest run \
--wsi-dir /work/slides/ \
--results-dir /work/results/ \
--model breast-tumor-resnet34.tcga-brca
Use your own model#
WSInfer uses JSON configuration files to specify information required to run a patch classification model.
You can validate this configuration JSON file with:
wsinfer-zoo validate-config config.json
Once you create the configuration file, use the config with wsinfer run:
wsinfer run --wsi-dir slides/ --results-dir results/ --model-path path/to/torchscript.pt --config config.json
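For orientation, here is an illustrative sketch of writing such a configuration with a shell heredoc. The field names and values below are assumptions based on typical WSInfer Zoo configurations, not an authoritative schema; always check your file with wsinfer-zoo validate-config:
# The keys and values below are illustrative; validate against the real schema before use
cat > config.json <<'EOF'
{
  "spec_version": "1.0",
  "architecture": "resnet34",
  "num_classes": 2,
  "class_names": ["notumor", "tumor"],
  "patch_size_pixels": 350,
  "spacing_um_px": 0.25,
  "transform": [
    {"name": "Resize", "arguments": {"size": 224}},
    {"name": "ToTensor"},
    {"name": "Normalize",
     "arguments": {"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]}}
  ]
}
EOF
wsinfer-zoo validate-config config.json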
Convert model outputs to Stony Brook format (QuIP)#
The QuIP whole slide image viewer uses a particular format consisting of JSON and table files.
wsinfer tosbu \
--wsi-dir slides/ \
--execution-id UNIQUE_ID_HERE \
--study-id STUDY_ID_HERE \
--make-color-text \
--num-processes 16 \
results/ \
results/model-outputs-sbubmi/