wsinfer.patchlib

Submodules

Attributes

Functions

get_avg_mpp(→ float)

Return the average MPP of a whole slide image.

get_wsi_cls(...)

get_multipolygon_from_binary_arr(...)

Create a Shapely MultiPolygon from a binary array.

get_patch_coordinates_within_polygon(...)

Get coordinates of patches within a polygon.

segment_tissue(→ numpy.typing.NDArray[numpy.bool_])

Create a binary tissue mask from an image.

segment_and_patch_one_slide(...)

Get non-overlapping patch coordinates in tissue regions of a whole slide image.

save_hdf5(→ None)

Write patch coordinates to HDF5 file.

draw_contours_on_thumbnail(→ PIL.Image.Image)

Draw contours onto an image.

segment_and_patch_directory_of_slides(...)

Get non-overlapping patch coordinates in tissue regions for a directory of whole slide images.

Package Contents

wsinfer.patchlib.get_avg_mpp(slide_path: pathlib.Path | str) → float

Return the average MPP of a whole slide image.

The value is in units of micrometers per pixel and is the average of the X and Y dimensions.

Raises:

CannotReadSpacing if the spacing cannot be read.
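
The averaging described above can be sketched in a few lines. This is only an illustration: the helper `average_mpp` and the sample values are hypothetical, and the property names follow the `openslide.mpp-x` / `openslide.mpp-y` keys that OpenSlide exposes. How wsinfer actually reads the spacing may differ.

```python
def average_mpp(mpp_x: float, mpp_y: float) -> float:
    """Return the mean of the X and Y spacings, in micrometers per pixel."""
    if mpp_x <= 0 or mpp_y <= 0:
        raise ValueError("spacing must be positive")
    return (mpp_x + mpp_y) / 2


# Hypothetical metadata, shaped like the properties mapping OpenSlide exposes.
properties = {"openslide.mpp-x": "0.2500", "openslide.mpp-y": "0.2520"}
avg = average_mpp(
    float(properties["openslide.mpp-x"]),
    float(properties["openslide.mpp-y"]),
)
print(f"{avg:.4f} micrometers per pixel")
```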

wsinfer.patchlib.get_wsi_cls() → type[openslide.OpenSlide] | type[tiffslide.TiffSlide]

wsinfer.patchlib.get_multipolygon_from_binary_arr(arr: numpy.typing.NDArray[numpy.int_], scale: tuple[float, float] | None = None) → tuple[shapely.MultiPolygon, Sequence[numpy.typing.NDArray[numpy.int_]], numpy.typing.NDArray[numpy.int_]] | None

Create a Shapely MultiPolygon from a binary array.

Parameters:
  • arr (array) – Binary array where non-zero values indicate presence of tissue.

  • scale (tuple of two floats, optional) – If specified, this is the factor by which coordinates are multiplied to recover the coordinates at the base resolution of the whole slide image.

Returns:

  • polygon – A shapely MultiPolygon object representing tissue regions.

  • contours – A sequence of arrays representing unscaled contours of tissue.

  • hierarchy – An array of the hierarchy of contours.
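
To illustrate what the scale parameter does, the sketch below maps contour points found on a thumbnail back to base-resolution coordinates. The helper `scale_points` and the slide and thumbnail sizes are hypothetical, and the per-axis ratio of slide size to thumbnail size is one plausible way to derive the factor, not necessarily how wsinfer computes it.

```python
def scale_points(points, scale):
    """Multiply (x, y) points by a per-axis scale factor."""
    scale_x, scale_y = scale
    return [(x * scale_x, y * scale_y) for x, y in points]


# Hypothetical sizes: a 40000 x 20000 px slide and a 2000 x 1000 px thumbnail.
slide_size = (40_000, 20_000)
thumb_size = (2_000, 1_000)
scale = (slide_size[0] / thumb_size[0], slide_size[1] / thumb_size[1])

# A rectangular contour in thumbnail coordinates.
contour = [(10, 10), (110, 10), (110, 60), (10, 60)]
scaled = scale_points(contour, scale)
print(scaled)
```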

wsinfer.patchlib.get_patch_coordinates_within_polygon(slide_width: int, slide_height: int, patch_size: int, half_patch_size: int, polygon: shapely.Polygon, overlap: float = 0.0) → numpy.typing.NDArray[numpy.int_]

Get coordinates of patches within a polygon.

Parameters:
  • slide_width (int) – The width of the slide in pixels at base resolution.

  • slide_height (int) – The height of the slide in pixels at base resolution.

  • patch_size (int) – The size of a patch in pixels.

  • half_patch_size (int) – Half of the length of a patch in pixels.

  • polygon (Polygon) – A shapely Polygon representing the presence of tissue.

  • overlap (float) – The proportion of the patch_size to overlap. A value of 0.5 would have an overlap of 50%. A value of 0.2 would have an overlap of 20%. Negative values will add space between patches. A value of -1 would skip every other patch. Value must be in (-inf, 1). The default value of 0.0 produces non-overlapping patches.

Returns:

Array with shape (N, 2), where N is the number of tiles. Each row in this array contains the coordinates of the top-left of a tile: (minx, miny).

Return type:

coordinates
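
The effect of overlap on the patch grid can be sketched in plain Python. The example below is a rough illustration under stated assumptions, not the library's implementation: the step between patches is assumed to be patch_size * (1 - overlap), patches are kept when their centroid lies in the tissue region, edge handling is simplified, and a hypothetical `in_tissue` predicate stands in for the shapely polygon.

```python
def patch_grid(slide_width, slide_height, patch_size, overlap=0.0):
    """Return top-left (minx, miny) coordinates of a regular patch grid."""
    if overlap >= 1:
        raise ValueError("overlap must be in (-inf, 1)")
    # Assumption: the step between patch origins is patch_size * (1 - overlap).
    step = round(patch_size * (1 - overlap))
    if step < 1:
        raise ValueError("overlap is too close to 1 for this patch size")
    return [
        (x, y)
        for y in range(0, slide_height, step)
        for x in range(0, slide_width, step)
    ]


def keep_patches_in_region(coords, half_patch_size, contains):
    """Keep patches whose centroid satisfies the contains(x, y) predicate."""
    return [
        (x, y)
        for x, y in coords
        if contains(x + half_patch_size, y + half_patch_size)
    ]


# Hypothetical 1024 x 512 slide with tissue in the left 500 pixels.
def in_tissue(x, y):
    return 0 <= x < 500 and 0 <= y < 512


grid = patch_grid(1024, 512, patch_size=256)
kept = keep_patches_in_region(grid, half_patch_size=128, contains=in_tissue)
print(len(grid), kept)
```

With these assumptions, an overlap of 0.5 halves the step and an overlap of -1 doubles it, which matches the "skip every other patch" behaviour described for negative values.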

wsinfer.patchlib.segment_tissue(im_arr: numpy.typing.NDArray, median_filter_size: int = 7, binary_threshold: int = 7, closing_kernel_size: int = 6, min_object_size_px: int = 512, min_hole_size_px: int = 1024) → numpy.typing.NDArray[numpy.bool_]

Create a binary tissue mask from an image.

Parameters:
  • im_arr (array-like) – RGB image array (uint8) with shape (rows, cols, 3).

  • median_filter_size (int) – The kernel size for median filtering. Must be odd and greater than one.

  • binary_threshold (int) – The pixel threshold for image binarization.

  • closing_kernel_size (int) – The kernel size for morphological closing (in pixel units).

  • min_object_size_px (int) – The minimum area of an object in pixels. If an object is smaller than this area, it is removed and is made into background.

  • min_hole_size_px (int) – The minimum area of a hole in pixels. If a hole is smaller than this area, it is filled and is made into foreground.

Returns:

Boolean array, where True values indicate presence of tissue.

Return type:

mask
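
To give a feel for the binarization step, the sketch below thresholds the saturation of each pixel. This page does not say which channel segment_tissue thresholds, so the use of saturation is an assumption (it is a common choice for stained tissue on a bright background), and the helper `saturation_mask` is hypothetical. Median filtering, morphological closing, and the removal of small objects and holes are left out.

```python
import colorsys


def saturation_mask(rgb_rows, binary_threshold=7):
    """Return a mask that is True where saturation (0-255) exceeds the threshold."""
    mask = []
    for row in rgb_rows:
        mask_row = []
        for r, g, b in row:
            # colorsys works on floats in [0, 1]; rescale saturation to 0-255.
            _, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(round(s * 255) > binary_threshold)
        mask.append(mask_row)
    return mask


# A tiny 2 x 2 image: bright background on the left, pink tissue on the right.
image = [
    [(255, 255, 255), (200, 100, 150)],
    [(240, 240, 240), (180, 60, 120)],
]
mask = saturation_mask(image)
print(mask)
```

Raising binary_threshold in this sketch marks fewer pixels as tissue.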

wsinfer.patchlib.logger

wsinfer.patchlib.MASKS_DIR = 'masks'

wsinfer.patchlib.PATCHES_DIR = 'patches'

wsinfer.patchlib.segment_and_patch_one_slide(slide_path: str | pathlib.Path, save_dir: str | pathlib.Path, patch_size_px: int, patch_spacing_um_px: float, thumbsize: tuple[int, int] = (2048, 2048), median_filter_size: int = 7, binary_threshold: int = 7, closing_kernel_size: int = 6, min_object_size_um2: float = 200**2, min_hole_size_um2: float = 190**2, overlap: float = 0.0) → None

Get non-overlapping patch coordinates in tissue regions of a whole slide image.

Patch coordinates are saved to an HDF5 file in {save_dir}/patches/, and a tissue detection image is saved to {save_dir}/masks/ for quality control.
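
As a sketch of the resulting layout, the snippet below builds the two output paths for a slide from the MASKS_DIR and PATCHES_DIR values listed on this page. The file names are assumptions: naming outputs after the slide's stem and the .h5 and .jpg extensions are illustrative only, and `expected_outputs` is a hypothetical helper.

```python
from pathlib import Path

# Values of the module attributes documented on this page.
MASKS_DIR = "masks"
PATCHES_DIR = "patches"


def expected_outputs(slide_path, save_dir):
    """Return (patches_path, mask_path) for one slide.

    Assumption: outputs are named after the slide's stem, with .h5 and .jpg
    extensions. The real file names may differ.
    """
    stem = Path(slide_path).stem
    save_dir = Path(save_dir)
    return (
        save_dir / PATCHES_DIR / f"{stem}.h5",
        save_dir / MASKS_DIR / f"{stem}.jpg",
    )


patches_path, mask_path = expected_outputs("slides/case-001.svs", "results")
print(patches_path)
print(mask_path)
```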

In general, this function takes the following steps:

  1. Get a low-resolution thumbnail of the image.

  2. Binarize the image to identify tissue regions.

  3. Process this binary image to remove artifacts.

  4. Create a regular grid of non-overlapping patches of specified size.

  5. Keep patches whose centroids are in tissue regions.

Parameters:
  • slide_path (str or Path) – The path to the whole slide image file.

  • save_dir (str or Path) – The directory in which to save patching results.

  • patch_size_px (int) – The length of one side of a square patch in pixels.

  • patch_spacing_um_px (float) – The physical spacing of patches in micrometers per pixel. This value multiplied by patch_size_px gives the physical length of a patch in micrometers.

  • thumbsize (tuple of two integers) – The size of the thumbnail to use for tissue detection. This specifies the largest possible bounding box of the thumbnail, and a thumbnail is taken to fit this space while maintaining the original aspect ratio of the whole slide image. Larger thumbnails will take longer to process but will result in better tissue masks.

  • median_filter_size (int) – The size of the kernel for median filtering. This value must be odd and greater than one. This is in units of pixels in the thumbnail.

  • binary_threshold (int) – The value at which the image is binarized. A higher value will keep less tissue.

  • closing_kernel_size (int) – The size of the kernel for a morphological closing operation. This is in units of pixels in the thumbnail.

  • min_object_size_um2 (float) – The minimum area of an object to keep, in units of micrometers squared. Any disconnected objects smaller than this area will be removed.

  • min_hole_size_um2 (float) – The minimum size of a hole to keep, in units of micrometers squared. Any hole smaller than this area will be filled and be considered tissue.

  • overlap (float) – The proportion of the patch size by which adjacent patches overlap. The value must be in (-inf, 1). The default value of 0.0 produces non-overlapping patches.

Return type:

None
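
The unit relationships among these parameters can be worked through with a little arithmetic. The first helper follows the statement above that patch_size_px times patch_spacing_um_px gives the physical patch length. The other two conversions (to base-resolution pixels and from square micrometers to square pixels) are standard unit conversions assumed here for illustration. All helper names and sample spacings are hypothetical.

```python
def patch_length_um(patch_size_px, patch_spacing_um_px):
    """Physical side length of a patch, in micrometers."""
    return patch_size_px * patch_spacing_um_px


def patch_size_at_base(patch_size_px, patch_spacing_um_px, slide_mpp):
    """Side length in base-resolution pixels that covers the same physical area."""
    return round(patch_length_um(patch_size_px, patch_spacing_um_px) / slide_mpp)


def area_um2_to_px(area_um2, mpp):
    """Convert an area in square micrometers to square pixels at spacing mpp."""
    return area_um2 / mpp**2


# A 256 px patch at 0.5 um/px spans 128 um.
length = patch_length_um(256, 0.5)
# On a hypothetical slide scanned at 0.25 um/px, that is 512 base pixels.
base_px = patch_size_at_base(256, 0.5, slide_mpp=0.25)
# The default min_object_size_um2 (200**2) on a hypothetical 10 um/px thumbnail.
min_object_px = area_um2_to_px(200**2, mpp=10)
print(length, base_px, min_object_px)
```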

wsinfer.patchlib.save_hdf5(path: str | pathlib.Path, coords: numpy.typing.NDArray[numpy.int_], patch_size: int, patch_spacing_um_px: float, compression: str | None = 'gzip') → None

Write patch coordinates to HDF5 file.

This is designed to be interoperable with HDF5 files created by CLAM.

Parameters:
  • path (str or Path) – Path to save the HDF5 file.

  • coords (array) – Nx2 array of coordinates, where N is the number of patches. Each row of the array must be minx and miny to specify the top-left of the patch.

  • patch_size (int) – The size of patches in pixels at level 0 of the slide (base resolution).

  • patch_spacing_um_px (float) – The physical spacing of the patch in micrometers per pixel.

  • compression (str, optional) – Compression to use for storing coordinates. Default is “gzip”.

Return type:

None

wsinfer.patchlib.draw_contours_on_thumbnail(thumb: PIL.Image.Image, contours: Sequence[numpy.typing.NDArray[numpy.int_]], hierarchy: numpy.typing.NDArray[numpy.int_]) → PIL.Image.Image

Draw contours onto an image.

Parameters:
  • thumb (Image.Image) – The thumbnail of the whole slide of the same size as the binary image used during contour detection.

  • contours (sequence of arrays) – The contours result of cv.findContours.

  • hierarchy (array) – The hierarchy result of cv.findContours.

Returns:

An image with contours burned in.

Return type:

Image.Image

wsinfer.patchlib.segment_and_patch_directory_of_slides(wsi_dir: str | pathlib.Path, save_dir: str | pathlib.Path, patch_size_px: int, patch_spacing_um_px: float, thumbsize: tuple[int, int] = (2048, 2048), median_filter_size: int = 7, binary_threshold: int = 7, closing_kernel_size: int = 6, min_object_size_um2: float = 200**2, min_hole_size_um2: float = 190**2, overlap: float = 0.0) → None

Get non-overlapping patch coordinates in tissue regions for a directory of whole slide images.

Patch coordinates are saved to HDF5 files in {save_dir}/patches/, and tissue detection images are saved to {save_dir}/masks/ for quality control.

In general, this function takes the following steps for each whole slide image:

  1. Get a low-resolution thumbnail of the image.

  2. Binarize the image to identify tissue regions.

  3. Process this binary image to remove artifacts.

  4. Create a regular grid of non-overlapping patches of specified size.

  5. Keep patches whose centroids are in tissue regions.

Parameters:
  • wsi_dir (str or Path) – The directory of whole slide images. This directory must contain only whole slide images.

  • save_dir (str or Path) – The directory in which to save patching results.

  • patch_size_px (int) – The length of one side of a square patch in pixels.

  • patch_spacing_um_px (float) – The physical spacing of patches in micrometers per pixel. This value multiplied by patch_size_px gives the physical length of a patch in micrometers.

  • thumbsize (tuple of two integers) – The size of the thumbnail to use for tissue detection. This specifies the largest possible bounding box of the thumbnail, and a thumbnail is taken to fit this space while maintaining the original aspect ratio of the whole slide image. Larger thumbnails will take longer to process but will result in better tissue masks.

  • median_filter_size (int) – The size of the kernel for median filtering. This value must be odd and greater than one. This is in units of pixels in the thumbnail.

  • binary_threshold (int) – The value at which the image is binarized. A higher value will keep less tissue.

  • closing_kernel_size (int) – The size of the kernel for a morphological closing operation. This is in units of pixels in the thumbnail.

  • min_object_size_um2 (float) – The minimum area of an object to keep, in units of micrometers squared. Any disconnected objects smaller than this area will be removed.

  • min_hole_size_um2 (float) – The minimum size of a hole to keep, in units of micrometers squared. Any hole smaller than this area will be filled and be considered tissue.

Return type:

None
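
The per-slide loop can be pictured as below. This is a minimal sketch rather than the library's code: `process_directory` is a hypothetical helper that passes every file in the directory to a caller-supplied function, assuming no filtering by file extension. Under that assumption a stray non-slide file is picked up too, which is consistent with the requirement that wsi_dir contain only whole slide images.

```python
import tempfile
from pathlib import Path


def process_directory(wsi_dir, process_one):
    """Call process_one on every file in wsi_dir, in sorted order."""
    slide_paths = sorted(p for p in Path(wsi_dir).iterdir() if p.is_file())
    for slide_path in slide_paths:
        process_one(slide_path)
    return slide_paths


# Demonstration with empty placeholder files standing in for slides.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("slide_b.svs", "slide_a.svs", "notes.txt"):
        (Path(tmp) / name).touch()
    visited = []
    process_directory(tmp, lambda path: visited.append(path.name))

print(visited)
```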