
Reference

The following sections detail the configuration parameters for each step of the pipeline.

General Settings

This section defines the fundamental paths and naming conventions for the analysis run, including the source image directory and the output location.

general:
  image_dir: C:/path/to/images
  analysis_name: experiment_01
  analysis_root_dir: C:/analysis_out
  log_dir: null
| Key | Type | Description |
| --- | --- | --- |
| image_dir | Path | Source directory containing the TIFF images. In local mode, image_dir should point to a folder on the local system. In Globus mode, image_dir should be the remote directory on the Globus endpoint. |
| analysis_name | str | Name of the analysis run. |
| analysis_root_dir | Path | Base directory for analysis output. Results are saved in analysis_root_dir/analysis_name, referred to below as analysis_dir. |
| log_dir | Path | (optional) Custom log directory. Defaults to analysis_dir/logs. |
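
The way analysis_dir and the default log directory follow from the keys above can be sketched as below. This is only an illustration: `resolve_dirs` is a hypothetical helper, not part of the pipeline, and `PurePosixPath` is used so the forward-slash paths from the example print the same on every platform.

```python
from pathlib import PurePosixPath


def resolve_dirs(general: dict) -> dict:
    """Derive analysis_dir and log_dir from a parsed `general` section.

    Hypothetical helper for illustration; the pipeline's own code may differ.
    """
    root = PurePosixPath(general["analysis_root_dir"])
    analysis_dir = root / general["analysis_name"]

    # log_dir is optional: fall back to analysis_dir/logs when it is null.
    custom_log_dir = general.get("log_dir")
    log_dir = PurePosixPath(custom_log_dir) if custom_log_dir else analysis_dir / "logs"
    return {"analysis_dir": analysis_dir, "log_dir": log_dir}


dirs = resolve_dirs({
    "image_dir": "C:/path/to/images",
    "analysis_name": "experiment_01",
    "analysis_root_dir": "C:/analysis_out",
    "log_dir": None,
})
print(dirs["analysis_dir"])  # C:/analysis_out/experiment_01
print(dirs["log_dir"])       # C:/analysis_out/experiment_01/logs
```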

ROI Definition

This section configures the automatic detection of tissue cores. It specifies the image used for detection and the parameters for the Segment Anything Model (SAM2) to accurately identify core boundaries.

core_detection:
  detection_image: "BLCA-1_1.0.4_R000_DAPI__FINAL_F.ome.tif"
  core_info_file_path: null
  im_level: 6
| Key | Type | Description |
| --- | --- | --- |
| detection_image | str | Required. Name of the image in image_dir used for defining cores. |
| roi_info_file_path | Path | (optional) Custom path for core coordinates. Defaults to analysis_dir/rois.pkl. |
| im_level | float | (optional) Pyramid level to read from the image for core definition. If not specified, all pyramid levels are read from the input image. |

ROI Cutting

This section controls the extraction of individual ROIs from the original whole-slide images. It allows for precise channel selection, definition of output directories, and configuration of core processing parameters such as margins and masking.

core_cutting:
  cores_dir_tif: null
  cores_dir_output: null

  include_channels:
  exclude_channels:
    - 008_ECad
  use_markers:
  ignore_markers:
    - Antibody1
  margin: 0
  mask_value: 0
  transfer_cleanup_enabled: True
  temp_roi_delete: True
| Key | Type | Description |
| --- | --- | --- |
| cores_dir_tif | Path | (optional) Temporary folder to store extracted TIFFs for each core. Defaults to analysis_dir/temp. |
| cores_dir_output | Path | (optional) Final destination for SpatialData (Zarr) outputs. Defaults to analysis_dir/cores. |
| include_channels | list[str] | (optional) List of channel names to include. |
| exclude_channels | list[str] | (optional) List of channel names to exclude. |
| use_markers | list[str] | (optional) List of markers to which the analysis is restricted. |
| ignore_markers | list[str] | (optional) List of markers to ignore. |
| margin | int | (optional) Number of pixels to pad around each bounding box when cutting cores. Defaults to 0. |
| mask_value | int | (optional) Value used to fill the background for polygonal core masks. Defaults to 0. |
| transfer_cleanup_enabled | bool | (optional) Whether to delete temporary files downloaded via Globus after the run. Defaults to False. |
| temp_roi_delete | bool | (optional) Whether to delete TIFFs from the temporary storage in cores_dir_tif after core assembly. Defaults to False. For details see Input Data. |
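
To make the effect of margin concrete, here is a minimal sketch of padding a bounding box before cutting. The `(y0, x0, y1, x1)` box layout, the clipping to the image bounds, and the function name `pad_bbox` are all assumptions made for illustration; the pipeline's actual cutting code is not shown in this reference.

```python
def pad_bbox(bbox, margin, image_shape):
    """Expand a (y0, x0, y1, x1) box by `margin` pixels on every side.

    Hypothetical helper: the result is clipped so it never leaves the image.
    """
    y0, x0, y1, x1 = bbox
    height, width = image_shape
    return (
        max(y0 - margin, 0),
        max(x0 - margin, 0),
        min(y1 + margin, height),
        min(x1 + margin, width),
    )


# margin: 0 (the default) leaves the box unchanged.
print(pad_bbox((100, 200, 300, 400), margin=0, image_shape=(1000, 1000)))
# (100, 200, 300, 400)

# A box close to the image border is padded, then clipped.
print(pad_bbox((5, 200, 300, 995), margin=10, image_shape=(1000, 1000)))
# (0, 190, 310, 1000)
```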

The parameters include_channels, exclude_channels, use_markers and ignore_markers provide fine-grained control over which imaging channels are included in processing. See Channel Selection Logic for details.

Quality Control

This section manages quality control parameters, specifically defining prefixes for exclusion masks. These masks are used to filter out artifacts or unwanted regions from downstream analysis.

qc:
  prefix: qc_exclude
| Key | Type | Description |
| --- | --- | --- |
| prefix | str | (optional) Prefix for shapes used for quality control exclusion. Shapes named {prefix}_{marker} will be used to mask out objects for specific markers. |
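
The {prefix}_{marker} naming convention can be illustrated with a short sketch that picks out the QC shapes for each marker. The helper `markers_with_qc_shapes` and the shape names in the example are hypothetical; only the naming pattern itself comes from the table above.

```python
def markers_with_qc_shapes(shape_names, prefix="qc_exclude"):
    """Map marker name -> shape name for shapes named {prefix}_{marker}.

    Hypothetical helper; shapes that do not follow the pattern are skipped.
    """
    head = prefix + "_"
    return {
        name[len(head):]: name
        for name in shape_names
        if name.startswith(head) and len(name) > len(head)
    }


shapes = ["qc_exclude_DAPI", "qc_exclude_HLA1", "tissue_outline"]
qc_map = markers_with_qc_shapes(shapes)
print(qc_map)
# {'DAPI': 'qc_exclude_DAPI', 'HLA1': 'qc_exclude_HLA1'}
```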

Image Processing

additional_elements:

  - category: image_enhancer
    type: normalize
    input: [DAPI, HLA1]
    output: "${input}_norm"
    parameters:
      low: 1
      high: 99.8
    keep: false

  - category: object_segmenter
    type: instanseg
    parameters:
      model: fluorescence_nuclei_and_cells
      pixel_size: 0.3
      resolve_cell_and_nucleus: true
      cleanup_fragments: true
      clean_cache: true
      normalise: false
    input:
      - DAPI_norm
      - HLA1_norm
    output:
      - instanseg_nucleus
      - instanseg_cell
    keep: true

  - category: mask_builder
    type: ring
    input:
        - instanseg_nucleus
    output: ring
    parameters:
      outer: 8
      inner: 2
    keep: true

The pipeline supports flexible image processing steps, defined in the additional_elements list. Each entry in this list is a processing unit that takes inputs (images or labels), performs an operation, and produces outputs. Steps are executed sequentially in list order, so the order of the list determines the flow of data, and the output of one step is typically the input of a later one.
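
Because steps run in list order, every input must already exist when its step starts. The following sketch checks that property for a list of elements. It is an illustration under assumptions: `find_unavailable_inputs` is a hypothetical helper, inputs and outputs are given as already-expanded lists of names, and the source image is assumed to provide the channels DAPI and HLA1.

```python
def find_unavailable_inputs(elements, channels):
    """Return (step_index, input_name) pairs for inputs not yet produced.

    Hypothetical check: a name is available if it is a source channel or
    the output of an earlier element in the list.
    """
    available = set(channels)
    problems = []
    for index, element in enumerate(elements):
        for name in element["input"]:
            if name not in available:
                problems.append((index, name))
        available.update(element["output"])
    return problems


elements = [
    {"type": "normalize", "input": ["DAPI", "HLA1"],
     "output": ["DAPI_norm", "HLA1_norm"]},
    {"type": "instanseg", "input": ["DAPI_norm", "HLA1_norm"],
     "output": ["instanseg_nucleus", "instanseg_cell"]},
    {"type": "ring", "input": ["instanseg_nucleus"], "output": ["ring"]},
]

print(find_unavailable_inputs(elements, ["DAPI", "HLA1"]))
# []

# Reversing the list breaks the data flow.
print(find_unavailable_inputs(elements[::-1], ["DAPI", "HLA1"]))
# [(0, 'instanseg_nucleus'), (1, 'DAPI_norm'), (1, 'HLA1_norm')]
```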

Structure of an Element

| Key | Type | Description |
| --- | --- | --- |
| category | str | The category of the operation. Options: image_enhancer, object_segmenter, mask_builder. |
| type | str | The specific operation name (e.g., normalize, instanseg). |
| input | str or list | The name(s) of input images or channels. |
| output | str or list | The name(s) assigned to the results. Supports variable expansion like ${input}_norm. |
| parameters | dict | (optional) Specific parameters for the operation. |
| keep | bool | (optional) Whether to save the output to the final Zarr file (true) or keep it temporary (false). Defaults to false. |

For a complete list of available operations and their parameters, see Processors.
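
As a rough illustration of how an output pattern such as ${input}_norm maps inputs to output names, the sketch below uses Python's `string.Template`, which happens to share the `${name}` syntax. This is a stand-in chosen for the example; the pipeline's own expansion mechanism is not described here, and `expand_outputs` is a hypothetical helper.

```python
from string import Template


def expand_outputs(inputs, output_pattern):
    """Produce one output name per input by filling in ${input}.

    Hypothetical helper; accepts a single name or a list of names.
    """
    if isinstance(inputs, str):
        inputs = [inputs]
    template = Template(output_pattern)
    return [template.substitute(input=name) for name in inputs]


print(expand_outputs(["DAPI", "HLA1"], "${input}_norm"))
# ['DAPI_norm', 'HLA1_norm']

# A pattern without a placeholder is returned unchanged.
print(expand_outputs("instanseg_nucleus", "ring"))
# ['ring']
```

The expanded names match the inputs of the instanseg step in the example configuration (DAPI_norm and HLA1_norm).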

Quantification

The quant section defines one or more quantification tasks. Each entry in the list corresponds to a separate AnnData table that is generated and stored in the SpatialData object. This is useful if you have multiple segmentation results (e.g., cells and nuclei, or different cell segmentation models) and want to quantify them independently. The example below specifies a single AnnData table ('instanseg_table').

quant:
  - name: instanseg_table
    masks:
      nucleus: instanseg_nucleus
      cell: instanseg_cell
      ring: ring
      cyto: cytoplasm
    layer_connection: instanseg_cell
    morphological_properties:
      - label
      - centroid
      - area
    intensity_properties:
      - median
    markers_to_quantify:
      - DAPI
      - HLA1
    add_qc_masks: True
| Key | Type | Description |
| --- | --- | --- |
| name | str | Name of the output AnnData table to be saved in the SpatialData object. |
| masks | dict[str, str] | Dictionary mapping suffixes (e.g., nucleus, cell) to the actual mask layer names in the SpatialData object. |
| layer_connection | str | (optional) The mask layer name to which the table should be linked (e.g., for visualization in the napari-spatialdata plugin). |
| morphological_properties | list[str] | (optional) List of morphological features to calculate. 'label' is added automatically if absent from the custom list, since it identifies objects. Defaults to ["label", "centroid", "area", "eccentricity", "solidity", "perimeter", "euler_number"]. |
| intensity_properties | list[str] | (optional) List of intensity metrics to calculate. Defaults to ['mean', 'median']. |
| markers_to_quantify | list[str] | (optional) List of specific markers for which intensity properties are quantified. If omitted, all available channels are quantified. |
| qc_to_table | bool | (optional) If True, uses polygons defined in the QC step to create a mask layer in the AnnData table indicating which objects lie in the accepted regions. Defaults to False. |

For morphological_properties, any property supported by skimage.measure.regionprops can be used.

For intensity_properties, the currently implemented metrics are mean, median, min, max, std and sum. If you require other metrics, please open an issue on GitHub.
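
What the six intensity metrics compute for one object and one marker can be sketched with the standard library alone. This is not the pipeline's implementation: `quantify` and the pixel values are made up for illustration, and std is assumed here to be the population standard deviation.

```python
import statistics

# Assumed mapping from metric name to a reducing function.
METRICS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "std": statistics.pstdev,  # population std; an assumption
    "sum": sum,
}


def quantify(pixels, intensity_properties=("mean", "median")):
    """Reduce the pixel values inside one mask to the requested metrics."""
    return {name: METRICS[name](pixels) for name in intensity_properties}


# Hypothetical intensities of the pixels covered by a single cell mask.
pixels = [2, 4, 4, 4, 5, 5, 7, 9]
result = quantify(pixels, ["mean", "median", "min", "max", "std", "sum"])
print(result)
```

Calling `quantify(pixels)` without a second argument uses only mean and median, mirroring the documented default.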

Storage Settings

This section defines the storage parameters for the resulting SpatialData objects (Zarr files). It controls performance-related settings such as chunk sizes and the generation of multi-scale image pyramids.

sdata_storage:
  chunk_size: [1, 512, 512]
  max_pyramid_level: 3
  downscale: 2
| Key | Type | Description |
| --- | --- | --- |
| chunk_size | list[int] | (optional) Dimensions of the data chunks stored in the Zarr array. Defaults to [1, 512, 512]. |
| max_pyramid_level | int | (optional) Number of multiscale pyramid levels to generate. Defaults to 4. |
| downscale | int | (optional) Downsampling factor applied between consecutive pyramid levels. Defaults to 2. |
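
The interplay of these three settings can be sketched by computing the image shape and chunk count at each pyramid level. The sketch makes several assumptions: images are laid out as (channel, y, x), level 0 is the full-resolution image, levels run from 0 to max_pyramid_level inclusive, and sizes are rounded up. The helper names and the example image size are hypothetical.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def pyramid_shapes(base_shape, max_pyramid_level, downscale=2):
    """Shapes of a (c, y, x) image at levels 0..max_pyramid_level."""
    channels, height, width = base_shape
    shapes = []
    for level in range(max_pyramid_level + 1):
        factor = downscale ** level
        shapes.append((channels, ceil_div(height, factor), ceil_div(width, factor)))
    return shapes


def chunk_count(shape, chunk_size):
    """Number of chunks needed to tile `shape` with `chunk_size`."""
    count = 1
    for extent, chunk in zip(shape, chunk_size):
        count *= ceil_div(extent, chunk)
    return count


levels = pyramid_shapes((2, 4096, 3000), max_pyramid_level=3, downscale=2)
for level, shape in enumerate(levels):
    print(level, shape, chunk_count(shape, [1, 512, 512]))
# 0 (2, 4096, 3000) 96
# 1 (2, 2048, 1500) 24
# 2 (2, 1024, 750) 8
# 3 (2, 512, 375) 2
```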