Reference
The following sections detail the configuration parameters for each step of the pipeline.
General Settings
This section defines the fundamental paths and naming conventions for the analysis run, including the source image directory and the output location.
general:
  image_dir: C:/path/to/images
  analysis_name: experiment_01
  analysis_root_dir: C:/analysis_out
  log_dir: null
| Key | Type | Description |
|---|---|---|
| image_dir | Path | Source directory containing the TIFF images. In local mode, image_dir should point to a folder on the local system. In Globus mode, image_dir should point to the remote directory on the Globus endpoint. |
| analysis_name | str | Name of the analysis run. |
| analysis_root_dir | Path | Base directory for analysis output. Results are saved in analysis_root_dir/analysis_name, referred to below as analysis_dir. |
| log_dir | Path (optional) | Custom log directory. Defaults to analysis_dir/logs. |
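The way analysis_dir and the default log directory follow from these keys can be sketched in Python. This is only an illustration of the rules stated in the table, not the pipeline's own code; the function name resolve_dirs is hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_dirs(analysis_root_dir: str, analysis_name: str,
                 log_dir: Optional[str] = None) -> dict:
    """Derive analysis_dir and the effective log directory (hypothetical helper)."""
    # analysis_dir is analysis_root_dir/analysis_name, as described in the table.
    analysis_dir = Path(analysis_root_dir) / analysis_name
    # "log_dir: null" in YAML is loaded as None, so fall back to analysis_dir/logs.
    effective_log_dir = Path(log_dir) if log_dir is not None else analysis_dir / "logs"
    return {"analysis_dir": analysis_dir, "log_dir": effective_log_dir}


dirs = resolve_dirs("C:/analysis_out", "experiment_01")
print(dirs["analysis_dir"].as_posix())  # C:/analysis_out/experiment_01
print(dirs["log_dir"].as_posix())       # C:/analysis_out/experiment_01/logs
```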
ROI Definition
This section configures the automatic detection of tissue cores. It specifies the image used for detection and the parameters for the Segment Anything Model (SAM2) to accurately identify core boundaries.
core_detection:
  detection_image: "BLCA-1_1.0.4_R000_DAPI__FINAL_F.ome.tif"
  core_info_file_path: null
  im_level: 6
| Key | Type | Description |
|---|---|---|
| detection_image | str | Required. Name of the image in image_dir used for defining cores. |
| roi_info_file_path | Path (optional) | Custom path for core coordinates. Defaults to analysis_dir/rois.pkl. |
| im_level | float (optional) | Pyramid level to read from the image for core definition. If not specified, all pyramid levels are read from the input image. |
ROI Cutting
This section controls the extraction of individual ROIs from the original whole-slide images. It allows for precise channel selection, definition of output directories, and configuration of core processing parameters such as margins and masking.
core_cutting:
  cores_dir_tif: null
  cores_dir_output: null
  include_channels:
  exclude_channels:
    - 008_ECad
  use_markers:
  ignore_markers:
    - Antibody1
  margin: 0
  mask_value: 0
  transfer_cleanup_enabled: True
  temp_roi_delete: True
| Key | Type | Description |
|---|---|---|
| cores_dir_tif | Path (optional) | Temporary folder to store extracted TIFFs for each core. Defaults to analysis_dir/temp. |
| cores_dir_output | Path (optional) | Final destination for SpatialData (Zarr) outputs. Defaults to analysis_dir/cores. |
| include_channels | list[str] (optional) | List of channel names to include. |
| exclude_channels | list[str] (optional) | List of channel names to exclude. |
| use_markers | list[str] (optional) | List of markers to restrict the analysis to. |
| ignore_markers | list[str] (optional) | List of markers to ignore. |
| margin | int (optional) | Number of pixels to pad around each bounding box when cutting cores. Defaults to 0. |
| mask_value | int (optional) | Value used to fill the background for polygonal core masks. Defaults to 0. |
| transfer_cleanup_enabled | bool (optional) | Whether to delete temporary files downloaded via Globus after the run. Defaults to False. |
| temp_roi_delete | bool (optional) | Whether to delete TIFFs from the temporary storage in cores_dir_tif after core assembly. Defaults to False. For details see Input Data. |
The parameters include_channels, exclude_channels, use_markers and ignore_markers provide fine-grained control over which imaging channels are included in processing. See Channel Selection Logic for details.
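To make the margin and mask_value settings more concrete, the sketch below pads a bounding box and fills pixels outside a core mask. It is a simplified illustration under assumptions that the source does not confirm: the helper names pad_bbox and fill_outside are hypothetical, boxes are given as (row_min, col_min, row_max, col_max), and padding is clipped at the image border.

```python
def pad_bbox(bbox, margin, image_shape):
    """Expand a (row_min, col_min, row_max, col_max) box by `margin` pixels.

    Assumption: the padded box is clipped to the image bounds.
    """
    row_min, col_min, row_max, col_max = bbox
    height, width = image_shape
    return (
        max(row_min - margin, 0),
        max(col_min - margin, 0),
        min(row_max + margin, height),
        min(col_max + margin, width),
    )


def fill_outside(tile, inside, mask_value=0):
    """Keep pixels where `inside` is True and replace the rest with mask_value."""
    return [
        [pixel if keep else mask_value for pixel, keep in zip(row, keep_row)]
        for row, keep_row in zip(tile, inside)
    ]


padded = pad_bbox((10, 20, 110, 220), margin=5, image_shape=(112, 1000))
masked = fill_outside([[7, 8], [9, 10]], [[True, False], [False, True]], mask_value=0)
print(padded)  # (5, 15, 112, 225)
print(masked)  # [[7, 0], [0, 10]]
```

With margin: 0 the box is returned unchanged, which matches the documented default.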
Quality Control
This section manages quality control parameters, specifically defining prefixes for exclusion masks. These masks are used to filter out artifacts or unwanted regions from downstream analysis.
qc:
  prefix: qc_exclude
| Key | Type | Description |
|---|---|---|
| prefix | str (optional) | Prefix for shapes used for quality control exclusion. Shapes named {prefix}_{marker} will be used to mask out objects for specific markers. |
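The {prefix}_{marker} naming rule can be illustrated with a short lookup. This is a minimal sketch, assuming shape names are available as plain strings; the helper names are hypothetical and are not part of the pipeline's API.

```python
def qc_shape_name(prefix: str, marker: str) -> str:
    """Build the shape name expected for a marker, following {prefix}_{marker}."""
    return f"{prefix}_{marker}"


def markers_with_qc_shapes(shape_names, markers, prefix):
    """Map each marker to its exclusion shape, skipping markers without one."""
    available = set(shape_names)
    return {
        marker: qc_shape_name(prefix, marker)
        for marker in markers
        if qc_shape_name(prefix, marker) in available
    }


found = markers_with_qc_shapes(
    shape_names=["qc_exclude_DAPI", "cores"],
    markers=["DAPI", "HLA1"],
    prefix="qc_exclude",
)
print(found)  # {'DAPI': 'qc_exclude_DAPI'}
```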
Image Processing
additional_elements:
  - category: image_enhancer
    type: normalize
    input: [DAPI, HLA1]
    output: "${input}_norm"
    parameters:
      low: 1
      high: 99.8
    keep: false
  - category: object_segmenter
    type: instanseg
    parameters:
      model: fluorescence_nuclei_and_cells
      pixel_size: 0.3
      resolve_cell_and_nucleus: true
      cleanup_fragments: true
      clean_cache: true
      normalise: false
    input:
      - DAPI_norm
      - HLA1_norm
    output:
      - instanseg_nucleus
      - instanseg_cell
    keep: true
  - category: mask_builder
    type: ring
    input:
      - instanseg_nucleus
    output: ring
    parameters:
      outer: 8
      inner: 2
    keep: true
Image processing steps are defined as a list of processors under additional_elements. Each entry in this list is a processing unit that takes inputs (images or labels), performs an operation, and produces outputs. Steps are executed sequentially in list order, so a step can only use channels from the input image or outputs produced by earlier steps.
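Because steps run in list order, a configuration can be checked before a run to confirm that every input already exists when it is needed. The sketch below is an illustrative check, not the pipeline's own validation: check_step_order is a hypothetical helper, and the step dictionaries are simplified, with output names written out in full.

```python
def check_step_order(steps, available_channels):
    """Return (step_index, missing_name) pairs for inputs that are not yet available."""
    available = set(available_channels)
    problems = []
    for index, step in enumerate(steps):
        inputs, outputs = step["input"], step["output"]
        # Both keys accept either a single name or a list of names.
        inputs = [inputs] if isinstance(inputs, str) else list(inputs)
        outputs = [outputs] if isinstance(outputs, str) else list(outputs)
        problems.extend((index, name) for name in inputs if name not in available)
        # Outputs become available to every later step.
        available.update(outputs)
    return problems


steps = [
    {"type": "normalize", "input": ["DAPI", "HLA1"], "output": ["DAPI_norm", "HLA1_norm"]},
    {"type": "instanseg", "input": ["DAPI_norm", "HLA1_norm"],
     "output": ["instanseg_nucleus", "instanseg_cell"]},
    {"type": "ring", "input": ["instanseg_nucleus"], "output": "ring"},
]

print(check_step_order(steps, ["DAPI", "HLA1"]))        # []
print(check_step_order(steps[::-1], ["DAPI", "HLA1"]))  # ring and instanseg run too early
```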
Structure of an Element
| Key | Type | Description |
|---|---|---|
| category | str | The category of the operation. Options: image_enhancer, object_segmenter, mask_builder. |
| type | str | The specific operation name (e.g., normalize, instanseg). |
| input | str \| list | The name(s) of input images or channels. |
| output | str \| list | The name(s) assigned to the results. Supports variable expansion like ${input}_norm. |
| parameters | dict (optional) | Specific parameters for the operation. |
| keep | bool (optional) | Whether to save the output to the final Zarr file (true) or keep it temporary (false). Defaults to false. |
For a complete list of available operations and their parameters, see Processors.
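The ${input} expansion in output names can be mimicked with Python's string.Template, which uses the same placeholder syntax. This is a sketch under assumptions: the pipeline may implement expansion differently, expand_outputs is a hypothetical helper, and one output name is assumed per input name.

```python
from string import Template


def expand_outputs(inputs, output_pattern):
    """Expand a pattern such as "${input}_norm" once per input name."""
    names = [inputs] if isinstance(inputs, str) else list(inputs)
    template = Template(output_pattern)
    return [template.substitute(input=name) for name in names]


expanded = expand_outputs(["DAPI", "HLA1"], "${input}_norm")
print(expanded)  # ['DAPI_norm', 'HLA1_norm']
```

The expanded names match the inputs of the instanseg step in the example configuration (DAPI_norm and HLA1_norm).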
Quantification
The quant section allows you to define one or more quantification tasks.
Each entry in the list corresponds to a separate AnnData table that will be generated and stored in the SpatialData object.
This is useful if you have multiple segmentation results (e.g., cells and nuclei, or different cell segmentation models) and want to quantify them independently.
The example below defines a single AnnData table named instanseg_table.
quant:
  - name: instanseg_table
    masks:
      nucleus: instanseg_nucleus
      cell: instanseg_cell
      ring: ring
      cyto: cytoplasm
    layer_connection: instanseg_cell
    morphological_properties:
      - label
      - centroid
      - area
    intensity_properties:
      - median
    markers_to_quantify:
      - DAPI
      - HLA1
    add_qc_masks: True
| Key | Type | Description |
|---|---|---|
| name | str | Name of the output AnnData table to be saved in the SpatialData object. |
| masks | dict[str, str] | Dictionary mapping suffixes (e.g., nucleus, cell) to the actual mask layer names in the SpatialData object. |
| layer_connection | str (optional) | The mask layer name to which the table should be linked (e.g., for visualization in the napari-spatialdata plugin). |
| morphological_properties | list[str] (optional) | List of morphological features to calculate. label is added automatically if it is absent from a custom list, because it is needed to identify objects. Defaults to ['label', 'centroid', 'area', 'eccentricity', 'solidity', 'perimeter', 'euler_number']. |
| intensity_properties | list[str] (optional) | List of intensity metrics to calculate. Defaults to ['mean', 'median']. |
| markers_to_quantify | list[str] (optional) | List of specific markers for which intensity properties are quantified. If omitted, all available channels are quantified. |
| qc_to_table | bool (optional) | If True, uses polygons defined in the QC step to create a mask layer in the AnnData table indicating which objects are from the accepted regions. Defaults to False. |
For morphological_properties, any property supported by skimage.measure.regionprops can be used.
For intensity_properties, the currently implemented metrics are mean, median, min, max, std and sum. If you require other metrics, please open an issue on GitHub.
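What the intensity metrics mean per object can be shown with a small standard-library example. This is only a sketch of the semantics, not the pipeline's implementation: quantify and METRICS are hypothetical names, label 0 is assumed to be background, and std is assumed to be the population standard deviation.

```python
import statistics
from collections import defaultdict

# Metric names from the documentation mapped to standard-library functions.
METRICS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "std": statistics.pstdev,  # assumption: population standard deviation
    "sum": sum,
}


def quantify(labels, intensities, metrics=("mean", "median")):
    """Compute the requested metrics for each labelled object."""
    per_object = defaultdict(list)
    for label, value in zip(labels, intensities):
        if label != 0:  # assumption: label 0 marks background pixels
            per_object[label].append(value)
    return {
        label: {name: METRICS[name](values) for name in metrics}
        for label, values in sorted(per_object.items())
    }


labels = [0, 1, 1, 1, 2, 2]          # flattened label mask
intensities = [5, 10, 20, 60, 4, 8]  # flattened marker channel
table = quantify(labels, intensities)
print(table[1])  # {'mean': 30.0, 'median': 20}
```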
Storage Settings
This section defines the storage parameters for the resulting SpatialData objects (Zarr files). It controls performance-related settings such as chunk sizes and the generation of multi-scale image pyramids.
sdata_storage:
chunk_size: [1, 512, 512]
max_pyramid_level: 3
downscale: 2
| Key | Type | Description |
|---|---|---|
| chunk_size | list[int] (optional) | Dimensions of the data chunks stored in the Zarr array. Defaults to [1, 512, 512]. |
| max_pyramid_level | int (optional) | Number of multiscale pyramid levels to generate. Defaults to 4. |
| downscale | int (optional) | Downsampling factor applied between consecutive pyramid levels. Defaults to 2. |
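The interaction between chunk_size, max_pyramid_level and downscale can be estimated with simple arithmetic. The sketch below rests on assumptions the source does not confirm: axes are ordered (channel, y, x), level 0 is the full-resolution image and counts as one level, and sizes are rounded up. The function names are hypothetical.

```python
import math


def pyramid_shapes(shape, n_levels, downscale=2):
    """Shapes of n_levels pyramid levels; level 0 is the full-resolution image."""
    channels, height, width = shape
    return [
        (channels,
         math.ceil(height / downscale ** level),
         math.ceil(width / downscale ** level))
        for level in range(n_levels)
    ]


def chunk_grid(shape, chunk_size=(1, 512, 512)):
    """Number of chunks along each axis for a given array shape."""
    return tuple(math.ceil(dim / chunk) for dim, chunk in zip(shape, chunk_size))


shapes = pyramid_shapes((2, 4096, 3000), n_levels=3, downscale=2)
for level, shape in enumerate(shapes):
    print(level, shape, chunk_grid(shape))
# 0 (2, 4096, 3000) (2, 8, 6)
# 1 (2, 2048, 1500) (2, 4, 3)
# 2 (2, 1024, 750) (2, 2, 2)
```

Larger chunks mean fewer files per level but more data read per access, which is the trade-off that chunk_size controls.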