Extracts shape and spatial features (HIF features) from a slide mask.
This CLI extracts two sets of features. The first set is the 'whole slide features', where
the entire mask label is considered a single region from which features are extracted. These features
are useful for measuring quantities such as the total area of tissue x.
The second set is the 'regional features', where each label is split into connected regions
and features are extracted from each of these smaller regions.
These features are useful for measuring quantities such as the solidity of the ten largest
regions of tissue y. Pixel intensity values from the WSI are unused. To generate
connected regions, skimage produces its own label mask in which different values correspond
to different regions, which discards the tissue type information held in the original mask.
The original mask is therefore passed as the intensity image so that each region can be
associated with a tissue type.
Args:
slide_mask_urlpath (str): URL/path to slide mask (*.tif)
label_cols (List[str]): list of labels that correspond to those in slide_mask_urlpath
output_urlpath (str): output URL/path prefix
include_smaller_regions (bool): include the smaller regions (not just the largest)
storage_options (dict): storage options to pass to read functions
output_storage_options (dict): storage options to pass to write functions
local_config (str): local config YAML file
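The order of label_cols matters: in the source listing for this CLI, each label name is mapped to a mask value by its position, starting at 1. A minimal sketch of that mapping follows. The helper name `build_mask_values` and the example label names are hypothetical, and treating 0 as background is an assumption.

```python
from typing import Dict, List


def build_mask_values(label_cols: List[str]) -> Dict[str, int]:
    """Map each label name to its mask value: the first label is 1, the next 2, and so on."""
    return {name: index + 1 for index, name in enumerate(label_cols)}


# Hypothetical label names, listed in the same order used to build the mask.
label_cols = ["tumor", "stroma", "necrosis"]
mask_values = build_mask_values(label_cols)
print(mask_values)  # {'tumor': 1, 'stroma': 2, 'necrosis': 3}
```

If the labels are passed in a different order from the one used when the mask was created, features would be attributed to the wrong tissue type.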
Returns:
Type | Description
dict | path to the output shape features CSV ('shape_features') and the number of shapes for which features were generated ('num_shapes')
Source code in src/luna/pathology/cli/extract_shape_features.py
@timed
@save_metadata
def cli(
    slide_mask_urlpath: str = "???",
    label_cols: List[str] = "???",  # type: ignore
    output_urlpath: str = "???",  # type: ignore
    include_smaller_regions: bool = False,
    storage_options: dict = {},
    output_storage_options: dict = {},
    local_config: str = "",
):
    """Extracts shape and spatial features (HIF features) from a slide mask.

    This CLI extracts two sets of features. The first set is the 'whole slide features', where
    the entire mask label is considered a single region from which features are extracted. These features
    are useful for measuring quantities such as the total area of tissue x.

    The second set is the 'regional features', where each label is split into connected regions
    and features are extracted from each of these smaller regions.
    These features are useful for measuring quantities such as the solidity of the ten largest
    regions of tissue y. Pixel intensity values from the WSI are unused. To generate
    connected regions, skimage produces its own label mask in which different values correspond
    to different regions, which discards the tissue type information held in the original mask.
    The original mask is therefore passed as the intensity image so that each region can be
    associated with a tissue type.

    Args:
        slide_mask_urlpath (str): URL/path to slide mask (*.tif)
        label_cols (List[str]): list of labels that correspond to those in slide_mask_urlpath
        output_urlpath (str): output URL/path prefix
        include_smaller_regions (bool): include the smaller regions (not just the largest)
        storage_options (dict): storage options to pass to read functions
        output_storage_options (dict): storage options to pass to write functions
        local_config (str): local config YAML file

    Returns:
        dict: output CSV path and the number of shapes for which features were generated
    """
    config = get_config(vars())

    # `open` receives storage options, so it is presumably fsspec's open
    # (imported at module level) rather than the built-in.
    with open(config["slide_mask_urlpath"], "rb", **config["storage_options"]) as of:
        mask = tifffile.imread(of)

    # Map each label name to its mask value: 1, 2, ... in label_cols order.
    mask_values = {k: v + 1 for v, k in enumerate(config["label_cols"])}
    result_df = extract_shape_features(
        mask, mask_values, config["include_smaller_regions"]
    )

    # Resolve the output filesystem and write the features as CSV.
    fs, urlpath = fsspec.core.url_to_fs(
        config["output_urlpath"], **config["output_storage_options"]
    )
    output_fpath = Path(urlpath) / "shape_features.csv"
    with fs.open(output_fpath, "w") as of:
        result_df.to_csv(of)

    properties = {"shape_features": output_fpath, "num_shapes": len(result_df)}
    logger.info(properties)
    return properties
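The output step in the listing above (resolve a filesystem from the output URL, write shape_features.csv under it, return a summary dict) can be tried in isolation. The sketch below is a stand-in under stated assumptions, not the CLI itself: it writes to a temporary local directory, uses the standard csv module instead of a DataFrame, and the rows are made-up placeholder values.

```python
import csv
import tempfile
from pathlib import Path

import fsspec

# Assumption: a temporary local directory stands in for output_urlpath.
output_urlpath = tempfile.mkdtemp()

# url_to_fs returns the filesystem object and the path stripped of its protocol.
fs, urlpath = fsspec.core.url_to_fs(output_urlpath)
output_fpath = Path(urlpath) / "shape_features.csv"

# Placeholder rows standing in for the extracted feature table.
rows = [
    {"tissue": "tumor", "area": 6},
    {"tissue": "stroma", "area": 9},
]

# Open through the filesystem object so the same code works for remote URLs.
with fs.open(str(output_fpath), "w") as of:
    writer = csv.DictWriter(of, fieldnames=["tissue", "area"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

# Summary in the same shape as the dict returned by the CLI.
properties = {"shape_features": str(output_fpath), "num_shapes": len(rows)}
```

Because the write goes through `fs.open`, swapping the temporary directory for a remote URL should only require the matching storage options to be passed to `url_to_fs`.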