
extract_shape_features

Extracts shape and spatial features (HIF features) from a slide mask. This CLI extracts two sets of features. The first set are 'whole slide features', where all pixels carrying a given mask label are treated as a single region from which features are extracted. These features are useful for quantities such as the total area of tissue x.
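As an illustration of the whole-slide view, the sketch below pools every pixel of a label into one region, regardless of whether those pixels touch, and reports its area in pixels. It is a minimal numpy sketch, not the package's implementation; the helper name whole_slide_areas, the tissue names and the toy mask are hypothetical, and only pixel counts (not physical units) are computed.

```python
from typing import Dict

import numpy as np


def whole_slide_areas(mask: np.ndarray, mask_values: Dict[str, int]) -> Dict[str, int]:
    """Pixel area per label, with all pixels of a label pooled into one region."""
    # Connectivity is ignored: disjoint patches of the same tissue are counted together.
    return {name: int(np.count_nonzero(mask == value)) for name, value in mask_values.items()}


mask = np.array(
    [
        [0, 1, 1, 0],
        [0, 1, 0, 2],
        [2, 0, 0, 2],
    ]
)
print(whole_slide_areas(mask, {"tumor": 1, "stroma": 2}))
# {'tumor': 3, 'stroma': 3}
```

The stroma pixels form two separate patches here, yet they are reported as one area of 3 pixels, which is exactly what distinguishes whole-slide features from regional ones.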

The second set are 'regional features', where each label is split into connected regions and features are extracted from each of these smaller regions. These features are useful for quantities such as the solidity of the ten largest regions of tissue y. Pixel intensity values from the WSI are not used. To generate connected regions, skimage produces its own labeled mask in which each value corresponds to a different region, which discards the tissue type information in the original mask. The original mask is therefore passed as the intensity image, so that each region can still be associated with a tissue type.

Args:
    slide_mask_urlpath (str): URL/path to slide mask (*.tif)
    label_cols (List[str]): list of labels that correspond to those in slide_mask_urlpath; the label at position i (0-based) is expected as pixel value i + 1
    output_urlpath (str): output URL/path prefix
    include_smaller_regions (bool): include the smaller regions (not just the largest)
    storage_options (dict): storage options to pass to read functions
    output_storage_options (dict): storage options to pass to write functions
    local_config (str): local config YAML file
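The order of label_cols matters, because the CLI body maps the label at position i to pixel value i + 1 (the mask_values line in the source). The stand-alone restatement below makes that rule easy to check; the function name label_cols_to_mask_values and the tissue names are hypothetical.

```python
from typing import Dict, List


def label_cols_to_mask_values(label_cols: List[str]) -> Dict[str, int]:
    # Same rule as the CLI body: the label at position i in label_cols is
    # looked up as pixel value i + 1, so value 0 is never assigned to a label.
    return {name: index + 1 for index, name in enumerate(label_cols)}


print(label_cols_to_mask_values(["tumor", "stroma", "fat"]))
# {'tumor': 1, 'stroma': 2, 'fat': 3}
```

Passing the labels in a different order than the one used to paint the mask would silently attach features to the wrong tissue names.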

Returns:

    dict: path to the output CSV file (shape_features.csv under output_urlpath) and the number of shapes (rows) for which features were generated
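The returned dict has two keys: shape_features, the path of the written CSV, and num_shapes, the number of rows in the feature table. The write step can be sketched with pandas and fsspec as follows. write_shape_features is a hypothetical helper that mirrors the function body, and memory:// is used only so the example needs no real storage.

```python
from pathlib import PurePosixPath

import fsspec
import pandas as pd


def write_shape_features(result_df: pd.DataFrame, output_urlpath: str, output_storage_options=None):
    """Write result_df as shape_features.csv under output_urlpath and describe the output."""
    # url_to_fs picks the filesystem from the URL scheme (local, s3://, memory://, ...)
    # and returns the path with the protocol stripped.
    fs, urlpath = fsspec.core.url_to_fs(output_urlpath, **(output_storage_options or {}))
    # PurePosixPath keeps "/" separators for remote paths on every OS.
    output_fpath = str(PurePosixPath(urlpath) / "shape_features.csv")
    with fs.open(output_fpath, "w") as of:
        result_df.to_csv(of)
    return fs, {"shape_features": output_fpath, "num_shapes": len(result_df)}


result_df = pd.DataFrame({"label": ["tumor", "stroma"], "area": [4, 2]})
fs, properties = write_shape_features(result_df, "memory://demo/slide_1")
print(properties)
```

The source uses Path, which behaves the same on POSIX systems; PurePosixPath is a small deviation in this sketch to avoid backslashes in remote paths on Windows.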

Source code in src/luna/pathology/cli/extract_shape_features.py
@timed
@save_metadata
def cli(
    slide_mask_urlpath: str = "???",
    label_cols: List[str] = "???",  # type: ignore
    output_urlpath: str = "???",  # type: ignore
    include_smaller_regions: bool = False,
    storage_options: dict = {},
    output_storage_options: dict = {},
    local_config: str = "",
):
    """Extracts shape and spatial features (HIF features) from a slide mask.
    This CLI extracts two sets of features. The first set are 'whole slide features', where
    the entire mask label is considered as a single region and features are extracted. These features
    are useful for determining things like total area of x tissue.

    The second set of features are 'regional features', where each label is split up according to
    their connectivity and features are extracted from these smaller regions.
    These features are useful for determining things like solidity of the top ten largest
    regions of tissue y. Pixel intensity values from the WSI are unused. In order to generate
    connected regions, skimage generates a mask itself where different values correspond
    to different regions, which removes the tissue type information from the original mask.
    So, the original mask is passed as an intensity image to ensure that each region can be
    associated with a tissue type.

    Args:
        slide_mask_urlpath (str): URL/path to slide mask (*.tif)
        label_cols (List[str]): list of labels that correspond to those in slide_mask_urlpath
        output_urlpath (str): output URL/path prefix
        include_smaller_regions (bool): include the smaller regions (not just the largest)
        storage_options (dict): storage options to pass to read functions
        output_storage_options (dict): storage options to pass to write functions
        local_config (str): local config YAML file

    Returns:
        dict: path to the output shape_features.csv file and the number of shapes for which features were generated

    """
    config = get_config(vars())

    # fsspec.open (unlike the builtin open) accepts remote URLs and storage_options
    with fsspec.open(config["slide_mask_urlpath"], "rb", **config["storage_options"]) as of:
        mask = tifffile.imread(of)

    mask_values = {k: v + 1 for v, k in enumerate(config["label_cols"])}
    result_df = extract_shape_features(
        mask, mask_values, config["include_smaller_regions"]
    )

    fs, urlpath = fsspec.core.url_to_fs(
        config["output_urlpath"], **config["output_storage_options"]
    )

    output_fpath = Path(urlpath) / "shape_features.csv"
    with fs.open(output_fpath, "w") as of:
        result_df.to_csv(of)

    properties = {"shape_features": output_fpath, "num_shapes": len(result_df)}

    logger.info(properties)
    return properties