Tile Generation Tutorial (File Edition)
Welcome to the tile generation tutorial!
A whole slide image is too large to feed directly into deep learning model training, so a slide is usually divided into a set of small tiles, which are then used for training. For tile-based whole slide image analysis, generating tiles and labels is an important and laborious step. The LUNA tiling CLIs and tutorials help you generate tile labels and get your data ready for downstream analysis. In this notebook, we will see how to generate tiles and labels using the LUNA tiling CLIs. Here are the main steps we will review:
- Load slides
- Generate tiles, labels
- Collect tiles for model training
Throughout this notebook, we will use different method parameter files. Please refer to the example parameter files in the configs directory to follow these steps.
import os
HOME = os.environ['HOME']
LUNA_HOME = f"{HOME}/vmount"
PROJECT = "PRO-12-123"
SLIDE_ID = "01OV002-bd8cdc70-3d46-40ae-99c4-90ef77"
DATASET_DIR = f"{LUNA_HOME}/{PROJECT}/data/toy_data_set"
ANNOTATION_DIR = f"{DATASET_DIR}/table/ANNOTATIONS"
TILING_DIR = f"{LUNA_HOME}/{PROJECT}/tiling"
SLIDE = f"{DATASET_DIR}/{SLIDE_ID}.svs"
First, we'll walk through each CLI step manually, then run the same steps in parallel using the Luna CLI client.
We start by generating tiles of size 128 at 10x magnification from a slide image and saving them.
!generate_tiles {SLIDE} \
--tile_size 128 \
--requested_magnification 10 \
--output-urlpath {TILING_DIR}/test/tiles
saving to /home/pollardw/vmount/PRO-12-123/tiling/test/tiles/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.tiles.parquet
2023-08-03 02:29:40.526 | DEBUG | luna.common.utils:wrapper:146 - cli ran in 9.95s
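To make the tiling step more concrete, here is a minimal sketch of how a regular tile grid could be enumerated. It is not LUNA's implementation: the function name `tile_addresses`, the assumed native scan magnification of 20x, the choice to drop partial edge tiles, and the small slide dimensions are all illustrative assumptions. Only the `x{col}_y{row}_z{magnification}` address pattern mirrors the addresses printed by the later steps in this notebook.

```python
def tile_addresses(slide_width, slide_height, tile_size,
                   requested_magnification, native_magnification=20):
    """Enumerate tile addresses on a regular grid (illustrative only).

    native_magnification is an assumption about the scanner; a real tool
    would read it from the slide metadata.
    """
    # One tile pixel at the requested magnification covers `scale`
    # pixels of the full-resolution image along each axis.
    scale = native_magnification / requested_magnification
    full_res_tile = int(tile_size * scale)

    # Assumption: only tiles that fit entirely inside the slide are kept.
    n_cols = slide_width // full_res_tile
    n_rows = slide_height // full_res_tile

    return [
        f"x{col}_y{row}_z{requested_magnification}"
        for col in range(n_cols)
        for row in range(n_rows)
    ]


# Hypothetical 1024 x 768 slide, tiled with 128-pixel tiles at 10x.
addresses = tile_addresses(1024, 768, tile_size=128, requested_magnification=10)
print(len(addresses), addresses[0], addresses[-1])
```

With these assumed values each tile covers 256 x 256 full-resolution pixels, so the grid has 4 columns and 3 rows.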
!detect_tissue {SLIDE} \
{TILING_DIR}/test/tiles/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.tiles.parquet \
--requested_magnification 2 \
--filter_query "otsu_score > 0.1" \
--output-urlpath {TILING_DIR}/test/detect
2023-08-03 02:29:44.980 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:282 - Slide dimensions (53760, 54840)
2023-08-03 02:29:44.980 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:286 - Thumbnail scale factor: 20
2023-08-03 02:29:47.116 | DEBUG | luna.common.utils:wrapper:146 - get_downscaled_thumbnail ran in 2.14s
2023-08-03 02:29:47.117 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:289 - Sample array size: (2742, 2688, 3)
2023-08-03 02:29:47.123 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:292 - Slide dimensions (53760, 54840)
2023-08-03 02:29:47.124 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:296 - Thumbnail scale factor: 20
2023-08-03 02:29:49.334 | DEBUG | luna.common.utils:wrapper:146 - get_downscaled_thumbnail ran in 2.21s
2023-08-03 02:29:49.334 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:299 - Sample array size: (2742, 2688, 3)
2023-08-03 02:29:50.509 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:307 - Enhancing image...
2023-08-03 02:29:51.597 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:319 - HSV space conversion...
2023-08-03 02:29:53.011 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:329 - Calculating max saturation...
2023-08-03 02:29:53.597 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:339 - Calculate and filter shadow mask...
2023-08-03 02:29:54.267 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:347 - Filter out shadow/dust/etc...
2023-08-03 02:29:55.782 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:359 - Calculating otsu threshold...
2023-08-03 02:29:56.107 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:362 - Calculating stain vectors...
2023-08-03 02:29:57.876 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:365 - Calculating stain background thresholds...
2023-08-03 02:29:57.876 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:366 - Channel 0
2023-08-03 02:30:00.109 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:372 - Channel 1
2023-08-03 02:30:02.053 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:381 - Saving otsu mask
2023-08-03 02:30:02.681 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:391 - Saving stain thumbnail
2023-08-03 02:30:05.850 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:398 - Saving stain masks
2023-08-03 02:30:06.770 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:421 - Starting otsu thresholding, threshold=0.7305114077818627
2023-08-03 02:31:22.478 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:468 - Filtering based on query: otsu_score > 0.1
2023-08-03 02:31:22.487 | INFO | luna.pathology.cli.run_tissue_detection:detect_tissue:471 - address ... otsu_score
01OV002-bd8cdc70-3d46-40ae-99c4-90ef77
292   x3_y81_z10   ...  0.12
295   x3_y84_z10   ...  0.16
296   x3_y85_z10   ...  0.32
297   x3_y86_z10   ...  0.64
298   x3_y87_z10   ...  0.43
...   ...          ...  ...
9816  x93_y65_z10  ...  0.32
9817  x93_y66_z10  ...  0.34
9818  x93_y67_z10  ...  0.58
9819  x93_y68_z10  ...  0.68
9820  x93_y69_z10  ...  0.33
[5088 rows x 8 columns]
saving to /home/pollardw/vmount/PRO-12-123/tiling/test/detect/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.tiles.parquet
2023-08-03 02:31:22.559 | DEBUG | luna.common.utils:wrapper:146 - cli ran in 97.81s
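The idea behind an Otsu-based tissue filter can be sketched in plain Python. This is a simplified illustration, not the `detect_tissue` implementation: it works on 8-bit grayscale values (the threshold logged above is on a different scale), it assumes tissue is darker than the glass background, and it assumes the score of a tile is the fraction of its pixels at or below the global threshold. The helper names and the tiny pixel lists are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance
    when pixels are split into (value <= t) and (value > t)."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for threshold in range(levels):
        weight_low += hist[threshold]
        sum_low += threshold * hist[threshold]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue  # one class is empty, so the split is not meaningful
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_threshold, best_variance = threshold, variance
    return best_threshold


def tissue_fraction(pixels, threshold):
    # Assumption: tissue is darker than background, so "tissue" means <= threshold.
    return sum(1 for value in pixels if value <= threshold) / len(pixels)


# Hypothetical grayscale pixels for three tiles.
tiles = {
    "x0_y0_z10": [250, 252, 248, 251],  # background only
    "x0_y1_z10": [90, 100, 249, 250],   # half tissue
    "x1_y0_z10": [80, 95, 85, 110],     # tissue only
}
all_pixels = [value for pixels in tiles.values() for value in pixels]
threshold = otsu_threshold(all_pixels)
scores = {address: tissue_fraction(pixels, threshold) for address, pixels in tiles.items()}

# Equivalent of --filter_query "otsu_score > 0.1"
kept = [address for address, score in scores.items() if score > 0.1]
print(threshold, scores, kept)
```

In this toy data the background-only tile scores 0.0 and is dropped, while the other two tiles pass the filter.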
!label_tiles \
"{DATASET_DIR}/table/ANNOTATIONS/slide_annotation_dataset_TCGA collection_ov_regional.parquet" \
"{TILING_DIR}/test/detect/{SLIDE_ID}.tiles.parquet" \
{SLIDE_ID} \
--output-urlpath "{TILING_DIR}/test/label"
2023-08-03 02:31:25.464 | INFO | luna.pathology.cli.generate_tile_labels:generate_tile_labels:88 - slide_id=01OV002-bd8cdc70-3d46-40ae-99c4-90ef77
/home/pollardw/vmount/PRO-12-123/data/toy_data_set/table/ANNOTATIONS/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.annotation.geojson TCGA collection ov_regional
100%|█████████████████████████████████████| 5088/5088 [00:00<00:00, 6020.64it/s]
2023-08-03 02:31:26.365 | INFO | luna.pathology.cli.generate_tile_labels:generate_tile_labels:157 - level_0 ... intersection_area
address ...
x26_y56_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...  0.053094
x26_y57_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...  0.341454
x27_y56_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...  0.655530
x27_y57_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...  0.898266
x28_y55_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...  0.258913
...          ...                                     ...  ...
x65_y72_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...  0.419049
x65_y73_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...  0.553953
x65_y74_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...  0.102413
x66_y70_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...  0.187448
x66_y71_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...  0.062544
[170 rows x 11 columns]
2023-08-03 02:31:26.395 | DEBUG | luna.common.utils:wrapper:146 - cli ran in 0.93s
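The `intersection_area` values above lie between 0 and 1, which suggests they are the fraction of each tile covered by an annotation. Under that assumption, the computation can be sketched in plain Python by clipping the annotation polygon to the tile rectangle (Sutherland-Hodgman clipping) and measuring the clipped area with the shoelace formula. The function names and the triangular annotation are hypothetical, and `label_tiles` may well compute this differently, for example with a geometry library.

```python
def clip_polygon_to_rect(polygon, xmin, ymin, xmax, ymax):
    """Clip a polygon (list of (x, y) vertices) to an axis-aligned rectangle
    with the Sutherland-Hodgman algorithm."""

    def clip_edge(points, inside, intersect):
        clipped = []
        for i, current in enumerate(points):
            previous = points[i - 1]
            if inside(current):
                if not inside(previous):
                    clipped.append(intersect(previous, current))
                clipped.append(current)
            elif inside(previous):
                clipped.append(intersect(previous, current))
        return clipped

    def cross_x(x):
        # Intersection of segment p-q with the vertical line at x.
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def cross_y(y):
        # Intersection of segment p-q with the horizontal line at y.
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    boundaries = [
        (lambda p: p[0] >= xmin, cross_x(xmin)),
        (lambda p: p[0] <= xmax, cross_x(xmax)),
        (lambda p: p[1] >= ymin, cross_y(ymin)),
        (lambda p: p[1] <= ymax, cross_y(ymax)),
    ]
    points = list(polygon)
    for inside, intersect in boundaries:
        if not points:
            break
        points = clip_edge(points, inside, intersect)
    return points


def polygon_area(points):
    """Shoelace formula; returns 0.0 for an empty or degenerate polygon."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(points):
        x0, y0 = points[i - 1]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def intersection_fraction(polygon, x, y, tile_size):
    """Fraction of the tile with top-left corner (x, y) covered by the polygon."""
    clipped = clip_polygon_to_rect(polygon, x, y, x + tile_size, y + tile_size)
    return polygon_area(clipped) / (tile_size * tile_size)


# Hypothetical triangular annotation and three 128-pixel tiles.
annotation = [(0, 0), (256, 0), (0, 256)]
inside_tile = intersection_fraction(annotation, 0, 0, 128)
border_tile = intersection_fraction(annotation, 128, 0, 128)
outside_tile = intersection_fraction(annotation, 128, 128, 128)
print(inside_tile, border_tile, outside_tile)
```

A tile fully inside the annotation scores 1.0, a tile cut by the annotation border scores a value in between, and a tile that only touches the annotation at a corner scores 0.0.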
!save_tiles {SLIDE} \
{TILING_DIR}/test/label/{SLIDE_ID}.regional_label.tiles.parquet \
--num_cores 4 \
--batch_size 200 \
--dataset-id PRO_TILES \
--output-urlpath {TILING_DIR}/test/saved_tiles
2023-08-03 02:31:30.548 | INFO | luna.pathology.cli.save_tiles:save_tiles:127 - Now generating tiles with batch_size=200!
] | 0% Completed | 0.4s
Traceback (most recent call last):
  File "/opt/conda/bin/save_tiles", line 8, in <module>
    sys.exit(fire_cli())
  File "/opt/conda/lib/python3.9/site-packages/luna/pathology/cli/save_tiles.py", line 176, in fire_cli
    fire.Fire(cli)
  File "/opt/conda/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/luna/common/utils.py", line 144, in wrapper
    result = func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/luna/common/utils.py", line 65, in wrapper
    result = func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/luna/pathology/cli/save_tiles.py", line 76, in cli
    df = save_tiles(
  File "/opt/conda/lib/python3.9/site-packages/luna/common/utils.py", line 126, in wrapper
    result = func(**new_args_dict)
  File "/opt/conda/lib/python3.9/site-packages/luna/pathology/cli/save_tiles.py", line 128, in save_tiles
    df = _save_tiles(df, slide_urlpath, output_urlpath, batch_size, storage_options, output_storage_options)
  File "/opt/conda/lib/python3.9/site-packages/luna/common/utils.py", line 126, in wrapper
    result = func(**new_args_dict)
  File "/opt/conda/lib/python3.9/site-packages/luna/pathology/cli/save_tiles.py", line 168, in _save_tiles
    for result in future.result():
  File "/opt/conda/lib/python3.9/site-packages/distributed/client.py", line 284, in result
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.9/site-packages/luna/pathology/cli/save_tiles.py", line 153, in f_many
    return [
  File "/opt/conda/lib/python3.9/site-packages/luna/pathology/cli/save_tiles.py", line 155, in <listcomp>
    x.address,
AttributeError: 'Tile' object has no attribute 'address'
from luna.common.utils import LunaCliClient
def pipeline(slide_id, input_slide, input_annotations):
    client = LunaCliClient("~/vmount/PRO-12-123/2_tiling-file", slide_id)

    client.bootstrap("slide", input_slide)
    client.bootstrap("annotations", input_annotations)

    # Generate 128-pixel tiles at 10x
    client.configure("generate_tiles", "slide",
        tile_size=128,
        requested_magnification=10
    ).run("source_tiles")

    # Keep only tiles whose otsu_score exceeds 0.1
    client.configure("detect_tissue", "slide", "source_tiles",
        filter_query="otsu_score > 0.1",
        requested_magnification=2
    ).run("detected_tiles")

    # Label the remaining tiles from the annotations
    client.configure("label_tiles", "annotations", "detected_tiles").run("labeled_tiles")

    # Save the labeled tiles into the PRO_TILES_LABELED dataset
    client.configure("save_tiles", "slide", "labeled_tiles",
        num_cores=4, batch_size=200, dataset_id='PRO_TILES_LABELED'
    ).run("saved_tiles")
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

df_slides = pd.read_parquet("../PRO-12-123/data/toy_data_set/table/SLIDES/slide_ingest_PRO-12-123.parquet")

with ThreadPoolExecutor(5) as pool:
    for index, row in df_slides.iterrows():
        print (index)

        # row.slide_image only works if df_slides has a "slide_image" column;
        # check df_slides.columns and use the column that holds the slide path.
        pool.submit(pipeline, index, row.slide_image, "../PRO-12-123/data/toy_data_set/table/ANNOTATIONS")
0
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[7], line 11
      8 for index, row in df_slides.iterrows():
      9     print (index)
---> 11     pool.submit(pipeline, index, row.slide_image, "../PRO-12-123/data/toy_data_set/table/ANNOTATIONS")

File /opt/conda/lib/python3.9/site-packages/pandas/core/generic.py:5902, in NDFrame.__getattr__(self, name)
   5895 if (
   5896     name not in self._internal_names_set
   5897     and name not in self._metadata
   5898     and name not in self._accessors
   5899     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5900 ):
   5901     return self[name]
-> 5902 return object.__getattribute__(self, name)

AttributeError: 'Series' object has no attribute 'slide_image'
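When running a per-slide function through `ThreadPoolExecutor`, keep in mind that an exception raised inside a submitted function is stored on the returned future and is not printed unless `result()` is called. A minimal sketch of one way to collect successes and failures per slide follows. The helper `run_all` and the stand-in `fake_pipeline` are hypothetical and only illustrate the pattern with the standard library; they are not part of LUNA.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(worker, jobs, max_workers=5):
    """Run worker(*args) for each (job_id, args) pair and report the outcome per job."""
    succeeded, failed = {}, {}
    with ThreadPoolExecutor(max_workers) as pool:
        futures = {pool.submit(worker, *args): job_id for job_id, args in jobs}
        for future in as_completed(futures):
            job_id = futures[future]
            try:
                # result() re-raises any exception raised inside the worker
                succeeded[job_id] = future.result()
            except Exception as exc:
                failed[job_id] = exc
    return succeeded, failed


def fake_pipeline(slide_id, slide_path):
    # Hypothetical stand-in for the per-slide pipeline function.
    if not slide_path.endswith(".svs"):
        raise ValueError(f"unsupported slide: {slide_path}")
    return f"{slide_id}: done"


jobs = [(0, (0, "a.svs")), (1, (1, "b.txt"))]
succeeded, failed = run_all(fake_pipeline, jobs)
print(succeeded)
print({job_id: str(exc) for job_id, exc in failed.items()})
```

Collecting the futures this way makes it visible which slides completed and which ones need to be rerun.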
import pandas as pd
df_tiles = pd.read_parquet("~/vmount/PRO-12-123/datasets/PRO_TILES_LABELED/").query("intersection_area > 0")
print (df_tiles['regional_label'].value_counts())
df_tiles
Congratulations! You now have 2120 tumor, 860 stroma, and 751 fat tile images with labels, ready to train your model.