Luna Common
adapters
FileWriteAdatper
Bases: WriteAdapter
Source code in src/luna/common/adapters.py
__init__(store_url, bucket)
Return a WriteAdapter for a given file I/O scheme and URL
Parameters:
Name | Type | Description | Default |
---|---|---|---|
store_url | str | root URL for the storage location (e.g. s3://localhost:9000 or file:///data) | required |
bucket | str | the "bucket" or "parent folder" for the storage location | required |
Source code in src/luna/common/adapters.py
write(input_data, prefix)
Perform a write operation to a POSIX file system.
The write is skipped if the content length matches (full copy) and the input modification time is earlier than the ingest time (with a 1 min. grace period).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_data | str | path to input file | required |
prefix | str | relative path prefix for destination | required |
Returns: dict: key-value pairs containing metadata about the write operation
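The skip condition described above can be sketched roughly as follows. This is an illustration only, not the adapter's code: the helper name `can_skip_write` is hypothetical, the destination file's modification time is assumed to stand in for the "ingest time", and the direction in which the grace period is applied is a guess.

```python
import os
import shutil
import tempfile

GRACE_SECONDS = 60  # the "1 min. grace period" mentioned in the description


def can_skip_write(input_path, dest_path, grace_seconds=GRACE_SECONDS):
    """Return True when an existing copy at dest_path looks complete and current."""
    if not os.path.exists(dest_path):
        return False
    src, dst = os.stat(input_path), os.stat(dest_path)
    same_length = src.st_size == dst.st_size
    # Assumption: the destination's mtime plays the role of the ingest time.
    input_is_older = src.st_mtime <= dst.st_mtime + grace_seconds
    return same_length and input_is_older


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.bin")
    dst = os.path.join(tmp, "copy.bin")
    with open(src, "wb") as handle:
        handle.write(b"abc")
    print(can_skip_write(src, dst))  # destination does not exist yet
    shutil.copy2(src, dst)
    print(can_skip_write(src, dst))  # same length, input not newer
```

Comparing the length first is a cheap way to detect partial copies before trusting the timestamps.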
Source code in src/luna/common/adapters.py
IOAdapter
Interface for IO
Exposes a write and read method via scheme specific classes:
IOAdapter.writer(
IOAdapter.reader(
Source code in src/luna/common/adapters.py
writer(store_url, bucket)
Return a WriteAdapter for a given file I/O scheme and URL
Parameters:
Name | Type | Description | Default |
---|---|---|---|
store_url | str | root URL for the storage location (e.g. s3://localhost:9000 or file:///data) | required |
bucket | str | the "bucket" or "parent folder" for the storage location | required |
Returns: WriteAdapter: object capable of writing to the location at store_url
Source code in src/luna/common/adapters.py
MinioWriteAdatper
Bases: WriteAdapter
Source code in src/luna/common/adapters.py
__init__(store_url, bucket, secure=False)
Return a WriteAdapter for a given file I/O scheme and URL
Parameters:
Name | Type | Description | Default |
---|---|---|---|
store_url | str | root URL for the storage location (e.g. s3://localhost:9000 or file:///data) | required |
bucket | str | the "bucket" or "parent folder" for the storage location | required |
Source code in src/luna/common/adapters.py
write(input_data, prefix)
Perform a write operation to an S3 file system.
The write is skipped if the content length matches (full copy) and the input modification time is earlier than the ingest time (with a 1 min. grace period).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_data | str | path to input file | required |
prefix | str | relative path prefix for destination | required |
Returns: dict: key-value pairs containing metadata about the write operation
Source code in src/luna/common/adapters.py
NoWriteAdapter
Bases: WriteAdapter
Source code in src/luna/common/adapters.py
write(input_data, prefix)
Returns input_data as written data
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_data | str | path to input file | required |
prefix | str | relative path prefix for destination, ignored | required |
Returns: dict: key-value pairs containing metadata about the write operation
Source code in src/luna/common/adapters.py
config
Created on October 17, 2019
@author: pashaa@mskcc.org
ConfigSet
This is a singleton class that can load a collection of configurations from yaml files.
ConfigSet loads configurations from yaml files only once on first invocation of this class with the specified yaml file. The class then maintains the configuration in memory in a singleton instance. All new invocations of this class will serve up the same configuration.
Each configuration in the collection is identified by a logical name.
If a new invocation of this class is created with an existing logical name and a different yaml file, the singleton instance replaces the existing configuration with the newly specified yaml file for the given logical name.
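The singleton behaviour described above can be illustrated with a minimal sketch. `MiniConfigSet` is a hypothetical stand-in, not the real class: it takes an already-parsed dict instead of a yaml file and skips schema validation entirely.

```python
class MiniConfigSet:
    """Hypothetical stand-in: one shared instance, configurations keyed by logical name."""

    _instance = None

    def __new__(cls, name=None, config=None):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._configs = {}
        if name is not None and config is not None:
            # Loading under an existing logical name replaces that configuration.
            cls._instance._configs[name] = config
        return cls._instance

    def get_names(self):
        return list(self._configs)

    def get_config_set(self, name):
        if name not in self._configs:
            raise ValueError(f"configuration {name!r} was never loaded")
        return self._configs[name]


first = MiniConfigSet("APP", {"port": 8080})
second = MiniConfigSet()  # no arguments: serves up the same instance
print(first is second)
print(second.get_config_set("APP"))
```

Every call returns the same object, so a configuration loaded once is visible wherever the class is invoked later.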
Source code in src/luna/common/config.py
__init__(name=None, config_file=None, schema_file=None)
:param name: logical name to be given for this configuration. This argument only needs to be provided on first invocation (optional).
:param config_file: the config file to load. This argument only needs to be provided on first invocation (optional).
:param schema_file: a schema file for the yaml configuration (optional)
:raises: yamale.yamale_error.YamaleError if config file is invalid when validated against the schema
Source code in src/luna/common/config.py
clear()
get_config_set(name)
:param name: logical name of the configuration
:return: a dictionary of top-level keys in the config stored in this instance.
:raises: ValueError if a configuration with the specified name was never loaded
Source code in src/luna/common/config.py
get_keys(name)
:param name: logical name of the configuration
:return: a list of top-level keys in the config stored in this instance.
:raises: ValueError if a configuration with the specified name was never loaded
Source code in src/luna/common/config.py
get_names()
:return: a list of logical names of the configs stored in this instance.
get_value(path)
Gets the value for the specified jsonpath from the specified configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | path to a value in a configuration. The path must be of the form "name::jsonpath". See config.yaml to generate a jsonpath. See https://pypi.org/project/jsonpath-ng/; jsonpath expressions may be tested here - https://jsonpath.com/ | required |
Returns:
Name | Type | Description |
---|---|---|
str | value from config file |
Raises:
Type | Description |
---|---|
ValueError | if no match is found for the specified exception or a configuration with |
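As a rough sketch of the "name::jsonpath" form, the path can be split into its two parts before any lookup happens. The helper `split_config_path` is hypothetical and is not part of ConfigSet; the real method goes on to evaluate the jsonpath, which is omitted here.

```python
def split_config_path(path):
    """Split 'name::jsonpath' into (logical name, jsonpath expression)."""
    name, separator, jsonpath = path.partition("::")
    if not separator or not name or not jsonpath:
        # Assumption: a malformed path is reported as a ValueError.
        raise ValueError(f"expected 'name::jsonpath', got {path!r}")
    return name, jsonpath


print(split_config_path("APP_CFG::$.database.host"))
```

Here `APP_CFG` and `$.database.host` are made-up values used only to show the shape of the input.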
Source code in src/luna/common/config.py
has_value(path)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | path to a value in a configuration. The path must be of the form "name::jsonpath" where name is the logical name of the configuration and jsonpath is the jsonpath to value. See config.yaml to generate a jsonpath. See https://pypi.org/project/jsonpath-ng/; jsonpath expressions may be tested here - https://jsonpath.com/ | required |
Returns:
Name | Type | Description |
---|---|---|
boolean | true if value is not an empty string, else false. |
Raises:
Type | Description |
---|---|
ValueError | if a configuration with the specified name was never loaded |
Source code in src/luna/common/config.py
connectors
DremioClientAuthMiddleware
Bases: ClientMiddleware
A ClientMiddleware that extracts the bearer token from the authorization header returned by the Dremio Flight Server Endpoint.
Parameters:
factory (ClientHeaderAuthMiddlewareFactory): The factory to set call credentials if an authorization header with bearer token is returned by the Dremio server.
Source code in src/luna/common/connectors.py
DremioClientAuthMiddlewareFactory
Bases: ClientMiddlewareFactory
A factory that creates DremioClientAuthMiddleware(s).
Source code in src/luna/common/connectors.py
DremioDataframeConnector
A connector that interfaces with a Dremio instance/cluster via Apache Arrow Flight for fast read performance.
Parameters:
scheme: connection scheme
hostname: host of main dremio name
flightport: which port dremio exposes to flight requests
dremio_user: username to use
dremio_password: associated password
connection_args: anything else to pass to the FlightClient initialization
Source code in src/luna/common/connectors.py
get_table(space, table_name)
Return the virtual table at project(or "space").table_name as a pandas dataframe.
Parameters:
space: Project ID/Space to read from
table_name: Table name to load
Source code in src/luna/common/connectors.py
run_query(sqlquery)
Run the given SQL query and return the result as a pandas dataframe.
Parameters:
sqlquery: SQL query to run
Source code in src/luna/common/connectors.py
dask
configure_dask_client(**kwargs)
Instantiate a Dask client according to the given configuration. This should only be called once in a given program. The client created here can always be retrieved (where needed) using get_or_create_dask_client().
Source code in src/luna/common/dask.py
dask_job(job_name)
The simpler version of a dask job decorator, which only provides the worker_client as a runner to the calling function
Examples:
Source code in src/luna/common/dask.py
prune_empty_delayed(tasks)
A less-than-ideal method to prune empty tasks from dask tasks. Here we're trading CPU and time for memory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tasks | list | list of delayed dask tasks | required |
Returns:
Type | Description |
---|---|
list[dask.delayed] | a reduced list of delayed dask tasks |
Source code in src/luna/common/dask.py
with_event_loop(func)
This method decorates functions run on dask workers with an async function call. Namely, this allows us to manage the execution of a function a bit better, and especially, to exit job execution if things take too long (1hr).
Here, the function func is run in a background thread, and has access to the dask scheduler through the 'runner'. Critically, submission to this runner/client looks the same regardless of whether it occurs in a sub-process/thread.
Mostly, this is a workaround to implement some form of timeout when running very long tasks on dask. While one cannot (or should not) kill the running thread, Dask will clean up the child tasks eventually once all jobs finish.
Examples:
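The following is a generic sketch of the timeout pattern described above, not the decorator's actual implementation. The name `run_with_timeout` is hypothetical, there is no dask client or 'runner' involved, and the timeouts are shortened from one hour to fractions of a second so the example finishes quickly.

```python
import asyncio
import functools
import time


def run_with_timeout(timeout_seconds):
    """Hypothetical decorator: run func in a background thread, stop waiting after a timeout."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            async def runner():
                loop = asyncio.get_running_loop()
                # The blocking function runs in the event loop's default thread pool.
                future = loop.run_in_executor(None, functools.partial(func, *args, **kwargs))
                # Raises asyncio.TimeoutError if the result is not ready in time.
                return await asyncio.wait_for(future, timeout=timeout_seconds)

            return asyncio.run(runner())

        return wrapper

    return decorator


@run_with_timeout(timeout_seconds=0.05)
def slow_job():
    time.sleep(0.3)
    return "done"


@run_with_timeout(timeout_seconds=1.0)
def quick_job(x):
    return x * 2


print(quick_job(21))
try:
    slow_job()
except asyncio.TimeoutError:
    print("slow_job timed out")
```

As in the description, the background thread is not killed when the timeout fires; the caller only stops waiting for its result, and the thread runs to completion on its own.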
Source code in src/luna/common/dask.py
stats
compute_stats_1d(vec, fx_name_prefix, n_percentiles=4)
Computes 1d (histogram)-like summary statistics
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vec | array | a 1-d vector input | required |
fx_name_prefix | str | Prefix for feature names | required |
n_percentiles | int | Number of percentiles to compute, default 4 = 0 (min), 25, 50, 75, 100 (max) | 4 |
Returns:
Name | Type | Description |
---|---|---|
dict | summary statistics |
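To make the n_percentiles convention concrete, here is a small pure-Python sketch of how 4 percentiles map to the levels 0, 25, 50, 75 and 100. It is not the library's implementation: the helper names, the linear interpolation rule, and the feature-key format are all assumptions made for illustration.

```python
def percentile_levels(n_percentiles=4):
    """Evenly spaced percentile levels from 0 (min) to 100 (max)."""
    return [100 * i / n_percentiles for i in range(n_percentiles + 1)]


def percentile(sorted_values, q):
    """Percentile q with linear interpolation between neighbouring values."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def percentile_features(vec, fx_name_prefix, n_percentiles=4):
    """Build a dict of percentile features; the key format is an assumption."""
    values = sorted(vec)
    return {
        f"{fx_name_prefix}pct{int(q)}": percentile(values, q)
        for q in percentile_levels(n_percentiles)
    }


print(percentile_levels(4))
print(percentile_features([5, 1, 3, 2, 4], "x_"))
```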
Source code in src/luna/common/stats.py
utils
LunaCliCall
Source code in src/luna/common/utils.py
run(step_name)
Run (execute) CLI Call given a 'step_name', add step to parent CLI Client once completed.
Args:
step_name (str): Name of the CLI call, determines output directory, can act as inputs to other CLI steps
Source code in src/luna/common/utils.py
LunaCliClient
Source code in src/luna/common/utils.py
__init__(base_dir, uuid)
Initialize Luna CLI Client with a base directory (the root working directory) and a UUID to track results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
base_dir | str | parent working directory | required |
uuid | str | some unique string for this instance | required |
Source code in src/luna/common/utils.py
bootstrap(step_name, data_path)
Add data (bootstrap a root CLI call).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
step_name | str | Name of the (bootstrap) CLI call, determines output directory, can act as inputs to other CLI steps | required |
data_path | str | Input data path | required |
Source code in src/luna/common/utils.py
configure(cli_resource, *args, **kwargs)
Configure a CLI step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cli_resource | str | CLI Resource string like | required |
args | list | List of CLI arguments | () |
kwargs | dict | Dictionary of CLI parameters | {} |
Returns: LunaCliCall
Source code in src/luna/common/utils.py
get_output_dir(step_name)
Get output_dir based on base_dir, uuid, and step name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
step_name | str | name of the CLI step | required |
Returns: output_dir (str)
Source code in src/luna/common/utils.py
apply_csv_filter(input_paths, subset_csv=None, storage_options={})
Filters a list of input_paths based on include/exclude logic given for either the full path, filename, or filestem.
If using "include" logic, only matching entries with include=True are kept. If using "exclude" logic, only matching entries with exclude=True are removed.
The original list is returned if the given subset_csv is None or empty.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_paths | list[str] | list of input paths to filter | required |
subset_csv | str | path to a csv with subset/filter information/flags | None |
Returns: list[str]: filtered list
Raises: RuntimeError: If the given subset_csv is invalid
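The include/exclude logic can be sketched as below. This is only an illustration under stated assumptions: the function `filter_paths` is hypothetical, it takes in-memory rows instead of reading a csv, and the row layout (a `match` value plus an `include` or `exclude` flag) is invented, since the actual csv columns are not documented here.

```python
from pathlib import Path


def filter_paths(input_paths, rows):
    """Filter paths using rows shaped like {'match': ..., 'include': bool} or {'match': ..., 'exclude': bool}."""
    if not rows:
        # Nothing to filter on: return the original list unchanged.
        return list(input_paths)

    def candidates(path):
        # A row may refer to the full path, the filename, or the file stem.
        p = Path(path)
        return {str(path), p.name, p.stem}

    if "include" in rows[0]:
        keep = {row["match"] for row in rows if row["include"]}
        return [p for p in input_paths if candidates(p) & keep]

    drop = {row["match"] for row in rows if row["exclude"]}
    return [p for p in input_paths if not (candidates(p) & drop)]


paths = ["/data/a.svs", "/data/b.svs", "/data/c.svs"]
print(filter_paths(paths, [{"match": "a", "include": True}]))
print(filter_paths(paths, [{"match": "c.svs", "exclude": True}]))
```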
Source code in src/luna/common/utils.py
generate_uuid(urlpath, prefix, storage_options={})
Returns hash of the file given path, preceded by the prefix.
:param urlpath: file path e.g. file:/path/to/file
:param prefix: list e.g. ["SVGEOJSON","default-label"]
:return: string uuid
Source code in src/luna/common/utils.py
generate_uuid_binary(content, prefix)
Returns hash of the binary, preceded by the prefix.
:param content: binary
:param prefix: list e.g. ["FEATURE"]
:return: string uuid
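A prefix-plus-hash identifier of this kind might be built as in the sketch below. The function name `uuid_from_bytes`, the choice of SHA-256, and the "-" separator are all assumptions; the actual hash function and output format used by the library are not stated in this page.

```python
import hashlib


def uuid_from_bytes(content, prefix):
    """Join the prefix parts and a content hash into one identifier string."""
    digest = hashlib.sha256(content).hexdigest()  # hash choice is an assumption
    return "-".join(list(prefix) + [digest])      # separator is an assumption


print(uuid_from_bytes(b"example bytes", ["FEATURE"]))
```

Because the identifier is derived from the content, the same bytes always map to the same string, which is what makes it usable for tracking results.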
Source code in src/luna/common/utils.py
generate_uuid_dict(json_str, prefix)
Returns hash of the json string, preceded by the prefix.
:param json_str: str representation of json
:param prefix: list e.g. ["SVGEOJSON","default-label"]
:return: v
Source code in src/luna/common/utils.py
get_absolute_path(module_path, relative_path)
Given the path to a module file and the path, relative to the module file, of another file that needs to be referenced in the module, this method returns the absolute path of the file that needs to be referenced.
This method makes it possible to resolve absolute paths to files in any environment a module and the referenced files are deployed to.
:param module_path: path to the module. Use '__file__' from the module.
:param relative_path: path to the file that needs to be referenced by the module. The path must be relative to the module.
:return: absolute path to file with the specified relative_path
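The resolution described above can be approximated with the standard library as follows. The function `absolute_path_sketch` is a hypothetical stand-in and may differ from the real helper in details such as path normalization.

```python
import os


def absolute_path_sketch(module_path, relative_path):
    """Resolve relative_path against the directory that contains module_path."""
    module_dir = os.path.dirname(os.path.abspath(module_path))
    return os.path.normpath(os.path.join(module_dir, relative_path))


# Inside a real module, module_path would be that module's own file path.
print(absolute_path_sketch("/opt/app/pkg/module.py", "../conf/config.yaml"))
```

Anchoring on the module's own location, rather than the current working directory, is what keeps the result stable across deployment environments.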
Source code in src/luna/common/utils.py
get_config(cli_kwargs)
Get the config with merged OmegaConf files
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cli_kwargs | dict | CLI keyword arguments | required |
Source code in src/luna/common/utils.py
get_dataset_url()
Retrieve a "dataset URL" from the environment, may look like http://localhost:6077 or file:///absolute/path/to/dataset/dir
Source code in src/luna/common/utils.py
grouper(iterable, n)
Turn an iterable into an iterable of iterables
'None' should not be a member of the input iterable as it is removed to handle the fillvalues
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iterable | iterable | an iterable | required |
n | int | size of chunks | required |
Returns:
Type | Description |
---|---|
iterable[iterable] |
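A common way to get this chunking behaviour is the itertools `zip_longest` recipe, sketched below. `grouper_sketch` is a hypothetical name and this may not match the library's code line for line, but it shows why 'None' cannot be a member of the input: the fill values used to pad the last chunk are stripped by removing every None.

```python
from itertools import zip_longest


def grouper_sketch(iterable, n):
    """Yield chunks of up to n items; the final chunk may be shorter."""
    # n references to the same iterator, so zip_longest pulls n items per chunk.
    args = [iter(iterable)] * n
    for chunk in zip_longest(*args, fillvalue=None):
        # Drop the None padding added to the last, incomplete chunk.
        yield [item for item in chunk if item is not None]


print(list(grouper_sketch(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```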
Source code in src/luna/common/utils.py
load_func(dotpath)
Load function in module from a parsed yaml string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dotpath | str | module/function name written as a string (e.g. torchvision.models.resnet34) | required |
Returns: The inferred module itself, not the string representation
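Resolving a dotted name to the object it refers to is typically done with importlib, as in this sketch. `load_by_dotpath` is a hypothetical name, and the example uses a standard-library function in place of torchvision so that it runs without extra dependencies.

```python
import importlib


def load_by_dotpath(dotpath):
    """Import the module part of 'package.module.attr' and return the attribute."""
    module_name, _, attr_name = dotpath.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


sqrt = load_by_dotpath("math.sqrt")
print(sqrt(9))  # 3.0
```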
Source code in src/luna/common/utils.py
local_cache_urlpath(file_key_write_mode={}, dir_key_write_mode={})
Decorator for caching url/paths locally
Source code in src/luna/common/utils.py
post_to_dataset(input_feature_data, waystation_url, dataset_id, keys)
Interface feature data to a parquet dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_feature_data | str | path to input data | required |
waystation_url | str | URL of dataset root (either file or using waystation) | required |
dataset_id | str | Dataset name/ID | required |
keys | dict | corresponding segment keys | required |
Source code in src/luna/common/utils.py
save_metadata(func)
This decorator saves metadata in output_url
Source code in src/luna/common/utils.py
timed(func)
This decorator prints the execution time for the decorated function.
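A timing decorator of this kind usually looks something like the sketch below. `timed_sketch` is a hypothetical name, and the message format is an assumption rather than the library's actual output.

```python
import functools
import time


def timed_sketch(func):
    """Print how long each call to func takes."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether func returns or raises, so failures are timed too.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} ran in {elapsed:.4f}s")

    return wrapper


@timed_sketch
def add(a, b):
    return a + b


print(add(2, 3))
```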
Source code in src/luna/common/utils.py
validate_dask_address(addr)
Return True if addr appears to be a valid address for a dask scheduler.
The typical format for this will be something like 'tcp://192.168.0.37:8786', but there could be a hostname instead of an IP address, and maybe some other URL schemes are supported. This function will be used to check whether a user-defined dask scheduler address is plausible, or obviously invalid.
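A plausibility check along these lines can be sketched with urllib.parse. `looks_like_scheduler_address` is a hypothetical name, and the rule used here (a scheme, a host, and a numeric port must all be present) is an assumption; the real function may accept more or fewer forms.

```python
from urllib.parse import urlparse


def looks_like_scheduler_address(addr):
    """Return True if addr has a scheme, a host, and a numeric port."""
    try:
        parsed = urlparse(addr)
        port = parsed.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    return bool(parsed.scheme) and bool(parsed.hostname) and port is not None


print(looks_like_scheduler_address("tcp://192.168.0.37:8786"))
print(looks_like_scheduler_address("tcp://scheduler.local:8786"))
print(looks_like_scheduler_address("not an address"))
```

The check is deliberately loose about the scheme and the host form, since the aim is only to reject obviously invalid input.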