Usage

Commands

mind

Welcome to MSK MIND!

mind [OPTIONS] COMMAND [ARGS]...

Options

--host <host>

[default: http://localhost:8080]
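When you drive the CLI from a script, place `--host` before COMMAND. It is an option of the top-level `mind` group, as the usage line `mind [OPTIONS] COMMAND [ARGS]...` shows. Below is a minimal Python sketch. The helper `mind_argv` and the host `http://mind.example:8080` are hypothetical and are not part of MIND.

```python
from typing import List, Optional


def mind_argv(command: str, *args: str, host: Optional[str] = None) -> List[str]:
    """Build an argument vector for the `mind` CLI (hypothetical helper).

    `--host` belongs to the top-level group, so it goes before the subcommand.
    Omitting it falls back to the documented default, http://localhost:8080.
    """
    argv = ["mind"]
    if host is not None:
        argv += ["--host", host]
    argv.append(command)
    argv.extend(args)
    return argv


# Passing a list (e.g. to subprocess.run) avoids shell-quoting problems with
# the quotes inside SQL or Atlas DSL strings. The host below is hypothetical.
print(mind_argv("list-tables", "clinical", host="http://mind.example:8080"))
# ['mind', '--host', 'http://mind.example:8080', 'list-tables', 'clinical']
```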

download

Download data.

QUERY - a SQL SELECT statement or an Atlas DSL query string.

Returns: a link to download the data bundle.

mind download [OPTIONS] QUERY

Arguments

QUERY

Required argument
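In the examples, the link returned by `download` ends in `.gz`, which suggests a gzip-compressed bundle. This is an assumption, because the bundle's format is not documented here. Below is a minimal Python sketch that fetches such a link with the standard library and decompresses it. `fetch_bundle` and `decompress_bundle` are hypothetical names.

```python
import gzip
import urllib.request


def decompress_bundle(data: bytes) -> bytes:
    """Decompress a bundle. Assumption: it is gzip (per the .gz suffix)."""
    return gzip.decompress(data)


def fetch_bundle(url: str, timeout: float = 30.0) -> bytes:
    """Download the bundle at the link printed by `mind download` (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return decompress_bundle(resp.read())
```

The helpers are split so the decompression step can be checked without network access to the sandbox.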

list-columns

Show the available columns for a given database and table.

DB - database name.

TABLE - name of a table in the database.

Returns: a list of column names, types, and comments.

mind list-columns [OPTIONS] DB TABLE

Arguments

DB

Required argument

TABLE

Required argument

list-databases

Show the available databases.

Returns: a list of available databases.

mind list-databases [OPTIONS]
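`list-databases` prints one database name per line, as the example under Examples shows. Below is a minimal Python sketch that collects those names from captured output. `parse_databases` is a hypothetical helper, and it assumes that one-name-per-line format.

```python
from typing import List


def parse_databases(stdout: str) -> List[str]:
    """Turn `mind list-databases` output (assumed one name per line) into a list."""
    return [line.strip() for line in stdout.splitlines() if line.strip()]


print(parse_databases("clinical\ngenomic\n"))  # ['clinical', 'genomic']
```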

list-tables

Show the available tables in a given database.

DB - database name.

Returns: a list of table names and comments.

mind list-tables [OPTIONS] DB

Arguments

DB

Required argument
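`list-databases` and `list-tables` can be chained to build a simple catalog of every database and its tables. Below is a minimal Python sketch. It assumes `list-tables` prints the pipe-delimited table shown under Examples, with the table name in the first column. `run_mind`, `first_column`, and `walk_catalog` are hypothetical. The runner is a parameter, so the walk can be tested with canned output instead of a live server.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# A runner takes an argument vector and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_mind(argv: Sequence[str]) -> str:
    """Run the CLI and return its stdout (requires `mind` on PATH)."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def first_column(table: str) -> List[str]:
    """Return the first cell of each data row of a '| a | b |' table."""
    cells = []
    for line in table.splitlines():
        line = line.strip()
        # Keep header and data rows; skip the '|----+----|' rule.
        if line.startswith("|") and not line.startswith("|-"):
            cells.append(line.strip("|").split("|")[0].strip())
    return cells[1:]  # drop the header cell ("name")


def walk_catalog(run: Runner = run_mind) -> Dict[str, List[str]]:
    """Map each database from list-databases to the tables from list-tables."""
    catalog: Dict[str, List[str]] = {}
    for line in run(["mind", "list-databases"]).splitlines():
        db = line.strip()
        if db:
            catalog[db] = first_column(run(["mind", "list-tables", db]))
    return catalog
```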

query

Query data.

QUERY - a SQL SELECT statement or an Atlas DSL query string.

Returns: domain metadata.

mind query [OPTIONS] QUERY

Arguments

QUERY

Required argument
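In the examples, `query` and `download` print a Python-style dictionary with `payload` and `status` keys. Assuming that printed form is stable, the minimal sketch below parses it with `ast.literal_eval`, which evaluates literals only, unlike `eval`. It then checks the status before returning the payload. `parse_response` is a hypothetical helper.

```python
import ast
from typing import Any


def parse_response(stdout: str) -> Any:
    """Parse printed output such as {'payload': ..., 'status': 'OK'}.

    Assumption: the CLI prints a dict literal with these two keys.
    """
    response = ast.literal_eval(stdout.strip())
    if response.get("status") != "OK":
        raise RuntimeError(f"mind returned status {response.get('status')!r}")
    return response["payload"]


sample = "{'payload': [{'name': 'clinical_patient', 'owner': 'raj_ops'}], 'status': 'OK'}"
print(parse_response(sample))  # [{'name': 'clinical_patient', 'owner': 'raj_ops'}]
```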

Examples

A set of Zeppelin notebooks with CLI examples is provided. The notebooks can be found on the HDP sandbox at http://<staging_ip>:9995/#/

  • Get data for patients with clinical stage '3C'

$ mind query "SELECT * FROM patient WHERE diagnosis_clinical_stage_group = '3C'"

{'payload': [{'patient.age_at_diagnosis': 24525,
              'patient.diagnosis_clinical_stage_group': '99',
              'patient.diagnosis_pathology_stage_group': '3C',
              'patient.gender': 'FEMALE',
              'patient.icdo_histology_code': 'M8980/3',
              'patient.icdo_site_code': 'C569',
              'patient.patient_dmp_id': 'P-0039384',
              'patient.patient_id': 'SPECTRUM-OV-001',
              'patient.patient_last_known_alive_age': 25139,
              'patient.project_id': 'OV',
              'patient.race': 'WHITE',
              'patient.vital_status': '1.0'},
             ...
 'status': 'OK'}
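In the SQL result above, each key is qualified with its table name (`patient.gender`, `patient.race`, and so on). Below is a minimal Python sketch that drops that prefix for easier downstream use. `strip_table_prefix` is a hypothetical helper. It assumes no two columns share a name once the prefix is removed, since a later key would overwrite an earlier one.

```python
from typing import Any, Dict


def strip_table_prefix(row: Dict[str, Any], table: str) -> Dict[str, Any]:
    """Turn {'patient.gender': 'FEMALE'} into {'gender': 'FEMALE'} for table 'patient'.

    Keys without the prefix are kept unchanged.
    """
    prefix = table + "."
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in row.items()
    }


row = {"patient.gender": "FEMALE", "patient.race": "WHITE"}
print(strip_table_prefix(row, "patient"))  # {'gender': 'FEMALE', 'race': 'WHITE'}
```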

  • Get the URL of the patient data bundle

$ mind download "SELECT * FROM patient WHERE diagnosis_clinical_stage_group = '3C'"

{'payload': 'http://<vm_ip>:50070/data/tmp/1587571607403.gz',
 'status': 'OK'}

  • Get operational metadata for Hive tables and HDFS paths (Atlas DSL)

$ mind query "from hive_table where name like '*clinical*' select name, owner"

{'payload': [{'name': 'clinical_patient', 'owner': 'raj_ops'},
             {'name': 'clinical_diagnosis', 'owner': 'raj_ops'}],
 'status': 'OK'}

$ mind query "from hdfs_path where name like '*genomic*' select name, qualifiedName, path"

{'payload': [{'name': '/user/hive/genomic_cna',
              'path': 'hdfs://sandbox-hdp.hortonworks.com:8020/user/hive/genomic_cna',
              'qualifiedName': 'hdfs://sandbox-hdp.hortonworks.com:8020/user/hive/genomic_cna@Sandbox'},
             {'name': '/user/hive/genomic_maf',
              'path': 'hdfs://sandbox-hdp.hortonworks.com:8020/user/hive/genomic_maf',
              'qualifiedName': 'hdfs://sandbox-hdp.hortonworks.com:8020/user/hive/genomic_maf@Sandbox'},
             {'name': '/user/hive/genomic_bam',
              'path': 'hdfs://sandbox-hdp.hortonworks.com:8020/user/hive/genomic_bam',
              'qualifiedName': 'hdfs://sandbox-hdp.hortonworks.com:8020/user/hive/genomic_bam@Sandbox'}],
 'status': 'OK'}
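The `path` values returned for `hdfs_path` entities are full HDFS URIs. Below is a minimal Python sketch that uses the standard library's `urllib.parse.urlsplit` to separate the NameNode host and port from the file-system path. `HdfsLocation` and `split_hdfs_uri` are hypothetical names.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class HdfsLocation(NamedTuple):
    host: str
    port: int
    path: str


def split_hdfs_uri(uri: str) -> HdfsLocation:
    """Split 'hdfs://host:port/path' into host, port, and path.

    A missing port is reported as 0.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs URI: {uri}")
    return HdfsLocation(parts.hostname or "", parts.port or 0, parts.path)


print(split_hdfs_uri("hdfs://sandbox-hdp.hortonworks.com:8020/user/hive/genomic_cna"))
# HdfsLocation(host='sandbox-hdp.hortonworks.com', port=8020, path='/user/hive/genomic_cna')
```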

  • Get the URL of the data bundle for an Atlas DSL query

$ mind download "hive_table where name like '*genomic*' and createTime >= '2020-04-20'"

{'payload': 'http://<vm_ip>:50070/data/tmp/1588078078927.gz',
 'status': 'OK'}

  • List the available databases.

$ mind list-databases

clinical
genomic

  • List the available tables in the clinical database.

$ mind list-tables clinical

| name       | description                                                                   |
|------------+-------------------------------------------------------------------------------|
| medication | None                                                                          |
| patient    | (Patient level) de-identified patient IDs, demographics info, survival status |
| diagnosis  | None                                                                          |
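`list-tables` and `list-columns` print pipe-delimited tables. Below is a minimal Python sketch that turns such output into a list of dictionaries keyed by the header row. `parse_table` is a hypothetical helper. It assumes cell values never contain a `|` character.

```python
from typing import Dict, List


def parse_table(stdout: str) -> List[Dict[str, str]]:
    """Parse a '| a | b |' table into dicts keyed by the header row.

    Assumption: cells contain no '|' characters.
    """
    rows = []
    for line in stdout.splitlines():
        line = line.strip()
        # Keep header and data rows; skip the '|----+----|' rule under the header.
        if line.startswith("|") and not line.startswith("|-"):
            rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]
```

Tables without a comment are printed with `None` in the description column, so that value arrives here as the string `'None'`.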

  • List the available columns of the clinical patient table.

$ mind list-columns clinical patient

| name                         | type   | description   |
|------------------------------+--------+---------------|
| dmp_patient_id               | string | None          |
| patient_last_known_alive_age | int    | None          |
| project_id                   | string | None          |
| gender                       | string | None          |
| race                         | string | None          |
| patient_id                   | string | None          |
| vital_status                 | string | None          |
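Column names from `list-columns` can be fed back into `mind query`. Below is a minimal Python sketch that assembles a SQL SELECT string of the same shape as the examples above. `sql_string` and `build_select` are hypothetical. They support only equality filters joined by AND and do not validate identifiers. The backend's string-escaping rules are not documented here, so values containing a single quote are rejected rather than escaped.

```python
from typing import Dict, Sequence


def sql_string(value: str) -> str:
    """Wrap a value in single quotes.

    Assumption: the backend's escaping rules are unknown, so embedded
    quotes are rejected instead of escaped.
    """
    if "'" in value:
        raise ValueError(f"refusing to quote value containing a single quote: {value!r}")
    return f"'{value}'"


def build_select(table: str, columns: Sequence[str], where: Dict[str, str]) -> str:
    """Build 'SELECT cols FROM table WHERE a = 'x' AND ...' (hypothetical helper)."""
    column_list = ", ".join(columns) or "*"
    sql = f"SELECT {column_list} FROM {table}"
    if where:
        conditions = " AND ".join(f"{col} = {sql_string(val)}" for col, val in where.items())
        sql += f" WHERE {conditions}"
    return sql


print(build_select("patient", ["patient_id", "gender"], {"race": "WHITE"}))
# SELECT patient_id, gender FROM patient WHERE race = 'WHITE'
```

Pass the resulting string as the QUERY argument of `mind query` or `mind download`.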