Expression Functions¶
Overview¶
EDP expressions can use Python-like functions to pull data from any dataset, calculate statistics, or run advanced algorithms. Users are encouraged to read the Expressions documentation for an in-depth review of use cases.
Function Details¶
annotate¶
Annotate a record with a template. Output data type: object
Syntax
annotate(record, template, debug, include_errors)
record: (object) The record to be annotated
template: (str) The ID of the template
debug: (bool) Enable debug mode (default: False)
include_errors: (bool) Include errors in output (default: True)
beacon¶
Retrieves the beacon results for any entity. Output data type: object. Output object properties:
failed_count: The number of datasets that failed (timed-out)
failed: List of datasets that failed (timed-out)
not_found_count: The number of datasets without results
found_count: The number of datasets with results
found: List of datasets with results
not_found: List of datasets without results
Syntax
beacon(entity, entity_type, beacon_set, datasets, visibility)
entity: The entity value
entity_type: A valid entity type
beacon_set (optional): A valid beacon set ID
datasets (optional): A list of datasets to beacon
visibility (optional): Which datasets to beacon (default: vault)
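Examples
An illustrative call, reusing the variant entity from the dataset_query examples below; the result depends on the beacon sets and datasets available in your environment.
beacon("GRCH38-13-32357842-32357842-TA", "variant")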
classify_variant¶
Classify a variant using one of multiple classifiers. Output data type: object
Syntax
classify_variant(variant, classifier)
variant: The variant
classifier: The desired classifier (default: “germline”)
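Examples
An illustrative call, reusing a variant ID from the examples elsewhere on this page; the default “germline” classifier is applied.
classify_variant("GRCH38-7-117559590-117559593-A")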
coerce_list¶
Coerce a value to a list. A single item becomes a single-value list, lists remain lists, and None returns an empty list. Output data type: auto (list)
Syntax
coerce_list(value)
value: The value to coerce to a list
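Examples
Illustrative calls showing the three behaviors described above; the input values are placeholders.
coerce_list("BRCA2")
coerce_list(["BRCA1", "BRCA2"])
coerce_list(None)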
concat¶
Combine text from multiple lists or strings. Output data type: string
Syntax
concat(values, delimiter)
values: The list of values to concatenate
delimiter (default: “”): The character(s) to place between values
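Examples
An illustrative call; the input values are placeholders.
concat(["Pathogenic", "Likely pathogenic"], "; ")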
crossmap¶
Convert a variant or genomic region entity between different genome builds using the Ensembl CrossMap tool. The functionality of this expression is the same as UCSC’s liftOver tool. Output data type: string
Syntax
crossmap(entity, target_build)
entity: The entity (either a valid quartzbio variant BUILD-CHROMOSOME-START-STOP-ALT or genomic region BUILD-CHROMOSOME-START-STOP)
target_build: The target genome build (GRCH37 or GRCH38)
Examples
crossmap("GRCH38-13-32338647-32338647-T", "GRCH37")
dataset_count¶
Calculate the total number of results (or “hits”) for a given query. Returns the number of results. Output data type: integer
Syntax
dataset_count(dataset, entities, filters, query)
dataset: Any dataset with query permissions
entities (optional): A list of entity tuples: [(entity_type, entity)]
filters (optional): A valid filter block
query (optional): A query string
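Examples
An illustrative call against the public ClinVar dataset used in the other examples on this page; the query string is a placeholder.
dataset_count("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", query="*cancer*")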
dataset_entity_top_terms¶
Retrieve the top entities for any entity field in a dataset. Returns a list of strings in order of occurrence, or None if the dataset cannot be queried by this entity. Output data type: string (list)
Syntax
dataset_entity_top_terms(dataset, entity, limit, filters, query)
dataset: Any dataset with query permissions
entity: The entity_type to return within the dataset
limit (optional): The number of terms to retrieve (default: 1000)
filters (optional): Dataset filters
query (optional): A query string
Examples
dataset_entity_top_terms("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCH38", "gene")
dataset_field_percentiles¶
Calculates the percentiles for any integer field. Returns an object containing the desired percentiles. Output data type: object
Syntax
dataset_field_percentiles(dataset, field, percents, entities, filters, query)
dataset: Any dataset with query permissions
field: The field within the dataset
percents: The percentiles to calculate (default: 1, 5, 25, 50, 75, 95, 99)
entities (optional): A list of entity tuples: [(entity_type, entity)]
filters (optional): Dataset filters
query (optional): A query string
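Examples
An illustrative call; “some_integer_field” is a placeholder for an integer field in your dataset, and passing percents as a list is assumed.
dataset_field_percentiles("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", "some_integer_field", percents=[25, 50, 75])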
dataset_field_stats¶
Calculates statistics for any numeric field. Returns an object containing field statistics. Output data type: object. Output object properties:
count: The total number of values
max: The maximum value observed
sum: The sum of all values
avg: The average value
min: The minimum value observed
Syntax
dataset_field_stats(dataset, field, entities, filters, query)
dataset: Any dataset with query permissions
field: The field within the dataset
entities (optional): A list of entity tuples: [(entity_type, entity)]
filters (optional): Dataset filters
query (optional): A query string
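Examples
An illustrative call; “some_numeric_field” is a placeholder for a numeric field in your dataset.
dataset_field_stats("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", "some_numeric_field")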
dataset_field_terms_count¶
Retrieve the number of unique terms for any string field in a dataset. Returns the number of unique terms. Output data type: integer
Syntax
dataset_field_terms_count(dataset, field, entities, filters, query)
dataset: Any dataset with query permissions
field: The field within the dataset
entities (optional): A list of entity tuples: [(entity_type, entity)]
filters (optional): Dataset filters
query (optional): A query string
Examples
dataset_field_terms_count("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", "clinical_significance")
dataset_field_top_terms¶
Retrieve the top terms for any string field in a dataset. Returns a list of objects containing the term and number of times it occurs, in order of occurrence. Output data type: object (list). Output object properties:
count: Number of times it occurs
term: Term value
Syntax
dataset_field_top_terms(dataset, field, limit, entities, filters, query)
dataset: Any dataset with query permissions
field: The field within the dataset
limit (optional): The number of terms to retrieve (default: 10)
entities (optional): A list of entity tuples: [(entity_type, entity)]
filters (optional): Dataset filters
query (optional): A query string
Examples
dataset_field_top_terms("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", "clinical_significance")
dataset_field_values¶
Retrieves a list of non-empty values for a dataset field. Returns a list of values from the specified field. Output data type: auto (list)
Syntax
dataset_field_values(dataset, field, limit, entities, filters, query)
dataset: Any dataset with query permissions
field: The field within the dataset
limit (optional): The number of values to return (default: 10)
entities (optional): A list of entity tuples: [(entity_type, entity)]
filters (optional): Dataset filters
query (optional): A query string
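Examples
An illustrative call, reusing the field from the dataset_field_top_terms example above.
dataset_field_values("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", "clinical_significance", limit=5)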
dataset_query¶
Query any dataset with optional filters and/or entities. Returns a list of results. Output data type: object (list)
Syntax
dataset_query(dataset, fields, limit, entities, filters, query)
dataset: Any dataset with query permissions
fields (optional): Fields to retrieve (default: all)
limit (optional): The number of values to return (default: 1)
entities (optional): A list of entity tuples: [(entity_type, entity)]
filters (optional): Dataset filters
query (optional): A query string
Examples
dataset_query("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", fields=["clinical_significance"], query="*cancer*")
dataset_query("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", entities=[["variant", "GRCH38-13-32357842-32357842-TA"]])
datetime_format¶
Format datetime strings. By default, it returns an ISO 8601 formatted date/time string. To override, provide an optional input_format and/or output_format. Output data type: string
Syntax
datetime_format(value, input_format, output_format)
value: (str) A string containing a date/time stamp
input_format: (str) The input format of the date (e.g. “%d/%m/%y %H:%M”)
output_format: (str) The output format of the date (ISO 8601 format is the default: “%Y-%m-%dT%H:%M:%S”)
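Examples
An illustrative call using the input format shown above; the timestamp value is a placeholder.
datetime_format("21/07/20 14:30", input_format="%d/%m/%y %H:%M")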
entity_ids¶
Retrieve one or more entity IDs for a query. Output data type: string
Syntax
entity_ids(entity_type, entity)
entity_type: The entity type to retrieve
entity: The entity or query string
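Examples
An illustrative call; the gene symbol is a placeholder.
entity_ids("gene", "BRCA2")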
error¶
Raise a FunctionError. Output data type: error
Syntax
error(message)
message: An error message to raise
explode¶
Split N values from M list fields into N records. If _id is in the original record, each new record will have an integer appended to the _id with the index of each exploded record. Output data type: object (list)
Syntax
explode(record, fields)
record: (object) The record to be split
fields: (list or tuple) The field IDs
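Examples
An illustrative call; the record and field name are placeholders (in practice, record is usually the current record object).
explode({"_id": "r1", "genes": ["BRCA1", "BRCA2"]}, ["genes"])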
findall¶
Returns all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, returns a list of groups. Output data type: string (list)
Syntax
findall(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
pattern: The regular expression pattern
string: The string to search
regex_ignorecase (default: None): With a “regex” pattern, will perform a case insensitive matching.
regex_dotall (default: None): With a “regex” pattern, will make the “.” special character match any character at all, including a newline; without this flag, “.” will match anything except a newline.
regex_multiline (default: None): With a “regex” pattern, when specified, the pattern character “^” matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character “$” matches at the end of the string and at the end of each line (immediately preceding each newline). By default, “^” matches only at the beginning of the string, and “$” only at the end of the string and immediately before the newline (if any) at the end of the string.
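Examples
An illustrative call; the pattern and string are placeholders.
findall("[A-Z0-9]+", "BRCA1;BRCA2")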
genomic_sequence¶
Retrieves a specific sequence from the genome. Output data type: string
Syntax
genomic_sequence(genomic_region)
genomic_region: A valid genomic region in the form: BUILD-CHROMOSOME-START-STOP
Examples
genomic_sequence("GRCh37-5-36241400-36241700")
get¶
Get the value at any depth of a nested object based on the path described by path. If the path doesn’t exist, default is returned. Output data type: auto
Syntax
get(obj, path, default=None)
obj: (list|dict) The object to process
path: (str|list) A list, or a “.”-delimited string, describing the path
default (keyword): Default value to return if the path doesn’t exist (default: None)
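Examples
An illustrative call on a placeholder nested object.
get({"clinvar": {"clinical_significance": "Pathogenic"}}, "clinvar.clinical_significance")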
melt¶
Convert a wide dataset to a long dataset by “melting” one or more fields into “key” and “value” fields. All fields must have the same data type. Output data type: object (list)
Syntax
melt(record, fields, key_field, value_field, melt_list_values)
record: (object) The record to be melted
fields: (list or tuple) The field IDs
key_field: (str) key field (default: “key”)
value_field: (str) value field (default: “value”)
melt_list_values: (bool) (default: False)
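Examples
An illustrative call on a placeholder record; each listed field is expected to become its own “key”/“value” record.
melt({"alt_count": 10, "ref_count": 90}, ["alt_count", "ref_count"])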
normalize_aa_change¶
Normalize an amino acid change (beta). Output data type: string
Syntax
normalize_aa_change(aa_change, ref, alt)
aa_change: The aa_change
ref: (optional) Reference allele
alt: (optional) Alternate allele
normalize_variant¶
Normalize a variant ID (minimal representation and left shifting). Output data type: string
Syntax
normalize_variant(variant)
variant: The variant
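Examples
An illustrative call, reusing a variant ID from the examples elsewhere on this page.
normalize_variant("GRCH38-7-117559590-117559593-A")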
now¶
Retrieves the current date and time. Output data type: string
Syntax
now(timezone, template)
timezone (default: EST): The timezone to use for the date
template (default: ISO 8601): The format in which to represent the date/time (“%Y-%m-%dT%H:%M:%S”)
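Examples
An illustrative call; “UTC” is assumed to be an accepted timezone name.
now(timezone="UTC")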
predict_variant_effects¶
Predict the effects of a variant using Veppy. Output data type: object (list). Output object properties:
so_term: The Sequence Ontology term
impact: The effect impact
so_accession: The Sequence Ontology accession number
transcript: The affected transcript ID
lof: True if the mutation is predicted to cause the protein to lose its function
Syntax
predict_variant_effects(variant, default_transcript, gene_model)
variant: The variant
default_transcript (optional): If True, return effects for the default transcript only. If set to a specific transcript ID, limit results to that transcript. Otherwise, return effects for all transcripts.
gene_model (optional): The desired gene model: refseq (default) or ensembl
Examples
predict_variant_effects("GRCH38-7-117559590-117559593-A")
prevalence¶
Calculates the frequency that a value occurs within a population. Typically used to calculate the prevalence of variants or genes across samples in a dataset. Returns the frequency of occurrence. Please note: in large datasets, the result is approximate and can have an error of up to 5%. Output data type: double
Syntax
prevalence(dataset, entity, sample_field, filters)
dataset: Any dataset with discover permissions
entity: A single entity tuple: (entity_type, entity)
sample_field: The field containing the sample IDs
filters (optional): Filters to apply on the dataset
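Examples
An illustrative call showing only the shape of the arguments; the dataset, entity, and sample field name (“sample_id”) are placeholders, and the dataset must contain a sample identifier field.
prevalence("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", ("gene", "BRCA2"), "sample_id")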
search¶
Scan through string looking for the first location where the regular expression pattern produces a match. Returns True on a match and False if no position in the string matches the pattern. Output data type: boolean
Syntax
search(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
pattern: The regular expression pattern
string: The string to search
regex_ignorecase (default: None): With a “regex” pattern, will perform a case insensitive matching.
regex_dotall (default: None): With a “regex” pattern, will make the “.” special character match any character at all, including a newline; without this flag, “.” will match anything except a newline.
regex_multiline (default: None): With a “regex” pattern, when specified, the pattern character “^” matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character “$” matches at the end of the string and at the end of each line (immediately preceding each newline). By default, “^” matches only at the beginning of the string, and “$” only at the end of the string and immediately before the newline (if any) at the end of the string.
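Examples
An illustrative call; the pattern and string are placeholders.
search("pathogenic", "Likely pathogenic", regex_ignorecase=True)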
search_groups¶
Scan through string looking for the first location where the regular expression pattern produces a match. Returns a list of strings corresponding to the groups in the pattern. Output data type: string (list)
Syntax
search_groups(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
pattern: The regular expression pattern
string: The string to search
regex_ignorecase (default: None): With a “regex” pattern, will perform a case insensitive matching.
regex_dotall (default: None): With a “regex” pattern, will make the “.” special character match any character at all, including a newline; without this flag, “.” will match anything except a newline.
regex_multiline (default: None): With a “regex” pattern, when specified, the pattern character “^” matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character “$” matches at the end of the string and at the end of each line (immediately preceding each newline). By default, “^” matches only at the beginning of the string, and “$” only at the end of the string and immediately before the newline (if any) at the end of the string.
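Examples
An illustrative call; the pattern and string are placeholders.
search_groups("(BRCA[0-9]+)", "gene: BRCA2")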
split¶
Split text based on a delimiter and optionally strip whitespace. Output data type: string (list)
Syntax
split(value, delimiter, regex, strip, regex_ignorecase, regex_dotall, regex_multiline)
value: The string to split
delimiter (default: any whitespace): The character(s) to split on
regex (default: None): A valid Python regular expression pattern to split on.
strip (default: True): Strip whitespace from each resulting value
regex_ignorecase (default: None): With a “regex” pattern, will perform a case insensitive matching.
regex_dotall (default: None): With a “regex” pattern, will make the “.” special character match any character at all, including a newline; without this flag, “.” will match anything except a newline.
regex_multiline (default: None): With a “regex” pattern, when specified, the pattern character “^” matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character “$” matches at the end of the string and at the end of each line (immediately preceding each newline). By default, “^” matches only at the beginning of the string, and “$” only at the end of the string and immediately before the newline (if any) at the end of the string.
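Examples
An illustrative call; the input string is a placeholder.
split("Pathogenic; Likely pathogenic", delimiter=";")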
sub¶
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. Output data type: string
Syntax
sub(pattern, repl, string, count, regex_ignorecase, regex_dotall, regex_multiline)
pattern: The regular expression pattern
repl: The string to replace matches with
string: The string to search
count (default: 0): The maximum number of pattern occurrences to be replaced. If zero, all occurrences will be replaced.
regex_ignorecase (default: None): With a “regex” pattern, will perform a case insensitive matching.
regex_dotall (default: None): With a “regex” pattern, will make the “.” special character match any character at all, including a newline; without this flag, “.” will match anything except a newline.
regex_multiline (default: None): With a “regex” pattern, when specified, the pattern character “^” matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character “$” matches at the end of the string and at the end of each line (immediately preceding each newline). By default, “^” matches only at the beginning of the string, and “$” only at the end of the string and immediately before the newline (if any) at the end of the string.
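Examples
An illustrative call; the pattern, replacement, and string are placeholders.
sub("\s+", "_", "Likely pathogenic")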
tabulate¶
Converts a list of objects into a table (i.e. a two-dimensional array). Output data type: object (list)
Syntax
tabulate(objects, fields, header)
objects: The list of objects
fields (optional): List of fields to include (default: all)
header (optional): Include a header row (default: True)
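Examples
An illustrative call on placeholder objects.
tabulate([{"gene": "BRCA1", "count": 2}, {"gene": "BRCA2", "count": 5}], fields=["gene", "count"])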
today¶
Returns the current date. Output data type: string
Syntax
today(timezone, template)
timezone (default: EST): The timezone to use for the date
template (default: YYYY-MM-DD): The format in which to represent the date
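Examples
An illustrative call; “UTC” is assumed to be an accepted timezone name.
today(timezone="UTC")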
translate_variant¶
Translate variant into a protein change. Output data type: object. Output object properties:
protein_length: Number of amino acids in the protein
cdna_change: cDNA change
protein_change: Protein change
protein_coordinates: A dictionary containing start and stop coordinates and the affected transcript ID
gene: HUGO gene symbol
transcript: The transcript ID
effects: list of effects
Syntax
translate_variant(variant, gene_model, transcript, include_effects)
variant: The variant
gene_model (optional): The desired gene model: refseq (default) or ensembl
transcript (optional): Limits results to this transcript only
include_effects (optional): Returns the effects of the variant using Veppy
Examples
translate_variant("GRCH38-7-117559590-117559593-A")
translate_variant("GRCH38-7-117559590-117559593-A", gene_model="ensembl")
translate_variant("GRCH38-7-117559590-117559593-A", transcript="NM_000492.3")
translate_variant("GRCH38-7-117559590-117559593-A", include_effects=True)
user¶
Returns the currently authenticated user. Output data type: object. Output object properties:
name: The user’s full name
email: The user’s email address