# Expression Functions

## Overview

EDP expressions can use Python-like functions to pull data from any dataset, calculate statistics, or run advanced algorithms. For an in-depth review of use cases, see the [Expressions documentation](https://quartzbio.freshdesk.com/en/support/solutions/articles/73000606023).

## Function Details

### **annotate**

Annotate a record with a template.

Output data type: object

**Syntax**

```text
annotate(record, template, debug, include_errors)
```

- record: (object) The record to be annotated
- template: (str) The ID of the template
- debug: (bool) Enable debug mode (default: False)
- include\_errors: (bool) Include errors in output (default: True)

### **beacon**

Retrieves the beacon results for any entity.

Output data type: object. Output object properties:

- failed\_count: The number of datasets that failed (timed-out)
- failed: List of datasets that failed (timed-out)
- not\_found\_count: The number of datasets without results
- found\_count: The number of datasets with results
- found: List of datasets with results
- not\_found: List of datasets without results

**Syntax**

```text
beacon(entity, entity_type, beacon_set, datasets, visibility)
```

- entity: The entity value
- entity\_type: A valid entity type
- beacon\_set (optional): A valid beacon set ID
- datasets (optional): A list of datasets to beacon
- visibility (optional): Which datasets to beacon (default: vault)

### **classify_variant**

Classify a variant using one of multiple classifiers.

Output data type: object

**Syntax**

```text
classify_variant(variant, classifier)
```

- variant: The variant
- classifier: The desired classifier (default: "germline")

### **coerce_list**

Coerce a value to a list. A single item becomes a single-value list. Lists remain lists. None returns an empty list.

Output data type: auto (list)

**Syntax**

- value: The value to coerce to a list

### **concat**

Combine text from multiple lists or strings.

Output data type: string

**Syntax**

```text
concat(values, delimiter)
```

- values: The list of values to concatenate
- delimiter (default: ""): The character to use in between values

### **crossmap**

Convert a variant or genomic region entity between different genome builds using the [Ensembl CrossMap](http://crossmap.sourceforge.net/) tool. This expression provides the same functionality as UCSC's liftOver tool.

Output data type: string

**Syntax**

```text
crossmap(entity, target_build)
```

- entity: The entity (either a valid quartzbio variant BUILD-CHROMOSOME-START-STOP-ALT or genomic region BUILD-CHROMOSOME-START-STOP)
- target\_build: The target genome build (GRCH37 or GRCH38)

**Examples**

`crossmap("GRCH38-13-32338647-32338647-T", "GRCH37")`

### **dataset_count**

Calculate the total number of results (or "hits") for a given query. Returns the number of results.

Output data type: integer

**Syntax**

```text
dataset_count(dataset, entities, filters, query)
```

- dataset: Any dataset with query permissions
- entities (optional): A list of entity tuples: \[(entity\_type, entity)\]
- filters (optional): A valid filter block
- query (optional): A query string

### **dataset_entity_top_terms**

Retrieve the top entities for any entity field in a dataset. Returns a list of strings, in order of occurrence, or None if the dataset cannot be queried by this entity.

Output data type: string (list)

**Syntax**

```text
dataset_entity_top_terms(dataset, entity, limit, filters, query)
```

- dataset: Any dataset with query permissions
- entity: The entity\_type to return within the dataset
- limit (optional): The number of terms to retrieve (default: 1000)
- filters (optional): Dataset filters
- query (optional): A query string

**Examples**

`dataset_entity_top_terms("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCH38", "gene")`

### **dataset_field_percentiles**

Calculates the percentiles for any integer field. Returns an object containing the desired percentiles.

Output data type: object

**Syntax**

```text
dataset_field_percentiles(dataset, field, percents, entities, filters, query)
```

- dataset: Any dataset with query permissions
- field: The field within the dataset
- percents: The percentiles to calculate (default: 1, 5, 25, 50, 75, 95, 99)
- entities (optional): A list of entity tuples: \[(entity\_type, entity)\]
- filters (optional): Dataset filters
- query (optional): A query string

### **dataset_field_stats**

Calculates statistics for any numeric field. Returns an object containing field statistics.

Output data type: object. Output object properties:

- count: The total number of values
- max: The maximum value observed
- sum: The sum of all values
- avg: The average value
- min: The minimum value observed

**Syntax**

```text
dataset_field_stats(dataset, field, entities, filters, query)
```

- dataset: Any dataset with query permissions
- field: The field within the dataset
- entities (optional): A list of entity tuples: \[(entity\_type, entity)\]
- filters (optional): Dataset filters
- query (optional): A query string

### **dataset_field_terms_count**

Retrieve the number of unique terms for any string field in a dataset. Returns the number of unique terms.

Output data type: integer

**Syntax**

```text
dataset_field_terms_count(dataset, field, entities, filters, query)
```

- dataset: Any dataset with query permissions
- field: The field within the dataset
- entities (optional): A list of entity tuples: \[(entity\_type, entity)\]
- filters (optional): Dataset filters
- query (optional): A query string

**Examples**

`dataset_field_terms_count("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", "clinical_significance")`

### **dataset_field_top_terms**

Retrieve the top terms for any string field in a dataset. Returns a list of objects containing the term and number of times it occurs, in order of occurrence.

Output data type: object (list).
Output object properties:

- count: Number of times it occurs
- term: Term value

**Syntax**

```text
dataset_field_top_terms(dataset, field, limit, entities, filters, query)
```

- dataset: Any dataset with query permissions
- field: The field within the dataset
- limit (optional): The number of terms to retrieve (default: 10)
- entities (optional): A list of entity tuples: \[(entity\_type, entity)\]
- filters (optional): Dataset filters
- query (optional): A query string

**Examples**

`dataset_field_top_terms("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", "clinical_significance")`

### **dataset_field_values**

Retrieves a list of non-empty values for a dataset field. Returns a list of values from the specified field.

Output data type: auto (list)

**Syntax**

```text
dataset_field_values(dataset, field, limit, entities, filters, query)
```

- dataset: Any dataset with query permissions
- field: The field within the dataset
- limit (optional): The number of values to return (default: 10)
- entities (optional): A list of entity tuples: \[(entity\_type, entity)\]
- filters (optional): Dataset filters
- query (optional): A query string

### **dataset_query**

Query any dataset with optional filters and/or entities. Returns a list of results.

Output data type: object (list)

**Syntax**

```text
dataset_query(dataset, fields, limit, entities, filters, query)
```

- dataset: Any dataset with query permissions
- fields (optional): Fields to retrieve (default: all)
- limit (optional): The number of values to return (default: 1)
- entities (optional): A list of entity tuples: \[(entity\_type, entity)\]
- filters (optional): Dataset filters
- query (optional): A query string

**Examples**

`dataset_query("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", fields=["clinical_significance"], query="*cancer*")`

`dataset_query("quartzbio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", entities=[["variant", "GRCH38-13-32357842-32357842-TA"]])`

### **datetime_format**

Format datetime strings. By default, returns an ISO 8601 formatted date-time string. To override, provide an optional input\_format or output\_format.

Output data type: string

**Syntax**

```text
datetime_format(value, input_format, output_format)
```

- value: (str) A string containing a date/time stamp
- input\_format: (str) The input format of the date (e.g. "%d/%m/%y %H:%M")
- output\_format: (str) The output format of the date (ISO 8601 format is the default: "%Y-%m-%dT%H:%M:%S")

### **entity_ids**

Retrieve one or more entity IDs for a query.

Output data type: string

**Syntax**

```text
entity_ids(entity_type, entity)
```

- entity\_type: The entity type to retrieve
- entity: The entity or query string

### **error**

Raise a FunctionError.

Output data type: error

**Syntax**

- message: An error message to raise

### **explode**

Split N values from M list fields into N records. If \_id is in the original record, each new record will have an integer appended to the \_id with the index of each exploded record.

Output data type: object (list)

**Syntax**

- record: (object) The record to be split
- fields: (list or tuple) The field IDs

### **findall**

Returns all non-overlapping matches of pattern in string, as a list of strings.
The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, returns a list of groups.

Output data type: string (list)

**Syntax**

```text
findall(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
```

- pattern: The regular expression pattern
- string: The string to search
- regex\_ignorecase (default: None): With a "regex" pattern, performs case-insensitive matching.
- regex\_dotall (default: None): With a "regex" pattern, makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
- regex\_multiline (default: None): With a "regex" pattern, when specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.

### **genomic_sequence**

Retrieves a specific sequence from the genome.

Output data type: string

**Syntax**

```text
genomic_sequence(genomic_region)
```

- genomic\_region: A valid genomic region in the form: BUILD-CHROMOSOME-START-STOP

**Examples**

`genomic_sequence("GRCh37-5-36241400-36241700")`

### **get**

Get the value at any depth of a nested object based on the path described by `path`. If the path doesn't exist, `default` is returned.

Output data type: auto

**Syntax**

- obj: (list|dict) The object to process
- path: (str|list) A list or `.`-delimited string describing the path.
- default (keyword): Default value to return if the path doesn't exist. Defaults to `None`.

### **melt**

Convert a wide dataset to a long dataset by "melting" one or more fields into "key" and "value" fields.
All fields must have the same data type.

Output data type: object (list)

**Syntax**

```text
melt(record, fields, key_field, value_field, melt_list_values)
```

- record: (object) The record to be melted
- fields: (list or tuple) The field IDs
- key\_field: (str) key field (default: "key")
- value\_field: (str) value field (default: "value")
- melt\_list\_values: (bool) (default: False)

### **normalize_aa_change**

Normalize an amino acid change (beta).

Output data type: string

**Syntax**

```text
normalize_aa_change(aa_change, ref, alt)
```

- aa\_change: The aa\_change
- ref: (optional) Reference allele
- alt: (optional) Alternate allele

### **normalize_variant**

Normalize a variant ID (minimal representation and left shifting).

Output data type: string

**Syntax**

```text
normalize_variant(variant)
```

- variant: The variant

### **now**

Retrieves the current date and time.

Output data type: string

**Syntax**

- timezone (default: EST): The timezone to use for the date
- template (default: ISO 8601): The format in which to represent the date/time (ISO 8601 format: %Y-%m-%dT%H:%M:%S)

### **predict_variant_effects**

Predict the effects of a variant using Veppy.

Output data type: object (list). Output object properties:

- so\_term: The Sequence Ontology term
- impact: The effect impact
- so\_accession: The Sequence Ontology accession number
- transcript: The affected transcript ID
- lof: True if the mutation is predicted to cause the protein to lose its function

**Syntax**

```text
predict_variant_effects(variant, default_transcript, gene_model)
```

- variant: The variant
- default\_transcript (optional): If True, returns effects for just the default transcript. If a specific transcript is given, limits results to that transcript only. Otherwise, returns effects for all transcripts.
- gene\_model (optional): The desired gene model: refseq (default) or ensembl

**Examples**

`predict_variant_effects("GRCH38-7-117559590-117559593-A")`

### **prevalence**

Calculates the frequency that a value occurs within a population. Typically used to calculate the prevalence of variants or genes across samples in a dataset. Returns the frequency of occurrence. Please note: in large datasets, the result is approximate and can have an error of up to 5%.

Output data type: double

**Syntax**

```text
prevalence(dataset, entity, sample_field, filters)
```

- dataset: Any dataset with discover permissions
- entity: A single entity tuple: (entity\_type, entity)
- sample\_field: The field containing the sample IDs
- filters (optional): Filters to apply on the dataset

### **search**

Scan through string looking for the first location where the regular expression pattern produces a match. Returns True on a match and False if no position in the string matches the pattern.

Output data type: boolean

**Syntax**

```text
search(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
```

- pattern: The regular expression pattern
- string: The string to search
- regex\_ignorecase (default: None): With a "regex" pattern, performs case-insensitive matching.
- regex\_dotall (default: None): With a "regex" pattern, makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
- regex\_multiline (default: None): With a "regex" pattern, when specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.
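
The regex flag descriptions for **search** appear to mirror the flags of Python's `re` module (`re.IGNORECASE`, `re.DOTALL`, `re.MULTILINE`). The sketch below illustrates that behavior in plain Python. It assumes the expression function follows `re.search` semantics; the `search` helper defined here is a local stand-in written for illustration, not the EDP implementation.

```python
import re


def search(pattern, string, regex_ignorecase=None, regex_dotall=None,
           regex_multiline=None):
    """Local stand-in that mirrors the documented signature (assumption)."""
    flags = 0
    if regex_ignorecase:
        flags |= re.IGNORECASE
    if regex_dotall:
        flags |= re.DOTALL
    if regex_multiline:
        flags |= re.MULTILINE
    # True on a match anywhere in the string, False otherwise.
    return re.search(pattern, string, flags) is not None


text = "Pathogenic\nlikely benign"

# Case sensitivity: the lowercase pattern only matches with regex_ignorecase.
case_sensitive = search("pathogenic", text)
case_insensitive = search("pathogenic", text, regex_ignorecase=True)

# "." does not match the newline unless regex_dotall is set.
dot_default = search("Pathogenic.likely", text)
dot_all = search("Pathogenic.likely", text, regex_dotall=True)

# "^" and "$" match at line boundaries only with regex_multiline.
start_default = search("^likely", text)
start_multiline = search("^likely", text, regex_multiline=True)
end_default = search("Pathogenic$", text)
end_multiline = search("Pathogenic$", text, regex_multiline=True)
```

In this sketch every `*_default` and `case_sensitive` call evaluates to False, and each call with the corresponding flag enabled evaluates to True.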

### **search_groups**

Scan through string looking for the first location where the regular expression pattern produces a match. Returns a list of strings corresponding to the groups in the pattern.

Output data type: string (list)

**Syntax**

```text
search_groups(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
```

- pattern: The regular expression pattern
- string: The string to search
- regex\_ignorecase (default: None): With a "regex" pattern, performs case-insensitive matching.
- regex\_dotall (default: None): With a "regex" pattern, makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
- regex\_multiline (default: None): With a "regex" pattern, when specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.

### **split**

Split text based on a delimiter and optionally strip whitespace.

Output data type: string (list)

**Syntax**

```text
split(value, delimiter, regex, strip, regex_ignorecase, regex_dotall, regex_multiline)
```

- value: The string to split
- delimiter (default: any whitespace): The character(s) to split on
- regex (default: None): A valid Python regular expression pattern to split on.
- strip (default: True): Strip whitespace from each resulting value
- regex\_ignorecase (default: None): With a "regex" pattern, performs case-insensitive matching.
- regex\_dotall (default: None): With a "regex" pattern, makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
- regex\_multiline (default: None): With a "regex" pattern, when specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.

### **sub**

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged.

Output data type: string

**Syntax**

```text
sub(pattern, repl, string, count, regex_ignorecase, regex_dotall, regex_multiline)
```

- pattern: The regular expression pattern
- repl: The string to replace matches with
- string: The string to search
- count (default: 0): The maximum number of pattern occurrences to be replaced. If zero, all occurrences will be replaced.
- regex\_ignorecase (default: None): With a "regex" pattern, performs case-insensitive matching.
- regex\_dotall (default: None): With a "regex" pattern, makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
- regex\_multiline (default: None): With a "regex" pattern, when specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.

### **tabulate**

Converts a list of objects into a table (i.e. a two-dimensional array).

Output data type: object (list)

**Syntax**

```text
tabulate(objects, fields, header)
```

- objects: The list of objects
- fields (optional): List of fields to include (default: all)
- header (optional): Include a header row (default: True)

### **today**

Returns the current date.

Output data type: string

**Syntax**

```text
today(timezone, template)
```

- timezone (default: EST): The timezone to use for the date
- template (default: YYYY-MM-DD): The format in which to represent the date

### **translate_variant**

Translate a variant into a protein change.

Output data type: object. Output object properties:

- protein\_length: Number of amino acids in the protein
- cdna\_change: cDNA change
- protein\_change: Protein change
- protein\_coordinates: A dictionary containing start and stop coordinates and the affected transcript ID
- gene: HUGO gene symbol
- transcript: The transcript ID
- effects: List of effects

**Syntax**

```text
translate_variant(variant, gene_model, transcript, include_effects)
```

- variant: The variant
- gene\_model (optional): The desired gene model: refseq (default) or ensembl
- transcript (optional): Limits results to this transcript only
- include\_effects (optional): Returns the effects of the variant using Veppy

**Examples**

`translate_variant("GRCH38-7-117559590-117559593-A")`

`translate_variant("GRCH38-7-117559590-117559593-A", gene_model="ensembl")`

`translate_variant("GRCH38-7-117559590-117559593-A", transcript="NM_000492.3")`

`translate_variant("GRCH38-7-117559590-117559593-A", include_effects=True)`

### **user**

Returns the currently authenticated user.

Output data type: object. Output object properties:

- name: The user's full name.
- email: The user's email address