Overview
EDP expressions can use Python-like functions to pull data from any dataset, calculate statistics, or run advanced algorithms. See the Expressions documentation for an in-depth review of use cases.
Function List
All available functions are listed below:
Function | Data Type | Description |
---|---|---|
annotate | object | Annotate a record with a template. annotate(record, template, debug, include_errors) |
beacon | object | Retrieves the beacon results for any entity. beacon(entity, entity_type, beacon_set, datasets, visibility) |
classify_variant | object | Classify a variant using one of multiple classifiers. classify_variant(variant, classifier) |
coerce_list | auto (list) | Coerce a value to a list. Single items will become a single value list. Lists will remain lists. None will return an empty list. coerce_list(value) |
concat | string | Combine text from multiple lists or strings. concat(values, delimiter) |
crossmap | string | Convert a variant or genomic region entity between different genome builds using the Ensembl CrossMap tool. The functionality of this expression is the same as UCSC’s liftOver tool. crossmap(entity, target_build) |
dataset_count | integer | Calculate the total number of results (or “hits”) for a given query. Returns the number of results. dataset_count(dataset, entities, filters, query) |
dataset_entity_top_terms | string (list) | Retrieve the top entities for any entity field in a dataset. Returns a list of strings in order of occurrence, or None if the dataset cannot be queried by this entity. dataset_entity_top_terms(dataset, entity, limit, filters, query) |
dataset_field_percentiles | object | Calculates the percentiles for any integer field. Returns an object containing the desired percentiles. dataset_field_percentiles(dataset, field, percents, entities, filters, query) |
dataset_field_stats | object | Calculates statistics for any numeric field. Returns an object containing field statistics. dataset_field_stats(dataset, field, entities, filters, query) |
dataset_field_terms_count | integer | Retrieve the number of unique terms for any string field in a dataset. Returns the number of unique terms. dataset_field_terms_count(dataset, field, entities, filters, query) |
dataset_field_top_terms | object (list) | Retrieve the top terms for any string field in a dataset. Returns a list of objects containing the term and number of times it occurs, in order of occurrence. dataset_field_top_terms(dataset, field, limit, entities, filters, query) |
dataset_field_values | auto (list) | Retrieves a list of non-empty values for a dataset field. Returns a list of values from the specified field. dataset_field_values(dataset, field, limit, entities, filters, query) |
dataset_query | object (list) | Query any dataset with optional filters and/or entities. Returns a list of results. dataset_query(dataset, fields, limit, entities, filters, query) |
datetime_format | string | Format datetime strings. By default, it returns an ISO 8601 format date time string. To override, provide an optional input_format or output_format to be used. datetime_format(value, input_format, output_format) |
entity_ids | string | Retrieve one or more entity IDs for a query. entity_ids(entity_type, entity) |
error | error | Raise a FunctionError. error(message) |
explode | object (list) | Split N values from M list fields into N records. If _id is in the original record, each new record will have an integer appended to the _id with the index of each exploded record. explode(record, fields) |
findall | string (list) | Returns all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, returns a list of groups. findall(pattern, string, regex_ignorecase, regex_dotall, regex_multiline) |
genomic_sequence | string | Retrieves a specific sequence from the genome. genomic_sequence(genomic_region) |
get | auto | Get the value at any depth of a nested object based on the path described by path. If path doesn’t exist, default is returned. get(obj, path, default) |
melt | object (list) | Convert a wide dataset to a long dataset by “melting” one or more fields into “key” and “value” fields. All fields must have the same data type. melt(record, fields, key_field, value_field, melt_list_values) |
normalize_aa_change | string | Normalize an amino acid change (beta). normalize_aa_change(aa_change, ref, alt) |
normalize_variant | string | Normalize a variant ID (minimal representation and left shifting). normalize_variant(variant) |
now | string | Retrieves the current date and time. now(timezone, template) |
predict_variant_effects | object (list) | Predict the effects of a variant using Veppy. predict_variant_effects(variant, default_transcript, gene_model) |
prevalence | double | Calculates the frequency that a value occurs within a population. Typically used to calculate the prevalence of variants or genes across samples in a dataset. Returns the frequency of occurrence. Please note: in large datasets the result is approximate and can have an error of up to 5%. prevalence(dataset, entity, sample_field, filters) |
search | boolean | Scan through string looking for the first location where the regular expression pattern produces a match. Returns True on a match and False if no position in the string matches the pattern. search(pattern, string, regex_ignorecase, regex_dotall, regex_multiline) |
search_groups | string (list) | Scan through string looking for the first location where the regular expression pattern produces a match. Returns a list of strings corresponding to the groups in the pattern. search_groups(pattern, string, regex_ignorecase, regex_dotall, regex_multiline) |
split | string (list) | Split text based on a delimiter and optionally strip whitespace. split(value, delimiter, regex, strip, regex_ignorecase, regex_dotall, regex_multiline) |
sub | string | Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. sub(pattern, repl, string, count, regex_ignorecase, regex_dotall, regex_multiline) |
tabulate | object (list) | Converts a list of objects into a table (i.e. a two-dimensional array). tabulate(objects, fields, header) |
today | string | Returns the current date. today(timezone, template) |
translate_variant | object | Translate variant into a protein change. translate_variant(variant, gene_model, transcript, include_effects) |
user | object | Returns the currently authenticated user. user() |
Function Details
beacon
Retrieves the beacon results for any entity. Output data type: object. Output object properties:
- failed_count: The number of datasets that failed (timed-out)
- failed: List of datasets that failed (timed-out)
- not_found_count: The number of datasets without results
- found_count: The number of datasets with results
- found: List of datasets with results
- not_found: List of datasets without results
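Example
An illustrative call following the signature above; the entity "BRCA2" and entity type "gene" are placeholders that may differ in your environment:
beacon("BRCA2", "gene")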
coerce_list
Coerce a value to a list. Single items will become a single value list. Lists will remain lists. None will return an empty list. Output data type: auto (list)
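Example
Illustrative behavior, following the description above:
- coerce_list("BRCA2") returns ["BRCA2"]
- coerce_list(["BRCA2", "TP53"]) returns ["BRCA2", "TP53"]
- coerce_list(None) returns []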
crossmap
Convert a variant or genomic region entity between different genome builds using the Ensembl CrossMap tool. The functionality of this expression is the same as UCSC’s liftOver tool. Output data type: string
dataset_count
Calculate the total number of results (or “hits”) for a given query. Returns the number of results. Output data type: integer
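Example
An illustrative call, assuming keyword-style arguments are accepted; the dataset path is a placeholder and the entity tuple follows the [(entity_type, entity)] convention used elsewhere in this document:
dataset_count("ClinVar/Variants", entities=[("gene", "BRCA2")])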
dataset_entity_top_terms
Retrieve the top entities for any entity field in a dataset. Returns a list of strings in order of occurrence, or None if the dataset cannot be queried by this entity. Output data type: string (list)
Syntax
dataset_entity_top_terms(dataset, entity, limit, filters, query)
- dataset: Any dataset with query permissions
- entity: The entity_type to return within the dataset
- limit (optional): The number of terms to retrieve (default: 1000)
- filters (optional): Dataset filters
- query (optional): A query string
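Example
An illustrative call that retrieves up to 25 gene entities; the dataset path and entity type are placeholders:
dataset_entity_top_terms("ClinVar/Variants", "gene", 25)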
dataset_field_percentiles
Calculates the percentiles for any integer field. Returns an object containing the desired percentiles. Output data type: object
Syntax
dataset_field_percentiles(dataset, field, percents, entities, filters, query)
- dataset: Any dataset with query permissions
- field: The field within the dataset
- percents: The percentiles to calculate (default: 1, 5, 25, 50, 75, 95, 99)
- entities (optional): A list of entity tuples: [(entity_type, entity)]
- filters (optional): Dataset filters
- query (optional): A query string
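Example
An illustrative call that calculates the 25th, 50th, and 75th percentiles; the dataset path and field name are placeholders:
dataset_field_percentiles("gnomAD/Variants", "allele_count", [25, 50, 75])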
dataset_field_stats
Calculates statistics for any numeric field. Returns an object containing field statistics. Output data type: object. Output object properties:
- count: The total number of values
- max: The maximum value observed
- sum: The sum of all values
- avg: The average value
- min: The minimum value observed
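Example
An illustrative call; the dataset path and field name are placeholders. The result would be an object with the count, min, max, sum, and avg properties listed above:
dataset_field_stats("gnomAD/Variants", "allele_frequency")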
dataset_field_terms_count
Retrieve the number of unique terms for any string field in a dataset. Returns the number of unique terms. Output data type: integer
dataset_field_top_terms
Retrieve the top terms for any string field in a dataset. Returns a list of objects containing the term and number of times it occurs, in order of occurrence. Output data type: object (list). Output object properties:
- count: Number of times it occurs
- term: Term value
Syntax
dataset_field_top_terms(dataset, field, limit, entities, filters, query)
- dataset: Any dataset with query permissions
- field: The field within the dataset
- limit (optional): The number of terms to retrieve (default: 10)
- entities (optional): A list of entity tuples: [(entity_type, entity)]
- filters (optional): Dataset filters
- query (optional): A query string
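Example
An illustrative call returning up to 5 term/count objects; the dataset path and field name are placeholders:
dataset_field_top_terms("ClinVar/Variants", "clinical_significance", 5)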
dataset_field_values
Retrieves a list of non-empty values for a dataset field. Returns a list of values from the specified field. Output data type: auto (list)
Syntax
dataset_field_values(dataset, field, limit, entities, filters, query)
- dataset: Any dataset with query permissions
- field: The field within the dataset
- limit (optional): The number of values to return (default: 10)
- entities (optional): A list of entity tuples: [(entity_type, entity)]
- filters (optional): Dataset filters
- query (optional): A query string
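Example
An illustrative call returning up to 25 values; the dataset path, field name, and entity tuple are placeholders:
dataset_field_values("ClinVar/Variants", "rsid", 25, [("gene", "BRCA2")])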
dataset_query
Query any dataset with optional filters and/or entities. Returns a list of results. Output data type: object (list)
Syntax
dataset_query(dataset, fields, limit, entities, filters, query)
- dataset: Any dataset with query permissions
- fields (optional): Fields to retrieve (default: all)
- limit (optional): The number of values to return (default: 1)
- entities (optional): A list of entity tuples: [(entity_type, entity)]
- filters (optional): Dataset filters
- query (optional): A query string
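Example
An illustrative call, assuming keyword-style arguments are accepted; the dataset path, field names, and entity tuple are placeholders:
dataset_query("ClinVar/Variants", fields=["rsid", "clinical_significance"], limit=10, entities=[("gene", "BRCA2")])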
datetime_format
Format datetime strings. By default, it returns an ISO 8601 format date time string. To override, provide an optional input_format or output_format to be used. Output data type: string
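Example
An illustrative call, assuming input_format accepts Python-style strptime codes; by default the result is an ISO 8601 string:
datetime_format("03/01/2021", input_format="%m/%d/%Y")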
entity_ids
Retrieve one or more entity IDs for a query. Output data type: string
explode
Split N values from M list fields into N records. If _id is in the original record, each new record will have an integer appended to the _id with the index of each exploded record. Output data type: object (list)
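Example
An illustrative call on a literal record with one list field; per the description above, it would produce one record per value in sample_ids, each with an index appended to the _id:
explode({"_id": "rec1", "gene": "BRCA2", "sample_ids": ["S1", "S2", "S3"]}, ["sample_ids"])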
findall
Returns all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, returns a list of groups. Output data type: string (list)
Syntax
findall(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
- pattern: The regular expression pattern
- string: The string to search
- regex_ignorecase (default: None): With a “regex” pattern, performs case-insensitive matching.
- regex_dotall (default: None): With a “regex” pattern, makes the “.” special character match any character at all, including a newline; without this flag, “.” matches anything except a newline.
- regex_multiline (default: None): With a “regex” pattern, the pattern character “^” matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character “$” matches at the end of the string and at the end of each line (immediately preceding each newline). By default, “^” matches only at the beginning of the string, and “$” only at the end of the string and immediately before the newline (if any) at the end of the string.
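Example
An illustrative call that extracts every run of digits from a string (expected result: ["7", "140453136", "140453137"]):
findall("\\d+", "chr7:140453136-140453137")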
genomic_sequence
Retrieves a specific sequence from the genome. Output data type: string
get
Get the value at any depth of a nested object based on the path described by path. If path doesn’t exist, default is returned. Output data type: auto
Syntax
get(obj, path, default)
- obj: (list|dict) The object to process
- path: (str|list) A list, or a “.”-delimited string, describing the path.
- default (keyword): Default value to return if path doesn’t exist. Defaults to None.
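Example
An illustrative call on a literal nested object (expected result: "BRCA2"):
get({"gene": {"symbol": "BRCA2"}}, "gene.symbol")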
melt
Convert a wide dataset to a long dataset by “melting” one or more fields into “key” and “value” fields. All fields must have the same data type. Output data type: object (list)
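Example
An illustrative call on a literal record; per the description above, it would produce one object per melted field, each with “key” and “value” fields:
melt({"sift_score": 0.9, "polyphen_score": 0.2}, ["sift_score", "polyphen_score"])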
normalize_variant
Normalize a variant ID (minimal representation and left shifting). Output data type: string
predict_variant_effects
Predict the effects of a variant using Veppy. Output data type: object (list). Output object properties:
- so_term: The Sequence Ontology term
- impact: The effect impact
- so_accession: The Sequence Ontology accession number
- transcript: The affected transcript ID
- lof: True if the mutation is predicted to cause the protein to lose its function
Syntax
predict_variant_effects(variant, default_transcript, gene_model)
- variant: The variant
- default_transcript (optional): If True, return effects for only the default transcript. If a specific transcript is given, limit results to that transcript. Otherwise, return effects for all transcripts.
- gene_model (optional): The desired gene model: refseq (default) or ensembl
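Example
An illustrative call; the variant ID is a placeholder whose exact format depends on your environment. This returns effects for the default transcript only, using the ensembl gene model:
predict_variant_effects("GRCH37-7-140453136-140453136-T", True, "ensembl")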
prevalence
Calculates the frequency that a value occurs within a population. Typically used to calculate the prevalence of variants or genes across samples in a dataset. Returns the frequency of occurrence. Please note: in large datasets, the result is approximate and can have an error of up to 5%. Output data type: double
search
Scan through string looking for the first location where the regular expression pattern produces a match. Returns True on a match and False if no position in the string matches the pattern. Output data type: boolean
Syntax
search(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
- pattern: The regular expression pattern
- string: The string to search
- regex_ignorecase (default: None): With a “regex” pattern, performs case-insensitive matching.
- regex_dotall (default: None): With a “regex” pattern, makes the “.” special character match any character at all, including a newline; without this flag, “.” matches anything except a newline.
- regex_multiline (default: None): With a “regex” pattern, the pattern character “^” matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character “$” matches at the end of the string and at the end of each line (immediately preceding each newline). By default, “^” matches only at the beginning of the string, and “$” only at the end of the string and immediately before the newline (if any) at the end of the string.
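Example
An illustrative call that performs a case-insensitive match by passing True for regex_ignorecase (expected result: True):
search("pathogenic", "PATHOGENIC", True)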
search_groups
Scan through string looking for the first location where the regular expression pattern produces a match. Returns a list of strings corresponding to the groups in the pattern. Output data type: string (list)
Syntax
search_groups(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
- pattern: The regular expression pattern
- string: The string to search
- regex_ignorecase (default: None): With a “regex” pattern, performs case-insensitive matching.
- regex_dotall (default: None): With a “regex” pattern, makes the “.” special character match any character at all, including a newline; without this flag, “.” matches anything except a newline.
- regex_multiline (default: None): With a “regex” pattern, the pattern character “^” matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character “$” matches at the end of the string and at the end of each line (immediately preceding each newline). By default, “^” matches only at the beginning of the string, and “$” only at the end of the string and immediately before the newline (if any) at the end of the string.
split
Split text based on a delimiter and optionally strip whitespace. Output data type: string (list)
Syntax
split(value, delimiter, regex, strip, regex_ignorecase, regex_dotall, regex_multiline)
- value: The string to split
- delimiter (default: any whitespace): The character(s) to split on
- regex (default: None): A valid Python regular expression pattern to split on.
- strip (default: True): Strip whitespace from each resulting value
- regex_ignorecase (default: None): With a “regex” pattern, performs case-insensitive matching.
- regex_dotall (default: None): With a “regex” pattern, makes the “.” special character match any character at all, including a newline; without this flag, “.” matches anything except a newline.
- regex_multiline (default: None): With a “regex” pattern, the pattern character “^” matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character “$” matches at the end of the string and at the end of each line (immediately preceding each newline). By default, “^” matches only at the beginning of the string, and “$” only at the end of the string and immediately before the newline (if any) at the end of the string.
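Example
An illustrative call that splits on a semicolon; with the default strip=True, the expected result is ["BRCA1", "BRCA2", "TP53"]:
split("BRCA1; BRCA2 ; TP53", ";")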
sub
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. Output data type: string
Syntax
sub(pattern, repl, string, count, regex_ignorecase, regex_dotall, regex_multiline)
- pattern: The regular expression pattern
- repl: The string to replace matches with
- string: The string to search
- count (default: 0): The maximum number of pattern occurrences to be replaced. If zero, all occurrences are replaced.
- regex_ignorecase (default: None): With a “regex” pattern, performs case-insensitive matching.
- regex_dotall (default: None): With a “regex” pattern, makes the “.” special character match any character at all, including a newline; without this flag, “.” matches anything except a newline.
- regex_multiline (default: None): With a “regex” pattern, the pattern character “^” matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character “$” matches at the end of the string and at the end of each line (immediately preceding each newline). By default, “^” matches only at the beginning of the string, and “$” only at the end of the string and immediately before the newline (if any) at the end of the string.
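Example
An illustrative call that collapses runs of whitespace into single spaces (expected result: "too many spaces"):
sub("\\s+", " ", "too   many   spaces")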
tabulate
Converts a list of objects into a table (i.e. a two-dimensional array). Output data type: object (list)
Syntax
tabulate(objects, fields, header)
- objects: The list of objects
- fields (optional): List of fields to include (default: all)
- header (optional): Include a header row (default: True)
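Example
An illustrative call on a literal list of objects; with the default header row, the expected result is [["gene", "count"], ["BRCA2", 5], ["TP53", 3]]:
tabulate([{"gene": "BRCA2", "count": 5}, {"gene": "TP53", "count": 3}], ["gene", "count"])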
translate_variant
Translate variant into a protein change. Output data type: object. Output object properties:
- protein_length: Number of amino acids in the protein
- cdna_change: cDNA change
- protein_change: Protein change
- protein_coordinates: A dictionary containing start and stop coordinates and the affected transcript ID
- gene: HUGO gene symbol
- transcript: The transcript ID
- effects: List of effects
Syntax
translate_variant(variant, gene_model, transcript, include_effects)
- variant: The variant
- gene_model (optional): The desired gene model: refseq (default) or ensembl
- transcript (optional): Limits results to this transcript only
- include_effects (optional): Returns the effects of the variant using Veppy
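Example
An illustrative call, assuming keyword-style arguments are accepted; the variant ID is a placeholder whose exact format depends on your environment:
translate_variant("GRCH37-7-140453136-140453136-T", include_effects=True)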