# Querying Datasets and Files

## Overview

The EDP is designed for easy access to molecular information. It provides an easy-to-use, real-time API for querying any dataset or file on the platform through the EDP Python or R client libraries. Datasets can also be queried from Bash. Complex filters can be applied when querying datasets and files; to learn more, refer to the [Filters](https://quartzbio.freshdesk.com/en/support/solutions/articles/73000614733) documentation.

## Querying Datasets

Dataset query results are returned in pages, similar to a search engine. To narrow down search results, datasets can be filtered on one or more fields. Users can build queries either in a programming language (or even as raw JSON) or directly on any dataset page in the EDP web application.

The easiest way to query datasets is with the EDP Python or R client libraries. A basic query returns a page of results from the specified public dataset. Users can set the paginate parameter to True to retrieve all records, or use the limit parameter to specify how many records to retrieve. Note that in the R client, the limit parameter retrieves a maximum of 10,000 records in a single request.

Additionally, the query function accepts the following parameters:

| Parameter | Value | Description |
|-----------------|---------|---------------------------------------------|
| filters | objects | A valid filter object. |
| facets | objects | A valid facets object. |
| fields | string | A list of fields to include in the results. |
| exclude\_fields | string | A list of fields to exclude from the results. |
| ordering | string | A list of fields to order results by. |
| query | string | A valid query string. |
| limit | integer | The number of results to return per page. |
| offset | integer | The record offset in the result set. |
| paginate | boolean | If True, returns all records. Default is False. |
| page\_size | integer | The internal batch size per request. Default is 100, with a maximum of 10,000. Increasing page\_size can speed up the query, but large values can cause requests to fail because of the amount of data returned in a single response. |
| output\_format | string | The output format of the query ('csv', 'tsv', or 'json'). Default is 'json'. |

In Python:

```Python
from quartzbio import Dataset

dataset = Dataset.get_by_full_path('quartzbio:Public:/ClinVar/5.2.0-20210110/Variants-GRCH37')

# Set how many records to retrieve with the "limit" parameter
dataset.query(limit=1000)

# Order query results using the "ordering" argument
# Order the query results by clinical_significance ascending
dataset.query(ordering='clinical_significance')

# Order the query results by clinical_significance descending
dataset.query(ordering='-clinical_significance')

# Query results can be ordered by multiple columns
# Order the query results by clinical_significance descending and gene ascending
dataset.query(ordering=['-clinical_significance', 'gene'])
```

## Saving Queries

Dataset queries can be saved and then reused on datasets with a similar structure. Saved queries can be created for any dataset and can be shared with members of a user's organization. For example, users may save a query for a set of interesting genes. They can then make this query available for all datasets that contain genes.
If the query is shared, other users in the organization can also apply it.

### The Saved Queries API

To retrieve the Saved Queries that apply to a dataset, or all those available:

In Python:

```Python
from quartzbio import SavedQuery

dataset_queries = SavedQuery.all(dataset="")
all_saved_queries = SavedQuery.all()
```

To use a saved query, retrieve the SavedQuery object and then apply its parameters.

In Python:

```Python
from quartzbio import Dataset, SavedQuery

saved_query = SavedQuery.retrieve("SAVED_QUERY_ID")

# Option 1: from the SavedQuery instance (Python only)
results = saved_query.query("")

# Option 2: from the Dataset.query() function
results = Dataset.retrieve("").query(**saved_query.params)
```

| Method | HTTP Request | Description | Authorization | Response |
|----------|-----------------------------------------|-------------------------|-----------------------------------------------------------|----------------------------------------------|
| retrieve | GET https:///v2/saved\_queries/{ID} | Retrieve a Saved Query. | This request requires an authorized user with permission. | The response contains a SavedQuery resource. |

| Method | HTTP Request | Description | Authorization | Response |
|--------|-----------------------------------------------|---------------------------|------------------------------------------------------------------------|----------------------------------------------------|
| create | POST https:///v2/saved\_queries | Create a new Saved Query. | This request requires an authorized user with appropriate permissions. | The response contains the new SavedQuery resource. |

Request Body:

| Property | Value | Description |
|-------------|---------|------------------------------------------------------------------------|
| name | string | A short name for the Saved Query. |
| description | string | A description for the Saved Query. |
| dataset | string | The ID or full\_path of a dataset to validate the query parameters against. Required on initial creation to ensure the parameters are valid. |
| params | objects | The query parameters (see the parameters of the _query_ method above). |
| is\_shared | boolean | If True, this query will be shared with other members of your organization. |

| Method | HTTP Request | Description | Authorization | Response |
|--------|------------------------------------------------------|-----------------------|----------------------------------------------------------------------------------|-----------------------------------------------------|
| delete | DELETE https:///v2/saved\_queries/{ID} | Delete a Saved Query. | This request requires an authorized user with write permissions on the resource. | The response returns "HTTP 200 OK" when successful. |

| Method | HTTP Request | Description | Authorization | Response |
|--------|----------------------------------------------|--------------------------------------------------|-------------------------------------------|-------------------------------------------------------|
| list | GET https:///v2/saved\_queries | Retrieves all Saved Queries available to a user. | This request requires an authorized user. | The response contains a list of SavedQuery resources. |
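The REST endpoints above can be sketched as plain HTTP requests. The following is a minimal illustration using only the Python standard library, not the EDP client itself. Several details are assumptions: the helper name `build_saved_query_request` is hypothetical, the host `edp.example.com` is a placeholder because the API host is not given here, and the `Authorization` header value and JSON content type are assumed rather than confirmed. The requests are only built, never sent.

```python
import json
from urllib.request import Request


def build_saved_query_request(api_host, auth_header, method, query_id=None, body=None):
    """Build (without sending) a request for the /v2/saved_queries endpoints.

    api_host and auth_header are caller-supplied assumptions: the API host
    and the exact authorization scheme are not specified in this document.
    """
    url = f"https://{api_host}/v2/saved_queries"
    if query_id is not None:
        # retrieve and delete address a single Saved Query by its ID
        url = f"{url}/{query_id}"
    headers = {"Authorization": auth_header}
    data = None
    if body is not None:
        # create sends the Request Body properties; JSON encoding is assumed
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


HOST = "edp.example.com"        # hypothetical placeholder host
AUTH = "Bearer EXAMPLE_TOKEN"   # hypothetical credential format

# create: POST /v2/saved_queries with the Request Body properties
create_req = build_saved_query_request(
    HOST,
    AUTH,
    "POST",
    body={
        "name": "Interesting genes",
        "description": "Variants in a set of genes of interest",
        "dataset": "quartzbio:Public:/ClinVar/5.2.0-20210110/Variants-GRCH37",
        "params": {"ordering": ["-clinical_significance"], "limit": 100},
        "is_shared": True,
    },
)

# list: GET /v2/saved_queries
list_req = build_saved_query_request(HOST, AUTH, "GET")

# delete: DELETE /v2/saved_queries/{ID}
delete_req = build_saved_query_request(HOST, AUTH, "DELETE", query_id="SAVED_QUERY_ID")

for req in (create_req, list_req, delete_req):
    print(req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would send it; with a real host and valid credentials, the response should match the corresponding table above.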