Source code for bioservices.chembl

#
#  This file is part of bioservices software
#
#  Copyright (c) 2013-2014 - EBI-EMBL
#
#  File author(s):
#
#
#  Distributed under the GPLv3 License.
#  See accompanying file LICENSE.txt or copy at
#      http://www.gnu.org/licenses/gpl-3.0.html
#
#  website: https://github.com/cokelaer/bioservices
#  documentation: http://packages.python.org/bioservices
#
##############################################################################
"""This module provides a class :class:`ChEMBL`

.. topic:: What is ChEMBL

    :URL:  https://www.ebi.ac.uk/chembl
    :REST: https://www.ebi.ac.uk/chembl/api/data

    .. highlights::

        "Using the ChEMBL web service API users can retrieve data from the ChEMBL
        database in a programmatic fashion. The following list defines the currently
        supported functionality and defines the expected inputs and outputs of each
        method."

        -- From ChEMBL web page Dec 2012

"""
import os
from bioservices.services import REST
import webbrowser
from bioservices import logger

logger.name = __name__
try:
    from urllib.parse import quote
except ImportError:
    from urllib import quote

__all__ = ["ChEMBL"]


class ChEMBL(REST):
    """New ChEMBL API bioservices 1.6.0

    **Resources**

    The ChEMBL database is made of a set of resources. We recommend looking at
    https://arxiv.org/pdf/1607.00378.pdf

    Here we first create an instance and retrieve the first 1000 molecules from
    the database using the **limit** parameter.

    .. doctest::

        >>> from bioservices import ChEMBL
        >>> c = ChEMBL()
        >>> res = c.get_molecule(limit=1000)

    The returned object is a list of 1000 records, each of them being a
    dictionary.

    The **molecule** resource is actually a very large one and one may want to
    skip some entries. This is possible using the **offset** parameter as
    follows::

        # Retrieve 1000 molecules skipping the first 50
        res = c.get_molecule(limit=1000, offset=50)

    If you want to know all resources available and the number of entries in
    each resource, use::

        status = c.get_status_resources()

    For instance, the total number of entries in the *mechanism* resource
    (about 5,000) is given by::

        print(status['mechanism'])

    To retrieve all entries from the mechanism resource, you can either set
    limit to a value large enough::

        res = c.get_mechanism(limit=1000000)

    or simply set it to -1::

        res = c.get_mechanism(limit=-1)

    All resource methods behave in the same way. Those resource methods are:

    :meth:`get_activity`, :meth:`get_assay`, :meth:`get_ATC`,
    :meth:`get_binding_site`, :meth:`get_biotherapeutic`, :meth:`get_cell_line`,
    :meth:`get_chembl_id_lookup`, :meth:`get_compound_record`,
    :meth:`get_compound_structural_alert`, :meth:`get_document`,
    :meth:`get_document_similarity`, :meth:`get_document_term`,
    :meth:`get_drug`, :meth:`get_drug_indication`, :meth:`get_go_slim`,
    :meth:`get_mechanism`, :meth:`get_metabolism`, :meth:`get_molecule`,
    :meth:`get_molecule_form`, :meth:`get_protein_class`, :meth:`get_source`,
    :meth:`get_target`, :meth:`get_target_component`,
    :meth:`get_target_prediction`, :meth:`get_target_relation`,
    :meth:`get_tissue`.

    **3 ways of getting items**

    1. Retrieve everything::

        c.get_molecule(limit=-1)

    2. Retrieve a specific entry::

        c.get_molecule("CHEMBL24")

    3. Retrieve a set of entries::

        c.get_molecule(["CHEMBL24", "CHEMBL2"])

    **Filtering and Ordering**

    For ordering the results, we provide a simple method :meth:`order_by` that
    sorts the records according to the values of a specific key. Any data
    returned by a resource method (a list of dictionaries) can be processed
    through this method::

        c = ChEMBL()
        data = c.get_drug(limit=100)
        ordered_data = c.order_by(data, 'chirality')

    If you want to order using a key within a key, for instance order by
    molecular weight stored in the *molecular_properties* key, use the double
    underscore method as follows::

        c = ChEMBL()
        data = c.get_drug(limit=100)
        ordered_data = c.order_by(data, 'molecular_properties__mw_freebase')

    For filtering, it is possible to apply search filters to any resource.
    For example, it is possible to return all ChEMBL targets that contain the
    term 'kinase' in the pref_name attribute::

        c.get_target(filters='pref_name__contains=kinase')

    The pattern for applying a filter is as follows::

        [field]__[filter_type]=[value]

    where field has to be found by the user. Simply introspect the content of
    an item returned by the resource. For instance::

        c.get_target(limit=1) # to get one entry

    Let us consider the case of the **molecule** resource. You can retrieve the
    first 10 molecules using e.g.::

        res = c.get_molecule(limit=10)

    If you look at the first entry using res[0], you will get about 38 keys,
    for instance **molecule_properties** or **molecule_chembl_id**. You can
    filter the molecules to keep only the molecule_chembl_id that match either
    CHEMBL25 or CHEMBL1000 using::

        res = c.get_molecule(filters='molecule_chembl_id__in=CHEMBL25,CHEMBL1000')

    The **molecule_properties** field is actually a dictionary. For instance,
    inside the **molecule_properties** field, you have the molecular weight
    (mw_freebase). So to apply a filter on it, you need to use the following
    code (to keep molecules with molecular weight greater than or equal to
    300)::

        res = c.get_molecule(filters='molecule_properties__mw_freebase__gte=300')

    Here are the different types of filtering:

    =============== =============================================
    Filter Type     Description
    =============== =============================================
    exact (iexact)  Exact match with query
    contains        wild card search with query
    startswith      starts with query
    endswith        ends with query
    regex           regular expression query
    gt (gte)        Greater than (or equal)
    lt (lte)        Less than (or equal)
    range           Within a range of values
    in              Appears within list of query values
    isnull          Field is null
    search          Special type of filter allowing a full text
                    search based on Solr queries.
    =============== =============================================

    Several filters can be applied at the same time using a list::

        filters = ['molecule_properties__mw_freebase__gte=300']
        filters += ['molecule_properties__alogp__gte=3']
        res = c.get_molecule(filters=filters)

    **Use Cases: (inspired from ChEMBL documentation)**

    Search molecules by synonym::

        >>> from bioservices import ChEMBL
        >>> c = ChEMBL()
        >>> res = c.search_molecule('aspirin')

    or SMILES, or InChIKey, or ChEMBL ID::

        >>> res = c.get_molecule("CC(=O)Oc1ccccc1C(=O)O")
        >>> res = c.get_molecule("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
        >>> res = c.get_molecule('CHEMBL25')

    Several molecules at the same time can also be retrieved using lists::

        >>> res = c.get_molecule(['CHEMBL25', 'CHEMBL2'])

    Search target by gene name::

        >>> res = c.search_target("GABRB2")
        >>> len(res['targets'])
        18

    or directly in the target synonym field::

        >>> res = c.get_target(filters='target_synonym__icontains=GABRB2')

    .. note:: icontains is the case-insensitive variant of contains, so it
        is more permissive (you get more entries with icontains).

    Having a list of molecule ChEMBL IDs, get the UniProt accession
    numbers that map to those compounds::

        from collections import defaultdict

        # First, get some IDs of approved drugs (about 2000 molecules)
        c = ChEMBL()
        drugs = c.get_approved_drugs()
        IDs = [x['molecule_chembl_id'] for x in drugs]

        # we jump from compounds to targets through activities
        # Here this is a one to many mapping so we initialise a default
        # dictionary.
        compound2target = defaultdict(set)

        filter = "molecule_chembl_id__in={}"
        for i in range(0, len(IDs), 50):
            activities = c.get_activity(filters=filter.format(",".join(IDs[i:i+50])))
            # get target ChEMBL IDs from activities
            for act in activities:
                compound2target[act['molecule_chembl_id']].add(act['target_chembl_id'])

        # What we need is to get targets for all targets found in the previous
        # step. For each compound/drug there are hundreds of targets though. And
        # we will call the get_target for each list of hundreds targets. This
        # will take forever. Instead, because there are *only* 12,000 targets,
        # let us download all of them ! This took about 4 minutes on this test but
        # if you use the cache, next time it will be much much quicker. This is
        # not done at the activities level because there are too many entries
        targets = c.get_target(limit=-1)

        # identifies all target chembl id to easily retrieve the entry later on
        target_names = [target['target_chembl_id'] for target in targets]

        # retrieve all uniprot accessions for all targets of each compound
        for compound, targs in compound2target.items():
            accessions = set()
            for target in targs:
                index = target_names.index(target)
                accessions = accessions.union([comp['accession']
                    for comp in targets[index]['target_components']])
            compound2target[compound] = accessions

    In version 1.6.0 of bioservices, you can simply use::

        res = c.compounds2accession(IDs)

    Get Target type count for all targets::

        import collections
        collections.Counter([x['target_type'] for x in targets])

    Find compounds similar to given SMILES query with similarity threshold of
    85%::

        >>> SMILE = "CN(CCCN)c1cccc2ccccc12"
        >>> c.get_similarity(SMILE, similarity=85)

    Find compounds similar to aspirin (CHEMBL25) with similarity threshold of
    70%::

        # search for aspirin in all molecules and from the first hit
        # get the ChEMBL ID
        >>> molecules = c.search_molecule("aspirin")['molecules']
        >>> chembl_id = molecules[0]['molecule_chembl_id']
        # now use the :meth:`get_similarity` given the ID
        >>> res = c.get_similarity(chembl_id, similarity=70)

    Perform substructure search using SMILES or ChEMBL ID::

        >>> res = c.get_substructure("CN(CCCN)c1cccc2ccccc12")
        >>> res = c.get_substructure("CHEMBL25")

    Obtain the pChEMBL value for a compound::

        res = c.get_activity(filters=['pchembl_value__isnull=False',
                                      'molecule_chembl_id=CHEMBL25'])

    Obtain the pChEMBL value for a compound and target::

        res = c.get_activity(filters=['pchembl_value__isnull=False',
                                      'molecule_chembl_id=CHEMBL25',
                                      'target_chembl_id=CHEMBL612545'])

    Get all approved drugs::

        c.get_approved_drugs(max_phase=4)

    Get approved drugs for lung cancer

    The ChEMBL API significantly changed in 2018 and the new version of
    bioservices (1.6.0) had to change the API as well, which has been
    simplified. Here below are some correspondences between the previous and
    the new API.

    ========================================== ==========================
    bioservices before 1.6.0                   After 1.6.0
    ========================================== ==========================
    get_compounds_substructure                 get_substructure
    get_compounds_similar_to_SMILES            get_similarity(SMILE)
    get_compounds_by_chemblId(ID)              get_similarity(ID)
    get_individual_compounds_by_inChiKey       get_molecule(inchikey)
    get_compounds_by_chemblId_form             get_molecule_form
    get_compounds_by_chemblId_drug_mechanism   get_mechanism(ID)
    get_target_by_chemblId(ID)                 get_target(ID)
    get_image_of_compounds_by_chemblId         get_image
    etc
    ========================================== ==========================

    :references:
        - https://arxiv.org/pdf/1607.00378.pdf
        - https://www.ebi.ac.uk/chembl/api/data/docs

    """

    _url = "https://www.ebi.ac.uk/chembl/api/data"

    def __init__(self, verbose=False, cache=False):
        super(ChEMBL, self).__init__(url=ChEMBL._url, name="ChEMBL", verbose=verbose, cache=cache)
        self.format = "json"

    def _get_data(self, name, params):

        # keep the number of events we want and original offset
        max_data = params["limit"]
        offset = params["offset"]

        # I noticed that
        # if offset + limit > total_count, then limit is set to 1000 - offset
        # Not sure whether it is a bug or intended behaviour but this caused
        # some issues during the debugging.
        # So http_get("mechanism?format=json&limit=10000&offset=10")
        # returns 990 entries and not 1000 as expected.
        # if a resource is small (e.g. tissue has 655 < 1000 entries) there is
        # no such issue.
        # So, the best is to constrain limit to 1000
        params["limit"] = 1000  # for the first call

        # The limit used in all other calls
        limit = 1000

        res = self.http_get("{}".format(name), params=params)
        self._check_request(res)

        # get rid of page_meta key/value
        self.page_meta = res["page_meta"]
        keys = list(res.keys())
        keys.remove("page_meta")
        names = keys[0]  # the parameter name in plural form

        # keep first chunk of data
        data = res[names]

        if max_data == -1:
            max_data = res["page_meta"]["total_count"]
        elif max_data > res["page_meta"]["total_count"]:
            max_data = res["page_meta"]["total_count"]

        N = max_data
        from easydev import Progress

        pb = Progress(N)
        count = 1

        # fetch the remaining chunks, 1000 entries at a time
        while res["page_meta"]["next"] and len(data) < max_data:
            params["limit"] = limit
            params["offset"] = limit * count + offset
            res = self.http_get("{}".format(name), params=params)
            data += res[names]
            count += 1
            pb.animate(len(data))
            self.page_meta = res["page_meta"]

        if self.page_meta["next"]:
            offset = self.page_meta["offset"]
            total = self.page_meta["total_count"] - len(data) - int(offset)
            self.logging.warning(
                "More data available ({}). rerun with higher "
                "limit and/or offset {}. Check content of page_meta"
                " attribute".format(total, offset)
            )

        if len(data) > max_data:
            return data[0:max_data]
        else:
            return data

    def _check_request(self, res):
        # If there is no output because of wrong query, a 404 is returned.
        if isinstance(res, int):
            raise ValueError("Invalid request (HTTP status {}). Check your query and parameters".format(res))

    def _get_this_service(self, name, query, params={"limit": 20, "offset": 0}):
        """

        1. if query is None, calls the resources URL/data/[resource]
        2. if query is a string, calls URL/data/[resource]/ID
        3. if query is a list of IDS, calls URL/data/[resource]/set/[IDS]

        In cases 1 and 3, returns a list of dictionaries (case 1 also
        populates the attribute page_meta).
        In case 2, there is only one requested ID so returns a dictionary (not
        a list of dictionaries).
        """
        # look at any filters provided by the user
        if params["filters"] is None:
            del params["filters"]
        elif isinstance(params["filters"], list):
            for filter in params["filters"]:
                assert filter.count("=") == 1
                key, value = filter.split("=")
                params[key] = value
            del params["filters"]
        else:
            assert params["filters"].count("=") == 1
            k, v = params["filters"].split("=")
            del params["filters"]
            params[k] = v

        params["format"] = self.format

        # Here, we will switch between several ways of using each
        # service.
        if query is None:
            res = self._get_data(name, params)
        # user may use integer, floats or strings.
        elif isinstance(query, (str, int, float)):
            res = self.http_get("{}/{}".format(name, query), params=params)
            self._check_request(res)
        elif isinstance(query, list):
            assert params["limit"] <= 1000, "limit must be at most 1000"
            ids = ";".join([str(x) for x in query])
            res = self.http_get("{}/set/{}".format(name, ids), params=params)
            self._check_request(res)
            # Note that there is no page_meta key in the returned object but a
            # single key that is the plural form of the resource except if some
            # entries are not found. In such case, a not_found key is also
            # present.
            if "not_found" in res.keys():
                self.logging.warning("Some entries were not found: {}".format(res["not_found"]))
                self.not_found = res["not_found"]
                del res["not_found"]
            names = list(res.keys())[0]
            res = res[names]
        return res

    def _search(self, name, query, params):
        # Check the validity of limits
        assert params["limit"] > 0, "limits must be positive"
        assert params["limit"] <= 1000, "limits must be at most 1000"

        res = self.http_get("{}/search.{}?q={}".format(name, self.format, query), params=params)

        if isinstance(res, int):
            self.logging.warning("Invalid request for {} {}. Check your parameters".format(name, params))
            return {}

        if "page_meta" in res and res["page_meta"]["next"]:
            Next = res["page_meta"]["next"]
            offset = Next.split("&offset=")[1]
            self.logging.warning("More data available with offset {}".format(offset))

        return res

    def search_activity(self, query, limit=20, offset=0):
        """Activity values recorded in an Assay"""
        params = {"limit": limit, "offset": offset}
        return self._search("activity", query, params=params)

    def get_activity(self, query=None, limit=20, offset=0, filters=None):
        """Activity values recorded in an Assay"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("activity", query, params=params)

    def search_assay(self, query, limit=20, offset=0):
        """Assay details as reported in source document"""
        params = {"limit": limit, "offset": offset}
        return self._search("assay", query, params=params)

    def get_assay(self, query=None, limit=20, offset=0, filters=None):
        """Assay details as reported in source Document/Dataset

        >>> c.get_assay("CHEMBL1217643")

        """
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("assay", query, params=params)

    def get_ATC(self, limit=20, offset=0, filters=None):
        """WHO ATC Classification for drugs

        ::

            c.get_ATC()
            c['atc']

        .. note:: get_molecule returns 'molecules' and likewise all methods
            return a dictionary whose key is the plural of the method name.
            This is quite consistent through the API except for this one
            because it is an acronym.
        """
        params = {"limit": limit, "offset": offset, "filters": filters}
        query = None
        return self._get_this_service("atc_class", query, params=params)

    def get_binding_site(self, limit=20, offset=0, filters=None):
        """Target binding site definition"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        query = None
        return self._get_this_service("binding_site", query, params=params)

    def get_biotherapeutic(self, limit=20, offset=0, filters=None):
        """Biotherapeutic molecules, which include HELM notation and sequence data"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        query = None
        return self._get_this_service("biotherapeutic", query, params=params)

    def get_cell_line(self, limit=20, offset=0, filters=None):
        """Cell line information"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        query = None
        return self._get_this_service("cell_line", query, params=params)

    def search_chembl_id_lookup(self, query, limit=20, offset=0):
        """Look up ChEMBL Id entity type"""
        params = {"limit": limit, "offset": offset}
        return self._search("chembl_id_lookup", query, params=params)

    def get_chembl_id_lookup(self, query=None, limit=20, offset=0, filters=None):
        """Look up ChEMBL Id entity type"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("chembl_id_lookup", query, params=params)

    def get_compound_record(self, query=None, limit=20, offset=0, filters=None):
        """Occurrence of a given compound in a specific document"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("compound_record", query, params=params)

    def get_compound_structural_alert(self, query=None, limit=20, offset=0, filters=None):
        """Indicates certain anomaly in compound structure"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        # the query argument is forwarded as is (it used to be overwritten by None)
        return self._get_this_service("compound_structural_alert", query, params=params)

    def search_document(self, query, limit=20, offset=0):
        """Document/Dataset from which Assays have been extracted"""
        params = {"limit": limit, "offset": offset}
        return self._search("document", query, params=params)

    def get_document(self, query=None, limit=20, offset=0, filters=None):
        """Document/Dataset from which Assays have been extracted"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("document", query, params=params)

    def get_document_similarity(self, query=None, limit=20, offset=0, filters=None):
        """Provides documents similar to a given one"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("document_similarity", query, params=params)

    def get_document_term(self, query=None, limit=20, offset=0, filters=None):
        """Provides keywords extracted from a document using the TextRank algorithm"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("document_term", query, params=params)

    def get_approved_drugs(self, max_phase=4, maxdrugs=1000000):
        """Return all approved drugs

        :param max_phase: 4 by default for approved drugs.
        :param maxdrugs: maximum number of drugs to retrieve (passed as the
            *limit* of :meth:`get_drug`).
        """
        filters = "development_phase__exact={}".format(max_phase)
        data = self.get_drug(filters=filters, limit=maxdrugs)
        return data

    def get_drug(self, query=None, limit=20, offset=0, filters=None):
        """Approved drugs information, including (but not limited to) applicants,
        patent numbers and research codes"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("drug", query, params=params)

    def get_drug_indication(self, query=None, limit=20, offset=0, filters=None):
        """Joins drugs with diseases providing references to relevant sources"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("drug_indication", query, params=params)

    def get_go_slim(self, query=None, limit=20, offset=0, filters=None):
        """GO slim ontology"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("go_slim", query, params=params)

    def get_mechanism(self, query=None, limit=20, offset=0, filters=None):
        """Mechanism of action information for FDA-approved drugs"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("mechanism", query, params=params)

    def get_metabolism(self, query=None, limit=20, offset=0, filters=None):
        """Metabolic pathways with references"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("metabolism", query, params=params)

    def search_molecule(self, query, limit=20, offset=0):
        params = {"limit": limit, "offset": offset}
        return self._search("molecule", query, params=params)

    def get_molecule(self, query=None, limit=20, offset=0, filters=None):
        """Returns some molecules

        :param limit: number of molecules to retrieve
        :param offset: number of molecules to skip before retrieving molecules.
        :return: a list of molecule records (dictionaries), or a single
            dictionary if *query* is a single identifier.

        There are 1,800,000 molecules (Jan 2019). The service returns at most
        1,000 molecules per request, so larger values of *limit* are fetched
        in successive chunks of 1,000 entries.

        ::

            c.get_molecule('QFFGVLORLPOAEC-SNVBAGLBSA-N')
            c.get_molecule("CC(=O)Oc1ccccc1C(=O)O")

        """
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("molecule", query, params=params)

    def get_molecule_form(self, query=None, limit=20, offset=0, filters=None):
        """Relationships between molecule parents and salts

        >>> s.get_molecule_form("CHEMBL2")['molecule_forms']
        [{'is_parent': 'True',
          'molecule_chembl_id': 'CHEMBL2',
          'parent_chembl_id': 'CHEMBL2'},
         {'is_parent': 'False',
          'molecule_chembl_id': 'CHEMBL1558',
          'parent_chembl_id': 'CHEMBL2'},
         {'is_parent': 'False',
          'molecule_chembl_id': 'CHEMBL1347191',
          'parent_chembl_id': 'CHEMBL2'}]

        """
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("molecule_form", query, params=params)

    def get_organism(self, query=None, limit=20, offset=0, filters=None):
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("organism", query, params=params)

    def search_protein_class(self, query, limit=20, offset=0):
        params = {"limit": limit, "offset": offset}
        return self._search("protein_class", query, params=params)

    def get_protein_class(self, query=None, limit=20, offset=0, filters=None):
        """Protein family classification of TargetComponents"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("protein_class", query, params=params)

    def get_substructure(self, structure, limit=20, offset=0, filters=None):
        """Molecule substructure search

        :param structure: a valid / existing substructure to look for in all
            molecules, given as a SMILES string, a ChEMBL ID or an InChIKey.
        :return: list of molecules corresponding to the search

        ::

            >>> from bioservices import ChEMBL
            >>> c = ChEMBL()
            >>> res = c.get_substructure("CC(=O)Oc1ccccc1C(=O)O")

        Other examples::

            # Substructure search against ChEMBL using aspirin
            # SMILES string
            c.get_substructure("CC(=O)Oc1ccccc1C(=O)O")

            # Substructure search against ChEMBL using aspirin
            # CHEMBL_ID
            c.get_substructure("CHEMBL25")

            # Substructure search against ChEMBL using aspirin
            # InChIKey
            c.get_substructure("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")

        The 'Substructure' and 'Similarity' web service resources allow for the
        chemical content of ChEMBL to be searched. Similar to the other
        resources, these search based resources accept filtering, paging and
        ordering arguments. These methods accept SMILES, InChI Key and molecule
        ChEMBL_ID as arguments and in the case of similarity searches an
        additional identity cut-off is needed. Example molecule searches are
        given above.

        Searching with InChI key is only possible for InChI keys found in the
        ChEMBL database. The system does not try to convert an InChI key to a
        chemical representation.

        """
        # we use quote to format the SMILES/InChIKey for URL parsing
        structure = quote(structure)

        params = {"limit": limit, "offset": offset, "filters": filters}
        query = None
        return self._get_this_service("substructure/{}".format(structure), query, params=params)

    def get_similarity(self, structure, similarity=80, limit=20, offset=0, filters=None):
        """Molecule similarity search

        :param structure: the query structure to compare with all molecules,
            given as a SMILES string, a ChEMBL ID or an InChIKey.
        :param similarity: must be an integer in the range [70, 100]
        :return: list of **molecules** corresponding to the search

        ::

            >>> from bioservices import ChEMBL
            >>> c = ChEMBL()
            >>> res = c.get_similarity("CC(=O)Oc1ccccc1C(=O)O", 80)
            >>> res['molecules']

        Here are more examples::

            # Similarity (80% cut off) search against ChEMBL using
            # aspirin SMILES string
            c.get_similarity("CC(=O)Oc1ccccc1C(=O)O")  # 80 by default

            # Similarity (80% cut off) search against ChEMBL using
            # aspirin CHEMBL_ID
            c.get_similarity("CHEMBL25")

            # Similarity (80% cut off) search against ChEMBL
            # using aspirin InChI Key
            c.get_similarity("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")

        The 'Substructure' and 'Similarity' web service resources allow for the
        chemical content of ChEMBL to be searched. Similar to the other
        resources, these search based resources accept filtering, paging and
        ordering arguments. These methods accept SMILES, InChI Key and molecule
        ChEMBL_ID as arguments and in the case of similarity searches an
        additional identity cut-off is needed. Example molecule searches are
        given above.

        Searching with InChI key is only possible for InChI keys found in the
        ChEMBL database. The system does not try to convert an InChI key to a
        chemical representation.

        """
        # we use quote to format the SMILES/InChIKey for URL parsing
        structure = quote(structure)

        assert isinstance(similarity, int)
        assert similarity >= 70 and similarity <= 100, "similarity must be in the range [70, 100]"

        params = {"limit": limit, "offset": offset, "filters": filters}
        query = None
        return self._get_this_service("similarity/{}/{}".format(structure, similarity), query, params=params)

    def get_source(self, query=None, limit=20, offset=0, filters=None):
        """Document/Dataset source"""
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("source", query, params=params)

    def search_target(self, query, limit=20, offset=0):
        """Targets (protein and non-protein) defined in Assay"""
        params = {"limit": limit, "offset": offset}
        return self._search("target", query, params=params)

    def get_target(self, query=None, limit=20, offset=0, filters=None):
        """Targets (protein and non-protein) defined in Assay

        >>> from bioservices import *
        >>> s = ChEMBL(verbose=False)
        >>> resjson = s.get_target('CHEMBL240')

        """
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("target", query, params=params)

    def get_target_component(self, query=None, limit=20, offset=0, filters=None):
        """Target sequence information (A Target may have 1 or more sequences)

        ::

            res = c.get_target_component(1)
            res['sequence']

        """
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("target_component", query, params=params)

    def get_target_prediction(self, query=None, limit=20, offset=0, filters=None):
        """Predicted binding of a molecule to a given biological target

        ::

            >>> res = c.get_target_prediction(1)
            >>> res['molecule_chembl_id']
            'CHEMBL2'

        """
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("target_prediction", query, params=params)

    def get_target_relation(self, query=None, limit=20, offset=0, filters=None):
        """Describes relations between targets

        ::

            >>> c.get_target_relation('CHEMBL261')
            {'related_target_chembl_id': 'CHEMBL2095180',
             'relationship': 'SUBSET OF',
             'target_chembl_id': 'CHEMBL261'}

        """
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("target_relation", query, params=params)

    def get_tissue(self, query=None, limit=20, offset=0, filters=None):
        """Tissue classification

        ::

            c.get_tissue(filters=['pref_name__contains=cervix'])

        """
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("tissue", query, params=params)

    def get_xref_source(self, query=None, limit=20, offset=0, filters=None):
        params = {"limit": limit, "offset": offset, "filters": filters}
        return self._get_this_service("xref_source", query, params=params)

    def get_image(self, query, dimensions=500, format="png", save=True, view=True, engine="indigo"):
        r"""Get the image of a given compound in PNG or SVG format.

        :param str query: a valid compound ChEMBLId or a list/tuple
            of valid compound ChEMBLIds.
        :param format: png, svg. json not supported
        :param int dimensions: size of image in pixels.
            An integer z (:math:`1 \leq z \leq 500`)
        :param save: currently unused; images are always written to the
            current working directory.
        :param engine: Defaults to indigo. Can be rdkit or indigo
        :param bool view: show the image if set to True.
        :return: a dictionary with keys *filenames* (paths used to save the
            figures), *images* and *chemblids* (different from ChEMBL API)

        .. plot::
            :include-source:
            :width: 50%

            >>> from pylab import imread, imshow
            >>> from bioservices import *
            >>> s = ChEMBL(verbose=False)
            >>> res = s.get_image(31863)
            >>> imshow(imread(res['filenames'][0]))

        .. todo:: ignorecoords option
        """
        # NOTE: not async requests here.
        self.devtools.check_range(dimensions, 1, 500)
        self.devtools.check_param_in_list(engine, ["rdkit", "indigo"])
        self.devtools.check_param_in_list(format, ["png", "svg"])
        queries = self.devtools.to_list(query)

        res = {"filenames": [], "images": [], "chemblids": []}
        for query in queries:
            req = "image/{}".format(query)
            params = {"engine": engine, "format": format, "dimensions": dimensions}
            target_data = self.http_get(req, frmt=None, params=params)

            # images are saved in the current working directory
            file_out = os.getcwd()
            if format == "png":
                file_out += "/%s.png" % query
                with open(file_out, "wb") as thisfile:
                    thisfile.write(bytes(target_data))
            elif format == "svg":
                file_out += "/%s.svg" % query
                with open(file_out, "w") as thisfile:
                    thisfile.write(target_data)
            self.logging.info("saved to %s" % file_out)

            fout = file_out
            res["chemblids"].append(query)
            res["filenames"].append(fout)
            res["images"].append(target_data)
        if view:
            webbrowser.open(res["filenames"][0])
        return res

    def get_status(self):
        """Return version of the DB and number of entries

        Returns the number of entries for activities, compound_records,
        distinct_compounds (molecule), publications (document), targets,
        etc...

        .. seealso:: :meth:`get_status_resources`
        """
        return self.http_get("status?format=json")

    def get_status_resources(self):
        """Return number of entries for all resources

        .. note:: not in the ChEMBL API.

        .. versionchanged:: 1.7.3 (removed target_prediction and document_term)

        """

        def _local_get(this):
            # a single entry is enough: only page_meta/total_count is needed
            params = {"limit": 1, "offset": 0}
            return self.http_get("{}?format=json".format(this), params=params)["page_meta"]["total_count"]

        data = {}
        for this in [
            "activity",
            "assay",
            "atc_class",
            "cell_line",
            "binding_site",
            "biotherapeutic",
            "chembl_id_lookup",
            "compound_record",
            "compound_structural_alert",
            "document",
            "document_similarity",
            "drug",
            "drug_indication",
            "go_slim",
            "mechanism",
            "metabolism",
            "molecule",
            "molecule_form",
            "protein_class",
            "source",
            "target",
            "target_component",
            "target_relation",
            "tissue",
        ]:
            self.logging.info("Looking at {}".format(this))
            try:
                data[this] = _local_get(this)
            except Exception:
                self.logging.warning("{} resource seems down".format(this))
        return data

    def order_by(self, data, name, ascending=True):
        """Ordering data

        We use the same convention as the ChEMBL API, using the double
        underscore to indicate a hierarchy in the dictionary. So to access
        d['a']['b'], we use a__b as the input **name** parameter.

        We only allow 3 levels, e.g., a__b__c

        ::

            data = c.get_molecule()
            data1 = c.order_by(data, 'molecule_chembl_id')
            data2 = c.order_by(data, 'molecule_properties__alogp')

        .. note:: the ChEMBL API allows for ordering but we do not use
            that API. Instead, we provide this generic function.

        """
        # FIXME sorry no time for a better solution
        # we allow only 3 levels using 3 if
        if name.count("__") == 0:
            data = sorted(data, key=lambda k: k[name], reverse=not ascending)
        elif name.count("__") == 1:
            n1, n2 = name.split("__")
            data = sorted(data, key=lambda k: k[n1][n2], reverse=not ascending)
        elif name.count("__") == 2:
            n1, n2, n3 = name.split("__")
            data = sorted(data, key=lambda k: k[n1][n2][n3], reverse=not ascending)
        else:
            raise NotImplementedError(
                "Please submit an issue on https://github.com/cokelaer/bioservices "
                "to allow this level of ordering, together with your code example."
            )
        return data

    def compounds2accession(self, compounds):
        """For each compound, identifies the target and corresponding UniProt
        accession number

        This is not part of ChEMBL API

        ::

            # we recommend to use cache if you use this method regularly
            c = ChEMBL(cache=True)
            drugs = c.get_approved_drugs()

            # to speed up example
            drugs = drugs[0:20]
            IDs = [x['molecule_chembl_id'] for x in drugs]

            c.compounds2accession(IDs)

        """
        # we jump from compounds to targets through activities
        # Here this is a one to many mapping so we initialise a default
        # dictionary.
        from collections import defaultdict

        compound2target = defaultdict(set)

        filter = "molecule_chembl_id__in={}"
        from easydev import Progress

        if not isinstance(compounds, list):
            compounds = list(compounds)

        pb = Progress(len(compounds))
        for i in range(0, len(compounds)):
            # FIXME could get activities by bunch using
            # ",".join(compounds[i:i+10]) for example
            activities = self.get_activity(filters=filter.format(compounds[i]))
            # get target ChEMBL IDs from activities
            for act in activities:
                compound2target[act["molecule_chembl_id"]].add(act["target_chembl_id"])
            pb.animate(i + 1)

        # What we need is to get targets for all targets found in the previous
        # step. For each compound/drug there are hundreds of targets though. And
        # we will call the get_target for each list of hundreds targets. This
        # will take forever. Instead, because there are *only* 12,000 targets,
        # let us download all of them ! This took about 4 minutes on this test but
        # if you use the cache, next time it will be much much quicker. This is
        # not done at the activities level because there are too many entries
        targets = self.get_target(limit=-1)

        # identifies all target chembl id to easily retrieve the entry later on
        target_names = [target["target_chembl_id"] for target in targets]

        # retrieve all uniprot accessions for all targets of each compound
        for compound, targs in compound2target.items():
            accessions = set()
            for target in targs:
                index = target_names.index(target)
                accessions = accessions.union([comp["accession"] for comp in targets[index]["target_components"]])
            compound2target[compound] = accessions

        return compound2target
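
The double-underscore convention used by ``order_by`` (``a__b__c`` standing for ``d['a']['b']['c']``) does not have to be limited to three levels. The following is only a minimal sketch, not part of bioservices: the helper name ``order_by_nested`` and the sample records (identifiers and values) are hypothetical and chosen for illustration. It walks down the nested dictionaries with ``functools.reduce`` so that any depth is handled by the same code path.

```python
from functools import reduce


def order_by_nested(data, name, ascending=True):
    """Sort a list of dictionaries by a possibly nested key.

    Hypothetical helper (not in bioservices). ``name`` uses double
    underscores to indicate the hierarchy, e.g. ``a__b__c`` refers to
    ``record['a']['b']['c']``.
    """
    keys = name.split("__")

    def lookup(record):
        # Walk down the nested dictionaries one key at a time.
        return reduce(lambda current, key: current[key], keys, record)

    return sorted(data, key=lookup, reverse=not ascending)


# Made-up records that only mimic the layout of molecule entries.
records = [
    {"molecule_chembl_id": "ID_A", "molecule_properties": {"mw_freebase": 310.4}},
    {"molecule_chembl_id": "ID_B", "molecule_properties": {"mw_freebase": 180.2}},
    {"molecule_chembl_id": "ID_C", "molecule_properties": {"mw_freebase": 250.0}},
]

by_weight = order_by_nested(records, "molecule_properties__mw_freebase")
print([r["molecule_chembl_id"] for r in by_weight])  # ['ID_B', 'ID_C', 'ID_A']
```

A missing key raises ``KeyError``, as the lambda-based lookups in ``order_by`` would; records lacking the field would have to be filtered out beforehand.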