{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
This notebook illustrates some of the ChEMBL web services using the BioServices ChEMBL module.
\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "from bioservices import ChEMBL\n", "%pylab inline\n", "matplotlib.rcParams['savefig.dpi'] = 2 * matplotlib.rcParams['savefig.dpi'] \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Note: We are going to play with the ChEMBL service here. We will also use the Pandas and Matplotlib packages, which are not part of BioServices and are not installed along with it. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Check that pandas is installed\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# First, let us create an instance of the ChEMBL service\n", "s = ChEMBL()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Compounds related" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step is to get familiar with the input expected by most of the functions: a valid identifier. ChEMBL identifiers are of the form 'CHEMBL' + number, e.g., CHEMBL2. 
Let us see the output:\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{u'compound': {u'acdBasicPka': 6.52,\n", " u'acdLogd': 2.09,\n", " u'acdLogp': 2.14,\n", " u'alogp': 2.11,\n", " u'chemblId': u'CHEMBL2',\n", " u'knownDrug': u'Yes',\n", " u'medChemFriendly': u'Yes',\n", " u'molecularFormula': u'C19H21N5O4',\n", " u'molecularWeight': 383.4,\n", " u'numRo5Violations': 0,\n", " u'passesRuleOfThree': u'No',\n", " u'preferredCompoundName': u'PRAZOSIN',\n", " u'rotatableBonds': 4,\n", " u'smiles': u'COc1cc2nc(nc(N)c2cc1OC)N3CCN(CC3)C(=O)c4occc4',\n", " u'species': u'NEUTRAL',\n", " u'stdInChiKey': u'IENZQIKPVFGBNW-UHFFFAOYSA-N',\n", " u'synonyms': u'CP-12299,Prazosin'}}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.get_compounds_by_chemblId('CHEMBL2')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is a dictionary with information about that particular ChEMBL identifier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also call the previous method with a list of identifiers instead of a single one, which lets us query identifiers in a systematic way. One issue is that some identifiers do not exist (e.g., CHEMBL7). In that particular case, the number 404 is returned. For instance, let us try to fetch the first 1000 identifiers." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1000" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = s.get_compounds_by_chemblId(['CHEMBL%s' % i for i in range(0,1000)])\n", "len(res)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "