1. Quick Start

1.1. Introduction

BioServices provides access to several Web Services. Each service requires some expertise of its own. In this Quick Start section, we cover neither all the services nor all their functionalities. However, it should give you a good overview of what you can do with BioServices, from both the user and the developer points of view.

Before starting, let us recall what Web Services are. They provide access to databases or applications via a web interface based on the SOAP/WSDL or the REST technologies. These technologies allow programmatic access, which BioServices takes advantage of.

The REST technology uses URLs so there is no external dependency. You simply need to build a well-formatted URL and you will retrieve an XML document that you can consume with your preferred technology platform.
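
As a rough illustration of what building a well-formatted URL means, the following sketch assembles one with the Python standard library. The base URL, the resource path and the format parameter are made up for the example and do not correspond to any actual service; BioServices builds such URLs for you.

```python
from urllib.parse import urlencode


def build_rest_url(base_url, resource, **params):
    """Join a base URL, a resource path and URL-encoded query parameters."""
    url = base_url.rstrip("/") + "/" + resource.lstrip("/")
    if params:
        # sorted() keeps the parameter order deterministic
        url += "?" + urlencode(sorted(params.items()))
    return url


# Hypothetical service and parameters, for illustration only
url = build_rest_url("https://rest.example.org", "entry/P43403", format="xml")
print(url)
```

The resulting string is what would be sent to the server with an HTTP GET request.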

The SOAP/WSDL technology combines SOAP (Simple Object Access Protocol), a messaging protocol for transporting information, with WSDL (Web Services Description Language), a method for describing Web Services and their capabilities.
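
To give an idea of what a SOAP message looks like, here is a minimal sketch that builds a SOAP 1.1 envelope with the standard library. The operation name getEntry, the service namespace and the argument are hypothetical; in practice the WSDL file tells a client library which operations and arguments exist, so you would rarely write this by hand.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace


def build_soap_envelope(operation, service_ns, **arguments):
    """Return a SOAP 1.1 envelope (as text) calling `operation` with `arguments`."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}%s" % (service_ns, operation))
    for name, value in arguments.items():
        # each argument becomes a child element of the operation element
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation, namespace and argument
message = build_soap_envelope("getEntry", "http://ws.example.org/", accession="P43403")
print(message)
```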

1.1.1. What methods are available for a given service

Most of the service functionalities have been wrapped, and we try to keep the method names as close as possible to those of the service API. On top of the service methods, each class inherits from the BioService class (REST or WSDL). For instance, REST services have the useful request method. Another handy function is onWeb.

See also

REST, WSDLService

1.1.2. What about the output?

Outputs depend on the service and on the functionality used, so they can be heterogeneous. However, outputs are mostly XML formatted or in tab-separated values (TSV) format. When XML is returned, it is usually parsed via the BeautifulSoup package (for instance, you can get all children using the getchildren() function). Sometimes, we also convert the output into dictionaries. So, it really depends on the service and functionality you are using.
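
The two most common formats can be handled with a few lines of Python. The sketch below uses only the standard library (csv and xml.etree.ElementTree, which is not the parser used inside BioServices) and hand-written samples rather than actual service output:

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_tsv(text):
    """Turn tab-separated text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def xml_children(text):
    """Return (tag, text) pairs for the direct children of the XML root."""
    root = ET.fromstring(text)
    return [(child.tag, child.text) for child in root]


# Hand-written samples, not actual service output
tsv_sample = "Entry\tLength\nP43403\t619\n"
xml_sample = "<entry><accession>P43403</accession><name>ZAP70_HUMAN</name></entry>"

rows = parse_tsv(tsv_sample)
children = xml_children(xml_sample)
print(rows[0]["Length"])
print(children)
```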

Let us look at some of the Web Services wrapped in BioServices.

1.2. UniProt service

Let us start with the UniProt class. With this class, you can access the UniProt services. In particular, you can map an ID from one database to another. For instance, to convert a UniProtKB ID into a KEGG ID, use:

>>> from bioservices.uniprot import UniProt
>>> u = UniProt(verbose=False)
>>> u.mapping(fr="ACC+ID", to="KEGG_ID", query='P43403')
{'P43403': ['hsa:7535']}

Note that the response from the UniProt web service is converted into a dictionary. Each key is a queried ID and each value is the list of matching IDs found in the target database.
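
The example output above is a plain Python dictionary, so it can be consumed with ordinary dictionary code. As a small sketch (the helper name mapping_pairs is ours, not part of BioServices), the following flattens such a result into (query, target) pairs:

```python
def mapping_pairs(mapping):
    """Flatten {query: [targets]} into a sorted list of (query, target) pairs."""
    return sorted(
        (query, target)
        for query, targets in mapping.items()
        for target in targets
    )


result = {'P43403': ['hsa:7535']}  # copied from the example output above
for query, target in mapping_pairs(result):
    print(query, "->", target)
```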

You can also search for a specific UniProtKB ID to get exhaustive information:

>>> print(u.search("P43403", frmt="txt"))
ID   ZAP70_HUMAN             Reviewed;         619 AA.
AC   P43403; A6NFP4; Q6PIA4; Q8IXD6; Q9UBS6;
DT   01-NOV-1995, integrated into UniProtKB/Swiss-Prot.
DT   01-NOV-1995, sequence version 1.
...

To obtain the FASTA sequence, you can use searchUniProtId():

>>> res = u.searchUniProtId("P09958", frmt="xml")
>>> print(u.searchUniProtId("P09958", frmt="fasta"))
sp|P09958|FURIN_HUMAN Furin OS=Homo sapiens GN=FURIN PE=1 SV=2
MELRPWLLWVVAATGTLVLLAADAQGQKVFTNTWAVRIPGGPAVANSVARKHGFLNLGQI
FGDYYHFWHRGVTKRSLSPHRPRHSRLQREPQVQWLEQQVAKRRTKRDVYQEPTDPKFPQ
QWYLSGVTQRDLNVKAAWAQGYTGHGIVVSILDDGIEKNHPDLAGNYDPGASFDVNDQDP
DPQPRYTQMNDNRHGTRCAGEVAAVANNGVCGVGVAYNARIGGVRMLDGEVTDAVEARSL
GLNPNHIHIYSASWGPEDDGKTVDGPARLAEEAFFRGVSQGRGGLGSIFVWASGNGGREH
DSCNCDGYTNSIYTLSISSATQFGNVPWYSEACSSTLATTYSSGNQNEKQIVTTDLRQKC
TESHTGTSASAPLAAGIIALTLEANKNLTWRDMQHLVVQTSKPAHLNANDWATNGVGRKV
SHSYGYGLLDAGAMVALAQNWTTVAPQRKCIIDILTEPKDIGKRLEVRKTVTACLGEPNH
ITRLEHAQARLTLSYNRRGDLAIHLVSPMGTRSTLLAARPHDYSADGFNDWAFMTTHSWD
EDPSGEWVLEIENTSEANNYGTLTKFTLVLYGTAPEGLPVPPESSGCKTLTSSQACVVCE
EGFSLHQKSCVQHCPPGFAPQVLDTHYSTENDVETIRASVCAPCHASCATCQGPALTDCL
SCPSHASLDPVEQTCSRQSQSSRESPPQQQPPRLPPEVEAGQRLRAGLLPSHLPEVVAGL
SCAFIVLVFVTVFLVLQLRSGFSFRGVKVYTMDRGLISYKGLPPEAWQEECPSDSEEDEG
RGERTAFIKDQSAL
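
Once you have the FASTA text, splitting it into a header and a sequence takes only a few lines. The sketch below is ours (the function names are not part of BioServices); it assumes a single record and a UniProt-style header of the form db|accession|entry_name description, with or without a leading ">". The sample reuses the header shown above with the sequence truncated to its first 20 residues.

```python
def parse_fasta_record(text):
    """Split a single FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = lines[0].lstrip(">")      # the leading '>' is optional here
    sequence = "".join(lines[1:])      # sequence lines are simply concatenated
    return header, sequence


def split_uniprot_header(header):
    """Split 'db|accession|entry_name description' into its parts."""
    db, accession, rest = header.split("|", 2)
    entry_name, _, description = rest.partition(" ")
    return {"db": db, "accession": accession,
            "entry_name": entry_name, "description": description}


# Header from the example above; sequence truncated for brevity
sample = (
    "sp|P09958|FURIN_HUMAN Furin OS=Homo sapiens GN=FURIN PE=1 SV=2\n"
    "MELRPWLLWV\n"
    "VAATGTLVLL\n"
)
header, sequence = parse_fasta_record(sample)
info = split_uniprot_header(header)
print(info["accession"], len(sequence))
```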

See also

Reference guide of bioservices.uniprot.UniProt for more details

1.3. KEGG service

The KEGG interface is similar but contains more methods. The tutorial presents the KEGG interface in detail, but let us have a quick overview. First, let us create a KEGG instance:

from bioservices import KEGG
k = KEGG(verbose=False)

KEGG contains biological data for many organisms. By default, no organism is set, which can be checked with the following attribute:

k.organism

We can set it to human using KEGG terminology for homo sapiens:

k.organism = 'hsa'

You can use the dbinfo() method to obtain statistics on the pathway database:

>>> print(k.dbinfo("pathway"))
pathway          KEGG Pathway Database
path             Release 65.0+/01-15, Jan 13
                 Kanehisa Laboratories
                 218,277 entries

You can see the list of valid databases using the databases attribute. The entries of each database can also be listed using the list() method. For instance, the organisms can be retrieved with:

k.list("organism")

However, extracting the Ids from this output requires extra processing, so we provide an alias to retrieve the organism Ids easily:

k.organismIds
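
To give an idea of the extra processing involved, here is a sketch that extracts the organism codes from a listing by hand. It assumes one organism per line with tab-separated columns (T number, organism code, name, lineage); the two sample lines are hand-written in that assumed layout, and the function name is ours, not the BioServices implementation.

```python
def extract_organism_codes(listing):
    """Return the organism codes (second column) of a tab-separated listing."""
    codes = []
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:         # skip blank or malformed lines
            codes.append(fields[1])  # second column: organism code
    return codes


# Hand-written sample in the assumed layout
sample = (
    "T01001\thsa\tHomo sapiens (human)\tEukaryotes;Animals\n"
    "T01002\tmmu\tMus musculus (house mouse)\tEukaryotes;Animals\n"
)
print(extract_organism_codes(sample))
```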

The human organism is coded as “hsa”. You can also get its T number instead:

>>> k.code2Tnumber("hsa")
'T01001'

Every element is referred to by a KEGG ID, which may be difficult to handle at first. There are methods to retrieve the IDs, though. For instance, get the list of pathway Ids for the current organism as follows:

k.pathwayIds

For a given gene, you can get the full information related to that gene by using the method get():

print(k.get("hsa:3586"))

or a pathway:

print(k.get("path:hsa05416"))
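
The text returned by get() is a flat file: field names start in the first column and continuation lines are indented. The following sketch shows how such text could be turned into a dictionary. It is a simplified illustration of the idea, not the KEGGParser implementation, and the sample entry is a hand-written placeholder in the same layout rather than actual KEGG output.

```python
def parse_flat_entry(text):
    """Parse a flat-file entry into {field name: [lines of that field]}."""
    entry = {}
    key = None
    for line in text.splitlines():
        if line.startswith("///"):   # end-of-entry marker
            break
        if not line.strip():
            continue
        if not line[0].isspace():    # a new field starts in the first column
            key, _, value = line.partition(" ")
            entry[key] = [value.strip()]
        elif key is not None:        # indented line: continuation of the field
            entry[key].append(line.strip())
    return entry


# Hand-written placeholder entry, not actual KEGG output
sample = "\n".join([
    "ENTRY       0000              CDS",
    "NAME        example gene",
    "PATHWAY     map00001  First example pathway",
    "            map00002  Second example pathway",
    "///",
])
entry = parse_flat_entry(sample)
print(entry["PATHWAY"])
```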

See also

Reference guide of bioservices.kegg.KEGG for more details

See also

KEGG Tutorial for more details

See also

Reference guide of bioservices.kegg.KEGGParser to parse a KEGG entry into a dictionary

1.4. QuickGO

To access the GO interface, simply create an instance and look for an entry using the bioservices.quickgo.QuickGO.Term() method:

>>> from bioservices import QuickGO
>>> g = QuickGO(verbose=False)
>>> print(g.Term("GO:0003824", frmt="obo"))
[Term]
id: GO:0003824
name: catalytic activity
def: "Catalysis of a biochemical reaction at physiological temperatures. In
biologically catalyzed reactions, the reactants are known as substrates, and the
catalysts are naturally occurring macromolecular substances known as enzymes.
Enzymes possess specific binding sites for substrates, and are usually composed
wholly or largely of protein, but RNA that has catalytic activity (ribozyme) is
often also regarded as enzymatic."
synonym: "enzyme activity" exact
xref: InterPro:IPR000183
...
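
The OBO output is a list of "tag: value" lines, which makes it easy to post-process. As a sketch (the function name is ours, and we assume one tag per line, so a definition wrapped over several lines as in the display above would need to be joined first), a stanza can be collected into a dictionary of lists:

```python
from collections import defaultdict


def parse_obo_stanza(text):
    """Collect the 'tag: value' lines of one OBO stanza into {tag: [values]}."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):  # skip blanks and the [Term] header
            continue
        tag, sep, value = line.partition(": ")
        if sep:                               # ignore lines without a tag
            fields[tag].append(value)
    return dict(fields)


# Lines taken from the example output above
sample = (
    "[Term]\n"
    "id: GO:0003824\n"
    "name: catalytic activity\n"
    'synonym: "enzyme activity" exact\n'
    "xref: InterPro:IPR000183\n"
)
term = parse_obo_stanza(sample)
print(term["name"][0])
```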

See also

Reference guide of bioservices.quickgo.QuickGO for more details

1.5. PICR service

PICR, the Protein Identifier Cross Reference service, provides two interfaces, using the WSDL and REST protocols. When a service offers several protocols, we arbitrarily chose one of them. In the PICR case, we implemented only the REST interface. The methods available in the REST service are very similar to those available via SOAP except for one major difference: only one accession or sequence can be mapped per request.

The following example returns an XML document containing information about the protein P29375 found in two specific databases:

>>> from bioservices.picr import PICR
>>> p = PICR()
>>> res = p.getUPIForAccession("P29375", ["IPI", "ENSEMBL"])
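
Since only one accession can be mapped per request, several accessions require one call each. The sketch below shows that loop; map_accessions is a helper of ours, and fake_lookup is a stand-in that returns a made-up string so that the example runs without network access.

```python
def map_accessions(lookup, accessions, databases):
    """Call `lookup` once per accession and collect the results by accession."""
    results = {}
    for accession in accessions:
        # one request per accession, as required by the REST interface
        results[accession] = lookup(accession, databases)
    return results


def fake_lookup(accession, databases):
    """Stand-in for a real lookup; the returned string is made up."""
    return "<result accession='%s' databases='%s'/>" % (accession, ",".join(databases))


res_by_accession = map_accessions(fake_lookup, ["P29375", "P43403"], ["IPI", "ENSEMBL"])
print(sorted(res_by_accession))
```

With the PICR instance created above, p.getUPIForAccession could be passed as the lookup argument, since it is called with an accession and a list of databases.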

See also

Reference guide of bioservices.picr.PICR for more details

1.6. BioModels service

You can access the biomodels service and obtain a model as follows:

>>> from bioservices import biomodels
>>> b = biomodels.BioModels()
>>> model = b.get_model('BIOMD0000000299')

Then you can play with the SBML file with your favorite SBML tool.
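
If you do not have an SBML tool at hand, the standard library is enough for a first look. The sketch below assumes that the model is available as SBML text in a Python string and reads the id attribute of its model element. The function name is ours, and the sample document is a hand-written stand-in, not a BioModels entry.

```python
import xml.etree.ElementTree as ET


def sbml_model_id(sbml_text):
    """Return the id attribute of the first <model> element, or None."""
    root = ET.fromstring(sbml_text)
    for element in root.iter():
        # Tags look like "{namespace}model"; compare the local name only
        if element.tag.rsplit("}", 1)[-1] == "model":
            return element.get("id")
    return None


# Hand-written stand-in for an SBML document
sample = (
    '<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">'
    '<model id="toy_model"/></sbml>'
)
print(sbml_model_id(sample))
```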

In order to get the model IDs, you can look at the full list:

>>> b.get_models()

See also

Reference guide of bioservices.biomodels.BioModels for more details

See also

Biomodels tutorial for more details

1.7. Rhea service

Create a Rhea instance as follows:

from bioservices import Rhea
r = Rhea()

Rhea provides only two types of requests through its REST interface, available via the search() and query() methods. Let us first find information about the chemical product caffeine using the search() method:

response = r.search("caffein*")

The service returns a JSON file, which BioServices converts into a pandas DataFrame.

The previous request returns more than 10,000 entries. Here are the first two entries:

    Reaction identifier                                           Equation             ChEBI name           Cross-reference (KEGG)  Cross-reference (Reactome)
0   RHEA:47148              a ubiquinone + caffeine + H2O = 1,3,7-trimethy...          MetaCyc:RXN-11523           KEGG:R07980                         NaN
1   RHEA:10280              1,7-dimethylxanthine + S-adenosyl-L-methionine...          MetaCyc:RXN-7601            KEGG:R07921                         NaN

The second method provided is query(). Given an Id found earlier (e.g., 10280), you can query the Rhea database directly. Compared to the search method, this is essentially a filtering method. If you know what you are looking for (the rhea-id), use this method instead of the search method:

info = r.query("10280", columns="rhea-id,equation", limit=10)
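
Note that the search results list identifiers with a prefix (RHEA:10280) whereas the query example above uses the bare number. A small helper of ours (not part of BioServices) can convert one into the other:

```python
def rhea_query_id(identifier):
    """Return the numeric part of a Rhea identifier such as 'RHEA:10280'."""
    text = str(identifier).strip()
    prefix, sep, number = text.partition(":")
    if sep and prefix.upper() == "RHEA":
        text = number                # drop the 'RHEA:' prefix
    if not text.isdigit():
        raise ValueError("not a Rhea identifier: %r" % (identifier,))
    return text


print(rhea_query_id("RHEA:10280"))
```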

See also

Reference guide of bioservices.rhea.Rhea for more details

1.8. Other services

There are many other services provided within BioServices, and the reference guide should give you all the information available, with examples to start playing with any of them. The home pages of the services themselves are usually good starting points as well.

Services that are not available in BioServices can still be accessed quite easily, as demonstrated in the Developer Guide section.