5. Developer Guide

5.1. Service module (REST)

Modules with common tools to access web resources

exception BioServicesError(value)[source]

class HTTPResponseError(status_code, reason='', url='')[source]

Returned when a remote service replies with a non-2xx HTTP status code.

Behaves exactly like an int so that existing code that does:

if res == 404: ...
if isinstance(res, int): ...

continues to work. However, trying to use the return value as a mapping or sequence — the most common source of the cryptic TypeError: 'int' object is not subscriptable — raises a BioServicesError with a human-readable explanation instead.

Attributes:

status_code (int): The HTTP status code (e.g. 404).
reason (str): The HTTP reason phrase (e.g. "Not Found").
url (str): The URL that produced the error.

items()[source]

keys()[source]

reason: str

status_code: int

url: str

values()[source]
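The int-like behaviour described above can be illustrated with a small standalone sketch. It is not the bioservices implementation: StatusCode and SketchError are hypothetical stand-ins for HTTPResponseError and BioServicesError, and only subscripting is covered here.

```python
class SketchError(Exception):
    """Hypothetical stand-in for BioServicesError."""


class StatusCode(int):
    """Hypothetical int subclass that carries HTTP error details."""

    def __new__(cls, status_code, reason="", url=""):
        obj = super().__new__(cls, status_code)
        obj.status_code = int(status_code)
        obj.reason = reason
        obj.url = url
        return obj

    def __getitem__(self, key):
        # Replace the cryptic "'int' object is not subscriptable" TypeError
        # with an explanation of what actually went wrong.
        raise SketchError(
            "HTTP %d %s for %s: no data to index with %r"
            % (self, self.reason, self.url, key)
        )


res = StatusCode(404, "Not Found", "https://example.org/missing")
print(res == 404)            # comparisons with plain ints still work
print(isinstance(res, int))  # so do isinstance checks

message = None
try:
    res["organism"]
except SketchError as err:
    message = str(err)
    print(message)
```

In calling code, testing isinstance(res, int) before indexing the result remains a simple way to detect a failed request.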

class REST(name, url=None, verbose=True, cache=False, requests_per_sec=3, proxies=[], cert=None, url_defined_later=False)[source]

The ideas (sync/async) and the code using requests were inspired by the chembl python wrapper but have been significantly changed.

Get one value:

>>> from bioservices import REST
>>> s = REST("test", "https://www.ebi.ac.uk/chemblws")
>>> res = s.get_one("targets/CHEMBL2476.json", "json")
>>> res['organism']
u'Homo sapiens'

Caching has two main benefits. First, it speeds up repeated requests. Second, previously fetched results remain available when you are offline.

>>> s = REST("test", "https://www.ebi.ac.uk/chemblws")
>>> s.CACHING = True
>>> # requests will be stored in a local sqlite database
>>> s.get_one("targets/CHEMBL2476")
>>> # Disconnect your wifi and any network connections.
>>> # Without caching you cannot fetch any requests but with
>>> # the CACHING on, you can retrieve previous requests:
>>> s.get_one("targets/CHEMBL2476")
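The idea behind the cache can be sketched independently of bioservices. The example below is a minimal illustration, assuming a table keyed by the query string in an in-memory sqlite database; TinyCache and fake_fetch are hypothetical names, and the real cache used by the library may be organised differently.

```python
import sqlite3


class TinyCache:
    """Hypothetical response cache backed by sqlite."""

    def __init__(self, fetch, path=":memory:"):
        self._fetch = fetch  # callable that performs the real request
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(query TEXT PRIMARY KEY, body TEXT)"
        )

    def get(self, query):
        row = self._db.execute(
            "SELECT body FROM responses WHERE query = ?", (query,)
        ).fetchone()
        if row is not None:
            return row[0]  # served from the cache, no network needed
        body = self._fetch(query)
        self._db.execute(
            "INSERT INTO responses (query, body) VALUES (?, ?)", (query, body)
        )
        self._db.commit()
        return body


calls = []


def fake_fetch(query):
    # Stand-in for a network call; records how often it is invoked.
    calls.append(query)
    return "payload for " + query


cache = TinyCache(fake_fetch)
first = cache.get("targets/CHEMBL2476")
second = cache.get("targets/CHEMBL2476")
print(first == second, len(calls))
```

The second call never reaches fake_fetch, which is why a repeated request can still be answered without a network connection.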

Advantages of requests over urllib

The length of a request is not limited to 2000 characters (see http://www.g-loaded.eu/2008/10/24/maximum-url-length/).

Most web services available in bioservices do not require authentication; there are a few exceptions. In such cases, the username and password are provided with the method call. However, if a service requires authentication in the future, one can set the attribute authentication to a tuple:

s = REST("test", "https://www.ebi.ac.uk/chemblws")
s.authentication = ('user', 'pass')

Note about headers and content type. The Accept header is used by HTTP clients to tell the server what content types they will accept. The server then sends back a response that includes a Content-Type header telling the client what the content type of the returned content actually is. When using get_headers(), you can see the User-Agent, Accept and Content-Type keys, so here the HTTP requests also contain a Content-Type header.

In POST or PUT requests the client is actually sending data to the server as part of the request, and the Content-Type header tells the server what the data actually is. For a POST request resulting from an HTML form submission, the Content-Type of the request should be one of the standard form content types: application/x-www-form-urlencoded (default, older, simpler) or multipart/form-data (newer, adds support for file uploads).
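The roles of the two headers can be seen by building a request without sending it. This sketch uses only the standard library rather than bioservices; the URL and the form fields are placeholders chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form fields to send; the names and values are illustrative only.
form = {"query": "CHEMBL2476", "format": "json"}
body = urlencode(form).encode("ascii")

req = Request(
    "https://example.org/api",  # placeholder URL, never contacted here
    data=body,
    headers={
        # What the client is willing to receive back.
        "Accept": "application/json",
        # What the client is sending in the request body.
        "Content-Type": "application/x-www-form-urlencoded",
    },
)

print(req.get_method())                # POST, because a body is attached
print(req.get_header("Accept"))
print(req.get_header("Content-type"))  # urllib stores names capitalised this way
print(body)
```

Nothing is sent over the network: the request object is only constructed and inspected.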

Constructor

Parameters:

name (str) – a name for this service
url (str) – its URL
verbose (bool) – prints informative messages if True (default is True)
requests_per_sec – the maximum number of requests per second, restricted to 3 by default. You can change that value. If you reach the limit, an error is raised. The reason for this limitation is that some services (e.g., NCBI) may blacklist your IP. If you need or can do more (e.g., ChEMBL does not seem to have restrictions), change the value. You can also have several instances but, again, if you send too many requests at the same time, your future requests may be restricted. Currently implemented for REST only.
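A limit of this kind can be sketched as a sliding one-second window. The code below is only an illustration of the idea, not the mechanism bioservices uses internally; RateLimiter and RateLimitExceeded are hypothetical names, and the clock is injectable so the example runs without waiting.

```python
import time
from collections import deque


class RateLimitExceeded(Exception):
    """Hypothetical error raised when the limit is reached."""


class RateLimiter:
    """Hypothetical limiter allowing N calls per one-second window."""

    def __init__(self, requests_per_sec=3, clock=time.monotonic):
        self.requests_per_sec = requests_per_sec
        self._clock = clock
        self._stamps = deque()  # timestamps of recent calls

    def check(self):
        now = self._clock()
        # Forget calls that are at least one second old.
        while self._stamps and now - self._stamps[0] >= 1.0:
            self._stamps.popleft()
        if len(self._stamps) >= self.requests_per_sec:
            raise RateLimitExceeded("more than %d requests per second"
                                    % self.requests_per_sec)
        self._stamps.append(now)


fake_now = [0.0]  # controllable clock for the demonstration
limiter = RateLimiter(requests_per_sec=3, clock=lambda: fake_now[0])

for _ in range(3):
    limiter.check()  # the first three calls pass

try:
    limiter.check()  # the fourth call in the same second is refused
    blocked = False
except RateLimitExceeded:
    blocked = True

fake_now[0] = 1.5
limiter.check()      # allowed again once the window has moved on
print(blocked)
```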

All instances have an attribute called logging that is an instance of the logging module. It can be used to print information, warning and error messages:

self.logging.info("informative message")
self.logging.warning("warning message")
self.logging.error("error message")

The attribute debugLevel can be used to set the behaviour of the logging messages. If the argument verbose is True, the debugLevel is set to INFO. If verbose is False, the debugLevel is set to WARNING. However, you can use the debugLevel attribute to change it to one of DEBUG, INFO, WARNING, ERROR, CRITICAL. debugLevel=WARNING means that only WARNING, ERROR and CRITICAL messages are shown.
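The mapping from verbose to a logging level can be reproduced with the standard logging module. This is a sketch of the described behaviour, not the bioservices code; make_logger and the logger names are hypothetical.

```python
import logging


def make_logger(name, verbose=True):
    """Hypothetical helper: INFO when verbose, WARNING otherwise."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    return logger


quiet = make_logger("sketch.quiet", verbose=False)
chatty = make_logger("sketch.chatty", verbose=True)

print(quiet.isEnabledFor(logging.INFO))     # False: INFO is below WARNING
print(quiet.isEnabledFor(logging.ERROR))    # True: ERROR is above WARNING
print(chatty.isEnabledFor(logging.INFO))    # True
print(chatty.isEnabledFor(logging.DEBUG))   # False: DEBUG is below INFO
```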

property TIMEOUT

clear_cache()[source]

content_types = {'bed': 'text/x-bed', 'default': 'application/x-www-form-urlencoded', 'fasta': 'text/x-fasta', 'gff3': 'text/x-gff3', 'gif': 'image/gif', 'jpeg': 'image/jpg', 'jpg': 'image/jpg', 'json': 'application/json', 'jsonp': 'text/javascript', 'nh': 'text/x-nh', 'phylip': 'text/x-phyloxml+xml', 'phyloxml': 'text/x-phyloxml+xml', 'png': 'image/png', 'seqxml': 'text/x-seqxml+xml', 'svg': 'image/svg', 'svg+xml': 'image/svg+xml', 'text': 'text/plain', 'txt': 'text/plain', 'xml': 'application/xml', 'yaml': 'text/x-yaml'}

debug_message()[source]

delete_cache()[source]

delete_one(query, frmt='json', **kargs)[source]

getUserAgent()[source]

get_async(keys, frmt='json', params={}, **kargs)[source]

get_headers(content='default')[source]

Parameters: content (str) – set to default, that is application/x-www-form-urlencoded, so that it has the same behaviour as urllib2 (Sept 2014)

get_one(query=None, frmt='json', params={}, **kargs)[source]: if query starts with http://, self.url is not used

get_sync(keys, frmt='json', **kargs)[source]

http_delete(query, params=None, frmt='xml', headers=None, **kargs)[source]

http_get(query, frmt='json', params={}, **kargs)[source]

query is the suffix that will be appended to the main url attribute.
query is either a string or a list of strings.
If the list is longer than ASYNC_THRESHOLD, an asynchronous call is used.
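The dispatch rule described for http_get can be written out as a small decision function. This is a sketch of the stated rule only: choose_strategy is a hypothetical helper, and the threshold value of 10 is an assumption made for the example, not the value used by the library.

```python
ASYNC_THRESHOLD = 10  # assumed value, for illustration only


def choose_strategy(query, threshold=ASYNC_THRESHOLD):
    """Return which kind of call the rule above would select."""
    if isinstance(query, str):
        return "single"  # one query string: one request
    if len(query) > threshold:
        return "async"   # long list: asynchronous requests
    return "sync"        # short list: requests sent one after another


print(choose_strategy("targets/CHEMBL2476.json"))
print(choose_strategy(["a", "b", "c"]))
print(choose_strategy(["q%d" % i for i in range(50)]))
```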

http_post(query, params=None, data=None, frmt='xml', headers=None, files=None, content=None, **kargs)[source]

post_one(query=None, frmt='json', **kargs)[source]

property session

class Service(name, url=None, verbose=True, requests_per_sec=10, url_defined_later=False)[source]

Base class for REST service classes

See also

REST

Constructor

Parameters:

name (str) – a name for this service
url (str) – its URL
verbose (bool) – prints informative messages if True (default is True)
requests_per_sec – the maximum number of requests per second is restricted to 3. You can change that value. If you reach the limit, an error is raised. The reason for this limitation is that some services (e.g., NCBI) may blacklist your IP. If you need or can do more (e.g., ChEMBL does not seem to have restrictions), change the value. You can also have several instances but, again, if you send too many requests at the same time, your future requests may be restricted. Currently implemented for REST only.

All instances have an attribute called logging that is an instance of the logging module. It can be used to print information, warning and error messages:

self.logging.info("informative message")
self.logging.warning("warning message")
self.logging.error("error message")

The attribute debugLevel can be used to set the behaviour of the logging messages. If the argument verbose is True, the debugLevel is set to INFO. If verbose is False, the debugLevel is set to WARNING. However, you can use the debugLevel attribute to change it to one of DEBUG, INFO, WARNING, ERROR, CRITICAL. debugLevel=WARNING means that only WARNING, ERROR and CRITICAL messages are shown.

property CACHING

on_web(url)[source]: Open a URL in a browser

pubmed(Id)[source]

Open a pubmed Id in a browser tab

Parameters: Id – a valid pubmed Id in string or integer format.

The URL is a concatenation of the pubmed URL http://www.ncbi.nlm.nih.gov/pubmed/ and the provided Id.
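The concatenation amounts to the following sketch, in which pubmed_url is a hypothetical helper rather than a bioservices function and the identifier is an arbitrary example.

```python
PUBMED_URL = "http://www.ncbi.nlm.nih.gov/pubmed/"


def pubmed_url(Id):
    """Build the page URL from an Id given as a string or an integer."""
    return PUBMED_URL + str(Id)


print(pubmed_url(12345678))
print(pubmed_url("12345678"))
# Passing the result to webbrowser.open() would open it in a browser tab.
```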

response_codes = {200: 'OK', 201: 'Created', 400: 'Bad Request. There is a problem with your input', 404: 'Not found. The resource you requests does not exist', 405: 'Method not allowed', 406: 'Not Acceptable. Usually headers issue', 410: 'Gone. The resource you requested was removed.', 415: 'Unsupported Media Type', 500: 'Internal server error. Most likely a temporary problem', 503: 'Service not available. The server is being updated, try again later'}: some useful response codes

save_str_to_image(data, filename)[source]: Save a string object to a file, converting it to binary

property url: URL of this service

5.2. Naming convention

To add a web service to BioServices, decide on a name for the python module. By convention, the module name is in lower case. Internally, the class follows the standard Python convention (upper case for the first letter).

The module name (e.g. uniprot) should be used to name the module file (uniprot.py).

It will also be used to name the test file and the continuous integration file.

5.3. Creating a service class (REST case)

You can directly test a SOAP/WSDL or REST service in a few lines. For instance, to access the biomart REST service, type:

>>> s = REST("BioMart", "http://www.biomart.org/biomart/martservice")

The first parameter is compulsory but can be any word. You can retrieve the base URL by typing:

>>> s.url
'http://www.biomart.org/biomart/martservice'

and then send a request, for instance to retrieve registry information (see www.biomart.org.martservice.html for valid requests):

>>> s.http_get("?type=registry")

The http_get method available from the REST class combines the url attribute and the query provided, so the call above sends the ?type=registry query to the "http://www.biomart.org/biomart/martservice" URL.

As a developer, you should make the user's life easier by wrapping up the previous commands. An example of a BioMart class with a single method dedicated to the registry would look like:

>>> class BioMart(REST):
...    def __init__(self):
...        url = "http://www.biomart.org/biomart/martservice"
...        super(BioMart, self).__init__("BioMart", url=url)
...    def registry(self):
...        ret = self.http_get("?type=registry")
...        return ret

and you would use it as follows:

>>> s = BioMart()
>>> s.registry()

5.4. How to include tests?

We use pytest. There are many web services included in BioServices. Consequently, there are many tests. It is common to have failing tests on Travis and in the continuous integration.

Some tests are known to be long or failing from time to time (e.g. service is down).

When a test is known to fail sometimes, we can add this decorator:

@pytest.mark.flaky(max_runs=3, min_passes=1)

On Travis we allow 8 failures.

For long tests, we allow 60s at most. You can mark a test if you know it will fail on Travis (e.g. too long):

@pytest.mark.xfail

Finally, we skip some tests for some conditions:

import os
import pytest

# Skip the test when the TRAVIS_PYTHON_VERSION environment variable is set
skiptravis = pytest.mark.skipif("TRAVIS_PYTHON_VERSION" in os.environ,
    reason="On travis")
@skiptravis
def test():
    ...

5.5. Continuous integration

Add a test in ./test/webservices/test_**yourmodule**.py
Add a continuous integration file named yourmodule.yml. See the example in .github/workflows/template.txt and replace __name__ with your module name.