
# How to access the API of MetOSite using the R language

R is well suited to statistics and data analysis, which makes it a fine choice for analysing the data supplied by MetOSite. In this tutorial we will show how to pull data from our database (DB) into R using its API.

The MetOSite API offers a number of end-point functions. However, the procedure to invoke them and parse the data they return is the same for all of them:

• Make a “GET” request to pull raw data into our R environment.
• Parse the raw data, delivered as JavaScript Object Notation (JSON), into a usable format (most of the time a data frame).

## Getting a summary of MetOSite data

To illustrate this two-step procedure with a straightforward example, let us suppose we want to obtain a data frame containing a summary (statistics) of the data found in MetOSite: which species are represented, and how many proteins and how many MetO sites each of these species contributes.

But before we begin our task, we will need to download and install two R packages: httr and jsonlite, which will assist us in our purposes.

``````
# install.packages("httr")
# install.packages("jsonlite")
require(httr)
require(jsonlite)
``````

Afterwards, we are ready to make our first “GET” request.

``````
call <- 'https://metosite.uma.es/api/summaries/species' # This is the API URL
response <- httr::GET(call)
``````

The response to the API call, which has been placed in the object we have named response, is actually a list containing many different items, most of them administrative information from the API in which we are not interested. To get the data we really want, we are going to use another httr function, content, which extracts the body of the response as text, in this case a JSON string.

``````
if (response$status_code == 200){
  json_species <- httr::content(response, 'text')
} else {
  print(response$status_code)
}
``````

This gives us the raw data from our API call as a JSON string. However, if we want to analyse these data using R, it is convenient to parse the JSON using the jsonlite package we have previously installed.

``````
df_species <- jsonlite::fromJSON(json_species, flatten = TRUE)
head(df_species)
``````

So, df_species is the data frame that we were looking for from the beginning.

Alternatively, we can get a summary that focuses on the oxidants instead of the species. In this case, the end-point function of the API that we have to call is not species but oxidant:

``````
call <- 'https://metosite.uma.es/api/summaries/oxidant' # This is the API URL
response <- httr::GET(call)

if (response$status_code == 200){
  json_oxidants <- httr::content(response, 'text')
  df_oxidants <- jsonlite::fromJSON(json_oxidants, flatten = TRUE)
} else {
  print(response$status_code)
}
``````
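
Since the two requests above follow the same request-then-parse pattern, the two steps can be wrapped in a small helper. The following is only a sketch: the name `fetch_metosite` is our own and is not part of the MetOSite API or of httr, and the function assumes that the end-point returns JSON text.

```r
# Hypothetical helper (not part of MetOSite or httr): GET a URL and parse
# the JSON body, stopping with an informative error on a non-200 status.
fetch_metosite <- function(url) {
  response <- httr::GET(url)
  status <- httr::status_code(response)
  if (status != 200) {
    stop("Request failed with HTTP status ", status, ": ", url)
  }
  # Extract the body as text, then convert the JSON into R objects.
  json <- httr::content(response, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(json, flatten = TRUE)
}

# Usage (requires network access):
# df_species  <- fetch_metosite("https://metosite.uma.es/api/summaries/species")
# df_oxidants <- fetch_metosite("https://metosite.uma.es/api/summaries/oxidant")
```

Stopping on a failed request, rather than printing the status code, prevents later steps from failing with a less helpful "object not found" error.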

## Getting all the sites involved in PPI effects

Now that we have gained confidence in our ability to access MetOSite through its API, we can face a slightly more elaborate task.

Basically, what we want to do is to filter the DB to keep only those entries related to changes in a biological property, such as the ability to stabilize or destabilize protein-protein interactions (PPI). In order to understand how the API filters the DB using the end-point search, we first have to introduce some basic ideas related to Groups and Functional Categories.

Each MetO site is assigned to one of three possible Groups. Group 1 is composed of all those MetO sites coming from high-throughput studies, for which nothing is known about the effect of their sulfoxidation simply because it has not been addressed. In contrast, the effect of the oxidation of residues belonging to Group 2 has been assessed, but no effect could be found. Finally, Group 3 encompasses all the methionine sites whose sulfoxidation has been reported to have an effect on at least one of the following six biological properties:

• Gain of activity
• Loss of activity
• Gain of protein-protein interaction
• Loss of protein-protein interaction
• Effect on protein stability
• Effect on subcellular localization

Each of these six properties can be considered as a binary variable. Thus, a value of 1 for any of these variables means that experimental evidence supporting such an effect has been published. In contrast, a value of 0 only means that we have not found experimental evidence to support such an effect. Attending to these variables, we have $2^6 = 64$ Functional Categories (FCs). In other words, the FC of a given MetO site can be encoded by a vector of dimension 6. It should be noted that a site with the vector (0,0,0,0,0,0), meaning that no effect has been described for the oxidation of that site, can belong to either Group 1 or Group 2. That is, we will actually deal with $2^6+1=65$ FCs.
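
The counting argument above can be reproduced in a few lines of base R. This is only an illustration: the column names are our own shorthand for the six properties and are not identifiers used by the MetOSite API.

```r
# Column names are our own shorthand for the six properties listed above.
properties <- c("gain_activity", "loss_activity", "gain_ppi",
                "loss_ppi", "stability", "localization")

# Every combination of 0/1 over the six properties: one row per vector.
fc_vectors <- expand.grid(rep(list(0:1), length(properties)))
names(fc_vectors) <- properties

n_vectors <- nrow(fc_vectors)   # 2^6 = 64

# The all-zero vector is shared by Group 1 and Group 2, so it counts twice.
n_categories <- n_vectors + 1   # 65
```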

### Making use of the mapping end-point function

Thus, to find out which Functional Categories correspond to sites involving any effect (either gain or loss) on protein-protein interaction, we will make use of an ancillary end-point function called mapping. This function takes two arguments. The first one is related to the Groups. For instance, the string 001 means that we are only interested in Group 3. If instead of 001 we pass 101 as the first argument, then mapping will interpret that we want to filter out Group 2 and focus on Groups 1 and 3. The second argument that should be passed to mapping is a 6-dimensional vector providing information about the effect on the six biological properties. For instance, with the point (0,0,1,1,0,0) we would retrieve those sites for which both a gain and a loss of PPI have been reported (probably with different partners), but no effect on the remaining properties has been described. On the other hand, if we have a site causing a gain of PPI and a loss of PPI but also a gain of activity, then the right argument would be (1,0,1,1,0,0). Please note that the first and second coordinates correspond to the gain and loss of activity, respectively, and so on (keeping the order of the list given above).

What if we are interested in those sites with a gain and a loss of PPI, but we do not care about the four remaining properties (which may or may not be affected)? In this case, the right argument would be (2,2,1,1,2,2). As the insightful reader will have guessed, the integer 2 means that it does not matter whether the property has been described to be affected or not. For instance, (2,0,1,1,0,0) = (0,0,1,1,0,0) $\cup$ (1,0,1,1,0,0).
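
To make the meaning of the digit 2 concrete, the sketch below expands a pattern into the set of binary vectors it covers. The helper `expand_pattern` is hypothetical and runs entirely offline; it illustrates the encoding described above and is not how the API itself is implemented.

```r
# Hypothetical helper: list every 0/1 vector covered by a pattern in which
# the digit 2 stands for "either 0 or 1".
expand_pattern <- function(pattern) {
  digits <- strsplit(pattern, "")[[1]]
  # A wildcard contributes both values; a fixed digit contributes itself.
  choices <- lapply(digits, function(d) if (d == "2") c("0", "1") else d)
  grid <- expand.grid(choices, KEEP.OUT.ATTRS = FALSE,
                      stringsAsFactors = FALSE)
  sort(do.call(paste0, unname(as.list(grid))))
}

expand_pattern("201100")          # "001100" "101100"
length(expand_pattern("221122"))  # 2^4 = 16 vectors
```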

In the example we are developing here, we are interested in those MetO sites involved in gain and/or loss of PPI, without any other consideration. These sites are therefore encoded as follows: (2,2,1,2,2,2) $\cup$ (2,2,2,1,2,2). At this point, we are in a position to use the mapping end-point knowingly.

``````
groups <- '001'

## ---------------------- Sites gaining PPI ------------------------  ##

categories <- '221222' # Note we use neither parentheses nor commas.
call <- paste('https://metosite.uma.es/api/sites/mapping/',
              groups, '/', categories, sep = "")
gPPI <- httr::GET(call)
gPPI <- httr::content(gPPI, 'text')
gPPI <- jsonlite::fromJSON(gPPI, flatten = TRUE)

## ---------------------- Sites losing PPI ------------------------  ##

categories <- '222122' # Note we use neither parentheses nor commas.
call <- paste('https://metosite.uma.es/api/sites/mapping/',
              groups, '/', categories, sep = "")
lPPI <- httr::GET(call)
lPPI <- httr::content(lPPI, 'text')
lPPI <- jsonlite::fromJSON(lPPI, flatten = TRUE)

## ------------------ Joining both sets --------------------------- ##
fcPPI <- union(gPPI, lPPI)
fcPPI <- fcPPI[order(fcPPI)]
print(fcPPI)
``````
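
Because the two arguments are passed as plain strings, a malformed value only shows up once the request has been sent. A small wrapper can catch such mistakes earlier. This is a sketch under assumptions: the name `mapping_url` is our own, and the accepted formats (three digits 0/1 for the Groups, six digits 0/1/2 for the properties) are inferred from the examples in this tutorial rather than taken from the API specification.

```r
# Hypothetical helper: validate the two arguments and build the mapping URL.
mapping_url <- function(groups, categories) {
  # groups: three digits, each 0 or 1 (one flag per Group; assumed format).
  stopifnot(grepl("^[01]{3}$", groups))
  # categories: six digits, each 0, 1 or 2 (2 = does not matter).
  stopifnot(grepl("^[012]{6}$", categories))
  paste0("https://metosite.uma.es/api/sites/mapping/", groups, "/", categories)
}

mapping_url("001", "221222")
# "https://metosite.uma.es/api/sites/mapping/001/221222"
```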

Now that we know that there are 48 different functional categories that meet the requirements, we can move on to finding all the MetO sites in MetOSite that belong to these categories.
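
The figure of 48 can be cross-checked offline by counting the binary vectors covered by each pattern. The sketch below checks only the count; it assumes nothing about how the API labels the categories it returns, which may differ from the strings used here.

```r
# All 64 binary vectors over the six properties, written as strings.
all_fc <- do.call(paste0, unname(as.list(
  expand.grid(rep(list(0:1), 6), KEEP.OUT.ATTRS = FALSE))))

# Turn a pattern into a regular expression: the wildcard 2 becomes ".".
covers <- function(pattern) {
  grepl(paste0("^", chartr("2", ".", pattern), "$"), all_fc)
}

gain <- all_fc[covers("221222")]   # vectors with gain of PPI
loss <- all_fc[covers("222122")]   # vectors with loss of PPI

n_both <- length(intersect(gain, loss))  # 16 vectors have both effects
n_ppi  <- length(union(gain, loss))      # 32 + 32 - 16 = 48
```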

### Making use of the search end-point function

This function takes three arguments: (i) the first is related to the functional categories we want to retrieve, (ii) the second allows us to filter by taxon, selecting the organism(s) we are interested in, and (iii) the third is related to the oxidant(s) we want to consider.

Because the FCs we are going to pass to the search function need to be separated from each other by the symbol &, we are going to write a function that will do the formatting work for us:

``````
format_FC <- function(fc){
  formatted.fc <- c()
  count <- 0
  for (i in fc){
    count <- count + 1
    if (count < length(fc)){
      formatted.fc <- paste(formatted.fc, i, '&', sep = "")
    } else {
      formatted.fc <- paste(formatted.fc, i, sep = "")
    }
  }
  return(formatted.fc)
}
``````
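
For readers who prefer a loop-free version, the same string can be built with the `collapse` argument of `paste`. This is an alternative sketch, not a replacement for format_FC; the name `format_fc_collapsed` and the category labels in the example are made up for illustration.

```r
# Loop-free sketch of format_FC: join the categories with "&".
format_fc_collapsed <- function(fc) {
  paste(fc, collapse = "&")
}

# Made-up category labels, used only to show the formatting.
example_fc <- c("a", "b", "c")
format_fc_collapsed(example_fc)   # "a&b&c"
```

One difference is worth noting: for an empty input this version returns an empty string, whereas format_FC returns NULL.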

So, let’s use that function to format the set of 48 FCs we got previously, and then make use of the search end-point function:

``````
ffc <- format_FC(fcPPI) # formatted FCs related to PPI
organism <- '-1' # meaning we don't care about the organism
oxidant <- '-1'  # meaning we don't care about the oxidant

call <- paste('https://metosite.uma.es/api/sites/search/',
              ffc, '/', organism, '/', oxidant, sep = "")
response <- httr::GET(call)
json.entries <- httr::content(response, 'text')
ppi.results <- jsonlite::fromJSON(json.entries, flatten = TRUE)
head(ppi.results) # That's the data frame we wanted.
``````

## Final remarks

This tutorial is not intended to be exhaustive. Rather, it aims to be a primer from which users can continue on their own to communicate with the MetOSite API by means of the R language. We encourage users to explore the other end-points, which are documented at https://metosite.uma.es/api-docs.