parse.dssp()

Description

Parses a DSSP file and returns a dataframe.

Usage

`parse.dssp(file, keepfiles = FALSE)`

Arguments

`file` path to the input dssp file.

`keepfiles` logical; if TRUE, the dataframe will be saved in the working directory and the dssp file will be kept.

Value

Returns a dataframe providing data for:
‘acc’: accessibility,
‘ss’: secondary structure element,
‘phi’: phi angle,
‘psi’: psi angle.
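
As a rough illustration of how such a dataframe might be used, the sketch below builds a small mock dataframe with the four documented columns and summarises it. The values are invented for illustration, and the real output of parse.dssp() may contain additional columns.

```r
# Mock data standing in for the output of parse.dssp(); all values are invented.
dssp_df <- data.frame(
  acc = c(120, 45, 10, 5, 60, 98),
  ss  = c("C", "H", "H", "H", "T", "C"),
  phi = c(-75.2, -62.1, -58.4, -65.0, -90.3, -120.5),
  psi = c(140.2, -41.0, -45.3, -39.8, 10.5, 135.7),
  stringsAsFactors = FALSE
)

# Number of residues per secondary structure element.
ss_counts <- table(dssp_df$ss)

# Mean accessibility per secondary structure element.
mean_acc <- tapply(dssp_df$acc, dssp_df$ss, mean)

print(ss_counts)
print(mean_acc)
```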

Details

The ptm package contains a number of ancillary functions that deal with Protein Data Bank (PDB) files. These functions may be useful when structural 3D data need to be analyzed.

The DSSP (Define Secondary Structure of Proteins) algorithm assigns secondary structure to the amino acids of a protein using the atomic coordinates of the protein (a PDB file).

Based on the identification of intra-backbone hydrogen bonds, DSSP can identify eight types of secondary structure, which can be grouped into three main categories:

• Helices
G = 3-turn helix ($3_{10}$ helix). Minimum length 3 residues.
H = 4-turn helix (α helix). Minimum length 4 residues.
I = 5-turn helix (π helix). Minimum length 5 residues.

• Strands
E = extended strand in parallel and/or anti-parallel β-sheet conformation. Minimum length 2 residues.
B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation).

• Loops
T = hydrogen-bonded turn (3, 4 or 5 turn).
S = bend (the only non-hydrogen-bond based assignment).
C = coil (residues which are not in any of the above conformations).
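
The grouping listed above can be sketched as a small lookup function. This is a hypothetical helper, not part of the ptm package, and it assumes that coil may be encoded either as "C" or as a blank string, depending on how the DSSP file was parsed.

```r
# Map one-letter DSSP codes onto the three broad categories listed above.
# 'dssp_category' is a hypothetical helper, not a ptm function.
# Assumption: coil may appear as "C" or as a blank/empty string.
dssp_category <- function(ss) {
  lookup <- c(G = "helix", H = "helix", I = "helix",
              E = "strand", B = "strand",
              T = "loop", S = "loop", C = "loop")
  ss <- trimws(ss)        # drop surrounding whitespace
  ss[ss == ""] <- "C"     # treat blanks as coil
  unname(lookup[ss])      # unknown codes come back as NA
}

dssp_category(c("H", "E", "T", " ", "G"))
# "helix" "strand" "loop" "loop" "helix"
```

Applied to the ‘ss’ column of a parsed dataframe, a function like this makes it easy to tabulate the helix, strand and loop content of a structure.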

Given a PDB file (or its 4-letter ID), there are three different ways to get the corresponding dssp file. The one used by the function download.dssp() consists of downloading a precomputed file through the REST API provided by the Centre for Molecular and Biomolecular Informatics.

The database selected by default is ‘pdb_redo’, which corresponds to fully optimised structure models. If the chosen PDB structure does not have a precomputed file there, the function will resort to the database ‘PDB’ (see ‘Facilities that make the PDB data collection more powerful’ for a recent review).

For instance, let’s obtain the dssp file for the Dynein light chain 2.

```r
download.dssp(id = '2xqq')
```

Once the file has been downloaded, we can parse it using the function parse.dssp(), which returns a dataframe.

```r
Dynein <- parse.dssp('./2xqq.dssp')
```

By default, the dssp file is deleted once it has been parsed. If you want to keep it and also save the resulting dataframe, pass the additional argument keepfiles = TRUE.

To download these precomputed DSSP files we use the command ‘rsync’. If your OS has problems with that command, consider using the function compute.dssp() instead. In addition, compute.dssp() also accepts a PDB file that you have generated yourself and that is therefore not present in the PDB database (in this case you must pass the path to the file as the argument). In either case, the function sends the file to the XSSP server, which carries out the computation and returns a dssp file.

```r
compute.dssp(pdb = '2xqq')
```
```
## [1] "Work done!. See file at: ./2xqq.dssp"
```

We can now parse the obtained dssp file:

```r
Dynein <- parse.dssp('./2xqq.dssp')
```

A drawback of this function is that it depends on the XSSP server, which can occasionally take a long time to process the request. Thus, a third option to convert a PDB file into a DSSP file is to carry out the computation on your own computer, using the function mkdssp(). To do that, you must first install the mkdssp program as an executable. Some help can be found here.

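```r
library(knitr)  # provides kable()

# exefile points to the location of the mkdssp executable on your system
Dynein <- mkdssp(pdb = '2xqq',
                 method = 'ptm',
                 exefile = '/anaconda3/bin')
kable(tail(Dynein))  # show the last rows as a formatted table
```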
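
Before calling mkdssp(), it may help to confirm that the executable can be found. The sketch below uses base R's Sys.which() and assumes the program is installed under the name ‘mkdssp’. It only inspects the system PATH and makes no claim about how ptm itself interprets the exefile argument.

```r
# Look for an executable named 'mkdssp' on the system PATH.
# Assumption: the program was installed under exactly this name.
exe_path <- Sys.which("mkdssp")

if (nzchar(exe_path)) {
  message("mkdssp found at: ", exe_path)
} else {
  message("mkdssp not found on PATH; install it or pass its location explicitly.")
}
```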

A word of warning

PDB entries are notoriously hard to parse. It is not unusual for an entry to contain UNK residues, Cα-only residues, or residues with otherwise missing atoms, just to mention a few issues. For that reason we provide different alternative approaches to compute the desired dssp file, in the hope that they complement each other and together allow the desired calculations to be obtained for a large majority of PDB files.
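
One way to combine the alternative approaches is to try them in turn and keep the first one that succeeds. The following is a minimal sketch of that idea, using a hypothetical helper (first_success, not part of ptm) and mock functions in place of the real download, server and local routes. It assumes that a failing approach signals an R error, which may not hold for every ptm function.

```r
# Try each approach in order and return the first result that does not fail.
# 'first_success' is a hypothetical helper, not part of the ptm package.
first_success <- function(attempts) {
  for (name in names(attempts)) {
    # Convert any error into NULL so the loop can move on to the next approach.
    result <- tryCatch(attempts[[name]](), error = function(e) NULL)
    if (!is.null(result)) {
      message("Succeeded with: ", name)
      return(result)
    }
  }
  stop("All approaches failed.")
}

# Mock attempts standing in for the download, server and local routes.
attempts <- list(
  download = function() stop("no precomputed file"),
  server   = function() stop("server timed out"),
  local    = function() data.frame(ss = c("H", "E"), acc = c(12, 80))
)

fallback_df <- first_success(attempts)
```

With ptm loaded, each mock function could be replaced by a wrapper around the corresponding calls shown earlier, for example a function that runs compute.dssp(pdb = '2xqq') and then parse.dssp('./2xqq.dssp').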