mkdssp()

Description

Computes the DSSP file using a locally installed version of the DSSP software.

Usage

mkdssp(pdb, method = 'ptm', exefile)

Arguments

pdb either the 4-character identifier of a PDB structure, or the path to a PDB file.

method a character string specifying the desired method to get the dssp dataframe; it should be one of ‘ptm’ or ‘bio3d’.

exefile file path to the DSSP executable on your system (i.e. how DSSP is invoked).

Value

Returns either a dataframe containing the information extracted from the dssp file (method ptm), or a list with that information (method bio3d).

References

Touw et al. (2015) Nucleic Acids Res. 43(Database issue): D364-D368.
Kabsch & Sander (1983) Biopolymers 22:2577-2637.

Details

The ptm package contains a number of ancillary functions that deal with Protein Data Bank (PDB) files. These functions may be useful when structural 3D data need to be analyzed. The ones discussed here are compute.dssp(), parse.dssp() and mkdssp().

The DSSP (Define Secondary Structure of Proteins) algorithm assigns a secondary structure to each amino acid of a protein using the atomic coordinates of the protein (a PDB file).

Based on the identification of intra-backbone hydrogen bonds, DSSP can identify eight types of secondary structure, which can be grouped into three main categories:

Helices
G = 3-turn helix ($3_{10}$ helix). Minimum length 3 residues.
H = 4-turn helix (α helix). Minimum length 4 residues.
I = 5-turn helix (π helix). Minimum length 5 residues.
Strands
E = extended strand in parallel and/or anti-parallel β-sheet conformation. Minimum length 2 residues.
B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation).
Loops
T = hydrogen-bonded turn (3-, 4- or 5-turn).
S = bend (the only non-hydrogen-bond based assignment).
C = coil (residues which are not in any of the above conformations).
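
The grouping above can be expressed as a simple lookup table. The following is a minimal sketch in base R; dssp_classes and reduce_ss() are hypothetical names used only for illustration and are not part of the ptm package.

```r
# Map each of the eight DSSP codes to one of the three categories listed above.
dssp_classes <- c(G = "helix",  H = "helix",  I = "helix",
                  E = "strand", B = "strand",
                  T = "loop",   S = "loop",   C = "loop")

# Reduce a vector of one-letter DSSP codes to the three-category description.
# Codes that are not among the eight listed above yield NA.
reduce_ss <- function(ss) {
  unname(dssp_classes[as.character(ss)])
}

reduce_ss(c("C", "C", "E", "H", "T"))
```

Converting the input with as.character() keeps the lookup by name even if the codes are stored as a factor.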

Given a PDB file (or its 4-character ID), there are two different ways to get the corresponding dssp file. The one used by the function compute.dssp() makes use of the REST API provided by the Centre for Molecular and Biomolecular Informatics (see “Facilities that make the PDB data collection more powerful” for a recent review).

For instance, let’s obtain the dssp file for the Dynein light chain 2.

compute.dssp(pdb = '2xqq')

## [1] "Work done!. See file at: ./2xqq.dssp"

We can now parse the obtained dssp file:

Dynein <- parse.dssp('./2xqq.dssp')
kable(head(Dynein))

resnum	respdb	chain	aa	ss	sasa	phi	psi
1	3	A	D	C	130	360.0	108.6
2	4	A	R	C	106	-135.5	24.4
3	5	A	K	C	71	-59.1	133.9
4	6	A	A	E	8	-95.2	133.8
5	7	A	V	E	45	-133.9	116.1
6	8	A	I	E	7	-82.0	121.5
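
Once parsed, the result is an ordinary dataframe and can be handled with base R. The sketch below rebuilds the six rows shown above by hand (so that it runs without the ptm package or a network connection) and summarizes them. It assumes that a value of 360.0 in phi or psi marks an angle that could not be computed, which is the usual DSSP convention at chain ends; dynein_head is a hypothetical name.

```r
# The six rows printed above, typed in by hand for illustration.
dynein_head <- data.frame(
  resnum = 1:6,
  respdb = 3:8,
  chain  = "A",
  aa     = c("D", "R", "K", "A", "V", "I"),
  ss     = c("C", "C", "C", "E", "E", "E"),
  sasa   = c(130, 106, 71, 8, 45, 7),
  phi    = c(360.0, -135.5, -59.1, -95.2, -133.9, -82.0),
  psi    = c(108.6, 24.4, 133.9, 133.8, 116.1, 121.5),
  stringsAsFactors = FALSE
)

# Assumption: 360.0 means "undefined angle", so recode it as NA.
dynein_head$phi[dynein_head$phi == 360] <- NA
dynein_head$psi[dynein_head$psi == 360] <- NA

# Number of residues per secondary structure code.
ss_counts <- table(dynein_head$ss)

# Mean solvent accessible surface area of the residues in extended strands.
mean_sasa_strand <- mean(dynein_head$sasa[dynein_head$ss == "E"])
```

The same lines should apply unchanged to the full dataframe returned by parse.dssp(), since they only rely on the column names shown in the table.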

In addition, compute.dssp() also accepts a PDB file that you may have generated yourself and that is therefore not present in the PDB database (in this case you must pass the path to the file as an argument). In either case, the function sends the file to the XSSP server, which carries out the computation and returns a dssp file.

A drawback of this function is that it depends on the XSSP server, which on occasion can take a long time to process the request. Thus, an alternative way to convert a PDB file into a DSSP file is to carry out the computation on your own computer, using the function mkdssp(). To do that, you must first have installed the mkdssp program as an executable. Some help can be found here.

Dynein <- mkdssp(pdb = '2xqq', method = 'ptm')
kable(tail(Dynein))

	resnum	respdb	chain	aa	ss	sasa	phi	psi
377	366	2	H	R	E	84	-143.9	149.2
378	367	3	H	G	E	47	-109.0	146.8
379	368	4	H	T	E	13	-131.4	155.7
380	369	5	H	Q	E	122	-148.4	125.8
381	370	6	H	T	C	5	-68.3	150.2
382	371	7	H	E	C	201	-81.4	360.0
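
Because mkdssp() relies on an executable installed on your system, it can help to check where that executable is before calling the function. The sketch below uses base R's Sys.which() to search the PATH. find_dssp_exe() is a hypothetical helper, and the candidate names "mkdssp" and "dssp" are assumptions that depend on how the program was installed.

```r
# Return the full path of the first candidate found on the PATH,
# or NA if none of them is available.
find_dssp_exe <- function(candidates = c("mkdssp", "dssp")) {
  found <- Sys.which(candidates)
  found <- found[!is.na(found) & nzchar(found)]
  if (length(found) == 0) NA_character_ else unname(found[1])
}

exe <- find_dssp_exe()
if (is.na(exe)) {
  message("No DSSP executable found on the PATH; install it or pass its full path.")
} else {
  message("DSSP executable found at: ", exe)
  # With the ptm package loaded, the path can then be passed on:
  # Dynein <- mkdssp(pdb = '2xqq', method = 'ptm', exefile = exe)
}
```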

A word of warning

PDB entries are notoriously hard to parse. It is not unusual for an entry to contain UNK residues, Cα-only residues, or residues with otherwise missing atoms, just to mention a few issues. For that reason, we provide alternative approaches to compute the desired dssp file, in the hope that they complement each other and together allow the desired calculations to be obtained for a large majority of PDB files.
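
One way to let the approaches complement each other is to try them in turn and keep the first one that works. The following is a minimal sketch of that idea in base R. first_success() is a hypothetical helper, not part of the ptm package, and it treats both an error and a NULL result as a failure, which is an assumption about how the ptm functions behave when they cannot process an entry.

```r
# Try each function in a named list, in order, and return the first
# result that is neither an error nor NULL, together with its name.
first_success <- function(attempts) {
  for (name in names(attempts)) {
    result <- tryCatch(attempts[[name]](), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(method = name, value = result))
    }
  }
  NULL  # every attempt failed
}

# Toy demonstration: the first attempt fails, the second succeeds.
demo <- first_success(list(
  failing = function() stop("could not parse entry"),
  working = function() data.frame(resnum = 1, ss = "C")
))
demo$method

# With the ptm package loaded, the attempts could look like this
# (the exefile value "mkdssp" is an assumption about your installation):
# res <- first_success(list(
#   local  = function() mkdssp(pdb = '2xqq', method = 'ptm', exefile = 'mkdssp'),
#   remote = function() {
#     compute.dssp(pdb = '2xqq')
#     parse.dssp('./2xqq.dssp')
#   }
# ))
```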