msa()

Description

Aligns multiple protein sequences.

Usage

msa(sequences, ids = names(sequences), sfile = FALSE, inhouse = FALSE)

Arguments

sequences vector containing the sequences.

ids vector containing the sequences’ ids.

sfile path to the file where the fasta alignment should be saved, if any.

inhouse logical, if TRUE the in-house MUSCLE software is used. It must be installed on your system and in the search path for executables.

Value

Returns a list of four elements. The first element (seq) provides the sequences analyzed, the second (ids) returns the identifiers, the third (aln) provides the alignment in fasta format, and the fourth (ali) gives the alignment in matrix format.
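To make the matrix form of an alignment more concrete, here is a minimal base R sketch. It does not call msa(): the three short sequences are made up for illustration, and the layout (one row per sequence, one column per alignment position) is an assumption about how the ali element is organized.

```r
# Toy aligned sequences, invented for illustration ('-' marks a gap).
aligned <- c(s1 = "MTH-QS", s2 = "MAHQQS", s3 = "MTHQ-S")

# Split each string into single characters and stack them as rows,
# giving a character matrix: sequences in rows, positions in columns.
ali <- do.call(rbind, strsplit(aligned, split = ""))

dim(ali)               # number of sequences, number of alignment columns
ali[, 2]               # residues found at alignment column 2
colSums(ali == "-")    # number of gaps in each column
```

With a matrix like this, inspecting a given alignment position across all sequences is a single column lookup.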

References

Edgar RC. Nucleic Acids Res. 2004 32:1792-1797.

H. Pagès, P. Aboyoun, R. Gentleman and S. DebRoy (2019). Biostrings: Efficient
manipulation of biological strings. R package version 2.52.0.

Edgar RC. BMC Bioinformatics 2004 5:113.

See Also

custom.aln(), list.hom(), parse.hssp(), get.hssp(), shannon(), site.type()

Details

Multiple sequence alignment (MSA) is generally the alignment of three or more biological sequences. From the resulting alignment, homology can be inferred and the evolutionary relationships between the sequences can be studied. Thus, alignment is the most important stage in most evolutionary analyses. MSA is also an essential tool for predicting protein structure and function. The package ptm offers several functions that will assist you in the process of sequence analysis:

msa (the current document)
custom.aln
list.hom
parse.hssp
get.hssp
shannon
site.type

The function msa() carries out MSAs either by using the functionality of Biostrings or, alternatively, by calling the program MUSCLE. In the first case, the R package Biostrings must be installed. To install it, start R and enter:

# if (!requireNamespace("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# BiocManager::install("Biostrings")

Alternatively, if MUSCLE is already installed on your machine, msa() can call it to carry out the alignment when you pass the argument ‘inhouse = TRUE’. MUSCLE is a fast multiple sequence alignment program available from the MUSCLE home page. Details to guide you through the installation of MUSCLE can be found here.
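Before choosing between the two back ends, it can help to check what is available on your system. The following is a minimal sketch using only base R. The variable names are illustrative, and the executable name "muscle" is an assumption that may not match every installation.

```r
# TRUE if an executable named "muscle" is found on the search path.
# Adjust the name if your MUSCLE binary is called something else.
muscle_found <- unname(nzchar(Sys.which("muscle")))

# TRUE if the Biostrings package is installed and can be loaded.
biostrings_found <- requireNamespace("Biostrings", quietly = TRUE)

if (!muscle_found && !biostrings_found) {
  message("Neither MUSCLE nor Biostrings was found; install one of them first.")
}
```

One possible use is to pass the result on, as in msa(sequences, ids, inhouse = muscle_found), so that MUSCLE is only requested when it was actually found.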

Let’s see msa() in action. As a case study we will use the protein COX3 (subunit 3 of the cytochrome c oxidase complex), which helps to illustrate the relevance of epistatic effects in protein evolution.

Leber’s hereditary optic neuropathy (LHON) is a degeneration of the retinal gangliocytes and their axons, inherited mitochondrially (from the mother to all her children), leading to an acute or subacute loss of central vision. LHON is only transmitted through the mother since it is mainly due to mutations in the mitochondrial genome (not the nuclear one) and only the egg contributes mitochondria to the embryo. The pathogenic A32 to T32 mutation (change from alanine to threonine at position 32) in the COX3 protein has been related to LHON.

We can check that an alanine residue is indeed found at position 32 of the human protein:

aa.at(at = 32, target = 'P00414')
## [1] "A"

Next we will obtain the COX3 sequence for human, bonobo, chimp, gorilla and orangutan (Hominidae family) and carry out the MSA using msa():

sequences <- sapply(c('P00414', 'E0XI88', 'Q9T9V9', 'Q9T9Y6', 'P92696'), ptm::get.seq)
ids <- c('human', 'bonobo', 'chimpazee', 'gorilla', 'orangutan')
msa(sequences, ids, inhouse = TRUE)
##             1        .         .         .         .         .         60 
## human       MTHQSHAYHMVKPSPWPLTGALSALLMTSGLAMWFHFHSMTLLMLGLLTNTLTMYQWWRD
## bonobo      MAHQSHAYHMVKPSPWPLTGALSALLMTSGLAMWFHFYSTTLLTLGLLTNTLTMYQWWRD
## chimpazee   MTHQSHAYHMVKPSPWPLTGALSALLMTSGLAMWFHFYSTTLLTLGLLTNTLTMYQWWRD
## gorilla     MIHQSHAYHMVKPSPWPLTGALSALLMTSGLAMWFHFHSTTLLMLGLLTNMLTMYQWWRD
## orangutan   MAHQSHAYHMVKPSPWPLTGALSALLTTSGLTMWFHFHSTTLLLTGLLTNALTMYQWWRD
##             * ************************ **** ***** * ***  ***** ********* 
##             1        .         .         .         .         .         60 
## 
##            61        .         .         .         .         .         120 
## human       VTRESTYQGHHTPPVQKGLRYGMILFITSEVFFFAGFFWAFYHSSLAPTPQLGGHWPPTG
## bonobo      VMRESTYQGHHTPPVQKGLRYGMILFITSEVFFFAGFFWAFYHSSLAPTPQLGGHWPPTG
## chimpazee   VMREGTYQGHHTPPVQKGLRYGMILFITSEVFFFAGFFWAFYHSSLAPTPQLGGHWPPTG
## gorilla     VMRESTYQGHHTLPVQKGLRYGMILFITSEVFFFAGFFWAFYHSSLAPTPQLGAHWPPTG
## orangutan   VVRESTYQGHHTLPVQKGLRYGMILFITSEVFFFAGFFWAFYHSSLAPTPQLGGHWPPTG
##             * ** ******* ****************************************^****** 
##            61        .         .         .         .         .         120 
## 
##           121        .         .         .         .         .         180 
## human       ITPLNPLEVPLLNTSVLLASGVSITWAHHSLMENNRNQMIQALLITILLGLYFTLLQASE
## bonobo      ITPLNPLEVPLLNTSVLLASGVSITWAHHSLMENNRNQMIQALLITILLGLYFTLLQASE
## chimpazee   ITPLNPLEVPLLNTSVLLASGVSITWAHHSLMENNRNQMIQALLITILLGLYFTLLQASE
## gorilla     ITPLNPLEVPLLNTSVLLASGVSITWAHHSLMENNRNQMIQALLITILLGLYFTLLQASE
## orangutan   IIPLNPLEVPLLNTSVLLASGVSITWAHHSLMENNRTQMIQALLITILLGIYFTLLQASE
##             * ********************************** *************^********* 
##           121        .         .         .         .         .         180 
## 
##           181        .         .         .         .         .         240 
## human       YFESPFTISDGIYGSTFFVATGFHGLHVIIGSTFLTICFIRQLMFHFTSKHHFGFEAAAW
## bonobo      YFESPFTISDGIYGSTFFVATGFHGLHVIIGSTFLTICLIRQLMFHFTSKHHFGFEAAAW
## chimpazee   YFESPFTISDGIYGSTFFVATGFHGLHVIIGSTFLTICLIRQLMFHFTSKHHFGFQAAAW
## gorilla     YFEAPFTISDGIYGSTFFVATGFHGLHVIIGSTFLTICLIRQLMFHFTSKHHFGFEAAAW
## orangutan   YIEAPFTISDGIYGSTFFMATGFHGLHVIIGSTFLTVCLARQLLFHFTSKHHFGFEAAAW
##             * * **************^*****************^*  ***^*********** **** 
##           181        .         .         .         .         .         240 
## 
##           241        .         .261 
## human       YWHFVDVVWLFLYVSIYWWGS
## bonobo      YWHFVDVVWLFLYVSIYWWGS
## chimpazee   YWHFVDVVWLFLYVSIYWWGS
## gorilla     YWHFVDVVWLFLYVSIYWWGS
## orangutan   YWHFVDVVWLFLYVSIYWWGS
##             ********************* 
##           241        .         .261 
## 
## Call:
##   bio3d::seqaln(aln = sqs, id = ids, exefile = "muscle")
## 
## Class:
##   fasta
## 
## Alignment dimensions:
##   5 sequence rows; 261 position columns (261 non-gap, 0 gap) 
## 
## + attr: id, ali, call, seq

Which amino acid has been fixed at position 32 of the orangutan wild-type sequence? Yes, threonine! Thus, while a threonine at this position causes disease in humans, in the genetic context of orangutans T32 is fine.
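The same check can be done programmatically rather than by eye. The sketch below uses only base R and the first 60 alignment columns copied from the output above. Because the alignment contains no gaps, alignment column 32 corresponds to residue 32 in every sequence. The vector name cox3 is illustrative.

```r
# First 60 columns of the COX3 alignment, copied from the printed output.
cox3 <- c(
  human      = "MTHQSHAYHMVKPSPWPLTGALSALLMTSGLAMWFHFHSMTLLMLGLLTNTLTMYQWWRD",
  bonobo     = "MAHQSHAYHMVKPSPWPLTGALSALLMTSGLAMWFHFYSTTLLTLGLLTNTLTMYQWWRD",
  chimpanzee = "MTHQSHAYHMVKPSPWPLTGALSALLMTSGLAMWFHFYSTTLLTLGLLTNTLTMYQWWRD",
  gorilla    = "MIHQSHAYHMVKPSPWPLTGALSALLMTSGLAMWFHFHSTTLLMLGLLTNMLTMYQWWRD",
  orangutan  = "MAHQSHAYHMVKPSPWPLTGALSALLTTSGLTMWFHFHSTTLLLTGLLTNALTMYQWWRD"
)

# Extract the residue at position 32 for each species.
residue32 <- substr(cox3, 32, 32)
residue32

# Species whose residue at position 32 differs from the human one.
names(residue32)[residue32 != residue32["human"]]
```

Only the orangutan sequence differs from human at this position, carrying the threonine that is pathogenic in the human background.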