env.matrices()

Description

Provides the frequency of each amino acid at each position within the environments.

Usage

env.matrices(env)

Arguments

env a character vector containing the environments.

Value

Returns a list of two data frames. The first shows the environments in matrix form. The second provides the frequency of each amino acid at each position within the environments.

Reference

Aledo et al. Sci Rep. 2015; 5: 16955

Details

Amino acids in the vicinity of a PTM site are critical in promoting or hindering the addition of the incoming substituent to a specific amino acid. Thus, knowledge about the amino acids surrounding modified sites, and the correlations between them, is very valuable.

The package ptm offers four functions that will assist you in gaining such knowledge:

env.extract
env.matrices (the current document)
env.Ztest
env.plot

A more elaborate vignette showing how these four functions can be used together is also available.

For the sake of concreteness, suppose we want to investigate whether the environments of serines that are susceptible to phosphorylation (phosphosites) can be statistically distinguished from the environments of serines that are not phosphorylatable. To address this question, we need a collection of positive environments (sequences of amino acids around a phosphorylatable serine) and its control counterpart (sequences of amino acids around a non-phosphorylatable serine). These sets can be obtained with the help of env.extract(). Suppose we have already extracted the sequence environments of 40 phosphoserines, as well as the same number of environments corresponding to non-phosphorylatable serines (to be used as control). This number of environments is, of course, far too small to reach any meaningful conclusion, but it is enough to illustrate the use of env.matrices().

positive <- c("ERNLLsVAYKN", "SWRIIsSIEQK", "LNEPLsNEDRN", "LTLWTsDQQDD", "WRVLSsIEQKS",
              "ESELRsICTTV", "ASQAEsKVFYL", "RKILLsEWKSQ", "GGSSCsQTPSR", "QVLLEsGEKST",
              "ARAVYsDADIF", "NRQLPsDGKKM", "SPGYRsVRERT", "KDRTTsEAQTE", "KTEAEsYEGLL",
              "CGRTGsGKSSL", "HVPAPsPQGPG", "IQESEsHSKNG", "LHGKKsGKPPL", "SISAPsSDKPL",
              "NTVANsPQTLL", "PYAHLsKKEKK", "GKQQVsPIRNL", "VCEKQtITKWP", "ASQAGsRKESR",
              "FIRGVsGGERK", "LTPGGsMGLQV", "PCPRYsNPADF", "GITGSsQDTYV", "KLKGKsPGIIF",
              "GQQLAsMLRWT", "YKVLSsLGYHV", "FISGLsDQLIP", "LFRSRsLREFE", "PGIDLsQVYEL",
              "SPRTLsPTPSA", "IRRSSsDFFYS", "PASSTsGSPSR", "TPTSRsPQHYS", "MKARSsSYADP")

control <- c("PEKACsLAKTA", "AYKAAsDIAMT", "LALNFsVFYYE", "ATVVEsSEKAY", "TLSEDsYKDST",
             "AWRVIsSIEQK", "PEKACsLAKTA", "VKKENsVETQA", "MSGGSsCSQTP", "SGHQPsQSRAI",
             "FALVLsALILA", "RTFSEsSVWSQ", "CGSVGsGKTSL", "EDPQQsNPCPE", "STLEYsNERLK",
             "GTMDPsQVPEH", "PIVTPsGEVVV", "AVIQEsESHSK", "LYSNLsKPFLD", "TSTRGsVQMLT",
             "FDEPSsYLDVK", "PTQKFsGGWRM", "HIINLsLTFHG", "SRLESsGKNKS", "EKEILsNINGI",
             "TMIFSsVCYWT", "LVKTLsRLAKG", "QAAQHsPYVAL", "YSGVGsSDGNS", "VPVAPsSSSGG",
             "SSSSGsAAAAL", "SEGEAsEEGLY", "PADQFsDGREP", "GPAEEsRVRRH", "CSSEKsKVTSS",
             "SYGDVsGGVRD", "GIRCDsCEKYI", "SVPASsTSGSP", "RSGPEsGRSSP", "TTAGNsSQVSD")
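Before building the matrices, it can be worth checking that every environment has the same odd length and seeing which residue sits at the central position. The following is a minimal base R sketch; `check_env` and `example_env` are hypothetical names introduced here for illustration and are not part of the ptm package.

```r
# Hypothetical helper (not part of ptm): summarise a set of environments.
check_env <- function(env) {
  len <- nchar(env)
  # All environments must share one length, and that length must be odd
  stopifnot(length(unique(len)) == 1, len[1] %% 2 == 1)
  centre <- (len[1] + 1) / 2
  # Tabulate the residue found at the central position, ignoring case
  table(toupper(substr(env, centre, centre)))
}

example_env <- c("ERNLLsVAYKN", "SWRIIsSIEQK", "VCEKQtITKWP")
res <- check_env(example_env)
print(res)
```

Running the same helper on `positive` and `control` gives a quick view of the central residues in each set before any frequencies are computed.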

Now, for each of these sets (positive and control), we create two matrices. The first shows the environments being analyzed in matrix form:

library(ptm)   # provides env.matrices()
library(knitr) # provides kable()

# Positive amino acid matrix
p1 <- env.matrices(positive)[[1]]
kable(p1)

-5	-4	-3	-2	-1	0	1	2	3	4	5
E	R	N	L	L	s	V	A	Y	K	N
S	W	R	I	I	s	S	I	E	Q	K
L	N	E	P	L	s	N	E	D	R	N
L	T	L	W	T	s	D	Q	Q	D	D
W	R	V	L	S	s	I	E	Q	K	S
E	S	E	L	R	s	I	C	T	T	V
A	S	Q	A	E	s	K	V	F	Y	L
R	K	I	L	L	s	E	W	K	S	Q
G	G	S	S	C	s	Q	T	P	S	R
Q	V	L	L	E	s	G	E	K	S	T
A	R	A	V	Y	s	D	A	D	I	F
N	R	Q	L	P	s	D	G	K	K	M
S	P	G	Y	R	s	V	R	E	R	T
K	D	R	T	T	s	E	A	Q	T	E
K	T	E	A	E	s	Y	E	G	L	L
C	G	R	T	G	s	G	K	S	S	L
H	V	P	A	P	s	P	Q	G	P	G
I	Q	E	S	E	s	H	S	K	N	G
L	H	G	K	K	s	G	K	P	P	L
S	I	S	A	P	s	S	D	K	P	L
N	T	V	A	N	s	P	Q	T	L	L
P	Y	A	H	L	s	K	K	E	K	K
G	K	Q	Q	V	s	P	I	R	N	L
V	C	E	K	Q	t	I	T	K	W	P
A	S	Q	A	G	s	R	K	E	S	R
F	I	R	G	V	s	G	G	E	R	K
L	T	P	G	G	s	M	G	L	Q	V
P	C	P	R	Y	s	N	P	A	D	F
G	I	T	G	S	s	Q	D	T	Y	V
K	L	K	G	K	s	P	G	I	I	F
G	Q	Q	L	A	s	M	L	R	W	T
Y	K	V	L	S	s	L	G	Y	H	V
F	I	S	G	L	s	D	Q	L	I	P
L	F	R	S	R	s	L	R	E	F	E
P	G	I	D	L	s	Q	V	Y	E	L
S	P	R	T	L	s	P	T	P	S	A
I	R	R	S	S	s	D	F	F	Y	S
P	A	S	S	T	s	G	S	P	S	R
T	P	T	S	R	s	P	Q	H	Y	S
M	K	A	R	S	s	S	Y	A	D	P

# Control amino acid matrix
c1 <- env.matrices(control)[[1]]
kable(c1)

-5	-4	-3	-2	-1	0	1	2	3	4	5
P	E	K	A	C	s	L	A	K	T	A
A	Y	K	A	A	s	D	I	A	M	T
L	A	L	N	F	s	V	F	Y	Y	E
A	T	V	V	E	s	S	E	K	A	Y
T	L	S	E	D	s	Y	K	D	S	T
A	W	R	V	I	s	S	I	E	Q	K
P	E	K	A	C	s	L	A	K	T	A
V	K	K	E	N	s	V	E	T	Q	A
M	S	G	G	S	s	C	S	Q	T	P
S	G	H	Q	P	s	Q	S	R	A	I
F	A	L	V	L	s	A	L	I	L	A
R	T	F	S	E	s	S	V	W	S	Q
C	G	S	V	G	s	G	K	T	S	L
E	D	P	Q	Q	s	N	P	C	P	E
S	T	L	E	Y	s	N	E	R	L	K
G	T	M	D	P	s	Q	V	P	E	H
P	I	V	T	P	s	G	E	V	V	V
A	V	I	Q	E	s	E	S	H	S	K
L	Y	S	N	L	s	K	P	F	L	D
T	S	T	R	G	s	V	Q	M	L	T
F	D	E	P	S	s	Y	L	D	V	K
P	T	Q	K	F	s	G	G	W	R	M
H	I	I	N	L	s	L	T	F	H	G
S	R	L	E	S	s	G	K	N	K	S
E	K	E	I	L	s	N	I	N	G	I
T	M	I	F	S	s	V	C	Y	W	T
L	V	K	T	L	s	R	L	A	K	G
Q	A	A	Q	H	s	P	Y	V	A	L
Y	S	G	V	G	s	S	D	G	N	S
V	P	V	A	P	s	S	S	S	G	G
S	S	S	S	G	s	A	A	A	A	L
S	E	G	E	A	s	E	E	G	L	Y
P	A	D	Q	F	s	D	G	R	E	P
G	P	A	E	E	s	R	V	R	R	H
C	S	S	E	K	s	K	V	T	S	S
S	Y	G	D	V	s	G	G	V	R	D
G	I	R	C	D	s	C	E	K	Y	I
S	V	P	A	S	s	T	S	G	S	P
R	S	G	P	E	s	G	R	S	S	P
T	T	A	G	N	s	S	Q	V	S	D
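The matrix form shown above can be approximated in base R by splitting each environment into single characters and stacking the results row by row. This is only a sketch under the assumption that all environments have the same odd length; `env_to_matrix` is a hypothetical name, and env.matrices() may work differently internally.

```r
# Hypothetical re-implementation in base R; env.matrices() may differ internally.
env_to_matrix <- function(env) {
  # Split each string into single characters: one row per environment
  m <- do.call(rbind, strsplit(env, split = ""))
  r <- (ncol(m) - 1) / 2              # number of residues on each side
  colnames(m) <- as.character(-r:r)   # positions relative to the central residue
  as.data.frame(m, stringsAsFactors = FALSE)
}

m <- env_to_matrix(c("ERNLLsVAYKN", "SWRIIsSIEQK"))
print(m)
```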

The second matrix provides the frequency of each amino acid at each position within the environments:

# Positive frequency matrix
p2 <- env.matrices(positive)[[2]]
kable(p2)

	-5	-4	-3	-2	-1	0	1	2	3	4	5
A	3	1	3	6	1	0	0	3	2	0	1
C	1	2	0	0	1	0	0	1	0	0	0
D	0	1	0	1	0	0	5	2	2	3	1
E	2	0	5	0	4	0	2	4	6	1	2
F	2	1	0	0	0	0	0	1	2	1	3
G	4	3	2	5	3	0	5	5	2	0	2
H	1	1	0	1	0	0	1	0	1	1	0
I	2	4	2	1	1	0	3	2	1	3	0
K	3	4	1	2	2	0	2	4	6	4	3
L	5	1	2	8	7	0	2	1	2	2	8
M	1	0	0	0	0	0	2	0	0	0	1
N	2	1	1	0	1	0	2	0	0	2	2
P	4	3	3	1	3	0	6	1	4	3	3
Q	1	2	5	1	1	0	3	5	3	2	1
R	1	5	7	2	4	0	1	2	2	3	3
S	4	3	4	6	5	40	3	2	1	7	3
T	1	4	2	3	3	0	0	3	3	2	3
V	1	2	3	1	2	0	2	2	0	0	4
W	1	1	0	1	0	0	0	1	0	2	0
Y	1	1	0	1	2	0	1	1	3	4	0
X	0	0	0	0	0	0	0	0	0	0	0

# Control frequency matrix
c2 <- env.matrices(control)[[2]]
kable(c2)

	-5	-4	-3	-2	-1	0	1	2	3	4	5
A	4	4	3	5	2	0	2	3	3	4	4
C	2	0	0	1	2	0	2	1	1	0	0
D	0	2	1	2	2	0	2	1	2	0	3
E	2	3	2	7	5	0	2	6	1	2	2
F	2	0	1	1	3	0	0	1	2	0	0
G	3	2	5	2	4	0	6	3	3	2	3
H	1	0	1	0	1	0	0	0	1	1	2
I	0	3	3	1	1	0	0	3	1	0	3
K	0	2	5	1	1	0	2	3	4	2	4
L	3	1	4	0	5	0	3	3	0	5	3
M	1	1	1	0	0	0	0	0	1	1	1
N	0	0	0	3	2	0	3	0	2	1	0
P	5	2	2	2	4	0	1	2	1	1	4
Q	1	0	1	5	1	0	2	2	1	2	1
R	2	1	2	1	0	0	2	1	4	3	0
S	7	6	5	2	5	40	6	5	2	8	3
T	4	6	1	2	0	0	1	1	3	3	4
V	2	3	3	5	1	0	4	4	4	2	1
W	0	1	0	0	0	0	0	0	2	1	0
Y	1	3	0	0	1	0	2	1	2	2	2
X	0	0	0	0	0	0	0	0	0	0	0
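To make the layout of these frequency matrices concrete, the sketch below counts residues per position in base R. It assumes that case is ignored (so the lower-case central residue is counted with its upper-case letter) and that the rows follow the 21-letter order shown in the tables above; `aa` and `env_to_freq` are hypothetical names, and the package's own code may differ.

```r
# Hypothetical base R sketch; not the package's actual code.
# Row order copied from the tables above (20 amino acids plus X).
aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M",
        "N", "P", "Q", "R", "S", "T", "V", "W", "Y", "X")

env_to_freq <- function(env) {
  # Character matrix, one row per environment, upper-cased (assumption)
  m <- toupper(do.call(rbind, strsplit(env, split = "")))
  r <- (ncol(m) - 1) / 2
  # For each position (column), count how often each residue in `aa` occurs
  f <- apply(m, 2, function(col) tabulate(match(col, aa), nbins = length(aa)))
  dimnames(f) <- list(aa, as.character(-r:r))
  as.data.frame(f, check.names = FALSE)
}

f <- env_to_freq(c("ERNLLsVAYKN", "SWRIIsSIEQK"))
print(f)
```

Each column of such a matrix sums to the number of environments, which is a convenient consistency check.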

At this point we have two frequency matrices, one for the positive and one for the control environments (p2 and c2), and we are ready to test the null hypothesis that the environments of phosphorylatable and non-phosphorylatable serine residues do not differ. That can be done with env.Ztest().
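As an illustration of the kind of comparison involved, the sketch below computes a pooled two-proportion Z statistic for a single residue at a single position. This document does not describe the exact statistic used by env.Ztest(), so the formula here is an assumption, and `z_two_prop` is a hypothetical helper, not a ptm function.

```r
# Hypothetical helper: pooled two-proportion Z statistic for one residue
# at one position. env.Ztest() may use a different formulation.
z_two_prop <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p  <- (x1 + x2) / (n1 + n2)                  # pooled proportion under H0
  se <- sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error under H0
  (p1 - p2) / se
}

# Arginine (R) at position -3: 7 of 40 positive vs 2 of 40 control environments
z <- z_two_prop(7, 40, 2, 40)
p_value <- 2 * pnorm(-abs(z))   # two-sided p-value
print(c(z = z, p = p_value))
```

With 21 residues and 11 positions there are many such comparisons, so any single p-value obtained this way should be read with multiple testing in mind.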