env.matrices()

Description

Provides the frequencies of each amino acid within the environments.

Usage

env.matrices(env)

Arguments

env: a character vector containing the environments.

Value

Returns a list of two data frames. The first shows the environments in matrix form. The second provides the frequencies of each amino acid at each position within the environments.

Reference

Aledo et al. Sci Rep. 2015; 5: 16955

See Also

env.extract(), env.Ztest(), env.plot()

Details

Amino acids in the vicinity of a PTM site are critical in promoting or hindering the addition of the incoming substituent to a specific amino acid. Thus, knowledge about the amino acids surrounding modified sites, and the correlations between them, is very valuable.

The package ptm offers four functions that will assist you in gaining such knowledge: env.extract(), env.matrices(), env.Ztest() and env.plot().

A more elaborate vignette showing how these four functions can be used together can be found here.

To make this concrete, suppose we want to investigate whether the environments of serines that can be phosphorylated (phosphosites) can be statistically distinguished from the environments of serines that are not phosphorylatable. To address this question, we need a collection of positive environments (sequences of amino acids around a phosphorylatable serine) and a matching control collection (sequences of amino acids around a non-phosphorylatable serine). These sets can be obtained with the help of env.extract(). Suppose we have already extracted the sequence environments of 40 phosphoserines, as well as the same number of environments corresponding to non-phosphorylatable serines (to be used as control). This number of environments is, of course, far too small to reach any meaningful conclusion, but it is enough to illustrate the use of env.matrices():

positive <- c("ERNLLsVAYKN", "SWRIIsSIEQK", "LNEPLsNEDRN", "LTLWTsDQQDD", "WRVLSsIEQKS",
              "ESELRsICTTV", "ASQAEsKVFYL", "RKILLsEWKSQ", "GGSSCsQTPSR", "QVLLEsGEKST",
              "ARAVYsDADIF", "NRQLPsDGKKM", "SPGYRsVRERT", "KDRTTsEAQTE", "KTEAEsYEGLL",
              "CGRTGsGKSSL", "HVPAPsPQGPG", "IQESEsHSKNG", "LHGKKsGKPPL", "SISAPsSDKPL",
              "NTVANsPQTLL", "PYAHLsKKEKK", "GKQQVsPIRNL", "VCEKQtITKWP", "ASQAGsRKESR",
              "FIRGVsGGERK", "LTPGGsMGLQV", "PCPRYsNPADF", "GITGSsQDTYV", "KLKGKsPGIIF",
              "GQQLAsMLRWT", "YKVLSsLGYHV", "FISGLsDQLIP", "LFRSRsLREFE", "PGIDLsQVYEL",
              "SPRTLsPTPSA", "IRRSSsDFFYS", "PASSTsGSPSR", "TPTSRsPQHYS", "MKARSsSYADP")

control <- c("PEKACsLAKTA", "AYKAAsDIAMT", "LALNFsVFYYE", "ATVVEsSEKAY", "TLSEDsYKDST",
             "AWRVIsSIEQK", "PEKACsLAKTA", "VKKENsVETQA", "MSGGSsCSQTP", "SGHQPsQSRAI",
             "FALVLsALILA", "RTFSEsSVWSQ", "CGSVGsGKTSL", "EDPQQsNPCPE", "STLEYsNERLK",
             "GTMDPsQVPEH", "PIVTPsGEVVV", "AVIQEsESHSK", "LYSNLsKPFLD", "TSTRGsVQMLT",
             "FDEPSsYLDVK", "PTQKFsGGWRM", "HIINLsLTFHG", "SRLESsGKNKS", "EKEILsNINGI",
             "TMIFSsVCYWT", "LVKTLsRLAKG", "QAAQHsPYVAL", "YSGVGsSDGNS", "VPVAPsSSSGG",
             "SSSSGsAAAAL", "SEGEAsEEGLY", "PADQFsDGREP", "GPAEEsRVRRH", "CSSEKsKVTSS",
             "SYGDVsGGVRD", "GIRCDsCEKYI", "SVPASsTSGSP", "RSGPEsGRSSP", "TTAGNsSQVSD")

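Before building the matrices, it can help to confirm that every environment has the same odd length and to look at the central residue. The following is a minimal base-R sketch using only the first three positive environments; the variable names (envs, centre, central) are illustrative and not part of ptm.

```r
# Toy subset: the first three positive environments listed above
envs <- c("ERNLLsVAYKN", "SWRIIsSIEQK", "LNEPLsNEDRN")

# All environments should share one odd length so that a central position exists
lens <- nchar(envs)
stopifnot(length(unique(lens)) == 1, lens[1] %% 2 == 1)

# Index of the central residue (6 for an 11-residue environment)
centre <- (lens[1] + 1) / 2

# Tabulate the central residue across the environments
central <- substr(envs, centre, centre)
table(central)
```

Running the same check on the full positive and control vectors shows which residues occupy position 0 in each set.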
Now, for each of these sets (positive and control), we create two matrices. The first shows the environments being analyzed in matrix form:

library(ptm)   # provides env.matrices()
library(knitr) # provides kable()
# Positive amino acid matrix
p1 <- env.matrices(positive)[[1]]
kable(p1)
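-5 -4 -3 -2 -1  0  1  2  3  4  5
 E  R  N  L  L  s  V  A  Y  K  N
 S  W  R  I  I  s  S  I  E  Q  K
 L  N  E  P  L  s  N  E  D  R  N
 L  T  L  W  T  s  D  Q  Q  D  D
 W  R  V  L  S  s  I  E  Q  K  S
 E  S  E  L  R  s  I  C  T  T  V
 A  S  Q  A  E  s  K  V  F  Y  L
 R  K  I  L  L  s  E  W  K  S  Q
 G  G  S  S  C  s  Q  T  P  S  R
 Q  V  L  L  E  s  G  E  K  S  T
 A  R  A  V  Y  s  D  A  D  I  F
 N  R  Q  L  P  s  D  G  K  K  M
 S  P  G  Y  R  s  V  R  E  R  T
 K  D  R  T  T  s  E  A  Q  T  E
 K  T  E  A  E  s  Y  E  G  L  L
 C  G  R  T  G  s  G  K  S  S  L
 H  V  P  A  P  s  P  Q  G  P  G
 I  Q  E  S  E  s  H  S  K  N  G
 L  H  G  K  K  s  G  K  P  P  L
 S  I  S  A  P  s  S  D  K  P  L
 N  T  V  A  N  s  P  Q  T  L  L
 P  Y  A  H  L  s  K  K  E  K  K
 G  K  Q  Q  V  s  P  I  R  N  L
 V  C  E  K  Q  t  I  T  K  W  P
 A  S  Q  A  G  s  R  K  E  S  R
 F  I  R  G  V  s  G  G  E  R  K
 L  T  P  G  G  s  M  G  L  Q  V
 P  C  P  R  Y  s  N  P  A  D  F
 G  I  T  G  S  s  Q  D  T  Y  V
 K  L  K  G  K  s  P  G  I  I  F
 G  Q  Q  L  A  s  M  L  R  W  T
 Y  K  V  L  S  s  L  G  Y  H  V
 F  I  S  G  L  s  D  Q  L  I  P
 L  F  R  S  R  s  L  R  E  F  E
 P  G  I  D  L  s  Q  V  Y  E  L
 S  P  R  T  L  s  P  T  P  S  A
 I  R  R  S  S  s  D  F  F  Y  S
 P  A  S  S  T  s  G  S  P  S  R
 T  P  T  S  R  s  P  Q  H  Y  S
 M  K  A  R  S  s  S  Y  A  D  P
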
# Control amino acid matrix
c1 <- env.matrices(control)[[1]]
kable(c1)
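-5 -4 -3 -2 -1  0  1  2  3  4  5
 P  E  K  A  C  s  L  A  K  T  A
 A  Y  K  A  A  s  D  I  A  M  T
 L  A  L  N  F  s  V  F  Y  Y  E
 A  T  V  V  E  s  S  E  K  A  Y
 T  L  S  E  D  s  Y  K  D  S  T
 A  W  R  V  I  s  S  I  E  Q  K
 P  E  K  A  C  s  L  A  K  T  A
 V  K  K  E  N  s  V  E  T  Q  A
 M  S  G  G  S  s  C  S  Q  T  P
 S  G  H  Q  P  s  Q  S  R  A  I
 F  A  L  V  L  s  A  L  I  L  A
 R  T  F  S  E  s  S  V  W  S  Q
 C  G  S  V  G  s  G  K  T  S  L
 E  D  P  Q  Q  s  N  P  C  P  E
 S  T  L  E  Y  s  N  E  R  L  K
 G  T  M  D  P  s  Q  V  P  E  H
 P  I  V  T  P  s  G  E  V  V  V
 A  V  I  Q  E  s  E  S  H  S  K
 L  Y  S  N  L  s  K  P  F  L  D
 T  S  T  R  G  s  V  Q  M  L  T
 F  D  E  P  S  s  Y  L  D  V  K
 P  T  Q  K  F  s  G  G  W  R  M
 H  I  I  N  L  s  L  T  F  H  G
 S  R  L  E  S  s  G  K  N  K  S
 E  K  E  I  L  s  N  I  N  G  I
 T  M  I  F  S  s  V  C  Y  W  T
 L  V  K  T  L  s  R  L  A  K  G
 Q  A  A  Q  H  s  P  Y  V  A  L
 Y  S  G  V  G  s  S  D  G  N  S
 V  P  V  A  P  s  S  S  S  G  G
 S  S  S  S  G  s  A  A  A  A  L
 S  E  G  E  A  s  E  E  G  L  Y
 P  A  D  Q  F  s  D  G  R  E  P
 G  P  A  E  E  s  R  V  R  R  H
 C  S  S  E  K  s  K  V  T  S  S
 S  Y  G  D  V  s  G  G  V  R  D
 G  I  R  C  D  s  C  E  K  Y  I
 S  V  P  A  S  s  T  S  G  S  P
 R  S  G  P  E  s  G  R  S  S  P
 T  T  A  G  N  s  S  Q  V  S  D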
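
The matrix form shown above can be reproduced in base R for illustration. This is only a sketch of the idea, not the implementation used by env.matrices(); the names envs, chars, aa and r are hypothetical.

```r
# Toy subset: the first three positive environments
envs <- c("ERNLLsVAYKN", "SWRIIsSIEQK", "LNEPLsNEDRN")

# Split each string into single characters (one list element per environment)
chars <- strsplit(envs, split = "")

# Stack the character vectors row-wise into a character matrix
aa <- do.call(rbind, chars)

# Label the columns by position relative to the central residue (-5 ... 5)
r <- (ncol(aa) - 1) / 2
colnames(aa) <- as.character(-r:r)

# One row per environment, one column per position
as.data.frame(aa, stringsAsFactors = FALSE)
```

Each row is one environment and column "0" holds the central residue.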

The other matrix provides the frequencies of each amino acid at each position within the environments:

# Positive frequency matrix
p2 <- env.matrices(positive)[[2]]
kable(p2)
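  -5 -4 -3 -2 -1  0  1  2  3  4  5
A  3  1  3  6  1  0  0  3  2  0  1
C  1  2  0  0  1  0  0  1  0  0  0
D  0  1  0  1  0  0  5  2  2  3  1
E  2  0  5  0  4  0  2  4  6  1  2
F  2  1  0  0  0  0  0  1  2  1  3
G  4  3  2  5  3  0  5  5  2  0  2
H  1  1  0  1  0  0  1  0  1  1  0
I  2  4  2  1  1  0  3  2  1  3  0
K  3  4  1  2  2  0  2  4  6  4  3
L  5  1  2  8  7  0  2  1  2  2  8
M  1  0  0  0  0  0  2  0  0  0  1
N  2  1  1  0  1  0  2  0  0  2  2
P  4  3  3  1  3  0  6  1  4  3  3
Q  1  2  5  1  1  0  3  5  3  2  1
R  1  5  7  2  4  0  1  2  2  3  3
S  4  3  4  6  5 40  3  2  1  7  3
T  1  4  2  3  3  0  0  3  3  2  3
V  1  2  3  1  2  0  2  2  0  0  4
W  1  1  0  1  0  0  0  1  0  2  0
Y  1  1  0  1  2  0  1  1  3  4  0
X  0  0  0  0  0  0  0  0  0  0  0
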
# Control frequency matrix
c2 <- env.matrices(control)[[2]]
kable(c2)
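  -5 -4 -3 -2 -1  0  1  2  3  4  5
A  4  4  3  5  2  0  2  3  3  4  4
C  2  0  0  1  2  0  2  1  1  0  0
D  0  2  1  2  2  0  2  1  2  0  3
E  2  3  2  7  5  0  2  6  1  2  2
F  2  0  1  1  3  0  0  1  2  0  0
G  3  2  5  2  4  0  6  3  3  2  3
H  1  0  1  0  1  0  0  0  1  1  2
I  0  3  3  1  1  0  0  3  1  0  3
K  0  2  5  1  1  0  2  3  4  2  4
L  3  1  4  0  5  0  3  3  0  5  3
M  1  1  1  0  0  0  0  0  1  1  1
N  0  0  0  3  2  0  3  0  2  1  0
P  5  2  2  2  4  0  1  2  1  1  4
Q  1  0  1  5  1  0  2  2  1  2  1
R  2  1  2  1  0  0  2  1  4  3  0
S  7  6  5  2  5 40  6  5  2  8  3
T  4  6  1  2  0  0  1  1  3  3  4
V  2  3  3  5  1  0  4  4  4  2  1
W  0  1  0  0  0  0  0  0  2  1  0
Y  1  3  0  0  1  0  2  1  2  2  2
X  0  0  0  0  0  0  0  0  0  0  0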
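
To see how a frequency table of this shape can arise, the counts can be tabulated position by position in base R. The sketch below is illustrative only and is not taken from the ptm source: it assumes counting is case-insensitive and uses the 21 row labels shown in the tables above; the names envs, alphabet, aa and freq are hypothetical.

```r
# Toy subset: the first three positive environments
envs <- c("ERNLLsVAYKN", "SWRIIsSIEQK", "LNEPLsNEDRN")

# Row labels as they appear in the frequency tables above
alphabet <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M",
              "N", "P", "Q", "R", "S", "T", "V", "W", "Y", "X")

# Character matrix with one row per environment (upper case: an assumption)
aa <- do.call(rbind, strsplit(toupper(envs), split = ""))

# Count every letter in every column; fixed factor levels keep zero counts
freq <- apply(aa, 2, function(col) {
  as.integer(table(factor(col, levels = alphabet)))
})

# Rows are amino acids, columns are positions relative to the centre
r <- (ncol(aa) - 1) / 2
dimnames(freq) <- list(alphabet, as.character(-r:r))
freq
```

With the full sets of 40 environments, every column of such a table sums to 40, as it does in the two tables above.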

At this point we have two frequency matrices, one for the positive and one for the control environments (p2 and c2), and we are ready to test the null hypothesis: the environments of phosphorylatable and non-phosphorylatable serine residues do not differ. That can be done with env.Ztest().
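
As an illustration of what comparing two such frequencies can involve, the sketch below computes a pooled two-proportion z statistic for a single cell: proline at position +1, which occurs 6 times in the positive set and once in the control set. This is one common textbook statistic; whether env.Ztest() uses exactly this form is an assumption, so consult its documentation for the actual procedure.

```r
# Counts of proline at position +1, read from the frequency tables above
x_pos <- 6   # positive environments
x_ctr <- 1   # control environments
n     <- 40  # number of environments in each set

# Observed proportions in each set
p_pos <- x_pos / n
p_ctr <- x_ctr / n

# Pooled proportion under the null hypothesis of no difference
p_pool <- (x_pos + x_ctr) / (2 * n)

# Standard error of the difference between the two proportions
se <- sqrt(p_pool * (1 - p_pool) * (1 / n + 1 / n))

# z statistic and two-sided p-value from the standard normal distribution
z <- (p_pos - p_ctr) / se
p_value <- 2 * pnorm(-abs(z))

c(z = z, p_value = p_value)
```

Because the same comparison is repeated for every amino acid at every position, a single cell with a small p-value should be interpreted with caution, especially with only 40 environments per set.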