Description
Searches for homologous entries.
Usage
list.hom(target, homology = 'o')
Arguments
target
the KEGG identifier of the protein of interest.
homology
one letter indicating the type of homology. It should be either ‘o’ (orthologs) or ‘p’ (paralogs).
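You can catch malformed arguments before querying KEGG. The sketch below does this with `check.hom.args()`, a hypothetical helper that is not part of ptm. It assumes KEGG gene identifiers take the form `<organism code>:<entry>` (for example hsa:2597, assumed here to be human GAPDH) and accepts the same two values of homology listed above.

```r
# Hypothetical helper (not part of ptm): validate the arguments of list.hom()
# before sending a query. Assumes KEGG gene IDs look like "<org>:<entry>".
check.hom.args <- function(target, homology = c("o", "p")) {
  # match.arg() accepts only 'o' or 'p' and defaults to 'o' when omitted
  homology <- match.arg(homology)
  if (!is.character(target) || length(target) != 1 ||
      !grepl("^[a-z]+:[^:]+$", target)) {
    stop("target should look like '<org>:<entry>', e.g. 'hsa:2597'")
  }
  # Split the identifier into organism code and gene entry
  parts <- strsplit(target, ":", fixed = TRUE)[[1]]
  list(species  = parts[1],
       entry    = parts[2],
       homology = switch(homology, o = "orthologs", p = "paralogs"))
}

check.hom.args("hsa:2597", "p")$homology  # "paralogs"
```

Under this assumption, a UniProt accession such as 'P04406' passed directly would be rejected. This matches the example in Details, which converts it with id.mapping() first.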
Value
Returns a data frame with the requested entries.
References
Kanehisa et al (2017) Nucleic Acids Res. 45:D353-D361.
Pearson WR (2013) Curr. Protoc. Bioinformatics 42:3.1.1-3.1.8.
See Also
msa(), custom.aln(), parse.hssp(), get.hssp(), shannon(), site.type()
Details
The concept of sequence homology (common evolutionary ancestry) is central to computational analyses of protein and DNA sequences. We infer homology when two sequences or structures share more similarity than would be expected by chance. When excess similarity is observed, the simplest explanation is that the two sequences did not arise independently but descended from a common ancestor. Closely linked to the concept of homology is multiple sequence alignment (MSA), the alignment of three or more biological sequences. From an MSA, homology can be inferred and the evolutionary relationships among the sequences can be studied, which makes alignment a crucial step in most evolutionary analyses. MSA is also an essential tool for protein structure and function prediction. The ptm package offers several functions to assist you in sequence analysis:
msa
custom.aln
list.hom (current doc)
parse.hssp
get.hssp
shannon
site.type
The function list.hom() searches for sequences that are either orthologous or paralogous to a given sequence of interest. It relies on the KEGG Sequence Similarity Database (SSDB). This database holds amino acid sequence similarities among all protein-coding genes in the complete genomes, and in the addendum and virus categories, of the GENES database. Therefore, we must provide the KEGG ID of the target protein. For instance, suppose we want to retrieve the orthologs of human glyceraldehyde-3-phosphate dehydrogenase, whose UniProt ID we know: P04406. In this case, we can first convert the UniProt ID into a KEGG ID with another ptm function, id.mapping(), as follows:
orthologous <- list.hom(target = id.mapping('P04406', from = 'uniprot', to = 'kegg'), homology = 'o')
orthologous
## species entry name ko len identity overlap
## 1 ggo 101154517 glyceraldehyde-3-phosphate dehydrogenase K00134 335 1.000 335
## 2 pps 100978702 glyceraldehyde-3-phosphate dehydrogenase K00134 335 1.000 335
## 3 nle 100583761 glyceraldehyde-3-phosphate dehydrogenase K00134 335 0.994 335
## 4 sbq 101039451 glyceraldehyde-3-phosphate dehydrogenase K00134 335 0.994 335
## 5 csab 103218453 glyceraldehyde-3-phosphate dehydrogenase K00134 335 0.991 335
## 6 mcc 574353 glyceraldehyde-3-phosphate dehydrogenase K00134 335 0.991 335
## 7 mcf 102141145 glyceraldehyde-3-phosphate dehydrogenase K00134 335 0.991 335
## 8 rbb 108533183 glyceraldehyde-3-phosphate dehydrogenase K00134 335 0.991 335
## 9 rro 104656959 glyceraldehyde-3-phosphate dehydrogenase K00134 335 0.991 335
## 10 cjc 100404960 glyceraldehyde-3-phosphate dehydrogenase K00134 335 0.985 335
## 11 pon 100172694 glyceraldehyde-3-phosphate dehydrogenase K00134 335 0.988 335
## 12 fca 493876 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.967 332
## 13 ptg 102956216 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.961 332
## 14 pale 102888425 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.961 332
## 15 oaa 100076533 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.955 332
## 16 uah 113257977 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.961 332
## 17 aju 106965447 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.958 332
## 18 ray 107519804 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.958 332
## 19 ppad 109276130 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.958 332
## 20 ssc 396823 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.958 332
## 21 aml 100478741 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.958 332
## 22 mjv 108387626 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.949 332
## 23 oro 101364556 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.955 332
## 24 cfa 403755 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.955 332
## 25 ccan 109682800 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.958 332
## 26 oor 101280250 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.952 332
## 27 bbub 102404028 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.952 332
## 28 dle 111176669 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.952 332
## 29 tup 102492039 glyceraldehyde-3-phosphate dehydrogenase K00134 426 0.949 334
## 30 hgl 101703971 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.955 332
## 31 lve 103073003 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.949 332
## 32 mna 107544089 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.952 332
## 33 mun 110548566 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.952 332
## 34 myb 102245543 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.952 332
## 35 myd 102766226 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.952 332
## 36 chx 100860872 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.949 332
## 37 ecb 100033897 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.952 332
## 38 ngi 103736085 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.955 332
## 39 oas 443005 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.949 332
## 40 cdk 105098219 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.952 332
## 41 cfr 102508162 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.952 332
## 42 biu 109558933 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.949 332
## 43 bom 102275759 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.949 332
## 44 bta 281181 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.949 332
## 45 pcad 102995243 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.946 332
## 46 hai 109373651 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.949 332
## 47 ocu 100009074 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.949 332
## 48 elk 111153592 glyceraldehyde-3-phosphate dehydrogenase- K00134 334 0.949 332
## 49 eai 106827217 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.949 332
## 50 dro 112321904 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.940 332
## 51 pcw 110213727 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.949 332
## 52 rno 108351137 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.943 332
## 53 shr 100929474 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.940 332
## 54 mmu 14433 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.943 332
## 55 tmu 101350847 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.934 332
## 56 cge 100736557 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.940 332
## 57 mpah 110317091 LOW QUALITY PROTEIN: glyceraldehyde-3-ph K00134 333 0.937 332
## 58 mcal 110292590 glyceraldehyde-3-phosphate dehydrogenase K00134 335 0.934 334
## 59 nmel 110398687 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.925 332
## 60 cjo 107318960 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.928 332
## 61 lav 104845884 glyceraldehyde-3-phosphate dehydrogenase- K00134 333 0.919 332
## 62 etl 114058195 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.922 332
## 63 scan 103827158 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.922 332
## 64 gga 374193 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.922 332
## 65 amj 102567195 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.916 332
## 66 apla 101803965 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.919 332
## 67 bacu 103015916 glyceraldehyde-3-phosphate dehydrogenase K00134 404 0.938 325
## 68 fab 101814323 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.919 332
## 69 lsr 110480762 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.919 332
## 70 mgp 100303685 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.919 332
## 71 phi 102099216 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.919 332
## 72 tgu 100190636 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.919 332
## 73 pvt 110084649 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.909 331
## 74 acyg 106047846 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.916 332
## 75 clv 102089934 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.916 332
## 76 cpic 101938370 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.907 332
## 77 pmua 114587728 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.895 332
## 78 vvp 112933858 glyceraldehyde-3-phosphate dehydrogenase- K00134 333 0.922 332
## 79 pss 102462433 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.898 332
## 80 pmur 107285760 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.886 332
## 81 mdo 751079 glyceraldehyde-3-phosphate dehydrogenase K00134 528 0.929 325
## 82 egz 104126590 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.917 325
## 83 lcm 102346560 glyceraldehyde-3-phosphate dehydrogenase K00134 335 0.887 335
## 84 pbi 103049649 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.883 332
## 85 acs 100564080 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.889 332
## 86 asn 102388535 glyceraldehyde-3-phosphate dehydrogenase K00134 400 0.914 325
## 87 ccae 111943088 glyceraldehyde-3-phosphate dehydrogenase K00134 378 0.917 325
## 88 ccw 104687218 glyceraldehyde-3-phosphate dehydrogenase K00134 379 0.917 325
## 89 nni 104009682 glyceraldehyde-3-phosphate dehydrogenase K00134 430 0.917 325
## 90 pmaj 107211477 glyceraldehyde-3-phosphate dehydrogenase K00134 376 0.917 325
## 91 aam 106497202 glyceraldehyde-3-phosphate dehydrogenase K00134 357 0.914 325
## 92 acun 113479368 glyceraldehyde-3-phosphate dehydrogenase K00134 361 0.914 325
## 93 fch 102057505 glyceraldehyde-3-phosphate dehydrogenase K00134 358 0.914 325
## 94 fpg 101917630 glyceraldehyde-3-phosphate dehydrogenase K00134 387 0.914 325
## 95 hcq 109515387 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.873 332
## 96 tsr 106549842 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.883 332
## 97 alim 106520468 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.873 332
## 98 ola 101172760 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.877 332
## 99 malb 109966231 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.873 332
## 100 tru 101067242 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.870 332
## 101 amex 103027606 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.877 332
## 102 phyp 113533894 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.877 332
## 103 onl 100704894 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.873 332
## 104 pret 103478010 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.873 332
## 105 sdu 111230083 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.867 332
## 106 slal 111669851 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.867 332
## 107 kmr 108235429 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.873 332
## 108 nfu 107379445 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.873 332
## 109 cvg 107082253 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.864 332
## 110 tng GSTEN00015338G001 unnamed protein product K00134 333 0.864 332
## 111 pki 111833970 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.880 332
## 112 sfm 108925443 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.873 332
## 113 lcf 108887350 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.864 332
## 114 otw 112222720 glyceraldehyde-3-phosphate dehydrogenase K00134 334 0.862 333
## 115 sasa 106575942 glyceraldehyde-3-phosphate dehydrogenase K00134 334 0.862 333
## 116 sanh 107685520 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.870 332
## 117 ipu 100528929 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.867 332
## 118 mze 101474674 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.861 332
## 119 salp 111978076 glyceraldehyde-3-phosphate dehydrogenase K00134 334 0.859 333
## 120 eee 113575253 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.858 332
## 121 ptr 451783 glyceraldehyde-3-phosphate dehydrogenase iso K00134 293 1.000 293
## 122 bpec 110163125 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.858 332
## 123 csem 103387941 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.861 332
## 124 xco 114141659 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.858 332
## 125 xma 102237772 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.858 332
## 126 dre 317743 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.864 332
## 127 xla 108706049 glyceraldehyde-3-phosphate dehydrogenase K00134 293 1.000 293
## 128 srx 107748884 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.861 332
## 129 sgh 107596669 glyceraldehyde-3-phosphate dehydrogenase- K00134 333 0.861 332
## 130 lco 104929005 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.858 332
## 131 els 105019067 glyceraldehyde-3-phosphate dehydrogenase K00134 334 0.850 333
## 132 pov 109633308 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.855 332
## 133 aoce 111564297 LOW QUALITY PROTEIN: glyceraldehyde-3-ph K00134 332 0.849 332
## 134 ncc 104942925 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.837 332
## 135 umr 103656199 glyceraldehyde-3-phosphate dehydrogenase K00134 293 0.956 293
## 136 epz 103547001 glyceraldehyde-3-phosphate dehydrogenase K00134 293 0.945 293
## 137 npr 108784413 glyceraldehyde-3-phosphate dehydrogenase K00134 333 0.810 332
## 138 gfr 102038663 glyceraldehyde-3-phosphate dehydrogenase K00134 293 0.918 293
## 139 cmy 102944676 glyceraldehyde-3-phosphate dehydrogenase K00134 293 0.911 293
## 140 gja 107118411 glyceraldehyde-3-phosphate dehydrogenase K00134 293 0.898 293
## 141 cin 100186457 glyceraldehyde-3-phosphate dehydrogenase K00134 334 0.786 332
## 142 pxy 105388172 glyceraldehyde-3-phosphate dehydrogenase K00134 332 0.789 331
## [ reached 'max' / getOption("max.print") -- omitted 5990 rows ]
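A table of this size is usually filtered down to close, nearly full-length orthologs before alignment. The sketch below shows one way to do this with base R. It makes the following assumptions:

- The data frame returned by list.hom() has the columns printed above.
- identity is a fraction between 0 and 1.
- The human target is 335 residues long, the length reported for entries that are 100% identical over their full length, such as gorilla (ggo).

The 0.9 cut-offs are purely illustrative, and the five rows are copied from the printout.

```r
# A handful of rows copied from the ortholog table above; in a real session
# 'orthologous' would be the data frame returned by list.hom().
orthologous <- data.frame(
  species  = c("ggo", "mmu", "gga", "dre", "ptr"),
  entry    = c("101154517", "14433", "374193", "317743", "451783"),
  len      = c(335, 333, 333, 333, 293),
  identity = c(1.000, 0.943, 0.922, 0.864, 1.000),
  overlap  = c(335, 332, 332, 332, 293),
  stringsAsFactors = FALSE
)

target.len <- 335  # assumed length of the human target (P04406)

# Fraction of the target covered by the aligned region
orthologous$coverage <- orthologous$overlap / target.len

# Keep hits that are both similar and (nearly) full length; thresholds are illustrative
near.full <- subset(orthologous, identity >= 0.9 & coverage >= 0.9)

# Rebuild "<org>:<entry>" identifiers (assumed KEGG format) for downstream queries
near.full$kegg.id <- paste(near.full$species, near.full$entry, sep = ":")
near.full[, c("kegg.id", "identity", "coverage")]
```

The chimpanzee entry (ptr) shows why coverage is checked as well. It is 100% identical to the target, but only over 293 residues, so an identity filter alone would keep it.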
As you can see, several thousand orthologous sequences are recovered: 142 rows are printed above and a further 5,990 are omitted, for 6,132 in total. In contrast, if we ask for paralogous sequences in the human genome, only a single entry is returned:
paralogous <- list.hom(target = id.mapping('P04406', from = 'uniprot', to = 'kegg'), homology = 'p')
paralogous
## species entry name ko len identity overlap
## 1 hsa 26330 glyceraldehyde-3-phosphate dehydrogenase, spe K10705 408 0.683 334
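Identity values are easier to compare when the overlaps differ if they are converted into an approximate count of identical residues. The sketch below assumes that the identity column is the fraction of identical positions within the overlap; this is an assumption about how SSDB reports it, not something stated here. It compares the human paralog above with the mouse ortholog (mmu 14433) from the ortholog table.

```r
# Two hits copied from the tables above: the human paralog and the mouse ortholog
hits <- data.frame(
  species  = c("hsa", "mmu"),
  entry    = c("26330", "14433"),
  relation = c("paralog", "ortholog"),
  identity = c(0.683, 0.943),
  overlap  = c(334, 332),
  stringsAsFactors = FALSE
)

# Approximate number of identical residues within the aligned region
# (assumes identity = identical positions / overlap)
hits$identical <- round(hits$identity * hits$overlap)
hits
```

Under this reading, the human paralog shares roughly 228 identical residues with the target, compared with roughly 313 for the mouse ortholog.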