list.hom()

Description

Searches for homologous entries.

Usage

list.hom(target, homology = 'o')

Arguments

target the KEGG identifier of the protein of interest.

homology one letter indicating the type of homology. It should be either ‘o’ (orthologs) or ‘p’ (paralogs).
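
As a minimal sketch of how the one-letter homology code could be checked before it is passed to list.hom(), the snippet below uses base R's match.arg(). The helper check.homology() is hypothetical and is not part of ptm; it only illustrates the two accepted values.

```r
# Hypothetical helper (not part of ptm): validate the one-letter homology code.
check.homology <- function(homology = c("o", "p")) {
  # match.arg() stops with an error unless the value is one of the choices
  homology <- match.arg(homology)
  # Translate the code into a human-readable label
  switch(homology, o = "orthologs", p = "paralogs")
}

check.homology("o")
check.homology("p")
```

Any other value, such as check.homology("x"), stops with an error instead of silently sending an invalid request.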

Value

Returns a dataframe with the requested entries.

References

Kanehisa et al. (2017) Nucleic Acids Res. 45:D353-D361.
Pearson WR (2014) Curr. Protoc. Bioinformatics 42:3.1.1-3.1.8.

See Also

msa(), custom.aln(), parse.hssp(), get.hssp(), shannon(), site.type()

Details

The concept of sequence homology (common evolutionary ancestry) is central to computational analyses of protein and DNA sequences. We infer homology when two sequences or structures share more similarity than would be expected by chance. When such excess similarity is observed, the simplest explanation is that the two sequences did not arise independently but from a common ancestor.

Closely linked to the concept of homology is multiple sequence alignment (MSA), which consists of the alignment of three or more biological sequences. From the resulting alignment, homology can be inferred and the evolutionary relationships between the sequences can be studied. Alignment is therefore the most important stage in most evolutionary analyses. In addition, MSA is an essential tool for protein structure and function prediction. The package ptm offers several functions that will assist you in the process of sequence analysis:

msa
custom.aln
list.hom (current doc)
parse.hssp
get.hssp
shannon
site.type

The function list.hom() searches for sequences that are either orthologous or paralogous to a given sequence of interest. The function relies on the KEGG Sequence Similarity Database, which contains information about amino acid sequence similarities among all protein-coding genes in the complete genomes, as well as the addendum and virus categories, of the GENES database. Therefore, we have to provide the KEGG ID of the target protein. For instance, suppose we want to get the orthologous sequences of human glyceraldehyde-3-phosphate dehydrogenase, for which we know the UniProt ID: P04406. In this case, we can use another ptm function, id.mapping(), to obtain the KEGG ID in the following way:

orthologous <- list.hom(target = id.mapping('P04406', from = 'uniprot', to = 'kegg'), homology = 'o')
orthologous
##     species             entry                                         name     ko len identity overlap
## 1       ggo         101154517     glyceraldehyde-3-phosphate dehydrogenase K00134 335    1.000     335
## 2       pps         100978702     glyceraldehyde-3-phosphate dehydrogenase K00134 335    1.000     335
## 3       nle         100583761     glyceraldehyde-3-phosphate dehydrogenase K00134 335    0.994     335
## 4       sbq         101039451     glyceraldehyde-3-phosphate dehydrogenase K00134 335    0.994     335
## 5      csab         103218453     glyceraldehyde-3-phosphate dehydrogenase K00134 335    0.991     335
## 6       mcc            574353     glyceraldehyde-3-phosphate dehydrogenase K00134 335    0.991     335
## 7       mcf         102141145     glyceraldehyde-3-phosphate dehydrogenase K00134 335    0.991     335
## 8       rbb         108533183     glyceraldehyde-3-phosphate dehydrogenase K00134 335    0.991     335
## 9       rro         104656959     glyceraldehyde-3-phosphate dehydrogenase K00134 335    0.991     335
## 10      cjc         100404960     glyceraldehyde-3-phosphate dehydrogenase K00134 335    0.985     335
## 11      pon         100172694     glyceraldehyde-3-phosphate dehydrogenase K00134 335    0.988     335
## 12      fca            493876     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.967     332
## 13      ptg         102956216     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.961     332
## 14     pale         102888425     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.961     332
## 15      oaa         100076533     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.955     332
## 16      uah         113257977     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.961     332
## 17      aju         106965447     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.958     332
## 18      ray         107519804     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.958     332
## 19     ppad         109276130     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.958     332
## 20      ssc            396823     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.958     332
## 21      aml         100478741     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.958     332
## 22      mjv         108387626     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.949     332
## 23      oro         101364556     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.955     332
## 24      cfa            403755     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.955     332
## 25     ccan         109682800     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.958     332
## 26      oor         101280250     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.952     332
## 27     bbub         102404028     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.952     332
## 28      dle         111176669     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.952     332
## 29      tup         102492039     glyceraldehyde-3-phosphate dehydrogenase K00134 426    0.949     334
## 30      hgl         101703971     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.955     332
## 31      lve         103073003     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.949     332
## 32      mna         107544089     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.952     332
## 33      mun         110548566     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.952     332
## 34      myb         102245543     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.952     332
## 35      myd         102766226     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.952     332
## 36      chx         100860872     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.949     332
## 37      ecb         100033897     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.952     332
## 38      ngi         103736085     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.955     332
## 39      oas            443005     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.949     332
## 40      cdk         105098219     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.952     332
## 41      cfr         102508162     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.952     332
## 42      biu         109558933     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.949     332
## 43      bom         102275759     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.949     332
## 44      bta            281181     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.949     332
## 45     pcad         102995243     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.946     332
## 46      hai         109373651     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.949     332
## 47      ocu         100009074     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.949     332
## 48      elk         111153592    glyceraldehyde-3-phosphate dehydrogenase- K00134 334    0.949     332
## 49      eai         106827217     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.949     332
## 50      dro         112321904     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.940     332
## 51      pcw         110213727     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.949     332
## 52      rno         108351137     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.943     332
## 53      shr         100929474     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.940     332
## 54      mmu             14433     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.943     332
## 55      tmu         101350847     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.934     332
## 56      cge         100736557     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.940     332
## 57     mpah         110317091     LOW QUALITY PROTEIN: glyceraldehyde-3-ph K00134 333    0.937     332
## 58     mcal         110292590     glyceraldehyde-3-phosphate dehydrogenase K00134 335    0.934     334
## 59     nmel         110398687     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.925     332
## 60      cjo         107318960     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.928     332
## 61      lav         104845884    glyceraldehyde-3-phosphate dehydrogenase- K00134 333    0.919     332
## 62      etl         114058195     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.922     332
## 63     scan         103827158     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.922     332
## 64      gga            374193     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.922     332
## 65      amj         102567195     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.916     332
## 66     apla         101803965     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.919     332
## 67     bacu         103015916     glyceraldehyde-3-phosphate dehydrogenase K00134 404    0.938     325
## 68      fab         101814323     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.919     332
## 69      lsr         110480762     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.919     332
## 70      mgp         100303685     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.919     332
## 71      phi         102099216     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.919     332
## 72      tgu         100190636     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.919     332
## 73      pvt         110084649     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.909     331
## 74     acyg         106047846     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.916     332
## 75      clv         102089934     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.916     332
## 76     cpic         101938370     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.907     332
## 77     pmua         114587728     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.895     332
## 78      vvp         112933858    glyceraldehyde-3-phosphate dehydrogenase- K00134 333    0.922     332
## 79      pss         102462433     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.898     332
## 80     pmur         107285760     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.886     332
## 81      mdo            751079     glyceraldehyde-3-phosphate dehydrogenase K00134 528    0.929     325
## 82      egz         104126590     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.917     325
## 83      lcm         102346560     glyceraldehyde-3-phosphate dehydrogenase K00134 335    0.887     335
## 84      pbi         103049649     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.883     332
## 85      acs         100564080     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.889     332
## 86      asn         102388535     glyceraldehyde-3-phosphate dehydrogenase K00134 400    0.914     325
## 87     ccae         111943088     glyceraldehyde-3-phosphate dehydrogenase K00134 378    0.917     325
## 88      ccw         104687218     glyceraldehyde-3-phosphate dehydrogenase K00134 379    0.917     325
## 89      nni         104009682     glyceraldehyde-3-phosphate dehydrogenase K00134 430    0.917     325
## 90     pmaj         107211477     glyceraldehyde-3-phosphate dehydrogenase K00134 376    0.917     325
## 91      aam         106497202     glyceraldehyde-3-phosphate dehydrogenase K00134 357    0.914     325
## 92     acun         113479368     glyceraldehyde-3-phosphate dehydrogenase K00134 361    0.914     325
## 93      fch         102057505     glyceraldehyde-3-phosphate dehydrogenase K00134 358    0.914     325
## 94      fpg         101917630     glyceraldehyde-3-phosphate dehydrogenase K00134 387    0.914     325
## 95      hcq         109515387     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.873     332
## 96      tsr         106549842     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.883     332
## 97     alim         106520468     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.873     332
## 98      ola         101172760     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.877     332
## 99     malb         109966231     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.873     332
## 100     tru         101067242     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.870     332
## 101    amex         103027606     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.877     332
## 102    phyp         113533894     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.877     332
## 103     onl         100704894     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.873     332
## 104    pret         103478010     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.873     332
## 105     sdu         111230083     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.867     332
## 106    slal         111669851     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.867     332
## 107     kmr         108235429     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.873     332
## 108     nfu         107379445     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.873     332
## 109     cvg         107082253     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.864     332
## 110     tng GSTEN00015338G001                      unnamed protein product K00134 333    0.864     332
## 111     pki         111833970     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.880     332
## 112     sfm         108925443     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.873     332
## 113     lcf         108887350     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.864     332
## 114     otw         112222720     glyceraldehyde-3-phosphate dehydrogenase K00134 334    0.862     333
## 115    sasa         106575942     glyceraldehyde-3-phosphate dehydrogenase K00134 334    0.862     333
## 116    sanh         107685520     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.870     332
## 117     ipu         100528929     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.867     332
## 118     mze         101474674     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.861     332
## 119    salp         111978076     glyceraldehyde-3-phosphate dehydrogenase K00134 334    0.859     333
## 120     eee         113575253     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.858     332
## 121     ptr            451783 glyceraldehyde-3-phosphate dehydrogenase iso K00134 293    1.000     293
## 122    bpec         110163125     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.858     332
## 123    csem         103387941     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.861     332
## 124     xco         114141659     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.858     332
## 125     xma         102237772     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.858     332
## 126     dre            317743     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.864     332
## 127     xla         108706049     glyceraldehyde-3-phosphate dehydrogenase K00134 293    1.000     293
## 128     srx         107748884     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.861     332
## 129     sgh         107596669    glyceraldehyde-3-phosphate dehydrogenase- K00134 333    0.861     332
## 130     lco         104929005     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.858     332
## 131     els         105019067     glyceraldehyde-3-phosphate dehydrogenase K00134 334    0.850     333
## 132     pov         109633308     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.855     332
## 133    aoce         111564297     LOW QUALITY PROTEIN: glyceraldehyde-3-ph K00134 332    0.849     332
## 134     ncc         104942925     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.837     332
## 135     umr         103656199     glyceraldehyde-3-phosphate dehydrogenase K00134 293    0.956     293
## 136     epz         103547001     glyceraldehyde-3-phosphate dehydrogenase K00134 293    0.945     293
## 137     npr         108784413     glyceraldehyde-3-phosphate dehydrogenase K00134 333    0.810     332
## 138     gfr         102038663     glyceraldehyde-3-phosphate dehydrogenase K00134 293    0.918     293
## 139     cmy         102944676     glyceraldehyde-3-phosphate dehydrogenase K00134 293    0.911     293
## 140     gja         107118411     glyceraldehyde-3-phosphate dehydrogenase K00134 293    0.898     293
## 141     cin         100186457     glyceraldehyde-3-phosphate dehydrogenase K00134 334    0.786     332
## 142     pxy         105388172     glyceraldehyde-3-phosphate dehydrogenase K00134 332    0.789     331
##  [ reached 'max' / getOption("max.print") -- omitted 5990 rows ]

As the output shows, we recover a few thousand orthologous sequences, although only the first few are printed here. In contrast, if we ask for paralogous sequences found in human, only two entries are recovered counting the target itself, which is not listed in the output:

paralogous <- list.hom(target = id.mapping('P04406', from = 'uniprot', to = 'kegg'), homology = 'p')
paralogous
##   species entry                                          name     ko len identity overlap
## 1     hsa 26330 glyceraldehyde-3-phosphate dehydrogenase, spe K10705 408    0.683     334
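
Because the returned value is an ordinary dataframe, the hits can be post-processed with base R. The sketch below does not query KEGG: it builds a small mock dataframe (named hits here, an illustrative name) from five rows of the ortholog output shown above, and it assumes that len, identity and overlap are numeric columns. The coverage column and the 0.9 thresholds are arbitrary choices made for the illustration.

```r
# Mock dataframe reproducing a few rows of the ortholog output shown above
hits <- data.frame(
  species  = c("ggo", "fca", "tup", "ptr", "cin"),
  entry    = c("101154517", "493876", "102492039", "451783", "100186457"),
  len      = c(335, 333, 426, 293, 334),
  identity = c(1.000, 0.967, 0.949, 1.000, 0.786),
  overlap  = c(335, 332, 334, 293, 332),
  stringsAsFactors = FALSE
)

# Fraction of each hit sequence that falls inside the aligned region
hits$coverage <- hits$overlap / hits$len

# Keep hits that are both highly identical and almost fully aligned
selected <- subset(hits, identity >= 0.9 & coverage >= 0.9)
selected$species
## [1] "ggo" "fca" "ptr"
```

Here tup is discarded because its aligned region covers only part of a longer sequence (334 of 426 residues), and cin is discarded because its identity falls below the threshold.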