What is Gentrepid?

Gentrepid is a candidate disease gene prediction tool for researchers who want to narrow down the list of candidate genes in a genomic region of interest that has been associated with a particular phenotype or disease.

What makes Gentrepid different to other candidate disease gene prediction tools?

Gentrepid draws on methodology from structural bioinformatics and systems biology. It applies two algorithms: Common Module Profiling (CMP) and Common Pathway Scanning (CPS). CMP is completely novel and is based on the hypothesis that disruption of genes of similar function will lead to the same phenotype. CPS assumes that common phenotypes are associated with dysfunction in proteins that participate in the same complex or pathway.

We have shown that using independent biological data to make complementary predictions ameliorates the problem of incomplete data coverage. By prioritizing candidates, Gentrepid can significantly reduce the time and cost of experimental studies.

What is CMP?

CMP uses a domain-based approach to identify genes with a potential functional similarity to known disease genes. It is based on the hypothesis that disruption of genes of similar function will lead to the same phenotype (1). Gentrepid contains precalculated Pfam domain (2) annotation for all genes. CMP compares the domain content of each protein within a disease interval to identify putative disease genes. Each protein observed to have disease-like domains is assigned a score based on the sequence similarity between the domains.
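
The domain comparison can be illustrated with a minimal sketch. This is not Gentrepid's implementation: the function names, the threshold, the toy gene names and the similarity table are all hypothetical, and the score here is simply the best pairwise domain similarity, with identical Pfam accessions assumed to score 1.0.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

DomainMap = Dict[str, Set[str]]          # gene -> set of Pfam accessions
SimTable = Dict[FrozenSet[str], float]   # unordered domain pair -> similarity


def domain_similarity(a: str, b: str, sim: SimTable) -> float:
    """Similarity of two domains; identical accessions are assumed to score 1.0."""
    if a == b:
        return 1.0
    return sim.get(frozenset((a, b)), 0.0)


def cmp_rank(known: DomainMap, candidates: DomainMap, sim: SimTable,
             threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Rank candidate genes by their best domain match to any known disease gene.

    Assumption: a candidate's score is the maximum pairwise domain similarity;
    candidates scoring below the (hypothetical) threshold are dropped.
    """
    known_domains = set().union(*known.values()) if known else set()
    ranked = []
    for gene, domains in candidates.items():
        best = max(
            (domain_similarity(d, kd, sim) for d in domains for kd in known_domains),
            default=0.0,
        )
        if best >= threshold:
            ranked.append((gene, best))
    # Highest score first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy data (hypothetical genes and similarity values).
known = {"KNOWN1": {"PF00001"}}
candidates = {
    "GENE_A": {"PF00001", "PF00002"},  # shares a domain with KNOWN1
    "GENE_B": {"PF00003"},             # has a similar domain
    "GENE_C": {"PF00004"},             # no disease-like domain
}
sim = {frozenset(("PF00001", "PF00003")): 0.7}

print(cmp_rank(known, candidates, sim))
# → [('GENE_A', 1.0), ('GENE_B', 0.7)]
```

Taking the maximum over domain pairs is only one possible choice; a sum or average over matched domains would rank multi-domain proteins differently.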

What is CPS?

CPS is based on the assumption that common phenotypes are associated with proteins that participate in the same complex or pathway (3). CPS applies protein-protein interaction data from the I2D database (4) and pathway data from KEGG (5) and BioCarta (6) to identify relationships between known disease genes and genes in the disease interval.
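
The relationship search can be sketched in the same spirit. The example below is an illustration under stated assumptions, not Gentrepid's code: it assumes pathway data is available as a mapping from pathway name to member genes and interaction data as a list of gene pairs, and it links a candidate to a known disease gene if they share a pathway or interact directly. All names and data are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def cps_links(known: Set[str], candidates: Set[str],
              pathways: Dict[str, Set[str]],
              interactions: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect evidence linking candidate genes to known disease genes.

    Assumption: evidence is either membership in a pathway that also contains
    a known disease gene, or a direct interaction with a known disease gene.
    """
    evidence: Dict[str, List[str]] = {}

    # Shared pathway membership.
    for name, members in sorted(pathways.items()):
        if not members & known:
            continue  # pathway contains no known disease gene
        for gene in sorted(members & candidates):
            evidence.setdefault(gene, []).append(f"pathway:{name}")

    # Direct protein-protein interactions, treated as undirected.
    for a, b in interactions:
        for cand, partner in ((a, b), (b, a)):
            if cand in candidates and partner in known:
                evidence.setdefault(cand, []).append(f"interaction:{partner}")

    return evidence


# Toy data (hypothetical genes, pathways and interactions).
known_genes = {"K1"}
interval_genes = {"C1", "C2", "C3"}
toy_pathways = {"p1": {"K1", "C1"}, "p2": {"C2", "C3"}}
toy_interactions = [("C2", "K1"), ("C3", "C1")]

print(cps_links(known_genes, interval_genes, toy_pathways, toy_interactions))
# → {'C1': ['pathway:p1'], 'C2': ['interaction:K1']}
```

Here C3 is not reported because its only links are to other candidates, not to a known disease gene.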

What external databases are used by Gentrepid?
NCBI reference build: Build 36.3
Pfam version: 24.0
I2D version: 1.8
dbSNP build: 131

What are the current database statistics?
Number of genes: 28933
Number of genes with pathways: 4562
Number of pathways: 513
... from KEGG: 199
... from BioCarta: 314
Number of genes with domains: 16052
Number of domains: 2994
Number of proteins: 30994
Number of protein-protein interactions: 51553


  1. Jimenez-Sanchez G, Childs B and Valle D (2001) Human disease genes. Nature, 409, 853-855
  2. Bateman A et al. (2004) The Pfam protein families database. Nucleic Acids Res, 32, D138-D141
  3. Badano JL and Katsanis N (2002) Beyond Mendel: an evolving view of human genetic disease transmission. Nature Rev Genet, 3, 779-789
  4. Brown KR and Jurisica I (2007) Unequal evolutionary conservation of human protein interactions in interologous networks. Genome Biology, 8, R95
  5. Kanehisa M, Goto S, Kawashima S, Okuno Y and Hattori M (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res, 32, D277-D280
  6. BioCarta