
An informatics project and online "Knowledge Centre" supporting modern genotype-to-phenotype research

By Adam J. Webb, Gudmundur A. Thorisson, Anthony J. Brookes and GEN2PHEN Consortium


This is the published article. It is reproduced here with the publisher's permission (OnlineOpen Wiley). This is an open access article and it is also freely accessible from the publisher's website (DOI: 10.1002/humu.21469).

Explosive growth in the generation of genotype-to-phenotype (G2P) data necessitates a concerted effort to tackle the logistical and informatics challenges this presents. The GEN2PHEN Project represents one such effort, with a broad strategy of uniting disparate G2P resources into a hybrid centralized-federated network. This is achieved through a holistic strategy focussed on three overlapping areas: data input standards and pipelines through which to submit and collect data (data in); federated, independent, extendable, yet interoperable database platforms on which to store and curate widely diverse datasets (data storage); and data formats and mechanisms with which to exchange, combine, and extract data (data exchange and output). To fully leverage this data network, we have constructed the “G2P Knowledge Centre”. This central platform provides holistic searching of the G2P data domain allied with facilities for data annotation and user feedback, access to extensive G2P and informatics resources, and tools for constructing online working communities centered on the G2P domain. Through the efforts of GEN2PHEN, and through combining data with broader community-derived knowledge, the Knowledge Centre opens up exciting possibilities for organizing, integrating, sharing, and interpreting new waves of G2P data in a collaborative fashion.

Topics: genotype–phenotype, association, GWAS, informatics, database, integration, Web services, variation
Publisher: Wiley
Year: 2011
DOI identifier: 10.1002/humu.21469
