13 research outputs found

    Gene Characterization Index: Assessing the Depth of Gene Annotation

    We introduce the Gene Characterization Index, a bioinformatics method for scoring the extent to which a protein-encoding gene is functionally described. Inherently a reflection of human perception, the Gene Characterization Index is applied for assessing the characterization status of individual genes, thus serving the advancement of both genome annotation and applied genomics research by rapid and unbiased identification of groups of uncharacterized genes for diverse applications such as directed functional studies and delineation of novel drug targets. The scoring procedure is based on a global survey of researchers, who assigned characterization scores from 1 (poor) to 10 (extensive) for a sample of genes based on major online resources. By evaluating the survey as training data, we developed a bioinformatics procedure to assign gene characterization scores to all genes in the human genome. We analyzed snapshots of functional genome annotation over a period of 6 years to assess temporal changes reflected by the increase of the average Gene Characterization Index. Applying the Gene Characterization Index to genes within pharmaceutically relevant classes, we confirmed known drug targets as high-scoring genes and revealed potentially interesting novel targets with low characterization indexes. Removing known drug targets and genes linked to sequence-related patent filings from the entirety of indexed genes, we identified sets of low-scoring genes particularly suited for further experimental investigation. The Gene Characterization Index is intended to serve as a tool to the scientific community and granting agencies for focusing resources and efforts on unexplored areas of the genome. The Gene Characterization Index is available from http://cisreg.ca/gci/
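The idea of using survey scores as training data can be illustrated with a minimal sketch. It is not the authors' procedure: the single predictor (a log-scaled annotation count), the linear model, and the sample data are all assumptions made for illustration. The sketch fits a line to surveyed genes and then scores any other gene, clamping the result to the 1 to 10 survey scale.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_score(intercept, slope, n_annotations):
    """Score a gene from its annotation count, clamped to the 1-10 scale."""
    x = math.log10(1 + n_annotations)
    raw = intercept + slope * x
    return min(10.0, max(1.0, raw))

# Hypothetical training sample: (annotation count, mean survey score).
survey = [(0, 1.0), (9, 4.0), (99, 7.0), (999, 10.0)]
xs = [math.log10(1 + count) for count, _ in survey]
ys = [score for _, score in survey]

intercept, slope = fit_line(xs, ys)

# Score genes that were not part of the survey (counts are made up).
for count in (3, 50, 100000):
    print(count, round(predict_score(intercept, slope, count), 2))
```

A real predictor would draw on several annotation features per gene; the one-feature model here only shows the shape of the train-then-score workflow.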

    Genomics and bioinformatics approaches to functional gene annotation

    Biomedical research has been undergoing a quasi-revolution with the dawn of the genomics era. The flood of sequence data from the various genome projects, the task of cataloging the entire coding portion of a genome instead of identifying and characterizing individual genes, as well as technical demands accompanying these developments have posed great challenges to the research community. Although the entire human genome sequence has been virtually recorded, fundamental issues remain about the precise number of protein coding genes, as well as their functional characterization. Available resources for the study of human gene function include large genome annotation pipelines, expression profiling data, and protein interaction screens. To gain biological insights from this maze of data, one must both find mechanisms to organize the information and assess the quality of the results. This thesis focuses on the functional annotation of sparsely characterized human genes and their encoded proteins. The work includes four stages: (I) gene expression profiling; (II) assessment of the level of characterization of human genes; (III) projection of protein networks from lower eukaryotes onto human; and (IV) integration of computational and experimental results for data mining. Initially, a cross-platform comparison for a set of gene expression profiling techniques was carried out to compare the performance of cutting-edge high-throughput methods and conventional approaches in terms of sensitivity, reliability, and throughput. In this study, we demonstrated that correlation between the different methods was poor and thus multi-technique validation was justified. Nonetheless, the strongest correlation with the new reference data in our report, i.e., a collection of traditional Northern blots, was observed for microarray-based technologies.
The assessment of the level of functional characterization of human genes was addressed in the second study, where we developed a scoring system to quantify the annotation status of each human gene. We created a metric to effectively predict the characterization status of human genes based on a set of predictors from the GeneLynx database (http://www.genelynx.org). This scoring function will not only assist the targeted analysis of groups of sparsely annotated genes and proteins, but will also prove useful in the monitoring of long-term gene annotation efforts and the overall annotation status of the human genome. Comparative genomics efforts to transfer gene annotation from proteins in amenable model organisms onto human proteins are currently restricted by the limited availability of experimental data. Nonetheless, we demonstrated how protein networks could be effectively projected from lower eukaryotes onto human and how the confidence in these projections increased with redundantly detected protein interactions. This so-called Interolog Analysis offers promise for reliable inference of protein function. The bioinformatics system we created (Ulysses) provides a novel intuitive interface for biologists studying human proteins. As data depth and coverage increase over time, this system will prove valuable in the extended prediction of high-confidence functional associations of a large portion of human genes. The fusion of experimental data and computational predictions is a central goal of functional genomics. We constructed a bioinformatics workbench for the study of uncharacterized human gene families. By assembling bioinformatics resources and experimental results in a common space, the NovelFam3000 system facilitates functional characterization. Working with a collection of uncharacterized genes, we demonstrated how bioinformatics methods can lead to novel inferences about cellular function of specific protein families.
This thesis unites the identification of uncharacterized human genes, the assessment of genomics data quality, and the application of high-throughput data for the inference of protein function.

    The inequities of mental health research funding.


    Combining chemical genomics screens in yeast to reveal spectrum of effects of chemical inhibition of sphingolipid biosynthesis

    Background: Single genome-wide screens for the effect of altered gene dosage on drug sensitivity in the model organism Saccharomyces cerevisiae provide only a partial picture of the mechanism of action of a drug. Results: Using the example of the tumor cell invasion inhibitor dihydromotuporamine C, we show that a more complete picture of drug action can be obtained by combining different chemical genomics approaches: analysis of the sensitivity of ρ⁰ cells lacking mitochondrial DNA, drug-induced haploinsufficiency, suppression of drug sensitivity by gene overexpression, and chemical-genetic synthetic lethality screening using strains deleted of nonessential genes. Killing of yeast by this chemical requires a functional mitochondrial electron-transport chain and cytochrome c heme lyase function. However, we find that it does not require genes associated with programmed cell death in yeast. The chemical also inhibits endocytosis and intracellular vesicle trafficking and interferes with vacuolar acidification in yeast and in human cancer cells. These effects can all be ascribed to inhibition of sphingolipid biosynthesis by dihydromotuporamine C. Conclusion: Despite their similar conceptual basis, namely altering drug sensitivity by modifying gene dosage, each of the screening approaches provided a distinct set of information that, when integrated, revealed a more complete picture of the mechanism of action of a drug on cells.

    NovelFam3000 - Uncharacterized human protein domains conserved across model organisms

    Background: Despite significant efforts from the research community, an extensive portion of the proteins encoded by human genes lack an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, of which many appear in multiple proteins. Once a domain is characterized in one protein, the presence of a similar sequence in an uncharacterized protein serves as a basis for inference of function. Thus knowledge of a domain's function, or the protein within which it arises, can facilitate the analysis of an entire set of proteins. Description: From the Pfam domain database, we extracted uncharacterized protein domains represented in proteins from humans, worms, and flies. A data centre was created to facilitate the analysis of the uncharacterized domain-containing proteins. The centre both provides researchers with links to dispersed internet resources containing gene-specific experimental data and enables them to post relevant experimental results or comments. For each human gene in the system, a characterization score is posted, allowing users to track the progress of characterization over time or to identify for study uncharacterized domains in well-characterized genes. As a test of the system, a subset of 39 domains was selected for analysis and the experimental results posted to the NovelFam3000 system. For 25 human protein members of these 39 domain families, detailed sub-cellular localizations were determined. Specific observations are presented based on the analysis of the integrated information provided through the online NovelFam3000 system. Conclusion: Consistent experimental results between multiple members of a domain family allow for inferences of the domain's functional role. 
We unite bioinformatics resources and experimental data in order to accelerate the functional characterization of scarcely annotated domain families.
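The selection step described above (uncharacterized domains represented in proteins from humans, worms, and flies) amounts to a filter over domain records. The following is a minimal sketch under stated assumptions: the record layout, the `characterized` flag, and the `DOM_*` identifiers are hypothetical and do not reflect the Pfam schema or the authors' actual extraction code.

```python
def conserved_uncharacterized(domains, required_species):
    """Return names of uncharacterized domains found in every required species."""
    required = set(required_species)
    return sorted(
        name
        for name, info in domains.items()
        if not info["characterized"] and required <= info["species"]
    )

# Hypothetical domain records; identifiers and species sets are made up.
domains = {
    "DOM_A": {"characterized": False, "species": {"human", "worm", "fly"}},
    "DOM_B": {"characterized": True, "species": {"human", "worm", "fly"}},
    "DOM_C": {"characterized": False, "species": {"human", "fly"}},
    "DOM_D": {"characterized": False, "species": {"human", "worm", "fly", "yeast"}},
}

selected = conserved_uncharacterized(domains, ["human", "worm", "fly"])
print(selected)
```

Using a subset test (`required <= info["species"]`) keeps domains that also occur in additional organisms, which seems to match the "represented in" wording; an exact-match rule would be a different design choice.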

    Ulysses - an application for the projection of molecular interactions across species

    We developed Ulysses as a user-oriented system that uses a process called Interolog Analysis for the parallel analysis and display of protein interactions detected in various species. Ulysses was designed to perform such Interolog Analysis by the projection of model organism interaction data onto homologous human proteins, and thus serves as an accelerator for the analysis of uncharacterized human proteins. The relevance of projections was assessed and validated against published reference collections. All source code is freely available, and the Ulysses system can be accessed via a web interface at http://www.cisreg.ca/ulysses.
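The projection of model organism interactions onto homologous human proteins can be sketched as follows. This is an illustrative sketch, not the Ulysses implementation: the input layout, the one-to-one ortholog maps, the decision to skip pairs that collapse onto a single human protein, and all identifiers are assumptions. Each projected human pair records which species support it, so pairs detected in several species can be ranked higher.

```python
from collections import defaultdict

def project_interologs(interactions_by_species, orthologs):
    """Map model-organism interactions onto human protein pairs.

    Returns {frozenset({human_a, human_b}): set of supporting species}.
    """
    support = defaultdict(set)
    for species, pairs in interactions_by_species.items():
        mapping = orthologs.get(species, {})
        for a, b in pairs:
            human_a, human_b = mapping.get(a), mapping.get(b)
            # Skip pairs without two human orthologs; self-pairs are
            # also skipped here (an assumption of this sketch).
            if human_a is None or human_b is None or human_a == human_b:
                continue
            support[frozenset((human_a, human_b))].add(species)
    return dict(support)

# Hypothetical inputs: protein identifiers and ortholog maps are made up.
interactions = {
    "yeast": [("y1", "y2"), ("y2", "y3")],
    "worm": [("w1", "w2")],
}
orthologs = {
    "yeast": {"y1": "H1", "y2": "H2"},  # y3 has no human ortholog
    "worm": {"w1": "H2", "w2": "H1"},
}

projected = project_interologs(interactions, orthologs)
for pair, species in projected.items():
    print(sorted(pair), "supported by", len(species), "species")
```

Storing pairs as `frozenset` makes the interaction undirected, so yeast (H1, H2) and worm (H2, H1) count as the same projected interaction.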