24 research outputs found

    Gitools: Analysis and Visualisation of Genomic Data Using Interactive Heat-Maps

    Intuitive visualization of data and results is very important in genomics, especially when many conditions are to be analyzed and compared. Heat-maps have proven very useful for the representation of biological data. Here we present Gitools (http://www.gitools.org), an open-source tool to perform analyses and visualize data and results as interactive heat-maps. Gitools contains data import systems from several sources (e.g., IntOGen, Biomart, KEGG, Gene Ontology), which facilitate the integration of novel data with previous knowledge

    Computational study of cancer

    In my thesis, I focused on integrative analysis of high-throughput oncogenomic data. This was done in two parts. In the first part, I describe IntOGen, an integrative data mining tool for the study of cancer. This system collates, annotates, pre-processes and analyzes large-scale data for transcriptomic, copy number aberration and mutational profiling of a large number of tumors in multiple cancer types. All oncogenomic data is annotated with ICD-O terms. We perform analysis at different levels of complexity: at the level of genes, at the level of modules, at the level of studies and finally at the level of combinations of studies. The results are publicly available in a web service. I also present the Biomart interface of IntOGen for bulk download of data. In the second part, I propose a methodology based on sample-level enrichment analysis to identify patient subgroups from high-throughput profiling of tumors. I also apply this approach to a specific biological problem and characterize the properties of worse-prognosis tumors in multiple cancer types. This methodology can be used in the translational version of IntOGen

    BioMart Central Portal: an open database network for the biological community

    BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities

    BioMart: a data federation framework for large collaborative projects

    BioMart is a freely available, open source, federated database system that provides unified access to disparate, geographically distributed data sources. It is designed to be data agnostic and platform independent, such that existing databases can easily be incorporated into the BioMart framework. BioMart allows databases hosted on different servers to be presented seamlessly to users, facilitating collaborative projects between different research groups. BioMart contains several levels of query optimization to efficiently manage large data sets and offers a diverse selection of graphical user interfaces and application programming interfaces to ensure that queries can be performed in whatever manner is most convenient for the user. The software has now been adopted by a large number of different biological databases spanning a wide range of data types and providing a rich source of annotation available to bioinformaticians and biologists alike

    The BioMart community portal: an innovative alternative to large, centralized data repositories.

    The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool, is now accessible through graphical and web service interfaces. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations

    Genet Med

    Recent dramatic advances in multiomics research coupled with exponentially increasing volume, complexity, and interdisciplinary nature of publications are making it challenging for scientists to stay up-to-date on the literature. Strategies to address this challenge include the creation of online databases and warehouses to support timely and targeted dissemination of research findings. Although most of the early examples have been in cancer genomics and pharmacogenomics, the approaches used can be adapted to support investigators in heart, lung, blood, and sleep (HLBS) disorders research. In this article, we describe the creation of an HLBS population genomics (HLBS-PopOmics) knowledge base as an online, continuously updated, searchable database to support the dissemination and implementation of studies and resources that are relevant to clinical and public health practice. In addition to targeted searches based on the HLBS disease categories, cross-cutting themes reflecting the ethical, legal, and social implications of genomics research; systematic evidence reviews; and clinical practice guidelines supporting screening, detection, evaluation, and treatment are also emphasized in HLBS-PopOmics. Future updates of the knowledge base will include additional emphasis on transcriptomics, proteomics, metabolomics, and other omics research; explore opportunities for leveraging data sets designed to support scientific discovery; and incorporate advanced machine learning bioinformatics capabilities. CC999999/Intramural CDC HHS/United States; Z99 CA999999/Intramural NIH HHS/United States

    A compendium of mutational cancer driver genes

    A fundamental goal in cancer research is to understand the mechanisms of cell transformation. This is key to developing more efficient cancer detection methods and therapeutic approaches. One milestone towards this objective is the identification of all the genes with mutations capable of driving tumours. Since the 1970s, the list of cancer genes has been growing steadily. Because cancer driver genes are under positive selection in tumorigenesis, their observed patterns of somatic mutations across tumours in a cohort deviate from those expected from neutral mutagenesis. These deviations, which constitute signals of positive selection, may be detected by carefully designed bioinformatics methods, which have become the state of the art in the identification of driver genes. A systematic approach combining several of these signals could lead to a compendium of mutational cancer genes. In this Review, we present the Integrative OncoGenomics (IntOGen) pipeline, an implementation of such an approach to obtain the compendium of mutational cancer drivers. Its application to somatic mutations of more than 28,000 tumours of 66 cancer types reveals 568 cancer genes and points towards their mechanisms of tumorigenesis. The application of this approach to the ever-growing datasets of somatic tumour mutations will support the continuous refinement of our knowledge of the genetic basis of cancer

    Integrating regulatory and methylome data for the discovery of clear cell Renal Cell Carcinoma (ccRCC) variants

    Magister Scientiae - MSc. Kidney cancers, of which clear cell renal cell carcinoma comprises an estimated 70%, have been placed amongst the top ten most common cancers in both males and females. With a mortality rate that exceeds 40%, kidney cancer is considered the most lethal cancer of the genitourinary system. Despite advances in its treatment, the mortality and incidence rates across all stages of the disease have continued to climb. Since the release of the Human Genome Project in the early 2000s, most genetics studies have focused on the protein coding region of the human genome, which accounts for a mere 2% of the entire genome. It has been suggested that diverting our focus to the other 98% of the genome, which was previously dismissed as non-functional “junk DNA”, could possibly contribute significantly to our understanding of the underlying mechanisms of complex diseases. In this study a whole genome sequencing somatic mutation data set from the International Cancer Genome Consortium was used. The non-coding somatic mutations within the promoter, intronic, 5-prime untranslated and 3-prime untranslated regions of clear cell renal cell carcinoma-implicated genes were extracted and submitted to RegulomeDB for their functional annotation. As expected, most of the variants were located within the intronic regions and only a small subset of identified variants was predicted to be deleterious. Although the variants all belonged to a selected subset of kidney cancer-associated genes, the genes frequently mutated in the non-coding regions were not the same genes that were frequently mutated in the whole exome studies (where the focus is on the coding sequences). This indicates that with whole genome sequencing studies a new set of genes/variants previously unassociated with clear cell renal cell carcinoma could be identified. In addition, most of the non-coding somatic variants fell within multiple transcription factor binding sites. Since many of these variants were also deleterious (as predicted by RegulomeDB), this suggests that mutations in the non-coding regions could contribute to disease due to their role in transcription factor binding site disruptions and their subsequent impact on transcriptional regulation. The substantial overlap between the genes with the most aberrantly methylated variants and the genes with the most transcription factor binding site disruptions signifies a potential link between differential methylation and transcription factor binding site affinities. In contrast to the increased DNA methylation generally seen in promoter methylation studies, all of the significant hits in this study were hypomethylated, with the subsequent up-regulation of the genes of interest, suggesting that in clear cell renal cell carcinoma, aberrant methylation may play a role in activating proto-oncogenes, rather than the silencing of genes. When a cross-analysis was carried out between the gene expression patterns and the transcription factor binding site disruptions, the non-coding somatic variants and differential methylation profiles, the genes affected again showed a clear overlap. Interestingly, most of the variants were not present in the 1000 Genomes data and thus represent novel mutations, which possibly occurred as a result of genomic instability. However, identifying novel variants is always promising, since they epitomise the possibility of developing pioneering ways to target diseases. The numerous detrimental effects a single non-coding mutation can have on other genomic processes have been demonstrated in this study and therefore validate the inclusion of non-coding regions of the genome in genetic studies in order to study complex multifactorial diseases. National Research Foundation (NRF) and DAA