    Computational approaches for integrative cancer genomics

    Given the complexity and heterogeneity of cancer, the development of new high-throughput genome-wide technologies has opened new possibilities for its study. Several projects around the globe are exploiting these technologies to generate unprecedented amounts of data on cancer genomes. The analysis, integration and exploration of these data remain a key challenge in the field. In this dissertation, we first present Gitools, a tool for accessing biological databases, analysing high-throughput data, and visualising multi-dimensional results with interactive heatmaps. Then, we present IntOGen: the methodology employed to collect and organize the data, the methods used for its analysis, and how the results and analyses were made available to other researchers. Finally, we compare several methods for predicting the impact of non-synonymous mutations, showing that new tools specifically designed for cancer outperform those traditionally used for general diseases, and highlighting the need to use other sources of information to better predict the impact of cancer mutations.
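A common way to compare mutation-impact predictors, as in the comparison described above, is to rank them by ROC AUC on mutations with known labels. The sketch below computes AUC as the probability that a randomly chosen positive outscores a randomly chosen negative. It is a minimal illustration under stated assumptions: the predictor names, scores and labels are hypothetical toy values, not results from the dissertation, and the dissertation's own evaluation protocol may differ.

```python
"""Compare mutation-impact predictors by ROC AUC (illustrative sketch).

Predictor names, scores and labels are hypothetical toy values.
"""
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 1 = known cancer driver mutation, 0 = neutral/passenger (toy labels)
labels = [1, 1, 1, 0, 0, 0]
predictors = {
    "general_disease_tool": [0.9, 0.4, 0.3, 0.8, 0.5, 0.2],  # hypothetical
    "cancer_specific_tool": [0.9, 0.8, 0.7, 0.6, 0.3, 0.2],  # hypothetical
}

for name, scores in predictors.items():
    print(f"{name}: AUC = {roc_auc(scores, labels):.3f}")
```

The pairwise loop is quadratic but transparent; for large benchmark sets a rank-based (Mann-Whitney U) formulation gives the same value more efficiently.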

    Wok

    Wok is a workflow management system implemented in Python that makes it easy to structure workflows, parallelize their execution and monitor their progress, among other things. It has a modular design that allows it to be adapted to different infrastructures.
For the time being it is strongly focused on clusters running a DRMAA-compatible resource manager (e.g. Oracle Grid Engine) whose worker nodes share a common folder. More flexible infrastructures (such as Amazon EC2) are being considered for future implementations.
Workflows in Wok are defined in an XML file with the .flow extension. This definition includes:
- the different modules (pieces of processing)
- the interconnections between modules (e.g. the input of module B is linked to the output of module A)
- explicit dependencies (e.g. module A cannot be executed until module B has finished)
- descriptions that can be used to generate documentation automatically or to create web forms
Each module corresponds to a piece of software that has to be run to process some input and generate an output. For now, only Python scripts are allowed, but they can be used to execute software written in other languages.
Workflows in Wok can be treated like any software project and managed with version control tools and the IDE of your choice.
Wok can be used as a terminal script or run in server mode.
A workflow is executed from the terminal with the wok-run script, which accepts a few options:
- An instance name (-n name), which allows the same workflow to be run many times simultaneously and independently
- Configuration files (-c file.conf); the configuration can be split across as many files as desired
- Configuration parameters (-D param=value), which override any values previously set in configuration files
The workflow definition file (e.g. myworkflow.flow) is passed as the first argument.
Several resources are available to monitor the execution of a workflow:
- The web server, which allows interacting with the engine in a very straightforward way (recommended)
- The logs emitted by wok-run through the standard output
- The intermediate files generated by Wok (e.g. the task output files)
Wok has been designed for workflow developers who feel more comfortable programming than performing hundreds of clicks and drag-and-drop operations, and also for those who want infrastructure flexibility and full control and monitoring of the execution.
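The way module interconnections and explicit dependencies constrain execution can be sketched as a dependency graph: each module waits for its data producers and its explicit prerequisites, and modules whose prerequisites are all finished can run in parallel. This is a minimal sketch, assuming the workflow has already been read into plain Python data; it does not parse the .flow format and does not use Wok's API. The module names are hypothetical, and the ordering relies on the standard-library graphlib module (Python 3.9+).

```python
"""Group workflow modules into parallel execution steps (illustrative sketch).

Not Wok's API: modules, links and dependencies are hypothetical plain data.
"""
from graphlib import TopologicalSorter

modules = ["load", "expr", "mut", "combine", "report"]
# Data links (producer, consumer): consumer's input is producer's output.
links = [("load", "expr"), ("load", "mut"), ("expr", "combine"), ("mut", "combine")]
# Explicit dependencies: module -> modules that must finish before it runs.
explicit_deps = {"report": {"combine"}}


def build_graph(modules, links, explicit_deps):
    """Map each module to the set of modules it must wait for."""
    graph = {m: set() for m in modules}
    for producer, consumer in links:
        graph[consumer].add(producer)
    for module, deps in explicit_deps.items():
        graph[module] |= set(deps)
    return graph


def parallel_batches(graph):
    """Return successive lists of modules that could run concurrently."""
    ts = TopologicalSorter(graph)
    ts.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)  # pretend the whole batch finished successfully
    return batches


batches = parallel_batches(build_graph(modules, links, explicit_deps))
for i, batch in enumerate(batches):
    print(f"step {i}: {batch}")
```

Treating data links and explicit dependencies as the same kind of edge keeps the scheduler simple; a real engine would also track task failures and partial completion instead of marking each batch done immediately.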
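The precedence rules of the wok-run options (several -c configuration files, with -D param=value overriding values from those files) can be mimicked with argparse. This is a hedged sketch, not Wok's code: the program name wok-run-sketch is hypothetical, the configuration files are assumed to be already loaded into dictionaries (their on-disk format is not specified here), and the rule that later files override earlier ones is an assumption.

```python
"""Mimic the precedence of wok-run options (illustrative sketch, not Wok's code).

Assumptions: config files are already loaded into dicts, later files
override earlier ones, and -D param=value overrides all file values.
"""
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="wok-run-sketch")  # hypothetical name
    p.add_argument("flow", help="workflow definition file, e.g. myworkflow.flow")
    p.add_argument("-n", dest="instance", help="instance name")
    p.add_argument("-c", dest="conf_files", action="append", default=[],
                   help="configuration file (repeatable)")
    p.add_argument("-D", dest="params", action="append", default=[],
                   help="param=value override (repeatable)")
    return p


def effective_config(file_configs: list[dict], params: list[str]) -> dict:
    merged: dict = {}
    for conf in file_configs:  # assumption: later files win
        merged.update(conf)
    for item in params:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected param=value, got {item!r}")
        merged[key] = value  # -D always overrides file values
    return merged


args = build_parser().parse_args(
    ["myworkflow.flow", "-n", "run1", "-c", "base.conf", "-c", "cluster.conf",
     "-D", "threads=8"])
# Hypothetical contents of base.conf and cluster.conf, already parsed:
loaded = [{"threads": "1", "queue": "short"}, {"queue": "long"}]
print(args.instance, args.conf_files)
print(effective_config(loaded, args.params))
```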

    Gitools: analysis and visualisation of genomic data using interactive heat-maps

    Intuitive visualization of data and results is very important in genomics, especially when many conditions are to be analyzed and compared. Heat-maps have proven very useful for the representation of biological data. Here we present Gitools (http://www.gitools.org), an open-source tool to perform analyses and visualize data and results as interactive heat-maps. Gitools includes systems for importing data from several sources (e.g. IntOGen, Biomart, KEGG, Gene Ontology), which facilitate the integration of novel data with previous knowledge. The authors acknowledge funding from the Spanish Ministry of Science and Technology (grant number SAF2009-06954) and the Spanish National Institute of Bioinformatics (INB: http://www.inab.org). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    Integrative cancer genomics (IntOGen) in Biomart

    Recently, we created IntOGen, a resource to integrate a large amount of cancer genomic data. IntOGen aims at facilitating the detection of the most recurrent alterations that drive tumorigenesis. It collates, annotates and analyzes high-throughput data about transcriptional, genomic and mutational changes taking place in tumors from different studies annotated with specific cancer types. Currently, it contains 118 studies for mRNA expression profiling and 188 studies for genomic alterations covering in total 64 different tumor topographies. In this article, we describe the Biomart portal for IntOGen. The portal provides easy access to different types of data and facilitates the bulk download of all the analysis results. Here, we describe the general features of IntOGen and give example queries to demonstrate its use. Database URL: www.intogen.org
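BioMart portals can be queried programmatically by sending an XML query to a martservice endpoint. The sketch below only builds such a query string and URL; it does not send a request. It is a hedged illustration: the element layout follows the classic BioMart XML query format as we understand it, while the dataset, filter and attribute names and the endpoint URL (example.org) are hypothetical placeholders, not IntOGen's actual schema.

```python
"""Build a BioMart-style XML query (illustrative sketch).

Dataset, filter and attribute names and the endpoint URL are hypothetical
placeholders; the query is built but not sent.
"""
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_query(dataset: str, filters: dict[str, str], attributes: list[str]) -> str:
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():  # restrict the rows returned
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:  # columns to return
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


xml_query = build_query(
    dataset="intogen_example_dataset",      # hypothetical
    filters={"tumor_type": "breast"},       # hypothetical
    attributes=["gene_symbol", "q_value"],  # hypothetical
)
url = "https://example.org/biomart/martservice?" + urlencode({"query": xml_query})
print(xml_query)
print(url)
```

Building the XML with ElementTree rather than string formatting guarantees that attribute values are escaped correctly, and urlencode takes care of URL-encoding the whole query.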

    The BioMart community portal: an innovative alternative to large, centralized data repositories

    The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. The enrichment analysis tool, a new addition to our toolbox, is now accessible through graphical and web service interfaces. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.
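Enrichment analysis of a gene list is often framed as a one-sided hypergeometric test: given a background of N genes, K of which carry an annotation (e.g. a pathway), how surprising is it that k of the n selected genes carry it? The sketch below implements that textbook test with exact integer arithmetic. It is a generic illustration, not necessarily the method implemented by the BioMart enrichment tool, and the example counts are hypothetical.

```python
"""One-sided hypergeometric over-representation test (illustrative sketch).

Generic textbook enrichment test; not necessarily BioMart's implementation.
"""
from math import comb


def enrichment_pvalue(population: int, annotated: int, selected: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(N=population, K=annotated, n=selected).

    population: genes in the background (N)
    annotated:  background genes carrying the annotation (K)
    selected:   genes in the query list (n)
    overlap:    query genes carrying the annotation (k)
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    # Sum the probability mass of every overlap at least as large as observed;
    # comb() returns 0 for impossible draws, so no extra bounds checks are needed.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical: 20,000 background genes, 200 in a pathway,
# 100 genes in the query list, 8 of which are in the pathway.
print(f"p = {enrichment_pvalue(20000, 200, 100, 8):.3g}")
```

When many annotations are tested at once, the resulting p-values would normally be corrected for multiple testing before reporting.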

    BioMart Central Portal: an open database network for the biological community

    BioMart Central Portal is a first-of-its-kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities.

    IntOGen - Cancer Drivers Database (2014)

    This database contains information on the genes identified as drivers in Rubio-Perez and Tamborero et al. (2015). It covers driver identification at the mutational, CNA and gene fusion levels. Additional ancillary information about the role and major clonality of drivers is also included, as well as a table listing the datasets used for mutational driver identification.
File contents
-----------------
- Tumor_cohort_details.tsv: datasets of somatic mutations employed in the analysis to detect drivers.
- CNA_drivers_per_tumor_type.tsv: list of 29 CNA cancer driver genes in the TCGA cohort.
- Fusion_drivers_per_tumor_type.tsv: list of 10 fusion driver genes in the TCGA cohort.
- Mutational_drivers_per_tumor_type.tsv: list of 459 mutational driver genes in the full cohort.
- Mutational_drivers_project_detection.tsv: list of 459 mutational driver genes detected by project.
- Mutational_drivers_signals.tsv: list of genes with at least 1 signal of positive selection across the 6792 samples.
- Mutational_drivers_count_frequency.tsv: list of 459 mutational drivers with the count of mutated samples across all tumor types.
- Drivers_type_role.tsv: list of 475 drivers, their driver type (mutational, CNA and/or fusion driver) and their role in cancer.
- Drivers_cloncal_frequency.tsv: list of 666 genes for which we were able to compute clonal frequency.
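The tables above are tab-separated, so they can be read with Python's csv module. The sketch below loads a table and counts distinct driver genes per tumor type. It is a minimal illustration under loud assumptions: the column names SYMBOL and TUMOR_TYPE are hypothetical guesses about the file layout (check the header of the real file and pass the right names), and the rows are toy data, not the released tables.

```python
"""Load an IntOGen driver table and count drivers per tumor type (sketch).

Assumptions: the column names SYMBOL and TUMOR_TYPE are hypothetical,
and the rows below are toy data.
"""
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def read_tsv(handle: TextIO) -> list[dict[str, str]]:
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def drivers_per_tumor_type(rows: Iterable[dict[str, str]],
                           gene_col: str = "SYMBOL",
                           tumor_col: str = "TUMOR_TYPE") -> Counter:
    """Count distinct genes per tumor type (duplicate rows are ignored)."""
    seen = {(r[tumor_col], r[gene_col]) for r in rows}
    return Counter(tumor for tumor, _ in seen)


toy = io.StringIO(
    "SYMBOL\tTUMOR_TYPE\n"
    "GENE_A\tBRCA\n"
    "GENE_B\tBRCA\n"
    "GENE_A\tLUAD\n"
    "GENE_A\tBRCA\n"  # duplicate row, counted once
)
counts = drivers_per_tumor_type(read_tsv(toy))
print(counts.most_common())
```

With a downloaded file, the same functions apply to a real handle, e.g. `with open("Mutational_drivers_per_tumor_type.tsv", newline="") as fh: rows = read_tsv(fh)`.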

    In silico prescription of anticancer drugs to cohorts of 28 tumor types reveals targeting opportunities

    Large efforts dedicated to detecting somatic alterations across tumor genomes/exomes are expected to produce significant improvements in precision cancer medicine. However, high inter-tumor heterogeneity is a major obstacle to developing and applying targeted therapeutic agents to treat most cancer patients. Here, we offer a comprehensive assessment of the scope of targeted therapeutic agents in a large pan-cancer cohort. We developed an in silico prescription strategy based on the identification of the driver alterations in each tumor and their druggability options. Although relatively few tumors (5.9%) are tractable by approved agents following clinical guidelines, up to 40.2% could benefit from different repurposing options, and up to 73.3% when treatments currently under clinical investigation are also considered. We also identified 80 therapeutically targetable cancer genes. We acknowledge funding from the Spanish Ministry of Economy and Competitiveness (grant number SAF2012-36199), La Fundació la Marató de TV3, and the Spanish National Institute of Bioinformatics (INB). C.R.-P. and M.P.S. are supported by an FPI fellowship. D.T. is supported by the People Programme (Marie Curie Actions) of the Seventh Framework Programme of the European Union (FP7/2007-2013) under REA grant agreement number 600388 and by the Agency of Competitiveness for Companies of the Government of Catalonia, ACCIÓ. A.G.-P. is supported by a Ramón y Cajal contract.
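The three percentages above are cumulative: each tier adds the tumors that become tractable when less established treatment options are allowed. The sketch below shows that bookkeeping, assuming each driver gene is assigned the most established tier at which some agent targets it (1 = approved following clinical guidelines, 2 = repurposing, 3 = under clinical investigation) and each tumor takes the best tier among its drivers. This is a simplified illustration, not the paper's exact prescription procedure; gene names, tiers and tumors are hypothetical toy values.

```python
"""Cumulative share of tumors tractable at each druggability tier (sketch).

Assumptions: tiers 1 (approved, guidelines) < 2 (repurposing) <
3 (clinical trials); genes, tiers and tumors are hypothetical toy data.
"""
from typing import Dict, Optional, Set

TIERS = {1: "approved (guidelines)", 2: "repurposing", 3: "clinical trials"}

# Most established tier at which each driver can be targeted (hypothetical).
driver_tier: Dict[str, int] = {"GENE_A": 1, "GENE_B": 2, "GENE_C": 3}

# Tumor -> driver genes detected in it (toy data).
tumors: Dict[str, Set[str]] = {
    "t1": {"GENE_A", "GENE_C"},
    "t2": {"GENE_B"},
    "t3": {"GENE_C"},
    "t4": {"GENE_D"},  # driver with no targeting agent
}


def best_tier(drivers: Set[str]) -> Optional[int]:
    """Best (lowest) tier among a tumor's drivers, or None if undruggable."""
    tiers = [driver_tier[g] for g in drivers if g in driver_tier]
    return min(tiers) if tiers else None


def cumulative_coverage(tumors: Dict[str, Set[str]]) -> Dict[int, float]:
    """Fraction of tumors with a treatment option at tier <= t, for each t."""
    best = [best_tier(d) for d in tumors.values()]
    return {
        t: sum(1 for b in best if b is not None and b <= t) / len(best)
        for t in sorted(TIERS)
    }


for tier, frac in cumulative_coverage(tumors).items():
    print(f"up to tier {tier} ({TIERS[tier]}): {frac:.1%}")
```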

    Comprehensive identification of mutational cancer driver genes across 12 tumor types

    With the ability to fully sequence tumor genomes/exomes, the quest for cancer driver genes can now be undertaken in an unbiased manner. However, obtaining a complete catalog of cancer genes is difficult due to the heterogeneous molecular nature of the disease and the limitations of available computational methods. Here we show that combining complementary methods makes it possible to identify a comprehensive and reliable list of cancer driver genes. We provide a list of 291 high-confidence cancer driver genes acting on 3,205 tumors from 12 different cancer types. Among those genes, some have not been previously identified as cancer drivers, and 16 show a clear preference for sustaining mutations in one specific tumor type. The novel driver candidates complement our current picture of the emergence of these diseases. In summary, the catalog of driver genes and the methodology presented here open new avenues to better understand the mechanisms of tumorigenesis. We acknowledge funding from the Spanish Ministry of Science and Technology (grant numbers SAF2009-06954 and SAF2012-36199) and the Spanish National Institute of Bioinformatics (INB). This work was supported by NRNB (U.S. National Institutes of Health, National Center for Research Resources grant number P41 GM103504). We gratefully acknowledge the contributions from the TCGA Research Network and its TCGA Pan-Cancer Analysis Working Group (contributing consortium members are listed in Supplementary Note 1).
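One standard way to merge evidence from complementary driver-detection methods is to combine each gene's per-method p-values with Fisher's method: under the null, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom, whose survival function has a closed form for even degrees of freedom. The sketch below is an illustration only: the paper's exact combination procedure may differ, Fisher's method assumes the p-values are independent (which may not hold for methods run on the same mutations), and the gene names and p-values are hypothetical.

```python
"""Combine per-gene p-values from several methods with Fisher's method (sketch).

Assumes independent p-values; gene names and p-values are hypothetical.
"""
import math
from typing import Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Combined p-value for X = -2 * sum(ln p_i) ~ chi2 with 2k d.o.f."""
    if not pvalues or any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Chi-square survival function for even d.o.f. 2k:
    # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


gene_pvalues = {  # hypothetical outputs of three complementary methods
    "GENE_A": [0.01, 0.03, 0.20],
    "GENE_B": [0.60, 0.40, 0.90],
}
for gene, ps in gene_pvalues.items():
    print(gene, f"{fisher_combine(ps):.3g}")
```

In a genome-wide scan the combined p-values would still need a multiple-testing correction before genes are called drivers.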