27 research outputs found

    Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future.

    "A picture is worth a thousand words." This widely used adage sums up in a few words the notion that a successful visual representation of a concept should enable easy and rapid absorption of large amounts of information. Although, in general, the notion of capturing complex ideas using images is very appealing, would 1000 words be enough to describe the unknown in a research field such as the life sciences? The life sciences are among the biggest generators of enormous datasets, mainly as a result of recent and rapid technological advances; their complexity can make these datasets incomprehensible without effective visualization methods. Here we discuss the past, present and future of genomic and systems biology visualization. We briefly comment on many visualization and analysis tools and the purposes that they serve. We focus on the latest libraries and programming languages that enable more effective, efficient and faster approaches for visualizing biological concepts, and also comment on future human-computer interaction trends that could enhance visualization further.

    Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies

    Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many statistical and computational metagenomics tools and databases have been developed to exploit the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also outline future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.

    Genome urbanization: clusters of topologically co-regulated genes delineate functional compartments in the genome of Saccharomyces cerevisiae

    The eukaryotic genome evolves under the dual constraint of maintaining coordinated gene transcription and performing effective DNA replication and cell division, the coupling of which brings about inevitable DNA topological tension. DNA supercoiling is resolved and, in some cases, even harnessed by the genome through the function of DNA topoisomerases, as has been shown in the concurrent transcriptional activation and suppression of genes upon transient deactivation of topoisomerase II (topoII). By analyzing a genome-wide transcription run-on experiment upon thermal inactivation of topoII in Saccharomyces cerevisiae, we were able to define 116 gene clusters of consistent response (either positive or negative) to topological stress. A comprehensive analysis of these topologically co-regulated gene clusters reveals pronounced preferences regarding their functional, regulatory and structural attributes. Genes that respond negatively to topological stress are positioned in gene-dense pericentromeric regions, are more conserved and are associated with essential functions, while upregulated gene clusters are preferentially located in the gene-sparse nuclear periphery, are associated with secondary functions and are under complex regulatory control. We propose that genome architecture evolves with a core of essential genes occupying a compact genomic ‘old town’, whereas more recently acquired, condition-specific genes tend to be located in a more spacious ‘suburban’ genomic periphery.
    University of Crete Small-Scale Research Grant [4274 to C.N.]. Funding for open access charge: Plan Nacional de I+D+I of Spain Grant Number: BFU2015-67007-P to J.R.
    Peer reviewed

    Gene socialization: gene order, GC content and gene silencing in Salmonella

    BACKGROUND: Genes of conserved order in bacterial genomes tend to evolve more slowly than genes whose order is not conserved. In addition, genes with a GC content lower than the GC content of the resident genome are known to be selectively silenced by the histone-like nucleoid structuring protein (H-NS) in Salmonella. RESULTS: In this study, we use a comparative genomics approach to demonstrate that in Salmonella, genes whose order is not conserved (or genes without homologs) in closely related bacteria possess a significantly lower average GC content than genes that preserve their relative position in the genome. Moreover, these genes are more frequently targeted by H-NS than genes that have conserved their genomic neighborhood. We also observed that duplicated genes that do not preserve their genomic neighborhood are, on average, under less selective pressure. CONCLUSIONS: We establish a strong association between gene order, GC content and gene silencing in a model bacterial species. This analysis suggests that genes that are not under strong selective pressure (and so evolve faster than others) in Salmonella tend to accumulate more AT-rich mutations and are eventually silenced by H-NS. Our findings may establish new approaches for a better understanding of bacterial genome evolution and function, using information from functional and comparative genomics.

    DrugQuest - a text mining workflow for drug association discovery.

    BACKGROUND: Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of biomedical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. RESULTS: Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways and diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible reasons why these records are clustered together. Different views, such as chemicals clustered by their textual information and tag clouds consisting of Significant Terms along with the terms that were used for clustering, are delivered to the user through a user-friendly web interface. CONCLUSIONS: DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest

    Supplement (01-10)

    Supplements described in manuscript: S1 is sequence input (fasta format); S2 is time-ordered list of species; S3 is CAST output; S4 is parsed BLAST output; S5 is MCL output; S6 is BLAST output for unique genes; S7 is genome distance matrix; S8 is full sequence dataset from Figure 4 (fasta format); S9 is sequence dataset from Figure 5 (fasta format); S10 is count of new family contributions