27 research outputs found
Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future.
"Î picture is worth a thousand words." This widely used adage sums up in a few words the notion that a successful visual representation of a concept should enable easy and rapid absorption of large amounts of information. Although, in general, the notion of capturing complex ideas using images is very appealing, would 1000 words be enough to describe the unknown in a research field such as the life sciences? Life sciences is one of the biggest generators of enormous datasets, mainly as a result of recent and rapid technological advances; their complexity can make these datasets incomprehensible without effective visualization methods. Here we discuss the past, present and future of genomic and systems biology visualization. We briefly comment on many visualization and analysis tools and the purposes that they serve. We focus on the latest libraries and programming languages that enable more effective, efficient and faster approaches for visualizing biological concepts, and also comment on the future human-computer interaction trends that would enable for enhancing visualization further
Metagenomics : tools and insights for analyzing next-generation sequencing data derived from biodiversity studies
Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of âmetagenomicsâ, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards
Recommended from our members
BioTextQuest v2.0: An evolved tool for biomedical literature mining and concept discovery.
The process of navigating through the landscape of biomedical literature and performing searches or combining them with bioinformatics analyses can be daunting, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed(Âź) and related repositories. Herein, we present BioTextQuest v2.0, a tool for biomedical literature mining. BioTextQuest v2.0 is an open-source online web portal for document clustering based on sets of selected biomedical terms, offering efficient management of information derived from PubMed abstracts. Employing established machine learning algorithms, the tool facilitates document clustering while allowing users to customize the analysis by selecting terms of interest. BioTextQuest v2.0 streamlines the process of uncovering valuable insights from biomedical research articles, serving as an agent that connects the identification of key terms like genes/proteins, diseases, chemicals, Gene Ontology (GO) terms, functions, and others through named entity recognition, and their application in biological research. Instead of manually sifting through articles, researchers can enter their PubMed-like query and receive extracted information in two user-friendly formats, tables and word clouds, simplifying the comprehension of key findings. The latest update of BioTextQuest leverages the EXTRACT named entity recognition tagger, enhancing its ability to pinpoint various biological entities within text. BioTextQuest v2.0 acts as a research assistant, significantly reducing the time and effort required for researchers to identify and present relevant information from the biomedical literature
Genome urbanization: clusters of topologically co-regulated genes delineate functional compartments in the genome of Saccharomyces cerevisiae
The eukaryotic genome evolves under the dual constraint of maintaining coordinated gene transcription and performing effective DNA replication and cell division, the coupling of which brings about inevitable DNA topological tension. DNA supercoiling is resolved and, in some cases, even harnessed by the genome through the function of DNA topoisomerases, as has been shown in the concurrent transcriptional activation and suppression of genes upon transient deactivation of topoisomerase II (topoII). By analyzing a genome-wide transcription run-on experiment upon thermal inactivation of topoII in Saccharomyces cerevisiae we were able to define 116 gene clusters of consistent response (either positive or negative) to topological stress. A comprehensive analysis of these topologically co-regulated gene clusters reveals pronounced preferences regarding their functional, regulatory and structural attributes. Genes that negatively respond to topological stress, are positioned in gene-dense pericentromeric regions, are more conserved and associated to essential functions, while upregulated gene clusters are preferentially located in the gene-sparse nuclear periphery, associated with secondary functions and under complex regulatory control. We propose that genome architecture evolves with a core of essential genes occupying a compact genomic âold townâ, whereas more recently acquired, condition-specific genes tend to be located in a more spacious âsuburbanâ genomic periphery.University of Crete Small-Scale Research Grant [4274 to C.N.]. Funding for open access charge: Plan Nacional de I+D+I of Spain Grant Number: BFU2015-67007-P to J.R.Peer reviewe
Gene socialization: gene order, GC content and gene silencing in Salmonella
BACKGROUND: Genes of conserved order in bacterial genomes tend to evolve slower than genes whose order is not conserved. In addition, genes with a GC content lower than the GC content of the resident genome are known to be selectively silenced by the histone-like nucleoid structuring protein (H-NS) in Salmonella. RESULTS: In this study, we use a comparative genomics approach to demonstrate that in Salmonella, genes whose order is not conserved (or genes without homologs) in closely related bacteria possess a significantly lower average GC content in comparison to genes that preserve their relative position in the genome. Moreover, these genes are more frequently targeted by H-NS than genes that have conserved their genomic neighborhood. We also observed that duplicated genes that do not preserve their genomic neighborhood are, on average, under less selective pressure. CONCLUSIONS: We establish a strong association between gene order, GC content and gene silencing in a model bacterial species. This analysis suggests that genes that are not under strong selective pressure (evolve faster than others) in Salmonella tend to accumulate more AT-rich mutations and are eventually silenced by H-NS. Our findings may establish new approaches for a better understanding of bacterial genome evolution and function, using information from functional and comparative genomics
DrugQuest - a text mining workflow for drug association discovery.
BackgroundText mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated in mining other types of repositories such as chemical databases.ResultsHerein, we apply a text mining approach on the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Name Entity Recognition (NER) techniques on these fields to identify chemicals, proteins, genes, pathways, diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views such as clustered chemicals based on their textual information, tag clouds consisting of Significant Terms along with the terms that were used for clustering are delivered to the user through a user-friendly web interface.ConclusionsDrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest
Supplement (01-10)
Supplements described in manuscript: S1 is sequence input (fasta format); S2 is time-ordered list of species; S3 is CAST output; S4 is parsed BLAST output; S5 is MCL output; S6 is BLAST output for unique genes; S7 is genome distance matrix; S8 is full sequence dataset from Figure 4 (fasta format); S9 is sequence dataset from Figure 5 (fasta format); S10 is count of new family contributions