BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments
Advances in sequencing techniques have led to exponential growth in
biological data, demanding the development of large-scale bioinformatics
experiments. Because these experiments are computation- and data-intensive,
they require high-performance computing (HPC) techniques and can benefit from
specialized technologies such as Scientific Workflow Management Systems (SWfMS)
and databases. In this work, we present BioWorkbench, a framework for managing
and analyzing bioinformatics experiments. This framework automatically collects
provenance data, including both performance data from workflow execution and
data from the scientific domain of the workflow application. Provenance data
can be analyzed through a web application that abstracts a set of queries to
the provenance database, simplifying access to provenance information. We
evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree
assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a
RASopathy analysis workflow. We analyze each workflow from both computational
and scientific domain perspectives, by using queries to a provenance and
annotation database. Some of these queries are available as a pre-built feature
of the BioWorkbench web application. Through the provenance data, we show that
the framework is scalable and achieves high performance, reducing the execution
time of the case studies by up to 98%. We also show how the application of machine
learning techniques can enrich the analysis process.
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data of Plasmodium falciparum, but also using the
millions of genomic data from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes, particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed.
Comment: 43 pages, 4 figures, to appear in Malaria Journal
Comparative Genomics of Cyanobacterial Symbionts Reveals Distinct, Specialized Metabolism in Tropical Dysideidae Sponges.
Marine sponges are recognized as valuable sources of bioactive metabolites and renowned as petri dishes of the sea, providing specialized niches for many symbiotic microorganisms. Sponges of the family Dysideidae are well documented to be chemically talented, often containing high levels of polyhalogenated compounds, terpenoids, peptides, and other classes of bioactive small molecules. This group of tropical sponges hosts a high abundance of an uncultured filamentous cyanobacterium, Hormoscilla spongeliae. Here, we report the comparative genomic analyses of two phylogenetically distinct Hormoscilla populations, which reveal shared deficiencies in essential pathways, hinting at possible reasons for their uncultivable status, as well as differing biosynthetic machinery for the production of specialized metabolites. One symbiont population contains clustered genes for expanded polybrominated diphenylether (PBDE) biosynthesis, while the other instead harbors a unique gene cluster for the biosynthesis of the dysinosin nonribosomal peptides. The hybrid sequencing and assembly approach utilized here allows, for the first time, a comprehensive look into the genomes of these elusive sponge symbionts.
IMPORTANCE Natural products provide the inspiration for most clinical drugs. With the rise in antibiotic resistance, it is imperative to discover new sources of chemical diversity. Bacteria living in symbiosis with marine invertebrates have emerged as an untapped source of natural chemistry. While symbiotic bacteria are often recalcitrant to growth in the lab, advances in metagenomic sequencing and assembly now make it possible to access their genetic blueprint. A cell enrichment procedure, combined with a hybrid sequencing and assembly approach, enabled detailed genomic analysis of uncultivated cyanobacterial symbiont populations in two chemically rich tropical marine sponges. These population genomes reveal a wealth of secondary metabolism potential as well as possible reasons for historical difficulties in their cultivation.
metaSHARK: software for automated metabolic network prediction from DNA sequence and its application to the genomes of Plasmodium falciparum and Eimeria tenella
The metabolic SearcH And Reconstruction Kit
(metaSHARK) is a new fully automated software package
for the detection of enzyme-encoding genes
within unannotated genome data and their visualization
in the context of the surrounding metabolic network.
The gene detection package (SHARKhunt) runs
on a Linux system and requires only a set of raw DNA
sequences (genomic, expressed sequence tag and/or
genome survey sequence) as input. Its output
may be uploaded to our web-based visualization
tool (SHARKview) for exploring and comparing data
from different organisms. We first demonstrate the
utility of the software by comparing its results for
the raw Plasmodium falciparum genome with the
manual annotations available at the PlasmoDB and
PlasmoCyc websites. We then apply SHARKhunt to
the unannotated genome sequences of the coccidian
parasite Eimeria tenella and observe that, at an
E-value cut-off of 10^-20, our software makes 142
additional assertions of enzymatic function compared
with a recent annotation package working
with translated open reading frame sequences. The
ability of the software to cope with low levels of
sequence coverage is investigated by analyzing
assemblies of the E. tenella genome at estimated
coverages from 0.5x to 7.5x. Lastly, as an example
of how metaSHARK can be used to evaluate the
genomic evidence for specific metabolic pathways,
we present a study of coenzyme A biosynthesis in
P. falciparum and E. tenella.
Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement
Multiple genome alignment remains a challenging problem. Effects of
recombination including rearrangement, segmental duplication, gain, and loss
can create a mosaic pattern of homology even among closely related organisms.
We describe a method to align two or more genomes that have undergone
large-scale recombination, particularly genomes that have undergone substantial
amounts of gene gain and loss (gene flux). The method utilizes a novel
alignment objective score, referred to as a sum-of-pairs breakpoint score. We
also apply a probabilistic alignment filtering method to remove erroneous
alignments of unrelated sequences, which are commonly observed in other genome
alignment methods. We describe new metrics for quantifying genome alignment
accuracy which measure the quality of rearrangement breakpoint predictions and
indel predictions. The progressive genome alignment algorithm demonstrates
markedly improved accuracy over previous approaches in situations where genomes
have undergone realistic amounts of genome rearrangement, gene gain, loss, and
duplication. We apply the progressive genome alignment algorithm to a set of 23
completely sequenced genomes from the genera Escherichia, Shigella, and
Salmonella. The 23 enterobacteria have an estimated 2.46Mbp of genomic content
conserved among all taxa and total unique content of 15.2Mbp. We document
substantial population-level variability among these organisms driven by
homologous recombination, gene gain, and gene loss. Free, open-source software
implementing the described genome alignment approach is available from
http://gel.ahabs.wisc.edu/mauve.
Comment: Revision dated June 19, 200
AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system
We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be orchestrated in terms of a global annotation strategy; this facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive and incremental annotation from a time when only a few draft contigs are available, to when a final finished assembly is produced. The overall architecture employed is modular and extensible, being based on the W3 standard Web services framework. Specialized modules interact with two independent core modules that are used to annotate, respectively, genomic and protein sequences. AGMIAL is currently being used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open source license.