25,625 research outputs found
Recommended from our members
Optimizing sequencing protocols for leaderboard metagenomics by combining long and short reads.
As metagenomic studies move to increasing numbers of samples, communities like the human gut may benefit more from the assembly of abundant microbes in many samples, rather than the exhaustive assembly of fewer samples. We term this approach leaderboard metagenome sequencing. To explore protocol optimization for leaderboard metagenomics in real samples, we introduce a benchmark of library prep and sequencing using internal references generated by synthetic long-read technology, allowing us to evaluate high-throughput library preparation methods against gold-standard reference genomes derived from the samples themselves. We introduce a low-cost protocol for high-throughput library preparation and sequencing
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences.Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biolog
gcodeml: A Grid-enabled Tool for Detecting Positive Selection in Biological Evolution
One of the important questions in biological evolution is to know if certain
changes along protein coding genes have contributed to the adaptation of
species. This problem is known to be biologically complex and computationally
very expensive. It, therefore, requires efficient Grid or cluster solutions to
overcome the computational challenge. We have developed a Grid-enabled tool
(gcodeml) that relies on the PAML (codeml) package to help analyse large
phylogenetic datasets on both Grids and computational clusters. Although we
report on results for gcodeml, our approach is applicable and customisable to
related problems in biology or other scientific domains.Comment: 10 pages, 4 figures. To appear in the HealthGrid 2012 con
An Open Framework for Extensible Multi-Stage Bioinformatics Software
In research labs, there is often a need to customise software at every step
in a given bioinformatics workflow, but traditionally it has been difficult to
obtain both a high degree of customisability and good performance.
Performance-sensitive tools are often highly monolithic, which can make
research difficult. We present a novel set of software development principles
and a bioinformatics framework, Friedrich, which is currently in early
development. Friedrich applications support both early stage experimentation
and late stage batch processing, since they simultaneously allow for good
performance and a high degree of flexibility and customisability. These
benefits are obtained in large part by basing Friedrich on the multiparadigm
programming language Scala. We present a case study in the form of a basic
genome assembler and its extension with new functionality. Our architecture has
the potential to greatly increase the overall productivity of software
developers and researchers in bioinformatics.Comment: 12 pages, 1 figure, to appear in proceedings of PRIB 201
- …