4,593 research outputs found
FAST: FAST Analysis of Sequences Toolbox.
FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences.Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biolog
A common distributed language approach to software integration
An important objective in software integration is the development of techniques to allow programs written in different languages to function together. Several approaches are discussed toward achieving this objective and the Common Distributed Language Approach is presented as the approach of choice
Dynamically typed languages
Dynamically typed languages such as Python and Ruby have experienced a rapid grown in popularity in recent times. However, there is much confusion as to what makes these languages interesting relative to statically typed languages, and little knowledge of their rich history. In this chapter I explore the general topic of dynamically typed languages, how they differ from statically typed languages, their history, and their defining features
SICLE: A high-throughput tool for extracting evolutionary relationships from phylogenetic trees
We present the phylogeny analysis software SICLE (Sister Clade Extractor), an
easy-to-use, high- throughput tool to describe the nearest neighbors to a node
of interest in a phylogenetic tree as well as the support value for the
relationship. The application is a command line utility that can be embedded
into a phylogenetic analysis pipeline or can be used as a subroutine within
another C++ program. As a test case, we applied this new tool to the published
phylome of Salinibacter ruber, a species of halophilic Bacteriodetes,
identifying 13 unique sister relationships to S. ruber across the 4589 gene
phylogenies. S. ruber grouped with bacteria, most often other Bacteriodetes, in
the majority of phylogenies, but 91 phylogenies showed a branch-supported
sister association between S. ruber and Archaea, an evolutionarily intriguing
relationship indicative of horizontal gene transfer. This test case
demonstrates how SICLE makes it possible to summarize the phylogenetic
information produced by automated phylogenetic pipelines to rapidly identify
and quantify the possible evolutionary relationships that merit further
investigation. SICLE is available for free for noncommercial use at
http://eebweb.arizona.edu/sicle/.Comment: 8 pages, 4 figures in journal submission forma
Facilitating modular property-preserving extensions of programming languages
We will explore an approach to modular programming language descriptions and extensions in a denotational style.
Based on a language core, language features are added stepwise on the core. Language features can be described
separated from each other in a self-contained, orthogonal way. We present an extension semantics framework consisting
of mechanisms to adapt semantics of a basic language to new structural requirements in an extended language
preserving the behaviour of programs of the basic language. Common templates of extension are provided. These
can be collected in extension libraries accessible to and extendible by language designers. Mechanisms to extend
these libraries are provided. A notation for describing language features embedding these semantics extensions is
presented
NASA's supercomputing experience
A brief overview of NASA's recent experience in supercomputing is presented from two perspectives: early systems development and advanced supercomputing applications. NASA's role in supercomputing systems development is illustrated by discussion of activities carried out by the Numerical Aerodynamical Simulation Program. Current capabilities in advanced technology applications are illustrated with examples in turbulence physics, aerodynamics, aerothermodynamics, chemistry, and structural mechanics. Capabilities in science applications are illustrated by examples in astrophysics and atmospheric modeling. Future directions and NASA's new High Performance Computing Program are briefly discussed
- ā¦