An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences.
Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biolog
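As a rough illustration of the Hamming distance mentioned in the abstract above, the following is a minimal sketch in plain Python. The function name `hamming_distance` and the example sequences are assumptions made for illustration; they are not taken from the primer, and the primer's own version is described as sitting behind a graphical user interface, which is omitted here.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ.

    Hypothetical helper for illustration; comparison is case-insensitive.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance requires sequences of equal length")
    # zip pairs up characters position by position; True counts as 1 in sum().
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


# Illustrative DNA strings that differ at positions 3 and 6 (1-based).
example_distance = hamming_distance("GATTACA", "GACTATA")
print(example_distance)  # prints 2
```

Raising an error on unequal lengths is a design choice in this sketch: the Hamming distance is only defined for sequences of the same length, so silently truncating would hide input mistakes.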
Asynchronous data retrieval from an object-oriented database
We present an object-oriented semantic database model which, like other object-oriented systems, combines the virtues of four concepts: the functional data model, a property inheritance hierarchy, abstract data types and message-driven computation. The main emphasis is on the last of these four concepts. We describe generic procedures that permit queries to be processed in a purely message-driven manner. A database is represented as a network of nodes and directed arcs, in which each node is a logical processing element, capable of communicating with other nodes by exchanging messages. This eliminates the need for shared memory and for centralized control during query processing. Hence, the model is suitable for implementation on a multiprocessor computer architecture, consisting of large numbers of loosely coupled processing elements.
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled)(directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give a historical overview of their main developments, and study the main current
systems that implement them.
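To make the idea of data modeled as a labeled, directed graph and queried by graph-oriented operations more concrete, here is a minimal in-memory sketch. The class `LabeledGraph`, its methods, and the example node and label names are all hypothetical and chosen for illustration; they do not correspond to any system surveyed in the article.

```python
from collections import defaultdict, deque


class LabeledGraph:
    """Toy labeled, directed graph (illustrative only, not a real database)."""

    def __init__(self):
        # Maps each source node to a list of (edge_label, target_node) pairs.
        self._out = defaultdict(list)

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def neighbors(self, node, label=None):
        """Targets of edges leaving `node`, optionally filtered by edge label."""
        return [t for (lbl, t) in self._out.get(node, [])
                if label is None or lbl == label]

    def reachable(self, start, label=None):
        """Breadth-first traversal: nodes reachable from `start`, in visit order."""
        seen = {start}
        queue = deque([start])
        order = []
        while queue:
            node = queue.popleft()
            for nxt in self.neighbors(node, label):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


graph = LabeledGraph()
graph.add_edge("alice", "knows", "bob")
graph.add_edge("bob", "knows", "carol")
graph.add_edge("alice", "works_at", "acme")

print(graph.neighbors("alice"))           # ['bob', 'acme']
print(graph.reachable("alice", "knows"))  # ['bob', 'carol']
```

The reachability query is the kind of operation that is natural to express over a graph model: it follows edges of a given label to arbitrary depth without the caller spelling out each hop.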
GRASS: Generative Recursive Autoencoders for Shape Structures
We introduce a novel neural network architecture for encoding and synthesis
of 3D shapes, particularly their structures. Our key insight is that 3D shapes
are effectively characterized by their hierarchical organization of parts,
which reflects fundamental intra-shape relationships such as adjacency and
symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a
flat, unlabeled, arbitrary part layout to a compact code. The code effectively
captures hierarchical structures of man-made 3D objects of varying structural
complexities despite being fixed-dimensional: an associated decoder maps a code
back to a full hierarchy. The learned bidirectional mapping is further tuned
using an adversarial setup to yield a generative model of plausible structures,
from which novel structures can be sampled. Finally, our structure synthesis
framework is augmented by a second trained module that produces fine-grained
part geometry, conditioned on global and local structural context, leading to a
full generative pipeline for 3D shapes. We demonstrate that without
supervision, our network learns meaningful structural hierarchies adhering to
perceptual grouping principles, produces compact codes which enable
applications such as shape classification and partial matching, and supports
shape synthesis and interpolation with significant variations in topology and
geometry.
Comment: Corresponding author: Kai Xu
Four small puzzles that Rosetta doesn't solve
A complete macromolecule modeling package must be able to solve the simplest
structure prediction problems. Despite recent successes in high resolution
structure modeling and design, the Rosetta software suite fares poorly on
deceptively small protein and RNA puzzles, some as small as four residues. To
illustrate these problems, this manuscript presents extensive Rosetta results
for four well-defined test cases: the 20-residue mini-protein Trp cage, an even
smaller disulfide-stabilized conotoxin, the reactive loop of a serine protease
inhibitor, and a UUCG RNA tetraloop. In contrast to previous Rosetta studies,
several lines of evidence indicate that conformational sampling is not the
major bottleneck in modeling these small systems. Instead, approximations and
omissions in the Rosetta all-atom energy function currently preclude
discriminating experimentally observed conformations from de novo models at
atomic resolution. These molecular "puzzles" should serve as useful model
systems for developers wishing to make foundational improvements to this
powerful modeling suite.
Comment: Published in PLoS One as a manuscript for the RosettaCon 2010 Special
Collectio
Achieving Extreme Resolution in Numerical Cosmology Using Adaptive Mesh Refinement: Resolving Primordial Star Formation
As an entry for the 2001 Gordon Bell Award in the "special" category, we
describe our 3-d, hybrid, adaptive mesh refinement (AMR) code, Enzo, designed
for high-resolution, multiphysics, cosmological structure formation
simulations. Our parallel implementation places no limit on the depth or
complexity of the adaptive grid hierarchy, allowing us to achieve unprecedented
spatial and temporal dynamic range. We report on a simulation of primordial
star formation which develops over 8000 subgrids at 34 levels of refinement to
achieve a local refinement of a factor of 10^12 in space and time. This allows
us to resolve the properties of the first stars which form in the universe
assuming standard physics and a standard cosmological model. Achieving extreme
resolution requires the use of 128-bit extended precision arithmetic (EPA) to
accurately specify the subgrid positions. We describe our EPA AMR
implementation on the IBM SP2 Blue Horizon system at the San Diego
Supercomputer Center.
Comment: 23 pages, 5 figures. Peer reviewed technical paper accepted to the
proceedings of Supercomputing 2001. This entry was a Gordon Bell Prize
finalist. For more information visit http://www.TomAbel.com/GB
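A back-of-envelope calculation suggests why ordinary double precision becomes tight at a refinement factor of 10^12. The sketch below is not Enzo's implementation; it assumes, purely for illustration, that positions are stored as fractions of a unit-length domain and that the finest cell is 10^-12 of that length. The variable names are hypothetical.

```python
import sys
from fractions import Fraction

# Assumption for illustration: finest cell width relative to a unit-length domain.
REFINEMENT_FACTOR = 10**12
cell_width = Fraction(1, REFINEMENT_FACTOR)

# Gap between 1.0 and the next representable 64-bit double (2**-52).
double_spacing = Fraction(sys.float_info.epsilon)

# Number of distinct double-precision positions that fit inside one finest
# cell located near coordinate 1.0.
positions_per_cell = float(cell_width / double_spacing)
print(positions_per_cell)  # roughly 4.5e3

# An offset smaller than half the spacing is lost entirely when added to 1.0.
offset_is_lost = (1.0 + 1e-17 == 1.0)
print(offset_is_lost)  # True
```

Under these assumptions only a few thousand distinct positions are available per finest cell, which leaves little headroom for placing subgrid edges and sub-cell offsets exactly; this is consistent with the abstract's statement that extended precision is needed to specify subgrid positions accurately.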
Automated Identification and Classification of Stereochemistry: Chirality and Double Bond Stereoisomerism
Stereoisomers have the same molecular formula and the same atom connectivity
but differ in the three-dimensional arrangement of their atoms.
Stereoisomerism is of great importance in many
different fields since the molecular properties and biological effects of the
stereoisomers are often significantly different. Most drugs, for example, are
composed of a single stereoisomer of a compound: while one stereoisomer
may have therapeutic effects on the body, another may be toxic. A challenging
task is the automatic detection of stereoisomers using line input
specifications such as SMILES or InChI since it requires information about
group theory (to distinguish stereoisomers using mathematical information about
their symmetry), as well as the topology and geometry of the molecule. There are several
software packages that include modules to handle stereochemistry, especially
the ones to name a chemical structure and/or view, edit and generate chemical
structure diagrams. However, there is a lack of software capable of
automatically analyzing a molecule represented as a graph and generating a
classification of the type of isomerism present in a given atom or bond.
Considering the importance of stereoisomerism when comparing chemical
structures, this report describes a computer program for analyzing and
processing steric information contained in a chemical structure represented as
a molecular graph and providing as output a binary classification of the isomer
type based on the recommended conventions. Due to the complexity of the
underlying issue, specification of stereochemical information is currently
limited to explicit stereochemistry and to the two most common types of
stereochemistry caused by asymmetry around carbon atoms: chiral atom and double
bond. A Webtool to automatically identify and classify stereochemistry is
available at http://nams.lasige.di.fc.ul.pt/tools.ph