Ten Quick Tips for Using a Raspberry Pi
Much of biology (and, indeed, all of science) is becoming increasingly
computational. We tend to think of this in terms of algorithmic approaches
and software tools, as well as increased computing power. There has also been a
shift towards slicker, packaged solutions--which mirrors everyday life, from
smart phones to smart homes. As a result, it's all too easy to be detached from
the fundamental elements that power these changes, and to see solutions as
"black boxes". The major goal of this piece is to use the example of the
Raspberry Pi--a small, general-purpose computer--as the central component in a
highly developed ecosystem that brings together elements like external
hardware, sensors and controllers, state-of-the-art programming practices, and
basic electronics and physics, all in an approachable and useful way. External
devices and inputs are easily connected to the Pi, and it can, in turn, control
attached devices very simply. So whether you want to use it to manage
laboratory equipment, sample the environment, teach bioinformatics, control
your home security or make a model lunar lander, it's all built from the same
basic principles. To quote Richard Feynman, "What I cannot create, I do not
understand".
Comment: 12 pages, 2 figures
Nine quick tips for efficient bioinformatics curriculum development and training.
Biomedical research is becoming increasingly data driven. New technologies that generate large-scale, complex data are continually emerging and evolving. As a result, there is a concurrent need to train researchers to use and understand new computational tools. Here we describe an efficient and effective approach to developing curriculum materials that can be deployed in a research environment to meet this need.
Structural Property Prediction
While many good textbooks are available on Protein Structure, Molecular
Simulations, Thermodynamics and Bioinformatics methods in general, there is no
good introductory level book for the field of Structural Bioinformatics. This
book aims to give an introduction to Structural Bioinformatics, which is
where the previous topics meet to explore three dimensional protein structures
through computational analysis. We provide an overview of existing
computational techniques, to validate, simulate, predict and analyse protein
structures. More importantly, it will aim to provide practical knowledge about
how and when to use such techniques. We will consider proteins from three major
vantage points: Protein structure quantification, Protein structure prediction,
and Protein simulation & dynamics.
Some structural properties of proteins that are closely linked to their
function may be easier (or much faster) to predict from sequence than the
complete tertiary structure; for example, secondary structure, surface
accessibility, flexibility, disorder, interface regions or hydrophobic patches.
Serving as building blocks for the native protein fold, these structural
properties also contain important structural and functional information not
apparent from the amino acid sequence. Here, we will first give an introduction
to the application of machine learning for structural property prediction,
and explain the concepts of cross-validation and benchmarking. Next, we will
review various methods that incorporate knowledge of these concepts to predict
those structural properties, such as secondary structure, surface
accessibility, disorder and flexibility, and aggregation.
Comment: editorial responsibility: Juami H. M. van Gils, K. Anton Feenstra,
Sanne Abeln. This chapter is part of the book "Introduction to Protein
Structural Bioinformatics". The Preface arXiv:1801.09442 contains links to
all the (published) chapters
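The cross-validation concept mentioned in the abstract above can be illustrated with a minimal sketch. It is not taken from the chapter: the function name `k_fold_indices` and the plain contiguous split are assumptions made for illustration. Real benchmarks for sequence-based predictors typically also make sure that closely related sequences do not end up in both the training and the test set, which this sketch does not attempt.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: samples are split into k contiguous folds whose
    sizes differ by at most one; each fold serves as the test set once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(10, 3)):
        print(f"fold {fold}: train={train} test={test}")
```

Each sample appears in exactly one test fold, so a performance estimate averaged over the folds uses every sample for testing once while never training on the sample being tested.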
A Multiple Classifier System Identifies Novel Cannabinoid CB2 Receptor Ligands
Drugs have become an essential part of our lives due to their ability to improve people’s
health and quality of life. However, for many diseases, approved drugs are not yet available
or existing drugs have undesirable side effects, making the pharmaceutical industry strive to
discover new drugs and active compounds. The development of drugs is an expensive
process, which typically starts with the detection of candidate molecules (screening) for an
identified protein target. To this end, the use of high-performance screening techniques has
become critical to mitigating the high costs. Therefore, the popularity of
computer-based screening (often called virtual screening or in-silico screening) has rapidly
increased during the last decade. A wide variety of Machine Learning (ML) techniques has
been used in conjunction with chemical structure and physicochemical properties for
screening purposes including (i) simple classifiers, (ii) ensemble methods, and more recently
(iii) Multiple Classifier Systems (MCS). In this work, we apply an MCS for virtual screening
(D2-MCS) using circular fingerprints. We applied our technique to a dataset of cannabinoid
CB2 ligands obtained from the ChEMBL database. The Enamine HTS collection
(1,834,362 compounds) was virtually screened to identify 48,432 potentially active molecules
using D2-MCS. This list was subsequently clustered based on circular fingerprints and from
each cluster, the most active compound was retained. From these, the top 60 were kept,
and 21 novel compounds were purchased. Experimental validation confirmed six highly
active hits (>50% displacement at 10 μM and subsequent Ki determination) and an
additional five medium active hits (>25% displacement at 10 μM). D2-MCS hence provided a
hit rate of 29% for highly active compounds and an overall hit rate of 52%.
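The reported hit rates follow directly from the counts given in the abstract (21 purchased compounds, six highly active hits, five additional medium active hits). The short sketch below reproduces that arithmetic; the helper name `hit_rate` is an assumption introduced here for illustration and is not part of D2-MCS.

```python
def hit_rate(hits: int, tested: int) -> float:
    """Return the percentage of tested compounds that were confirmed as hits."""
    if tested <= 0:
        raise ValueError("tested must be a positive number of compounds")
    return 100.0 * hits / tested


# Counts taken from the abstract above.
purchased = 21
highly_active = 6
medium_active = 5

high_rate = hit_rate(highly_active, purchased)
overall_rate = hit_rate(highly_active + medium_active, purchased)

print(f"highly active hit rate: {high_rate:.0f}%")
print(f"overall hit rate: {overall_rate:.0f}%")
```

Rounded to whole percentages, 6/21 gives 29% and 11/21 gives 52%, matching the figures quoted in the abstract.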
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences.
Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biology
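For readers unfamiliar with the Hamming distance mentioned in the abstract above, here is a minimal Python sketch. It is not the primer's own code: the function name, the case-insensitive comparison, and the example sequences are assumptions chosen for illustration, and no graphical interface is included.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance is only defined for equal-length sequences")
    # Compare case-insensitively so that 'a' and 'A' count as the same base.
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


if __name__ == "__main__":
    # The two sequences differ at the third and sixth positions.
    print(hamming_distance("GATTACA", "GACTATA"))  # prints 2
```

Raising an error on unequal lengths, rather than silently truncating to the shorter sequence, avoids reporting a misleadingly small distance.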
Finding functional motifs in protein sequences with deep learning and natural language models
Recently, the prediction of structural and functional motifs in protein sequences has taken advantage of powerful machine learning based approaches. Protein encoding now adopts protein language models, which surpass standard procedures. Different combinations of machine learning methods and encoding schemes are available for predicting different structural and functional motifs. Particularly interesting is the adoption of protein language models to encode proteins in addition to evolutionary information and physicochemical parameters. A thorough analysis of recent predictors developed for annotating transmembrane regions, sorting signals, and lipidation and phosphorylation sites allows us to investigate the state of the art, focusing on the relevance of protein language models for the different tasks. This analysis highlights that more experimental data are necessary to exploit the powerful machine learning methods now available.