RasBhari: optimizing spaced seeds for database searching, read mapping and alignment-free sequence comparison
Many algorithms for sequence analysis rely on word matching or word
statistics. Often, these approaches can be improved if binary patterns
representing match and don't-care positions are used as a filter, such that
only those positions of words are considered that correspond to the match
positions of the patterns. The performance of these approaches, however,
depends on the underlying patterns. Herein, we show that the overlap complexity
of a pattern set that was introduced by Ilie and Ilie is closely related to the
variance of the number of matches between two evolutionarily related sequences
with respect to this pattern set. We propose a modified hill-climbing algorithm
to optimize pattern sets for database searching, read mapping and
alignment-free sequence comparison of nucleic-acid sequences; our
implementation of this algorithm is called rasbhari. Depending on the
application at hand, rasbhari can either minimize the overlap complexity of
pattern sets, maximize their sensitivity in database searching or minimize the
variance of the number of pattern-based matches in alignment-free sequence
comparison. We show that, for database searching, rasbhari generates pattern
sets with slightly higher sensitivity than existing approaches. In our Spaced
Words approach to alignment-free sequence comparison, pattern sets calculated
with rasbhari led to more accurate estimates of phylogenetic distances than the
randomly generated pattern sets that we previously used. Finally, we used
rasbhari to generate patterns for short read classification with CLARK-S. Here
too, the sensitivity of the results could be improved, compared to the default
patterns of the program. We integrated rasbhari into Spaced Words; the source
code of rasbhari is freely available at http://rasbhari.gobics.de
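
The pattern-based filtering described in this abstract can be illustrated with a short sketch. This is not rasbhari's code and does not perform any pattern optimization; it only shows how a binary pattern of match ('1') and don't-care ('0') positions selects the characters of each word, and how pattern-based matches between two sequences might be counted. The function names `spaced_words` and `count_spaced_matches` and the example sequences are assumptions made for illustration.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Return the spaced words of seq under a binary pattern.

    pattern is a string over '1' (match position) and '0' (don't-care).
    Only characters at match positions are kept for each window.
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    return [
        "".join(seq[start + i] for i in match_positions)
        for start in range(len(seq) - span + 1)
    ]


def count_spaced_matches(seq_a, seq_b, pattern):
    """Count pairs of windows (one per sequence) that agree at all match positions."""
    counts_a = Counter(spaced_words(seq_a, pattern))
    counts_b = Counter(spaced_words(seq_b, pattern))
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    a = "ACGTAC"
    b = "ATGTAC"  # differs from a at one position
    print(count_spaced_matches(a, b, "111"))  # contiguous words
    print(count_spaced_matches(a, b, "101"))  # spaced words
```

In this toy pair, the spaced pattern `101` finds more matching windows than the contiguous pattern `111`, because the single substitution can fall on a don't-care position.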
RNA secondary structure prediction from multi-aligned sequences
It has been well accepted that the RNA secondary structures of most
functional non-coding RNAs (ncRNAs) are closely related to their functions and
are conserved during evolution. Hence, prediction of conserved secondary
structures from evolutionarily related sequences is one important task in RNA
bioinformatics; the methods are useful not only to further functional analyses
of ncRNAs but also to improve the accuracy of secondary structure predictions
and to find novel functional RNAs from the genome. In this review, I focus on
common secondary structure prediction from a given aligned RNA sequence, in
which one secondary structure whose length is equal to that of the input
alignment is predicted. I systematically review and classify existing tools and
algorithms for the problem, by utilizing the information employed in the tools
and by adopting a unified viewpoint based on maximum expected gain (MEG)
estimators. I believe that this classification will allow a deeper
understanding of each tool and provide users with useful information for
selecting tools for common secondary structure predictions.
Comment: A preprint of an invited review manuscript that will be published in
a chapter of the book `Methods in Molecular Biology'. Note that this version
of the manuscript may differ from the published version
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.
Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolution
Spaced seeds improve k-mer-based metagenomic classification
Metagenomics is a powerful approach to study the genetic content of environmental
samples that has been strongly promoted by NGS technologies. To cope with the
massive data involved in modern metagenomic projects, recent tools [4, 39] rely
on the analysis of k-mers shared between the read to be classified and sampled
reference genomes. Within this general framework, we show in this work that
spaced seeds provide a significant improvement of classification accuracy as
opposed to traditional contiguous k-mers. We support this thesis through a
series of different computational experiments, including simulations of
large-scale metagenomic projects. Scripts and programs used in this study, as
well as supplementary material, are available from
http://github.com/gregorykucherov/spaced-seeds-for-metagenomics.
Comment: 23 pages
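
As a rough illustration of the general framework described here, the sketch below assigns a read to the reference genome with which it shares the most spaced k-mers. It is a toy under stated assumptions, not the method or scripts of the study: the names `spaced_kmers` and `classify`, the seed `1101`, and the two tiny "genomes" are all hypothetical, and real tools index the references rather than rescanning them for each read.

```python
def spaced_kmers(seq, seed):
    """Return the set of spaced k-mers of seq for a seed over '1'/'0'.

    Characters under '1' are kept; characters under '0' are ignored.
    """
    keep = [i for i, c in enumerate(seed) if c == "1"]
    return {
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - len(seed) + 1)
    }


def classify(read, references, seed):
    """Return the name of the reference sharing the most spaced k-mers with read.

    references maps a name to a sequence. Returns None if nothing is shared.
    """
    read_kmers = spaced_kmers(read, seed)
    best_name, best_hits = None, 0
    for name, genome in references.items():
        hits = len(read_kmers & spaced_kmers(genome, seed))
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name


if __name__ == "__main__":
    refs = {
        "genomeA": "ACGTACGTTAGC",
        "genomeB": "TTTTGGGGCCCCAAAA",
    }
    # The read differs from the start of genomeA by one substitution.
    print(classify("ACGAACGT", refs, "1101"))
```

Setting the seed to all ones (for example `1111`) turns the same code into a contiguous k-mer classifier, which makes it easy to compare the two settings on the same input.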
Traffic monitoring using image processing : a thesis presented in partial fulfillment of the requirements for the degree of Master of Engineering in Information and Telecommunications Engineering at Massey University, Palmerston North, New Zealand
Traffic monitoring involves the collection of data describing the characteristics of vehicles and their movements. Such data may be used for automatic tolls, congestion and incident detection, law enforcement, and road capacity planning. With recent advances in computer vision technology, videos can be analysed automatically and relevant information extracted for particular applications. Automatic surveillance using video cameras with image processing techniques is becoming a powerful and useful technology for traffic monitoring. In this research project, a video image processing system for traffic monitoring, including vehicle tracking, counting, and classification, is developed; it has the potential to be extended to real-time application. A heuristic approach is applied in developing this system. The system is divided into several parts, and several different functional components have been built and tested using traffic video sequences. Evaluations show that the system is robust and can be developed towards real-time applications
MRI-only based radiotherapy treatment planning for the rat brain on a Small Animal Radiation Research Platform (SARRP)
Computed tomography (CT) is the standard imaging modality in radiation therapy treatment planning (RTP). However, magnetic resonance (MR) imaging provides superior soft tissue contrast, increasing the precision of target volume selection. We present MR-only based RTP for a rat brain on a small animal radiation research platform (SARRP) using probabilistic voxel classification with multiple MR sequences. Six rat heads were imaged, each with one CT and five MR sequences. The MR sequences were: T1-weighted, T2-weighted, zero-echo time (ZTE), and two ultra-short echo time sequences with 20 µs (UTE1) and 2 ms (UTE2) echo times. CT data were manually segmented into air, soft tissue, and bone to obtain the RTP reference. Bias field corrected MR images were automatically segmented into the same tissue classes using a fuzzy c-means segmentation algorithm with multiple images as input. Similarities between segmented CT and automatically segmented MR (ASMR) images were evaluated using the Dice coefficient. Three ASMR images with high similarity index were used for further RTP. Three beam arrangements were investigated. Dose distributions were compared by analysing dose volume histograms. The highest Dice coefficients were obtained for the ZTE-UTE2 combination and for the T1-UTE1-T2 combination when ZTE was unavailable. Both combinations, along with UTE1-UTE2, often used to generate ASMR images, were used for further RTP. Using 1 beam, MR based RTP underestimated the dose to be delivered to the target (range: 1.4%-7.6%). When more complex beam configurations were used, the calculated dose using the ZTE-UTE2 combination was the most accurate, with 0.7% deviation from CT, compared to 0.8% for T1-UTE1-T2 and 1.7% for UTE1-UTE2. The presented MR-only based workflow for RTP on a SARRP enables both accurate organ delineation and dose calculations using multiple MR sequences.
This method can be useful in longitudinal studies where CT's cumulative radiation dose might contribute to the total dose
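
The Dice coefficient used above to compare segmentations can be sketched as follows. For two sets of voxels A and B labelled with the same tissue class, it is 2|A ∩ B| / (|A| + |B|). The function name `dice_coefficient`, the flat label lists, and the toy values are assumptions for illustration; they are not data or code from the study.

```python
def dice_coefficient(labels_a, labels_b, tissue):
    """Dice coefficient for one tissue class between two voxel labelings.

    labels_a and labels_b are equal-length sequences of class labels,
    one per voxel. Returns 1.0 if the class is absent from both.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same voxels")
    in_a = [label == tissue for label in labels_a]
    in_b = [label == tissue for label in labels_b]
    overlap = sum(a and b for a, b in zip(in_a, in_b))
    total = sum(in_a) + sum(in_b)
    return 2 * overlap / total if total else 1.0


if __name__ == "__main__":
    # Hypothetical six-voxel example: reference (CT) vs. automatic (MR) labels.
    ct = ["air", "bone", "bone", "soft", "soft", "soft"]
    mr = ["air", "bone", "soft", "soft", "soft", "bone"]
    for tissue in ("air", "soft", "bone"):
        print(tissue, dice_coefficient(ct, mr, tissue))
```

A value of 1 means the two segmentations agree exactly for that class, and 0 means they share no voxels.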