4,570 research outputs found
Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome
The article presents an application of Hidden Markov Models (HMMs) for
pattern recognition on genome sequences. We apply HMM for identifying genes
encoding the Variant Surface Glycoprotein (VSG) in the genomes of Trypanosoma
brucei (T. brucei) and other African trypanosomes. These are parasitic protozoa
causative agents of sleeping sickness and several diseases in domestic and wild
animals. These parasites have a peculiar strategy to evade the host's immune
system that consists in periodically changing their predominant cellular
surface protein (VSG). The motivation for using patterns recognition methods to
identify these genes, instead of traditional homology based ones, is that the
levels of sequence identity (amino acid and DNA sequence) amongst these genes
is often below of what is considered reliable in these methods. Among pattern
recognition approaches, HMM are particularly suitable to tackle this problem
because they can handle more naturally the determination of gene edges. We
evaluate the performance of the model using different number of states in the
Markov model, as well as several performance metrics. The model is applied
using public genomic data. Our empirical results show that the VSG genes on T.
brucei can be safely identified (high sensitivity and low rate of false
positives) using HMM.Comment: Accepted article in July, 2015 in Pattern Analysis and Applications,
Springer. The article contains 23 pages, 4 figures, 8 tables and 51
reference
Temporal and Spatial Data Mining with Second-Order Hidden Models
In the frame of designing a knowledge discovery system, we have developed
stochastic models based on high-order hidden Markov models. These models are
capable to map sequences of data into a Markov chain in which the transitions
between the states depend on the \texttt{n} previous states according to the
order of the model. We study the process of achieving information extraction
fromspatial and temporal data by means of an unsupervised classification. We
use therefore a French national database related to the land use of a region,
named Teruti, which describes the land use both in the spatial and temporal
domain. Land-use categories (wheat, corn, forest, ...) are logged every year on
each site regularly spaced in the region. They constitute a temporal sequence
of images in which we look for spatial and temporal dependencies. The temporal
segmentation of the data is done by means of a second-order Hidden Markov Model
(\hmmd) that appears to have very good capabilities to locate stationary
segments, as shown in our previous work in speech recognition. Thespatial
classification is performed by defining a fractal scanning ofthe images with
the help of a Hilbert-Peano curve that introduces atotal order on the sites,
preserving the relation ofneighborhood between the sites. We show that the
\hmmd performs aclassification that is meaningful for the agronomists.Spatial
and temporal classification may be achieved simultaneously by means of a 2
levels \hmmd that measures the \aposteriori probability to map a temporal
sequence of images onto a set of hidden classes
Discrimination of Individual Tigers (\u3cem\u3ePanthera tigris\u3c/em\u3e) from Long Distance Roars
This paper investigates the extent of tiger (Panthera tigris) vocal individuality through both qualitative and quantitative approaches using long distance roars from six individual tigers at Omaha\u27s Henry Doorly Zoo in Omaha, NE. The framework for comparison across individuals includes statistical and discriminant function analysis across whole vocalization measures and statistical pattern classification using a hidden Markov model (HMM) with frame-based spectral features comprised of Greenwood frequency cepstral coefficients. Individual discrimination accuracy is evaluated as a function of spectral model complexity, represented by the number of mixtures in the underlying Gaussian mixture model (GMM), and temporal model complexity, represented by the number of sequential states in the HMM. Results indicate that the temporal pattern of the vocalization is the most significant factor in accurate discrimination. Overall baseline discrimination accuracy for this data set is about 70% using high level features without complex spectral or temporal models. Accuracy increases to about 80% when more complex spectral models (multiple mixture GMMs) are incorporated, and increases to a final accuracy of 90% when more detailed temporal models (10-state HMMs) are used. Classification accuracy is stable across a relatively wide range of configurations in terms of spectral and temporal model resolution
Improving speech recognition by revising gated recurrent units
Speech recognition is largely taking advantage of deep learning, showing that
substantial benefits can be obtained by modern Recurrent Neural Networks
(RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which
typically reach state-of-the-art performance in many tasks thanks to their
ability to learn long-term dependencies and robustness to vanishing gradients.
Nevertheless, LSTMs have a rather complex design with three multiplicative
gates, that might impair their efficient implementation. An attempt to simplify
LSTMs has recently led to Gated Recurrent Units (GRUs), which are based on just
two multiplicative gates.
This paper builds on these efforts by further revising GRUs and proposing a
simplified architecture potentially more suitable for speech recognition. The
contribution of this work is two-fold. First, we suggest to remove the reset
gate in the GRU design, resulting in a more efficient single-gate architecture.
Second, we propose to replace tanh with ReLU activations in the state update
equations. Results show that, in our implementation, the revised architecture
reduces the per-epoch training time with more than 30% and consistently
improves recognition performance across different tasks, input features, and
noisy conditions when compared to a standard GRU
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
smoothly communicate with human users in the long term, requires an
understanding of the dynamics of symbol systems and is crucially important. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and a
double articulation analysis, that enable a robot to obtain words and their
embodied meanings from raw sensory--motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER.Comment: submitted to Advanced Robotic
Concept discovery innovations in law enforcement: a perspective.
In the past decades, the amount of information available to law enforcement agencies has increased significantly. Most of this information is in textual form, however analyses have mainly focused on the structured data. In this paper, we give an overview of the concept discovery projects at the Amsterdam-Amstelland police where Formal Concept Analysis (FCA) is being used as text mining instrument. FCA is combined with statistical techniques such as Hidden Markov Models (HMM) and Emergent Self Organizing Maps (ESOM). The combination of this concept discovery and refinement technique with statistical techniques for analyzing high-dimensional data not only resulted in new insights but often in actual improvements of the investigation procedures.Formal concept analysis; Intelligence led policing; Knowledge discovery;
- âŠ