1,016 research outputs found
A Generic Framework for Hidden Markov Models on Biomedical Data
Background: Biomedical data are usually collections of longitudinal data
assessed at certain points in time. Clinical observations assess the presence
and severity of symptoms, which are the basis for the description and modeling
of disease progression. Deciphering potential underlying unknowns solely from
the distinct observations would substantially improve the understanding of
pathological cascades. Hidden Markov Models (HMMs) have been successfully
applied to the processing of possibly noisy continuous signals. Our aim was to
improve the application of HMMs to multivariate time series of categorically
distributed data. Here, we used HMMs to investigate the prediction of the loss
of the ability to walk freely, a major clinical deterioration in the most
common autosomal-dominantly inherited ataxia disorder worldwide.
Results: We present a prediction pipeline that processes data paired with a
configuration file, enabling the construction, validation and querying of a
fully parameterized HMM-based model. In particular, we provide a theoretical
and practical framework for multivariate time-series inference based on HMMs
that includes constructing multiple HMMs, each predicting a particular
observable variable. Our analysis is performed both on random data and on
biomedical data from Spinocerebellar ataxia type 3 disease.
Conclusions: HMMs are a promising approach to study biomedical data that are
naturally represented as multivariate time series. Our implementation of an
HMM framework is publicly available and can easily be adapted for further
applications.
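As a hedged illustration of the kind of per-variable model such a framework constructs, the sketch below implements the standard forward algorithm for an HMM with categorical emissions. The two "disease stages", three symptom-severity categories, and all probabilities are invented for the example; this is not the authors' published code.

```python
# Minimal forward algorithm for an HMM with categorical emissions.
# All parameter values below are illustrative, not estimated from data.

def forward(obs, pi, A, B):
    """Return P(obs) under an HMM.

    pi[i]   : initial probability of hidden state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability that state i emits symbol k
    """
    alpha = [pi[i] * B[i][obs[0]] for i in range(len(pi))]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(len(pi))) * B[j][o]
            for j in range(len(pi))
        ]
    return sum(alpha)

# Two hypothetical hidden disease stages, three symptom severities (0, 1, 2).
pi = [0.8, 0.2]
A = [[0.9, 0.1], [0.0, 1.0]]           # stage 2 is absorbing
B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]

p = forward([0, 1, 2], pi, A, B)       # likelihood of a worsening trajectory
```

In a multivariate setting, one such HMM would be built per observable variable, and querying the model amounts to evaluating (or decoding) sequences like the one above.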
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with changing speakers and speaking conditions in real operational settings for high-performance speech recognition, such paradigms incorporate a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine the prior knowledge in an existing collection of general models with a new set of condition-specific adaptation data. In this paper, the mathematical framework for Bayesian adaptation of acoustic and language model parameters is first described. Maximum a posteriori point estimation is then developed for hidden Markov models and for a number of useful parameter densities commonly used in automatic speech recognition and natural language processing.
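A minimal numeric sketch of the MAP point estimation idea: with a conjugate prior, the adapted estimate interpolates between the prior (speaker-independent) parameter and the sample statistics of the adaptation data. The paper's actual priors (for HMM parameters) are richer; this toy uses a single Gaussian mean, and the prior weight `tau` and data values are invented.

```python
# MAP estimate of a Gaussian mean with a conjugate prior:
# a few adaptation samples nudge the prior; many samples dominate it.

def map_mean(prior_mean, tau, data):
    """(tau * prior_mean + n * sample_mean) / (tau + n)."""
    n = len(data)
    sample_mean = sum(data) / n
    return (tau * prior_mean + n * sample_mean) / (tau + n)

# Speaker-independent prior mean 0.0; four speaker-specific observations.
adapted = map_mean(prior_mean=0.0, tau=10.0, data=[1.0, 1.2, 0.8, 1.0])
```

With only four adaptation frames the estimate stays close to the prior; as the amount of condition-specific data grows, it converges to the maximum-likelihood sample mean, which is exactly the behaviour that makes MAP adaptation attractive for sparse adaptation data.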
Using Hidden Markov Models to Segment and Classify Wrist Motions Related to Eating Activities
Advances in body sensing and mobile health technology have created new opportunities for empowering people to take a more active role in managing their health. Measurements of dietary intake are commonly used for the study and treatment of obesity. However, the most widely used tools rely upon self-report and require considerable manual effort, leading to underreporting of consumption, non-compliance, and discontinued use over the long term. We are investigating the use of wrist-worn accelerometers and gyroscopes to automatically recognize eating gestures. In order to improve recognition accuracy, we studied the sequential dependency of actions during eating. In chapter 2 we first undertook the task of finding a set of wrist motion gestures which were small and descriptive enough to model the actions performed by an eater during consumption of a meal. We found a set of four actions: rest, utensiling, bite, and drink; any alternative gesture is referred to as the "other" gesture. The stability of the definitions for gestures was evaluated using an inter-rater reliability test. Later, in chapter 3, 25 meals were hand labeled and used to study the existence of sequential dependence of the gestures. To study this, three types of classifiers were built: 1) a K-nearest neighbor classifier which uses no sequential context, 2) a hidden Markov model (HMM) which captures the sequential context of sub-gesture motions, and 3) HMMs that model inter-gesture sequential dependencies. We built first-order to sixth-order HMMs to evaluate the usefulness of increasing amounts of sequential dependence to aid recognition. The first two were our baseline algorithms. We found that adding knowledge of the sequential dependence of gestures achieved an accuracy of 96.5%, an improvement of 20.7% and 12.2% over the KNN and the sub-gesture HMM, respectively.
Lastly, in chapter 4, we automatically segmented a continuous wrist motion signal and assessed the classification performance of each of the three classifiers on it. Again, knowledge of sequential dependence enhanced the recognition of gestures in unsegmented data, achieving 90% accuracy and improving by 30.1% and 18.9% over the KNN and the sub-gesture HMM, respectively.
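A toy sketch of how a first-order gesture HMM exploits sequential dependence: Viterbi decoding picks the gesture sequence that best trades off per-segment classifier scores against gesture-to-gesture transition probabilities. All numbers below are invented, not the thesis data; the point is that an ambiguous middle segment is resolved by its context.

```python
# Viterbi decoding over gesture labels; transition probabilities encode
# inter-gesture sequential dependence (illustrative values only).

GESTURES = ["rest", "utensiling", "bite", "drink"]

def viterbi(scores, trans, init):
    """scores[t][s]: classifier likelihood of gesture s at segment t."""
    n = len(init)
    best = [init[s] * scores[0][s] for s in range(n)]
    back = []
    for t in range(1, len(scores)):
        ptrs, new = [], []
        for s in range(n):
            prev, p = max(
                ((i, best[i] * trans[i][s]) for i in range(n)),
                key=lambda x: x[1],
            )
            new.append(p * scores[t][s])
            ptrs.append(prev)
        best, back = new, back + [ptrs]
    s = max(range(n), key=lambda i: best[i])   # trace back the best path
    path = [s]
    for ptrs in reversed(back):
        s = ptrs[s]
        path.append(s)
    return [GESTURES[i] for i in reversed(path)]

init = [0.25] * 4
trans = [
    [0.7, 0.1, 0.1, 0.1],   # from rest
    [0.1, 0.2, 0.6, 0.1],   # from utensiling: a bite usually follows
    [0.5, 0.2, 0.2, 0.1],   # from bite
    [0.6, 0.1, 0.1, 0.2],   # from drink
]
scores = [
    [0.05, 0.80, 0.10, 0.05],  # clearly utensiling
    [0.05, 0.05, 0.45, 0.45],  # ambiguous: bite or drink?
    [0.80, 0.05, 0.10, 0.05],  # clearly rest
]
path = viterbi(scores, trans, init)
```

Because "utensiling then bite" is far more probable than "utensiling then drink" under the transition model, the ambiguous second segment is decoded as a bite.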
A motion-based approach for audio-visual automatic speech recognition
The research work presented in this thesis introduces novel approaches for both visual
region of interest extraction and visual feature extraction for use in audio-visual
automatic speech recognition. In particular, the speaker's movement that occurs
during speech is used to isolate the mouth region in video sequences, and
motion-based features obtained from this region provide new visual features for
audio-visual automatic speech recognition. The mouth region extraction approach
proposed in this work is shown to give superior performance compared with existing
colour-based lip segmentation methods. The new features are obtained from three
separate representations of motion in the region of interest, namely the difference in
luminance between successive images, block matching based motion vectors and
optical flow. The new visual features are found to improve visual-only and
audio-visual speech recognition performance when compared with the commonly
used appearance-feature-based methods.
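The simplest of the three motion representations mentioned above can be sketched as follows: the per-pixel luminance difference between successive frames. The tiny grayscale grids here are made-up stand-ins for the extracted mouth region of real video frames.

```python
# Frame differencing: per-pixel absolute luminance change between
# successive frames, plus a crude scalar motion measure.

def frame_difference(prev, curr):
    """Per-pixel absolute luminance difference between two frames."""
    return [
        [abs(c - p) for p, c in zip(prow, crow)]
        for prow, crow in zip(prev, curr)
    ]

def motion_energy(diff):
    """Mean absolute difference: a scalar summary of motion."""
    values = [v for row in diff for v in row]
    return sum(values) / len(values)

# Two invented 2x2 "mouth region" frames.
still = [[10, 10], [10, 10]]
moved = [[10, 30], [10, 50]]

d = frame_difference(still, moved)   # [[0, 20], [0, 40]]
e = motion_energy(d)                 # 15.0
```

Block-matching motion vectors and optical flow refine this idea by estimating where pixels moved rather than merely that they changed.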
In addition, a novel approach is proposed for visual feature extraction from either the
discrete cosine transform or discrete wavelet transform representations of the mouth
region of the speaker. In this work, the image transform is explored from a new
viewpoint of data discrimination; in contrast to the more conventional data
preservation viewpoint. The main findings of this work are that audio-visual
automatic speech recognition systems using the new features extracted from the
frequency bands selected according to their discriminatory abilities generally
outperform those using features designed for data preservation.
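One standard way to rank frequency bands by discriminatory ability, as opposed to energy preservation, is a per-coefficient Fisher ratio. The sketch below uses that measure on invented two-class "coefficient" vectors; it is an illustration of the selection principle, not the thesis's exact criterion or real DCT/DWT outputs.

```python
# Rank transform coefficients by a Fisher ratio
# (squared mean gap over summed within-class variance) and keep the top k.

def fisher_ratio(class_a, class_b):
    """Per-coefficient (mean gap)^2 / summed within-class variance."""
    def stats(rows):
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        var = [
            sum((row[j] - means[j]) ** 2 for row in rows) / n
            for j in range(len(means))
        ]
        return means, var

    ma, va = stats(class_a)
    mb, vb = stats(class_b)
    return [
        (ma[j] - mb[j]) ** 2 / (va[j] + vb[j] + 1e-12)
        for j in range(len(ma))
    ]

def select_bands(class_a, class_b, k):
    """Indices of the k most discriminative coefficients."""
    r = fisher_ratio(class_a, class_b)
    return sorted(range(len(r)), key=lambda j: r[j], reverse=True)[:k]

# Coefficient 1 separates the two classes; coefficients 0 and 2 do not.
a = [[5.0, 1.0, 3.0], [5.1, 1.2, 2.9]]
b = [[5.0, 9.0, 3.1], [4.9, 9.1, 3.0]]
```

An energy-preservation criterion would instead keep the coefficients with the largest magnitudes (here coefficient 0), which is precisely the contrast the work above draws.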
To establish the noise robustness of the new features proposed in this work,
their performance has been studied in the presence of a range of different
types of noise and at various signal-to-noise ratios. In these experiments, the
audio-visual automatic speech recognition systems based on the new approaches
were found to give superior performance both to audio-visual systems using
appearance-based features and to audio-only speech recognition systems.
Towards gestural understanding for intelligent robots
Fritsch JN. Towards gestural understanding for intelligent robots. Bielefeld: Universität Bielefeld; 2012. A strong driving force of scientific progress in the technical sciences is the quest for systems that assist humans in their daily life and make their life easier and more enjoyable. Nowadays, smartphones are probably the most typical instances of such systems. Another class of systems that is receiving increasing attention is intelligent robots. Instead of offering a smartphone touch screen to select actions, these systems are intended to offer a more natural human-machine interface to their users. Out of the large range of actions performed by humans, gestures performed with the hands play a very important role, especially when humans interact with their direct surroundings, e.g., pointing to an object or manipulating it. Consequently, a robot has to understand such gestures to offer an intuitive interface. Gestural understanding is, therefore, a key capability on the way to intelligent robots.
This book deals with vision-based approaches for gestural understanding. Over the past two decades, this has been an intensive field of research which has resulted in a variety of algorithms to analyze human hand motions. Following a categorization of different gesture types and a review of other sensing techniques, the design of vision systems that achieve hand gesture understanding for intelligent robots is analyzed. For each of the individual algorithmic steps – hand detection, hand tracking, and trajectory-based gesture recognition – a separate chapter introduces common techniques and algorithms and provides example methods. The resulting recognition algorithms consider gestures in isolation and are often not sufficient for interacting with a robot, which can only understand such gestures when incorporating context, e.g., what object was pointed at or manipulated.
Going beyond purely trajectory-based gesture recognition by incorporating context is an important prerequisite for gesture understanding and is addressed explicitly in a separate chapter of this book. Two types of context, user-provided context and situational context, are distinguished, and existing approaches to incorporating context for gestural understanding are reviewed. Example approaches for both context types provide a deeper algorithmic insight into this field of research. An overview of recent robots capable of gesture recognition and understanding summarizes the currently realized human-robot interaction quality.
The approaches for gesture understanding covered in this book are manually designed, while humans learn to recognize gestures automatically while growing up. The last chapter of the book highlights promising research targeted at analyzing developmental learning in children in order to mimic this capability in technical systems, as this research direction may be highly influential for creating future gesture understanding systems.
Markov modelling on human activity recognition
Human Activity Recognition (HAR) is a research topic of considerable interest
in the machine learning community. Understanding the activities that a person
is performing, and the context where they perform them, is hugely important
in multiple applications, including medical research, security and patient
monitoring. Improvements in smartphone and inertial sensor technologies have
led to the implementation of activity recognition systems based on these
devices, either by themselves or by combining their information with other
sensors. Since humans perform their daily activities sequentially in a specific
order, there is temporal information in the physical activities that
characterizes the different human behaviour patterns. However, the most popular
approach in HAR is to assume that the data are conditionally independent,
segmenting the data into different windows and extracting the most relevant
features from each segment.
In this thesis we employ the temporal information explicitly, where the raw data
provided by the wearable sensors is fed to the training models. Thus, we study
how to perform a Markov modelling implementation of a long-term monitoring
HAR system with wearable sensors, and we address the existing open problems
arising while processing and training the data, combining different sensors and
performing the long-term monitoring with battery powered devices.
Employing the signals from the sensors directly to perform the recognition can
lead to problems due to misplacement of the sensors on the body. We propose an
orientation correction algorithm based on quaternions to process the signals
and find a common reference frame for all of them, independently of the
position of the sensors or their orientation. This algorithm allows for better
activity recognition when its output is fed to the classification algorithm,
compared with similar approaches, and the quaternion transformations allow for
a faster implementation.
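The core operation behind such orientation correction can be sketched as rotating each sensor reading by a unit quaternion into a common frame. The quaternion below (a 90° rotation about the z-axis) is an illustrative value, not one estimated from real sensor placements as in the thesis.

```python
# Rotate a 3-D sensor reading by a unit quaternion: v' = q * v * conj(q).

import math

def quat_mul(q, r):
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

def rotate(q, v):
    """Rotate vector v by unit quaternion q."""
    qconj = (q[0], -q[1], -q[2], -q[3])
    w, x, y, z = quat_mul(quat_mul(q, (0.0,) + tuple(v)), qconj)
    return (x, y, z)

# A sensor mounted 90 degrees off about the z-axis (illustrative).
half = math.radians(90.0) / 2.0
q_z90 = (math.cos(half), 0.0, 0.0, math.sin(half))

v = rotate(q_z90, (1.0, 0.0, 0.0))   # the sensor's x-axis maps to y
```

Applying the appropriate per-sensor quaternion to every sample expresses all readings in one shared frame, so a classifier no longer depends on where or how each sensor was mounted.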
One of the most popular models for time-series data is the Hidden Markov Model
(HMM), whose parameters are trained using the Baum-Welch algorithm. However,
this algorithm converges to local maxima, and the multiple initializations
needed to avoid them make it computationally expensive for large datasets. We
propose employing the theory of spectral learning to develop a discriminative
HMM that avoids the problems of the Baum-Welch algorithm, outperforming it in
both complexity and computational cost.
When we implement a HAR system with several sensors, we need to consider how to
combine the information they provide. Data fusion can be performed either at
the signal level or at the classification level. When performed at the
classification level, the usual approach is to combine the decisions of
multiple classifiers on the body to obtain the performed activities. However,
in the simple case of two classifiers, which can be a practical implementation
of a HAR system, the combination reduces to selecting the most discriminative
sensor, and no performance improvement is obtained over the single-sensor
implementation. In this thesis, we propose to employ the soft outputs of the
classifiers in the combination, and we develop a method that considers the
Markovian structure of the ground truth to capture the dynamics of the
activities. We show that this method improves the recognition of the activities
with respect to other combination methods and with respect to the signal
fusion case.
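A toy sketch of the idea behind this combination: fuse the soft outputs (posteriors) of two classifiers and smooth the fused evidence with the Markovian structure of the activity labels via a forward (filtering) pass. All numbers are invented and the thesis method is more elaborate; the sketch only shows why temporal structure rescues an ambiguous time step.

```python
# Product-rule fusion of two classifiers' soft outputs, followed by
# forward filtering under an activity transition model.

def fuse_and_filter(post1, post2, trans, init):
    """Return the filtered activity belief at every time step."""
    n = len(init)
    belief = list(init)
    out = []
    for p1, p2 in zip(post1, post2):
        # Predict with the activity transition model...
        pred = [
            sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)
        ]
        # ...then update with the fused classifier evidence.
        joint = [pred[j] * p1[j] * p2[j] for j in range(n)]
        total = sum(joint)
        belief = [v / total for v in joint]
        out.append(belief)
    return out

# Two activities: 0 = walking, 1 = sitting; activities tend to persist.
trans = [[0.9, 0.1], [0.1, 0.9]]
init = [0.5, 0.5]

# At t=1 both classifiers are unsure; temporal context keeps "walking".
post1 = [[0.9, 0.1], [0.5, 0.5], [0.9, 0.1]]
post2 = [[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]

beliefs = fuse_and_filter(post1, post2, trans, init)
```

A hard-decision vote would hand the ambiguous step to whichever single classifier is marginally more confident; using soft outputs plus the transition model lets both sensors and the label dynamics contribute.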
Finally, in long-term monitoring HAR systems with wearable sensors we need
to address the energy efficiency problem that is inherent to battery-powered
devices. The most common approach to improving the energy efficiency of such
devices is to reduce the amount of data acquired by the wearable sensors. In
that sense, we introduce a general framework for the energy efficiency of a
system with multiple sensors under several energy restrictions. We propose a
sensing strategy to optimize the temporal data acquisition, based on computing
the uncertainty of the activities given the data and adapting the acquisition
actively. Furthermore, we develop a sensor selection algorithm based on
Bayesian Experimental Design to obtain the best configuration of sensors that
performs the activity recognition accurately, allowing for a further
improvement in energy efficiency by limiting
the number of sensors employed in the acquisition.
Programa Oficial de Doctorado en Multimedia y Comunicaciones. Committee: Luis Ignacio Santamaría Caballero (chair); Pablo Martínez Olmos (secretary); Alberto Suárez González (member).
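The uncertainty-driven sensing strategy described in the abstract can be sketched as follows: compute the entropy of the current activity posterior and sample faster when the system is uncertain, slower when it is confident. The thresholds and sampling intervals are invented values, not the thesis's optimized policy.

```python
# Entropy-based active sensing: short sampling interval when the activity
# posterior is uncertain, long interval when it is confident.

import math

def entropy(posterior):
    """Shannon entropy (in bits) of an activity posterior."""
    return -sum(p * math.log2(p) for p in posterior if p > 0)

def next_sampling_interval(posterior, fast=1.0, slow=10.0, threshold=0.5):
    """Choose the next acquisition interval from posterior uncertainty."""
    return fast if entropy(posterior) > threshold else slow

# Four activities; one confident posterior, one uncertain posterior.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.4, 0.3, 0.2, 0.1]
```

Lengthening the interval whenever the posterior is sharp is what saves battery; the Bayesian-experimental-design step then decides which sensors are worth powering at all.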