Audio Imputation Using the Non-negative Hidden Markov Model
Abstract. Missing data in corrupted audio recordings poses a challenging problem for audio signal processing. In this paper we present an approach that allows us to estimate missing values in the time-frequency domain of audio signals. The proposed approach, based on the Non-negative Hidden Markov Model, enables more temporally coherent estimation of the missing data by taking into account both the spectral and temporal information of the audio signal. This approach is able to reconstruct highly corrupted audio signals with large parts of the spectrogram missing. We demonstrate this approach on real-world polyphonic music signals. The initial experimental results show that our approach has advantages over a previous missing data imputation method.
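The temporal-coherence idea can be illustrated with a heavily simplified sketch (not the paper's actual NHMM, which uses non-negative spectral models learned from data): each hidden state carries a spectral template, sticky transitions favour staying in the same state across frames, and missing time-frequency bins are filled from the template of the state that best explains the observed bins. All templates, probabilities, and shapes below are illustrative assumptions.

```python
import math

# Toy "spectral templates": each hidden state is a mean magnitude spectrum
# over 3 frequency bins (illustrative values, not learned from data).
templates = {
    "low":  [1.0, 0.2, 0.1],
    "high": [0.1, 0.3, 1.0],
}
transition = {  # sticky transitions encourage temporally coherent estimates
    ("low", "low"): 0.9, ("low", "high"): 0.1,
    ("high", "high"): 0.9, ("high", "low"): 0.1,
}

def log_lik(frame, template):
    """Gaussian log-likelihood over observed bins only; None marks missing."""
    ll = 0.0
    for x, mu in zip(frame, template):
        if x is not None:
            ll += -0.5 * (x - mu) ** 2
    return ll

def impute(frames):
    """Greedy state tracking plus template fill-in for missing bins."""
    state = "low"  # assumed initial state
    out = []
    for frame in frames:
        # pick the state that best explains the observed bins, weighted by
        # the transition probability from the previous state
        state = max(templates, key=lambda s:
                    log_lik(frame, templates[s]) + math.log(transition[(state, s)]))
        out.append([templates[state][i] if x is None else x
                    for i, x in enumerate(frame)])
    return out

print(impute([[1.0, None, 0.1], [None, 0.3, 1.1]]))
```

A full NHMM would instead run proper forward-backward inference and reconstruct bins from learned non-negative dictionaries; the greedy pass above only shows how transition probabilities keep consecutive fill-ins consistent.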
A Comprehensive Survey on Rare Event Prediction
Rare event prediction involves identifying and forecasting low-probability events
using machine learning and data analysis. Due to the imbalanced
data distributions, where the frequency of common events vastly outweighs that
of rare events, it requires using specialized methods within each step of the
machine learning pipeline, i.e., from data processing to algorithms to
evaluation protocols. Predicting the occurrences of rare events is important
for real-world applications, such as Industry 4.0, and is an active research
area in statistics and machine learning. This paper comprehensively reviews
the current approaches for rare event prediction along four dimensions: rare
event data, data processing, algorithmic approaches, and evaluation approaches.
Specifically, we consider 73 datasets from different modalities (i.e.,
numerical, image, text, and audio), four major categories of data processing,
five major algorithmic groupings, and two broader evaluation approaches. This
paper aims to identify gaps in the current literature and highlight the
challenges of predicting rare events. It also suggests potential research
directions, which can help guide practitioners and researchers.
Comment: 44 pages
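As a concrete illustration of one data-processing step such surveys cover, here is a minimal random-oversampling sketch that duplicates minority-class samples until the classes are balanced. The function name and toy data are assumptions; real pipelines would pair a step like this with rare-event-aware evaluation (precision/recall rather than accuracy, which is misleading under imbalance).

```python
import random

def oversample(samples, seed=0):
    """Duplicate minority-class samples at random until every class has as
    many samples as the largest one (a minimal rare-event preprocessing step)."""
    rng = random.Random(seed)
    by_label = {}
    for x, label in samples:
        by_label.setdefault(label, []).append((x, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # resample with replacement from the minority group to reach `target`
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# 95 common events (label 0) vs 5 rare events (label 1)
data = [(i, 0) for i in range(95)] + [(i, 1) for i in range(5)]
balanced = oversample(data)
counts = {0: 0, 1: 0}
for _, label in balanced:
    counts[label] += 1
print(counts)  # both classes now have 95 samples
```

Oversampling is only one of the survey's data-processing categories; undersampling the majority class or generating synthetic minority samples are common alternatives with different bias/variance trade-offs.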
Studies on noise robust automatic speech recognition
Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.
A Step Towards Advancing Digital Phenotyping in Mental Healthcare
Smartphones and wrist-wearable devices have infiltrated our lives in recent years. According
to published statistics, nearly 84% of the world’s population owns a smartphone,
and almost 10% own a wearable device today (2022). These devices continuously generate
various data sources from multiple sensors and apps, creating our digital phenotypes.
This opens new research opportunities, particularly in mental health care, which has previously
relied almost exclusively on self-reports of mental health symptoms.
Unobtrusive monitoring using patients’ devices may result in clinically valuable markers
that can improve diagnostic processes, tailor treatment choices, provide continuous
insights into their condition for actionable outcomes, such as early signs of relapse, and
develop new intervention models. However, these data sources must be translated into
meaningful, actionable features related to mental health to achieve their full potential.
In the mental health field, there is a great need and much to be gained from defining a
way to continuously assess the evolution of patients’ mental states, ideally in their everyday
environment, to support the monitoring and treatments by health care providers. A
smartphone-based approach may be valuable in gathering long-term objective data, aside
from the usually used self-ratings, to predict clinical state changes and investigate causal
inferences about state changes in patients (e.g., those with affective disorders).
Objective does not mean flawless, however: passive data collection has several
challenges. Some sensors generate vast volumes of data, and others cause significant
battery drain. Furthermore, the analysis of raw passive data is complicated, and collecting
certain types of data may interfere with the phenotype of interest. Nonetheless, machine
learning is well suited to address these issues and to advance psychiatry's era of personalised
medicine.
This work aimed to advance the research efforts on mobile and wearable sensors for
mental health monitoring. We applied supervised and unsupervised machine learning
methods to model and understand mental disease evolution based on the digital phenotype
of patients and clinician assessments at the follow-up visits, which provide ground
truths. We needed to cope with regularly and irregularly sampled, high-dimensional, and
heterogeneous time series data susceptible to distortion and missingness. Hence, the developed
methods must be robust to these limitations and handle missing data properly.
Throughout the various projects presented here, we used probabilistic latent variable
models for data imputation and feature extraction, namely, mixture models (MM) and hidden
Markov models (HMM). These unsupervised models can learn even in the presence
of missing data by marginalising out the missing values as a function of the observed ones. Once the generative models are trained on the data set with missing values, they can
be used to generate samples for imputation. First, the most probable component/state has
to be found for each sample. Then, sampling from the most probable distribution yields
valid and robust parameter estimates and explicit imputed values for variables that can
be analysed as outcomes or predictors. The imputation process can be repeated several
times, creating multiple datasets, thereby accounting for the uncertainty in the imputed
values and implicitly augmenting the data. Moreover, they are robust to moderate deviations
of the observed data from the assumed underlying distribution and provide accurate
estimates even when missingness is high.
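A minimal sketch of this marginalise-then-impute scheme, using a two-component diagonal-Gaussian mixture with hand-picked parameters (the thesis samples from the most probable distribution to build multiple imputations; for determinism this toy fills in the component mean instead, and all parameter values are illustrative):

```python
import math

# Two diagonal-Gaussian components with unit variance (illustrative parameters).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]

def responsibilities(x):
    """Posterior over components using only the observed dimensions
    (None marks a missing value), i.e. missing dims are marginalised out."""
    logs = []
    for w, mu in zip(weights, means):
        ll = math.log(w)
        for xi, mui in zip(x, mu):
            if xi is not None:
                ll += -0.5 * (xi - mui) ** 2  # unit-variance Gaussian, up to a constant
        logs.append(ll)
    m = max(logs)                             # log-sum-exp for stability
    unnorm = [math.exp(l - m) for l in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def impute(x):
    """Fill missing dims from the most probable component's mean."""
    r = responsibilities(x)
    k = r.index(max(r))
    return [means[k][i] if xi is None else xi for i, xi in enumerate(x)]

print(impute([4.8, None]))  # the observed dim points to the second component
```

Repeating the imputation with actual sampling from the selected component (rather than its mean) yields the multiple imputed datasets described above, which capture the uncertainty of the filled-in values.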
Depending on the properties of the data at hand, we employed feature extraction
methods combined with classical machine learning algorithms or deep learning-based
techniques for temporal modelling to predict various mental health outcomes of
psychiatric outpatients: emotional state, World Health Organisation Disability Assessment
Schedule (WHODAS 2.0) functionality scores, and Generalised Anxiety Disorder-7 (GAD-7) scores.
We mainly focused on one-size-fits-all models, as the labelled sample size per
patient was limited; however, in the mood prediction case, it was possible to apply personalised
models.
Integrating machines and algorithms into the clinical workflow requires interpretability
to increase acceptance. Therefore, we also analysed feature importance by computing
Shapley additive explanations (SHAP) values. SHAP values summarise the essential
features of a machine learning model by quantifying how much each feature contributes,
positively or negatively, to the prediction of the target variable.
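The idea behind SHAP values can be sketched by computing exact Shapley values for a hypothetical three-feature linear model, averaging each feature's marginal contribution over all feature orderings. Real SHAP implementations approximate this far more efficiently; the weights, baseline, and "absent feature means baseline value" convention below are all assumptions for illustration.

```python
from itertools import permutations

# Hypothetical 3-feature linear model (illustrative weights and baseline).
weights = [2.0, -1.0, 0.5]
baseline = [0.0, 0.0, 0.0]  # "feature absent" is represented by its baseline value

def f(x):
    return sum(w * xi for w, xi in zip(weights, x))

def shap_values(x):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings (feasible only for a handful of features)."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        for i in order:
            before = f(current)
            current[i] = x[i]          # "reveal" feature i to the model
            phi[i] += f(current) - before
    return [p / len(orderings) for p in phi]

x = [1.0, 2.0, 4.0]
phi = shap_values(x)
print(phi)       # for a linear model this reduces to w_i * (x_i - baseline_i)
print(sum(phi))  # efficiency property: contributions sum to f(x) - f(baseline)
```

The efficiency property checked in the last line is what makes SHAP attributions add up to the model's actual prediction, which is the basis of the per-feature weights reported in the thesis.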
The solutions provided here are proofs of concept that require further clinical
validation before deployment in the clinical workflow. Still, the results are promising
and lay foundations for future research and collaboration among clinicians, patients,
and computer scientists. They set out paths to advance future research in
technology-based mental healthcare.
Doctoral Programme in Multimedia and Communications, Universidad Carlos III de Madrid and Universidad Rey Juan Carlos. President: David Ramírez García. Secretary: Alfredo Nazábal Rentería. Committee member: María Luisa Barrigón Estéve
Probabilistic sequential matrix factorization
We introduce the probabilistic sequential matrix factorization (PSMF) method
for factorizing time-varying and non-stationary datasets consisting of
high-dimensional time-series. In particular, we consider nonlinear Gaussian
state-space models where sequential approximate inference results in the
factorization of a data matrix into a dictionary and time-varying coefficients
with potentially nonlinear Markovian dependencies. The assumed Markovian
structure on the coefficients enables us to encode temporal dependencies into a
low-dimensional feature space. The proposed inference method is solely based on
an approximate extended Kalman filtering scheme, which makes the resulting
method particularly efficient. PSMF can account for temporal nonlinearities
and, more importantly, can be used to calibrate and estimate generic
differentiable nonlinear subspace models. We also introduce a robust version of
PSMF, called rPSMF, which uses Student-t filters to handle model
misspecification. We show that PSMF can be used in multiple contexts: modeling
time series with a periodic subspace, robustifying changepoint detection
methods, and imputing missing data in several high-dimensional time-series,
such as measurements of pollutants across London.
Comment: Accepted for publication at AISTATS 202
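A heavily simplified, rank-1, linear sketch of the sequential idea (the actual PSMF handles multivariate, possibly nonlinear subspaces with extended Kalman filtering, and its dictionary update is derived from the same filter; all parameters and the plain gradient step below are illustrative assumptions): a scalar coefficient z_t follows a random walk, each column y_t is explained as c*z_t, and a Bayesian (Kalman-style) update of z_t alternates with a gradient step on the dictionary c.

```python
def psmf_rank1(Y, q=0.01, r=0.1, lr=0.05):
    """Sequential rank-1 factorization Y ~ c * z_t with Kalman-style updates.
    Y is a list of T observation vectors; returns the dictionary column c
    and the filtered coefficient means z_t (illustrative hyperparameters)."""
    d = len(Y[0])
    c = [1.0] * d                # dictionary column (assumed initialisation)
    z, p = 0.0, 1.0              # latent coefficient mean and variance
    zs = []
    for y in Y:
        p += q                   # predict step: random-walk variance grows
        cc = sum(ci * ci for ci in c)
        cy = sum(ci * yi for ci, yi in zip(c, y))
        # information-form Gaussian update of z given y = c*z + noise(r)
        p_post = 1.0 / (1.0 / p + cc / r)
        z = p_post * (z / p + cy / r)
        p = p_post
        zs.append(z)
        # gradient step on the dictionary given the current coefficient
        c = [ci + lr * (yi - ci * z) * z for ci, yi in zip(c, y)]
    return c, zs

c, zs = psmf_rank1([[1.0, 2.0]] * 50)
print(c, zs[-1])
```

Rows of Y with missing entries could simply be dropped from the `cy`/`cc` sums, which is how a filtering view of factorization lends itself to the missing-data imputation experiments mentioned in the abstract.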
Twin Networks: Matching the Future for Sequence Generation
We propose a simple technique for encouraging generative RNNs to plan ahead.
We train a "backward" recurrent network to generate a given sequence in reverse
order, and we encourage states of the forward model to predict cotemporal
states of the backward model. The backward network is used only during
training, and plays no role during sampling or inference. We hypothesize that
our approach eases modeling of long-term dependencies by implicitly forcing the
forward states to hold information about the longer-term future (as contained
in the backward states). We show empirically that our approach achieves 9%
relative improvement for a speech recognition task, and achieves significant
improvement on a COCO caption generation task.
Comment: 12 pages, 3 figures, published at ICLR 201
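The matching term can be sketched as follows, assuming the backward states have already been reversed so that step t of both sequences describes the same token (function name, shapes, and values are illustrative; in the full setup this penalty is added to the two networks' likelihood losses and no gradient flows into the backward states through it):

```python
def matching_penalty(forward_states, backward_states):
    """Mean squared distance between cotemporal forward/backward hidden
    states. `backward_states` must be time-aligned with `forward_states`,
    i.e. the backward RNN's outputs are reversed before calling this."""
    assert len(forward_states) == len(backward_states)
    total, count = 0.0, 0
    for hf, hb in zip(forward_states, backward_states):
        total += sum((a - b) ** 2 for a, b in zip(hf, hb))
        count += len(hf)
    return total / count

# Two time steps with 2-dimensional hidden states (toy values)
fwd = [[1.0, 0.0], [0.5, 0.5]]
bwd = [[1.0, 1.0], [0.5, 0.5]]
print(matching_penalty(fwd, bwd))  # only one coordinate disagrees: 1/4 = 0.25
```

Because the backward states summarise the future of the sequence, driving this penalty to zero implicitly forces the forward states to encode longer-term information, which is the paper's hypothesis.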