114 research outputs found
Learnable Front Ends Based on Temporal Modulation for Music Tagging
While end-to-end systems are becoming popular in auditory signal processing,
including automatic music tagging, models that take raw audio as input require
large amounts of data and computational resources when no domain knowledge is built in.
Inspired by the fact that temporal modulation is regarded as an essential
component in auditory perception, we introduce the Temporal Modulation Neural
Network (TMNN) that combines Mel-like data-driven front ends and temporal
modulation filters with a simple ResNet back end. The structure includes a set
of temporal modulation filters to capture long-term patterns in all frequency
channels. Experimental results show that the proposed front ends surpass
state-of-the-art (SOTA) methods on the MagnaTagATune dataset in automatic music
tagging, and they are also helpful for keyword spotting on speech commands.
Moreover, the per-tag performance suggests that genre and instrument tags
involving complex rhythm, as well as mood tags, especially benefit from
temporal modulation.
Comment: Submitted to ICASSP 202
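The core idea of the TMNN front end, a bank of temporal modulation filters applied to every frequency channel of a Mel-like representation, can be sketched as follows. In the paper these filters are learnable; the fixed windowed-cosine filters, modulation rates, and frame rate below are purely illustrative assumptions.

```python
import numpy as np

def temporal_modulation_filterbank(mel_spec, mod_rates_hz,
                                   frame_rate_hz=100, filt_len=64):
    """Apply a bank of temporal modulation filters to a mel-like
    spectrogram of shape (n_mels, n_frames).

    Each filter is a Hann-windowed cosine at a given modulation rate,
    convolved along the time axis of every frequency channel, so
    long-term temporal patterns are captured in all channels.
    Returns an array of shape (len(mod_rates_hz), n_mels, n_frames).
    (TMNN learns these filters; fixed kernels here are a sketch.)
    """
    t = (np.arange(filt_len) - filt_len // 2) / frame_rate_hz
    window = np.hanning(filt_len)
    outputs = []
    for rate in mod_rates_hz:
        kernel = window * np.cos(2 * np.pi * rate * t)
        kernel /= np.abs(kernel).sum()  # normalize filter gain
        filtered = np.stack([
            np.convolve(channel, kernel, mode="same")
            for channel in mel_spec
        ])
        outputs.append(filtered)
    return np.stack(outputs)

# Example: 40 mel channels, 200 frames, modulation rates of 2/4/8 Hz.
spec = np.random.default_rng(0).random((40, 200))
out = temporal_modulation_filterbank(spec, [2.0, 4.0, 8.0])
```

The stacked output can then be fed to a ResNet-style back end as a multi-channel image.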
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care
The COVID-19 pandemic has posed a heavy burden to the healthcare system
worldwide and caused huge social disruption and economic loss. Many deep
learning models have been proposed to conduct clinical predictive tasks such as
mortality prediction for COVID-19 patients in intensive care units using
Electronic Health Record (EHR) data. Despite their initial success in certain
clinical applications, there is currently a lack of benchmarking results to
achieve a fair comparison so that we can select the optimal model for clinical
use. Furthermore, there is a discrepancy between the formulation of traditional
prediction tasks and real-world clinical practice in intensive care. To fill
these gaps, we propose two clinical prediction tasks, Outcome-specific
length-of-stay prediction and Early mortality prediction for COVID-19 patients
in intensive care units. The two tasks are adapted from the naive
length-of-stay and mortality prediction tasks to accommodate the clinical
practice for COVID-19 patients. We propose fair, detailed, open-source
data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models
on the two tasks, including 5 machine learning models, 6 basic deep learning
models, and 6 deep learning predictive models specifically designed for EHR
data. We provide benchmarking results using data from two real-world COVID-19
EHR datasets. One dataset is publicly available without requiring any
application, and the other can be accessed on request. We provide fair, reproducible
benchmarking results for two tasks. We deploy all experiment results and models
on an online platform. We also allow clinicians and researchers to upload their
data to the platform and get quick prediction results using our trained models.
We hope our efforts can further facilitate deep learning and machine learning
research for COVID-19 predictive modeling.
Comment: Junyi Gao, Yinghao Zhu, and Wenqing Wang contributed equally
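The two proposed tasks come down to per-timestep label construction from each ICU stay. A minimal sketch, under an assumed toy record format (the field names and the 48-hour horizon below are hypothetical, not the benchmark's actual schema):

```python
def make_labels(records, horizon_hours=48):
    """Construct per-timestep labels for the two proposed tasks from a
    single ICU stay (hypothetical minimal record format).

    records: dict with keys
      'hours'    - record timestamps in hours since admission
      'los'      - total length of stay in hours
      'deceased' - True if the stay ended in death

    Returns, for each timestep:
      remaining_los   - outcome-specific length-of-stay target
      early_mortality - 1 if the patient dies within `horizon_hours`
    """
    remaining_los, early_mortality = [], []
    for t in records['hours']:
        remaining = records['los'] - t
        remaining_los.append(remaining)
        dies_soon = records['deceased'] and remaining <= horizon_hours
        early_mortality.append(1 if dies_soon else 0)
    return remaining_los, early_mortality

# Toy stay: records at 0 h, 24 h, and 72 h; death at 96 h.
stay = {'hours': [0, 24, 72], 'los': 96, 'deceased': True}
los_y, mort_y = make_labels(stay)
```

Unlike naive end-of-stay mortality labels, this formulation lets a model be evaluated at every point of the stay, which is closer to how clinicians would actually query it.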
Stereoscopic video quality assessment based on 3D convolutional neural networks
Research on stereoscopic video quality assessment (SVQA) plays an important role in promoting the development of stereoscopic video systems. Existing SVQA metrics rely on hand-crafted features, which are inaccurate and time-consuming to design given the diversity and complexity of stereoscopic video distortions. This paper introduces a 3D convolutional neural network (CNN) based SVQA framework that can model not only local spatio-temporal information but also global temporal information, using cubic difference video patches as input. First, instead of using hand-crafted features, we design a 3D CNN architecture to automatically and effectively capture local spatio-temporal features. Then we employ a quality-score fusion strategy that considers global temporal clues to obtain the final video-level predicted score. Extensive experiments on two public stereoscopic video quality datasets show that the proposed method correlates highly with human perception and outperforms state-of-the-art methods by a large margin. We also show that our 3D CNN features have more desirable properties for SVQA than the hand-crafted features used in previous methods, and that combining them with support vector regression (SVR) can further boost performance. In addition, requiring no complex preprocessing or GPU acceleration, the proposed method is computationally efficient and easy to use.
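The 3D-CNN input described above can be sketched as cubes cut from a temporal difference video. The patch size, depth, and tiling scheme below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def cubic_difference_patches(frames, patch=32, depth=6):
    """Build cubic difference patches from a grayscale video.

    frames: array of shape (n_frames, H, W).
    Differences between consecutive frames emphasize temporal
    distortion; each patch is a (depth, patch, patch) cube tiled
    from the difference video, suitable as 3D-CNN input.
    """
    diffs = frames[1:] - frames[:-1]            # temporal differences
    n, h, w = diffs.shape
    patches = []
    for t0 in range(0, n - depth + 1, depth):
        for y0 in range(0, h - patch + 1, patch):
            for x0 in range(0, w - patch + 1, patch):
                patches.append(diffs[t0:t0 + depth,
                                     y0:y0 + patch,
                                     x0:x0 + patch])
    return np.stack(patches)

# Toy video: 13 frames of 64x64 -> 12 difference frames -> 8 cubes.
video = np.random.default_rng(1).random((13, 64, 64))
cubes = cubic_difference_patches(video)
```

Patch-level quality scores predicted by the network would then be fused across time to yield the video-level score.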
Leveraging Prototype Patient Representations with Feature-Missing-Aware Calibration to Mitigate EHR Data Sparsity
Electronic Health Record (EHR) data frequently exhibits sparse
characteristics, posing challenges for predictive modeling. Current direct
imputation methods, such as matrix-imputation approaches, hinge on referencing
analogous rows or columns to complete raw missing data and do not differentiate
between imputed and actual values. As a result, models may inadvertently
incorporate information that is irrelevant or misleading with respect to the
prediction objective, thereby degrading downstream performance. While some methods
strive to recalibrate or augment EHR embeddings after direct imputation, they
often mistakenly prioritize imputed features. This misprioritization can
introduce biases or inaccuracies into the model. To tackle these issues, our
work resorts to indirect imputation, where we leverage prototype
representations from similar patients to obtain a denser embedding. Recognizing
the limitation that missing features are typically treated the same as present
ones when measuring patient similarity, our approach introduces a feature
confidence learner module. This module is sensitive to each feature's missing status,
enabling the model to better judge the reliability of each feature. Moreover,
we propose a novel patient similarity metric that takes feature confidence into
account, ensuring that evaluations are not based merely on potentially
inaccurate imputed values. Consequently, our work captures dense prototype
patient representations through a feature-missing-aware calibration process.
Comprehensive experiments demonstrate that the designed model surpasses
established EHR-focused models with a statistically significant improvement on
the in-hospital mortality prediction task for the MIMIC-III and MIMIC-IV
datasets. The code is publicly available at
\url{https://github.com/yhzhu99/SparseEHR} to ensure reproducibility.
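A confidence-aware patient similarity of the kind described can be sketched as a weighted cosine similarity in which imputed entries are down-weighted. This is a simplified stand-in for the paper's learned confidence module; the weighting scheme and example values are assumptions:

```python
import numpy as np

def confidence_weighted_similarity(x, y, mask_x, mask_y, conf_x, conf_y):
    """Cosine-style similarity between two patient feature vectors
    that down-weights imputed entries (a simplified stand-in for a
    learned feature-confidence module).

    mask_* : 1 where the feature was actually observed, 0 if imputed.
    conf_* : per-feature confidence in [0, 1] used for imputed entries;
             observed entries always get weight 1.
    """
    w = np.where(mask_x, 1.0, conf_x) * np.where(mask_y, 1.0, conf_y)
    num = np.sum(w * x * y)
    den = np.sqrt(np.sum(w * x * x)) * np.sqrt(np.sum(w * y * y))
    return num / den if den > 0 else 0.0

x = np.array([1.0, 2.0, 0.5])   # third feature of x is imputed
y = np.array([1.0, 2.0, 3.0])
sim = confidence_weighted_similarity(
    x, y,
    mask_x=np.array([1, 1, 0]), mask_y=np.ones(3),
    conf_x=np.array([1.0, 1.0, 0.2]), conf_y=np.ones(3),
)
```

With such a metric, prototype representations can be built from genuinely similar patients instead of ones that merely share imputed values.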
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
Large Language Models (LLMs) have shown immense potential in multimodal
applications, yet the convergence of textual and musical domains remains
relatively unexplored. To address this gap, we present MusiLingo, a novel
system for music caption generation and music-related query responses.
MusiLingo employs a single projection layer to align music representations from
the pre-trained frozen music audio model MERT with the frozen LLaMA language
model, bridging the gap between music audio and textual contexts. We train it
on an extensive music caption dataset and fine-tune it with instructional data.
Due to the scarcity of high-quality music Q&A datasets, we created the
MusicInstruct (MI) dataset from MusicCaps, tailored for open-ended music
inquiries. Empirical evaluations demonstrate its competitive performance in
generating music captions and composing music-related Q&A pairs. Our introduced
dataset enables notable advancements beyond previous ones.
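The alignment mechanism is a single linear projection from the frozen audio encoder's feature space into the language model's embedding space. A minimal sketch, where the dimensions are illustrative placeholders rather than the actual MERT/LLaMA sizes:

```python
import numpy as np

def project_music_embeddings(music_feats, W, b):
    """Map frame-level music-encoder features into the language
    model's embedding space with a single linear projection, the
    only trained alignment component in this kind of design.

    music_feats: (n_frames, d_music) features from the frozen audio model
    W, b:        projection weight (d_music, d_llm) and bias (d_llm,)
    Returns (n_frames, d_llm) pseudo-token embeddings that are
    prepended to the text prompt's token embeddings.
    """
    return music_feats @ W + b

rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 768))      # e.g. 10 frames, 768-d features
W = rng.standard_normal((768, 4096)) * 0.01  # illustrative d_llm = 4096
b = np.zeros(4096)
tokens = project_music_embeddings(feats, W, b)
```

Because both the audio encoder and the language model stay frozen, only W and b are updated during caption pre-training and instruction fine-tuning, which keeps training inexpensive.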
Mortality Prediction with Adaptive Feature Importance Recalibration for Peritoneal Dialysis Patients
The study aims to develop AICare, an interpretable mortality prediction model, using Electronic Medical Records (EMR) from follow-up visits of End-Stage Renal Disease (ESRD) patients. AICare includes a multi-channel feature extraction module and an adaptive feature importance recalibration module. It integrates dynamic records and static features to perform personalized health-context representation learning. The dataset encompasses 13,091 visits and demographic data of 656 peritoneal dialysis (PD) patients spanning 12 years. An additional public dataset of 4,789 visits from 1,363 hemodialysis (HD) patients is also considered. AICare outperforms traditional deep learning models in mortality prediction while retaining interpretability. It uncovers mortality-feature relationships and variations in feature importance, and provides reference values. An AI-Doctor interaction system is developed for visualizing patients' health trajectories and risk indicators.
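Adaptive feature-importance recalibration of this kind can be sketched as a learned softmax gate over per-feature channel embeddings; the scoring parameterization below is a hypothetical simplification, not AICare's actual module:

```python
import numpy as np

def recalibrate_features(channel_feats, score_w):
    """Sketch of adaptive feature-importance recalibration: each
    per-channel feature embedding is rescaled by a learned importance
    weight before fusion (scoring vector here is hypothetical).

    channel_feats: (n_channels, d), one embedding per clinical feature
    score_w:       (d,) scoring vector producing one logit per channel
    Returns the recalibrated features and the importance weights; the
    weights can be read out directly for interpretability.
    """
    logits = channel_feats @ score_w                 # (n_channels,)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                         # softmax over channels
    return channel_feats * weights[:, None], weights

rng = np.random.default_rng(2)
feats = rng.standard_normal((8, 16))    # 8 clinical feature channels
recal, w = recalibrate_features(feats, rng.standard_normal(16))
```

Because the weights sum to one over channels, they double as a per-visit importance profile, which is the kind of signal the AI-Doctor interface can surface to clinicians.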
A comprehensive benchmark for COVID-19 predictive modeling using electronic health records in intensive care
The COVID-19 pandemic highlighted the need for predictive deep-learning models in health care. However, practical prediction task design, fair comparison, and model selection for clinical applications remain a challenge. To address this, we introduce and evaluate two new prediction tasks, outcome-specific length-of-stay and early-mortality prediction for COVID-19 patients in intensive care, which better reflect clinical realities. We developed evaluation metrics, model adaptation designs, and open-source data preprocessing pipelines for these tasks while also evaluating 18 predictive models, including clinical scoring methods and traditional machine-learning, basic deep-learning, and advanced deep-learning models, tailored for electronic health record (EHR) data. Benchmarking results from two real-world COVID-19 EHR datasets are provided, and all results and trained models have been released on an online platform for use by clinicians and researchers. Our efforts contribute to the advancement of deep-learning and machine-learning research in pandemic predictive modeling.
- …