Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk
Objective: To compare different deep learning architectures for predicting
the risk of readmission within 30 days of discharge from the intensive care
unit (ICU). The interpretability of attention-based models is leveraged to
describe patients-at-risk. Methods: Several deep learning architectures making
use of attention mechanisms, recurrent layers, neural ordinary differential
equations (ODEs), and medical concept embeddings with time-aware attention were
trained using publicly available electronic medical record data (MIMIC-III)
associated with 45,298 ICU stays for 33,150 patients. Bayesian inference was
used to compute the posterior over weights of an attention-based model. Odds
ratios associated with an increased risk of readmission were computed for
static variables. Diagnoses, procedures, medications, and vital signs were
ranked according to the associated risk of readmission. Results: A recurrent
neural network, with time dynamics of code embeddings computed by neural ODEs,
achieved the highest average precision of 0.331 (AUROC: 0.739, F1-Score:
0.372). Predictive accuracy was comparable across neural network architectures.
Groups of patients at risk included those suffering from infectious
complications, with chronic or progressive conditions, and for whom standard
medical care was not suitable. Conclusions: Attention-based networks may be
preferable to recurrent networks if an interpretable model is required, at only
marginal cost in predictive accuracy.
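The attention-based architecture described above can be reduced to a minimal sketch: score each recorded medical-code embedding, pool the sequence with softmax attention weights, and pass the pooled patient vector through a logistic output. This is an illustrative numpy reconstruction, not the paper's exact model; the function names and the single-head linear attention are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_risk(E, w_att, w_out, b_out=0.0):
    """E: (T, d) embeddings of the codes recorded during one ICU stay."""
    logits = E @ w_att               # (T,) one relevance score per code
    alpha = softmax(logits)          # attention weights, sum to 1
    context = alpha @ E              # (d,) attention-pooled patient vector
    z = context @ w_out + b_out
    prob = 1.0 / (1.0 + np.exp(-z))  # predicted 30-day readmission probability
    return prob, alpha
```

The attention weights `alpha` are what makes such a model interpretable: codes with large weights are the diagnoses, procedures, or medications the model attended to when flagging a patient as at-risk.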
Video Storytelling: Textual Summaries for Events
Bridging vision and natural language is a longstanding goal in computer
vision and multimedia research. While earlier works focus on generating a
single-sentence description for visual content, recent works have studied
paragraph generation. In this work, we introduce the problem of video
storytelling, which aims at generating coherent and succinct stories for long
videos. Video storytelling introduces new challenges, mainly due to the
diversity of the story and the length and complexity of the video. We propose
novel methods to address the challenges. First, we propose a context-aware
framework for multimodal embedding learning, where we design a Residual
Bidirectional Recurrent Neural Network to leverage contextual information from
past and future. Second, we propose a Narrator model to discover the underlying
storyline. The Narrator is formulated as a reinforcement learning agent which
is trained by directly optimizing the textual metric of the generated story. We
evaluate our method on the Video Story dataset, a new dataset that we have
collected to enable the study. We compare our method with multiple
state-of-the-art baselines, and show that our method achieves better
performance in terms of both quantitative measures and a user study. Comment: Published in IEEE Transactions on Multimedia
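The Residual Bidirectional Recurrent Neural Network mentioned above can be sketched as two vanilla RNN passes, one forward and one backward over the clip sequence, whose hidden states are added back to the input features. This is a minimal numpy illustration of the residual bidirectional idea, assuming the hidden size equals the feature size so the residual sum is well-defined; the actual model in the paper is more elaborate.

```python
import numpy as np

def rnn_pass(X, Wx, Wh, reverse=False):
    # Vanilla tanh RNN over a (T, d) sequence; returns (T, h) hidden states.
    seq = X[::-1] if reverse else X
    h = np.zeros(Wh.shape[0])
    states = []
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h)
        states.append(h)
    H = np.stack(states)
    return H[::-1] if reverse else H

def residual_birnn(X, Wx_f, Wh_f, Wx_b, Wh_b):
    # Contextual embedding = input + forward context + backward context.
    Hf = rnn_pass(X, Wx_f, Wh_f)                # context from the past
    Hb = rnn_pass(X, Wx_b, Wh_b, reverse=True)  # context from the future
    return X + Hf + Hb
```

The residual connection lets each clip's embedding keep its own content while being refined by what happened before and after it in the video.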
Recurrent Pixel Embedding for Instance Grouping
We introduce a differentiable, end-to-end trainable framework for solving
pixel-level grouping problems such as instance segmentation consisting of two
novel components. First, we regress pixels into a hyper-spherical embedding
space so that pixels from the same group have high cosine similarity while
those from different groups have similarity below a specified margin. We
analyze the choice of embedding dimension and margin, relating them to
theoretical results on the problem of distributing points uniformly on the
sphere. Second, to group instances, we utilize a variant of mean-shift
clustering, implemented as a recurrent neural network parameterized by kernel
bandwidth. This recurrent grouping module is differentiable and enjoys convergent
dynamics and a probabilistic interpretation. Backpropagating the group-weighted
loss through this module lets learning focus only on correcting embedding
errors that won't be resolved during subsequent clustering. Our framework,
while conceptually simple and theoretically grounded, is also practically
effective and computationally efficient. We demonstrate substantial
improvements over state-of-the-art instance segmentation for object proposal
generation, as well as demonstrating the benefits of grouping loss for
classification tasks such as boundary detection and semantic segmentation.
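One iteration of the mean-shift variant described above can be sketched in a few lines: weight every pair of unit-norm pixel embeddings by a kernel on their cosine similarity, average, and reproject onto the hypersphere. This is a minimal numpy sketch under assumed names; an exponential kernel with a bandwidth parameter stands in for the paper's kernel choice.

```python
import numpy as np

def normalize_rows(X):
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def mean_shift_step(X, bandwidth=10.0):
    """One recurrent mean-shift iteration on the unit hypersphere.

    X: (N, d) unit-norm pixel embeddings. Embeddings with high cosine
    similarity (same instance) receive large kernel weights and pull
    each other toward a common mode.
    """
    K = np.exp(bandwidth * (X @ X.T))  # kernel on pairwise cosine similarity
    return normalize_rows(K @ X)       # shift, then reproject onto the sphere
```

Unrolling this step for a fixed number of iterations gives the recurrent grouping module: every operation is differentiable, so the grouping loss can be backpropagated into the embedding network.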
Scaling Speech Enhancement in Unseen Environments with Noise Embeddings
We address the problem of speech enhancement generalisation to unseen
environments by performing two manipulations. First, we embed an additional
recording from the environment alone, and use this embedding to alter
activations in the main enhancement subnetwork. Second, we scale the number of
noise environments present at training time to 16,784 different environments.
Experiment results show that both manipulations reduce word error rates of a
pretrained speech recognition system and improve enhancement quality according
to a number of performance measures. Specifically, our best model reduces the
word error rate from 34.04% on noisy speech to 15.46% on the enhanced speech.
Enhanced audio samples can be found in
https://speechenhancement.page.link/samples
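The first manipulation above, embedding a noise-only recording and using it to alter activations in the enhancement subnetwork, can be sketched as follows. This is an illustrative numpy sketch, not the paper's architecture: mean-pooling the noise spectrogram and a learned scale-and-shift of activations are both assumptions about how the conditioning might be wired.

```python
import numpy as np

def noise_embedding(noise_spec):
    """noise_spec: (T, f) spectrogram frames of a noise-only recording.

    Mean-pooling over time yields a fixed-size environment embedding
    (an illustrative pooling choice).
    """
    return noise_spec.mean(axis=0)

def condition(acts, z, W_scale, W_shift):
    """Alter enhancement-network activations using the noise embedding.

    A per-channel scale and shift derived from z is one plausible way
    to realize 'use this embedding to alter activations'.
    """
    scale = 1.0 + W_scale @ z
    shift = W_shift @ z
    return scale * acts + shift
```

Because the embedding is computed from the environment alone, the same enhancement weights can adapt to any of the thousands of noise environments seen at training time, which is what drives the generalisation to unseen ones.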
Salience and Market-aware Skill Extraction for Job Targeting
At LinkedIn, we want to create economic opportunity for everyone in the
global workforce. To make this happen, LinkedIn offers a reactive Job Search
system, and a proactive Jobs You May Be Interested In (JYMBII) system to match
the best candidates with their dream jobs. One of the most challenging tasks
for developing these systems is to properly extract important skill entities
from job postings and then target members with matched attributes. In this
work, we show that the commonly used text-based \emph{salience and
market-agnostic} skill extraction approach is sub-optimal because it only
considers skill mentions and ignores the salience of a skill and its market
dynamics, i.e., the market supply and demand influence on the importance of
skills. To address the above drawbacks, we present \model, our deployed
\emph{salience and market-aware} skill extraction system. The proposed \model
~shows promising results in improving the online performance of job
recommendation (JYMBII) (job apply rate) and skill suggestions for job
posters (suggestion rejection rate). Lastly, we present case studies to
show interesting insights that contrast the traditional skill recognition method
with the proposed \model~at the occupation, industry, country, and individual
skill levels. Based on the above promising results, we deployed the \model
~online to extract job targeting skills for all M job postings served at
LinkedIn. Comment: 9 pages, to appear in KDD2020
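The contrast drawn above, text-based salience alone versus salience combined with market dynamics, can be made concrete with a toy scoring function. This is a hypothetical illustration, not the deployed system: the blend weight and the demand/supply ratio are assumed signals standing in for whatever market features the real model uses.

```python
def market_aware_score(salience, demand, supply, alpha=0.5):
    """Blend text-based skill salience with a market signal.

    salience: salience of the skill mention in the posting, in [0, 1].
    demand/supply: counts of postings requesting vs. members holding the
    skill (hypothetical market features); their capped ratio rewards
    scarce, in-demand skills.
    """
    ratio = demand / supply if supply > 0 else 1.0
    market = min(ratio, 1.0)
    return alpha * salience + (1.0 - alpha) * market
```

Under such a scheme, two skills mentioned with equal prominence in a posting can still be ranked differently for targeting: the one that is scarcer in the member pool relative to employer demand scores higher.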