Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation
We address talker-independent monaural speaker separation from the
perspectives of deep learning and computational auditory scene analysis (CASA).
Specifically, we decompose the multi-speaker separation task into the stages of
simultaneous grouping and sequential grouping. Simultaneous grouping is first
performed in each time frame by separating the spectra of different speakers
with a permutation-invariantly trained neural network. In the second stage, the
frame-level separated spectra are sequentially grouped to different speakers by
a clustering network. The proposed deep CASA approach optimizes frame-level
separation and speaker tracking in turn, and produces excellent results for
both objectives. Experimental results on the benchmark WSJ0-2mix database show
that the new approach achieves state-of-the-art results with a modest model
size.
Comment: 10 pages, 5 figures
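The simultaneous-grouping stage above can be sketched as a frame-level permutation-invariant loss: at each frame, the estimated spectra are paired with the reference speakers in whichever order minimizes the error. The function below is a toy illustration; the name `frame_level_pit_loss` and the squared-error loss on magnitude spectra are our choices, not the paper's.

```python
import numpy as np
from itertools import permutations

def frame_level_pit_loss(est_spectra, ref_spectra):
    """Toy frame-level PIT loss for speaker separation.
    est_spectra, ref_spectra: (num_speakers, T, F) magnitude spectra.
    Returns the mean squared error under the best per-frame assignment,
    plus the chosen output-to-speaker permutation for each frame."""
    S, T, _ = est_spectra.shape
    total, assign = 0.0, []
    for t in range(T):
        # Try every output-to-speaker pairing at this frame, keep the best.
        best = min(
            (sum(((est_spectra[o, t] - ref_spectra[s, t]) ** 2).sum()
                 for o, s in enumerate(perm)), perm)
            for perm in permutations(range(S))
        )
        total += best[0]
        assign.append(best[1])
    return total / (S * T), assign
```

The second stage (sequential grouping) would then reorder the per-frame outputs so that each speaker's assignment is consistent across frames.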
An Empirical Evaluation of Deep Learning for ICD-9 Code Assignment using MIMIC-III Clinical Notes
Background and Objective: Code assignment is of paramount importance at many
levels in modern hospitals, from ensuring an accurate billing process to
creating a valid record of patient care history. However, the coding process is tedious
and subjective, and it requires medical coders with extensive training. This
study aims to evaluate the performance of deep-learning-based systems to
automatically map clinical notes to ICD-9 medical codes. Methods: The
evaluation in this study focuses on end-to-end learning methods without
manually defined rules. Traditional machine learning algorithms, as well as
state-of-the-art deep learning methods such as Recurrent Neural Networks and
Convolutional Neural Networks, were applied to the Medical Information Mart for
Intensive Care (MIMIC-III) dataset. An extensive set of experiments was
conducted across different settings of the tested algorithms. Results: Findings showed
that the deep learning-based methods outperformed other conventional machine
learning methods. From our assessment, the best models could predict the top 10
ICD-9 codes with 0.6957 F1 and 0.8967 accuracy and could estimate the top 10
ICD-9 categories with 0.7233 F1 and 0.8588 accuracy. Our implementation also
outperformed existing work under certain evaluation metrics. Conclusion: A set
of standard metrics was utilized to assess the performance of ICD-9 code
assignment on the MIMIC-III dataset. All the developed evaluation tools and
resources are available online and can be used as a baseline for further
research.
Deep Hierarchical Reinforcement Learning Based Recommendations via Multi-goals Abstraction
Recommender systems are an important form of intelligent application, helping
users cope with information overload. Among the metrics used to evaluate a
recommender system, conversion has become increasingly important. The majority
of existing recommender systems perform poorly on conversion due to its
extremely sparse feedback signal. To tackle
this challenge, we propose a deep hierarchical reinforcement learning based
recommendation framework consisting of two components: a high-level agent and
a low-level agent. The high-level agent captures long-term sparse conversion
signals and automatically sets abstract goals for the low-level agent,
while the low-level agent follows the abstract goals and interacts with
real-time environment. To solve the inherent problem in hierarchical
reinforcement learning, we propose a novel deep hierarchical reinforcement
learning algorithm via multi-goals abstraction (HRL-MG). Our proposed algorithm
contains three characteristics: 1) the high-level agent generates multiple
goals to guide the low-level agent in different stages, which reduces the
difficulty of approaching high-level goals; 2) different goals share the same
state encoder parameters, which increases the update frequency of the
high-level agent and thus accelerates the convergence of our proposed
algorithm; 3) an appropriate benefit-assignment function is designed to
allocate rewards to each goal so as to coordinate different goals in a consistent
direction. We evaluate our proposed algorithm based on a real-world e-commerce
dataset and validate its effectiveness.
Comment: submitted to SIGKDD 201
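The two-level interaction described above can be sketched as a minimal control loop: the high-level agent emits an abstract goal on a slow timescale, and the low-level agent conditions its per-step actions on that goal. Everything below (the stub policies, the goal horizon, the discrete state and action spaces) is our illustrative simplification, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def high_level_policy(state):
    """Stub: pick an abstract goal (e.g. a target item category)."""
    return int(rng.integers(0, 4))

def low_level_policy(state, goal):
    """Stub: pick a concrete action conditioned on the current goal."""
    return (state + goal) % 4

GOAL_HORIZON = 5  # the high-level agent re-plans every 5 interactions
state, trace = 0, []
for t in range(20):
    if t % GOAL_HORIZON == 0:
        goal = high_level_policy(state)      # sparse, long-term signal
    action = low_level_policy(state, goal)   # dense, real-time interaction
    state = action
    trace.append((goal, action))
```

In the HRL-MG framing, the high-level agent would emit several such goals per episode, and the benefit-assignment function would distribute the sparse conversion reward across them.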
JECL: Joint Embedding and Cluster Learning for Image-Text Pairs
We propose JECL, a method for clustering image-caption pairs by training
parallel encoders with regularized clustering and alignment objectives,
simultaneously learning both representations and cluster assignments. These
image-caption pairs arise frequently in high-value applications where
structured training data is expensive to produce, but free-text descriptions
are common. JECL trains by minimizing the Kullback-Leibler divergence between
the distributions of the images and text and a combined joint target
distribution, and by optimizing the Jensen-Shannon divergence between the soft
cluster assignments of the images and text. Regularizers are also applied to
JECL to prevent trivial solutions. Experiments show that JECL outperforms both
single-view and multi-view methods on large benchmark image-caption datasets,
and is remarkably robust to missing captions and varying data sizes.
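The two objectives mentioned above can be sketched with a soft cluster assignment and a Jensen-Shannon term between the two views. The Student-t assignment below is a DEC-style simplification of JECL's clustering objective, and both function names are ours.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Student-t soft cluster assignment (DEC-style; our simplification
    of the clustering objective). Rows are per-sample distributions."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1) / 2)
    return q / q.sum(1, keepdims=True)

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two assignment rows, used to pull
    the image and text views toward agreeing cluster assignments."""
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * np.log((a + eps) / (b + eps))).sum()
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Minimizing the JS term drives the image encoder's assignment of a sample toward its caption's assignment, which is what lets either view stand in when the other is missing.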
Heated-Up Softmax Embedding
Metric learning aims at learning a distance which is consistent with the
semantic meaning of the samples. The problem is generally solved by learning an
embedding for each sample such that the embeddings of samples of the same
category are compact while the embeddings of samples of different categories
are spread out in the feature space. We study the features extracted from the
penultimate layer of a deep neural network classifier trained with the
cross-entropy loss on top of the softmax layer. We show that training
classifiers with different temperature values of softmax function leads to
features with different levels of compactness. Leveraging these insights, we
propose a "heating-up" strategy to train a classifier with increasing
temperatures, leading the corresponding embeddings to achieve state-of-the-art
performance on a variety of metric learning benchmarks.
Comment: 11 pages, 4 figures
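The temperature mechanism behind the "heating-up" strategy is easy to see in isolation: dividing the logits by a temperature before the softmax controls how peaked the output distribution is. The function below is a generic sketch (the name and values are ours), not the paper's training code.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Softmax with a temperature knob: low temperatures sharpen the
    output distribution, high temperatures flatten it toward uniform."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.5)   # low T: peaked output
flat = softmax_with_temperature(logits, 5.0)    # high T: near-uniform output
```

The "heating-up" schedule then raises the temperature during training, which changes how compact versus spread-out the learned penultimate-layer features become.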
Convolutional Neural Networks for Fast Approximation of Graph Edit Distance
Graph Edit Distance (GED) computation is a core operation of many widely-used
graph applications, such as graph classification, graph matching, and graph
similarity search. However, computing the exact GED between two graphs is
NP-complete. Most current approximate algorithms are based on solving a
combinatorial optimization problem, which involves complicated design and high
time complexity. In this paper, we propose a novel end-to-end neural network
based approach to GED approximation, aiming to alleviate the computational
burden while preserving good performance. The proposed approach, named GSimCNN,
turns GED computation into a learning problem. Each graph is considered as a
set of nodes, represented by learnable embedding vectors. The GED computation
is then considered as a two-set matching problem, where a higher matching score
leads to a lower GED. A Convolutional Neural Network (CNN) based approach is
proposed to tackle the set matching problem. We test our algorithm on three
real graph datasets, and our model achieves significant performance enhancement
against state-of-the-art approximate GED computation algorithms.
Comment: arXiv admin note: text overlap with arXiv:1808.0568
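The two-set matching view above starts from a pairwise similarity matrix between the node embeddings of the two graphs, which the CNN then consumes like an image. The sketch below builds such a matrix; using cosine similarity is our simplification of the model's learned scores.

```python
import numpy as np

def similarity_image(emb1, emb2):
    """Pairwise cosine-similarity matrix between the node-embedding sets
    of two graphs. A CNN can treat this matrix as a single-channel image
    and regress a similarity score from it (illustrative only)."""
    a = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    b = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    return a @ b.T   # (nodes_in_g1, nodes_in_g2)
```

A higher overall matching score in this matrix corresponds to a lower predicted GED, turning an NP-complete computation into a single forward pass.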
Learning K-way D-dimensional Discrete Code For Compact Embedding Representations
Embedding methods such as word embedding have become pillars for many
applications containing discrete structures. Conventional embedding methods
directly associate each symbol with a continuous embedding vector, which is
equivalent to applying linear transformation based on "one-hot" encoding of the
discrete symbols. Despite its simplicity, such an approach yields a number of
parameters that grows linearly with the vocabulary size and can lead to
overfitting. In this work we propose a much more compact K-way D-dimensional
discrete encoding scheme to replace the "one-hot" encoding. In "KD encoding",
each symbol is represented by a D-dimensional code, and each of its dimensions
has a cardinality of K. The final symbol embedding vector can be generated by
composing the code embedding vectors. To learn the semantically meaningful
code, we derive a relaxed discrete optimization technique based on stochastic
gradient descent. By adopting the new coding system, the efficiency of
parameterization can be significantly improved (from linear to logarithmic),
and this can also mitigate the over-fitting problem. In our experiments with
language modeling, the number of embedding parameters can be reduced by 97%
while achieving similar or better performance.
Comment: NIPS'17 DISCM
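The parameter saving above comes from composing each symbol's embedding out of small per-dimension code tables instead of one vocabulary-sized table. The sketch below fixes toy sizes and composes by summation; the names and the sum-composition are our simplifications (the paper learns the composition).

```python
import numpy as np

K, D, H = 16, 4, 8          # K-way code, D dimensions, embedding width H
rng = np.random.default_rng(0)
code_tables = rng.standard_normal((D, K, H))  # D tables of K vectors each

def embed(code):
    """Compose a symbol embedding from its D-dimensional discrete code by
    summing one vector from each per-dimension code table."""
    return sum(code_tables[d, c] for d, c in enumerate(code))

# K**D = 65,536 distinct symbols are addressable with only D*K*H = 512
# code-table parameters, versus 65,536 * H = 524,288 for one-hot embedding.
v = embed([3, 0, 15, 7])
```

This is the linear-to-logarithmic improvement the abstract refers to: the table size grows with D·K rather than with the vocabulary size K^D.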
Machine Learning Methods for Data Association in Multi-Object Tracking
Data association is a key step within the multi-object tracking pipeline that
is notoriously challenging due to its combinatorial nature. A popular and
general way to formulate data association is as the NP-hard multidimensional
assignment problem (MDAP). Over the last few years, data-driven approaches to
assignment have become increasingly prevalent as these techniques have started
to mature. We focus this survey solely on learning algorithms for the
assignment step of multi-object tracking, and we attempt to unify various
methods by highlighting their connections to linear assignment as well as to
the MDAP. First, we review probabilistic and end-to-end optimization approaches
to data association, followed by methods that learn association affinities from
data. We then compare the performance of the methods presented in this survey,
and conclude by discussing future research directions.
Comment: Accepted for publication in ACM Computing Surveys
Affinity CNN: Learning Pixel-Centric Pairwise Relations for Figure/Ground Embedding
Spectral embedding provides a framework for solving perceptual organization
problems, including image segmentation and figure/ground organization. From an
affinity matrix describing pairwise relationships between pixels, it clusters
pixels into regions, and, using a complex-valued extension, orders pixels
according to layer. We train a convolutional neural network (CNN) to directly
predict the pairwise relationships that define this affinity matrix. Spectral
embedding then resolves these predictions into a globally-consistent
segmentation and figure/ground organization of the scene. Experiments
demonstrate significant benefit to this direct coupling compared to prior works
which use explicit intermediate stages, such as edge detection, on the pathway
from image to affinities. Our results suggest spectral embedding as a powerful
alternative to the conditional random field (CRF)-based globalization schemes
typically coupled to deep neural networks.
Comment: minor updates; extended version of CVPR 2016 conference paper
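The spectral-embedding step that resolves the predicted affinities can be sketched with the standard normalized-Laplacian construction; the paper's complex-valued figure/ground extension is omitted here, and this plain real-valued version is our simplification.

```python
import numpy as np

def spectral_embedding(W, k):
    """Embed items from a symmetric affinity matrix W using the bottom k
    eigenvectors of the normalized Laplacian L = I - D^(-1/2) W D^(-1/2)
    (the standard spectral clustering construction)."""
    d = W.sum(1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vecs[:, :k]
```

Given CNN-predicted pixel affinities in W, clustering these embedding coordinates yields the globally consistent segmentation described above.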
Recognizing Multi-talker Speech with Permutation Invariant Training
In this paper, we propose a novel technique for direct recognition of
multiple speech streams given a single channel of mixed speech, without first
separating them. Our technique is based on permutation invariant training (PIT)
for automatic speech recognition (ASR). In PIT-ASR, we compute the average
cross entropy (CE) over all frames in the whole utterance for each possible
output-target assignment, pick the one with the minimum CE, and optimize for
that assignment. PIT-ASR forces all the frames of the same speaker to be
aligned with the same output layer. This strategy elegantly solves the label
permutation problem and speaker tracing problem in one shot. Our experiments on
artificially mixed AMI data showed that the proposed approach is very
promising.
Comment: 5 pages, 6 figures, InterSpeech201
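The min-over-assignments step described above can be sketched directly: compute the utterance-averaged cross entropy for every output-to-speaker assignment and keep the smallest. The function below is a toy version for log posteriors and integer labels; the name and data layout are ours.

```python
import numpy as np
from itertools import permutations

def pit_cross_entropy(log_probs, targets):
    """Utterance-level PIT loss: average the frame-level cross entropy over
    the whole utterance for each possible output-to-speaker assignment,
    then pick the minimum.
    log_probs: (num_outputs, T, C) log posteriors per output layer.
    targets:   (num_speakers, T) integer class labels per speaker.
    Returns (best_loss, best_permutation)."""
    S, T = targets.shape
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(S)):
        ce = 0.0
        for out, spk in enumerate(perm):
            # Negative log-likelihood of speaker spk's labels at output out.
            ce -= log_probs[out, np.arange(T), targets[spk]].mean()
        ce /= S
        if ce < best_loss:
            best_loss, best_perm = ce, perm
    return best_loss, best_perm
```

Because the same assignment is used for all frames of an utterance, each output layer is forced to follow one speaker throughout, which is how the label-permutation and speaker-tracing problems are solved in one shot.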