5,620 research outputs found
A Practical Framework for Relation Extraction with Noisy Labels Based on Doubly Transitional Loss
Both human annotation and rule-based automatic labeling are effective
ways to augment data for relation extraction. However, the inevitable
wrong-labeling problem, for example from distant supervision, may degrade the
performance of many existing methods. To address this issue, we introduce a
practical end-to-end deep learning framework, including a standard feature
extractor and a novel noisy classifier with our proposed doubly transitional
mechanism. One transition is basically parameterized by a non-linear
transformation between hidden layers that implicitly represents the conversion
between the true and noisy labels, and it can be readily optimized together
with other model parameters. Another is an explicit probability transition
matrix that captures the direct conversion between labels but needs to be
derived from an EM algorithm. We conduct experiments on the NYT dataset and
SemEval 2018 Task 7. The empirical results show comparable or better
performance over state-of-the-art methods.
Comment: 10 pages
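The explicit probability transition matrix described in the abstract above can be illustrated with a minimal sketch. It assumes a matrix `T` whose entry `T[i][j]` is the probability that true label `i` is observed as noisy label `j`. The function name, the three-relation setup, and all numbers are hypothetical and not taken from the paper, and the EM estimation of the matrix is omitted.

```python
def noisy_label_distribution(p_true, transition):
    """Map a distribution over true labels to one over observed (noisy) labels.

    transition[i][j] is assumed to be P(noisy = j | true = i),
    so every row of the matrix sums to 1.
    """
    n_noisy = len(transition[0])
    return [
        sum(p_true[i] * transition[i][j] for i in range(len(p_true)))
        for j in range(n_noisy)
    ]


# Hypothetical transition matrix for three relation labels.
T = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.1, 0.9],
]

# Hypothetical classifier output over the true labels.
p_true = [0.6, 0.3, 0.1]

# Distribution the model would be expected to match against the noisy labels.
p_noisy = noisy_label_distribution(p_true, T)
print(p_noisy)
```

Training against `p_noisy` rather than `p_true` is one common way such a matrix is used: the loss is computed on the noisy side, while predictions at test time are read from the true-label distribution.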
PILAE: A Non-gradient Descent Learning Scheme for Deep Feedforward Neural Networks
In this work, a non-gradient descent learning scheme is proposed for deep
feedforward neural networks (DNN). As is well known, autoencoders can be used as
the building blocks of a multi-layer perceptron (MLP) deep neural network. So,
the MLP will be taken as an example to illustrate the proposed scheme of
pseudoinverse learning algorithm for autoencoder (PILAE) training. The PILAE
with low rank approximation is a non-gradient based learning algorithm, and the
encoder weight matrix is set to be the low rank approximation of the
pseudoinverse of the input matrix, while the decoder weight matrix is
calculated by the pseudoinverse learning algorithm. It is worth noting that
only a few network structure hyperparameters need to be tuned. Hence, the
proposed algorithm can be regarded as a quasi-automated training algorithm
that can be used in the autonomous machine learning research field. The
experimental results show that the proposed learning scheme for DNN achieves
better performance when the tradeoff between training efficiency and
classification accuracy is considered.
Comment: This work is our effort toward to realize AutoM
AD3: Attentive Deep Document Dater
Knowledge of the creation date of documents facilitates several tasks such as
summarization, event extraction, and temporally focused information extraction.
Unfortunately, for most of the documents on the Web, the time-stamp metadata is
either missing or can't be trusted. Thus, predicting creation time from
document content itself is an important task. In this paper, we propose
Attentive Deep Document Dater (AD3), an attention-based neural document dating
system which utilizes both context and temporal information in documents in a
flexible and principled manner. We perform extensive experimentation on
multiple real-world datasets to demonstrate the effectiveness of AD3 over
neural and non-neural baselines.
Deep Learning Architectures for Face Recognition in Video Surveillance
Face recognition (FR) systems for video surveillance (VS) applications
attempt to accurately detect the presence of target individuals over a
distributed network of cameras. In video-based FR systems, facial models of
target individuals are designed a priori during enrollment using a limited
number of reference still images or video data. These facial models are not
typically representative of faces being observed during operations due to large
variations in illumination, pose, scale, occlusion, and blur, and to camera
interoperability. Specifically, in still-to-video FR applications, a single
high-quality reference still image captured with a still camera under controlled
conditions is employed to generate a facial model to be matched later against
lower-quality faces captured with video cameras under uncontrolled conditions.
Current video-based FR systems perform well in controlled scenarios, while
their performance is not satisfactory in uncontrolled scenarios mainly because
of the differences between the source (enrollment) and the target (operational)
domains. Most of the efforts in this area have been toward the design of robust
video-based FR systems in unconstrained surveillance environments. This chapter
presents an overview of recent advances in the still-to-video FR scenario through
deep convolutional neural networks (CNNs). In particular, deep learning
architectures proposed in the literature based on the triplet-loss function (e.g.,
cross-correlation matching CNN, trunk-branch ensemble CNN and HaarNet) and
supervised autoencoders (e.g., canonical face representation CNN) are reviewed
and compared in terms of accuracy and computational complexity.
Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)
Automatic speech emotion recognition provides computers with critical context
to enable user understanding. While methods trained and tested within the same
dataset have been shown successful, they often fail when applied to unseen
datasets. To address this, recent work has focused on adversarial methods to
find more generalized representations of emotional speech. However, many of
these methods have issues converging, and only involve datasets collected in
laboratory conditions. In this paper, we introduce Adversarial Discriminative
Domain Generalization (ADDoG), which follows an easier-to-train "meet in the
middle" approach. The model iteratively moves representations learned for each
dataset closer to one another, improving cross-dataset generalization. We also
introduce Multiclass ADDoG, or MADDoG, which extends the proposed
method to more than two datasets simultaneously. Our results show consistent
convergence for the introduced methods, with significantly improved results
when not using labels from the target dataset. We also show how, in most cases,
ADDoG and MADDoG can be used to improve upon baseline state-of-the-art methods
when target dataset labels are added and in-the-wild data are considered. Even
though our experiments focus on cross-corpus speech emotion, these methods
could be used to remove unwanted factors of variation in other settings.
Deep Semantic Role Labeling with Self-Attention
Semantic Role Labeling (SRL) is believed to be a crucial step towards natural
language understanding and has been widely studied. In recent years, end-to-end
SRL with recurrent neural networks (RNN) has gained increasing attention.
However, it remains a major challenge for RNNs to handle structural information
and long range dependencies. In this paper, we present a simple and effective
architecture for SRL which aims to address these problems. Our model is based
on self-attention which can directly capture the relationships between two
tokens regardless of their distance. Our single model achieves F on
the CoNLL-2005 shared task dataset and F on the CoNLL-2012 shared task
dataset, which outperforms the previous state-of-the-art results by and
F score respectively. Besides, our model is computationally
efficient, and the parsing speed is 50K tokens per second on a single Titan X
GPU.
Comment: Accepted by AAAI-201
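The claim that self-attention relates two tokens regardless of their distance can be illustrated with a minimal sketch of scaled dot-product self-attention. It assumes identity projections (queries, keys, and values are all the raw token vectors), a single head, and made-up two-dimensional token vectors. It is not the architecture from the paper, which the abstract does not specify in detail.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (assumption).

    tokens: list of equal-length float vectors, one per position.
    Returns (outputs, weights), where weights[i][j] is how much
    position i attends to position j.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this query with every position, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in tokens
        ]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all token vectors.
        outputs.append(
            [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        )
    return outputs, weights


# Hypothetical sequence: positions 0 and 3 carry the same vector.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
print(weights[0])
```

In this sketch position 0 attends to position 3 exactly as strongly as to itself, even though they are three steps apart, because no positional information enters the score. A recurrent model would have to carry that information through the intermediate positions.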
Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention
Keyword spotting (KWS) and speaker verification (SV) have been studied
independently although it is known that acoustic and speaker domains are
complementary. In this paper, we propose a multi-task network that performs KWS
and SV simultaneously to fully utilize the interrelated domain information. The
multi-task network tightly combines sub-networks aiming at performance
improvement in challenging conditions such as noisy environments,
open-vocabulary KWS, and short-duration SV, by introducing novel techniques of
connectionist temporal classification (CTC)-based soft voice activity detection
(VAD) and global query attention. Frame-level acoustic and speaker information
is integrated with phonetically originated weights to form a word-level
global representation, which is then used for the aggregation of feature vectors
to generate discriminative embeddings. Our proposed approach shows 4.06% and
26.71% relative improvements in equal error rate (EER) compared to the
baselines for both tasks. We also present a visualization example and results
of ablation experiments.
Comment: Accepted to Interspeech 202
Neural Metric Learning for Fast End-to-End Relation Extraction
Relation extraction (RE) is an indispensable information extraction task in
several disciplines. RE models typically assume that named entity recognition
(NER) is already performed in a previous step by another independent model.
Several recent efforts, under the theme of end-to-end RE, seek to exploit
inter-task correlations by modeling both NER and RE tasks jointly. Earlier work
in this area commonly reduces the task to a table-filling problem wherein an
additional expensive decoding step involving beam search is applied to obtain
globally consistent cell labels. In efforts that do not employ table-filling,
global optimization in the form of CRFs with Viterbi decoding for the NER
component is still necessary for competitive performance. We introduce a novel
neural architecture utilizing the table structure, based on repeated
applications of 2D convolutions for pooling local dependency and metric-based
features, that improves on the state-of-the-art without the need for global
optimization. We validate our model on the ADE and CoNLL04 datasets for
end-to-end RE and demonstrate gain (in F-score) over prior best
results with training and testing times that are seven to ten times faster,
the latter highly advantageous for time-sensitive end-user applications.
Unsupervised training of neural mask-based beamforming
We present an unsupervised training approach for a neural network-based mask
estimator in an acoustic beamforming application. The network is trained to
maximize a likelihood criterion derived from a spatial mixture model of the
observations. It is trained from scratch without requiring any parallel data
consisting of degraded input and clean training targets. Thus, training can be
carried out on real recordings of noisy speech rather than simulated ones. In
contrast to previous work on unsupervised training of neural mask estimators,
our approach avoids the need for a possibly pre-trained teacher model entirely.
We demonstrate the effectiveness of our approach by speech recognition
experiments on two different datasets: one mainly deteriorated by noise (CHiME
4) and one by reverberation (REVERB). The results show that the performance of
the proposed system is on par with a supervised system using oracle target
masks for training and with a system trained using a model-based teacher.
Comment: Correction to Eq. 11: Hermite symbol was on the wrong variable.
Replaces y with the normalized version
Dynamic Memory Networks for Visual and Textual Question Answering
Neural network architectures with memory and attention mechanisms exhibit
certain reasoning capabilities required for question answering. One such
architecture, the dynamic memory network (DMN), obtained high accuracy on a
variety of language tasks. However, it was not shown whether the architecture
achieves strong results for question answering when supporting facts are not
marked during training or whether it could be applied to other modalities such
as images. Based on an analysis of the DMN, we propose several improvements to
its memory and input modules. Together with these changes we introduce a novel
input module for images in order to be able to answer visual questions. Our new
DMN+ model improves the state of the art on both the Visual Question Answering
dataset and the bAbI-10k text question-answering dataset without supporting
fact supervision.