2,826 research outputs found
Complex Event Recognition from Images with Few Training Examples
We propose to leverage concept-level representations for complex event
recognition in photographs given limited training examples. We introduce a
novel framework to discover event concept attributes from the web and use that
to extract semantic features from images and classify them into social event
categories with few training examples. Discovered concepts include a variety of
objects, scenes, actions and event sub-types, leading to a discriminative and
compact representation for event images. Web images are obtained for each
discovered event concept and we use (pretrained) CNN features to train concept
classifiers. Extensive experiments on challenging event datasets demonstrate
that our proposed method outperforms several baselines using deep CNN features
directly in classifying images into events with limited training examples. We
also demonstrate that our method achieves the best overall accuracy on a
dataset with unseen event categories using a single training example.Comment: Accepted to Winter Applications of Computer Vision (WACV'17
Statistical learning methods for mining marketing and biological data
Nowadays, the value of data has been broadly recognized and emphasized. More and more decisions are made based on data and analysis rather than solely on experience and intuition. With the fast development of networking, data storage, and data collection capacity, data have increased dramatically in industry, science and engineering domains, which brings both great opportunities and challenges. To take advantage of the data flood, new computational methods are in demand to process, analyze and understand these datasets.
This dissertation focuses on the development of statistical learning methods for online advertising and bioinformatics to model real world data with temporal or spatial changes. First, a collaborated online change-point detection method is proposed to identify the change-points in sparse time series. It leverages the signals from the auxiliary time series such as engagement metrics to compensate the sparse revenue data and improve detection efficiency and accuracy through smart collaboration. Second, a task-specific multi-task learning algorithm is developed to model the ever-changing video viewing behaviors. With the 1-regularized task-specific features and jointly estimated shared features, it allows different models to seek common ground while reserving differences. Third, an empirical Bayes method is proposed to identify 3\u27 and 5\u27 alternative splicing in RNA-seq data. It formulates alternative 3\u27 and 5\u27 splicing site selection as a change-point problem and provides for the first time a systematic framework to pool information across genes and integrate various information when available, in particular the useful junction read information, in order to obtain better performance
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is
not always available, highly depends on the decoding process and sometimes
tends to over estimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms
standard ROVER and it is competitive with two strong oracles that e xploit
prior knowledge about the real quality of the hypotheses to be combined.
Compared to standard ROVER, the abs olute WER improvements in the two
evaluation scenarios range from 0.5% to 7.3%
Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting
Keyword spotting (KWS) constitutes a major component of human-technology
interfaces. Maximizing the detection accuracy at a low false alarm (FA) rate,
while minimizing the footprint size, latency and complexity are the goals for
KWS. Towards achieving them, we study Convolutional Recurrent Neural Networks
(CRNNs). Inspired by large-scale state-of-the-art speech recognition systems,
we combine the strengths of convolutional layers and recurrent layers to
exploit local structure and long-range context. We analyze the effect of
architecture parameters, and propose training strategies to improve
performance. With only ~230k parameters, our CRNN model yields acceptably low
latency, and achieves 97.71% accuracy at 0.5 FA/hour for 5 dB signal-to-noise
ratio.Comment: Accepted to Interspeech 201
- …