3,229 research outputs found
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is
not always available, highly depends on the decoding process and sometimes
tends to over estimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms
standard ROVER and it is competitive with two strong oracles that e xploit
prior knowledge about the real quality of the hypotheses to be combined.
Compared to standard ROVER, the abs olute WER improvements in the two
evaluation scenarios range from 0.5% to 7.3%
Love Thy Neighbors: Image Annotation by Exploiting Image Metadata
Some images that are difficult to recognize on their own may become more
clear in the context of a neighborhood of related images with similar
social-network metadata. We build on this intuition to improve multilabel image
annotation. Our model uses image metadata nonparametrically to generate
neighborhoods of related images using Jaccard similarities, then uses a deep
neural network to blend visual information from the image and its neighbors.
Prior work typically models image metadata parametrically, in contrast, our
nonparametric treatment allows our model to perform well even when the
vocabulary of metadata changes between training and testing. We perform
comprehensive experiments on the NUS-WIDE dataset, where we show that our model
outperforms state-of-the-art methods for multilabel image annotation even when
our model is forced to generalize to new types of metadata.Comment: Accepted to ICCV 201
Brain MR Image Segmentation: From Multi-Atlas Method To Deep Learning Models
Quantitative analysis of the brain structures on magnetic resonance (MR) images plays a crucial role in examining brain development and abnormality, as well as in aiding the treatment planning. Although manual delineation is commonly considered as the gold standard, it suffers from the shortcomings in terms of low efficiency and inter-rater variability. Therefore, developing automatic anatomical segmentation of human brain is of importance in providing a tool for quantitative analysis (e.g., volume measurement, shape analysis, cortical surface mapping). Despite a large number of existing techniques, the automatic segmentation of brain MR images remains a challenging task due to the complexity of the brain anatomical structures and the great inter- and intra-individual variability among these anatomical structures. To address the existing challenges, four methods are proposed in this thesis. The first work proposes a novel label fusion scheme for the multi-atlas segmentation. A two-stage majority voting scheme is developed to address the over-segmentation problem in the hippocampus segmentation of brain MR images. The second work of the thesis develops a supervoxel graphical model for the whole brain segmentation, in order to relieve the dependencies on complicated pairwise registration for the multi-atlas segmentation methods. Based on the assumption that pixels within a supervoxel are supposed to have the same label, the proposed method converts the voxel labeling problem to a supervoxel labeling problem which is solved by a maximum-a-posteriori (MAP) inference in Markov random field (MRF) defined on supervoxels. The third work incorporates attention mechanism into convolutional neural networks (CNN), aiming at learning the spatial dependencies between the shallow layers and the deep layers in CNN and producing an aggregation of the attended local feature and high-level features to obtain more precise segmentation results. The fourth method takes advantage of the success of CNN in computer vision, combines the strength of the graphical model with CNN, and integrates them into an end-to-end training network. The proposed methods are evaluated on public MR image datasets, such as MICCAI2012, LPBA40, and IBSR. Extensive experiments demonstrate the effectiveness and superior performance of the three proposed methods compared with the other state-of-the-art methods
- …