1,207 research outputs found
The Microsoft 2017 Conversational Speech Recognition System
We describe the 2017 version of Microsoft's conversational speech recognition
system, in which we update our 2016 system with recent developments in
neural-network-based acoustic and language modeling to further advance the
state of the art on the Switchboard speech recognition task. The system adds a
CNN-BLSTM acoustic model to the set of model architectures we combined
previously, and includes character-based and dialog session aware LSTM language
models in rescoring. For system combination we adopt a two-stage approach,
whereby subsets of acoustic models are first combined at the senone/frame
level, followed by a word-level voting via confusion networks. We also added a
confusion network rescoring step after system combination. The resulting system
yields a 5.1\% word error rate on the 2000 Switchboard evaluation set
A Universal Parallel Two-Pass MDL Context Tree Compression Algorithm
Computing problems that handle large amounts of data necessitate the use of
lossless data compression for efficient storage and transmission. We present a
novel lossless universal data compression algorithm that uses parallel
computational units to increase the throughput. The length- input sequence
is partitioned into blocks. Processing each block independently of the
other blocks can accelerate the computation by a factor of , but degrades
the compression quality. Instead, our approach is to first estimate the minimum
description length (MDL) context tree source underlying the entire input, and
then encode each of the blocks in parallel based on the MDL source. With
this two-pass approach, the compression loss incurred by using more parallel
units is insignificant. Our algorithm is work-efficient, i.e., its
computational complexity is . Its redundancy is approximately
bits above Rissanen's lower bound on universal compression
performance, with respect to any context tree source whose maximal depth is at
most . We improve the compression by using different quantizers for
states of the context tree based on the number of symbols corresponding to
those states. Numerical results from a prototype implementation suggest that
our algorithm offers a better trade-off between compression and throughput than
competing universal data compression algorithms.Comment: Accepted to Journal of Selected Topics in Signal Processing special
issue on Signal Processing for Big Data (expected publication date June
2015). 10 pages double column, 6 figures, and 2 tables. arXiv admin note:
substantial text overlap with arXiv:1405.6322. Version: Mar 2015: Corrected a
typ
Activity recognition from videos with parallel hypergraph matching on GPUs
In this paper, we propose a method for activity recognition from videos based
on sparse local features and hypergraph matching. We benefit from special
properties of the temporal domain in the data to derive a sequential and fast
graph matching algorithm for GPUs.
Traditionally, graphs and hypergraphs are frequently used to recognize
complex and often non-rigid patterns in computer vision, either through graph
matching or point-set matching with graphs. Most formulations resort to the
minimization of a difficult discrete energy function mixing geometric or
structural terms with data attached terms involving appearance features.
Traditional methods solve this minimization problem approximately, for instance
with spectral techniques.
In this work, instead of solving the problem approximatively, the exact
solution for the optimal assignment is calculated in parallel on GPUs. The
graphical structure is simplified and regularized, which allows to derive an
efficient recursive minimization algorithm. The algorithm distributes
subproblems over the calculation units of a GPU, which solves them in parallel,
allowing the system to run faster than real-time on medium-end GPUs
Coupled Depth Learning
In this paper we propose a method for estimating depth from a single image
using a coarse to fine approach. We argue that modeling the fine depth details
is easier after a coarse depth map has been computed. We express a global
(coarse) depth map of an image as a linear combination of a depth basis learned
from training examples. The depth basis captures spatial and statistical
regularities and reduces the problem of global depth estimation to the task of
predicting the input-specific coefficients in the linear combination. This is
formulated as a regression problem from a holistic representation of the image.
Crucially, the depth basis and the regression function are {\bf coupled} and
jointly optimized by our learning scheme. We demonstrate that this results in a
significant improvement in accuracy compared to direct regression of depth
pixel values or approaches learning the depth basis disjointly from the
regression function. The global depth estimate is then used as a guidance by a
local refinement method that introduces depth details that were not captured at
the global level. Experiments on the NYUv2 and KITTI datasets show that our
method outperforms the existing state-of-the-art at a considerably lower
computational cost for both training and testing.Comment: 10 pages, 3 Figures, 4 Tables with quantitative evaluation
Playing Smart - Artificial Intelligence in Computer Games
Abstract: With this document we will present an overview of artificial intelligence in general and artificial intelligence in the context of its use in modern computer games in particular. To this end we will firstly provide an introduction to the terminology of artificial intelligence, followed by a brief history of this field of computer science and finally we will discuss the impact which this science has had on the development of computer games. This will be further illustrated by a number of case studies, looking at how artificially intelligent behaviour has been achieved in selected games
Learning to Embed Words in Context for Syntactic Tasks
We present models for embedding words in the context of surrounding words.
Such models, which we refer to as token embeddings, represent the
characteristics of a word that are specific to a given context, such as word
sense, syntactic category, and semantic role. We explore simple, efficient
token embedding models based on standard neural network architectures. We learn
token embeddings on a large amount of unannotated text and evaluate them as
features for part-of-speech taggers and dependency parsers trained on much
smaller amounts of annotated data. We find that predictors endowed with token
embeddings consistently outperform baseline predictors across a range of
context window and training set sizes.Comment: Accepted by ACL 2017 Repl4NLP worksho
- …