In the sight of my wearable camera: Classifying my visual experience
We introduce and analyze a new dataset that resembles the input to
biological vision systems much more closely than most previously published ones. Our
analysis led to several important conclusions. First, it is possible to
disambiguate over dozens of visual scenes (locations) encountered over the
course of several weeks of a human life with accuracy of over 80%, and this
opens up the possibility of numerous novel vision applications, from early
detection of dementia to everyday use of wearable camera streams for automatic
reminders and visual stream exchange. Second, our experimental results
indicate that generative models such as Latent Dirichlet Allocation or
Counting Grids are more suitable for such types of data, as they are more
robust to overtraining and cope well with images that are low resolution, blurred,
and characterized by relatively random clutter and a mix of objects.
No Need to Pay Attention: Simple Recurrent Neural Networks Work! (for Answering "Simple" Questions)
First-order factoid question answering assumes that the question can be
answered by a single fact in a knowledge base (KB). While this does not seem
like a challenging task, many recent attempts that apply either complex
linguistic reasoning or deep neural networks achieve 65%-76% accuracy on
benchmark sets. Our approach formulates the task as two machine learning
problems: detecting the entities in the question, and classifying the question
as one of the relation types in the KB. We train a recurrent neural network to
solve each problem. On the SimpleQuestions dataset, our approach yields
substantial improvements over previously published results --- even neural
networks based on much more complex architectures. The simplicity of our
approach also has practical advantages, such as efficiency and modularity, that
are valuable especially in an industry setting. In fact, we present a
preliminary analysis of the performance of our model on real queries from
Comcast's X1 entertainment platform with millions of users every day.
Comment: 7 pages, to appear in EMNLP 201
Multidimensional counting grids: Inferring word order from disordered bags of words
Models of bags of words typically assume topic mixing so that the words in a
single bag come from a limited number of topics. We show here that many sets of
bags of words exhibit a very different pattern of variation than the patterns
that are efficiently captured by topic mixing. In many cases, from one bag of
words to the next, the words disappear and new ones appear as if the theme
slowly and smoothly shifted across documents (provided that the documents are
somehow ordered). Examples of latent structure that describe such ordering are
easily imagined. For example, the advancement of the date of the news stories
is reflected in a smooth change in the theme of the day as certain evolving
news stories fall out of favor and new events create new stories. Overlaps
among the stories of consecutive days can be modeled by using windows over
linearly arranged tight distributions over words. We show here that such a
strategy can be extended to multiple dimensions and to cases where the ordering of
data is not readily obvious. We demonstrate that this way of modeling
covariation in word occurrences outperforms standard topic models in
classification and prediction tasks in applications in biology, text modeling
and computer vision.
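A minimal sketch of the windowing idea described in this abstract, assuming a one-dimensional grid whose cells each hold a tight distribution over words. A bag of words is then modeled by the average of the cell distributions inside one window, so neighboring windows share most of their content. The grid contents, the wrap-around at the end of the grid, and the name `window_distribution` are illustrative assumptions, not details taken from the paper.

```python
def window_distribution(grid, start, width):
    """Average the per-cell word distributions inside one window.

    grid  : list of dicts mapping word -> probability (one dict per grid cell)
    start : index of the first cell in the window
    width : number of consecutive cells in the window (wraps around the end)
    """
    mixed = {}
    for offset in range(width):
        cell = grid[(start + offset) % len(grid)]
        for word, prob in cell.items():
            mixed[word] = mixed.get(word, 0.0) + prob / width
    return mixed


# Hypothetical grid: each cell is a tight distribution over a few words.
grid = [
    {"election": 1.0},
    {"election": 0.5, "recount": 0.5},
    {"recount": 1.0},
    {"storm": 1.0},
]

# Two overlapping windows, e.g. the news themes of two consecutive days.
day1 = window_distribution(grid, start=0, width=2)
day2 = window_distribution(grid, start=1, width=2)
print(day1)
print(day2)
```

Shifting the window by one cell moves probability mass from "election" toward "recount" while both words stay present, which is the smooth thematic drift the abstract describes.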
Discovering Patterns in Biological Sequences by Optimal Segmentation
Computational methods for discovering patterns of local correlations in
sequences are important in computational biology. Here we show how to determine
the optimal partitioning of aligned sequences into non-overlapping segments
such that positions in the same segment are strongly correlated while positions
in different segments are not. Our approach involves discovering the hidden
variables of a Bayesian network that interact with observed sequences so as to
form a set of independent mixture models. We introduce a dynamic program to
efficiently discover the optimal segmentation, or equivalently the optimal set
of hidden variables. We evaluate our approach on two computational biology
tasks. One task is related to the design of vaccines against polymorphic
pathogens and the other task involves analysis of single nucleotide
polymorphisms (SNPs) in human DNA. We show how common tasks in these problems
naturally correspond to inference procedures in the learned models. Error rates
of our learned models for the prediction of missing SNPs are up to 1/3 less
than the error rates of a state-of-the-art SNP prediction method. Source code
is available at www.uwm.edu/~joebock/segmentation.
Comment: Appears in Proceedings of the Twenty-Third Conference on Uncertainty
in Artificial Intelligence (UAI2007)
Degrees of Freedom in Deep Neural Networks
In this paper, we explore degrees of freedom in deep sigmoidal neural
networks. We show that the degrees of freedom in these models are related to the
expected optimism, which is the expected difference between test error and
training error. We provide an efficient Monte-Carlo method to estimate the
degrees of freedom for multi-class classification methods. We show degrees of
freedom are lower than the parameter count in a simple XOR network. We extend
these results to neural nets trained on synthetic and real data, and
investigate the impact of network architecture and different regularization
choices. The degrees of freedom in deep networks are dramatically smaller than
the number of parameters, in some real datasets by several orders of magnitude.
Further, we observe that for a fixed number of parameters, deeper networks have
fewer degrees of freedom, exhibiting a regularization-by-depth.
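To make the notion of degrees of freedom concrete, here is a small sketch in the classical squared-error regression setting, where the degrees of freedom equal the summed sensitivity of each fitted value to its own observation, and the expected optimism is 2σ²·df/n for noise variance σ². The estimator below perturbs the targets with Gaussian noise and measures how the fit responds. It is an illustration under those assumptions only: the toy smoother `shrink_fit`, the function `monte_carlo_dof`, and all parameter values are hypothetical, and the paper's estimator for multi-class classification may differ.

```python
import random


def shrink_fit(y, a):
    """Toy linear smoother: pull each observation toward the sample mean."""
    mean = sum(y) / len(y)
    return [a * yi + (1.0 - a) * mean for yi in y]


def monte_carlo_dof(fit, y, eps=1e-3, reps=2000, seed=0):
    """Estimate sum_i d(yhat_i)/d(y_i) using random Gaussian perturbations."""
    rng = random.Random(seed)
    base = fit(y)
    total = 0.0
    for _ in range(reps):
        delta = [rng.gauss(0.0, 1.0) for _ in y]
        bumped = fit([yi + eps * di for yi, di in zip(y, delta)])
        # Directional sensitivity of the fit, projected back onto the perturbation.
        total += sum((b - f) * d for b, f, d in zip(bumped, base, delta)) / eps
    return total / reps


y = [float(i % 5) for i in range(20)]
a = 0.5
estimate = monte_carlo_dof(lambda v: shrink_fit(v, a), y)
exact = len(y) * a + (1.0 - a)  # trace of the smoother matrix for this toy fit
print(estimate, exact)
```

For this toy smoother the exact value is known in closed form, so the Monte-Carlo estimate can be checked against it; for a neural network no such closed form exists, which is why a sampling-based estimate is useful.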
Joint discovery of haplotype blocks and complex trait associations from SNP sequences
Haplotypes, the global patterns of DNA sequence variation, have important
implications for identifying complex traits. Recently, blocks of limited
haplotype diversity have been discovered in human chromosomes, intensifying the
research on modelling the block structure as well as the transitions or
co-occurrence of the alleles in these blocks as a way to compress the
variability and infer the associations more robustly. The haplotype block
structure analysis is typically complicated by the fact that the phase
information for each SNP is missing, i.e., the observed allele pairs are not
given in a consistent order across the sequence. The techniques for
circumventing this require additional information, such as family data, or a
more complex sequencing procedure. In this paper we present a hierarchical
statistical model and the associated learning and inference algorithms that
simultaneously deal with the allele ambiguity per locus, missing data, block
estimation, and the complex trait association. While the block structure may
differ from the structures inferred by other methods, which use the pedigree
information or previously known alleles, the parameters we estimate, including
the learned block structure and the estimated block transitions per locus,
define a good model of variability in the set. The method is completely
data-driven and can detect Crohn's disease from the SNP data taken from
human chromosome 5q31 with a detection rate of 80% and a small error
variance.
Comment: Appears in Proceedings of the Twentieth Conference on Uncertainty in
Artificial Intelligence (UAI2004)
Hierarchical learning of grids of microtopics
The counting grid is a grid of microtopics, sparse word/feature
distributions. The generative model associated with the grid does not use these
microtopics individually. Rather, it groups them in overlapping rectangular
windows and uses these grouped microtopics as either mixture or admixture
components. This paper builds upon the basic counting grid model and shows
that hierarchical reasoning helps avoid bad local minima, produces better
classification accuracy and, most interestingly, allows for extraction of large
numbers of coherent microtopics even from small datasets. We evaluate this in
terms of consistency, diversity and clarity of the indexed content, as well as
in a user study on word intrusion tasks. We demonstrate that these models work
well as a technique for embedding raw images and discuss interesting parallels
between hierarchical CG models and other deep architectures.
Comment: To appear in Uncertainty in Artificial Intelligence - UAI 201
On the Suboptimality of Proximal Gradient Descent for Sparse Approximation
We study the proximal gradient descent (PGD) method for the sparse
approximation problem, as well as its acceleration with randomized
algorithms. We first offer a theoretical analysis of PGD showing
a bounded gap between the sub-optimal solution obtained by PGD and the globally
optimal solution for the sparse approximation problem under
conditions weaker than the Restricted Isometry Property widely used in the compressive
sensing literature. Moreover, we propose randomized algorithms to accelerate
the optimization by PGD using randomized low rank matrix approximation
(PGD-RMA) and randomized dimension reduction (PGD-RDR). Our randomized
algorithms substantially reduce the computation cost of the original PGD for
the sparse approximation problem, and the resultant sub-optimal
solution still enjoys provable suboptimality; namely, the sub-optimal solution
to the reduced problem still has a bounded gap to the globally optimal solution
to the original problem.
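For readers unfamiliar with proximal gradient descent, the following is a minimal sketch of the basic iteration: a gradient step on the least-squares term followed by a thresholding (proximal) step that promotes sparsity. The abstract as it appears here does not state the sparsity penalty, so the sketch assumes an l1 penalty, whose proximal operator is soft thresholding. The function names, the toy problem, and the step size are illustrative assumptions, and the randomized accelerations (PGD-RMA, PGD-RDR) are not shown.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v| (an l1 penalty, assumed here)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def proximal_gradient_descent(A, b, lam, step, iters=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient steps.

    The step size should not exceed 1 / L, where L is the largest
    eigenvalue of A^T A.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(iters):
        # Residual r = A x - b
        r = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i] for i in range(n_rows)]
        # Gradient of the smooth part: A^T r
        grad = [sum(A[i][j] * r[i] for i in range(n_rows)) for j in range(n_cols)]
        # Gradient step followed by the proximal (thresholding) step
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n_cols)]
    return x


# Toy problem with an identity dictionary, so the answer is easy to check by hand.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.05, -2.0]
x = proximal_gradient_descent(A, b, lam=0.1, step=1.0)
print(x)
```

With an identity dictionary the solution is simply the soft-thresholded target, so the small middle entry is set exactly to zero while the large entries are shrunk by `lam`.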
Talking to Your TV: Context-Aware Voice Search with Hierarchical Recurrent Neural Networks
We tackle the novel problem of navigational voice queries posed against an
entertainment system, where viewers interact with a voice-enabled remote
controller to specify the program to watch. This is a difficult problem for
several reasons: such queries are short, even shorter than comparable voice
queries in other domains, which offers fewer opportunities for deciphering user
intent. Furthermore, ambiguity is exacerbated by underlying speech recognition
errors. We address these challenges by integrating word- and character-level
representations of the queries and by modeling voice search sessions to capture
the contextual dependencies in query sequences. Both are accomplished with a
probabilistic framework in which recurrent and feedforward neural network
modules are organized in a hierarchical manner. From a raw dataset of 32M voice
queries from 2.5M viewers on the Comcast Xfinity X1 entertainment system, we
extracted data to train and test our models. We demonstrate the benefits of our
hybrid representation and context-aware model, which significantly outperforms
models without context as well as the currently deployed product.
Learning Markov Chain in Unordered Dataset
The assumption that data samples are independently identically distributed is
the backbone of many learning algorithms. Nevertheless, datasets often exhibit
rich structure in practice, and we argue that there exist some unknown order
within the data instances. In this technical report, we introduce OrderNet that
can be used to extract the order of data instances in an unsupervised way. By
assuming that the instances are sampled from a Markov chain, our goal is to
learn the transitional operator of the underlying Markov chain, as well as the
order by maximizing the generation probability under all possible data
permutations. Specifically, we use a neural network as a compact and soft lookup
table to approximate the possibly huge, but discrete transition matrix. This
strategy allows us to amortize the space complexity with a single model.
Furthermore, this simple and compact representation also provides a short
description of the dataset and generalizes to unseen instances as well. To
ensure that the learned Markov chain is ergodic, we propose a greedy batch-wise
permutation scheme that allows fast training. Empirically, we show that
OrderNet is able to discover an order among data instances. We also extend the
proposed OrderNet to a one-shot recognition task and demonstrate favorable
results.
Comment: This would be the final update for this technical report on learning
Markov Chain in the unordered dataset
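The idea of recovering an order from transition probabilities can be sketched as follows, assuming the transition scores are already available as a small table. In the abstract this table is approximated by a neural network; the fixed matrix here is a hypothetical stand-in. The greedy rule shown (always move to the most likely unused instance) is a simple heuristic for illustration and is not the authors' batch-wise permutation scheme.

```python
def greedy_order(transition, start):
    """Build an ordering by always moving to the most likely unused instance.

    transition[i][j] is a score for instance j following instance i.
    """
    order = [start]
    unused = set(range(len(transition))) - {start}
    while unused:
        current = order[-1]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(unused), key=lambda j: transition[current][j])
        order.append(nxt)
        unused.remove(nxt)
    return order


# Hypothetical transition scores for four data instances.
transition = [
    [0.0, 0.1, 0.7, 0.2],
    [0.3, 0.0, 0.3, 0.4],
    [0.1, 0.8, 0.0, 0.1],
    [0.5, 0.2, 0.3, 0.0],
]
order = greedy_order(transition, start=0)
print(order)
```

A greedy pass like this only approximates the maximization over all permutations mentioned in the abstract, since an early choice can rule out a better overall ordering.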