473 research outputs found

In the sight of my wearable camera: Classifying my visual experience

Author: Jojic Nebojsa
Perina Alessandro
Publication date: 26/04/2013

We introduce and analyze a new dataset that resembles the input to biological vision systems much more closely than most previously published ones. Our analysis led to several important conclusions. First, it is possible to disambiguate over dozens of visual scenes (locations) encountered over the course of several weeks of a human life with an accuracy of over 80%, which opens up the possibility of numerous novel vision applications, from early detection of dementia to everyday use of wearable camera streams for automatic reminders and visual stream exchange. Second, our experimental results indicate that generative models such as Latent Dirichlet Allocation or Counting Grids are more suitable for such types of data, as they are more robust to overtraining and cope well with images that are low-resolution, blurred, and characterized by relatively random clutter and a mix of objects.

arXiv.org e-Print Archive

No Need to Pay Attention: Simple Recurrent Neural Networks Work! (for Answering "Simple" Questions)

Author: Jojic Oliver
Ture Ferhan
Publication date: 28/07/2017

First-order factoid question answering assumes that the question can be answered by a single fact in a knowledge base (KB). While this does not seem like a challenging task, many recent attempts that apply either complex linguistic reasoning or deep neural networks achieve only 65%-76% accuracy on benchmark sets. Our approach formulates the task as two machine learning problems: detecting the entities in the question, and classifying the question as one of the relation types in the KB. We train a recurrent neural network to solve each problem. On the SimpleQuestions dataset, our approach yields substantial improvements over previously published results, even those of neural networks based on much more complex architectures. The simplicity of our approach also has practical advantages, such as efficiency and modularity, that are especially valuable in an industry setting. In fact, we present a preliminary analysis of the performance of our model on real queries from Comcast's X1 entertainment platform, which has millions of users every day. Comment: 7 pages, to appear in EMNLP 201

arXiv.org e-Print Archive

Multidimensional counting grids: Inferring word order from disordered bags of words

Author: Jojic Nebojsa
Perina Alessandro
Publication date: 14/02/2012

Models of bags of words typically assume topic mixing, so that the words in a single bag come from a limited number of topics. We show here that many sets of bags of words exhibit a pattern of variation very different from the patterns that are efficiently captured by topic mixing. In many cases, from one bag of words to the next, words disappear and new ones appear as if the theme slowly and smoothly shifted across documents (provided that the documents are somehow ordered). Examples of latent structure that describe such an ordering are easily imagined. For example, the advancement of the date of news stories is reflected in a smooth change in the theme of the day, as certain evolving news stories fall out of favor and new events create new stories. Overlaps among the stories of consecutive days can be modeled by using windows over linearly arranged tight distributions over words. We show here that such a strategy can be extended to multiple dimensions and to cases where the ordering of the data is not readily obvious. We demonstrate that this way of modeling covariation in word occurrences outperforms standard topic models in classification and prediction tasks in biology, text modeling and computer vision.

arXiv.org e-Print Archive

Discovering Patterns in Biological Sequences by Optimal Segmentation

Author: Bockhorst Joseph
Jojic Nebojsa
Publication date: 20/06/2012

Computational methods for discovering patterns of local correlations in sequences are important in computational biology. Here we show how to determine the optimal partitioning of aligned sequences into non-overlapping segments such that positions in the same segment are strongly correlated while positions in different segments are not. Our approach involves discovering the hidden variables of a Bayesian network that interact with observed sequences so as to form a set of independent mixture models. We introduce a dynamic program to efficiently discover the optimal segmentation, or equivalently the optimal set of hidden variables. We evaluate our approach on two computational biology tasks. One task is related to the design of vaccines against polymorphic pathogens, and the other involves analysis of single nucleotide polymorphisms (SNPs) in human DNA. We show how common tasks in these problems naturally correspond to inference procedures in the learned models. Error rates of our learned models for the prediction of missing SNPs are up to one third lower than the error rates of a state-of-the-art SNP prediction method. Source code is available at www.uwm.edu/~joebock/segmentation. Comment: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

arXiv.org e-Print Archive
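
The dynamic program mentioned in the abstract above can be illustrated with a generic optimal-partition recursion. This is only a sketch under stated assumptions, not the authors' algorithm: the function name `optimal_segmentation` and the `score(i, j)` callback are hypothetical, and the paper scores segments with mixture-model likelihoods rather than the toy scores one might plug in here.

```python
def optimal_segmentation(n, score):
    """Split positions 0..n-1 into contiguous segments maximizing the
    sum of score(i, j) over segments [i, j). Runs in O(n^2) score calls.

    `score` is a hypothetical callback; any additive segment score works.
    """
    # best[j] = best total score for the prefix of length j
    best = [0.0] + [float("-inf")] * n
    back = [0] * (n + 1)  # back[j] = start of the last segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + score(i, j)
            if candidate > best[j]:
                best[j] = candidate
                back[j] = i
    # Walk the back-pointers to recover the segment boundaries.
    segments = []
    j = n
    while j > 0:
        i = back[j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return best[n], segments


def make_toy_score(values, penalty):
    """Toy score (an assumption for illustration): negative squared
    deviation from the segment mean, minus a fixed per-segment penalty."""
    def score(i, j):
        seg = values[i:j]
        mean = sum(seg) / len(seg)
        return -sum((v - mean) ** 2 for v in seg) - penalty
    return score
```

With the toy score, the per-segment penalty keeps the recursion from splitting every position into its own segment; a likelihood-based score with a model-complexity term would play the same role.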

Degrees of Freedom in Deep Neural Networks

Author: Gao Tianxiang
Jojic Vladimir
Publication date: 03/06/2016

In this paper, we explore degrees of freedom in deep sigmoidal neural networks. We show that the degrees of freedom in these models are related to the expected optimism, which is the expected difference between test error and training error. We provide an efficient Monte-Carlo method to estimate the degrees of freedom for multi-class classification methods. We show that the degrees of freedom are lower than the parameter count in a simple XOR network. We extend these results to neural nets trained on synthetic and real data, and investigate the impact of network architecture and different regularization choices. The degrees of freedom in deep networks are dramatically smaller than the number of parameters, in some real datasets by several orders of magnitude. Further, we observe that for a fixed number of parameters, deeper networks have fewer degrees of freedom, exhibiting a regularization-by-depth.

arXiv.org e-Print Archive
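
As a rough illustration of the idea in the abstract above, the sketch below estimates degrees of freedom by Monte Carlo using the classical covariance definition for squared-error regression, df = (1/sigma^2) * sum_i cov(yhat_i, y_i). This is an assumption made for illustration: it is not the paper's estimator for multi-class classifiers, and the names `monte_carlo_dof` and `fit_mean` are hypothetical.

```python
import random


def fit_mean(y):
    """A one-parameter 'model': predict the sample mean everywhere."""
    m = sum(y) / len(y)
    return [m] * len(y)


def monte_carlo_dof(fit, mu, sigma, trials, seed=0):
    """Estimate df = (1/sigma^2) * sum_i cov(fit(y)_i, y_i) by resampling
    y = mu + Gaussian noise and averaging over `trials` draws."""
    rng = random.Random(seed)
    n = len(mu)
    sum_y = [0.0] * n
    sum_f = [0.0] * n
    sum_fy = [0.0] * n
    for _ in range(trials):
        y = [m + rng.gauss(0.0, sigma) for m in mu]
        f = fit(y)
        for i in range(n):
            sum_y[i] += y[i]
            sum_f[i] += f[i]
            sum_fy[i] += f[i] * y[i]
    # Empirical covariance between each fitted value and its own target.
    cov_total = 0.0
    for i in range(n):
        cov_total += sum_fy[i] / trials - (sum_f[i] / trials) * (sum_y[i] / trials)
    return cov_total / sigma ** 2
```

For the mean predictor the estimate should come out close to 1, matching its single free parameter, while a fit that simply copies its targets should come out close to the number of data points.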

Joint discovery of haplotype blocks and complex trait associations from SNP sequences

Author: Heckerman David
Jojic Nebojsa
Jojic Vladimir
Publication date: 11/07/2012

Haplotypes, the global patterns of DNA sequence variation, have important implications for identifying complex traits. Recently, blocks of limited haplotype diversity have been discovered in human chromosomes, intensifying the research on modelling the block structure as well as the transitions or co-occurrence of the alleles in these blocks as a way to compress the variability and infer the associations more robustly. The haplotype block structure analysis is typically complicated by the fact that the phase information for each SNP is missing, i.e., the observed allele pairs are not given in a consistent order across the sequence. The techniques for circumventing this require additional information, such as family data, or a more complex sequencing procedure. In this paper we present a hierarchical statistical model and the associated learning and inference algorithms that simultaneously deal with the allele ambiguity per locus, missing data, block estimation, and the complex trait association. While the block structure may differ from the structures inferred by other methods, which use the pedigree information or previously known alleles, the parameters we estimate, including the learned block structure and the estimated block transitions per locus, define a good model of variability in the set. The method is completely data-driven and can detect Crohn's disease from the SNP data taken from the human chromosome 5q31 with a detection rate of 80% and a small error variance. Comment: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)

arXiv.org e-Print Archive

Hierarchical learning of grids of microtopics

Author: Jojic Nebojsa
Kim Dongwoo
Perina Alessandro
Publication date: 08/06/2016

The counting grid is a grid of microtopics, which are sparse word/feature distributions. The generative model associated with the grid does not use these microtopics individually. Rather, it groups them in overlapping rectangular windows and uses these grouped microtopics as either mixture or admixture components. This paper builds upon the basic counting grid model and shows that hierarchical reasoning helps avoid bad local minima, produces better classification accuracy and, most interestingly, allows for extraction of large numbers of coherent microtopics even from small datasets. We evaluate this in terms of consistency, diversity and clarity of the indexed content, as well as in a user study on word intrusion tasks. We demonstrate that these models work well as a technique for embedding raw images and discuss interesting parallels between hierarchical CG models and other deep architectures. Comment: To Appear in Uncertainty in Artificial Intelligence - UAI 201

arXiv.org e-Print Archive

On the Suboptimality of Proximal Gradient Descent for $\ell^{0}$ Sparse Approximation

Author: Feng Jiashi
Huang Thomas S.
Jojic Nebojsa
Yang Jianchao
Yang Yingzhen
Publication date: 05/09/2017

We study the proximal gradient descent (PGD) method for the $\ell^{0}$ sparse approximation problem, as well as its accelerated optimization with randomized algorithms. We first offer a theoretical analysis of PGD showing the bounded gap between the sub-optimal solution obtained by PGD and the globally optimal solution for the $\ell^{0}$ sparse approximation problem, under conditions weaker than the Restricted Isometry Property widely used in the compressive sensing literature. Moreover, we propose randomized algorithms to accelerate the optimization by PGD using randomized low rank matrix approximation (PGD-RMA) and randomized dimension reduction (PGD-RDR). Our randomized algorithms substantially reduce the computation cost of the original PGD for the $\ell^{0}$ sparse approximation problem, and the resultant sub-optimal solution still enjoys provable suboptimality: the sub-optimal solution to the reduced problem still has a bounded gap to the globally optimal solution to the original problem.

arXiv.org e-Print Archive
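
For readers unfamiliar with the method named in the abstract above, a plain proximal gradient iteration for an $\ell^{0}$-penalized least-squares objective can be sketched as follows. This is a minimal illustration, assuming the objective ||x - D a||^2 + lam * ||a||_0 and a conservative step size derived from the Frobenius norm of D; the names `pgd_l0`, `D`, `x` and `lam` are hypothetical, and the randomized variants (PGD-RMA, PGD-RDR) are not shown.

```python
import math


def hard_threshold(z, tau):
    """Proximal map of the scaled l0 penalty: zero out small entries."""
    return [v if abs(v) > tau else 0.0 for v in z]


def pgd_l0(D, x, lam, iters=100):
    """Proximal gradient descent on ||x - D a||^2 + lam * ||a||_0.

    D is a list of rows, x a list of floats. The step 1 / (2 * ||D||_F^2)
    is a safe lower bound on 1/L, where L is the Lipschitz constant of
    the gradient of the smooth term.
    """
    rows, cols = len(D), len(D[0])
    frob_sq = sum(v * v for row in D for v in row)
    step = 1.0 / (2.0 * frob_sq)
    # Entries with |z_i| <= sqrt(2 * step * lam) are cheaper to drop than keep.
    tau = math.sqrt(2.0 * step * lam)
    a = [0.0] * cols
    for _ in range(iters):
        residual = [sum(D[r][c] * a[c] for c in range(cols)) - x[r]
                    for r in range(rows)]
        grad = [2.0 * sum(D[r][c] * residual[r] for r in range(rows))
                for c in range(cols)]
        z = [a[c] - step * grad[c] for c in range(cols)]
        a = hard_threshold(z, tau)
    return a
```

The hard-thresholding step is what distinguishes the $\ell^{0}$ case from the soft-thresholding used with an $\ell^{1}$ penalty: surviving coefficients are kept at full magnitude rather than shrunk.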

Talking to Your TV: Context-Aware Voice Search with Hierarchical Recurrent Neural Networks

Author: He Hua
Jojic Oliver
Lin Jimmy
Rao Jinfeng
Ture Ferhan
Publication date: 13/05/2017

We tackle the novel problem of navigational voice queries posed against an entertainment system, where viewers interact with a voice-enabled remote controller to specify the program to watch. This is a difficult problem for several reasons: such queries are short, even shorter than comparable voice queries in other domains, which offers fewer opportunities for deciphering user intent. Furthermore, ambiguity is exacerbated by underlying speech recognition errors. We address these challenges by integrating word- and character-level representations of the queries and by modeling voice search sessions to capture the contextual dependencies in query sequences. Both are accomplished with a probabilistic framework in which recurrent and feedforward neural network modules are organized in a hierarchical manner. From a raw dataset of 32M voice queries from 2.5M viewers on the Comcast Xfinity X1 entertainment system, we extracted data to train and test our models. We demonstrate the benefits of our hybrid representation and context-aware model, which significantly outperforms models without context as well as the currently deployed product.

arXiv.org e-Print Archive

Learning Markov Chain in Unordered Dataset

Author: Jojic Nebojsa
Salakhutdinov Ruslan
Tsai Yao-Hung Hubert
Zhao Han
Publication date: 05/03/2019

The assumption that data samples are independent and identically distributed is the backbone of many learning algorithms. Nevertheless, datasets often exhibit rich structure in practice, and we argue that there exists some unknown order within the data instances. In this technical report, we introduce OrderNet, which can be used to extract the order of data instances in an unsupervised way. By assuming that the instances are sampled from a Markov chain, our goal is to learn the transitional operator of the underlying Markov chain, as well as the order, by maximizing the generation probability under all possible data permutations. Specifically, we use a neural network as a compact and soft lookup table to approximate the possibly huge, but discrete, transition matrix. This strategy allows us to amortize the space complexity with a single model. Furthermore, this simple and compact representation also provides a short description of the dataset and generalizes to unseen instances. To ensure that the learned Markov chain is ergodic, we propose a greedy batch-wise permutation scheme that allows fast training. Empirically, we show that OrderNet is able to discover an order among data instances. We also extend the proposed OrderNet to a one-shot recognition task and demonstrate favorable results. Comment: This would be the final update for this technical report on learning Markov Chain in the unordered datase

arXiv.org e-Print Archive