Learning Similarity for Character Recognition and 3D Object Recognition
I describe an approach to similarity motivated by Bayesian methods. This
yields a similarity function that is learnable using standard Bayesian
methods. The relationship of the approach to variable kernel and variable
metric methods is discussed.
Experimental results on character recognition and 3D object recognition are
presented.
Possible Mechanisms for Neural Reconfigurability and their Implications
The paper introduces a biologically and evolutionarily plausible neural
architecture that allows a single group of neurons, or an entire cortical
pathway, to be dynamically reconfigured to perform multiple, potentially very
different computations. The paper shows that reconfigurability can account for
the observed stochastic and distributed coding behavior of neurons and provides
a parsimonious explanation for timing phenomena in psychophysical experiments.
It also shows that reconfigurable pathways correspond to classes of statistical
classifiers that include decision lists, decision trees, and hierarchical
Bayesian methods. Implications for the interpretation of neurophysiological and
psychophysical results are discussed, and future experiments for testing the
reconfigurability hypothesis are explored.
A Note on Approximate Nearest Neighbor Methods
A number of authors have described randomized algorithms for solving the
epsilon-approximate nearest neighbor problem. In this note I point out that the
epsilon-approximate nearest neighbor property often fails to be a useful
approximation property, since epsilon-approximate solutions fail to satisfy the
necessary preconditions for using nearest neighbors for classification and
related tasks. Comment: The report was originally written in 2005 and does
not reference information after that date.
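The property in question can be made concrete with a minimal sketch (synthetic one-dimensional data, purely illustrative): an epsilon-approximate nearest neighbor is any point whose distance to the query is within a factor (1+epsilon) of the true nearest distance, and such a point can carry a different class label than the true nearest neighbor, which is what undermines the preconditions for nearest-neighbor classification.

```python
import numpy as np

def eps_approx_candidates(query, points, eps):
    """Indices of all points that qualify as (1+eps)-approximate
    nearest neighbors: distance within (1+eps) of the true minimum."""
    d = np.abs(points - query)
    return np.where(d <= (1 + eps) * d.min())[0]

points = np.array([1.0, 1.05])   # two points almost equidistant from the query
labels = np.array([0, 1])        # ...but carrying different class labels
cands = eps_approx_candidates(query=0.0, points=points, eps=0.1)
# Both points qualify, so an approximate method may return either label.
```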
On the Convergence of SGD Training of Neural Networks
Neural networks are usually trained by some form of stochastic gradient
descent (SGD). A number of strategies are in common use intended to improve
SGD optimization, such as learning rate schedules, momentum, and batching.
These are motivated by ideas about the occurrence of local minima at different
scales, valleys, and other phenomena in the objective function. Empirical
results presented here suggest that these phenomena are not significant factors
in SGD optimization of MLP-related objective functions, and that the behavior
of stochastic gradient descent in these problems is better described as the
simultaneous convergence at different rates of many, largely non-interacting
subproblems.
The Effects of Hyperparameters on SGD Training of Neural Networks
The performance of neural network classifiers is determined by a number of
hyperparameters, including learning rate, batch size, and depth. A number of
attempts have been made to explore these parameters in the literature, and at
times, to develop methods for optimizing them. However, exploration of
parameter spaces has often been limited. In this note, I report the results of
large scale experiments exploring these different parameters and their
interactions.
Efficient Estimation of k for the Nearest Neighbors Class of Methods
The k Nearest Neighbors (kNN) method has received much attention in the past
decades, where some theoretical bounds on its performance were identified and
where practical optimizations were proposed for making it work fairly well in
high dimensional spaces and on large datasets. From countless experiments of
the past it became widely accepted that the value of k has a significant impact
on the performance of this method. However, the efficient optimization of this
parameter has not received so much attention in literature. Today, the most
common approach is to cross-validate or bootstrap this value for all values in
question. This approach forces distances to be recomputed many times, even if
efficient methods are used. Hence, estimating the optimal k can become
expensive even on modern systems. Frequently, this circumstance leads to a
sparse manual search of k. In this paper we want to point out that a systematic
and thorough estimation of the parameter k can be performed efficiently. The
discussed approach relies on large matrices, but we want to argue that in
practice a higher space complexity is often much less of a problem than
repetitive distance computations. Comment: Technical Report, 16p, alternative
source: http://lodwich.net/Science.htm
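The argument can be illustrated with a minimal sketch (synthetic data; function names are my own, not the paper's implementation): compute the pairwise distance matrix and neighbor ordering once, after which leave-one-out accuracy for every candidate k becomes a cheap sweep over prefixes of the sorted neighbor labels.

```python
import numpy as np

def evaluate_all_k(X, y, k_max):
    """Leave-one-out accuracy for every k in 1..k_max, computing the
    pairwise distance matrix and neighbor ordering only once."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude each point itself
    order = np.argsort(d, axis=1)          # neighbors sorted once, O(n^2 log n)
    neighbor_labels = y[order]             # labels in nearest-first order
    accs = {}
    for k in range(1, k_max + 1):
        # Majority vote over the k nearest neighbors of each point.
        votes = neighbor_labels[:, :k]
        pred = np.array([np.bincount(row).argmax() for row in votes])
        accs[k] = float((pred == y).mean())
    return accs

# Two well-separated clusters: every small k should classify perfectly.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
accs = evaluate_all_k(X, y, k_max=5)
```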
Unsupervised Image-to-Image Translation Networks
Unsupervised image-to-image translation aims at learning a joint distribution
of images in different domains by using images from the marginal distributions
in individual domains. Since there exists an infinite set of joint
distributions that can yield the given marginal distributions, one could infer
nothing about the joint distribution from the marginal distributions without
additional assumptions. To address the problem, we make a shared-latent space
assumption and propose an unsupervised image-to-image translation framework
based on Coupled GANs. We compare the proposed framework with competing
approaches and present high quality image translation results on various
challenging unsupervised image translation tasks, including street scene image
translation, animal image translation, and face image translation. We also
apply the proposed framework to domain adaptation and achieve state-of-the-art
performance on benchmark datasets. Code and additional results are available in
https://github.com/mingyuliutw/unit . Comment: NIPS 2017, 11 pages, 6 figures
View Based Methods can achieve Bayes-Optimal 3D Recognition
This paper proves that visual object recognition systems using only 2D
Euclidean similarity measurements to compare object views against previously
seen views can achieve the same recognition performance as observers having
access to all coordinate information and able to use arbitrary 3D models
internally. Furthermore, it demonstrates that such systems do not require more
training views than Bayes-optimal 3D model-based systems. For building computer
vision systems, these results imply that using view-based or appearance-based
techniques with carefully constructed combination of evidence mechanisms may
not be at a disadvantage relative to 3D model-based systems. For computational
approaches to human vision, they show that it is impossible to distinguish
view-based and 3D model-based techniques for 3D object recognition solely by
comparing the performance achievable by human and 3D model-based systems.
On the Relationship between the Posterior and Optimal Similarity
For a classification problem described by the joint density P(x, \omega),
models of P(\omega = \omega'|x,x') (the ``Bayesian similarity measure'') have
been shown to be an optimal similarity measure for nearest neighbor
classification. This paper demonstrates several additional properties
of that conditional distribution. The paper first shows that we can
reconstruct, up to class labels, the class posterior distribution
given P(\omega = \omega'|x,x'), gives a procedure for recovering the class
labels, and gives an asymptotically Bayes-optimal classification procedure. It
also shows, given such an optimal similarity measure, how to construct a
classifier that outperforms the nearest neighbor classifier and achieves
Bayes-optimal classification rates. The paper then analyzes Bayesian similarity
in a framework where a classifier faces a number of related classification
tasks (multitask learning) and illustrates that reconstruction of the class
posterior distribution is not possible in general. Finally, the paper
identifies a distinct class of classification problems using
P(\omega = \omega'|x,x') and shows that using P(\omega = \omega'|x,x') to
solve those problems is the Bayes-optimal solution.
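A minimal numeric sketch of the measure (the posteriors below are made up for illustration; it assumes the two class labels are conditionally independent given the inputs, so that P(\omega = \omega'|x,x') reduces to a dot product of the class posteriors at the two points):

```python
import numpy as np

def bayesian_similarity(post_x, post_xp):
    """P(omega = omega' | x, x') under conditional independence:
    sum over classes c of P(omega=c|x) * P(omega'=c|x')."""
    return float(np.dot(post_x, post_xp))

# Illustrative class posteriors over three classes.
p1 = np.array([0.9, 0.05, 0.05])   # x   strongly favors class 0
p2 = np.array([0.8, 0.1, 0.1])     # x'  also favors class 0
p3 = np.array([0.1, 0.8, 0.1])     # x'' favors class 1

sim_same = bayesian_similarity(p1, p2)   # high: likely the same class
sim_diff = bayesian_similarity(p1, p3)   # low: likely different classes
```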
Symbol Grounding Association in Multimodal Sequences with Missing Elements
In this paper, we extend a symbolic association framework to
handle missing elements in multimodal sequences. The general scope of the work
is the symbolic associations of object-word mappings as it happens in language
development in infants. In other words, two different representations of the
same abstract concepts can associate in both directions. This scenario has long
been of interest in Artificial Intelligence, Psychology, and Neuroscience. In
this work, we extend a recent approach for multimodal sequences (visual and
audio) to also cope with missing elements in one or both modalities. Our method
uses two parallel Long Short-Term Memories (LSTMs) with a learning rule based
on the EM algorithm. It aligns both LSTM outputs via Dynamic Time Warping (DTW). We
propose to include an extra step for the combination with the max operation for
exploiting the common elements between both sequences. The motivation is
that the combination acts as a condition selector for choosing the best
representation from both LSTMs. We evaluated the proposed extension in the
following scenarios: missing elements in one modality (visual or audio) and
missing elements in both modalities (visual and audio). The performance of our
extension achieves better results than the original model and similar results
to individual LSTMs trained on each modality. Comment: Under review at the
Journal of Artificial Intelligence Research (JAIR) -- Special Track on Deep
Learning, Knowledge Representation, and Reasoning
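The DTW alignment step mentioned above can be sketched as textbook dynamic programming (a generic DTW on 1-D sequences with absolute difference as the local distance, not the paper's exact implementation):

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic-time-warping alignment cost between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of match, insertion, deletion.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# A time-stretched copy of the same pattern aligns at zero cost,
# which is the property the alignment step relies on.
a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]
```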