Learning Similarity for Character Recognition and 3D Object Recognition
I describe an approach to similarity motivated by Bayesian methods. This
yields a similarity function that is learnable using standard Bayesian
methods. The relationship of the approach to variable kernel and variable
metric methods is discussed.
Experimental results on character recognition and 3D object recognition are
presented.
Possible Mechanisms for Neural Reconfigurability and their Implications
The paper introduces a biologically and evolutionarily plausible neural
architecture that allows a single group of neurons, or an entire cortical
pathway, to be dynamically reconfigured to perform multiple, potentially very
different computations. The paper shows that reconfigurability can account for
the observed stochastic and distributed coding behavior of neurons and provides
a parsimonious explanation for timing phenomena in psychophysical experiments.
It also shows that reconfigurable pathways correspond to classes of statistical
classifiers that include decision lists, decision trees, and hierarchical
Bayesian methods. Implications for the interpretation of neurophysiological and
psychophysical results are discussed, and future experiments for testing the
reconfigurability hypothesis are explored.
A Note on Approximate Nearest Neighbor Methods
A number of authors have described randomized algorithms for solving the
epsilon-approximate nearest neighbor problem. In this note I point out that the
epsilon-approximate nearest neighbor property often fails to be a useful
approximation property, since epsilon-approximate solutions fail to satisfy the
necessary preconditions for using nearest neighbors for classification and
related tasks. Comment: The report was originally written in 2005 and does
not reference information after that date.
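The property in question can be made concrete with a minimal sketch (synthetic one-dimensional data, purely illustrative): an epsilon-approximate nearest neighbor is any point whose distance to the query is within a factor (1+epsilon) of the true nearest distance, and such a point can carry a different class label than the true nearest neighbor, which is what undermines the preconditions for nearest-neighbor classification.

```python
import numpy as np

def eps_approx_candidates(query, points, eps):
    """Indices of all points that qualify as (1+eps)-approximate
    nearest neighbors: distance within (1+eps) of the true minimum."""
    d = np.abs(points - query)
    return np.where(d <= (1 + eps) * d.min())[0]

points = np.array([1.0, 1.05])   # two points almost equidistant from the query
labels = np.array([0, 1])        # ...but carrying different class labels
cands = eps_approx_candidates(query=0.0, points=points, eps=0.1)
# Both points qualify, so an approximate method may return either label.
```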
On the Convergence of SGD Training of Neural Networks
Neural networks are usually trained by some form of stochastic gradient
descent (SGD). A number of strategies are in common use intended to improve
SGD optimization, such as learning rate schedules, momentum, and batching.
These are motivated by ideas about the occurrence of local minima at different
scales, valleys, and other phenomena in the objective function. Empirical
results presented here suggest that these phenomena are not significant factors
in SGD optimization of MLP-related objective functions, and that the behavior
of stochastic gradient descent in these problems is better described as the
simultaneous convergence at different rates of many, largely non-interacting
subproblems.
The Effects of Hyperparameters on SGD Training of Neural Networks
The performance of neural network classifiers is determined by a number of
hyperparameters, including learning rate, batch size, and depth. A number of
attempts have been made to explore these parameters in the literature, and at
times, to develop methods for optimizing them. However, exploration of
parameter spaces has often been limited. In this note, I report the results of
large scale experiments exploring these different parameters and their
interactions.
Efficient Estimation of k for the Nearest Neighbors Class of Methods
The k Nearest Neighbors (kNN) method has received much attention in the past
decades, where some theoretical bounds on its performance were identified and
where practical optimizations were proposed for making it work fairly well in
high dimensional spaces and on large datasets. From countless experiments of
the past it became widely accepted that the value of k has a significant impact
on the performance of this method. However, the efficient optimization of this
parameter has not received so much attention in literature. Today, the most
common approach is to cross-validate or bootstrap this value for all values in
question. This approach forces distances to be recomputed many times, even if
efficient methods are used. Hence, estimating the optimal k can become
expensive even on modern systems. Frequently, this circumstance leads to a
sparse manual search of k. In this paper we want to point out that a systematic
and thorough estimation of the parameter k can be performed efficiently. The
discussed approach relies on large matrices, but we want to argue that in
practice a higher space complexity is often much less of a problem than
repetitive distance computations. Comment: Technical Report, 16p, alternative
source: http://lodwich.net/Science.htm
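The argument can be illustrated with a minimal sketch (synthetic data; function names are my own, not the paper's implementation): compute the pairwise distance matrix and neighbor ordering once, after which leave-one-out accuracy for every candidate k becomes a cheap sweep over prefixes of the sorted neighbor labels.

```python
import numpy as np

def evaluate_all_k(X, y, k_max):
    """Leave-one-out accuracy for every k in 1..k_max, computing the
    pairwise distance matrix and neighbor ordering only once."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude each point itself
    order = np.argsort(d, axis=1)          # neighbors sorted once, O(n^2 log n)
    neighbor_labels = y[order]             # labels in nearest-first order
    accs = {}
    for k in range(1, k_max + 1):
        # Majority vote over the k nearest neighbors of each point.
        votes = neighbor_labels[:, :k]
        pred = np.array([np.bincount(row).argmax() for row in votes])
        accs[k] = float((pred == y).mean())
    return accs

# Two well-separated clusters: every small k should classify perfectly.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
accs = evaluate_all_k(X, y, k_max=5)
```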
Unsupervised Image-to-Image Translation Networks
Unsupervised image-to-image translation aims at learning a joint distribution
of images in different domains by using images from the marginal distributions
in individual domains. Since there exists an infinite set of joint
distributions that can yield the given marginal distributions, one could infer
nothing about the joint distribution from the marginal distributions without
additional assumptions. To address the problem, we make a shared-latent space
assumption and propose an unsupervised image-to-image translation framework
based on Coupled GANs. We compare the proposed framework with competing
approaches and present high quality image translation results on various
challenging unsupervised image translation tasks, including street scene image
translation, animal image translation, and face image translation. We also
apply the proposed framework to domain adaptation and achieve state-of-the-art
performance on benchmark datasets. Code and additional results are available in
https://github.com/mingyuliutw/unit . Comment: NIPS 2017, 11 pages, 6 figures
View Based Methods can achieve Bayes-Optimal 3D Recognition
This paper proves that visual object recognition systems using only 2D
Euclidean similarity measurements to compare object views against previously
seen views can achieve the same recognition performance as observers having
access to all coordinate information and able to use arbitrary 3D models
internally. Furthermore, it demonstrates that such systems do not require more
training views than Bayes-optimal 3D model-based systems. For building computer
vision systems, these results imply that using view-based or appearance-based
techniques with carefully constructed combination of evidence mechanisms may
not be at a disadvantage relative to 3D model-based systems. For computational
approaches to human vision, they show that it is impossible to distinguish
view-based and 3D model-based techniques for 3D object recognition solely by
comparing the performance achievable by human and 3D model-based systems.
On the Relationship between the Posterior and Optimal Similarity
For a classification problem described by the joint density P(x, \omega),
models of P(\omega = \omega'|x,x') (the ``Bayesian similarity measure'') have
been shown to be an optimal similarity measure for nearest neighbor
classification. This paper demonstrates several additional properties
of that conditional distribution. The paper first shows that we can
reconstruct, up to class labels, the class posterior distribution
given P(\omega = \omega'|x,x'), gives a procedure for recovering the class
labels, and gives an asymptotically Bayes-optimal classification procedure. It
also shows, given such an optimal similarity measure, how to construct a
classifier that outperforms the nearest neighbor classifier and achieves
Bayes-optimal classification rates. The paper then analyzes Bayesian similarity
in a framework where a classifier faces a number of related classification
tasks (multitask learning) and illustrates that reconstruction of the class
posterior distribution is not possible in general. Finally, the paper
identifies a distinct class of classification problems using
P(\omega = \omega'|x,x') and shows that using P(\omega = \omega'|x,x') to
solve those problems is the Bayes-optimal solution.
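A minimal numeric sketch of the measure (the posteriors below are made up for illustration; it assumes the two class labels are conditionally independent given the inputs, so that P(\omega = \omega'|x,x') reduces to a dot product of the class posteriors at the two points):

```python
import numpy as np

def bayesian_similarity(post_x, post_xp):
    """P(omega = omega' | x, x') under conditional independence:
    sum over classes c of P(omega=c|x) * P(omega'=c|x')."""
    return float(np.dot(post_x, post_xp))

# Illustrative class posteriors over three classes.
p1 = np.array([0.9, 0.05, 0.05])   # x   strongly favors class 0
p2 = np.array([0.8, 0.1, 0.1])     # x'  also favors class 0
p3 = np.array([0.1, 0.8, 0.1])     # x'' favors class 1

sim_same = bayesian_similarity(p1, p2)   # high: likely the same class
sim_diff = bayesian_similarity(p1, p3)   # low: likely different classes
```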
Symbol Grounding Association in Multimodal Sequences with Missing Elements
In this paper, we extend a symbolic association framework to
handle missing elements in multimodal sequences. The general scope of the work
is the symbolic associations of object-word mappings as it happens in language
development in infants. In other words, two different representations of the
same abstract concepts can associate in both directions. This scenario has long
been of interest in Artificial Intelligence, Psychology, and Neuroscience. In
this work, we extend a recent approach for multimodal sequences (visual and
audio) to also cope with missing elements in one or both modalities. Our method
uses two parallel Long Short-Term Memories (LSTMs) with a learning rule based
on the EM algorithm. It aligns both LSTM outputs via Dynamic Time Warping (DTW). We
propose to include an extra step for the combination with the max operation for
exploiting the common elements between both sequences. The motivation is
that the combination acts as a condition selector for choosing the best
representation from both LSTMs. We evaluated the proposed extension in the
following scenarios: missing elements in one modality (visual or audio) and
missing elements in both modalities (visual and audio). The performance of our
extension achieves better results than the original model and similar results
to individual LSTMs trained on each modality. Comment: Under review at the
Journal of Artificial Intelligence Research (JAIR) -- Special Track on Deep
Learning, Knowledge Representation, and Reasoning
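The DTW alignment step mentioned above can be sketched as textbook dynamic programming (a generic DTW on 1-D sequences with absolute difference as the local distance, not the paper's exact implementation):

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic-time-warping alignment cost between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of match, insertion, deletion.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# A time-stretched copy of the same pattern aligns at zero cost,
# which is the property the alignment step relies on.
a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]
```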