Multi-Information Source Optimization
We consider Bayesian optimization of an expensive-to-evaluate black-box
objective function, where we also have access to cheaper approximations of the
objective. In general, such approximations arise in applications such as
reinforcement learning, engineering, and the natural sciences, and are subject
to an inherent, unknown bias. This model discrepancy is caused by an inadequate
internal model that deviates from reality and can vary over the domain, making
the utilization of these approximations a non-trivial task.
We present a novel algorithm that provides a rigorous mathematical treatment
of the uncertainties arising from model discrepancies and noisy observations.
Its optimization decisions rely on a value of information analysis that extends
the Knowledge Gradient factor to the setting of multiple information sources
that vary in cost: each sampling decision maximizes the predicted benefit per
unit cost.
We conduct an experimental evaluation that demonstrates that the method
consistently outperforms other state-of-the-art techniques: it finds designs of
considerably higher objective value and additionally inflicts less cost in the
exploration process.
Comment: Added: benchmark logistic regression on MNIST/USPS, comparison to
MTBO/entropy search, estimation of hyper-parameters
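A minimal sketch (not the authors' implementation) of the cost-weighted selection rule described above: each candidate design and information source is scored by predicted benefit per unit cost. The callables value_of_information and cost are hypothetical placeholders for the surrogate-model computations.

    import numpy as np

    def select_query(candidates, sources, value_of_information, cost):
        """Pick the (design, information source) pair with the best
        predicted benefit per unit cost (hypothetical sketch)."""
        best, best_score = None, -np.inf
        for x in candidates:
            for s in sources:
                score = value_of_information(x, s) / cost(s)  # benefit per unit cost
                if score > best_score:
                    best, best_score = (x, s), score
        return best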
Greedy Sensor Placement with Cost Constraints
The problem of optimally placing sensors under a cost constraint arises
naturally in the design of industrial and commercial products, as well as in
scientific experiments. We consider a relaxation of the full optimization
formulation of this problem and then extend a well-established QR-based greedy
algorithm for the optimal sensor placement problem without cost constraints. We
demonstrate the effectiveness of this algorithm on data sets related to facial
recognition, climate science, and fluid mechanics. This algorithm is scalable
and often identifies sparse sensors with near optimal reconstruction
performance, while dramatically reducing the overall cost of the sensors. We
find that the cost-error landscape varies by application, with intuitive
connections to the underlying physics. Additionally, we include experiments for
various pre-processing techniques and find that a popular technique based on
the singular value decomposition is often sub-optimal.
Comment: 13 pages, 12 figures
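The "well-established QR-based greedy algorithm" referred to above is commonly implemented as column-pivoted QR on a tailored basis. The sketch below shows only that unconstrained baseline, assuming modes holds one row per candidate location and one column per basis mode; the cost-constrained extension of the paper is not reproduced here.

    from scipy.linalg import qr

    def qr_sensor_placement(modes, n_sensors):
        """Unconstrained greedy sensor selection via column-pivoted QR.
        `modes` has shape (n_locations, n_modes); the pivot order ranks
        candidate locations by the new information each one contributes."""
        _, _, pivots = qr(modes.T, pivoting=True, mode='economic')
        return pivots[:n_sensors]  # indices of the selected sensor locations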
Weighted Bilinear Coding over Salient Body Parts for Person Re-identification
Deep convolutional neural networks (CNNs) have demonstrated dominant
performance in person re-identification (Re-ID). Existing CNN based methods
utilize global average pooling (GAP) to aggregate intermediate convolutional
features for Re-ID. However, this strategy only considers the first-order
statistics of local features and treats local features at different locations
as equally important, leading to sub-optimal feature representations. To deal with
these issues, we propose a novel weighted bilinear coding (WBC) framework for
local feature aggregation in CNNs to pursue more representative and
discriminative feature representations; the framework can also be plugged into other
state-of-the-art methods to improve their performance. Specifically, bilinear
coding is used to encode the channel-wise feature correlations to capture
richer feature interactions. Meanwhile, a weighting scheme is applied on the
bilinear coding to adaptively adjust the weights of local features at different
locations based on their importance in recognition, further improving the
discriminability of feature aggregation. To handle the spatial misalignment
issue, we use a salient part net (spatial attention module) to derive salient
body parts, and apply the WBC model on each part. The final representation,
formed by concatenating the WBC encoded features of each part, is both
discriminative and resistant to spatial misalignment. Experiments on three
benchmarks, Market-1501, DukeMTMC-reID, and CUHK03, demonstrate the
favorable performance of our method against competing methods.
Comment: 22 pages
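A rough NumPy sketch, under simplifying assumptions, of the weighted bilinear coding idea: local features are aggregated by a weighted sum of channel-wise outer products rather than by global average pooling, followed by the usual signed-square-root and L2 normalisation. This is an illustration, not the authors' code.

    import numpy as np

    def weighted_bilinear_coding(features, weights):
        """Aggregate local features (N locations x C channels) by a weighted
        sum of channel-wise outer products, then normalise (illustrative)."""
        B = np.einsum('n,nc,nd->cd', weights, features, features)  # weighted second-order stats
        B = np.sign(B) * np.sqrt(np.abs(B))                        # signed square root
        return B.ravel() / (np.linalg.norm(B) + 1e-12)             # L2 normalisation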
Deep Learning: Computational Aspects
In this article we review computational aspects of Deep Learning (DL). Deep
learning uses network architectures consisting of hierarchical layers of latent
variables to construct predictors for high-dimensional input-output models.
Training a deep learning architecture is computationally intensive, and
efficient linear algebra libraries are key for training and inference.
Stochastic gradient descent (SGD) optimization and batch sampling are used to
learn from massive data sets.
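As a concrete reminder of the training loop the review refers to, here is a minimal mini-batch SGD sketch for a linear least-squares model (illustrative only, not from the article).

    import numpy as np

    def sgd_linear(X, y, lr=0.01, batch_size=32, epochs=10, seed=0):
        """Mini-batch SGD for linear least squares: sample a batch, take a
        gradient step, repeat (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            idx = rng.permutation(len(y))
            for start in range(0, len(y), batch_size):
                b = idx[start:start + batch_size]
                grad = X[b].T @ (X[b] @ w - y[b]) / len(b)  # gradient of 0.5 * MSE
                w -= lr * grad
        return w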
An Incremental Dimensionality Reduction Method for Visualizing Streaming Multidimensional Data
Dimensionality reduction (DR) methods are commonly used for analyzing and
visualizing multidimensional data. However, when data is a live streaming feed,
conventional DR methods cannot be directly used because of their computational
complexity and inability to preserve the projected data positions at previous
time points. In addition, the problem becomes even more challenging when the
dynamic data records have a varying number of dimensions as often found in
real-world applications. This paper presents an incremental DR solution. We
enhance an existing incremental PCA method in several ways to ensure its
usability for visualizing streaming multidimensional data. First, we use
geometric transformation and animation methods to help preserve a viewer's
mental map when visualizing the incremental results. Second, to handle data
dimension variants, we use an optimization method to estimate the projected
data positions, and also convey the resulting uncertainty in the visualization.
We demonstrate the effectiveness of our design with two case studies using
real-world datasets.
Comment: This is the author's version of the article that has been published
in IEEE Transactions on Visualization and Computer Graphics. The final
version of this record is available at: 10.1109/TVCG.2019.293443
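The incremental-PCA backbone the method builds on can be sketched with scikit-learn's IncrementalPCA; the paper's additions (mental-map preservation, variable dimensionality, uncertainty display) are not shown. Here data_stream is a hypothetical iterator of batches.

    from sklearn.decomposition import IncrementalPCA

    ipca = IncrementalPCA(n_components=2)
    for batch in data_stream:               # hypothetical iterator of (n_samples, n_dims) arrays
        ipca.partial_fit(batch)             # update the projection with the new batch;
                                            # each batch needs at least n_components samples
        positions = ipca.transform(batch)   # 2-D coordinates for the current points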
Semantic Fisher Scores for Task Transfer: Using Objects to Classify Scenes
The transfer of a convolutional neural network (CNN) trained to recognize objects to the
task of scene classification is considered. A Bag-of-Semantics (BoS)
representation is first induced, by feeding scene image patches to the object
CNN, and representing the scene image by the ensuing bag of posterior class
probability vectors (semantic posteriors). The encoding of the BoS with a
Fisher vector (FV) is then studied. A link is established between the FV of any
probabilistic model and the Q-function of the expectation-maximization (EM)
algorithm used to estimate its parameters by maximum likelihood. A network
implementation of the mixture-of-factor-analyzers (MFA) Fisher Score (MFA-FS), denoted the MFAFSNet, is
finally proposed to enable end-to-end training. Experiments with various object
CNNs and datasets show that the approach has state-of-the-art transfer
performance. Somewhat surprisingly, the scene classification results are
superior to those of a CNN explicitly trained for scene classification, using a
large scene dataset (Places). This suggests that holistic analysis is
insufficient for scene classification. The modeling of local object semantics
appears to be at least equally important. The two approaches are also shown to
be strongly complementary, leading to very large scene classification gains
when combined, and outperforming all previous scene classification approaches
by a sizeable margin.
Comment: 16 pages, 11 figures, accepted by TPAMI
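A toy sketch of the Bag-of-Semantics construction described above: each scene patch is pushed through an object classifier and the scene is represented by the bag of posterior vectors. Here object_cnn is a hypothetical callable returning a softmax posterior; the Fisher-vector encoding and the MFAFSNet are not reproduced.

    import numpy as np

    def bag_of_semantics(patches, object_cnn):
        """Represent a scene by the bag of object-class posteriors of its
        patches; `object_cnn` maps a patch to a softmax vector (hypothetical)."""
        return np.stack([object_cnn(p) for p in patches])  # (n_patches, n_classes)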
Variable Binding for Sparse Distributed Representations: Theory and Applications
Symbolic reasoning and neural networks are often considered incompatible
approaches. Connectionist models known as Vector Symbolic Architectures (VSAs)
can potentially bridge this gap. However, classical VSAs and neural networks
are still considered incompatible. VSAs encode symbols by dense pseudo-random
vectors, where information is distributed throughout the entire neuron
population. Neural networks encode features locally, often forming sparse
vectors of neural activation. Following Rachkovskij (2001) and Laiho et al.
(2015), we explore symbolic reasoning with sparse distributed representations.
The core operations in VSAs are dyadic operations between vectors to express
variable binding and the representation of sets. Thus, algebraic manipulations
enable VSAs to represent and process data structures in a vector space of fixed
dimensionality. Using techniques from compressed sensing, we first show that
variable binding between dense vectors in VSAs is mathematically equivalent to
tensor product binding between sparse vectors, an operation which increases
dimensionality. This result implies that dimensionality-preserving binding for
general sparse vectors must include a reduction of the tensor matrix into a
single sparse vector. Two options for sparsity-preserving variable binding are
investigated. One binding method for general sparse vectors extends earlier
proposals to reduce the tensor product into a vector, such as circular
convolution. The other method, block-wise circular convolution, is defined
only for sparse block-codes. Our experiments reveal that variable binding
for block-codes has ideal properties, whereas binding for general sparse
vectors also works, but is lossy, similar to previous proposals. We demonstrate
a VSA with sparse block-codes in example applications, cognitive reasoning and
classification, and discuss its relevance for neuroscience and neural networks.
Comment: 15 pages, 9 figures
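A small NumPy sketch of the block-wise circular convolution binding mentioned above for sparse block-codes (the vector length is assumed to be a multiple of block_size); this illustrates the operation and is not the authors' implementation.

    import numpy as np

    def block_bind(x, y, block_size):
        """Bind two sparse block-code vectors by circular convolution within
        each block; dimensionality is preserved (illustrative sketch)."""
        z = np.zeros(len(x))
        for start in range(0, len(x), block_size):
            a = x[start:start + block_size]
            b = y[start:start + block_size]
            # circular convolution of the two blocks via FFT
            z[start:start + block_size] = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))
        return z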
Goal-Driven Cognition in the Brain: A Computational Framework
Current theoretical and computational models of dopamine-based reinforcement
learning are largely rooted in the classical behaviorist tradition, and
envision the organism as a purely reactive recipient of rewards and
punishments, with resulting behavior that essentially reflects the sum of this
reinforcement history. This framework is missing some fundamental features of
the affective nervous system, most importantly, the central role of goals in
driving and organizing behavior in a teleological manner. Even when
goal-directed behaviors are considered in current frameworks, they are
typically conceived of as arising in reaction to the environment, rather than
being in place from the start. We hypothesize that goal-driven cognition is
primary, and organized into two discrete phases: goal selection and goal
engaged, which each have a substantially different effective value function.
This dichotomy can potentially explain a wide range of phenomena, playing a
central role in many clinical disorders, such as depression, OCD, ADHD, and
PTSD, and providing a sensible account of the detailed biology and function of
the dopamine system and larger limbic system, including critical ventral and
medial prefrontal cortex. Computationally, reasoning backward from active goals
to action selection is more tractable than projecting alternative action
choices forward to compute possible outcomes. An explicit computational model
of these brain areas and their function in this goal-driven framework is
described, as are numerous testable predictions from this framework.
Comment: 62 pages, 11 figures
How Evolution Learns to Generalise: Principles of under-fitting, over-fitting and induction in the evolution of developmental organisation
One of the most intriguing questions in evolution is how organisms exhibit
suitable phenotypic variation to rapidly adapt in novel selective environments,
which is crucial for evolvability. Recent work showed that when selective
environments vary in a systematic manner, it is possible that development can
constrain the phenotypic space to regions that are evolutionarily more
advantageous. Yet, the underlying mechanism that enables the spontaneous
emergence of such adaptive developmental constraints is poorly understood. How
can natural selection, given its myopic and conservative nature, favour
developmental organisations that facilitate adaptive evolution in future
previously unseen environments? Such capacity suggests a form of
foresight, facilitated by the ability of evolution to accumulate and
exploit information not only about the particular phenotypes selected in the
past, but also about regularities in the environment that are relevant to future
environments. Here we argue that the ability of evolution to discover such
regularities is analogous to the ability of learning systems to generalise from
past experience. Conversely, the canalisation of evolved developmental
processes to past selective environments and failure of natural selection to
enhance evolvability in future selective environments is directly analogous to
the problem of over-fitting and failure to generalise in machine learning. We
show that this analogy arises from an underlying mechanistic equivalence by
showing that conditions corresponding to those that alleviate over-fitting in
machine learning enhance the evolution of generalised developmental
organisations under natural selection. This equivalence provides access to a
well-developed theoretical framework that enables us to characterise the
conditions where natural selection will find general rather than particular
solutions to environmental conditions.
Variational optimization in the AI era: Computational Graph States and Supervised Wave-function Optimization
Representing a target quantum state by a compact, efficient variational
wave-function is an important approach to the quantum many-body problem. In
this approach, the main challenges include the design of a suitable variational
ansatz and optimization of its parameters. In this work, we address both of
these challenges. First, we define the variational class of Computational Graph
States (CGS), which gives a uniform framework for describing all computable
variational ansätze. Second, we develop a novel optimization scheme,
supervised wave-function optimization (SWO), which systematically improves the
optimized wave-function by drawing on ideas from supervised learning. While SWO
can be used independently of CGS, utilizing them together provides a flexible
framework for the rapid design, prototyping and optimization of variational
wave-functions. We demonstrate CGS and SWO by optimizing for the ground state
wave-function of 1D and 2D Heisenberg models on nine different variational
architectures, including architectures not previously used to represent quantum
many-body wave-functions, and find that they are energetically competitive with other
approaches. One interesting application of this architectural exploration is
that we show that fully convolutional neural network wave-functions can be
optimized for one system size and, using identical parameters, produce accurate
energies for a range of system sizes. We expect these methods to increase the
rate of discovery of novel variational ansätze and bring further insight into the
quantum many-body problem.
Comment: 10 + 4 pages; 8 + 3 figures
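A toy sketch, under strong simplifying assumptions, of one supervised fitting step in the spirit of SWO: the ansatz is fit to target amplitudes by gradient descent on a squared error. The callables ansatz and grad_ansatz are hypothetical; in the actual procedure the targets are generated from the current state rather than supplied externally.

    import numpy as np

    def swo_step(params, configs, target_amps, ansatz, grad_ansatz, lr=0.01):
        """One supervised step: move the variational parameters so the ansatz
        amplitudes approach the target amplitudes (toy sketch).
        `ansatz(params, c)` and `grad_ansatz(params, c)` are hypothetical callables."""
        grad = np.zeros_like(params)
        for c, t in zip(configs, target_amps):
            grad += 2.0 * (ansatz(params, c) - t) * grad_ansatz(params, c)  # d/dparams of (psi - target)^2
        return params - lr * grad / len(configs)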