
    Multi-Information Source Optimization

    We consider Bayesian optimization of an expensive-to-evaluate black-box objective function, where we also have access to cheaper approximations of the objective. Such approximations arise in applications such as reinforcement learning, engineering, and the natural sciences, and are subject to an inherent, unknown bias. This model discrepancy is caused by an inadequate internal model that deviates from reality and can vary over the domain, making the use of these approximations a non-trivial task. We present a novel algorithm that provides a rigorous mathematical treatment of the uncertainties arising from model discrepancies and noisy observations. Its optimization decisions rely on a value-of-information analysis that extends the Knowledge Gradient factor to the setting of multiple information sources that vary in cost: each sampling decision maximizes the predicted benefit per unit cost. An experimental evaluation demonstrates that the method consistently outperforms other state-of-the-art techniques: it finds designs of considerably higher objective value while incurring less cost in the exploration process.
    Comment: Added: benchmark logistic regression on MNIST/USPS, comparison to MTBO/entropy search, estimation of hyper-parameters
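    As a toy illustration of the cost-sensitive selection rule above, the following sketch picks the candidate query maximizing predicted benefit per unit cost. It is not the paper's actual acquisition function: the posterior standard deviation used as a benefit proxy, and all names, are illustrative assumptions.

    ```python
    import numpy as np

    def select_query(candidates, benefit, costs):
        """Pick the (source, x) pair maximizing predicted benefit per unit cost.

        candidates: list of (source_index, x) pairs
        benefit:    callable (source, x) -> predicted benefit of querying there
                    (here a stand-in for the paper's value-of-information term)
        costs:      per-source query costs, shape (num_sources,)
        """
        scores = [benefit(s, x) / costs[s] for s, x in candidates]
        return candidates[int(np.argmax(scores))]

    # Toy example: two sources; source 0 is cheap but has higher uncertainty.
    rng = np.random.default_rng(0)
    xs = rng.uniform(0.0, 1.0, size=10)
    cands = [(s, x) for s in (0, 1) for x in xs]
    proxy = lambda s, x: (1.5 if s == 0 else 1.0) * np.exp(-3.0 * x)  # made-up model
    print(select_query(cands, proxy, costs=np.array([1.0, 5.0])))
    ```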

    Greedy Sensor Placement with Cost Constraints

    The problem of optimally placing sensors under a cost constraint arises naturally in the design of industrial and commercial products, as well as in scientific experiments. We consider a relaxation of the full optimization formulation of this problem and then extend a well-established QR-based greedy algorithm for optimal sensor placement without cost constraints. We demonstrate the effectiveness of this algorithm on data sets related to facial recognition, climate science, and fluid mechanics. The algorithm is scalable and often identifies sparse sensor sets with near-optimal reconstruction performance, while dramatically reducing the overall cost of the sensors. We find that the cost-error landscape varies by application, with intuitive connections to the underlying physics. Additionally, we include experiments for various pre-processing techniques and find that a popular technique based on the singular value decomposition is often sub-optimal.
    Comment: 13 pages, 12 figures
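    A minimal sketch of the QR-pivoting baseline the paper extends, together with a naive cost-penalized greedy variant; the paper's exact cost-constrained modification may differ, and `eta` is an assumed trade-off parameter.

    ```python
    import numpy as np
    from scipy.linalg import qr

    def qr_sensors(Psi, k):
        """Cost-free baseline: column-pivoted QR on the basis modes Psi
        (n_locations x k); the first k pivots are the sensor locations."""
        _, _, piv = qr(Psi.T, pivoting=True, mode='economic')
        return piv[:k]

    def cost_aware_sensors(Psi, k, cost, eta):
        """Greedy variant: repeatedly pick the location whose residual row norm,
        penalized by eta * cost, is largest, then deflate that direction."""
        R = Psi.astype(float).copy()
        chosen = []
        for _ in range(k):
            score = np.linalg.norm(R, axis=1) - eta * cost
            score[chosen] = -np.inf          # do not reuse a location
            j = int(np.argmax(score))
            chosen.append(j)
            v = R[j] / (np.linalg.norm(R[j]) + 1e-12)
            R -= np.outer(R @ v, v)          # remove the captured direction
        return np.array(chosen)

    # Toy usage with random modes and random per-location costs.
    rng = np.random.default_rng(0)
    Psi, cost = rng.normal(size=(50, 5)), rng.uniform(1.0, 10.0, size=50)
    print(qr_sensors(Psi, 5), cost_aware_sensors(Psi, 5, cost, eta=0.2))
    ```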

    Weighted Bilinear Coding over Salient Body Parts for Person Re-identification

    Deep convolutional neural networks (CNNs) have demonstrated dominant performance in person re-identification (Re-ID). Existing CNN-based methods use global average pooling (GAP) to aggregate intermediate convolutional features for Re-ID. However, this strategy considers only the first-order statistics of local features and treats local features at different locations as equally important, leading to sub-optimal feature representations. To address these issues, we propose a novel weighted bilinear coding (WBC) framework for local feature aggregation in CNNs that pursues more representative and discriminative feature representations and can be combined with other state-of-the-art methods to improve their performance. Specifically, bilinear coding encodes channel-wise feature correlations to capture richer feature interactions. Meanwhile, a weighting scheme applied to the bilinear coding adaptively adjusts the weights of local features at different locations according to their importance for recognition, further improving the discriminability of the aggregated features. To handle spatial misalignment, we use a salient part net (a spatial attention module) to derive salient body parts and apply the WBC model to each part. The final representation, formed by concatenating the WBC-encoded features of all parts, is both discriminative and resistant to spatial misalignment. Experiments on three benchmarks, Market-1501, DukeMTMC-reID and CUHK03, demonstrate the favorable performance of our method against competing methods.
    Comment: 22 pages
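    A minimal numpy sketch of the weighted bilinear coding idea for one body part; the weight normalization and the signed square-root / L2 post-processing are common-practice assumptions, not necessarily the paper's exact formulation.

    ```python
    import numpy as np

    def weighted_bilinear_code(F, w):
        """F: local features for one part, shape (num_locations, channels);
        w: per-location importance weights, shape (num_locations,).
        Returns a (channels**2,) second-order descriptor."""
        w = w / (w.sum() + 1e-12)                # normalize location weights
        B = np.einsum('n,nc,nd->cd', w, F, F)    # weighted sum of outer products
        B = np.sign(B) * np.sqrt(np.abs(B))      # signed square-root
        v = B.ravel()
        return v / (np.linalg.norm(v) + 1e-12)   # L2 normalization

    # The final representation would concatenate the codes of all salient parts.
    rng = np.random.default_rng(0)
    parts = [rng.normal(size=(64, 16)) for _ in range(3)]   # toy part features
    weights = [rng.uniform(size=64) for _ in range(3)]      # toy attention weights
    rep = np.concatenate([weighted_bilinear_code(F, w) for F, w in zip(parts, weights)])
    print(rep.shape)   # (3 * 16 * 16,) = (768,)
    ```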

    Deep Learning: Computational Aspects

    In this article we review computational aspects of Deep Learning (DL). Deep learning uses network architectures consisting of hierarchical layers of latent variables to construct predictors for high-dimensional input-output models. Training a deep learning architecture is computationally intensive, and efficient linear algebra libraries are key to both training and inference. Stochastic gradient descent (SGD) optimization and batch sampling are used to learn from massive data sets.
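    A minimal sketch of the SGD-with-batch-sampling loop the article refers to, applied to a linear least-squares model; all names and hyper-parameters are illustrative.

    ```python
    import numpy as np

    def sgd(X, y, lr=0.01, batch_size=32, epochs=20, seed=0):
        """Mini-batch SGD for least squares: each step uses a random batch."""
        rng = np.random.default_rng(seed)
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            idx = rng.permutation(len(X))                    # reshuffle each epoch
            for start in range(0, len(X), batch_size):
                b = idx[start:start + batch_size]
                grad = X[b].T @ (X[b] @ w - y[b]) / len(b)   # squared-loss gradient
                w -= lr * grad
        return w

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    w_true = np.arange(1.0, 6.0)
    y = X @ w_true + 0.1 * rng.normal(size=1000)
    print(np.round(sgd(X, y), 2))   # close to w_true
    ```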

    An Incremental Dimensionality Reduction Method for Visualizing Streaming Multidimensional Data

    Dimensionality reduction (DR) methods are commonly used for analyzing and visualizing multidimensional data. However, when the data is a live streaming feed, conventional DR methods cannot be applied directly because of their computational complexity and their inability to preserve the projected data positions from previous time points. The problem becomes even more challenging when the dynamic data records have a varying number of dimensions, as often found in real-world applications. This paper presents an incremental DR solution. We enhance an existing incremental PCA method in several ways to make it usable for visualizing streaming multidimensional data. First, we use geometric transformation and animation methods to help preserve a viewer's mental map when visualizing the incremental results. Second, to handle varying data dimensions, we use an optimization method to estimate the projected data positions, and we convey the resulting uncertainty in the visualization. We demonstrate the effectiveness of our design with two case studies using real-world datasets.
    Comment: This is the author's version of the article that has been published in IEEE Transactions on Visualization and Computer Graphics. The final version of this record is available at: 10.1109/TVCG.2019.293443
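    A minimal sketch of the incremental-PCA part of such a pipeline using scikit-learn, with a Procrustes rotation of each new layout toward the previous one as a stand-in for the paper's geometric-transformation step; the batch generator and all parameters are illustrative.

    ```python
    import numpy as np
    from scipy.linalg import orthogonal_procrustes
    from sklearn.decomposition import IncrementalPCA

    def stream_of_batches(n_batches=5, n=100, d=10, seed=0):
        """Hypothetical stand-in for a live multidimensional feed."""
        rng = np.random.default_rng(seed)
        for _ in range(n_batches):
            yield rng.normal(size=(n, d))

    ipca = IncrementalPCA(n_components=2)
    anchors, prev_anchor_proj = None, None   # fixed points stabilize orientation
    for batch in stream_of_batches():
        ipca.partial_fit(batch)              # update the model with the new batch
        if anchors is None:
            anchors = batch[:20].copy()
        anchor_proj = ipca.transform(anchors)
        R = np.eye(2)
        if prev_anchor_proj is not None:
            # rotate the new layout toward the old one (mental-map preservation)
            R, _ = orthogonal_procrustes(anchor_proj, prev_anchor_proj)
        layout = ipca.transform(batch) @ R
        prev_anchor_proj = anchor_proj @ R
        print(layout[:2].round(2))           # positions handed to the visualization
    ```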

    Semantic Fisher Scores for Task Transfer: Using Objects to Classify Scenes

    The transfer of a convolutional neural network (CNN) trained to recognize objects to the task of scene classification is considered. A Bag-of-Semantics (BoS) representation is first induced by feeding scene image patches to the object CNN and representing the scene image by the resulting bag of posterior class probability vectors (semantic posteriors). The encoding of the BoS with a Fisher vector (FV) is then studied. A link is established between the FV of any probabilistic model and the Q-function of the expectation-maximization (EM) algorithm used to estimate its parameters by maximum likelihood. A network implementation of the mixture of factor analyzers (MFA) Fisher Score (MFA-FS), denoted the MFAFSNet, is finally proposed to enable end-to-end training. Experiments with various object CNNs and datasets show that the approach achieves state-of-the-art transfer performance. Somewhat surprisingly, the scene classification results are superior to those of a CNN explicitly trained for scene classification on a large scene dataset (Places). This suggests that holistic analysis is insufficient for scene classification and that the modeling of local object semantics is at least equally important. The two approaches are also shown to be strongly complementary, leading to very large scene classification gains when combined and outperforming all previous scene classification approaches by a sizeable margin.
    Comment: 16 pages, 11 figures, accepted by TPAMI
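    As a simpler, clearly-hedged illustration of Fisher encoding over a bag of semantic descriptors, here is the classic diagonal-GMM Fisher vector restricted to the means gradient; this is not the paper's MFA-FS, and all parameters are toy values.

    ```python
    import numpy as np

    def fisher_vector_means(X, pi, mu, sigma):
        """Classic FV (means part) for a diagonal-covariance GMM.
        X: (n, d) descriptors; pi: (K,) weights; mu, sigma: (K, d)."""
        n, d = X.shape
        K = len(pi)
        # log responsibilities: log pi_k + log N(x | mu_k, diag(sigma_k^2)), up to a constant
        logp = np.stack([
            np.log(pi[k]) - np.log(sigma[k]).sum()
            - 0.5 * (((X - mu[k]) / sigma[k]) ** 2).sum(axis=1)
            for k in range(K)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # normalized gradient of the mean log-likelihood w.r.t. each mean
        G = np.stack([
            (gamma[:, k:k + 1] * (X - mu[k]) / sigma[k]).sum(axis=0) / (n * np.sqrt(pi[k]))
            for k in range(K)])
        return G.ravel()   # shape (K * d,)

    rng = np.random.default_rng(0)
    X = rng.dirichlet(np.ones(8), size=50)   # toy "semantic posteriors"
    pi = np.full(4, 0.25)
    mu, sigma = rng.dirichlet(np.ones(8), size=4), np.full((4, 8), 0.1)
    print(fisher_vector_means(X, pi, mu, sigma).shape)   # (32,)
    ```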

    Variable Binding for Sparse Distributed Representations: Theory and Applications

    Symbolic reasoning and neural networks are often considered incompatible approaches. Connectionist models known as Vector Symbolic Architectures (VSAs) can potentially bridge this gap. However, classical VSAs and neural networks are still considered incompatible: VSAs encode symbols by dense pseudo-random vectors, where information is distributed throughout the entire neuron population, whereas neural networks encode features locally, often forming sparse vectors of neural activation. Following Rachkovskij (2001) and Laiho et al. (2015), we explore symbolic reasoning with sparse distributed representations. The core operations in VSAs are dyadic operations between vectors that express variable binding and the representation of sets; such algebraic manipulations enable VSAs to represent and process data structures in a vector space of fixed dimensionality. Using techniques from compressed sensing, we first show that variable binding between dense vectors in VSAs is mathematically equivalent to tensor product binding between sparse vectors, an operation which increases dimensionality. This result implies that dimensionality-preserving binding for general sparse vectors must include a reduction of the tensor matrix into a single sparse vector. Two options for sparsity-preserving variable binding are investigated. One binding method for general sparse vectors extends earlier proposals, such as circular convolution, that reduce the tensor product to a vector. The other method, block-wise circular convolution, is defined only for sparse block-codes. Our experiments reveal that variable binding for block-codes has ideal properties, whereas binding for general sparse vectors also works but is lossy, similar to previous proposals. We demonstrate a VSA with sparse block-codes in example applications, cognitive reasoning and classification, and discuss its relevance for neuroscience and neural networks.
    Comment: 15 pages, 9 figures
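    A minimal sketch of the block-code binding operation described above: block-wise circular convolution of two sparse block-codes preserves sparsity, and binding with the per-block involution of the role recovers the filler. Function names are illustrative.

    ```python
    import numpy as np

    def blockwise_circular_bind(a, b, block_len):
        """Block-wise circular convolution of two vectors whose length is a
        multiple of block_len, computed per block via the FFT."""
        A = a.reshape(-1, block_len)
        B = b.reshape(-1, block_len)
        C = np.fft.irfft(np.fft.rfft(A, axis=1) * np.fft.rfft(B, axis=1),
                         n=block_len, axis=1)
        return C.ravel()

    def block_involution(v, block_len):
        """Per-block index reversal; binding with it undoes a previous binding."""
        V = v.reshape(-1, block_len)
        return np.roll(V[:, ::-1], 1, axis=1).ravel()

    def random_block_code(num_blocks, block_len, rng):
        """One active unit per block: a random sparse block-code."""
        v = np.zeros((num_blocks, block_len))
        v[np.arange(num_blocks), rng.integers(block_len, size=num_blocks)] = 1.0
        return v.ravel()

    rng = np.random.default_rng(1)
    filler = random_block_code(4, 8, rng)
    role = random_block_code(4, 8, rng)
    bound = blockwise_circular_bind(filler, role, 8)   # same sparsity as inputs
    recovered = blockwise_circular_bind(bound, block_involution(role, 8), 8)
    print(np.allclose(np.round(recovered), filler))    # True
    ```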

    Goal-Driven Cognition in the Brain: A Computational Framework

    Current theoretical and computational models of dopamine-based reinforcement learning are largely rooted in the classical behaviorist tradition and envision the organism as a purely reactive recipient of rewards and punishments, with resulting behavior that essentially reflects the sum of this reinforcement history. This framework misses some fundamental features of the affective nervous system, most importantly the central role of goals in driving and organizing behavior in a teleological manner. Even when goal-directed behaviors are considered in current frameworks, they are typically conceived of as arising in reaction to the environment, rather than being in place from the start. We hypothesize that goal-driven cognition is primary and organized into two discrete phases, goal selection and goal engagement, each with a substantially different effective value function. This dichotomy can potentially explain a wide range of phenomena, playing a central role in many clinical disorders such as depression, OCD, ADHD, and PTSD, and providing a sensible account of the detailed biology and function of the dopamine system and the larger limbic system, including critical ventral and medial prefrontal cortex areas. Computationally, reasoning backward from active goals to action selection is more tractable than projecting alternative action choices forward to compute possible outcomes. An explicit computational model of these brain areas and their function in this goal-driven framework is described, along with numerous testable predictions that follow from it.
    Comment: 62 pages, 11 figures

    How Evolution Learns to Generalise: Principles of under-fitting, over-fitting and induction in the evolution of developmental organisation

    One of the most intriguing questions in evolution is how organisms exhibit suitable phenotypic variation to rapidly adapt in novel selective environments, a capacity that is crucial for evolvability. Recent work showed that when selective environments vary in a systematic manner, development can constrain the phenotypic space to regions that are evolutionarily more advantageous. Yet the underlying mechanism that enables the spontaneous emergence of such adaptive developmental constraints is poorly understood. How can natural selection, given its myopic and conservative nature, favour developmental organisations that facilitate adaptive evolution in future, previously unseen environments? Such a capacity suggests a form of foresight facilitated by the ability of evolution to accumulate and exploit information not only about the particular phenotypes selected in the past, but also about regularities in the environment that remain relevant in future environments. Here we argue that the ability of evolution to discover such regularities is analogous to the ability of learning systems to generalise from past experience. Conversely, the canalisation of evolved developmental processes to past selective environments, and the failure of natural selection to enhance evolvability for future selective environments, is directly analogous to the problem of over-fitting and the failure to generalise in machine learning. We show that this analogy arises from an underlying mechanistic equivalence: conditions corresponding to those that alleviate over-fitting in machine learning also enhance the evolution of generalised developmental organisations under natural selection. This equivalence provides access to a well-developed theoretical framework that enables us to characterise the conditions under which natural selection will find general rather than particular solutions to environmental conditions.
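    A minimal sketch of the machine-learning side of the analogy, assuming a polynomial regression as a stand-in for a flexible developmental map: without regularization the model over-fits its noisy "past environments" and generalises poorly to unseen inputs. This illustrates over-fitting only; it does not model the paper's evolutionary dynamics.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(-1, 1, 15))
    y = np.sin(3 * x) + 0.3 * rng.normal(size=x.size)   # noisy "past environments"
    X = np.vander(x, 12, increasing=True)               # flexible polynomial model

    def ridge_fit(X, y, lam):
        """L2-regularized least squares; lam = 0 recovers the unregularized fit."""
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    x_new = np.linspace(-1, 1, 200)                     # unseen "future environments"
    X_new = np.vander(x_new, 12, increasing=True)
    for lam in (0.0, 1e-2):
        w = ridge_fit(X, y, lam)
        err = np.mean((X_new @ w - np.sin(3 * x_new)) ** 2)
        print(f"lambda={lam}: test error {err:.3f}")    # larger without regularization
    ```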

    Variational optimization in the AI era: Computational Graph States and Supervised Wave-function Optimization

    Representing a target quantum state by a compact, efficient variational wave-function is an important approach to the quantum many-body problem. In this approach, the main challenges include the design of a suitable variational ansatz and the optimization of its parameters. In this work, we address both of these challenges. First, we define the variational class of Computational Graph States (CGS), which gives a uniform framework for describing all computable variational ansätze. Second, we develop a novel optimization scheme, supervised wave-function optimization (SWO), which systematically improves the optimized wave-function by drawing on ideas from supervised learning. While SWO can be used independently of CGS, using them together provides a flexible framework for the rapid design, prototyping and optimization of variational wave-functions. We demonstrate CGS and SWO by optimizing the ground-state wave-function of 1D and 2D Heisenberg models on nine different variational architectures, including architectures not previously used to represent quantum many-body wave-functions, and find that they are energetically competitive with other approaches. One interesting application of this architectural exploration is that we show that fully convolutional neural network wave-functions can be optimized for one system size and, using identical parameters, produce accurate energies for a range of system sizes. We expect these methods to increase the rate of discovery of novel variational ansätze and to bring further insights to the quantum many-body problem.
    Comment: 10 + 4 pages; 8 + 3 figures
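    As a generic illustration of the variational approach (not the paper's CGS ansatz or the SWO scheme), here is a minimal sketch that gradient-descends the Rayleigh quotient of a small Heisenberg chain using a dense, fully parameterized trial wave-function.

    ```python
    import numpy as np

    def heisenberg_chain(n):
        """Dense Hamiltonian of an open spin-1/2 Heisenberg chain of n sites."""
        sx = np.array([[0, 1], [1, 0]]) / 2
        sy = np.array([[0, -1j], [1j, 0]]) / 2
        sz = np.array([[1, 0], [0, -1]]) / 2
        H = np.zeros((2 ** n, 2 ** n), dtype=complex)
        for i in range(n - 1):                      # sum S_i . S_{i+1} over bonds
            for s in (sx, sy, sz):
                op = np.eye(1)
                for j in range(n):
                    op = np.kron(op, s if j in (i, i + 1) else np.eye(2))
                H += op
        return H.real   # imaginary parts cancel in this basis

    n = 6
    H = heisenberg_chain(n)
    rng = np.random.default_rng(0)
    theta = rng.normal(size=2 ** n)   # toy "ansatz": one parameter per amplitude
    for step in range(2000):
        psi = theta / np.linalg.norm(theta)
        energy = psi @ H @ psi
        grad = 2 * (H @ psi - energy * psi) / np.linalg.norm(theta)
        theta -= 0.1 * grad           # gradient step on the Rayleigh quotient
    print(energy, np.linalg.eigvalsh(H)[0])   # variational vs exact ground energy
    ```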