    Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs

    Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. In an attempt to shed light on this, our study investigates the impact of CoT on the ability of transformers to in-context learn a simple-to-study yet general family of compositional functions: multi-layer perceptrons (MLPs). In this setting, we reveal that the success of CoT can be attributed to breaking down in-context learning of a compositional function into two distinct phases: focusing on data related to each step of the composition and in-context learning the single-step composition function. Through both experimental and theoretical evidence, we demonstrate how CoT significantly reduces the sample complexity of in-context learning (ICL) and facilitates the learning of complex functions that non-CoT methods struggle with. Furthermore, we illustrate how transformers can transition from vanilla in-context learning to mastering a compositional function with CoT by simply incorporating an additional layer that performs the necessary filtering for CoT via the attention mechanism. In addition to these test-time benefits, we highlight how CoT accelerates pretraining by learning shortcuts to represent complex functions and how filtering plays an important role in pretraining. These findings collectively provide insights into the mechanics of CoT, inviting further investigation of its role in complex reasoning tasks.
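
    The two phases described above (filtering the data for one step, then in-context learning that single step) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than the paper's transformer construction: a random 2-layer ReLU MLP, token tags "x", "s", "y", and ordinary least squares playing the role of the in-context learner for the final, linear step. The helpers make_prompt, filter_step and fit_linear_step are illustrative names only.

```python
import numpy as np

def make_prompt(n_examples, d=4, k=4, cot=True, seed=0):
    """Toy in-context sequence for a random 2-layer MLP f(x) = W2 @ relu(W1 @ x).

    Without CoT each example contributes tokens (x, y); with CoT it contributes
    (x, s, y), where s = relu(W1 @ x) is the intermediate step of the composition.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((k, d)) / np.sqrt(d)
    W2 = rng.standard_normal((1, k)) / np.sqrt(k)
    seq = []
    for _ in range(n_examples):
        x = rng.standard_normal(d)
        s = np.maximum(W1 @ x, 0.0)  # output of the first layer
        y = W2 @ s                   # output of the second layer, shape (1,)
        seq.append(("x", x))
        if cot:
            seq.append(("s", s))
        seq.append(("y", y))
    return seq, (W1, W2)


def filter_step(seq, src, dst):
    """'Filtering': keep only adjacent (src, dst) token pairs, i.e. the data
    relevant to one step of the composition."""
    return [(a[1], b[1]) for a, b in zip(seq, seq[1:]) if a[0] == src and b[0] == dst]


def fit_linear_step(pairs):
    """Learn a single linear step by least squares on the filtered pairs."""
    S = np.stack([p[0] for p in pairs])
    Y = np.stack([p[1] for p in pairs])
    return np.linalg.lstsq(S, Y, rcond=None)[0].T


seq, (W1, W2) = make_prompt(16, cot=True)
W2_hat = fit_linear_step(filter_step(seq, "s", "y"))
print(np.allclose(W2_hat, W2))  # second step recovered from the filtered (s, y) pairs
```

    In this toy, once the intermediate tokens are exposed and filtered, the last step reduces to a k-dimensional linear problem. Without CoT the sequence contains no "s" tokens, so this decomposition is unavailable.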

    Inference via low-dimensional couplings

    We investigate the low-dimensional structure of deterministic transformations between random variables, i.e., transport maps between probability measures. In the context of statistics and machine learning, these transformations can be used to couple a tractable "reference" measure (e.g., a standard Gaussian) with a target measure of interest. Direct simulation from the desired measure can then be achieved by pushing forward reference samples through the map. Yet characterizing such a map (e.g., representing and evaluating it) grows challenging in high dimensions. The central contribution of this paper is to establish a link between the Markov properties of the target measure and the existence of low-dimensional couplings, induced by transport maps that are sparse and/or decomposable. Our analysis not only facilitates the construction of transformations in high-dimensional settings, but also suggests new inference methodologies for continuous non-Gaussian graphical models. For instance, in the context of nonlinear state-space models, we describe new variational algorithms for filtering, smoothing, and sequential parameter inference. These algorithms can be understood as the natural generalization, to the non-Gaussian case, of the square-root Rauch-Tung-Striebel Gaussian smoother.
    Comment: 78 pages, 25 figures
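
    The link between Markov structure and sparse triangular maps can be seen in the simplest possible case: a Gaussian target, where the lower-triangular (Knothe-Rosenblatt-type) map is linear. The following is a minimal sketch under that assumption. The paper's maps are generally nonlinear. The AR(1) chain model and the helper names chain_precision and triangular_inverse_map are illustrative choices, not taken from the paper.

```python
import numpy as np

def chain_precision(n, rho=0.5):
    """Tridiagonal precision matrix of a stationary Gaussian AR(1) chain with unit
    marginal variance; its graph is the Markov chain x_1 - x_2 - ... - x_n."""
    Q = np.zeros((n, n))
    for i in range(n):
        Q[i, i] = 1.0 if i in (0, n - 1) else 1.0 + rho**2
        if i + 1 < n:
            Q[i, i + 1] = Q[i + 1, i] = -rho
    return Q / (1.0 - rho**2)


def triangular_inverse_map(Q):
    """Lower-triangular A with A.T @ A == Q. The linear map z = A @ x pushes the
    target N(0, Q^{-1}) to the reference N(0, I), and z_k depends only on x_1..x_k."""
    P = np.eye(Q.shape[0])[::-1]           # reversal permutation (P @ P == I)
    L_rev = np.linalg.cholesky(P @ Q @ P)  # P Q P = L_rev L_rev^T
    return (P @ L_rev @ P).T               # transpose of the upper factor U, Q = U U^T


n = 6
Q = chain_precision(n)
A = triangular_inverse_map(Q)
print(np.round(A, 3))  # nonzeros only on the diagonal and first subdiagonal

# Simulate from the target by pushing reference samples through the inverse map.
rng = np.random.default_rng(0)
z = rng.standard_normal((n, 10_000))
x = np.linalg.solve(A, z)  # x = A^{-1} z ~ N(0, Q^{-1})
```

    Because the chain's precision is tridiagonal, each component z_k of the triangular map depends only on x_{k-1} and x_k. This sparsity mirrors the Markov structure. For non-Gaussian targets the same idea requires nonlinear map components.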

    On the Projective Geometry of Kalman Filter

    Convergence of the Kalman filter is best analyzed by studying the contraction of the Riccati map in the space of positive definite (covariance) matrices. In this paper, we explore how this contraction property relates to a more fundamental non-expansiveness property of filtering maps in the space of probability distributions endowed with the Hilbert metric. This is viewed as a preliminary step towards improving the convergence analysis of filtering algorithms over general graphical models.
    Comment: 6 pages
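
    The contraction of the Riccati map can be observed numerically by running the covariance recursion from two different initial covariances and measuring how far apart they stay. This is a sketch only. It uses the affine-invariant Riemannian distance on positive definite matrices, a common choice for this kind of analysis, and not the Hilbert metric on distributions that the paper works with. The constant-velocity model and its noise levels are illustrative assumptions.

```python
import numpy as np

def riccati_step(P, F, H, Q, R):
    """Kalman covariance recursion (Riccati map) in predictor form:
    P' = F P F^T + Q - F P H^T (H P H^T + R)^{-1} H P F^T."""
    S = H @ P @ H.T + R
    K = F @ P @ H.T @ np.linalg.inv(S)
    P_next = F @ P @ F.T + Q - K @ H @ P @ F.T
    return 0.5 * (P_next + P_next.T)  # symmetrize against round-off


def spd_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices:
    sqrt(sum(log(lambda_i)^2)) over the eigenvalues of A^{-1} B."""
    lam = np.linalg.eigvals(np.linalg.solve(A, B)).real
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


# Illustrative constant-velocity model with position measurements.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.1 * np.eye(2)
R = np.array([[1.0]])

P1, P2 = np.eye(2), 100.0 * np.eye(2)  # two very different initial covariances
dists = []
for _ in range(50):
    dists.append(spd_distance(P1, P2))
    P1, P2 = riccati_step(P1, F, H, Q, R), riccati_step(P2, F, H, Q, R)
print(dists[:5])
```

    For this observable and controllable model, the distance decays toward zero, and both sequences approach the same steady-state covariance.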

    Development of an automated aircraft subsystem architecture generation and analysis tool

    Purpose – The purpose of this paper is to present a new computational framework to address future preliminary design needs for aircraft subsystems. The ability to investigate multiple candidate technologies forming subsystem architectures is enabled by automated architecture generation, analysis and optimization. The main focus lies on demonstrating the framework's workings, as well as the optimizer's performance on a typical application problem.
    Design/methodology/approach – The core aspects involve a functional decomposition, coupled with a synergistic mission performance analysis at the aircraft, architecture and component levels. This may be followed by a complete enumeration of architectures, combined with a user-defined technology filtering and concept ranking procedure. In addition, a hybrid heuristic optimizer, based on ant systems optimization and a genetic algorithm, is employed to produce optimal architectures in both component composition and design parameters. The optimizer is tested on a generic architecture design problem combined with modified Griewank and parabolic functions for the continuous space.
    Findings – Insights from the generalized application problem show that the optimizer consistently rediscovers the optimal architectures found by a full problem enumeration. In addition, multi-objective optimization reveals a Pareto front with differences in component composition as well as in continuous parameters.
    Research limitations/implications – This paper demonstrates the framework's application on a generalized test problem only. Further publications will consider real engineering design problems.
    Originality/value – The paper addresses the need for future conceptual design methods of complex systems to consider a mixed concept space of both discrete and continuous nature via automated methods.
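
    Two ingredients named in the methodology can be sketched in a few lines: a complete enumeration of technology combinations followed by a user-defined filter, and the Griewank function used as a continuous test landscape. This is a hedged illustration only. The standard Griewank form is shown because the paper's modification is not described here. The subsystem names, options and filtering rule are hypothetical. The ant-systems/genetic-algorithm hybrid optimizer is not reproduced.

```python
import itertools
import math

def griewank(x):
    """Standard Griewank function: 1 + sum(x_i^2)/4000 - prod(cos(x_i/sqrt(i))).
    Global minimum 0 at x = 0, with many regularly spaced local minima."""
    s = sum(xi * xi for xi in x) / 4000.0
    p = math.prod(math.cos(xi / math.sqrt(i)) for i, xi in enumerate(x, start=1))
    return 1.0 + s - p


# Hypothetical discrete design space: one candidate technology per subsystem function.
OPTIONS = {
    "actuation": ["hydraulic", "electro-hydrostatic", "electromechanical"],
    "power": ["bleed-air", "electric"],
    "cooling": ["ram-air", "vapour-cycle"],
}


def enumerate_architectures(options, allowed):
    """Complete enumeration of all technology combinations, followed by a
    user-defined filter on which combinations are admissible."""
    keys = list(options)
    for combo in itertools.product(*(options[k] for k in keys)):
        arch = dict(zip(keys, combo))
        if allowed(arch):
            yield arch


# Made-up filtering rule, for illustration only.
def no_bleed_with_electromechanical(arch):
    return not (arch["power"] == "bleed-air" and arch["actuation"] == "electromechanical")


archs = list(enumerate_architectures(OPTIONS, no_bleed_with_electromechanical))
print(len(archs), griewank([0.0, 0.0, 0.0]))
```

    Complete enumeration grows multiplicatively with the number of options per function. This growth is why a heuristic optimizer becomes attractive once the discrete space is combined with continuous design parameters.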

    Recursive Estimation of Orientation Based on the Bingham Distribution

    Directional estimation is a common problem in many tracking applications. Traditional filters such as the Kalman filter perform poorly because they fail to take the periodic nature of the problem into account. We present a recursive filter for directional data based on the Bingham distribution in two dimensions. The proposed filter can be applied to circular filtering problems with 180 degree symmetry, i.e., rotations by 180 degrees cannot be distinguished. It is easily implemented using standard numerical techniques and is suitable for real-time applications. The presented approach is extensible to quaternions, which allow tracking arbitrary three-dimensional orientations. We evaluate our filter in a challenging scenario and compare it to a traditional Kalman filtering approach.
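
    Two properties of the 2D Bingham distribution make a recursive filter practical. The density on the unit circle has a closed-form normalizer. The product of two Bingham densities is again Bingham, which is the kind of step a measurement update can use. The sketch below demonstrates both, using the parameterization p(x) ∝ exp(x^T A x) with A symmetric. The helper names and the example angles and concentrations are hypothetical. The paper's exact prediction and update steps are not reproduced here.

```python
import numpy as np

def param_matrix(angle, kappa):
    """Symmetric parameter matrix A = kappa * m m^T with m = (cos angle, sin angle).
    The density exp(x^T A x) on the unit circle peaks at both +m and -m."""
    m = np.array([np.cos(angle), np.sin(angle)])
    return kappa * np.outer(m, m)


def to_bingham(A):
    """Decompose A = M diag(z) M^T and shift z so its largest entry is 0
    (adding c*I to A only rescales the density, since x^T x = 1)."""
    z, M = np.linalg.eigh(A)  # eigenvalues in ascending order
    return M, z - z[-1]


def normalizer(z):
    """Integral of exp(z1 cos^2 t + z2 sin^2 t) over t in [0, 2pi):
    2*pi * exp((z1 + z2)/2) * I0((z1 - z2)/2)."""
    z1, z2 = z
    return 2 * np.pi * np.exp((z1 + z2) / 2) * np.i0((z1 - z2) / 2)


def multiply(A1, A2):
    """exp(x^T A1 x) * exp(x^T A2 x) = exp(x^T (A1 + A2) x): the product of two
    Bingham densities is Bingham again, up to normalization."""
    return to_bingham(A1 + A2)


prior = param_matrix(0.2, 4.0)
measurement = param_matrix(0.4, 4.0)
M, z = multiply(prior, measurement)
mode = M[:, -1]  # eigenvector of the largest eigenvalue (sign arbitrary)
print(np.degrees(np.arctan2(mode[1], mode[0])) % 180)  # mode angle modulo 180 degrees
```

    The mode is only defined up to sign. This ambiguity is exactly the 180 degree symmetry the abstract refers to, so angles are reported modulo 180 degrees.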

    Kolmogorov equations on spaces of measures associated to nonlinear filtering processes

    We introduce and study some backward Kolmogorov equations associated with stochastic filtering problems. Measure-valued processes arise naturally in the context of stochastic filtering, and one can formulate two stochastic differential equations, called the Zakai and Kushner-Stratonovitch equations, which are satisfied by a positive-measure-valued process and a probability-measure-valued process, respectively. The associated Kolmogorov equations have been intensively studied, mainly under the assumption that the measure-valued processes admit a density, by exploiting stochastic calculus techniques in Hilbert spaces. Our approach differs in that we do not assume the existence of a density and work directly in the context of measures. We first formulate two Kolmogorov equations of parabolic type, one on a space of positive measures and one on a space of probability measures, and then prove existence and uniqueness of classical solutions. To do so, we prove some intermediate results of independent interest. In particular, we prove Itô formulas for the composition of measure-valued filtering processes and real-valued functions. Moreover, we study the regularity of the solution to the filtering equations with respect to the initial datum. To achieve these results, we introduce and discuss proper notions of derivatives on spaces of positive measures.
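
    For reference, here is a standard form of the two filtering equations named above. It assumes a simplified setting for illustration, which is not necessarily the paper's: a signal X_t with generator \mathcal{L}, and a scalar observation dY_t = h(X_t)\,dt + dW_t, where W is a standard Brownian motion independent of X. Here \rho_t is the unnormalized (Zakai) measure-valued process and \pi_t = \rho_t / \rho_t(\mathbf{1}) is the normalized one, both tested against a smooth function \varphi.

```latex
\begin{aligned}
d\rho_t(\varphi) &= \rho_t(\mathcal{L}\varphi)\,dt + \rho_t(h\varphi)\,dY_t, \\
d\pi_t(\varphi)  &= \pi_t(\mathcal{L}\varphi)\,dt
  + \bigl(\pi_t(h\varphi) - \pi_t(h)\,\pi_t(\varphi)\bigr)\bigl(dY_t - \pi_t(h)\,dt\bigr).
\end{aligned}
```

    The first process takes values in positive measures and the second in probability measures. These are the two state spaces on which the Kolmogorov equations are posed.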

    Deformable kernels for early vision

    Early vision algorithms often have a first stage of linear filtering that 'extracts' information from the image at multiple scales of resolution and multiple orientations. A common difficulty in the design and implementation of such schemes is that one feels compelled to discretize the space of scales and orientations coarsely in order to reduce computation and storage costs. A technique is presented that allows: 1) computing the best approximation of a given family using linear combinations of a small number of 'basis' functions; and 2) describing all finite-dimensional families, i.e., the families of filters for which a finite-dimensional representation is possible with no error. The technique is based on singular value decomposition and may be applied to generating filters in arbitrary dimensions and subject to arbitrary deformations. The relevant functional analysis results are reviewed, and precise conditions for the decomposition to be feasible are stated. Experimental results are presented that demonstrate the applicability of the technique to generating multi-orientation, multi-scale 2D edge-detection kernels. The implementation issues are also discussed.
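
    The SVD idea can be tried on a family that is known to be finite-dimensional: rotated first and second directional derivatives of a Gaussian. These need exactly 2 and 3 basis kernels, respectively. The sketch stacks sampled rotated kernels as rows and reads the required number of basis functions off the singular values. Kernel size, sigma and the angle sampling are illustrative choices. The sketch works only on sampled angles and does not reproduce the paper's treatment of scale or of interpolation between angles.

```python
import numpy as np

def rotated_family(angles, size=15, sigma=2.0, order=1):
    """Stack sampled directional derivatives of a 2D Gaussian, one row per angle.
    order=1 -> first directional derivative, order=2 -> second."""
    r = np.arange(size) - size // 2
    X, Y = np.meshgrid(r, r)
    g = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
    rows = []
    for t in angles:
        u = X * np.cos(t) + Y * np.sin(t)  # coordinate along direction t
        k = -u / sigma**2 * g if order == 1 else (u**2 / sigma**4 - 1 / sigma**2) * g
        rows.append(k.ravel())
    return np.array(rows)


def low_rank_family(F, k):
    """Best rank-k approximation (Frobenius norm) of the stacked family via SVD.
    Rows of Vt[:k] act as 'basis' kernels; U[:, :k] * s[:k] are per-angle weights."""
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k], s


angles = np.linspace(0.0, np.pi, 64, endpoint=False)
F1 = rotated_family(angles, order=1)
approx, s1 = low_rank_family(F1, 2)
print(s1[:4] / s1[0])  # only the first two singular values are non-negligible
```

    For families that are not finite-dimensional, the singular values decay instead of dropping to zero. Truncating at k then gives the best rank-k approximation in the least-squares sense.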