Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs
Chain-of-thought (CoT) is a method that enables language models to handle
complex reasoning tasks by decomposing them into simpler steps. Despite its
success, the underlying mechanics of CoT are not yet fully understood. In an
attempt to shed light on this, our study investigates the impact of CoT on the
ability of transformers to in-context learn a simple-to-study yet general
family of compositional functions: multi-layer perceptrons (MLPs). In this
setting, we reveal that the success of CoT can be attributed to breaking down
in-context learning of a compositional function into two distinct phases:
focusing on data related to each step of the composition and in-context
learning the single-step composition function. Through both experimental and
theoretical evidence, we demonstrate how CoT significantly reduces the sample
complexity of in-context learning (ICL) and facilitates the learning of complex
functions that non-CoT methods struggle with. Furthermore, we illustrate how
transformers can transition from vanilla in-context learning to mastering a
compositional function with CoT by simply incorporating an additional layer
that performs the necessary filtering for CoT via the attention mechanism. In
addition to these test-time benefits, we highlight how CoT accelerates
pretraining by learning shortcuts to represent complex functions and how
filtering plays an important role in pretraining. These findings collectively
provide insights into the mechanics of CoT, inviting further investigation of
its role in complex reasoning tasks.
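The two-phase decomposition described in the abstract (filter the data belonging to each step, then in-context learn each single-step map) can be mirrored in a minimal numpy sketch. This is not the paper's transformer setup: linear maps stand in for the MLP steps, and the "filtering" is simply selecting the (input, intermediate) and (intermediate, output) pairs that a CoT-style demonstration exposes.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 32  # dimension and number of in-context demonstrations

# Ground-truth two-step composition f(x) = W2 @ (W1 @ x); linear steps are a
# simplifying assumption of this sketch, not the paper's MLP setting.
W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))

X = rng.standard_normal((n, d))  # demonstration inputs
H = X @ W1.T                     # intermediate values exposed by CoT
Y = H @ W2.T                     # final outputs

# "CoT" view: filter demonstrations per step, then learn each single-step map
# by least squares on that step's pairs alone.
W1_hat = np.linalg.lstsq(X, H, rcond=None)[0].T  # from (x, h) pairs
W2_hat = np.linalg.lstsq(H, Y, rcond=None)[0].T  # from (h, y) pairs

print(np.allclose(W1_hat, W1, atol=1e-6), np.allclose(W2_hat, W2, atol=1e-6))
```

Without the intermediate values H, the learner would face the full composition at once; with them, each step is an ordinary (here, linear) regression problem.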
Inference via low-dimensional couplings
We investigate the low-dimensional structure of deterministic transformations
between random variables, i.e., transport maps between probability measures. In
the context of statistics and machine learning, these transformations can be
used to couple a tractable "reference" measure (e.g., a standard Gaussian) with
a target measure of interest. Direct simulation from the desired measure can
then be achieved by pushing forward reference samples through the map. Yet
characterizing such a map---e.g., representing and evaluating it---grows
challenging in high dimensions. The central contribution of this paper is to
establish a link between the Markov properties of the target measure and the
existence of low-dimensional couplings, induced by transport maps that are
sparse and/or decomposable. Our analysis not only facilitates the construction
of transformations in high-dimensional settings, but also suggests new
inference methodologies for continuous non-Gaussian graphical models. For
instance, in the context of nonlinear state-space models, we describe new
variational algorithms for filtering, smoothing, and sequential parameter
inference. These algorithms can be understood as the natural
generalization---to the non-Gaussian case---of the square-root
Rauch-Tung-Striebel Gaussian smoother.
Comment: 78 pages, 25 figures
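The link between Markov structure and sparse transport maps can be illustrated with a toy Gaussian example. Below, the target is an AR(1) chain (a Markov measure): the forward map pushing standard-normal reference samples to the target is triangular, and its inverse is sparse, with each component touching only two adjacent variables. This is an illustrative sketch under Gaussian assumptions, not the paper's general construction.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, a = 5, 200_000, 0.8

# Target: Gaussian AR(1) chain x_k = a*x_{k-1} + z_k. Push standard-normal
# reference samples z forward through the (triangular) map T:
Z = rng.standard_normal((n, d))
X = np.empty_like(Z)
X[:, 0] = Z[:, 0]
for k in range(1, d):
    X[:, k] = a * X[:, k - 1] + Z[:, k]

# The inverse map S = T^{-1} is *sparse*: component k depends only on
# (x_{k-1}, x_k), mirroring the chain's Markov structure.
S = np.empty_like(X)
S[:, 0] = X[:, 0]
S[:, 1:] = X[:, 1:] - a * X[:, :-1]

# S pushes target samples back to the reference: S(X) is standard Gaussian.
print(np.round(np.cov(S.T), 2))  # approximately the identity matrix
```

In higher dimensions, it is exactly this kind of sparsity, induced by the conditional-independence (Markov) structure of the target, that keeps representing and evaluating the map tractable.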
On the Projective Geometry of Kalman Filter
Convergence of the Kalman filter is best analyzed by studying the contraction
of the Riccati map in the space of positive definite (covariance) matrices. In
this paper, we explore how this contraction property relates to a more
fundamental non-expansiveness property of filtering maps in the space of
probability distributions endowed with the Hilbert metric. This is viewed as a
preliminary step towards improving the convergence analysis of filtering
algorithms over general graphical models.
Comment: 6 pages
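The contraction of the Riccati map mentioned above is easy to observe numerically: iterating the discrete-time Riccati update from two different initial covariances drives them toward the same fixed point. The system matrices below are an arbitrary detectable/stabilizable example chosen for this sketch.

```python
import numpy as np

# Small observable example: near-integrator dynamics, position-only measurement
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q, R = 0.1 * np.eye(2), np.array([[1.0]])

def riccati(P):
    """One step of the discrete-time Riccati map (update, then predict)."""
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)  # Kalman gain
    return A @ (P - K @ C @ P) @ A.T + Q          # next predicted covariance

# Two different initial covariances contract toward the same fixed point
P1, P2 = np.eye(2), 10.0 * np.eye(2)
for _ in range(200):
    P1, P2 = riccati(P1), riccati(P2)

print(np.linalg.norm(P1 - P2))  # close to 0: the iterates have merged
```

The paper's point is that this familiar contraction in the cone of positive definite matrices is a shadow of a more general non-expansiveness of filtering maps in the Hilbert metric on probability distributions.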
Development of an automated aircraft subsystem architecture generation and analysis tool
Purpose – The purpose of this paper is to present a new computational framework to address future
preliminary design needs for aircraft subsystems. The ability to investigate multiple candidate
technologies forming subsystem architectures is enabled with the provision of automated architecture
generation, analysis and optimization. The main focus lies on demonstrating the framework's
workings, as well as the optimizer's performance on a typical application problem.
Design/methodology/approach – The core aspects involve a functional decomposition, coupled
with a synergistic mission performance analysis on the aircraft, architecture and component levels.
This may be followed by a complete enumeration of architectures, combined with a user-defined
technology filtering and concept-ranking procedure. In addition, a hybrid heuristic optimizer, based on
ant systems optimization and a genetic algorithm, is employed to produce optimal architectures in both
component composition and design parameters. The optimizer is tested on a generic architecture
design problem combined with modified Griewank and parabolic functions for the continuous space.
Findings – Insights from the generalized application problem show consistent rediscovery of the
optimal architectures by the optimizer, as compared to a full problem enumeration. In addition,
multi-objective optimization reveals a Pareto front with differences in both component composition
and continuous parameters.
Research limitations/implications – This paper demonstrates the framework's application on a
generalized test problem only. Future publications will consider real engineering design problems.
Originality/value – The paper addresses the need for future conceptual design methods of complex
systems to consider a mixed concept space of both discrete and continuous nature via automated methods.
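The continuous-space benchmark named in the abstract can be sketched. Shown here is the standard Griewank test function; the paper uses a modified variant, so this is only the canonical form for reference.

```python
import numpy as np

def griewank(x):
    """Standard Griewank test function: many local minima, global minimum 0 at x = 0."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return 1.0 + np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))

print(griewank(np.zeros(5)))  # 0.0 at the global minimum
```

Its dense field of cosine-induced local minima is what makes it a standard stress test for heuristic optimizers such as the ant-system/genetic hybrid described above.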
Recursive Estimation of Orientation Based on the Bingham Distribution
Directional estimation is a common problem in many tracking applications.
Traditional filters such as the Kalman filter perform poorly because they fail
to take the periodic nature of the problem into account. We present a recursive
filter for directional data based on the Bingham distribution in two
dimensions. The proposed filter can be applied to circular filtering problems
with 180 degree symmetry, i.e., rotations by 180 degrees cannot be
distinguished. It is easily implemented using standard numerical techniques and
suitable for real-time applications. The presented approach is extensible to
quaternions, which allow tracking arbitrary three-dimensional orientations. We
evaluate our filter in a challenging scenario and compare it to a traditional
Kalman filtering approach.
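Why linear (Kalman-style) estimators fail on 180-degree-symmetric data can be seen in one line of arithmetic. The sketch below contrasts a naive average with the angle-doubling trick that underlies Bingham-style directional estimation; it is an illustration of the periodicity problem, not the paper's filter.

```python
import numpy as np

# Orientations (mod 180 deg): 178 and 2 degrees are only 4 degrees apart,
# but a linear average lands at 90 degrees, the farthest possible estimate.
angles_deg = np.array([178.0, 2.0])
naive = angles_deg.mean()  # 90.0, which is wrong

# Doubling the angle maps 180-degree-symmetric orientations onto the full
# circle, where averaging unit vectors respects periodicity.
doubled = np.deg2rad(2 * angles_deg)
v = np.array([np.cos(doubled), np.sin(doubled)]).mean(axis=1)
est = np.rad2deg(np.arctan2(v[1], v[0])) / 2 % 180

print(naive, est)  # 90.0 vs an estimate near 0 (mod 180), the true midpoint
```

A Bingham-distribution filter builds this symmetry into the density itself, which is why it can track orientations through the wrap-around where a Kalman filter cannot.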
Kolmogorov equations on spaces of measures associated to nonlinear filtering processes
We introduce and study some backward Kolmogorov equations associated to
stochastic filtering problems. Measure-valued processes arise naturally in the
context of stochastic filtering, and one can formulate two stochastic
differential equations, called the Zakai and Kushner-Stratonovitch equations, that
are satisfied by a positive measure-valued and a probability measure-valued process,
respectively. The associated Kolmogorov equations have been intensively
studied, mainly assuming that the measure-valued processes admit a density and
then by exploiting stochastic calculus techniques in Hilbert spaces.
Our approach differs from this since we do not assume the existence of a
density and we work directly in the context of measures. We first formulate two
Kolmogorov equations of parabolic type, one on a space of positive measures and
one on a space of probability measures, and then we prove existence and
uniqueness of classical solutions. In order to do that, we prove some
intermediate results of independent interest. In particular, we prove Itô
formulas for the composition of measure-valued filtering processes and
real-valued functions. Moreover, we study the regularity of the solution to the
filtering equations with respect to the initial datum. In order to achieve
these results, proper notions of derivatives on spaces of positive measures are
introduced and discussed.
Deformable kernels for early vision
Early vision algorithms often have a first stage of linear filtering that 'extracts' from the image information at multiple scales of resolution and multiple orientations. A common difficulty in the design and implementation of such schemes is that one feels compelled to discretize the space of scales and orientations coarsely in order to reduce computation and storage costs. A technique is presented that allows: 1) computing the best approximation of a given family using linear combinations of a small number of 'basis' functions; and 2) describing all finite-dimensional families, i.e., the families of filters for which a finite-dimensional representation is possible with no error. The technique is based on singular value decomposition and may be applied to generating filters in arbitrary dimensions and subject to arbitrary deformations. The relevant functional analysis results are reviewed, and precise conditions for the decomposition to be feasible are stated. Experimental results are presented that demonstrate the applicability of the technique to generating multi-orientation, multi-scale 2D edge-detection kernels. The implementation issues are also discussed.
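The SVD-based idea can be sketched on a classic steerable family: first-derivative-of-Gaussian edge kernels at many orientations, which are known to be spanned by exactly two basis functions. The particular kernel and grid below are choices made for this sketch, not taken from the paper.

```python
import numpy as np

# Family of 2D edge kernels: x-derivative-style Gaussian kernels rotated to
# many orientations. Steerability predicts the family has dimension 2.
s, half = 2.0, 10
y, x = np.mgrid[-half:half + 1, -half:half + 1]
g = np.exp(-(x**2 + y**2) / (2 * s**2))

thetas = np.linspace(0, np.pi, 32, endpoint=False)
family = np.stack(
    [((np.cos(t) * x + np.sin(t) * y) * g).ravel() for t in thetas]
)

# SVD reveals the family's dimension: singular values beyond the 2nd vanish,
# so two 'basis' kernels reproduce every orientation with no error.
sv = np.linalg.svd(family, compute_uv=False)
print(sv[:4] / sv[0])  # approximately [1, 1, 0, 0]
```

For families that are not exactly finite-dimensional (e.g., across scales), the same decomposition gives the best small basis and an explicit approximation error from the discarded singular values.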