Through the Wall Radar Imaging via Kronecker-structured Huber-type RPCA
The detection of multiple targets in an enclosed scene, from its outside, is
a challenging topic of research addressed by Through-the-Wall Radar Imaging
(TWRI). Traditionally, TWRI methods operate in two steps: first the removal of
wall clutter, followed by the recovery of target positions. Recent approaches
process the wall and targets in parallel via low-rank plus sparse matrix
decomposition and achieve better performance. In this paper, we reformulate
this precisely as an RPCA-type problem, where the sparse
vector appears in a Kronecker product. We extend this approach by adding a
robust distance with flexible structure to handle heterogeneous noise and
outliers, which may appear in TWRI measurements. The resolution is achieved via
the Alternating Direction Method of Multipliers (ADMM) and variable splitting
to decouple the constraints. The removal of the front wall is achieved via a
closed-form proximal evaluation and the recovery of targets is possible via a
tailored Majorization-Minimization (MM) step. The analysis and validation of
our method is carried out using Finite-Difference Time-Domain (FDTD) simulated
data, which show the advantage of our method in detection performance over
complex scenarios.
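The low-rank plus sparse split underlying this family of methods can be illustrated with a plain RPCA solved by a standard inexact augmented-Lagrangian (ADMM-style) iteration. This sketch omits the Kronecker structure and Huber-type distance of the paper, and the parameter choices below are common defaults, not the authors':

```python
import numpy as np

def soft_threshold(X, tau):
    # Elementwise soft-thresholding: proximal operator of the l1 norm.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    # Singular-value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ (soft_threshold(s, tau)[:, None] * Vt)

def rpca(M, n_iter=200):
    """Split M into L (low rank, e.g. wall clutter) + S (sparse, e.g. targets)."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))          # standard RPCA sparsity weight
    mu = m * n / (4.0 * np.abs(M).sum())    # common initial penalty choice
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        Y = Y + mu * (M - L - S)            # dual ascent on L + S = M
        mu = min(mu * 1.05, 1e6)            # gradually tighten the constraint
    return L, S
```

In the paper, the closed-form wall-removal step plays the role of the low-rank update, while the MM step replaces the simple soft-thresholding of the sparse part.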
Reinforcement Learning Curricula as Interpolations between Task Distributions
In the last decade, the increased availability of powerful computing machinery has led to an increasingly widespread application of machine learning methods. Machine learning has been particularly successful when large models, typically neural networks with an ever-increasing number of parameters, can leverage vast data to make predictions.
While reinforcement learning (RL) has been no exception to this development, a distinguishing feature of RL is its well-known exploration-exploitation trade-off, whose optimal solution – while possible to model as a partially observable Markov decision process – evades computation in all but the simplest problems. Consequently, it seems unsurprising that notable demonstrations of reinforcement learning, such as DeepMind's RL-based Go agent AlphaGo beating the professional Go player Lee Sedol, relied both on the availability of massive computing capabilities and on specific forms of regularization that facilitate learning. In the case of AlphaGo, this regularization came in the form of self-play, enabling learning by interacting with gradually more proficient opponents.
In this thesis, we develop techniques that, similarly to the concept of self-play of AlphaGo, improve the learning performance of RL agents by training on sequences of increasingly complex tasks. These task sequences are typically called curricula and are known to side-step problems such as slow learning or convergence to poor behavior that may occur when directly learning in complicated tasks. The algorithms we develop in this thesis create curricula by minimizing distances or divergences between probability distributions of learning tasks, generating interpolations between an initial distribution of easy learning tasks and a target task distribution. Apart from improving the learning performance of RL agents in experiments, developing methods that realize curricula as interpolations between task distributions results in a nuanced picture of key aspects of successful reinforcement learning curricula.
In Chapter 1, we start this thesis by introducing required reinforcement learning notation and then motivating curriculum reinforcement learning from the perspective of continuation methods for non-linear optimization. Similar to curricula for reinforcement learning agents, continuation methods have been used in non-linear optimization to solve challenging optimization problems. This similarity provides an intuition about the effect of the curricula we aim to generate and their limits.
In Chapter 2, we transfer the concept of self-paced learning, initially proposed in the supervised learning community, to the problem of RL, showing that an automated curriculum generation for RL agents can be motivated by a regularized RL objective. This regularized RL objective implies generating a curriculum as a sequence of task distributions that trade off the expected agent performance against similarity to a specified distribution of target tasks. This view on curriculum RL contrasts existing approaches, as it motivates curricula via a regularized RL objective instead of generating them from a set of assumptions about an optimal curriculum. In experiments, we show that an approximate implementation of the aforementioned curriculum – that restricts the interpolating task distribution to a Gaussian – results in improved learning performance compared to regular reinforcement learning, matching or surpassing the performance of existing curriculum-based methods.
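The trade-off described in Chapter 2 can be caricatured in one dimension: a Gaussian task distribution moves toward the target distribution only while the agent performs well enough. This is a toy sketch, not the chapter's algorithm; the names `perf_lb` and `alpha` are illustrative, and `perf` stands in for an externally estimated agent performance:

```python
import numpy as np

def kl_gauss(mu_p, s_p, mu_q, s_q):
    # KL divergence between the 1-D Gaussians N(mu_p, s_p^2) and N(mu_q, s_q^2),
    # a possible measure of similarity to the target task distribution.
    return np.log(s_q / s_p) + (s_p**2 + (mu_p - mu_q)**2) / (2 * s_q**2) - 0.5

def curriculum_step(mu, s, mu_t, s_t, perf, perf_lb=0.5, alpha=0.3):
    """One self-paced-style update: interpolate the current task distribution
    toward the target (mu_t, s_t) only while performance stays above perf_lb;
    otherwise keep training on the current tasks."""
    if perf < perf_lb:
        return mu, s
    return (1 - alpha) * mu + alpha * mu_t, (1 - alpha) * s + alpha * s_t
```

Repeated calls trace exactly the kind of interpolating sequence of task distributions the chapter formalizes.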
Subsequently, Chapter 3 builds upon the intuition of curricula as sequences of interpolating task distributions established in Chapter 2. Motivated by using more flexible task distribution representations, we show how parametric assumptions play a crucial role in the empirical success of the previous approach and subsequently uncover key ingredients that enable the generation of meaningful curricula without assuming a parametric model of the task distributions. One major ingredient is an explicit notion of task similarity via a distance function between two Markov decision processes. We turn towards optimal transport theory, allowing for flexible particle-based representations of the task distributions while properly accounting for the newly introduced metric structure of the task space. Combined with other improvements to our first method, such as a more aggressive restriction of the curriculum to tasks that are not too hard for the agent, the resulting approach delivers consistently high learning performance in multiple experiments.
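The particle-based, optimal-transport view of a curriculum is easiest to see in one dimension, where the optimal plan simply matches sorted particles and sweeping alpha from 0 to 1 traces the displacement interpolation. A minimal sketch (purely illustrative, not the chapter's method):

```python
import numpy as np

def ot_interpolate(src, tgt, alpha):
    """1-D displacement interpolation between two equally sized particle sets:
    sorting both sets yields the optimal transport matching, and each source
    particle then moves a fraction alpha along its transport line."""
    s = np.sort(np.asarray(src, dtype=float))
    t = np.sort(np.asarray(tgt, dtype=float))
    return (1.0 - alpha) * s + alpha * t
```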
In the final Chapter 4, we apply the refined method of Chapter 3 to a trajectory-tracking task, in which we task an RL agent to follow a three-dimensional reference trajectory with the tip of an inverted pendulum mounted on a Barrett Whole Arm Manipulator. The access to only positional information results in a partially observable system that, paired with its inherent instability, underactuation, and non-trivial kinematic structure, presents a challenge for modern reinforcement learning algorithms, which we tackle via curricula. The technically infinite-dimensional task space of target trajectories allows us to probe the developed curriculum learning method for flaws that have not surfaced in the rather low-dimensional experiments of the previous chapters. Through an improved optimization scheme that better respects the non-Euclidean structure of target trajectories, we reliably generate curricula of trajectories to be tracked, resulting in faster and more robust learning compared to an RL baseline that does not exploit this form of structured learning. The learned policy matches the performance of an optimal control baseline on the real system, demonstrating the potential of curriculum RL to learn state estimation and control for non-linear tracking tasks jointly.
In summary, this thesis introduces a perspective on reinforcement learning curricula as interpolations between task distributions. The methods developed under this perspective enjoy a precise formulation as optimization problems and deliver empirical benefits throughout experiments. Building upon this precise formulation may allow future work to advance the formal understanding of reinforcement learning curricula and, with that, enable the solution of challenging decision-making and control problems with reinforcement learning.
Majorization-Minimization for sparse SVMs
Several decades ago, Support Vector Machines (SVMs) were introduced for
performing binary classification tasks, under a supervised framework. Nowadays,
they often outperform other supervised methods and remain one of the most
popular approaches in the machine learning arena. In this work, we investigate
the training of SVMs through the minimization of a squared hinge loss with a
smooth sparsity-promoting regularizer. This choice paves the way to the
application of fast training methods built on majorization-minimization
approaches, benefiting from the Lipschitz differentiability of the loss
function. Moreover, the proposed approach allows us to handle
sparsity-preserving regularizers promoting the selection of the most
significant features, thus enhancing performance.
Numerical tests and comparisons conducted on three different datasets
demonstrate the good performance of the proposed methodology in terms of
qualitative metrics (accuracy, precision, recall, and F1 score) as well as
computational cost.
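Because the squared hinge loss has a Lipschitz gradient, the simplest quadratic majorant leads to an ISTA-style update: a gradient step followed by a proximal (soft-thresholding) step. The sketch below uses an l1 penalty as a stand-in for the paper's regularizers, and all names are illustrative:

```python
import numpy as np

def soft(w, tau):
    # Soft-thresholding: proximal operator of the l1 penalty.
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def sq_hinge_grad(w, X, y):
    # Gradient of the smooth squared hinge loss sum_i max(0, 1 - y_i x_i.w)^2.
    margin = 1.0 - y * (X @ w)
    active = margin > 0
    return -2.0 * X[active].T @ (y[active] * margin[active])

def train_sparse_svm(X, y, lam=1.0, n_iter=500):
    """Majorize-minimize with a quadratic majorant of curvature L (a Lipschitz
    bound on the loss gradient); each inner minimization is a soft-threshold."""
    L = 2.0 * np.linalg.norm(X, 2) ** 2     # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = soft(w - sq_hinge_grad(w, X, y) / L, lam / L)
    return w
```

On data where only one feature is informative, the l1 term drives the remaining weights toward zero, which is the feature-selection effect the abstract describes.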
Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem
In this paper, we study a class of stochastic bilevel optimization problems,
also known as stochastic simple bilevel optimization, where we minimize a
smooth stochastic objective function over the optimal solution set of another
stochastic convex optimization problem. We introduce novel stochastic bilevel
optimization methods that locally approximate the solution set of the
lower-level problem via a stochastic cutting plane, and then run a conditional
gradient update with variance reduction techniques to control the error induced
by using stochastic gradients. For the case that the upper-level function is
convex, our method requires … stochastic oracle queries to obtain a solution
that is ε_f-optimal for the upper-level and ε_g-optimal for the lower-level.
This guarantee improves the previous best-known complexity of …. Moreover, for
the case that the upper-level function is non-convex, our method requires at
most … stochastic oracle queries to find an ε-stationary point. In the
finite-sum setting, we show that the numbers of stochastic oracle calls
required by our method are … and … for the convex and non-convex settings,
respectively.
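The projection-free building block the method relies on is the conditional gradient (Frank-Wolfe) update, which replaces projections with a linear minimization oracle. The deterministic sketch below on the probability simplex omits the paper's cutting-plane approximation and variance reduction; the domain and step size are standard textbook choices:

```python
import numpy as np

def frank_wolfe_simplex(grad_f, x0, n_iter=200):
    """Projection-free conditional gradient over the probability simplex:
    the linear minimization oracle just selects the vertex (coordinate)
    with the smallest partial derivative, so no projection is ever needed."""
    x = x0.copy()
    for k in range(n_iter):
        g = grad_f(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0              # LMO over the simplex: best vertex
        x += 2.0 / (k + 2.0) * (s - x)     # standard O(1/k) step size
    return x
```

For example, minimizing f(x) = ||x - c||^2 for a point c inside the simplex drives the iterates toward c while every iterate stays feasible by construction.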
Maximally Machine-Learnable Portfolios
When it comes to stock returns, any form of predictability can bolster
risk-adjusted profitability. We develop a collaborative machine learning
algorithm that optimizes portfolio weights so that the resulting synthetic
security is maximally predictable. Precisely, we introduce MACE, a multivariate
extension of Alternating Conditional Expectations that achieves the
aforementioned goal by wielding a Random Forest on one side of the equation,
and a constrained Ridge Regression on the other. There are two key improvements
with respect to Lo and MacKinlay's original maximally predictable portfolio
approach. First, it accommodates any (nonlinear) forecasting algorithm and
predictor set. Second, it handles large portfolios. We conduct exercises at the
daily and monthly frequency and report significant increases in predictability
and profitability using very little conditioning information. Interestingly,
predictability is found in bad as well as good times, and MACE successfully
navigates the debacle of 2022.
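The alternating structure of MACE can be caricatured with a toy loop that alternates between fitting a predictor of the portfolio return and re-solving ridge-regularized weights so the synthetic security becomes more predictable. A plain linear predictor stands in for the Random Forest here, and every name below is illustrative, not the paper's implementation:

```python
import numpy as np

def mace_toy(R, F, n_rounds=10, ridge=1e-3):
    """Toy MACE-style alternation: R is a (time x assets) return matrix and F a
    (time x features) predictor matrix.  Each round fits a (linear) predictor
    of the current portfolio return, then re-solves ridge weights w so that
    the synthetic security R @ w tracks the predictable component."""
    T, N = R.shape
    w = np.ones(N) / N
    for _ in range(n_rounds):
        y = R @ w                                         # synthetic security
        beta = np.linalg.solve(F.T @ F + ridge * np.eye(F.shape[1]), F.T @ y)
        yhat = F @ beta                                   # predictor side
        w = np.linalg.solve(R.T @ R + ridge * np.eye(N), R.T @ yhat)
        w /= np.abs(w).sum()                              # keep weights bounded
    return w, beta
```

The loop behaves like a power iteration toward the most predictable portfolio direction: assets whose returns carry no signal from F receive vanishing weight.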
Energy Transformer
Transformers have become the de facto models of choice in machine learning,
typically leading to impressive performance on many applications. At the same
time, the architectural development in the transformer world is mostly driven
by empirical findings, and the theoretical understanding of their architectural
building blocks is rather limited. In contrast, Dense Associative Memory models
or Modern Hopfield Networks have a well-established theoretical foundation, but
have not yet demonstrated truly impressive practical results. We propose a
transformer architecture that replaces the sequence of feedforward transformer
blocks with a single large Associative Memory model. Our novel architecture,
called Energy Transformer (or ET for short), has many of the familiar
architectural primitives that are often used in the current generation of
transformers. However, it is not identical to the existing architectures. The
sequence of transformer layers in ET is purposely designed to minimize a
specifically engineered energy function, which is responsible for representing
the relationships between the tokens. As a consequence of this computational
principle, the attention in ET is different from the conventional attention
mechanism. In this work, we introduce the theoretical foundations of ET,
explore its empirical capabilities using the image completion task, and obtain
strong quantitative results on the graph anomaly detection task.
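The guiding idea, attention as descent on an engineered energy, can be illustrated with a minimal modern-Hopfield retrieval step. This is the classical Dense Associative Memory formulation, not ET's specific energy function, and the inverse temperature `beta` is an illustrative choice:

```python
import numpy as np

def energy(x, M, beta=4.0):
    # Modern-Hopfield-style energy: low when x aligns with a stored pattern
    # (a row of M); the quadratic term keeps the state bounded.
    return -np.log(np.exp(beta * (M @ x)).sum()) / beta + 0.5 * x @ x

def hopfield_step(x, M, beta=4.0):
    """One attention-like update: softmax over similarities to the stored
    patterns, then a convex combination of those patterns.  Each step is
    guaranteed not to increase the energy above."""
    a = np.exp(beta * (M @ x))
    return (a / a.sum()) @ M
```

The update has exactly the softmax(query-key) times value shape of attention, which is why engineering the energy also reshapes the attention mechanism.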
Streaming Probabilistic PCA for Missing Data with Heteroscedastic Noise
Streaming principal component analysis (PCA) is an integral tool in
large-scale machine learning for rapidly estimating low-dimensional subspaces
of very high dimensional and high arrival-rate data with missing entries and
corrupting noise. However, modern trends increasingly combine data from a
variety of sources, meaning they may exhibit heterogeneous quality across
samples. Since standard streaming PCA algorithms do not account for non-uniform
noise, their subspace estimates can quickly degrade. On the other hand, the
recently proposed Heteroscedastic Probabilistic PCA Technique (HePPCAT)
addresses this heterogeneity, but it was not designed to handle missing entries
and streaming data, nor does it adapt to non-stationary behavior in time series
data. This paper proposes the Streaming HeteroscedASTic Algorithm for PCA
(SHASTA-PCA) to bridge this divide. SHASTA-PCA employs a stochastic alternating
expectation maximization approach that jointly learns the low-rank latent
factors and the unknown noise variances from streaming data that may have
missing entries and heteroscedastic noise, all while maintaining a low memory
and computational footprint. Numerical experiments validate the superior
subspace estimation of our method compared to state-of-the-art streaming PCA
algorithms in the heteroscedastic setting. Finally, we illustrate SHASTA-PCA
applied to highly-heterogeneous real data from astronomy.
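For contrast with the stochastic EM approach described above, the classical streaming-PCA baseline it is compared against can be sketched in a few lines with Oja's rule. This generic update treats all samples as equally noisy, which is precisely the limitation SHASTA-PCA addresses; the learning rate is an illustrative choice:

```python
import numpy as np

def oja_update(U, x, lr):
    """One streaming PCA step (Oja's rule): nudge the current orthonormal
    basis U toward the new sample's projection, then re-orthonormalize via QR.
    Memory cost is just the basis itself, never the full data."""
    U = U + lr * np.outer(x, x @ U)
    Q, _ = np.linalg.qr(U)
    return Q
```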
Training Methods of Multi-label Prediction Classifiers for Hyperspectral Remote Sensing Images
With their combined spectral depth and geometric resolution, hyperspectral
remote sensing images embed a wealth of complex, non-linear information that
challenges traditional computer vision techniques. Yet, deep learning methods
known for their representation learning capabilities prove more suitable for
handling such complexities. Unlike applications that focus on single-label,
pixel-level classification methods for hyperspectral remote sensing images, we
propose a multi-label, patch-level classification method based on a
two-component deep-learning network. We use patches of reduced spatial
dimension and a complete spectral depth extracted from the remote sensing
images. Additionally, we investigate three training schemes for our network:
Iterative, Joint, and Cascade. Experiments suggest that the Joint scheme is the
best-performing scheme; however, its application requires an expensive search
for the best weight combination of the loss constituents. The Iterative scheme
enables the sharing of features between the two parts of the network at the
early stages of training. It performs better on complex data with multi-labels.
Further experiments showed that methods designed with different architectures
performed well when trained on patches extracted and labeled according to our
sampling method.
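The patch-level input setup, reduced spatial size with complete spectral depth, can be sketched directly; the shapes and stride below are illustrative, not the paper's configuration:

```python
import numpy as np

def extract_patches(cube, patch, stride):
    """Cut a hyperspectral cube of shape (H, W, bands) into spatial patches of
    reduced spatial extent (patch x patch) while keeping every spectral band."""
    H, W, _ = cube.shape
    return np.stack([cube[i:i + patch, j:j + patch, :]
                     for i in range(0, H - patch + 1, stride)
                     for j in range(0, W - patch + 1, stride)])
```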
Robust and Sparse M-Estimation of DOA
A robust and sparse Direction of Arrival (DOA) estimator is derived for array
data that follows a Complex Elliptically Symmetric (CES) distribution with
zero-mean and finite second-order moments. The derivation allows choosing the
loss function, and four loss functions are discussed in detail: the Gauss loss,
which is the Maximum-Likelihood (ML) loss for the circularly symmetric complex
Gaussian distribution; the ML loss for the complex multivariate t-distribution
(MVT) with ν degrees of freedom; and the Huber and Tyler loss functions. For
the Gauss loss, the method reduces to Sparse Bayesian Learning (SBL). The root
mean square DOA error of the derived estimators is discussed for Gaussian, MVT,
and ε-contaminated data. The robust SBL estimators perform well in all cases,
nearly identically to classical SBL for Gaussian noise.
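The qualitative difference between these losses is visible in their M-estimation weight functions: the Gauss loss weighs every sample fully, while the Huber loss caps the influence of large residuals. A minimal sketch, with the threshold `delta` chosen for illustration:

```python
import numpy as np

def gauss_weight(r):
    # Gauss (Gaussian ML) loss: constant weight, every sample counts fully,
    # so a single outlier can dominate the estimate.
    return np.ones_like(np.asarray(r, dtype=float))

def huber_weight(r, delta=1.0):
    """Huber loss weight: 1 for residual magnitudes up to delta, then
    delta / |r| beyond it, which bounds each sample's influence."""
    r = np.abs(np.asarray(r, dtype=float))
    return np.where(r <= delta, 1.0, delta / np.maximum(r, delta))
```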