Decentralized learning for wireless communications and networking
This chapter deals with decentralized learning algorithms for in-network
processing of graph-valued data. A generic learning problem is formulated and
recast into a separable form, which is iteratively minimized using the
alternating-direction method of multipliers (ADMM) so as to gain the desired
degree of parallelization. Without exchanging elements from the distributed
training sets and keeping inter-node communications at affordable levels, the
local (per-node) learners reach consensus on the desired quantity inferred globally,
meaning the one that would be obtained if the entire training data set were centrally
available. The impact of the decentralized learning framework on contemporary
wireless communications and networking tasks is illustrated through case
studies including target tracking using wireless sensor networks, unveiling
Internet traffic anomalies, power system state estimation, as well as spectrum
cartography for wireless cognitive radio networks.
Comment: Contributed chapter to appear in Splitting Methods in Communication
and Imaging, Science and Engineering, R. Glowinski, S. Osher, and W. Yin,
Editors, New York, Springer, 201
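The neighbor-only ADMM iteration described in this abstract can be illustrated on a toy problem. The following is a minimal sketch, assuming a least-squares loss at each node, a ring topology, and a penalty parameter c = 1; the function name decentralized_admm_ls and all data are hypothetical and are not taken from the chapter. Each node solves a small local problem and exchanges only its current estimate with its neighbors, never its training data.

```python
import numpy as np

def decentralized_admm_ls(A_list, b_list, neighbors, c=1.0, iters=2000):
    """Decentralized consensus ADMM for sum_i 0.5 * ||A_i x - b_i||^2.

    Node i keeps a local estimate x[i] and a multiplier p[i]; it only uses
    the current estimates of the nodes listed in neighbors[i].
    """
    n_nodes = len(A_list)
    dim = A_list[0].shape[1]
    x = np.zeros((n_nodes, dim))
    p = np.zeros((n_nodes, dim))
    for _ in range(iters):
        x_new = np.empty_like(x)
        for i in range(n_nodes):
            deg = len(neighbors[i])
            nbr_sum = x[neighbors[i]].sum(axis=0)
            # Local primal update: closed form because the loss is quadratic.
            lhs = A_list[i].T @ A_list[i] + 2.0 * c * deg * np.eye(dim)
            rhs = A_list[i].T @ b_list[i] - p[i] + c * (deg * x[i] + nbr_sum)
            x_new[i] = np.linalg.solve(lhs, rhs)
        for i in range(n_nodes):
            deg = len(neighbors[i])
            # Multiplier update: accumulate disagreement with neighbors.
            p[i] += c * (deg * x_new[i] - x_new[neighbors[i]].sum(axis=0))
        x = x_new
    return x

# Hypothetical data: 5 nodes on a ring, each holding 10 noisy measurements.
rng = np.random.default_rng(0)
n_nodes, dim = 5, 3
x_true = rng.standard_normal(dim)
A_list = [rng.standard_normal((10, dim)) for _ in range(n_nodes)]
b_list = [A @ x_true + 0.1 * rng.standard_normal(10) for A in A_list]
ring = [[(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)]

x_local = decentralized_admm_ls(A_list, b_list, ring)
# Centralized benchmark: least squares on the pooled data.
x_central = np.linalg.lstsq(np.vstack(A_list), np.concatenate(b_list), rcond=None)[0]
max_gap = np.max(np.abs(x_local - x_central))
```

Here max_gap measures how far the per-node estimates are from the solution a fusion center would compute with all data available.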
Stochastic Primal-Dual Algorithms with Faster Convergence than $O(1/\sqrt{T})$ for Problems without Bilinear Structure
Previous studies on stochastic primal-dual algorithms for solving min-max
problems with faster convergence heavily rely on the bilinear structure of the
problem, which restricts their applicability to a narrow range of problems.
The main contribution of this paper is the design and analysis of new
stochastic primal-dual algorithms that use a mixture of stochastic gradient
updates and a logarithmic number of deterministic dual updates for solving a
family of convex-concave problems with no bilinear structure assumed. Faster
convergence rates than $O(1/\sqrt{T})$, with $T$ being the number of stochastic
gradient updates, are established under some mild conditions on the involved
functions of the primal and the dual variables. For example, for a family of
problems that enjoy weak strong convexity in terms of the primal variable and
have a strongly concave function of the dual variable, the convergence rate of
the proposed algorithm is $O(1/T)$. We also investigate the effectiveness of
the proposed algorithms for learning robust models and empirical AUC
maximization.
Bandit Convex Optimization for Scalable and Dynamic IoT Management
The present paper deals with online convex optimization involving both
time-varying loss functions, and time-varying constraints. The loss functions
are not fully accessible to the learner, and instead only the function values
(a.k.a. bandit feedback) are revealed at queried points. The constraints are
revealed after making decisions, and can be instantaneously violated, yet they
must be satisfied in the long term. This setting fits nicely the emerging
online network tasks such as fog computing in the Internet-of-Things (IoT),
where online decisions must flexibly adapt to the changing user preferences
(loss functions), and the temporally unpredictable availability of resources
(constraints). Tailored for such human-in-the-loop systems where the loss
functions are hard to model, a family of bandit online saddle-point (BanSaP)
schemes is developed, which adaptively adjust the online operations based on
(possibly multiple) bandit feedback of the loss functions, and the changing
environment. Performance here is assessed by: i) dynamic regret that
generalizes the widely used static regret; and, ii) fit that captures the
accumulated amount of constraint violations. Specifically, BanSaP is proved to
simultaneously yield sub-linear dynamic regret and fit, provided that the best
dynamic solutions vary slowly over time. Numerical tests in fog computation
offloading tasks corroborate that our proposed BanSaP approach offers
competitive performance relative to existing approaches that are based on
gradient feedback.
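To make the notion of bandit feedback concrete, the sketch below builds a gradient estimate from function values only, using two queried points per step. It is a generic two-point estimator on a hypothetical quadratic loss, not the BanSaP scheme itself; the name two_point_gradient, the smoothing radius delta, and the test point are assumptions made for illustration.

```python
import numpy as np

def two_point_gradient(loss, x, delta, rng):
    """Estimate the gradient of `loss` at x from two function evaluations."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # random direction on the unit sphere
    # Finite difference along u, rescaled by the dimension d.
    return (d / (2.0 * delta)) * (loss(x + delta * u) - loss(x - delta * u)) * u

rng = np.random.default_rng(0)
loss = lambda z: float(z @ z)      # hypothetical loss; its true gradient is 2 * z
x0 = np.array([1.0, -2.0, 0.5])
# Average many single estimates to compare against the true gradient.
est = np.mean([two_point_gradient(loss, x0, 1e-2, rng) for _ in range(50000)], axis=0)
```

A single estimate is noisy; only its average over random directions approaches the true gradient, which is why bandit schemes typically pay a price in regret relative to methods with gradient feedback.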
Information based approach to stochastic control problems
An information based method for solving stochastic control problems with
partial observation has been proposed. First, the information-theoretic lower
bounds of the cost function have been analysed. It has been shown, under rather
weak assumptions, that the reduction of the expected cost with closed-loop control
compared to the best open-loop strategy is upper bounded by a non-decreasing
function of the mutual information between control variables and the state
trajectory. On the basis of this result, an Information Based Control method
has been developed. The main idea of the IBC consists in replacing the original
control task by a sequence of control problems that are relatively easy to
solve and such that information about the state of the system is actively
generated. Two examples of the operation of the IBC are given. It has been
shown that the IBC is able to find the optimal solution without using dynamic
programming, at least in these examples. Hence the computational complexity of
the IBC is substantially smaller than that of dynamic programming, which
is the main advantage of the proposed method.
Comment: This is a preprint of an article accepted for publication in
International Journal of Applied Mathematics and Computer Science, AMCS, 20
pages, 1 figure
Reinforcement Learning: Stochastic Approximation Algorithms for Markov Decision Processes
This article presents a concise description of stochastic
approximation algorithms in reinforcement learning of Markov decision
processes. The algorithms can also be used as a suboptimal method for partially
observed Markov decision processes.
Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy
We consider empirical risk minimization for large-scale datasets. We
introduce Ada Newton as an adaptive algorithm that uses Newton's method with
adaptive sample sizes. The main idea of Ada Newton is to increase the size of
the training set by a factor larger than one in a way that the minimization
variable for the current training set is in the local neighborhood of the
optimal argument of the next training set. This allows us to exploit the quadratic
convergence property of Newton's method and reach the statistical accuracy of
each training set with only one iteration of Newton's method. We show
theoretically and empirically that Ada Newton can double the size of the
training set in each iteration to achieve the statistical accuracy of the full
training set with about two passes over the dataset.
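The adaptive-sample-size idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: it uses an L2-regularized logistic loss, a regularization weight of 1/sqrt(m) for a training set of size m, a fixed doubling factor with no backtracking, and synthetic data. The names ada_newton_sketch, newton_step and loss_grad_hess are hypothetical.

```python
import numpy as np

def loss_grad_hess(w, X, y, reg):
    """Regularized logistic loss with labels y in {0, 1}."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))
    loss = np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * reg * (w @ w)
    grad = X.T @ (p - y) / len(y) + reg * w
    hess = (X.T * (p * (1.0 - p))) @ X / len(y) + reg * np.eye(len(w))
    return loss, grad, hess

def newton_step(w, X, y, reg):
    _, g, H = loss_grad_hess(w, X, y, reg)
    return w - np.linalg.solve(H, g)

def ada_newton_sketch(X, y, m0=64, growth=2):
    n = len(y)
    m = m0
    w = np.zeros(X.shape[1])
    # Solve the small initial problem with a few Newton iterations.
    for _ in range(10):
        w = newton_step(w, X[:m], y[:m], 1.0 / np.sqrt(m))
    # Grow the training set and take ONE Newton step per enlarged set.
    while m < n:
        m = min(growth * m, n)
        w = newton_step(w, X[:m], y[:m], 1.0 / np.sqrt(m))
    return w

# Hypothetical synthetic data set.
rng = np.random.default_rng(0)
n, d = 2048, 5
X = rng.standard_normal((n, d))
w_true = np.array([1.0, -1.0, 0.5, 0.0, -0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

w_hat = ada_newton_sketch(X, y)
initial_loss = loss_grad_hess(np.zeros(d), X, y, 1.0 / np.sqrt(n))[0]
final_loss = loss_grad_hess(w_hat, X, y, 1.0 / np.sqrt(n))[0]
```

With doubling from 64 to 2048 samples, the single-step stages touch 128 + 256 + ... + 2048 samples in total, which is a little under two passes over the full set, in line with the claim in the abstract.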
Decomposition into Low-rank plus Additive Matrices for Background/Foreground Separation: A Review for a Comparative Evaluation with a Large-Scale Dataset
Recent research on problem formulations based on decomposition into low-rank
plus sparse matrices has shown them to be a suitable framework for separating moving objects from
the background. The most representative problem formulation is the Robust
Principal Component Analysis (RPCA) solved via Principal Component Pursuit
(PCP), which decomposes a data matrix into a low-rank matrix and a sparse matrix.
However, similar robust implicit or explicit decompositions can be made in the
following problem formulations: Robust Non-negative Matrix Factorization
(RNMF), Robust Matrix Completion (RMC), Robust Subspace Recovery (RSR), Robust
Subspace Tracking (RST) and Robust Low-Rank Minimization (RLRM). The main goal
of these similar problem formulations is to obtain explicitly or implicitly a
decomposition into low-rank matrix plus additive matrices. In this context,
this work aims to initiate a rigorous and comprehensive review of the similar
problem formulations in robust subspace learning and tracking based on
decomposition into low-rank plus additive matrices for testing and ranking
existing algorithms for background/foreground separation. For this, we first
provide a preliminary review of the recent developments in the different
problem formulations, which allows us to define a unified view that we call
Decomposition into Low-rank plus Additive Matrices (DLAM). Then, we carefully
examine each method in each robust subspace learning/tracking framework, along with
its decomposition, its loss functions, its optimization problem and its
solvers. Furthermore, we investigate whether incremental algorithms and real-time
implementations can be achieved for background/foreground separation. Finally,
experimental results on a large-scale dataset called Background Models
Challenge (BMC 2012) show the comparative performance of 32 different robust
subspace learning/tracking methods.
Comment: 121 pages, 5 figures, submitted to Computer Science Review. arXiv
admin note: text overlap with arXiv:1312.7167, arXiv:1109.6297,
arXiv:1207.3438, arXiv:1105.2126, arXiv:1404.7592, arXiv:1210.0805,
arXiv:1403.8067 by other authors, Computer Science Review, November 201
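The low-rank plus sparse decomposition at the core of RPCA via PCP can be sketched with a standard ADMM (augmented Lagrangian) iteration. This is a minimal illustration on a synthetic matrix, assuming the commonly used defaults lam = 1/sqrt(max(n1, n2)) and mu = n1*n2 / (4*||M||_1); it is not one of the 32 methods benchmarked in the review, and the function names are hypothetical. In background/foreground separation, the columns of M would be vectorized video frames, L the background and S the moving objects.

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * l1-norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value shrinkage: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def pcp_admm(M, lam=None, mu=None, iters=1000, tol=1e-7):
    """Split M into low-rank L plus sparse S by Principal Component Pursuit."""
    n1, n2 = M.shape
    lam = 1.0 / np.sqrt(max(n1, n2)) if lam is None else lam
    mu = n1 * n2 / (4.0 * np.abs(M).sum()) if mu is None else mu
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint M = L + S
    for _ in range(iters):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(M):
            break
    return L, S

# Hypothetical test matrix: rank-2 component plus 5% large sparse outliers.
rng = np.random.default_rng(0)
n, rank = 80, 2
L_true = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n))
S_true = np.zeros((n, n))
mask = rng.random((n, n)) < 0.05
S_true[mask] = rng.uniform(-10.0, 10.0, size=mask.sum())

L_hat, S_hat = pcp_admm(L_true + S_true)
rel_err = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
```

Each iteration costs one full SVD, which is the main reason the review asks whether incremental and real-time variants are achievable for video.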
Scaling-up Distributed Processing of Data Streams for Machine Learning
Emerging applications of machine learning in numerous areas involve
continuous gathering of and learning from streams of data. Real-time
incorporation of streaming data into the learned models is essential for
improved inference in these applications. Further, these applications often
involve data that are either inherently gathered at geographically distributed
entities or that are intentionally distributed across multiple machines for
memory, computational, and/or privacy reasons. Training of models in this
distributed, streaming setting requires solving stochastic optimization
problems in a collaborative manner over communication links between the
physical entities. When the streaming data rate is high compared to the
processing capabilities of compute nodes and/or the rate of the communications
links, this poses a challenging question: how can one best leverage the
incoming data for distributed training under constraints on computing
capabilities and/or communications rate? A large body of research has emerged
in recent decades to tackle this and related problems. This paper reviews
recently developed methods that focus on large-scale distributed stochastic
optimization in the compute- and bandwidth-limited regime, with an emphasis on
convergence analysis that explicitly accounts for the mismatch between
computation, communication and streaming rates. In particular, it focuses on
methods that solve: (i) distributed stochastic convex problems, and (ii)
distributed principal component analysis, which is a nonconvex problem with
geometric structure that permits global convergence. For such methods, the
paper discusses recent advances in terms of distributed algorithmic designs
when faced with high-rate streaming data. Further, it reviews guarantees
underlying these methods, which show there exist regimes in which systems can
learn from distributed, streaming data at order-optimal rates.
Comment: 45 pages, 9 figures; preprint of a journal paper published in
Proceedings of the IEEE (Special Issue on Optimization for Data-driven
Learning and Control)
Low-complexity modeling of partially available second-order statistics: theory and an efficient matrix completion algorithm
State statistics of linear systems satisfy certain structural constraints
that arise from the underlying dynamics and the directionality of input
disturbances. In the present paper we study the problem of completing partially
known state statistics. Our aim is to develop tools that can be used in the
context of control-oriented modeling of large-scale dynamical systems. For the
type of applications we have in mind, the dynamical interaction between state
variables is known while the directionality and dynamics of input excitation are
often uncertain. Thus, the goal of the mathematical problem that we formulate
is to identify the dynamics and directionality of input excitation in order to
explain and complete observed sample statistics. More specifically, we seek to
explain correlation data with the smallest possible number of input disturbance
channels. We formulate this inverse problem as rank minimization, and for its
solution, we employ a convex relaxation based on the nuclear norm. The
resulting optimization problem is cast as a semidefinite program and can be
solved using general-purpose solvers. For problem sizes that these solvers
cannot handle, we develop a customized alternating minimization algorithm
(AMA). We interpret AMA as a proximal gradient method for the dual problem and prove
sub-linear convergence for the algorithm with fixed step-size. We conclude with
an example that illustrates the utility of our modeling and optimization
framework and contrast AMA with the commonly used alternating
direction method of multipliers (ADMM) algorithm.
Comment: Submitted to IEEE Transactions on Automatic Control
Asynchronous Decentralized Stochastic Optimization in Heterogeneous Networks
We consider expected risk minimization in multi-agent systems comprised of
distinct subsets of agents operating without a common time-scale. Each
individual in the network is charged with minimizing the global objective
function, which is the average of the statistical average loss functions
of all agents in the network. Since agents are not assumed to observe data from
identical distributions, the hypothesis that all agents seek a common action,
on which consensus constraints are founded, is violated. We therefore consider
nonlinear network proximity
constraints that incentivize nearby nodes to make decisions that are close to
one another but need not coincide. Moreover, agents are not assumed to
receive their sequentially arriving observations on a common time index, and
thus seek to learn in an asynchronous manner. An asynchronous stochastic
variant of the Arrow-Hurwicz saddle point method is proposed to solve this
problem. It alternates primal stochastic descent steps with
Lagrange multiplier updates that penalize the discrepancies between agents.
This leads to an implementation in which each agent operates
asynchronously, using only local information and message passing with neighbors.
Our main result establishes that the proposed method yields convergence in
expectation both in terms of the primal sub-optimality and constraint violation
to radii of sizes $\mathcal{O}(\sqrt{T})$ and $\mathcal{O}(T^{3/4})$,
respectively. Empirical evaluation on an asynchronously operating wireless
network that manages user channel interference through an adaptive
communications pricing mechanism demonstrates that our theoretical results
translate well to practice.
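The alternation between primal stochastic descent and Lagrange multiplier ascent can be illustrated on a two-agent toy problem. The sketch below is a synchronous simplification under stated assumptions, not the asynchronous method of the paper: each agent has a hypothetical quadratic loss 0.5*(x_i - a_i)^2 observed through noisy gradients, and a single proximity constraint (x_1 - x_2)^2 <= gamma couples the two decisions. The function name and all parameter values are illustrative.

```python
import numpy as np

def saddle_point_two_agents(a, gamma, eta=0.01, iters=20000, noise=0.5, seed=0):
    """Stochastic Arrow-Hurwicz iteration for two agents with a proximity constraint."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)   # one scalar decision per agent
    lam = 0.0         # multiplier for (x[0] - x[1])**2 <= gamma
    tail_x, tail_lam = [], []
    for k in range(iters):
        # Noisy local gradients of 0.5 * (x_i - a_i)**2.
        stoch_grad = (x - a) + noise * rng.standard_normal(2)
        diff = x[0] - x[1]
        # Gradient of lam * ((x0 - x1)**2 - gamma) with respect to x.
        coupling = 2.0 * lam * np.array([diff, -diff])
        x_next = x - eta * (stoch_grad + coupling)               # primal descent
        lam = max(0.0, lam + eta * (diff ** 2 - gamma))          # projected dual ascent
        x = x_next
        if k >= iters - 5000:
            tail_x.append(x.copy())
            tail_lam.append(lam)
    # Average the last iterates to smooth out the gradient noise.
    return np.mean(tail_x, axis=0), float(np.mean(tail_lam))

a = np.array([0.0, 4.0])
x_avg, lam_avg = saddle_point_two_agents(a, gamma=1.0)
```

For these values the constrained optimum is x = (1.5, 2.5) with multiplier 0.75: the agents move toward each other until their decisions differ by exactly sqrt(gamma), rather than being forced to a common value as a consensus constraint would require.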