Diffusion Approximations for Online Principal Component Estimation and Global Convergence
In this paper, we propose to adopt the diffusion approximation tools to study
the dynamics of Oja's iteration which is an online stochastic gradient descent
method for the principal component analysis. Oja's iteration maintains a
running estimate of the true principal component from streaming data and enjoys
lower time and space complexity. We show that Oja's iteration for
the top eigenvector generates a continuous-state discrete-time Markov chain
over the unit sphere. We characterize Oja's iteration in three phases using
diffusion approximation and weak convergence tools. Our three-phase analysis
further provides a finite-sample error bound for the running estimate, which
matches the minimax information lower bound for principal component analysis
under the additional assumption of bounded samples. Comment: Appeared in NIPS 201
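As a rough illustration of the iteration the abstract describes, here is a minimal sketch of Oja's update for the top eigenvector: a stochastic gradient step followed by renormalization onto the unit sphere. The function names, the constant step size, and the synthetic data stream are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def oja_top_eigenvector(samples, step_size, seed=0):
    """Run Oja's iteration over a stream of samples (lists of floats).

    Each step moves the estimate w along x * (x . w) and then projects
    it back onto the unit sphere. A constant step size is an assumption
    made here for simplicity.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for x in samples:
        projection = sum(xi * wi for xi, wi in zip(x, w))
        w = normalize([wi + step_size * projection * xi for wi, xi in zip(w, x)])
    return w


def demo():
    """Hypothetical stream: variance 9 on the first axis, 1 on the others."""
    rng = random.Random(1)
    samples = [
        [3.0 * rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        for _ in range(5000)
    ]
    return oja_top_eigenvector(samples, step_size=0.01)
```

Only the current estimate is stored between samples, which is the source of the low memory footprint the abstract mentions. On the synthetic stream above, the estimate should align (up to sign) with the first coordinate axis.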
Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization
Relative to the large literature on upper bounds on complexity of convex
optimization, less attention has been paid to the fundamental hardness of
these problems. Given the extensive use of convex optimization in machine
learning and statistics, gaining an understanding of these complexity-theoretic
issues is important. In this paper, we study the complexity of stochastic
convex optimization in an oracle model of computation. We improve upon known
results and obtain tight minimax complexity estimates for various function
classes.
A review of portfolio planning: Models and systems
In this chapter, we first provide an overview of a number of portfolio planning models
which have been proposed and investigated over the last forty years. We revisit the
mean-variance (M-V) model of Markowitz and the construction of the risk-return
efficient frontier. A piecewise linear approximation of the problem through a
reformulation involving diagonalisation of the quadratic form into a variable
separable function is also considered. A few other models, such as the Mean
Absolute Deviation (MAD), the Weighted Goal Programming (WGP) and the
Minimax (MM) models, which use alternative metrics for risk, are also introduced,
compared and contrasted. Recently, asymmetric measures of risk have gained in
importance; we consider a generic representation and a number of alternative
symmetric and asymmetric measures of risk which find use in the evaluation of
portfolios. There are a number of modelling and computational considerations which
have been introduced into practical portfolio planning problems. These include: (a)
buy-in thresholds for assets, (b) restriction on the number of assets (cardinality
constraints), (c) transaction roundlot restrictions. Practical portfolio models may also
include (d) dedication of cashflow streams, and (e) immunization, which involves
duration matching and convexity constraints. The modelling issues in respect of these
features are discussed. Many of these features lead to discrete restrictions involving
zero-one and general integer variables which make the resulting model a quadratic
mixed-integer programming model (QMIP). The QMIP is an NP-hard problem; the
algorithms and solution methods for this class of problems are also discussed. The
issues of preparing the analytic data (financial datamarts) for this family of portfolio
planning problems are examined. We finally present computational results which
provide some indication of the state-of-the-art in the solution of portfolio optimisation
problems.
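To make the mean-variance trade-off mentioned in this abstract concrete, the sketch below evaluates expected return and variance for a grid of two-asset portfolios. The asset figures, the function names, and the restriction to two assets with no short selling are all assumptions chosen for illustration; none of them come from the chapter.

```python
def portfolio_stats(weights, mean_returns, covariance):
    """Return (expected return, variance) of a portfolio.

    Expected return is the weighted mean; variance is the quadratic
    form w' * Sigma * w used in the mean-variance model.
    """
    n = len(weights)
    expected = sum(w * m for w, m in zip(weights, mean_returns))
    variance = sum(
        weights[i] * covariance[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    return expected, variance


def two_asset_curve(mean_returns, covariance, steps=11):
    """Sweep the weight of asset 1 from 0 to 1 (no short selling assumed).

    Returns a list of (weight_1, expected_return, variance) tuples.
    """
    points = []
    for k in range(steps):
        w1 = k / (steps - 1)
        expected, variance = portfolio_stats([w1, 1.0 - w1], mean_returns, covariance)
        points.append((w1, expected, variance))
    return points


# Hypothetical inputs: two uncorrelated assets.
MEANS = [0.10, 0.05]
COVARIANCE = [[0.04, 0.0], [0.0, 0.01]]

curve = two_asset_curve(MEANS, COVARIANCE)
min_variance_point = min(curve, key=lambda point: point[2])
```

With these inputs the minimum-variance portfolio on the grid puts a weight of 0.2 on the first asset. The efficient frontier is the part of the curve with expected return at or above that point's return.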
A shortest-path based clustering algorithm for joint human-machine analysis of complex datasets
Clustering is a technique for the analysis of datasets obtained by empirical
studies in several disciplines with a major application for biomedical
research. Essentially, clustering algorithms are executed by machines aiming at
finding groups of related points in a dataset. However, the result of grouping
depends on both metrics for point-to-point similarity and rules for
point-to-group association. Indeed, inappropriate metrics and rules can lead
to undesirable clustering artifacts. This is especially relevant for datasets
where groups with heterogeneous structures co-exist. In this work, we propose
an algorithm that achieves clustering by exploring the paths between points.
This allows both evaluating the properties of a path (such as gaps, density
variations, etc.) and expressing a preference for certain paths. Moreover,
our algorithm supports the integration of existing knowledge about admissible
and non-admissible clusters by training a path classifier. We demonstrate the
accuracy of the proposed method on challenging datasets including points from
synthetic shapes in publicly available benchmarks and microscopy data.
Mathematical Programming formulations for the efficient solution of the k-sum approval voting problem
In this paper we address the problem of electing a committee from a set of
candidates on the basis of the preferences of a set of voters. We
consider the approval voting method in which each voter can approve as many
candidates as she/he likes by expressing a preference profile (a boolean
m-vector). In order to elect a committee, a voting rule must be established
to 'transform' the voters' profiles into a winning committee. The problem
is widely studied in voting theory; for a variety of voting rules the problem
was shown to be computationally difficult and approximation algorithms and
heuristic techniques were proposed in the literature. In this paper we follow
an Ordered Weighted Averaging approach and study the k-sum approval voting
(optimization) problem in the general case 1 ≤ k < n. For this problem we
provide different mathematical programming formulations that allow us to solve
it in an exact solution framework. We provide computational results showing
that our approach is efficient for medium-size test problems (n up to 200,
m up to 60) since in all tested cases it was able to find the exact optimal
solution in very short computational times.