Recovering Trees with Convex Clustering
Convex clustering refers, for given $\left\{x_1, \dots, x_n\right\} \subset \mathbb{R}^p$, to the minimization of
\begin{eqnarray*}
u(\gamma) & = & \underset{u_1, \dots, u_n}{\arg\min}\;\sum_{i=1}^{n}{\lVert x_i - u_i \rVert^2} + \gamma \sum_{i,j=1}^{n}{w_{ij} \lVert u_i - u_j \rVert},
\end{eqnarray*}
where $w_{ij} \geq 0$ is an affinity that quantifies the similarity between $x_i$ and $x_j$. We prove that if the affinities $w_{ij}$ reflect a tree structure in the $\left\{x_i\right\}$, then the convex clustering solution path reconstructs the tree exactly. The main
technical ingredient implies the following combinatorial byproduct: for every set $\left\{x_1, \dots, x_n\right\} \subset \mathbb{R}^p$ of $n \geq 2$ distinct points, there exist at least $n/6$ points with the property that for any of these points $x$ there is a unit vector $v$ such that, when viewed from $x$, `most' points lie in the direction $v$:
\begin{eqnarray*}
\frac{1}{n-1}\sum_{i=1 \atop x_i \neq x}^{n}{\left\langle \frac{x_i - x}{\lVert x_i - x \rVert}, v \right\rangle} & \geq & \frac{1}{4}.
\end{eqnarray*}
Comment: 26 pages, 7 figures
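For concreteness, here is a minimal numerical sketch of the objective above, handed to a generic solver on toy data. The Gaussian affinities, the toy clusters, and the single value of $\gamma$ are illustrative assumptions, and a generic quasi-Newton routine only approximately handles the non-smooth fusion penalty; dedicated path algorithms (e.g., AMA or ADMM) are the standard tools.

```python
import numpy as np
from scipy.optimize import minimize

def convex_clustering_objective(u_flat, X, W, gamma):
    """Fidelity term plus gamma-weighted pairwise fusion penalty."""
    U = u_flat.reshape(X.shape)
    fidelity = np.sum((X - U) ** 2)
    fusion = sum(W[i, j] * np.linalg.norm(U[i] - U[j])
                 for i in range(len(X)) for j in range(len(X)))
    return fidelity + gamma * fusion

# Toy data: two tight groups in the plane; Gaussian affinities are an
# illustrative choice, not the paper's tree-structured weights.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(3.0, 0.1, (5, 2))])
W = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))

res = minimize(convex_clustering_objective, X.ravel(), args=(X, W, 0.5))
U_hat = res.x.reshape(X.shape)  # rows within a group fuse as gamma grows
```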
Techniques for Solving Sudoku Puzzles
Solving Sudoku puzzles is one of the most popular pastimes in the world.
Puzzles range in difficulty from easy to very challenging; the hardest puzzles
tend to have the most empty cells. The current paper explains and compares
three algorithms for solving Sudoku puzzles. Backtracking, simulated annealing,
and alternating projections are generic methods for attacking combinatorial
optimization problems. Our results favor backtracking. It infallibly solves a
Sudoku puzzle or deduces that a unique solution does not exist. However,
backtracking does not scale well in high-dimensional combinatorial
optimization. Hence, it is useful to expose students in the mathematical
sciences to the other two solution techniques in a concrete setting. Simulated
annealing shares a common structure with MCMC (Markov chain Monte Carlo) and
enjoys wide applicability. The method of alternating projections solves the
feasibility problem in convex programming. Converting a discrete optimization
problem into a continuous optimization problem opens up the possibility of
handling combinatorial problems of much higher dimensionality.
Comment: 11 pages, 5 figures
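As a concrete illustration of the first technique, here is a minimal backtracking solver in plain Python, with 0 marking an empty cell. Unlike the version discussed in the paper, this sketch stops at the first solution found and makes no attempt to certify uniqueness.

```python
def valid(board, r, c, d):
    """Check row, column, and 3x3 block constraints for digit d at (r, c)."""
    if d in board[r]:
        return False
    if any(board[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(board[br + i][bc + j] != d for i in range(3) for j in range(3))

def solve(board):
    """Fill the 9x9 board in place by backtracking; return True on success."""
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for d in range(1, 10):
                    if valid(board, r, c, d):
                        board[r][c] = d
                        if solve(board):
                            return True
                        board[r][c] = 0  # undo and try the next digit
                return False  # no digit fits this cell: backtrack
    return True  # no empty cells remain
```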
Stable Estimation of a Covariance Matrix Guided by Nuclear Norm Penalties
Estimation of covariance matrices or their inverses plays a central role in
many statistical methods. For these methods to work reliably, estimated
matrices must not only be invertible but also well-conditioned. In this paper
we present an intuitive prior that shrinks the classic sample covariance
estimator towards a stable target. We prove that our estimator is consistent
and asymptotically efficient. Thus, it gracefully transitions towards the
sample covariance matrix as the number of samples grows relative to the number
of covariates. We also demonstrate the utility of our estimator in two standard
situations -- discriminant analysis and EM clustering -- when the number of
samples is dominated by or comparable to the number of covariates.
Comment: 25 pages, 3 figures
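The paper's estimator arises from a specific nuclear-norm-guided prior; the sketch below shows only the generic idea of shrinking the sample covariance toward a well-conditioned, scaled-identity target. The blending weight `alpha` is a hypothetical tuning parameter, not the paper's data-driven choice.

```python
import numpy as np

def shrunk_covariance(X, alpha):
    """Blend the sample covariance S with a scaled-identity target.

    alpha in [0, 1] controls shrinkage; letting alpha -> 1 as the sample
    size grows relative to p recovers S, mirroring the graceful
    transition described in the abstract.
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    mu = np.trace(S) / p  # average eigenvalue sets the target's scale
    return alpha * S + (1 - alpha) * mu * np.eye(p)

# p comparable to n: the raw S is ill-conditioned, the blend is not.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 25))
Sigma_hat = shrunk_covariance(X, alpha=0.7)
print(np.linalg.cond(Sigma_hat))
```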
k-POD: A Method for k-Means Clustering of Missing Data
The k-means algorithm is often used in clustering applications but its
usage requires a complete data matrix. Missing data, however, is common in many
applications. Mainstream approaches to clustering missing data reduce the
missing data problem to a complete data formulation through either deletion or
imputation but these solutions may incur significant costs. Our k-POD method
presents a simple extension of k-means clustering for missing data that works
even when the missingness mechanism is unknown, when external information is
unavailable, and when there is significant missingness in the data.
Comment: 26 pages, 7 tables
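A minimal sketch of this idea: alternate ordinary k-means on a completed matrix with refilling the missing cells from the assigned centroids. The initial column-mean fill, the fixed iteration count, and the use of scikit-learn's KMeans are illustrative assumptions rather than the paper's exact majorization-minimization scheme.

```python
import numpy as np
from sklearn.cluster import KMeans

def kpod_sketch(X, k, n_iter=20, seed=0):
    """Cluster rows of X (np.nan marks missing entries) into k groups."""
    miss = np.isnan(X)
    Xc = np.where(miss, np.nanmean(X, axis=0), X)  # column-mean start
    for _ in range(n_iter):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        # Refill each missing cell with its cluster's centroid value.
        Xc[miss] = km.cluster_centers_[km.labels_][miss]
    return km.labels_, km.cluster_centers_
```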
Making Tensor Factorizations Robust to Non-Gaussian Noise
Tensors are multi-way arrays, and the Candecomp/Parafac (CP) tensor
factorization has found application in many different domains. The CP model is
typically fit using a least squares objective function, which is a maximum
likelihood estimate under the assumption of i.i.d. Gaussian noise. We
demonstrate that this loss function can actually be highly sensitive to
non-Gaussian noise. Therefore, we propose a loss function based on the 1-norm
because it can accommodate both Gaussian and grossly non-Gaussian
perturbations. We also present an alternating majorization-minimization
algorithm for fitting a CP model using our proposed loss function.
Comment: Contributed presentation at the NIPS Workshop on Tensors, Kernels, and Machine Learning, Whistler, BC, Canada, December 10, 2010
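A minimal rank-1 sketch of the 1-norm idea, handed to a derivative-free general-purpose solver rather than the paper's alternating majorization-minimization algorithm; the rank-1 restriction and the random initialization are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def rank1_cp_l1(T, seed=0):
    """Fit a rank-1 CP model (outer product of a, b, c) to a 3-way tensor
    under a 1-norm loss, which tolerates gross, sparse corruptions."""
    I, J, K = T.shape

    def loss(params):
        a, b, c = params[:I], params[I:I + J], params[I + J:]
        model = np.einsum("i,j,k->ijk", a, b, c)
        return np.abs(T - model).sum()

    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=I + J + K)
    res = minimize(loss, x0, method="Powell")  # derivative-free: handles kinks
    return res.x[:I], res.x[I:I + J], res.x[I + J:]
```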
Distance Majorization and Its Applications
The problem of minimizing a continuously differentiable convex function over
an intersection of closed convex sets is ubiquitous in applied mathematics. It
is particularly interesting when it is easy to project onto each separate set,
but nontrivial to project onto their intersection. Algorithms based on Newton's
method such as the interior point method are viable for small to medium-scale
problems. However, modern applications in statistics, engineering, and machine
learning are posing problems with potentially tens of thousands of parameters
or more. We revisit this convex programming problem and propose an algorithm
that scales well with dimensionality. Our proposal is an instance of a
sequential unconstrained minimization technique and revolves around three
ideas: the majorization-minimization (MM) principle, the classical penalty
method for constrained optimization, and quasi-Newton acceleration of
fixed-point algorithms. The performance of our distance majorization algorithms
is illustrated in several applications.
Comment: 29 pages, 6 figures
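The key majorization is that the squared distance dist(x, C)^2 is bounded above at the current iterate x_k by ||x - P_C(x_k)||^2, so each iteration needs only projections onto the individual sets. Below is a minimal sketch for a quadratic objective and two simple sets (a ball and a half-space, both illustrative); note that a fixed penalty weight rho yields only approximate feasibility, whereas the classical penalty method would let rho grow.

```python
import numpy as np

def proj_ball(x, center, radius):
    d = np.linalg.norm(x - center)
    return x if d <= radius else center + radius * (x - center) / d

def proj_halfspace(x, a, b):
    """Project onto {x : a @ x <= b}."""
    viol = a @ x - b
    return x if viol <= 0 else x - viol * a / (a @ a)

def distance_majorization(y, projections, rho=10.0, n_iter=200):
    """Minimize 0.5||x - y||^2 + (rho/2) * sum_i dist(x, C_i)^2 by MM.
    Majorizing each dist(x, C_i)^2 by ||x - P_i(x_k)||^2 gives a
    closed-form update at every iteration."""
    x = y.copy()
    m = len(projections)
    for _ in range(n_iter):
        x = (y + rho * sum(P(x) for P in projections)) / (1 + m * rho)
    return x

y = np.array([3.0, 3.0])
sets = [lambda x: proj_ball(x, np.zeros(2), 1.0),
        lambda x: proj_halfspace(x, np.array([0.0, 1.0]), 0.25)]
x_star = distance_majorization(y, sets)  # near the intersection for large rho
```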
Generalized Linear Model Regression under Distance-to-set Penalties
Estimation in generalized linear models (GLM) is complicated by the presence
of constraints. One can handle constraints by maximizing a penalized
log-likelihood. Penalties such as the lasso are effective in high dimensions,
but often lead to unwanted shrinkage. This paper explores instead penalizing
the squared distance to constraint sets. Distance penalties are more flexible
than algebraic and regularization penalties, and avoid the drawback of
shrinkage. To optimize distance penalized objectives, we make use of the
majorization-minimization principle. Resulting algorithms constructed within
this framework are amenable to acceleration and come with global convergence
guarantees. Applications to shape constraints, sparse regression, and
rank-restricted matrix regression on synthetic and real data showcase strong
empirical performance, even under non-convex constraints.
Comment: 5 figures
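A minimal sketch with a sparsity constraint: penalize the squared distance from the coefficient vector to the set of k-sparse vectors, whose projection is hard thresholding. The logistic loss, step size, and penalty weight below are illustrative assumptions; observe that the penalty vanishes once the iterate lies in the constraint set, so surviving coefficients are not shrunk the way a lasso penalty would shrink them.

```python
import numpy as np

def hard_threshold(beta, k):
    """Project onto the (non-convex) set of k-sparse vectors."""
    out = np.zeros_like(beta)
    idx = np.argsort(np.abs(beta))[-k:]
    out[idx] = beta[idx]
    return out

def sparse_logistic_distance_penalty(X, y, k, rho=5.0, lr=0.1, n_iter=500):
    """Gradient steps on logistic loss + (rho/2) * dist(beta, S_k)^2.
    The penalty's gradient at beta is rho * (beta - P(beta)), with P the
    hard-thresholding projection -- an MM-flavored sketch, not the
    paper's full algorithm."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-X @ beta))      # logistic mean
        grad_loss = X.T @ (mu - y) / n
        grad_pen = rho * (beta - hard_threshold(beta, k))
        beta -= lr * (grad_loss + grad_pen)
    return beta
```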
Baseline Drift Estimation for Air Quality Data Using Quantile Trend Filtering
We address the problem of estimating smoothly varying baseline trends in time
series data. This problem arises in a wide range of fields, including
chemistry, macroeconomics, and medicine; however, our study is motivated by the
analysis of data from low-cost air quality sensors. Our methods extend the
quantile trend filtering framework to enable the estimation of multiple
quantile trends simultaneously while ensuring that the quantiles do not cross.
To handle the computational challenge posed by very long time series, we
propose a parallelizable alternating direction method of multipliers (ADMM)
algorithm. The ADMM algorithm enables the estimation of trends in a piecewise
manner, both reducing the computation time and extending the limits of the
method to larger data sizes. We also address smoothing parameter selection and
propose a modified criterion based on the extended Bayesian Information
Criterion. Through simulation studies and our motivating application to
low-cost air quality sensor data, we demonstrate that our model provides better
quantile trend estimates than existing methods and improves signal
classification of low-cost air quality sensor output.
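For a single quantile, the underlying quantile trend filtering problem can be sketched with a generic convex solver, as below; the use of cvxpy and a second-order difference penalty are illustrative choices, and the paper's contribution, fitting several non-crossing quantiles at once via a parallelizable ADMM, is not reproduced here.

```python
import numpy as np
import cvxpy as cp

def quantile_trend_filter(y, tau, lam, order=2):
    """Single-quantile trend filtering: pinball loss plus an l1 penalty on
    discrete differences of the fitted trend (order=2 penalizes kinks)."""
    n = len(y)
    theta = cp.Variable(n)
    r = y - theta
    pinball = cp.sum(0.5 * cp.abs(r) + (tau - 0.5) * r)  # check function
    D = np.diff(np.eye(n), n=order, axis=0)  # discrete difference operator
    cp.Problem(cp.Minimize(pinball + lam * cp.norm1(D @ theta))).solve()
    return theta.value

# Illustrative use: estimate a 10th-percentile baseline of a noisy signal.
t = np.linspace(0, 4 * np.pi, 300)
y = np.sin(t) + np.random.default_rng(2).exponential(0.5, size=300)
baseline = quantile_trend_filter(y, tau=0.1, lam=10.0)
```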
Co-manifold learning with missing data
Representation learning is typically applied to only one mode of a data
matrix, either its rows or columns. Yet in many applications, there is an
underlying geometry to both the rows and the columns. We propose utilizing this
coupled structure to perform co-manifold learning: uncovering the underlying
geometry of both the rows and the columns of a given matrix, where we focus on
a missing data setting. Our unsupervised approach consists of three components.
We first solve a family of optimization problems to estimate a complete matrix
at multiple scales of smoothness. We then use this collection of smooth matrix
estimates to compute pairwise distances on the rows and columns based on a new
multi-scale metric that implicitly introduces a coupling between the rows and
the columns. Finally, we construct row and column representations from these
multi-scale metrics. We demonstrate that our approach outperforms competing
methods in both data visualization and clustering.
Comment: 16 pages, 9 figures
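One way to picture the multi-scale metric: aggregate pairwise row distances across the smoothed estimates, one per scale. The sketch below is purely illustrative; the per-scale weights and the plain Euclidean row distances are assumptions, and the paper's metric couples the rows and columns in a more refined way.

```python
import numpy as np

def multiscale_row_distances(smoothed, weights):
    """Aggregate pairwise row distances across smoothed estimates of the
    same matrix, one per smoothness scale (hypothetical scale weights)."""
    n = smoothed[0].shape[0]
    D = np.zeros((n, n))
    for w, M in zip(weights, smoothed):
        diff = M[:, None, :] - M[None, :, :]  # all pairwise row differences
        D += w * np.linalg.norm(diff, axis=2)
    return D  # feed into any manifold learner for row representations
```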
Convex Biclustering
In the biclustering problem, we seek to simultaneously group observations and
features. While biclustering has applications in a wide array of domains,
ranging from text mining to collaborative filtering, the problem of identifying
structure in high dimensional genomic data motivates this work. In this
context, biclustering enables us to identify subsets of genes that are
co-expressed only within a subset of experimental conditions. We present a
convex formulation of the biclustering problem that possesses a unique global
minimizer and an iterative algorithm, COBRA, that is guaranteed to identify it.
Our approach generates an entire solution path of possible biclusters as a
single tuning parameter is varied. We also show how to reduce the problem of
selecting this tuning parameter to solving a trivial modification of the convex
biclustering problem. The key contributions of our work are its simplicity,
interpretability, and algorithmic guarantees -- features that arguably are
lacking in the current alternative algorithms. We demonstrate the advantages of
our approach, which include stably and reproducibly identifying biclusterings,
on simulated and real microarray data.
Comment: 29 pages, 3 figures
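A small-scale sketch of the convex formulation with a generic solver follows; the pairwise fusion weights and the single value of gamma are illustrative assumptions, and COBRA, not a generic solver, is the scalable way to trace the full solution path.

```python
import numpy as np
import cvxpy as cp
from itertools import combinations

def convex_biclustering(X, gamma, w_row, w_col):
    """min_U 0.5 * ||X - U||_F^2 + gamma * (row fusion + column fusion).
    Rows of U fuse into row clusters and columns into column clusters,
    producing a checkerboard of biclusters as gamma grows."""
    n, p = X.shape
    U = cp.Variable((n, p))
    fit = 0.5 * cp.sum_squares(X - U)
    rows = sum(w_row[i, j] * cp.norm(U[i, :] - U[j, :], 2)
               for i, j in combinations(range(n), 2))
    cols = sum(w_col[k, l] * cp.norm(U[:, k] - U[:, l], 2)
               for k, l in combinations(range(p), 2))
    cp.Problem(cp.Minimize(fit + gamma * (rows + cols))).solve()
    return U.value
```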