A Quantile Variant of the EM Algorithm and Its Applications to Parameter Estimation with Interval Data
The expectation-maximization (EM) algorithm is a powerful computational
technique for finding the maximum likelihood estimates for parametric models
when the data are not fully observed. The EM is best suited for situations
where the expectation in each E-step and the maximization in each M-step are
straightforward. A difficulty with the implementation of the EM algorithm is
that each E-step requires the integration of the log-likelihood function in
closed form. The explicit integration can be avoided by using what is known as
the Monte Carlo EM (MCEM) algorithm. The MCEM uses a random sample to estimate
the integral at each E-step. However, the problem with the MCEM is that it
often converges to the integral quite slowly and the convergence behavior can
also be unstable, which causes a computational burden. In this paper, we
propose what we refer to as the quantile variant of the EM (QEM) algorithm. We
prove that the proposed QEM method has an accuracy of $O(1/K)$ while the MCEM
method has an accuracy of $O_p(1/\sqrt{K})$. Thus, the proposed QEM method
possesses faster and more stable convergence properties when compared with the
MCEM algorithm. The improved performance is illustrated through numerical
studies. Several practical examples illustrating its use in interval-censored
data problems are also provided.
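The contrast between a random-sample and a quantile-based approximation of an E-step integral can be sketched on a toy expectation. This is a minimal illustration, not the paper's implementation: the function names, the mid-point probabilities (k - 1/2)/K, and the choice of a normal distribution are all assumptions made for the example.

```python
import random
from statistics import NormalDist


def quantile_estimate(g, dist, K):
    """Approximate E[g(X)] by averaging g over K deterministic quantiles.

    Assumption: the quantiles are taken at the mid-point probabilities
    (k - 1/2) / K for k = 1..K.
    """
    return sum(g(dist.inv_cdf((k - 0.5) / K)) for k in range(1, K + 1)) / K


def monte_carlo_estimate(g, dist, K, rng):
    """Approximate E[g(X)] by averaging g over K random draws (MCEM-style)."""
    return sum(g(rng.gauss(dist.mean, dist.stdev)) for _ in range(K)) / K


if __name__ == "__main__":
    dist = NormalDist(0.0, 1.0)   # toy stand-in for a conditional distribution
    g = lambda x: x * x           # E[X^2] = 1 for the standard normal
    rng = random.Random(0)
    for K in (10, 100, 1000):
        q = quantile_estimate(g, dist, K)
        m = monte_carlo_estimate(g, dist, K, rng)
        print(K, abs(q - 1.0), abs(m - 1.0))
```

The quantile version is deterministic, so repeated runs give the same value, whereas the Monte Carlo error fluctuates with the random seed.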
The Power of Randomization: Distributed Submodular Maximization on Massive Datasets
A wide variety of problems in machine learning, including exemplar
clustering, document summarization, and sensor placement, can be cast as
constrained submodular maximization problems. Unfortunately, the resulting
submodular optimization problems are often too large to be solved on a single
machine. We develop a simple distributed algorithm that is embarrassingly
parallel and achieves provable, constant-factor, worst-case approximation
guarantees. In our experiments, we demonstrate its efficiency on large problems
with different kinds of constraints, with objective values always close to what
is achievable in the centralized setting.
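A two-round scheme of the kind the abstract describes can be sketched as follows: partition the elements at random, run a greedy selection on each part independently, then run greedy again on the union of the local solutions and keep the best candidate. This is only a sketch under assumptions: the set-coverage objective, the cardinality constraint k, and all function names are illustrative choices and are not taken from the paper.

```python
import random


def coverage(sets, chosen):
    """Submodular objective (assumed for illustration): number of items covered."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy(sets, candidates, k):
    """Standard greedy: repeatedly add the candidate with the best objective value."""
    chosen, remaining = [], list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda i: coverage(sets, chosen + [i]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def distributed_greedy(sets, k, machines, seed=0):
    """Random partition, local greedy per part, then greedy over the merged picks."""
    rng = random.Random(seed)
    parts = [[] for _ in range(machines)]
    for i in range(len(sets)):
        parts[rng.randrange(machines)].append(i)        # uniform random assignment
    local = [greedy(sets, part, k) for part in parts]   # embarrassingly parallel step
    merged = [i for solution in local for i in solution]
    final = greedy(sets, merged, k)                     # second round on one machine
    return max(local + [final], key=lambda s: coverage(sets, s))


if __name__ == "__main__":
    sets = [{1, 2, 3}, {3, 4}, {5}, {1, 2}, {4, 5, 6}]
    solution = distributed_greedy(sets, k=2, machines=2)
    print(solution, coverage(sets, solution))
```

Here the local greedy calls run in a plain loop; in a real deployment each would run on its own machine, with only the small local solutions sent to the second round.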
Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially.
Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular k-center variant which, given a set S of points from some metric space and a parameter k0, the algorithms yield solutions whose approximation ratios are a mere additive term ε away from those achievable by the best known polynomial-time sequential algorithms, a result that substantially improves upon the state of the art. Our algorithms are rather simple and adapt to the intrinsic complexity of the dataset, captured by the doubling dimension D of the metric space. Specifically, our analysis shows that the algorithms become very space-efficient for the important case of small (constant) D. These theoretical results are complemented with a set of experiments on real-world and synthetic datasets of up to over a billion points, which show that our algorithms yield better quality solutions over the state of the art while featuring excellent scalability, and that they also lend themselves to sequential implementations much faster than existing ones.
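For context on the sequential baseline the abstract compares against, the classic farthest-first traversal for k-center (without outliers) can be sketched as below. This is the well-known greedy 2-approximation, shown only as an illustration; it is not the MapReduce or Streaming algorithm of the paper, and the function name and the Euclidean metric are assumptions.

```python
import math


def farthest_first(points, k):
    """Greedy k-center: start from an arbitrary point, then repeatedly add
    the point farthest from the current centers.

    Returns the chosen centers and the resulting clustering radius.
    """
    centers = [points[0]]                                   # arbitrary first center
    dist = [math.dist(p, centers[0]) for p in points]       # distance to nearest center
    while len(centers) < min(k, len(points)):
        idx = max(range(len(points)), key=dist.__getitem__)  # farthest point so far
        centers.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers, max(dist)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(farthest_first(pts, 2))
```

Each iteration scans all points once, so the sketch runs in O(k·|S|) distance computations.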