3,755 research outputs found
A Stream-Suitable Kolmogorov-Smirnov-Type Test for Big Data Analysis
Big Data has become an ever more commonplace setting encountered by
data analysts. In the Big Data setting, analysts are faced with very large
numbers of observations as well as data that arrive as a stream, both of which
are phenomena that many traditional statistical techniques are unable to
contend with. Many of these traditional techniques remain useful, however,
and cannot simply be discarded. One such technique is the Kolmogorov-Smirnov (KS) test
for goodness-of-fit (GoF). A Big Data and stream-appropriate KS-type test is
derived via the chunked-and-averaged (CA) estimator paradigm. The new test is
termed the CAKS GoF test. The CAKS test statistic is proved to be
asymptotically normal, allowing for the large sample testing of GoF.
Furthermore, theoretical results demonstrate that the CAKS test is consistent
against both fixed alternatives, where the null and the true data generating
distribution are a fixed distance apart, and alternatives that approach the
null at a slow enough rate. Numerical results demonstrate that the CAKS test is
effective in identifying deviation in the distribution with respect to changes
in mean, variance, and shape. Furthermore, it is found that the CAKS test is
faster than the KS test for large numbers of observations, and can be applied
to sample sizes of $10^{9}$ and beyond.
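To make the chunked-and-averaged construction concrete, here is a minimal sketch (not the paper's implementation: the helper name caks_statistic is an assumption, and the paper's centering and scaling constants for the asymptotic normal test are omitted) that splits a sample into fixed-size chunks, computes the one-sample KS statistic on each, and averages:

    import numpy as np
    from scipy.stats import kstest

    def caks_statistic(stream, null_cdf, chunk_size):
        """Average of per-chunk one-sample KS statistics (full chunks only)."""
        chunks = [stream[i:i + chunk_size]
                  for i in range(0, len(stream) - chunk_size + 1, chunk_size)]
        stats = [kstest(chunk, null_cdf).statistic for chunk in chunks]
        return np.mean(stats), len(stats)

    # Example: 10**6 standard-normal draws tested against N(0, 1).
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000)
    avg_ks, n_chunks = caks_statistic(x, "norm", chunk_size=1_000)
    print(avg_ks, n_chunks)

Because each chunk is processed independently, the average can be updated as chunks arrive, which is what makes the construction stream-suitable; the paper's asymptotic normality result is what turns the centered and scaled average into a usable test statistic.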
A Novel Algorithm for Clustering of Data on the Unit Sphere via Mixture Models
A new maximum approximate likelihood (ML) estimation algorithm for mixtures
of Kent distributions is proposed. The new algorithm is constructed via
the BSLM (block successive lower-bound maximization) framework and incorporates
manifold optimization procedures within it. The BSLM algorithm is iterative and
monotonically increases the approximate log-likelihood function in each step.
Under mild regularity conditions, the BSLM algorithm is proved to be convergent
and the approximate ML estimator is proved to be consistent. A Bayesian
information criterion-like (BIC-like) model selection criterion is also derived,
for the task of choosing the number of components in the mixture distribution.
The approximate ML estimator and the BIC-like criterion are both demonstrated
to be successful via simulation studies. A model-based clustering rule is
proposed and also assessed favorably via simulations. Example applications of
the developed methodology are provided via an image segmentation task and a
neural imaging clustering problem.
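The BIC-like selection step can be sketched generically. Below, fit_kent_mixture is a hypothetical stand-in for the paper's BSLM-based approximate ML fitter (it is not a real library call), and the parameter count assumes five parameters per Kent component on the 2-sphere plus the mixing weights; the paper's exact penalty may differ:

    import numpy as np

    def bic_like(log_lik, n_params, n_obs):
        """Generic BIC-style criterion; smaller is better."""
        return -2.0 * log_lik + n_params * np.log(n_obs)

    def select_n_components(data, max_g, fit_kent_mixture):
        # fit_kent_mixture(data, g) is assumed (hypothetically) to return
        # the maximized approximate log-likelihood of a g-component mixture.
        best = None
        for g in range(1, max_g + 1):
            log_lik = fit_kent_mixture(data, g)
            n_params = 6 * g - 1  # 5 per component + (g - 1) mixing weights
            crit = bic_like(log_lik, n_params, len(data))
            if best is None or crit < best[1]:
                best = (g, crit)
        return best  # (chosen number of components, criterion value)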
Concentration-based confidence intervals for U-statistics
Concentration inequalities have become increasingly popular in machine
learning, probability, and statistical research. Using concentration
inequalities, one can construct confidence intervals (CIs) for many quantities
of interest. Unfortunately, many of these CIs require the knowledge of
population variances, which are generally unknown, making these CIs impractical
for numerical application. However, recent results regarding the simultaneous
bounding of the probabilities of quantities of interest and their variances
have permitted the construction of empirical CIs, where variances are replaced
by their sample estimators. Among these new results are two-sided empirical CIs
for U-statistics, which are useful for the construction of CIs for a rich class
of parameters. In this article, we derive a number of new one-sided empirical
CIs for U-statistics and their variances. We show that our one-sided CIs can be
used to construct tighter two-sided CIs for U-statistics than those currently
reported. We also demonstrate how our CIs can be used to construct new
empirical CIs for the mean, which provide tighter bounds than currently known
CIs for the same number of observations, under various settings.
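The article's new one-sided intervals are not reproduced here, but the flavor of an empirical CI, with the unknown population variance replaced by its sample estimator, is captured by the classical two-sided empirical Bernstein interval of Maurer and Pontil (2009) for the mean of [0, 1]-valued data; note that the sample variance is itself a U-statistic of order two, with kernel h(x, y) = (x - y)^2 / 2:

    import numpy as np

    def empirical_bernstein_ci(x, delta):
        """Two-sided empirical Bernstein CI for the mean of i.i.d. data in
        [0, 1]; the sample variance replaces the unknown population variance
        (Maurer & Pontil, 2009)."""
        x = np.asarray(x, dtype=float)
        n = x.size
        log_term = np.log(4.0 / delta)  # each one-sided bound run at delta/2
        slack = (np.sqrt(2.0 * x.var(ddof=1) * log_term / n)
                 + 7.0 * log_term / (3.0 * (n - 1)))
        return x.mean() - slack, x.mean() + slack

    rng = np.random.default_rng(1)
    lo, hi = empirical_bernstein_ci(rng.uniform(size=10_000), delta=0.05)
    print(lo, hi)  # should bracket the true mean 0.5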
A Note on the Convergence of the Gaussian Mean Shift Algorithm
Mean shift (MS) algorithms are popular methods for mode finding in pattern
analysis. Each MS algorithm can be phrased as a fixed-point iteration scheme,
which operates on a kernel density estimate (KDE) based on some data. The
ability of an MS algorithm to obtain the modes of its KDE depends on whether or
not the fixed-point scheme converges. The convergence of MS algorithms has
recently been proved under some general conditions via first-principles
arguments. We complement the recent proofs by demonstrating that the MS
algorithm operating on a Gaussian KDE can be viewed as an MM
(minorization-maximization) algorithm, and thus permits the application of
convergence techniques for such constructions. For the Gaussian case, we
extend the previous results by showing that the fixed points of the MS
algorithm are all stationary points of the KDE, even in cases where the
stationary points are not necessarily isolated.
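For concreteness, here is a minimal sketch of the Gaussian mean shift fixed-point iteration analyzed in the note; the bandwidth, tolerance, and stopping rule are illustrative choices:

    import numpy as np

    def gaussian_mean_shift(data, x0, bandwidth, tol=1e-8, max_iter=1000):
        """Fixed-point iteration x <- sum_i w_i x_i / sum_i w_i with Gaussian
        KDE weights; its fixed points are stationary points of the KDE."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            w = np.exp(-np.sum((data - x) ** 2, axis=1)
                       / (2.0 * bandwidth ** 2))
            x_new = w @ data / w.sum()
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        return x

    # Two well-separated blobs; the iteration climbs to the nearest mode.
    rng = np.random.default_rng(2)
    data = np.vstack([rng.normal(0.0, 0.3, (200, 2)),
                      rng.normal(3.0, 0.3, (200, 2))])
    print(gaussian_mean_shift(data, x0=[2.0, 2.0], bandwidth=0.5))

The MM reading corresponds to the fact that each update maximizes a surrogate that minorizes the Gaussian KDE at the current iterate, which forces the KDE value to increase monotonically along the iterates.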
A Simple Online Parameter Estimation Technique with Asymptotic Guarantees
In many modern settings, data are acquired iteratively over time, rather than
all at once. Such settings are known as online, as opposed to offline or batch.
We introduce a simple technique for online parameter estimation, which can
operate in low-memory settings and in settings where data are correlated, and
which requires only a single inspection of the available data at each time
period. We show
that the estimators---constructed via the technique---are asymptotically normal
under generous assumptions, and present a technique for the online computation
of the covariance matrices for such estimators. A set of numerical studies
demonstrates that our estimators can be as efficient as their offline
counterparts, and that our technique generates estimates and confidence
intervals that match their offline counterparts in various parameter estimation
settings.
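The abstract does not spell out the estimator itself, so the following shows only the generic single-pass pattern (a Welford-style running mean and covariance) on which online, low-memory estimation rests; it is a sketch of the setting, not the paper's technique:

    import numpy as np

    class OnlineMeanCov:
        """Single-pass running mean and covariance: each observation is
        inspected once and discarded; memory is O(d^2) regardless of
        stream length."""
        def __init__(self, dim):
            self.n = 0
            self.mean = np.zeros(dim)
            self._m2 = np.zeros((dim, dim))

        def update(self, x):
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self._m2 += np.outer(delta, x - self.mean)

        def cov(self):
            return self._m2 / (self.n - 1)

    est = OnlineMeanCov(dim=2)
    rng = np.random.default_rng(3)
    for _ in range(10_000):
        est.update(rng.multivariate_normal([1.0, -1.0],
                                           [[1.0, 0.3], [0.3, 2.0]]))
    print(est.mean, est.cov())

Given asymptotic normality, a coordinate-wise 95% confidence interval would then take the familiar form mean[i] +/- 1.96 * sqrt(cov()[i, i] / n).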
Construction of Complete Embedded Self-Similar Surfaces under Mean Curvature Flow. Part II
We study the Dirichlet problem associated with the self-similar surface
equation for graphs over the Euclidean plane with a disk removed. We show the
existence of a solution provided the boundary conditions on the boundary circle
are small enough and satisfy some symmetries. This is the second step towards
the construction of new examples of complete embedded self-similar surfaces
under mean curvature flow. Comment: 30 pages.
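For reference, the self-shrinker equation in question, in one common normalization (signs and constants vary by convention), together with its form for a graph $x_3 = u(y)$ over a planar domain:

    % Self-shrinker equation for mean curvature flow:
    H = \frac{\langle x, \nu \rangle}{2},
    % which for a graph x_3 = u(y) becomes the quasilinear equation
    \operatorname{div}\!\left(\frac{Du}{\sqrt{1 + |Du|^2}}\right)
      = \frac{u - y \cdot Du}{2\sqrt{1 + |Du|^2}}.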
The Decision-Theoretic Interactive Video Advisor
The need to help people choose among large numbers of items and to filter
through large amounts of information has led to a flood of research in
the construction of personal recommendation agents. One of the central issues in
constructing such agents is the representation and elicitation of user
preferences or interests. This topic has long been studied in Decision Theory,
but surprisingly little work in the area of recommender systems has made use of
formal decision-theoretic techniques. This paper describes DIVA, a
decision-theoretic agent for recommending movies that contains a number of
novel features. DIVA represents user preferences using pairwise comparisons
among items, rather than numeric ratings. It uses a novel similarity measure
based on the concept of the probability of conflict between two orderings of
items. The system has a rich representation of preference, distinguishing
between a user's general taste in movies and his immediate interests. It takes
an incremental approach to preference elicitation in which the user can provide
feedback if not satisfied with the recommendation list. We empirically evaluate
the performance of the system using the EachMovie collaborative filtering
database. Comment: Appears in Proceedings of the Fifteenth Conference on
Uncertainty in Artificial Intelligence (UAI 1999).
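A plain reading of the probability-of-conflict similarity, the chance that a uniformly random pair of commonly ranked items is ordered oppositely by two users, can be sketched as follows (the function name and exact normalization are assumptions, not DIVA's published formula):

    from itertools import combinations

    def conflict_probability(order_a, order_b):
        """Fraction of common item pairs ranked in opposite order by the
        two preference orderings (normalized Kendall disagreement)."""
        common = set(order_a) & set(order_b)
        pos_a = {item: i for i, item in enumerate(order_a)}
        pos_b = {item: i for i, item in enumerate(order_b)}
        pairs = list(combinations(common, 2))
        if not pairs:
            return 0.0
        conflicts = sum((pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v]) < 0
                        for u, v in pairs)
        return conflicts / len(pairs)

    # Two users' movie rankings, best first: 2 of the 6 pairs conflict.
    print(conflict_probability(["Alien", "Brazil", "Clue", "Dune"],
                               ["Brazil", "Alien", "Dune", "Clue"]))  # 1/3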
On the necessary condition for entire function with the increasing second quotients of Taylor coefficients to belong to the Laguerre-P\'olya class
For an entire function $f(z) = \sum_{k=0}^{\infty} a_k z^k$, $a_k > 0$, we
show that $f$ does not belong to the Laguerre-P\'olya class if the quotients
$q_n := a_{n-1}^2/(a_{n-2} a_n)$ are increasing in $n$, and
$\lim_{n\to\infty} q_n$ is smaller than an absolute constant
($q_\infty \approx 3.2336$).
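A quick numerical illustration of the second quotients: for $f(z) = e^z$, which does belong to the Laguerre-P\'olya class, one gets $q_n = n/(n-1)$, a decreasing sequence, so the theorem's increasing-quotient hypothesis does not apply:

    from fractions import Fraction
    from math import factorial

    def second_quotients(a, n_max):
        """q_n = a_{n-1}^2 / (a_{n-2} a_n) for Taylor coefficients a_k."""
        return [a(n - 1) ** 2 / (a(n - 2) * a(n)) for n in range(2, n_max + 1)]

    # For f(z) = e^z, a_k = 1/k!, so q_n = n/(n-1), decreasing toward 1.
    print(second_quotients(lambda k: Fraction(1, factorial(k)), 6))
    # [Fraction(2, 1), Fraction(3, 2), Fraction(4, 3), Fraction(5, 4), Fraction(6, 5)]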
Shrinking doughnuts via variational methods
We use variational methods and a modified curvature flow to give an
alternative proof of the existence of a self-shrinking torus under mean
curvature flow. As a consequence of the proof, we establish an upper bound for
the weighted energy of our shrinking doughnuts. Comment: 18 pages, 2 figures.
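The weighted energy referred to is, up to a normalizing constant, the Gaussian (Huisken) functional whose critical points are precisely the self-shrinkers:

    % Gaussian weighted energy (normalizing constants vary by convention):
    F(\Sigma) = \int_{\Sigma} e^{-|x|^2/4} \, d\mu .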
Mean curvature flow of an entire graph evolving away from the heat flow
We present two initial graphs over the entire $\mathbb{R}^n$, $n \geq 2$, for
which the mean curvature flow behaves differently from the heat flow. In the
first example, the two flows stabilize at different heights. With our second
example, the mean curvature flow oscillates indefinitely while the heat flow
stabilizes. These results highlight the difference between dimensions
$n \geq 2$ and dimension $n = 1$, where Nara-Taniguchi proved that entire
graphs in $C^{2,\alpha}(\mathbb{R})$ evolving under curve shortening flow
converge to solutions to the heat equation with the same initial data.
Comment: To appear in Proceedings of the AMS.
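The two evolutions being contrasted, written for a graph $u(\cdot, t)$ over $\mathbb{R}^n$:

    % Graphical mean curvature flow versus heat flow:
    u_t = \sqrt{1 + |Du|^2}\,
          \operatorname{div}\!\left(\frac{Du}{\sqrt{1 + |Du|^2}}\right),
    \qquad\text{vs.}\qquad
    u_t = \Delta u .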
- …