Fast k-means based on KNN Graph
In the era of big data, k-means clustering has been widely adopted as a basic
processing tool in various contexts. However, its computational cost can be
prohibitively high when both the data size and the number of clusters are
large. It is well known that the processing bottleneck of k-means lies in the
operation of seeking the closest centroid in each iteration. In this paper, a
novel solution to the scalability issue of k-means is presented. In the
proposal, k-means is supported by an approximate k-nearest-neighbor graph. In
each k-means iteration, a data sample is compared only to the clusters in
which its nearest neighbors reside. Since the number of nearest neighbors
considered is much smaller than k, the processing cost of this step becomes
minor and independent of k. The processing bottleneck is therefore overcome.
Most interestingly, the k-nearest-neighbor graph is itself constructed by
iteratively calling the fast k-means procedure. Compared with existing fast
k-means variants, the proposed algorithm achieves a speed-up of hundreds to
thousands of times while maintaining high clustering quality. When tested on
10 million 512-dimensional samples, it takes only 5.2 hours to produce 1
million clusters; fulfilling the same scale of clustering would take
traditional k-means around 3 years.
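To make the assignment trick concrete, here is a minimal Python sketch of the
idea as stated in the abstract (function and variable names are illustrative,
not the authors' code): each sample is compared only to the centroids of the
clusters that its approximate nearest neighbors currently belong to.

    import numpy as np

    def assign_via_knn_graph(X, centroids, labels, knn_graph):
        """X: (n, d) data; centroids: (k, d); labels: (n,) current assignments;
        knn_graph: (n, m) indices of each sample's m nearest neighbors, m << k."""
        new_labels = labels.copy()
        for i in range(X.shape[0]):
            # Candidate clusters: those holding sample i's nearest neighbors,
            # plus its own current cluster -- far fewer than all k clusters.
            candidates = np.unique(np.append(labels[knn_graph[i]], labels[i]))
            dists = np.linalg.norm(centroids[candidates] - X[i], axis=1)
            new_labels[i] = candidates[np.argmin(dists)]
        return new_labels

Each sample is then compared to at most m + 1 centroids rather than all k,
which is where the claimed independence from k comes from.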
Variational Hamiltonian Monte Carlo via Score Matching
Traditionally, the field of computational Bayesian statistics has been
divided into two main subfields: variational methods and Markov chain Monte
Carlo (MCMC). In recent years, however, several methods have been proposed
based on combining variational Bayesian inference and MCMC simulation in order
to improve their overall accuracy and computational efficiency. This marriage
of fast evaluation and flexible approximation provides a promising means of
designing scalable Bayesian inference methods. In this paper, we explore the
possibility of incorporating variational approximation into a state-of-the-art
MCMC method, Hamiltonian Monte Carlo (HMC), to reduce the required gradient
computation in the simulation of Hamiltonian flow, which is the bottleneck for
many applications of HMC to big data problems. To this end, we use a {\it
free-form} approximation induced by a fast and flexible surrogate function
based on a single-hidden-layer feedforward neural network. The surrogate
provides a sufficiently accurate approximation while allowing for fast
exploration of the parameter space, resulting in an efficient approximate
inference algorithm. We demonstrate the advantages of our method on both
synthetic and real-data problems.
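As a rough illustration of where the surrogate enters (a hedged sketch with
assumed names, not the paper's code), the leapfrog integrator can be driven by
the gradient of the cheap surrogate instead of the full-data log-posterior
gradient:

    import numpy as np

    def surrogate_leapfrog(theta, p, step, n_steps, surrogate_grad):
        """HMC trajectory using a surrogate gradient; `surrogate_grad` stands
        in for the gradient of a fitted neural-network surrogate (assumed)."""
        p = p + 0.5 * step * surrogate_grad(theta)   # initial half-step (momentum)
        for _ in range(n_steps - 1):
            theta = theta + step * p                 # full step (position)
            p = p + step * surrogate_grad(theta)     # full step (momentum)
        theta = theta + step * p                     # final position step
        p = p + 0.5 * step * surrogate_grad(theta)   # final half-step (momentum)
        return theta, -p                             # negate momentum for reversibility

Note that the correct target distribution can still be preserved by evaluating
the Metropolis acceptance test with the true Hamiltonian, so the surrogate
affects proposal quality rather than the stationary distribution.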
Hamiltonian Monte Carlo Acceleration Using Surrogate Functions with Random Bases
For big data analysis, the high computational cost of Bayesian methods often
limits their application in practice. In recent years, there have been many
attempts to improve the computational efficiency of Bayesian inference. Here we
propose an efficient and scalable computational technique for a
state-of-the-art Markov chain Monte Carlo (MCMC) method, namely Hamiltonian
Monte Carlo (HMC). The key idea is to explore and exploit the structure and
regularity in parameter space for the underlying probabilistic model to
construct an effective approximation of its geometric properties. To this end,
we build a surrogate function to approximate the target distribution using
properly chosen random bases and an efficient optimization process. The
resulting method provides a flexible, scalable, and efficient sampling
algorithm, which converges to the correct target distribution. We show that by
choosing the basis functions and optimization process differently, our method
can be related to other approaches for the construction of surrogate functions
such as generalized additive models or Gaussian process models. Experiments
based on simulated and real data show that our approach leads to substantially
more efficient sampling algorithms than existing state-of-the-art methods.
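One way to picture the random-basis construction (an assumption-laden sketch,
not the authors' implementation) is ridge regression on random cosine features
fitted to previously evaluated log-posterior values; the surrogate's gradient
is then closed-form and cheap:

    import numpy as np

    def fit_random_basis_surrogate(Theta, logp, n_feat=200, scale=1.0, reg=1e-3):
        """Theta: (n, d) points where the log-posterior was evaluated; logp: (n,)."""
        rng = np.random.default_rng(0)
        W = rng.normal(scale=scale, size=(Theta.shape[1], n_feat))  # random frequencies
        b = rng.uniform(0.0, 2.0 * np.pi, n_feat)                   # random phases
        Phi = np.cos(Theta @ W + b)                                 # random basis features
        # Ridge regression for the basis weights.
        w = np.linalg.solve(Phi.T @ Phi + reg * np.eye(n_feat), Phi.T @ logp)

        def surrogate_grad(theta):
            # Gradient of sum_j w_j * cos(theta . W_j + b_j).
            return -W @ (w * np.sin(theta @ W + b))

        return surrogate_grad

Swapping the random cosine basis for splines or kernel functions is what links
this construction to generalized additive models and Gaussian process models,
as the abstract notes.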
Robust Subspace Clustering via Smoothed Rank Approximation
Minimizing matrix rank subject to affine constraints arises in many
application areas, ranging from signal processing to machine learning. The
nuclear norm is a convex relaxation of this problem that can recover the rank
exactly under some restricted and theoretically interesting conditions.
However, for many real-world applications, the nuclear norm approximation to
the rank function can only produce a result far from the optimum. To seek a
solution of higher accuracy than the nuclear norm, in this paper we propose a
rank approximation based on the logarithm-determinant, and we apply it to the
subspace clustering problem. Our framework can model different kinds of errors
and noise. An effective optimization strategy is developed with a theoretical
guarantee of convergence to a stationary point. The proposed method gives
promising results on face clustering and motion segmentation tasks compared to
state-of-the-art subspace clustering algorithms.
Comment: journal paper; code is available
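For intuition, one common log-determinant smoothing of the rank (shown here in
a generic form that may differ from the paper's exact definition) replaces
rank(X) by a sum of slowly growing logs of the singular values:

    import numpy as np

    def logdet_rank_surrogate(X, delta=1e-2):
        """Smoothed rank: sum_i log(1 + sigma_i / delta). Singular values well
        above delta each contribute roughly log(sigma_i / delta); tiny ones
        contribute almost 0, mimicking a count of significant singular values."""
        s = np.linalg.svd(X, compute_uv=False)
        return float(np.sum(np.log1p(s / delta)))

Unlike the nuclear norm, which sums the singular values themselves and hence
over-penalizes large ones, the log term grows slowly, staying closer to the
0/1 behavior of the true rank function.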
A fully end-to-end deep learning approach for real-time simultaneous 3D reconstruction and material recognition
This paper addresses the problem of simultaneous 3D reconstruction and
material recognition and segmentation. Enabling robots to recognise different
materials (concrete, metal etc.) in a scene is important for many tasks, e.g.
robotic interventions in nuclear decommissioning. Previous work on 3D semantic
reconstruction has predominantly focused on recognition of everyday domestic
objects (tables, chairs etc.), whereas previous work on material recognition
has largely been confined to single 2D images without any 3D reconstruction.
Meanwhile, most 3D semantic reconstruction methods rely on computationally
expensive post-processing, using Fully-Connected Conditional Random Fields
(CRFs), to achieve consistent segmentations. In contrast, we propose a deep
learning method which performs 3D reconstruction while simultaneously
recognising different types of materials and labelling them at the pixel level.
Unlike previous methods, we propose a fully end-to-end approach, which does not
require hand-crafted features or CRF post-processing. Instead, we use only
learned features, and the CRF segmentation constraints are incorporated inside
the fully end-to-end learned system. We present the results of experiments, in
which we trained our system to perform real-time 3D semantic reconstruction for
23 different materials in a real-world application. The run-time performance
of the system can be boosted to around 10 Hz using a conventional GPU, which
is enough to achieve real-time semantic reconstruction from a 30 fps RGB-D
camera. To the best of our knowledge, this work is the first real-time
end-to-end system for simultaneous 3D reconstruction and material recognition.
Comment: 8 pages, 7 figures, 4 tables
Global well-posedness and scattering for nonlinear Schr\"odinger equations with combined nonlinearities in the radial case
We consider the Cauchy problem for the nonlinear Schr\"odinger equation with
combined nonlinearities, one of which is defocusing mass-critical and the
other focusing energy-critical or energy-subcritical. The threshold is given
by means of a variational argument. We establish the profile decomposition and
then utilize the concentration-compactness method to show global
well-posedness and scattering versus blow-up below the threshold for radial
data.
Comment: 40 pages
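For orientation, the generic form of this equation class reads as follows (the
exact exponents in the abstract were lost in extraction, so this display is an
assumption about the general setting, not the paper's precise statement):
    i\partial_t u + \Delta u = |u|^{4/d} u - \mu |u|^{p-1} u, \qquad u(0) = u_0 \in H^1(\mathbb{R}^d),
where the first term is the defocusing mass-critical nonlinearity in dimension
d, \mu > 0 marks the focusing term, and 1 < p \le 1 + \frac{4}{d-2} covers the
energy-subcritical range up to the energy-critical endpoint.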
Decays in the intermediate meson loop model
Using recent measurements as input, we investigate the transitions to
bottomonium states with the emission of a pion via intermediate meson loops.
The experimental data can be reproduced in this approach with a commonly
accepted range of values for the form-factor cutoff parameter. The decay
channels appear to experience obvious threshold effects, which can be
understood from the properties of the loop integrals. By investigating the
cutoff dependence of the partial decay widths and of the ratios between
different decay channels, we show that intermediate meson loops are crucial
for driving these transitions.
Comment: 9 pages, 5 figures
Active Clinical Trials for Personalized Medicine
Individualized treatment rules (ITRs) tailor treatments according to
individual patient characteristics. They can significantly improve patient care
and are thus becoming increasingly popular. The data collected during
randomized clinical trials are often used to estimate the optimal ITRs.
However, these trials are generally expensive to run, and, moreover, they are
not designed to efficiently estimate ITRs. In this paper, we propose a
cost-effective estimation method from an active learning perspective. In
particular, our method recruits only the "most informative" patients (in terms
of learning the optimal ITRs) from an ongoing clinical trial. Simulation
studies and real-data examples show that our active clinical trial method
significantly improves on competing methods. We derive risk bounds and show
that they support these observed empirical advantages.
Comment: 48 pages, 9 figures. To appear in JASA--T&M
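As a toy illustration of the recruitment idea (an illustrative criterion with
assumed names, not the paper's actual selection rule), one could recruit the
candidate whose estimated treatment effect is closest to zero, i.e. the
patient the current ITR estimate is least sure how to treat:

    import numpy as np

    def most_informative_patient(candidates, effect_estimate):
        """candidates: (n, d) covariates of not-yet-recruited patients;
        effect_estimate: callable mapping covariates to an estimated
        treatment-minus-control outcome difference."""
        effects = np.array([effect_estimate(x) for x in candidates])
        # Smallest |estimated effect| = most ambiguous treatment decision,
        # hence (heuristically) the most informative patient to recruit.
        return int(np.argmin(np.abs(effects)))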
