Isotropic Dynamic Hierarchical Clustering
We need to discover a pattern in the locations of a great number of
points in a high-dimensional space. The goal is to group close points together.
We are interested in a hierarchical structure, like a B-tree. B-trees are
hierarchical, balanced, and can be constructed dynamically. The B-tree
approach allows the structure to be determined without any supervised learning
or a priori knowledge. The space is Euclidean and isotropic. Unfortunately,
there are no B-tree implementations that process indices in a symmetrical and
isotropic way. Some implementations construct compound
asymmetrical indices from point coordinates; others split the nodes
along the coordinate hyperplanes. We need to process tens of millions of
points in a thousand-dimensional space, so the application has to be scalable.
Ideally, a cluster should be an ellipsoid, but that would require storing O(n^2)
ellipse axes. We therefore use multi-dimensional balls defined by their centers
and radii. Statistical values such as the mean and the average
deviation can be calculated incrementally: when a point is added to the tree,
the statistical values for the nodes are recalculated in O(1) time. We support
both brute-force O(2^n) and greedy O(n^2) split algorithms. Statistical and
aggregated node information also allows aggregated sets of closely located
points to be manipulated (searched, deleted). This enables hierarchical
information retrieval: when
searching, the user is provided with the highest appropriate nodes in the tree
hierarchy, with the most important clusters emerging in the hierarchy
automatically. Then, if interested, the user may navigate down the tree to more
specific points. The system is implemented as a library of Java classes
representing Points, Sets of points with aggregated statistical information,
B-tree, and Nodes, with support for serialization and storage in a MySQL
database.
Comment: 6 pages with 3 examples
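The incremental update described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' Java library: the class name `BallStats` and its methods are hypothetical, and the root-mean-square distance to the centroid is used as a stand-in for the "average deviation" the abstract mentions. Each insertion touches only running sums, so its cost does not depend on how many points the node already holds.

```python
import math


class BallStats:
    """Running aggregate of a point set, summarized as a ball (hypothetical name)."""

    def __init__(self, dim):
        self.dim = dim
        self.count = 0
        self.coord_sum = [0.0] * dim  # per-coordinate sum of all points
        self.sq_norm_sum = 0.0        # sum of squared Euclidean norms

    def add(self, point):
        # Only running sums are updated; the cost is independent of self.count.
        for i, x in enumerate(point):
            self.coord_sum[i] += x
        self.sq_norm_sum += sum(x * x for x in point)
        self.count += 1

    def center(self):
        return [s / self.count for s in self.coord_sum]

    def rms_radius(self):
        # Mean squared distance to the centroid equals E|x|^2 - |mean|^2.
        c = self.center()
        mean_sq = self.sq_norm_sum / self.count - sum(x * x for x in c)
        return math.sqrt(max(mean_sq, 0.0))  # clamp tiny negative rounding error
```

Two such aggregates can also be merged by adding their counts and sums, which is one way a parent node could summarize its children without revisiting individual points.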
On Randomly Projected Hierarchical Clustering with Guarantees
Hierarchical clustering (HC) algorithms are generally limited to small data
instances due to their runtime costs. Here we mitigate this shortcoming and
explore fast HC algorithms based on random projections for single (SLC) and
average (ALC) linkage clustering as well as for the minimum spanning tree
problem (MST). We present a thorough adaptive analysis of our algorithms that
improve prior work from by up to a factor of for a
dataset of points in Euclidean space. The algorithms maintain, with
arbitrary high probability, the outcome of hierarchical clustering as well as
the worst-case running-time guarantees. We also present parameter-free
instances of our algorithms.
Comment: This version contains the conference paper "On Randomly Projected
Hierarchical Clustering with Guarantees", SIAM International Conference on
Data Mining (SDM), 2014 and, additionally, proofs omitted in the conference
version.
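As a generic illustration of the random-projection idea only, the sketch below projects points onto a random line and splits them at a random threshold. It is not the algorithm analysed in the paper; the function names and the choice of a uniformly random threshold are assumptions made for the example. The property it relies on is that projection onto a unit vector never increases distances, so nearby points tend to land on the same side of a split.

```python
import math
import random


def random_unit_vector(dim, rng):
    # A normalized Gaussian vector is uniformly distributed on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(points, direction):
    # Scalar position of each point along the given direction.
    return [sum(p_i * d_i for p_i, d_i in zip(p, direction)) for p in points]


def split_by_projection(points, rng):
    """Split points into two groups at a random threshold along a random line."""
    direction = random_unit_vector(len(points[0]), rng)
    proj = project(points, direction)
    threshold = rng.uniform(min(proj), max(proj))
    left = [p for p, v in zip(points, proj) if v <= threshold]
    right = [p for p, v in zip(points, proj) if v > threshold]
    return left, right
```

Applying such splits recursively yields small candidate groups on which exact distances can be computed, which is the general reason projection-based methods can avoid comparing all pairs of points.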
Wide Field Imaging. I. Applications of Neural Networks to object detection and star/galaxy classification
[Abridged] Astronomical Wide Field Imaging performed with new large format CCD
detectors poses data reduction problems of unprecedented scale which are
difficult to deal with using traditional interactive tools. We present here NExt
(Neural Extractor): a new Neural Network (NN) based package capable of detecting
objects and of performing both deblending and star/galaxy classification in an
automatic way. Traditionally, in astronomical images, objects are first
discriminated from the noisy background by searching for sets of connected
pixels having brightnesses above a given threshold, and then they are classified
as stars or as galaxies through diagnostic diagrams whose variables are chosen
according to the astronomer's taste and experience. In the extraction step,
assuming that images are well sampled, NExt requires only the simplest a priori
definition of "what an object is" (that is, it keeps all structures composed of
more than one pixel) and performs the detection via an unsupervised NN,
approaching detection as a clustering problem, which has been thoroughly studied
in the artificial intelligence literature. In order to obtain an objective and
reliable classification, instead of using an arbitrarily defined set of
features, we use an NN to select the most significant features among the large
number of measured ones, and then we use the selected features to perform the
classification task. In order to optimise the performance of the system we
implemented and tested several different models of NN. The comparison of the
performance of NExt with that of the best detection and classification package
known to the authors (SExtractor) shows that NExt is at least as effective as
the best traditional packages.
Comment: MNRAS, in press. Paper with higher resolution images is available at
http://www.na.astro.it/~andreon/listapub.htm
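The traditional detection step described in this abstract, finding sets of connected pixels brighter than a threshold, can be sketched as a flood fill. This is only an illustration of that baseline, not NExt or SExtractor: the function name `detect_objects`, the list-of-lists image format, and the use of 4-connectivity are assumptions.

```python
from collections import deque


def detect_objects(image, threshold):
    """Return groups of 4-connected pixels whose brightness exceeds threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill starting from an unvisited bright pixel.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(component)
    return objects
```

Each returned component is a list of `(row, column)` pixel coordinates; a later stage would measure features on these components before classifying them.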
Methods of Hierarchical Clustering
We survey agglomerative hierarchical clustering algorithms and discuss
efficient implementations that are available in R and other software
environments. We look at hierarchical self-organizing maps, and mixture models.
We review grid-based clustering, focusing on hierarchical density-based
approaches. Finally we describe a recently developed very efficient (linear
time) hierarchical clustering algorithm, which can also be viewed as a
hierarchical grid-based algorithm.
Comment: 21 pages, 2 figures, 1 table, 69 references
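For readers unfamiliar with the agglomerative scheme this survey covers, a deliberately naive single-linkage version can be sketched as below. It is far slower than the efficient implementations the survey discusses and is not taken from it; the function name and the merge-record format `(cluster_a, cluster_b, distance)` are assumptions.

```python
import math


def single_linkage(points):
    """Naive agglomerative clustering; returns the merges in the order performed."""
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> point indices
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the two closest clusters under a fresh id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges
```

The list of merges encodes the dendrogram: original points have ids `0..n-1`, and each merge creates the next unused id.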
Multivariate Approaches to Classification in Extragalactic Astronomy
Clustering objects into synthetic groups is a natural activity of any
science. Astrophysics is not an exception and is now facing a deluge of data.
For galaxies, the century-old Hubble classification and the Hubble tuning
fork are still largely in use, together with numerous mono- or bivariate
classifications most often made by eye. However, a classification must be
driven by the data, and sophisticated multivariate statistical tools are used
more and more often. In this paper we review these different approaches in
order to situate them in the general context of unsupervised and supervised
learning. We emphasize the astrophysical outcomes of these studies to show that
multivariate analyses provide an obvious path toward a renewal of our
classification of galaxies and are invaluable tools to investigate the physics
and evolution of galaxies.
Comment: Open Access paper.
http://www.frontiersin.org/milky_way_and_galaxies/10.3389/fspas.2015.00003/abstract
DOI: 10.3389/fspas.2015.00003