Search CORE

90,045 research outputs found

Isotropic Dynamic Hierarchical Clustering

Author: Rutishauser Oliver
Sadikov Victor
Publication venue
Publication date: 15/05/2016
Field of study

We face a need of discovering a pattern in locations of a great number of points in a high-dimensional space. Goal is to group the close points together. We are interested in a hierarchical structure, like a B-tree. B-Trees are hierarchical, balanced, and they can be constructed dynamically. B-Tree approach allows to determine the structure without any supervised learning or a priori knowlwdge. The space is Euclidean and isotropic. Unfortunately, there are no B-Tree implementations processing indices in a symmetrical and isotropical way. Some implementations are based on constructing compound asymmetrical indices from point coordinates; and the others split the nodes along the coordinate hyper-planes. We need to process tens of millions of points in a thousand-dimensional space. The application has to be scalable. Ideally, a cluster should be an ellipsoid, but it would require to store O(n2) ellipse axes. So, we are using multi-dimensional balls defined by the centers and radii. Calculation of statistical values like the mean and the average deviation, can be done in an incremental way. While adding a point to a tree, the statistical values for nodes recalculated in O(1) time. We support both, brute force O(2n) and greedy O(n2) split algorithms. Statistical and aggregated node information also allows to manipulate (to search, to delete) aggregated sets of closely located points. Hierarchical information retrieval. When searching, the user is provided with the highest appropriate nodes in the tree hierarchy, with the most important clusters emerging in the hierarchy automatically. Then, if interested, the user may navigate down the tree to more specific points. The system is implemented as a library of Java classes representing Points, Sets of points with aggregated statistical information, B-tree, and Nodes with a support of serialization and storage in a MySQL database.Comment: 6 pages with 3 example

arXiv.org e-Print Archive

Global Journal of Computer Science and Technology (GJCST)

On Randomly Projected Hierarchical Clustering with Guarantees

Author: Schneider Johannes
Vlachos Michail
Publication venue
Publication date: 22/01/2014
Field of study

Hierarchical clustering (HC) algorithms are generally limited to small data instances due to their runtime costs. Here we mitigate this shortcoming and explore fast HC algorithms based on random projections for single (SLC) and average (ALC) linkage clustering as well as for the minimum spanning tree problem (MST). We present a thorough adaptive analysis of our algorithms that improve prior work from

O(N^2)

by up to a factor of

N/(\log N)^2

for a dataset of

N

points in Euclidean space. The algorithms maintain, with arbitrary high probability, the outcome of hierarchical clustering as well as the worst-case running-time guarantees. We also present parameter-free instances of our algorithms.Comment: This version contains the conference paper "On Randomly Projected Hierarchical Clustering with Guarantees'', SIAM International Conference on Data Mining (SDM), 2014 and, additionally, proofs omitted in the conference versio

arXiv.org e-Print Archive

Crossref

Wide Field Imaging. I. Applications of Neural Networks to object detection and star/galaxy classification

Author: Arnaboldi
Baldi
Bazell
Bertin
Ferguson
Fritzke
G. Gargiulo
G. Longo
Godwin
Infante
Infante
Jarvis
Jutten
Karhunen
Karhunen
Koh
Kohonen
Lanzetta
Lipovetsky
Lloyd
Martinetz
Miller
N. Capuano
Naim
Odewahn
Oja
Oja
Oja
Pal
Plumbley
R. Tagliaferri
Rose
S. Andreon
Sanger
Tagliaferri
Tagliaferri
Tagliaferri
Publication venue: 'Wiley'
Publication date: 01/01/2000
Field of study

[Abriged] Astronomical Wide Field Imaging performed with new large format CCD detectors poses data reduction problems of unprecedented scale which are difficult to deal with traditional interactive tools. We present here NExt (Neural Extractor): a new Neural Network (NN) based package capable to detect objects and to perform both deblending and star/galaxy classification in an automatic way. Traditionally, in astronomical images, objects are first discriminated from the noisy background by searching for sets of connected pixels having brightnesses above a given threshold and then they are classified as stars or as galaxies through diagnostic diagrams having variables choosen accordingly to the astronomer's taste and experience. In the extraction step, assuming that images are well sampled, NExt requires only the simplest a priori definition of "what an object is" (id est, it keeps all structures composed by more than one pixels) and performs the detection via an unsupervised NN approaching detection as a clustering problem which has been thoroughly studied in the artificial intelligence literature. In order to obtain an objective and reliable classification, instead of using an arbitrarily defined set of features, we use a NN to select the most significant features among the large number of measured ones, and then we use their selected features to perform the classification task. In order to optimise the performances of the system we implemented and tested several different models of NN. The comparison of the NExt performances with those of the best detection and classification package known to the authors (SExtractor) shows that NExt is at least as effective as the best traditional packages.Comment: MNRAS, in press. Paper with higher resolution images is available at http://www.na.astro.it/~andreon/listapub.htm

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della Ricerca - Università della Basilicata

Archivio della Ricerca - Università di Salerno

CERN Document Server

Methods of Hierarchical Clustering

Author: Contreras Pedro
Murtagh Fionn
Publication venue
Publication date: 01/01/2011
Field of study

We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps, and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally we describe a recently developed very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm.Comment: 21 pages, 2 figures, 1 table, 69 reference

arXiv.org e-Print Archive

Royal Holloway Research Online

Royal Holloway - Pure

Multivariate Approaches to Classification in Extragalactic Astronomy

Author: Chattopadhyay Asis Kumar
Fraix-Burnet Didier
Thuillard Marc
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2015
Field of study

Clustering objects into synthetic groups is a natural activity of any science. Astrophysics is not an exception and is now facing a deluge of data. For galaxies, the one-century old Hubble classification and the Hubble tuning fork are still largely in use, together with numerous mono-or bivariate classifications most often made by eye. However, a classification must be driven by the data, and sophisticated multivariate statistical tools are used more and more often. In this paper we review these different approaches in order to situate them in the general context of unsupervised and supervised learning. We insist on the astrophysical outcomes of these studies to show that multivariate analyses provide an obvious path toward a renewal of our classification of galaxies and are invaluable tools to investigate the physics and evolution of galaxies.Comment: Open Access paper. http://www.frontiersin.org/milky\_way\_and\_galaxies/10.3389/fspas.2015.00003/abstract\>. \<10.3389/fspas.2015.00003 \&g

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Frontiers - Publisher Connector

HAL Descartes

HAL-INSU

HAL Université de Savoie