
    Methods for fast and reliable clustering

    Multivariate Approaches to Classification in Extragalactic Astronomy

    Clustering objects into synthetic groups is a natural activity of any science. Astrophysics is no exception and is now facing a deluge of data. For galaxies, the century-old Hubble classification and the Hubble tuning fork are still largely in use, together with numerous mono- or bivariate classifications, most often made by eye. However, a classification must be driven by the data, and sophisticated multivariate statistical tools are used more and more often. In this paper we review these different approaches in order to situate them in the general context of unsupervised and supervised learning. We emphasize the astrophysical outcomes of these studies to show that multivariate analyses provide an obvious path toward a renewal of our classification of galaxies and are invaluable tools to investigate the physics and evolution of galaxies. Comment: Open Access paper. http://www.frontiersin.org/milky_way_and_galaxies/10.3389/fspas.2015.00003/abstract. DOI: 10.3389/fspas.2015.00003

    Recent Developments in Document Clustering

    This report aims to give a brief overview of the current state of document clustering research and to present recent developments in a well-organized manner. Clustering algorithms are considered with two hypothetical scenarios in mind: online query clustering with tight efficiency constraints, and offline clustering with an emphasis on accuracy. A comparative analysis of the algorithms is performed, along with a table summarizing their important properties, and open problems as well as directions for future research are discussed.

    Face Recognition using Segmental Euclidean Distance

    In this paper, faces are detected using the integral image combined with a cascade-structured classifier built with the AdaBoost learning algorithm. The detected faces are then filtered to discard non-face regions, and each face is split into five segments: forehead, eyes, nose, mouth and chin. Each segment is treated as a separate image, and its Eigenface features, also called principal component analysis (PCA) features, are computed. Faces with a slight pose are aligned first so that they can be segmented properly. The test image is segmented in the same way and its PCA features are extracted. A segmental Euclidean distance classifier then matches the test image against the stored images. The success rate is 88 per cent on the CG(full) database, created from the California Institute and Georgia Institute databases, but only 70 per cent on the ORL(full) database with the same features. For comparison, DCT(full) and fuzzy features are tested on the CG and ORL databases with a well-known classifier, the support vector machine (SVM). On the CG database, DCT features with the SVM classifier raise the recognition rate by 3 per cent over PCA features with the Euclidean distance classifier. With fuzzy features and the SVM, recognition on the ORL database improves to 96 per cent. Defence Science Journal, 2011, 61(5), pp. 431-442, DOI: http://dx.doi.org/10.14429/dsj.61.117
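
The matching step described above can be sketched as nearest-neighbour search over per-segment feature vectors. This is a minimal illustration, not the authors' code: the abstract does not say how the five segment distances are combined, so the sketch assumes they are simply summed, and the names `SEGMENTS`, `segmental_distance` and `classify` are hypothetical. The feature vectors stand in for the PCA (Eigenface) features; computing PCA itself is left out.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical segment keys, named after the five facial regions in the abstract.
SEGMENTS = ("forehead", "eyes", "nose", "mouth", "chin")

Features = Dict[str, List[float]]  # segment name -> PCA feature vector


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segmental_distance(probe: Features, stored: Features) -> float:
    """Sum of per-segment Euclidean distances (assumed combination rule)."""
    return sum(euclidean(probe[s], stored[s]) for s in SEGMENTS)


def classify(probe: Features, gallery: Dict[str, Features]) -> str:
    """Return the identity whose stored face has the smallest segmental distance."""
    return min(gallery, key=lambda name: segmental_distance(probe, gallery[name]))


if __name__ == "__main__":
    # Toy 2-D "PCA features" per segment; real features would come from PCA.
    gallery = {
        "person_a": {s: [0.0, 0.0] for s in SEGMENTS},
        "person_b": {s: [3.0, 4.0] for s in SEGMENTS},
    }
    probe = {s: [0.5, 0.5] for s in SEGMENTS}
    print(classify(probe, gallery))  # person_a
    print(segmental_distance(gallery["person_a"], gallery["person_b"]))  # 25.0
```

Keeping one distance per segment means a badly matched region (for example, an occluded mouth) adds its own bounded term instead of being blended into a single whole-face vector; a weighted sum would be a natural variant if some segments prove more reliable.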

    Fuzzy-Pattern-Classifier Based Sensor Fusion for Machine Conditioning

    Semi-supervised model-based clustering with controlled clusters leakage

    In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of the Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by a user-defined leakage level, which controls the maximal inconsistency between the initial categorization and the resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters that combine expert knowledge with the true distribution of the data. Moreover, it can be used to improve the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents an extensive theoretical analysis of the model and a fast algorithm for its efficient optimization. Experimental results show that C3L finds high-quality clustering models, which can be applied to discovering meaningful groups in partially classified data.

    Novel hybrid extraction systems for fetal heart rate variability monitoring based on non-invasive fetal electrocardiogram

    This study focuses on the design, implementation and subsequent verification of a new type of hybrid extraction system for non-invasive fetal electrocardiogram (NI-fECG) processing. The proposed system combines the advantages of individual adaptive and non-adaptive algorithms. The pilot study reviews two innovative hybrid systems, called ICA-ANFIS-WT and ICA-RLS-WT. Each combines independent component analysis (ICA), either the adaptive neuro-fuzzy inference system (ANFIS) algorithm or the recursive least squares (RLS) algorithm, and the wavelet transform (WT) algorithm. The study was conducted on clinical data (the extended ADFECGDB database and the Physionet Challenge 2013 database) from the perspective of non-invasive fetal heart rate variability monitoring, based on the overall probability of correct detection (ACC), sensitivity (SE), positive predictive value (PPV) and the harmonic mean of SE and PPV (F1). System functionality was verified against a reference obtained invasively with a scalp electrode (ADFECGDB database) or a reference obtained from annotations (Physionet Challenge 2013 database). The study showed that the ICA-RLS-WT hybrid system achieves better results than ICA-ANFIS-WT. On the ADFECGDB database, the ICA-RLS-WT hybrid system reached ACC > 80% on 9 of 12 recordings, whereas the ICA-ANFIS-WT hybrid system reached ACC > 80% on only 6 of 12. On the Physionet Challenge 2013 database, the ICA-RLS-WT hybrid system reached ACC > 80% on 13 of 25 recordings, whereas the ICA-ANFIS-WT hybrid system reached ACC > 80% on only 7 of 25. Both hybrid systems achieve demonstrably better results than the individual algorithms tested in previous studies.
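
The four detection metrics named above can be computed from beat-level counts. This is a minimal sketch, not the authors' evaluation code: it assumes detected fetal QRS complexes have already been matched to the reference annotations, and the names `BeatCounts` and `detection_metrics` are hypothetical. SE, PPV and F1 use their standard definitions. For ACC the abstract only says "overall probability of correct detection"; the sketch assumes the definition common in QRS-detection studies, ACC = TP / (TP + FP + FN), since beat detection has no true negatives.

```python
from dataclasses import dataclass


@dataclass
class BeatCounts:
    tp: int  # detected fetal beats that match a reference beat
    fp: int  # detections with no matching reference beat
    fn: int  # reference beats the detector missed


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when the denominator is zero.
    return num / den if den else 0.0


def detection_metrics(c: BeatCounts) -> dict:
    se = _ratio(c.tp, c.tp + c.fn)          # sensitivity
    ppv = _ratio(c.tp, c.tp + c.fp)         # positive predictive value
    f1 = _ratio(2 * se * ppv, se + ppv)     # harmonic mean of SE and PPV
    acc = _ratio(c.tp, c.tp + c.fp + c.fn)  # assumed ACC definition (no TN term)
    return {"ACC": acc, "SE": se, "PPV": ppv, "F1": f1}


if __name__ == "__main__":
    m = detection_metrics(BeatCounts(tp=90, fp=5, fn=5))
    print(m["ACC"])        # 0.9
    print(m["ACC"] > 0.8)  # True: this recording would pass the 80 % threshold
```

Under this assumed definition ACC penalizes both missed and spurious beats, so it is never higher than SE or PPV, which is why an 80 % ACC threshold is a fairly strict per-recording criterion.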