Clustering Approaches for Evaluation and Analysis on Formal Gene Expression Cancer Datasets

Author: Ramachandro Majji, Ravi Bramaramba
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/07/2017

The enormous volume of biological data being generated, and the need to analyze it, gave rise to the field of bioinformatics. Data mining is used to analyze such data by exploring its hidden patterns. Although data mining can be applied to many kinds of biological data, such as genomic and proteomic data, this study considers Gene Expression (GE) data. GE data is generated from microarrays, such as DNA and oligo microarrays, and is analyzed here with the clustering techniques of data mining. The study implements the basic clustering approach K-Means together with several other approaches: hierarchical clustering, SOM, CLICK, and a basic fuzzy clustering approach. A comparative study of these approaches then identifies the most effective one for cluster analysis of GE data. The experimental results show that the proposed algorithm achieves higher clustering accuracy and takes less clustering time than the existing algorithms.

International Journal on Recent and Innovation Trends in Computing and Communication
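
As an illustration of the K-Means baseline named in the abstract above, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not the authors' implementation: the function name `kmeans`, its parameters, and the toy expression profiles are all assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=50, seed=0):
    """Lloyd's k-means on a list of equal-length numeric rows (illustrative only)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input rows.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c])) for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so centroids would not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles: one row per gene, one column per sample.
profiles = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.0], [5.0, 5.1, 4.9], [5.2, 4.8, 5.0]]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

The two low-expression rows and the two high-expression rows end up in separate clusters; which cluster receives which index depends on the random initialisation.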

S-RASTER: Contraction Clustering for Evolving Data Streams

Author: Gustavsson Emil, Jirstrand Mats, Nilsson Adrian, Smith Simon, Ulm Gregor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Contraction Clustering (RASTER) is a single-pass algorithm for density-based clustering of 2D data. It can process arbitrary amounts of data in linear time and in constant memory, quickly identifying approximate clusters. It also exhibits good scalability in the presence of multiple CPU cores. RASTER exhibits very competitive performance compared to standard clustering algorithms, but at the cost of decreased precision. Yet, RASTER is limited to batch processing and unable to identify clusters that only exist temporarily. In contrast, S-RASTER is an adaptation of RASTER to the stream processing paradigm that is able to identify clusters in evolving data streams. This algorithm retains the main benefits of its parent algorithm, i.e. single-pass linear time cost and constant memory requirements for each discrete time step within a sliding window. The sliding window is efficiently pruned, and clustering is still performed in linear time. Like RASTER, S-RASTER trades off an often negligible amount of precision for speed. Our evaluation shows that competing algorithms are at least 50% slower. Furthermore, S-RASTER shows good qualitative results, based on standard metrics. It is very well suited to real-world scenarios where clustering does not happen continually but only periodically.

Comment: 24 pages, 5 figures, 2 tables

arXiv.org e-Print Archive

Chalmers Research
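
The abstract above does not spell out RASTER's internals, so the following is only a sketch of the general idea of single-pass, grid-based density clustering of 2D points, not the authors' algorithm. The function name `grid_clusters` and the parameters `tile_size` and `min_points` are hypothetical. Memory in this sketch grows with the number of occupied tiles rather than with the number of points.

```python
import math
from collections import Counter, deque


def grid_clusters(points, tile_size=1.0, min_points=3):
    """Group dense grid tiles into clusters (illustrative, not RASTER itself)."""
    # Single pass over the data: count how many points fall into each tile.
    counts = Counter(
        (math.floor(x / tile_size), math.floor(y / tile_size)) for x, y in points
    )
    # Keep only tiles that hold at least min_points points.
    dense = {tile for tile, n in counts.items() if n >= min_points}
    clusters = []
    while dense:
        start = dense.pop()
        cluster, queue = {start}, deque([start])
        # Breadth-first search over the 8-neighbourhood of each dense tile.
        while queue:
            tx, ty = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (tx + dx, ty + dy)
                    if neighbour in dense:
                        dense.remove(neighbour)
                        cluster.add(neighbour)
                        queue.append(neighbour)
        clusters.append(cluster)
    return clusters


# Hypothetical data: two adjacent dense tiles, one distant dense tile, one outlier.
points = [
    (0.1, 0.1), (0.2, 0.3), (0.5, 0.5),
    (1.1, 0.2), (1.5, 0.5), (1.9, 0.9),
    (10.1, 10.1), (10.2, 10.4), (10.8, 10.6),
    (5.5, 5.5),
]
print(grid_clusters(points))
```

The outlier's tile never reaches `min_points`, so it is dropped; the result is approximate because clusters are reported as sets of tiles, not as exact point memberships.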

Density-based projected clustering of data streams

Author: Gaber M., Hassani M., Seidl T., Spaus P.
Publication date: 01/01/2012

Portsmouth University Research Portal (Pure)

Publikationsserver der RWTH Aachen University

MOA: Massive Online Analysis, a framework for stream classification and clustering.

Author: Bifet Albert, Holmes Geoffrey, Jansen Timm, Kranen Philipp, Kremer Hardy, Pfahringer Bernhard, Seidl Thomas
Publication venue: JMLR
Publication date: 01/01/2010

Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problem of scaling up the implementation of state-of-the-art algorithms to real-world dataset sizes. It contains a collection of offline and online algorithms for both classification and clustering, as well as tools for evaluation. In particular, for classification it implements boosting, bagging, and Hoeffding Trees, all with and without Naive Bayes classifiers at the leaves. For clustering, it implements StreamKM++, CluStream, ClusTree, Den-Stream, D-Stream and CobWeb. Researchers benefit from MOA by gaining insight into the workings and problems of different approaches; practitioners can easily apply and compare several algorithms on real-world data sets and settings. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and is released under the GNU GPL license.

CiteSeerX

Research Commons@Waikato

Publikationsserver der RWTH Aachen University

A Scalable Clustering Algorithm for High-dimensional Data Streams over Sliding Windows

Author: 연종흠
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2017

Thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. 이상구.

Data stream clustering over sliding windows generates clustering results whenever a window moves. However, iterative clustering using all data in a window is highly inefficient in terms of memory and computation time. In this thesis, we address the problem of data stream clustering over sliding windows using sliding window aggregation and nearest neighbor search techniques. Our algorithm constructs and maintains temporal group features as a summary of the window using the sliding window aggregation technique. The technique divides a window into disjoint chunks, computes partial aggregates over each chunk, and merges the partial aggregates to compute overall aggregates. To keep the summary at a constant size, the algorithm shrinks it by merging nearest neighbors. We exploit Locality-Sensitive Hashing for fast nearest neighbor search. We show that Locality-Sensitive Hashing can serve as an effective method for reducing synopses while minimizing the impact on quality. In addition, we suggest a re-clustering policy, which decides whether to append a new summary to pre-existing clusters or to perform clustering on the whole summary. Our experiments on real-world and synthetic datasets demonstrate that our algorithm can achieve a significant improvement when performing continuous clustering on data streams with sliding windows.

Contents:
1. Introduction
2. Preliminaries and Related Work
   2.1 Data Streams
   2.2 Window Models
   2.3 k-Means Clustering
   2.4 Coreset
   2.5 Group Features
   2.6 Related Work
   2.7 Problem Statement
3. GFCS: Group Feature-based Data Stream Clustering with Sliding Windows
   3.1 2-Level Coresets Construction
   3.2 2-Level Coresets Maintenance
   3.3 Clustering on 2-Level Coresets
4. CSCS: Coreset-based Data Stream Clustering with Sliding Windows
   4.1 Coreset Construction based on Nearest Neighbor Search
   4.2 Coreset Construction based on Locality-Sensitive Hashing
   4.3 Reclustering Policy
5. Empirical Evaluation of Data Stream Clustering with Sliding Windows
   5.1 Experimental Setup
   5.2 Experimental Results
6. Application: Documents Clustering
   6.1 Vector Representation of Documents
   6.2 Extension to Other Clustering Algorithms
   6.3 Evaluation
7. Conclusion
A. Appendix
   A.1 Experimental Results of GFCS and CSCS
   A.2 Experimental Results of Document Clustering

SNU Open Repository and Archive
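
The sliding window aggregation step described in the thesis abstract above (disjoint chunks, per-chunk partial aggregates, merge into an overall aggregate) can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's GFCS or CSCS algorithm: it handles scalar values only, the class name `ChunkedWindow` and its parameters are hypothetical, and the aggregate `(count, sum, sum of squares)` merely stands in for the group features the thesis maintains. For vector data the same sums would be kept per component.

```python
from collections import deque


class ChunkedWindow:
    """Sliding window summarised by per-chunk partial aggregates (illustrative)."""

    def __init__(self, chunk_size, num_chunks):
        self.chunk_size = chunk_size
        # A deque with maxlen evicts the oldest chunk when a new one is appended.
        self.chunks = deque(maxlen=num_chunks)
        self.current = [0, 0.0, 0.0]  # count, sum, sum of squares of the open chunk

    def add(self, x):
        """Fold one value into the open chunk; close the chunk when it is full."""
        self.current[0] += 1
        self.current[1] += x
        self.current[2] += x * x
        if self.current[0] == self.chunk_size:
            self.chunks.append(tuple(self.current))
            self.current = [0, 0.0, 0.0]

    def aggregate(self):
        """Merge the partial aggregates of all completed chunks in the window."""
        n = sum(c[0] for c in self.chunks)
        s = sum(c[1] for c in self.chunks)
        ss = sum(c[2] for c in self.chunks)
        return n, s, ss

    def mean(self):
        """Mean over completed chunks, or None when no chunk is complete yet."""
        n, s, _ = self.aggregate()
        return s / n if n else None


window = ChunkedWindow(chunk_size=2, num_chunks=2)
for value in [1, 2, 3, 4, 5, 6]:
    window.add(value)
print(window.aggregate(), window.mean())
```

Because each chunk is reduced to a fixed-size tuple, moving the window costs one eviction and one append, and the raw points never need to be revisited. The open (incomplete) chunk is deliberately excluded from `aggregate` in this sketch.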

A Clustering System for Dynamic Data Streams Based on Metaheuristic Optimisation

Author: Caraffini Fabio, Homapour E., Milani Alfredo, Santucci Valentino, Yeoh Jia Ming
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

This article presents the Optimised Stream clustering algorithm (OpStream), a novel approach to clustering dynamic data streams. The proposed system displays desirable features, such as a low number of parameters and good scalability with respect to both high-dimensional data and the number of clusters in the dataset. It is based on a hybrid structure that uses deterministic clustering methods and stochastic optimisation approaches to optimally centre the clusters. Similar to other state-of-the-art methods available in the literature, it uses “microclusters” and other established techniques, such as density-based clustering. Unlike other methods, it makes use of metaheuristic optimisation to maximise performance during the initialisation phase, which precedes the classic online phase. Experimental results show that OpStream outperforms the state-of-the-art methods in several cases, and it is always competitive against other comparison algorithms regardless of the chosen optimisation method. Three variants of OpStream, each coming with a different optimisation algorithm, are presented in this study. A thorough sensitivity analysis is performed using the best variant to point out OpStream’s robustness to noise and resilience to parameter changes.

Nottingham Trent Institutional Repository (IRep)

De Montfort University Open Research Archive