3,917 research outputs found

    Tetkik: Akan veri kümeleme algoritmalarını çalıştırma ve karşılaştırma

    Get PDF
    12th Turkish National Software Engineering Symposium, UYMS 2018; Istanbul; Turkey; 10 September 2018 through 12 September 2018Recently, clustering data streams have become an incredibly important research area for knowledge discovery as applications produce more and more unstoppable streaming data. In this paper we introduce clustering, streams and data streaming clustering algorithms, as well as discussions of the most important stream clustering algorithms, considering their structure. As an additional contribution of our work and differently from review and survey papers in stream clustering, we offer the practical part of the most known stream clustering algorithms, namely: (i) CluStream; (ii) DenStream; (iii) D-Stream; and (iv) ClusTree, showing their experimental results along with some performance metrics computation of for each, depending on MOA framework.Son zamanlarda, veri akışlarını kümelemek uygulamalar daha fazla durdurulamaz veri akışı üretirken bilgi keşfi için inanılmaz derecede önemli bir araştırma alanı haline gelmiştir.Bu makalede, kümeleme, akışlar ve veri akışlarını kümeleme algoritmalarını en önemli akım kümeleme algoritmalarının irdelenmesini yapılarını da göz önünde bulundurarak tanıtıyoruz. Çalışmamızın ek bir katkısı ve akış kümeleme alanında yapılmış tetkit ve gözden geçirme makalelerinden farklı olarak en bilinen akış kümeleme algoritmalarının Pratik kısmını, yani: (i) CluStream; (ii) DenStream; (iii) D-Stream; and (iv) ClusTree, MOA Java çerçevesine bağlı olarak, her biri için bazı performans metriklerinin hesaplanmasıyla birlikte deney sonuçlarını göstererek sunuyoruz

    Adaptive inferential sensors based on evolving fuzzy models

    Get PDF
    A new technique to the design and use of inferential sensors in the process industry is proposed in this paper, which is based on the recently introduced concept of evolving fuzzy models (EFMs). They address the challenge that the modern process industry faces today, namely, to develop such adaptive and self-calibrating online inferential sensors that reduce the maintenance costs while keeping the high precision and interpretability/transparency. The proposed new methodology makes possible inferential sensors to recalibrate automatically, which reduces significantly the life-cycle efforts for their maintenance. This is achieved by the adaptive and flexible open-structure EFM used. The novelty of this paper lies in the following: (1) the overall concept of inferential sensors with evolving and self-developing structure from the data streams; (2) the new methodology for online automatic selection of input variables that are most relevant for the prediction; (3) the technique to detect automatically a shift in the data pattern using the age of the clusters (and fuzzy rules); (4) the online standardization technique used by the learning procedure of the evolving model; and (5) the application of this innovative approach to several real-life industrial processes from the chemical industry (evolving inferential sensors, namely, eSensors, were used for predicting the chemical properties of different products in The Dow Chemical Company, Freeport, TX). It should be noted, however, that the methodology and conclusions of this paper are valid for the broader area of chemical and process industries in general. The results demonstrate that well-interpretable and with-simple-structure inferential sensors can automatically be designed from the data stream in real time, which predict various process variables of interest. The proposed approach can be used as a basis for the development of a new generation of adaptive and evolving inferential sensors that can a- ddress the challenges of the modern advanced process industry

    Data stream mining techniques: a review

    Get PDF
    A plethora of infinite data is generated from the Internet and other information sources. Analyzing this massive data in real-time and extracting valuable knowledge using different mining applications platforms have been an area for research and industry as well. However, data stream mining has different challenges making it different from traditional data mining. Recently, many studies have addressed the concerns on massive data mining problems and proposed several techniques that produce impressive results. In this paper, we review real time clustering and classification mining techniques for data stream. We analyze the characteristics of data stream mining and discuss the challenges and research issues of data steam mining. Finally, we present some of the platforms for data stream mining

    Finding and tracking multi-density clusters in an online dynamic data stream

    Get PDF
    The file attached to this record is the author's final peer reviewed version.Change is one of the biggest challenges in dynamic stream mining. From a data-mining perspective, adapting and tracking change is desirable in order to understand how and why change has occurred. Clustering, a form of unsupervised learning, can be used to identify the underlying patterns in a stream. Density-based clustering identifies clusters as areas of high density separated by areas of low density. This paper proposes a Multi-Density Stream Clustering (MDSC) algorithm to address these two problems; the multi-density problem and the problem of discovering and tracking changes in a dynamic stream. MDSC consists of two on-line components; discovered, labelled clusters and an outlier buffer. Incoming points are assigned to a live cluster or passed to the outlier buffer. New clusters are discovered in the buffer using an ant-inspired swarm intelligence approach. The newly discovered cluster is uniquely labelled and added to the set of live clusters. Processed data is subject to an ageing function and will disappear when it is no longer relevant. MDSC is shown to perform favourably to state-of-the-art peer stream-clustering algorithms on a range of real and synthetic data-streams. Experimental results suggest that MDSC can discover qualitatively useful patterns while being scalable and robust to noise

    Next challenges for adaptive learning systems

    Get PDF
    Learning from evolving streaming data has become a 'hot' research topic in the last decade and many adaptive learning algorithms have been developed. This research was stimulated by rapidly growing amounts of industrial, transactional, sensor and other business data that arrives in real time and needs to be mined in real time. Under such circumstances, constant manual adjustment of models is in-efficient and with increasing amounts of data is becoming infeasible. Nevertheless, adaptive learning models are still rarely employed in business applications in practice. In the light of rapidly growing structurally rich 'big data', new generation of parallel computing solutions and cloud computing services as well as recent advances in portable computing devices, this article aims to identify the current key research directions to be taken to bring the adaptive learning closer to application needs. We identify six forthcoming challenges in designing and building adaptive learning (pre-diction) systems: making adaptive systems scalable, dealing with realistic data, improving usability and trust, integrat-ing expert knowledge, taking into account various application needs, and moving from adaptive algorithms towards adaptive tools. Those challenges are critical for the evolving stream settings, as the process of model building needs to be fully automated and continuous.</jats:p

    SOTXTSTREAM: Density-based self-organizing clustering of text streams

    Get PDF
    A streaming data clustering algorithm is presented building upon the density-based selforganizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets

    Optimizing Data Stream Representation: An Extensive Survey on Stream Clustering Algorithms

    Get PDF
    Abstract Analyzing data streams has received considerable attention over the past decades due to the widespread usage of sensors, social media and other streaming data sources. A core research area in this field is stream clustering which aims to recognize patterns in an unordered, infinite and evolving stream of observations. Clustering can be a crucial support in decision making, since it aims for an optimized aggregated representation of a continuous data stream over time and allows to identify patterns in large and high-dimensional data. A multitude of algorithms and approaches has been developed that are able to find and maintain clusters over time in the challenging streaming scenario. This survey explores, summarizes and categorizes a total of 51 stream clustering algorithms and identifies core research threads over the past decades. In particular, it identifies categories of algorithms based on distance thresholds, density grids and statistical models as well as algorithms for high dimensional data. Furthermore, it discusses applications scenarios, available software and how to configure stream clustering algorithms. This survey is considerably more extensive than comparable studies, more up-to-date and highlights how concepts are interrelated and have been developed over time
    corecore