Detecting change via competence model
In real-world applications, the concepts of interest are more likely to change over time than to remain stable, a phenomenon known as concept drift. This situation degrades the predictions of many learning algorithms, including case-based reasoning (CBR). When learning under concept drift, a critical issue is to identify "when" and "how" the concept changes. In this paper, we develop a competence-based empirical distance between case chunks and then propose a change detection method based on it. As the main contribution of our work, the change detection method provides a way to measure the distribution change of cases over an infinite domain through finite samples, and it requires no prior knowledge about the case distribution, which makes it more practical in real-world applications. Also, unlike many other change detection methods, ours not only detects changes in concepts but also quantifies and describes them. © 2010 Springer-Verlag
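The competence-based distance itself is developed in the paper; as a rough illustration of the general idea of detecting and quantifying distribution change from finite samples, a histogram-based empirical distance between two case chunks might look like the sketch below. The bin count, the threshold, and the histogram distance are illustrative choices, not the authors' competence-based measure.

```python
import random

def empirical_distance(chunk_a, chunk_b, bins=10):
    """Illustrative empirical distance between two one-dimensional case
    chunks: half the L1 distance between their normalized histograms over
    a shared bin grid (a total-variation-style estimate)."""
    lo = min(min(chunk_a), min(chunk_b))
    hi = max(max(chunk_a), max(chunk_b))
    width = (hi - lo) / bins or 1.0

    def hist(chunk):
        counts = [0] * bins
        for x in chunk:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        return [c / len(chunk) for c in counts]

    pa, pb = hist(chunk_a), hist(chunk_b)
    return 0.5 * sum(abs(a - b) for a, b in zip(pa, pb))

def detect_change(chunk_a, chunk_b, threshold=0.2):
    """Flag a drift when the empirical distance exceeds a threshold; the
    distance itself quantifies how large the change is."""
    d = empirical_distance(chunk_a, chunk_b)
    return d > threshold, d

random.seed(0)
stable = detect_change([random.gauss(0, 1) for _ in range(500)],
                       [random.gauss(0, 1) for _ in range(500)])
drifted = detect_change([random.gauss(0, 1) for _ in range(500)],
                        [random.gauss(2, 1) for _ in range(500)])
```

Returning the distance alongside the boolean mirrors the abstract's point that the change is not only detected but also quantified.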
Neural visualization of network traffic data for intrusion detection
This study introduces and describes a novel intrusion detection system (IDS) called MOVCIDS (mobile visualization connectionist IDS). The system applies neural projection architectures to detect anomalous situations taking place in a computer network. Through its advanced visualization facilities, the proposed IDS provides an overview of the network traffic and identifies anomalous situations faced by computer networks, responding to the challenges presented by the volume, dynamics and diversity of the traffic, including novel (0-day) attacks. MOVCIDS offers a novel point of view in the field of IDSs by extracting the most interesting projections (based on fourth-order statistics; the kurtosis index) of a massive traffic dataset. These projections are then depicted through a functional and mobile visualization interface, providing visual information about the internal structure of the traffic data. The interface makes MOVCIDS accessible from any mobile device, giving network administrators greater accessibility and enabling continuous visualization, monitoring and supervision of computer networks. Additionally, a novel testing technique has been developed to evaluate MOVCIDS and other IDSs employing numerical datasets. To show the performance of the proposed IDS and to validate it, it has been tested in different real domains containing several attacks and anomalous situations. In addition, the importance of the temporal dimension in intrusion detection, and the ability of this IDS to process it, are emphasized in this work.
Acknowledgments: Junta de Castilla y Leon project BU006A08, Business intelligence for production within the framework of the Instituto Tecnologico de Castilla y Leon (ITCL) and the Agencia de Desarrollo Empresarial (ADE), and the Spanish Ministry of Education and Innovation project CIT-020000-2008-2. The authors would also like to thank the vehicle interior manufacturer, Grupo Antolin Ingenieria S.A., within the framework of the MAGNO2008-1028-CENIT Project funded by the Spanish Government.
Learning Concept Drift Using Adaptive Training Set Formation Strategy
We live in a dynamic world, where change is a part of everyday life. When there is a shift in the data, classification or prediction models need to adapt to the changes. In data mining, the phenomenon of change in the data distribution over time is known as concept drift. In this research, we propose an adaptive supervised learning methodology with delayed labeling. As part of this methodology, we introduce an adaptive training set formation algorithm called SFDL, which is based on selective training set formation. Our proposed solution is, to our knowledge, the first systematic training set formation approach that takes the delayed labeling problem into account. It can be used with any base classifier without changing the implementation or settings of that classifier. We test our implementation on synthetic and real datasets from various domains exhibiting different drift types (sudden, gradual, incremental, recurring) with different speeds of change. The experimental results confirm an improvement in classification accuracy over an ordinary classifier for all drift types. Our approach increases classification accuracy by 20% on average and by 56% in the best cases of our experiments, and it has never performed worse than the ordinary classifiers. Finally, a comparison study with four other related methods for dealing with changing user interest over time and handling recurring drift is performed. The results indicate the effectiveness of the proposed method over the other methods in terms of classification accuracy.
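The abstract does not spell out SFDL's selection rules, but the general shape of a selective training set formation step can be sketched as follows. The probe classifier, the consistency test, and the tiny nearest-centroid base learner are all hypothetical stand-ins; SFDL itself is classifier-agnostic.

```python
def nearest_centroid_fit(X, y):
    """Tiny stand-in base classifier (1-D nearest class mean); a selective
    formation strategy like SFDL can wrap any base learner unchanged."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: sum(v) / len(v) for c, v in groups.items()}

def nearest_centroid_predict(model, X):
    return [min(model, key=lambda c: abs(x - model[c])) for x in X]

def selective_training_set(history_X, history_y, new_X, new_y):
    """Illustrative selective formation step: train a probe model on the
    newest labeled chunk (labels arrive with delay), then keep only the
    historical examples it still classifies correctly, i.e. those that
    remain consistent with the current concept."""
    probe = nearest_centroid_fit(new_X, new_y)
    preds = nearest_centroid_predict(probe, history_X)
    kept = [(x, y) for x, y, p in zip(history_X, history_y, preds) if p == y]
    X = [x for x, _ in kept] + list(new_X)
    y = [y for _, y in kept] + list(new_y)
    return X, y

# Sudden drift that flips the labels: no historical example survives.
X1, y1 = selective_training_set([1, 2, -1, -2], [1, 1, 0, 0],
                                [1, 2, -1, -2], [0, 0, 1, 1])
# Stable concept: the whole history is retained alongside the new chunk.
X2, y2 = selective_training_set([1, 2, -1, -2], [1, 1, 0, 0],
                                [1, 2, -1, -2], [1, 1, 0, 0])
```

Under the flipped concept only the four new examples remain in the training set, while under the stable concept all eight examples are kept, which is the behavior a selective strategy aims for across sudden and recurring drift.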
COMPOSE: Compacted object sample extraction, a framework for semi-supervised learning in nonstationary environments
An increasing number of real-world applications are associated with streaming data drawn from drifting, nonstationary distributions. These applications demand new algorithms that can learn from and adapt to such changes, also known as concept drift. Properly characterizing such data with existing approaches typically requires a substantial number of labeled instances, which may be difficult, expensive, or even impractical to obtain. In this thesis, compacted object sample extraction (COMPOSE) is introduced, a computational geometry-based framework for learning from nonstationary streaming data where labels are unavailable (or presented only sporadically) after initialization. The feasibility and performance of the algorithm are evaluated on several synthetic and real-world datasets, which present various scenarios of initially labeled streaming environments. On carefully designed synthetic datasets, we also compare the performance of COMPOSE against the optimal Bayes classifier, as well as the arbitrary subpopulation tracker algorithm, which addresses a similar environment referred to as extreme verification latency. Furthermore, using the real-world National Oceanic and Atmospheric Administration weather dataset, we demonstrate that COMPOSE is competitive even with a well-established, fully supervised nonstationary learning algorithm that receives labeled data in every batch.
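COMPOSE's actual compaction uses computational geometry (shrinking each class's region) on streams where labels stop arriving after initialization. The sketch below substitutes a much simpler density-core rule, keeping a fraction of points nearest the class mean, purely to illustrate the label-propagate-then-compact loop; it is not the published algorithm.

```python
def compact_core(points, keep_frac=0.6):
    """Simplified stand-in for geometric compaction: keep the keep_frac of
    points closest to the class mean. COMPOSE instead shrinks a geometric
    boundary (e.g. an alpha-shape) around each class region."""
    mean = sum(points) / len(points)
    k = max(1, int(len(points) * keep_frac))
    return sorted(points, key=lambda p: abs(p - mean))[:k]

def compose_step(cores, unlabeled):
    """One semi-supervised step: label the incoming unlabeled batch with
    the current hypothesis (here, nearest core mean), then carry forward
    only each class's compacted core, so that under gradual drift the
    retained pseudo-labels stay in high-density class regions."""
    means = {c: sum(pts) / len(pts) for c, pts in cores.items()}
    assigned = {c: [] for c in cores}
    for x in unlabeled:
        c = min(means, key=lambda cls: abs(x - means[cls]))
        assigned[c].append(x)
    return {c: compact_core(pts) for c, pts in assigned.items() if pts}

# Initial labeled cores for two classes, then one slightly drifted batch.
cores = {0: [0.0, 1.0], 1: [5.0, 6.0]}
next_cores = compose_step(cores, [0.4, 0.6, 2.0, 5.4, 5.6, 7.0])
```

The boundary points (2.0 and 7.0) are labeled but then dropped by compaction, which is what keeps pseudo-labeling errors from accumulating as the distribution drifts.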
Dynamic Data Mining: Methodology and Algorithms
Supervised data stream mining has become an important and challenging data mining task in modern
organizations. The key challenges are threefold: (1) a possibly infinite number of streaming examples
and time-critical analysis constraints; (2) concept drift; and (3) skewed data distributions.
To address these three challenges, this thesis proposes the novel dynamic data mining (DDM)
methodology by effectively applying supervised ensemble models to data stream mining. DDM can be
loosely defined as categorization-organization-selection of supervised ensemble models. It is inspired
by the idea that although the underlying concepts in a data stream are time-varying, their distinctions
can be identified. Therefore, the models trained on the distinct concepts can be dynamically selected in
order to classify incoming examples of similar concepts.
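The categorize-organize-select idea can be sketched minimally as a pool of models keyed by a concept signature, with the nearest-signature model chosen for each incoming chunk. The signature used here (the chunk's feature mean) and the string stand-ins for models are illustrative assumptions, not the thesis's actual categorization scheme.

```python
class ConceptPool:
    """Sketch of dynamic model selection: models trained on distinct
    concepts are stored with a concept signature (here, simply the mean
    of the training chunk's feature values), and the model whose
    signature is nearest to an incoming chunk's signature is selected."""
    def __init__(self):
        self.pool = []  # (signature, model) pairs

    def add(self, chunk, model):
        self.pool.append((sum(chunk) / len(chunk), model))

    def select(self, chunk):
        sig = sum(chunk) / len(chunk)
        return min(self.pool, key=lambda entry: abs(entry[0] - sig))[1]

pool = ConceptPool()
pool.add([0.1, -0.1, 0.0], "model_A")  # trained while concept A held
pool.add([5.0, 5.2, 4.8], "model_B")   # trained after drift to concept B
```

A chunk resembling concept B's distribution is routed to `model_B`, so a recurring concept is served by the model originally trained on it rather than by a freshly retrained one.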
First, following the general paradigm of DDM, we examine the different concept-drifting stream
mining scenarios and propose corresponding effective and efficient data mining algorithms.
• To address concept drift caused merely by changes of variable distributions, which we term
pseudo concept drift, base models built on categorized streaming data are organized and
selected in line with their corresponding variable distribution characteristics.
• To address concept drift caused by changes of variable and class joint distributions, which we
term true concept drift, an effective data categorization scheme is introduced. A group of
working models is dynamically organized and selected for reacting to the drifting concept.
Secondly, we introduce an integration stream mining framework, enabling the paradigm advocated by
DDM to be widely applicable to other stream mining problems. As a result, we can easily
introduce six effective algorithms for mining data streams with skewed class distributions.
In addition, we introduce a new ensemble model approach for batch learning, following the same
methodology. Both theoretical and empirical studies demonstrate its effectiveness.
Future work will target improving the effectiveness and efficiency of the proposed
algorithms. Meanwhile, we will explore the possibility of using the integration framework to
solve other open stream mining research problems.
Expressive and modular rule-based classifier for data streams
The advances in computing software, hardware, connected devices and wireless
communication infrastructure in recent years have led to the desire to
work with streaming data sources. Yet the number of techniques, approaches
and algorithms which can work with data from a streaming source is still very
limited, compared with batched data. Although data mining techniques have
been a well-studied topic of knowledge discovery for decades, many unique
properties of, and challenges in, learning from a data stream have not been
properly considered, in spite of the growing presence of streaming data
sources and the real need to mine information from them. This thesis aims
to contribute to the field by developing a rule-based algorithm that learns
classification rules directly from data streams, with the learned rules
being expressive enough that a human user can easily interpret the concept
and rationale behind the predictions of the resulting model. There are two
main structures for representing a classification model: the 'tree-based'
structure and the 'rule-based' structure. Even though both forms of
representation are popular and well known in traditional data mining, they
differ in interpretability and in model quality under certain circumstances.
The first part of this thesis analyses background work and relevant topics
in learning classification rules from data streams. This study identifies
the essential requirements for producing high-quality classification rules
from data streams and shows why many systems, algorithms and techniques for
learning classifiers from a static dataset are not applicable in a
streaming environment.
The second part of the thesis investigates a new technique to improve the
efficiency and accuracy of learning heuristics over numeric features from
a streaming data source. Computational cost is one of the important factors
for an effective and practical learning algorithm/system, because of the
need to learn from continuously arriving data examples sequentially and to
discard each example once it has been seen. If the computational cost is
too high, one may not be able to keep pace with the arrival of high-velocity
and possibly unbounded data streams. The proposed technique is first
discussed in the context of using a Gaussian distribution as the heuristic
for building rule terms on numeric features. Secondly, an empirical
evaluation shows the successful integration of the proposed technique into
an existing rule-based algorithm for data streams, eRules.
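A Gaussian heuristic over a numeric feature is only stream-friendly if its parameters can be maintained without storing past examples. A standard way to do this is Welford's one-pass mean/variance update, sketched below; this shows the bookkeeping such a heuristic needs, not the eRules integration itself.

```python
import math

class RunningGaussian:
    """One-pass (Welford) mean/variance, so a per-class Gaussian over a
    numeric feature can be maintained while stream examples arrive and are
    immediately discarded; the resulting density can score candidate rule
    terms on that feature."""
    def __init__(self):
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; 0.0 until at least two examples have arrived.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def pdf(self, x):
        v = self.variance or 1e-12  # guard against zero variance
        return math.exp(-(x - self.mean) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

g = RunningGaussian()
for value in [1, 2, 3, 4, 5]:
    g.update(value)
```

Each class keeps one `RunningGaussian` per numeric feature; a rule term such as "feature near its class mean" can then be ranked by the density without revisiting discarded examples.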
Continuing the topic of rule-based algorithms for classifying data streams,
the use of Hoeffding's Inequality addresses another problem in learning
from a data stream, namely how much data should be seen before learning
starts and how to keep the model updated over time. By incorporating
Hoeffding's Inequality, this study presents the Hoeffding Rules algorithm,
which can induce modular rules directly from a streaming data source, with
dynamic window sizes throughout the learning period to ensure efficiency
and robustness towards concept drift. Concept drift is another unique
challenge in mining data streams, in which the underlying concept of the
data can change either gradually or abruptly over time, and the learner
should adapt to these changes as quickly as possible.
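The Hoeffding bound behind this family of algorithms is standard: after n independent observations of a variable with range R, the observed mean is within epsilon of the true mean with probability at least 1 - delta, where epsilon = sqrt(R^2 ln(1/delta) / (2n)). Inverting it answers "how much data before learning": the minimal n at which epsilon drops below a tolerance. How Hoeffding Rules maps this onto its dynamic windows is specific to the thesis; the bound itself is sketched here.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding epsilon: with probability at least 1 - delta, the mean of
    n observations of a variable with range value_range lies within
    epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def examples_needed(value_range, delta, epsilon):
    """Smallest n for which the bound drops to epsilon; a stream learner
    can wait for this many examples before committing to a decision, and
    re-derive it when the required confidence changes."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta)
                     / (2.0 * epsilon ** 2))
```

For example, with R = 1 and delta = 0.05, the bound shrinks as the square root of n, so halving epsilon requires four times as many examples.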
This research focuses on the development of a rule-based algorithm,
Hoeffding Rules, for data streams, which treats streaming environments as
primary data sources and addresses several unique challenges in learning
rules from data streams, such as concept drift and computational
efficiency. This work underlines the need for, and importance of,
interpretable machine learning models, applying new studies to improve the
ability to mine useful insights from potentially high-velocity, high-volume
and unbounded data streams. More broadly, this research complements the
study of learning classification rules from data streams, addressing some
of the unique challenges of data streams compared with conventional batch
data, with the knowledge necessary to systematically and effectively learn
expressive and modular classification rules from data streams.