Expressive and modular rule-based classifier for data streams
Advances in computing software, hardware, connected devices and wireless
communication infrastructure in recent years have created a demand for
working with streaming data sources. Yet the number of techniques, approaches
and algorithms that can work with data from a streaming source is still very
limited compared with those for batch data. Although data mining techniques have
been a well-studied topic of knowledge discovery for decades, many unique
properties and challenges of learning from a data stream have not yet been
properly addressed, even though streaming data sources now exist and there is
a real need to mine information from them. This thesis aims to contribute to
knowledge by developing a rule-based algorithm that learns classification
rules specifically from data streams, where the learned rules are expressive so
that a human user can easily interpret the concept and rationale behind the
predictions of the created model. There are two main structures for representing
a classification model: the ‘tree-based’ structure and the ‘rule-based’ structure.
Even though both forms of representation are popular and well known
in traditional data mining, they differ in interpretability
and model quality in certain circumstances.
The first part of this thesis analyses background work and relevant topics in
learning classification rules from data streams. This study provides information
about the essential requirements for producing high-quality classification
rules from data streams and explains why many systems, algorithms and techniques
for learning classifiers from a static dataset are not applicable in a
streaming environment.
The second part of the thesis investigates a new technique to improve
the efficiency and accuracy of learning heuristics from numeric features of
a streaming data source. Computational cost is one of the important factors
to be considered for an effective and practical learning algorithm or system,
because it must learn from continuously arriving data examples sequentially
and discard the examples it has already seen. If the computational cost is too
high, then the learner may not be able to keep pace with the arrival of
high-velocity and possibly unbounded data streams. The proposed technique is
first discussed in the context of using the Gaussian distribution as a heuristic
for building rule terms on numeric features. Empirical evaluation then
shows the successful integration of the proposed technique into an existing
rule-based algorithm for data streams, eRules.
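The abstract does not spell out how the Gaussian heuristic is computed, so the following is only a minimal sketch of the general idea, under stated assumptions: a per-class running mean and variance of one numeric feature are updated one example at a time (Welford's method), and a candidate rule term is taken as an interval around the class mean. The names `RunningGaussian` and `interval_term`, the `width` parameter and the toy stream are all hypothetical and are not taken from the thesis or from eRules.

```python
import math
from collections import defaultdict


class RunningGaussian:
    """Tracks the mean and sample variance of a stream of numbers
    incrementally (Welford's method), without storing the examples."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def pdf(self, x):
        """Gaussian density of x under the current estimates."""
        var = self.variance
        if var == 0.0:
            return 1.0 if x == self.mean else 0.0
        return math.exp(-((x - self.mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def interval_term(stats, width=1.0):
    """Hypothetical rule term: mean - width*std <= x <= mean + width*std."""
    std = math.sqrt(stats.variance)
    return (stats.mean - width * std, stats.mean + width * std)


# Toy stream of (feature value, class label) pairs; assumed data for illustration.
stream = [(1.0, "a"), (2.0, "a"), (3.0, "a"), (10.0, "b"), (12.0, "b"), (14.0, "b")]

per_class = defaultdict(RunningGaussian)
for value, label in stream:
    per_class[label].update(value)  # each example is seen once, then discarded

low, high = interval_term(per_class["a"])
print(f"IF {low:.1f} <= x <= {high:.1f} THEN class = a")
```

Each update costs constant time and memory per class, which is the property that matters when examples cannot be stored and revisited.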
Continuing on the topic of rule-based algorithms for classifying data
streams, the use of Hoeffding’s Inequality addresses another problem in learning
from a data stream, namely how much data should be seen from the
stream before learning starts and how to keep the model updated over time.
By incorporating the theory of Hoeffding’s Inequality, this study presents
the Hoeffding Rules algorithm, which can induce modular rules directly from
a streaming data source, with window sizes that change dynamically throughout
the learning period to ensure efficiency and robustness to concept drift.
Concept drift is another unique challenge in mining data streams, in which the
underlying concept of the data can change either gradually or abruptly over
time and the learner should adapt to these changes as quickly as possible.
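As a rough illustration of how Hoeffding’s Inequality relates the amount of data seen to the reliability of an estimate, the sketch below uses the standard form of the bound: after n independent observations of a variable with range R, the observed mean is within epsilon = sqrt(R^2 * ln(1/delta) / (2n)) of the true mean with probability at least 1 - delta. How the Hoeffding Rules algorithm applies the bound to its window sizes is not stated in the abstract; the function names and the example values here are assumptions for illustration only.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability at least 1 - delta, the mean of
    n observations lies within epsilon of the true mean."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def examples_needed(value_range, delta, epsilon):
    """Smallest n for which the Hoeffding bound is at most epsilon."""
    return math.ceil((value_range ** 2) * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


# Assumed example: a heuristic score in [0, 1], 95% confidence, tolerance 0.1.
n_required = examples_needed(value_range=1.0, delta=0.05, epsilon=0.1)
print(n_required)
print(hoeffding_bound(value_range=1.0, delta=0.05, n=n_required))
```

The bound shrinks with the square root of n, so halving the tolerance requires roughly four times as many examples. That trade-off is one reason a window size might reasonably be adjusted as the stream evolves.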
This research focuses on the development of a rule-based algorithm for data
streams, Hoeffding Rules, which treats streaming environments as
primary data sources and addresses several unique challenges in learning
rules from data streams, such as concept drift and computational efficiency.
This knowledge supports the need for, and the importance of, an interpretable
machine learning model, and applies new studies to improve the ability to mine
useful insights from potentially high-velocity, high-volume and unbounded
data streams. More broadly, this research complements the study of learning
classification rules from data streams by addressing some of the unique challenges
of data streams compared with conventional batch data, and by providing the
knowledge necessary to systematically and effectively learn expressive and
modular classification rules from data streams.
28th International Symposium on Temporal Representation and Reasoning (TIME 2021)
The 28th International Symposium on Temporal Representation and Reasoning (TIME 2021) was planned to take place in Klagenfurt, Austria, but had to move online due to the uncertainties and restrictions caused by the pandemic. Since its first edition in 1994, the TIME Symposium has been quite unique among scientific conferences, as its main goal is to bring together researchers from distinct research areas involving the management and representation of temporal data as well as reasoning about temporal aspects of information. Moreover, the TIME Symposium aims to bridge theoretical and applied research, as well as to serve as an interdisciplinary forum for exchange among researchers from the areas of artificial intelligence, database management, logic and verification, and beyond.