59,709 research outputs found
Robust and cost-effective approach for discovering action rules
The main goal of Knowledge Discovery in
Databases is to find interesting and usable patterns, meaningful
in their domain. Actionable Knowledge Discovery came to
existence as a direct respond to the need of finding more usable
patterns called actionable patterns. Traditional data mining
and algorithms are often confined to deliver frequent patterns
and come short for suggesting how to make these patterns
actionable. In this scenario the users are expected to act.
However, the users are not advised about what to do with
delivered patterns in order to make them usable. In this paper,
we present an automated approach to focus on not only creating
rules but also making the discovered rules actionable.
Up to now few works have been reported in this field which
lacking incomprehensibility to the user, overlooking the cost
and not providing rule generality. Here we attempt to present a
method to resolving these issues. In this paper CEARDM
method is proposed to discover cost-effective action rules from
data. These rules offer some cost-effective changes to
transferring low profitable instances to higher profitable ones.
We also propose an idea for improving in CEARDM method
Rough sets theory and uncertainty into information system
This article is focused on rough sets approach to expression of uncertainty into information system. We assume that the data are presented in the decision table and that some attribute values are lost. At first the theoretical background is described and after that, computations on real-life data are presented. In computation we wok with uncertainty coming from missing attribute values
Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline
From medical charts to national census, healthcare has traditionally operated
under a paper-based paradigm. However, the past decade has marked a long and
arduous transformation bringing healthcare into the digital age. Ranging from
electronic health records, to digitized imaging and laboratory reports, to
public health datasets, today, healthcare now generates an incredible amount of
digital information. Such a wealth of data presents an exciting opportunity for
integrated machine learning solutions to address problems across multiple
facets of healthcare practice and administration. Unfortunately, the ability to
derive accurate and informative insights requires more than the ability to
execute machine learning models. Rather, a deeper understanding of the data on
which the models are run is imperative for their success. While a significant
effort has been undertaken to develop models able to process the volume of data
obtained during the analysis of millions of digitalized patient records, it is
important to remember that volume represents only one aspect of the data. In
fact, drawing on data from an increasingly diverse set of sources, healthcare
data presents an incredibly complex set of attributes that must be accounted
for throughout the machine learning pipeline. This chapter focuses on
highlighting such challenges, and is broken down into three distinct
components, each representing a phase of the pipeline. We begin with attributes
of the data accounted for during preprocessing, then move to considerations
during model building, and end with challenges to the interpretation of model
output. For each component, we present a discussion around data as it relates
to the healthcare domain and offer insight into the challenges each may impose
on the efficiency of machine learning techniques.Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20
Pages, 1 Figur
Constraint-based Sequential Pattern Mining with Decision Diagrams
Constrained sequential pattern mining aims at identifying frequent patterns
on a sequential database of items while observing constraints defined over the
item attributes. We introduce novel techniques for constraint-based sequential
pattern mining that rely on a multi-valued decision diagram representation of
the database. Specifically, our representation can accommodate multiple item
attributes and various constraint types, including a number of non-monotone
constraints. To evaluate the applicability of our approach, we develop an
MDD-based prefix-projection algorithm and compare its performance against a
typical generate-and-check variant, as well as a state-of-the-art
constraint-based sequential pattern mining algorithm. Results show that our
approach is competitive with or superior to these other methods in terms of
scalability and efficiency.Comment: AAAI201
HANDLING MISSING ATTRIBUTE VALUES IN DECISION TABLES USING VALUED TOLERANCE APPROACH
Rule induction is one of the key areas in data mining as it is applied to a large number of real life data. However, in such real life data, the information is incompletely specified most of the time. To induce rules from these incomplete data, more powerful algorithms are necessary. This research work mainly focuses on a probabilistic approach based on the valued tolerance relation. This thesis is divided into two parts. The first part describes the implementation of the valued tolerance relation. The induced rules are then evaluated based on the error rate due to incorrectly classified and unclassified examples. The second part of this research work shows a comparison of the rules induced by the MLEM2 algorithm that has been implemented before, with the rules induced by the valued tolerance based approach which was implemented as part of this research. Hence, through this thesis, the error rate for the MLEM2 algorithm and the valued tolerance based approach are compared and the results are documented
ACon: A learning-based approach to deal with uncertainty in contextual requirements at runtime
Context: Runtime uncertainty such as unpredictable operational environment and failure of sensors that gather environmental data is a well-known challenge for adaptive systems.
Objective: To execute requirements that depend on context correctly, the system needs up-to-date knowledge about the context relevant to such requirements. Techniques to cope with uncertainty in contextual requirements are currently underrepresented. In this paper we present ACon (Adaptation of Contextual requirements), a data-mining approach to deal with runtime uncertainty affecting contextual requirements.
Method: ACon uses feedback loops to maintain up-to-date knowledge about contextual requirements based on current context information in which contextual requirements are valid at runtime. Upon detecting that contextual requirements are affected by runtime uncertainty, ACon analyses and mines contextual data, to (re-)operationalize context and therefore update the information about contextual requirements.
Results: We evaluate ACon in an empirical study of an activity scheduling system used by a crew of 4 rowers in a wild and unpredictable environment using a complex monitoring infrastructure. Our study focused on evaluating the data mining part of ACon and analysed the sensor data collected onboard from 46 sensors and 90,748 measurements per sensor.
Conclusion: ACon is an important step in dealing with uncertainty affecting contextual requirements at runtime while considering end-user interaction. ACon supports systems in analysing the environment to adapt contextual requirements and complements existing requirements monitoring approaches by keeping the requirements monitoring specification up-to-date. Consequently, it avoids manual analysis that is usually costly in today’s complex system environments.Peer ReviewedPostprint (author's final draft
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed
- …