Data mining using rule extraction from Kohonen self-organising maps
The Kohonen self-organising feature map (SOM) has several important properties that can be used within the data mining/knowledge discovery and exploratory data analysis process. A key characteristic of the SOM is its topology-preserving ability to map a multi-dimensional input onto a two-dimensional form. This feature is used for classification and clustering of data. However, a great deal of effort is still required to interpret the cluster boundaries. In this paper we present a technique which can be used to extract propositional IF...THEN rules from the SOM network’s internal parameters. Such extracted rules can provide a human-understandable description of the discovered clusters.
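A minimal pure-Python sketch of the general setting follows. It assumes a small rectangular SOM trained with a Gaussian neighbourhood, and a deliberately simple, hypothetical rule heuristic: one IF...THEN rule per occupied node, with each attribute bounded by the min/max of the records mapped to that node. The function names and the interval heuristic are illustrative assumptions, not the paper's technique, which derives rules from the network's internal parameters.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))

def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighbourhood radius
        for x in rng.sample(data, len(data)):  # visit records in random order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])    # pull node towards x
    return weights

def extract_interval_rules(weights, data):
    """Hypothetical heuristic: {node: (per-attribute (min, max) bounds, record count)}."""
    groups = {}
    for x in data:
        groups.setdefault(best_matching_unit(weights, x), []).append(x)
    return {node: ([(min(col), max(col)) for col in zip(*members)], len(members))
            for node, members in groups.items()}

def format_rule(node, bounds, names):
    conds = " AND ".join(f"{lo:.2f} <= {n} <= {hi:.2f}" for n, (lo, hi) in zip(names, bounds))
    return f"IF {conds} THEN cluster {node}"

if __name__ == "__main__":
    data = [(0.1, 0.1), (0.15, 0.12), (0.12, 0.08), (0.9, 0.9), (0.88, 0.92), (0.91, 0.87)]
    som = train_som(data, 2, 2)
    for node, (bounds, n) in sorted(extract_interval_rules(som, data).items()):
        print(format_rule(node, bounds, ["x1", "x2"]), f"({n} records)")
```

Using data-driven intervals keeps the sketch short and makes every rule trivially consistent with the records it describes; a method that reads boundaries directly from the weight vectors, as the paper proposes, would not need the training data at extraction time.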
Mining Implicit Patterns of Customer Purchasing Behavior Based On The Consideration Of RFM Model
Association rule mining has been developed over many years and applied successfully to market basket analysis, cross-selling and other business applications. One of the most widely used association rule algorithms is Apriori. However, Apriori has long been known for generating an enormous number of rules, many of which merely restate already-known facts. In this study, we integrate the RFM attributes with Apriori, the classical association rule mining algorithm. Based on the RFM model, two indicators, the RF score and the sale ratio, are used as measures of interestingness. We propose two algorithms, DWRF and DWRFE, to mine implicit patterns. In our experimental evaluation, the performance of Apriori, DWRF and DWRFE is compared. The results show that our algorithms offer an effective measure of interesting patterns. Moreover, the DWRF algorithm, which uses the RF score as its measure of interestingness, appears able to reflect customers’ fast-changing purchase patterns promptly.
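For reference, the classical Apriori baseline that DWRF and DWRFE are compared against can be sketched as below. This is a minimal, unoptimized sketch (level-wise candidate generation with subset pruning, then rules filtered by minimum confidence); it does not implement the RF score, the sale ratio, or the DWRF/DWRFE algorithms, whose definitions are not given in the abstract.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def support(itemset):
        return sum(itemset <= t for t in tsets) / n

    frequent = {}
    current = {s for s in {frozenset([i]) for t in tsets for i in t} if support(s) >= min_support}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def generate_rules(frequent, min_conf):
    """Return (antecedent, consequent, support, confidence) tuples with confidence >= min_conf."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[ante]  # ante is frequent by downward closure
                if conf >= min_conf:
                    out.append((set(ante), set(itemset - ante), supp, conf))
    return out

if __name__ == "__main__":
    baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
    for rule in generate_rules(apriori(baskets, 0.5), 0.6):
        print(rule)
```

Even on four baskets this emits four rules, which illustrates the abstract's point: support and confidence alone say nothing about whether a rule is new or commercially interesting, which is the gap the RFM-based indicators are meant to fill.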
Building and Querying Large Modelbases
Model building is one of the most important objectives of data mining and data analysis. As many data mining applications, such as personalization, bioinformatics and some large enterprise-wide business applications, become increasingly complex and require a very large number of models, it is becoming progressively more difficult for data analysts to build and manage a large number of models in these applications on their own. Therefore, developing software tools that help data analysts with these tasks is becoming a pressing issue. This paper presents a model management system supporting various types of data mining models. It describes how to build and populate large heterogeneous modelbases. It also presents a query language for querying these modelbases and examines performance results for some of the queries.
Information Systems Working Papers Series
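To make the idea of a queryable modelbase concrete, here is a hypothetical in-memory sketch. Each model is stored with descriptive metadata, a secondary index on model kind narrows the candidates, and queries are plain Python predicates. The `ModelEntry` fields and the `ModelBase.query` interface are illustrative assumptions; they are not the paper's system or its query language.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass(frozen=True)
class ModelEntry:
    model_id: str
    kind: str          # e.g. "decision_tree", "association_rules" (hypothetical labels)
    segment: str       # population the model was built for, e.g. a customer segment
    target: str        # predicted attribute or pattern type
    accuracy: float
    attributes: frozenset = field(default_factory=frozenset)  # input attributes used

class ModelBase:
    def __init__(self):
        self._models = {}   # model_id -> ModelEntry
        self._by_kind = {}  # secondary index: kind -> set of model_ids

    def add(self, entry: ModelEntry) -> None:
        self._models[entry.model_id] = entry
        self._by_kind.setdefault(entry.kind, set()).add(entry.model_id)

    def query(self, kind: str = None,
              where: Callable[[ModelEntry], bool] = lambda m: True) -> list:
        """Use the kind index when given, then filter with the predicate; sorted by id."""
        ids = self._by_kind.get(kind, set()) if kind else self._models.keys()
        return sorted((self._models[i] for i in ids if where(self._models[i])),
                      key=lambda m: m.model_id)

if __name__ == "__main__":
    mb = ModelBase()
    mb.add(ModelEntry("m1", "decision_tree", "students", "churn", 0.81, frozenset({"age", "income"})))
    mb.add(ModelEntry("m2", "decision_tree", "retail", "churn", 0.74, frozenset({"age"})))
    for m in mb.query(kind="decision_tree", where=lambda m: m.accuracy > 0.8):
        print(m.model_id, m.segment, m.accuracy)
```

The index on `kind` stands in for the kind of access-path optimization that matters once a modelbase holds very many models; a real query language would expose such filters declaratively rather than as Python lambdas.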
Unexpected rules using a conceptual distance based on fuzzy ontology
One of the major drawbacks of data mining methods is that they generate a very large number of rules that are often obvious, useless or, occasionally, outside the user’s interest. To address these drawbacks, we propose in this paper an approach that detects a set of unexpected rules in a discovered association rule set. The proposed approach examines the discovered association rules using the user’s domain knowledge, which is represented by a fuzzy domain ontology. We then rank the discovered rules according to their conceptual distances.
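One way to picture ranking by conceptual distance is sketched below. It assumes a tree-shaped ontology in which each concept points to its parent with a fuzzy membership degree μ in (0, 1], an edge cost of 1/μ (weaker membership means a longer edge), and a rule score equal to the mean distance between antecedent and consequent items. All of these choices, and the function names, are hypothetical; the paper's fuzzy ontology and distance measure may be defined differently.

```python
def ancestors_with_cost(concept, parent):
    """Map the concept and each of its ancestors to the accumulated path cost from it."""
    path, cost = {concept: 0.0}, 0.0
    while concept in parent:          # assumes the ontology is acyclic
        concept, mu = parent[concept]
        cost += 1.0 / mu              # weak membership => longer edge (assumption)
        path[concept] = cost
    return path

def conceptual_distance(a, b, parent):
    """Cheapest path between a and b through a common ancestor; inf if unrelated."""
    pa, pb = ancestors_with_cost(a, parent), ancestors_with_cost(b, parent)
    common = pa.keys() & pb.keys()
    return min((pa[c] + pb[c] for c in common), default=float("inf"))

def rank_unexpected(rules, parent):
    """Sort (antecedent, consequent) rules so the most 'distant', least expected, come first."""
    def score(rule):
        ante, cons = rule
        ds = [conceptual_distance(a, c, parent) for a in ante for c in cons]
        return sum(ds) / len(ds)
    return sorted(rules, key=score, reverse=True)

if __name__ == "__main__":
    ontology = {"bread": ("bakery", 1.0), "croissant": ("bakery", 0.5),
                "bakery": ("food", 1.0), "milk": ("dairy", 1.0), "dairy": ("food", 1.0),
                "beer": ("drink", 1.0)}
    rules = [({"bread"}, {"croissant"}), ({"bread"}, {"milk"}), ({"bread"}, {"beer"})]
    for ante, cons in rank_unexpected(rules, ontology):
        print(ante, "=>", cons)
```

A rule linking concepts in unrelated branches of the ontology ranks above one linking close siblings, which matches the intuition that closely related items co-occurring is unsurprising to a domain expert.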
CAS-MINE: Providing personalized services in context-aware applications by means of generalized rules
Context-aware systems acquire and exploit information on the user context to tailor services to a particular user, place, time, and/or event. Hence, they allow service providers to adapt their services to actual user needs by offering personalized services that depend on the current user context. Service providers are usually interested in profiling users both to increase client satisfaction and to broaden the set of offered services. Novel and efficient techniques are needed to tailor service supply to the user (or the user category) and to the situation in which he/she is involved. This paper presents the CAS-Mine framework to efficiently discover relevant relationships between user context data and currently requested services for both user and service profiling. CAS-Mine efficiently extracts generalized association rules, which provide a high-level abstraction of both user habits and service characteristics depending on the context. A lazy (analyst-provided) taxonomy evaluation performed on different attributes (e.g., a geographic hierarchy on spatial coordinates, a classification of provided services) drives the rule generalization process. Extracted rules are classified into groups according to their semantic meaning and ranked by means of quality indices, thus allowing a domain expert to focus on the most relevant patterns. Experiments performed on three context-aware datasets, obtained by logging user requests and context information for three real applications, show the effectiveness and efficiency of the CAS-Mine framework in mining different valuable types of correlations between user habits, context information, and provided services.
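Generalized association rules are commonly mined by extending each transaction with the taxonomy ancestors of its items and then counting support and confidence as usual. The sketch below shows that standard transaction-extension step on logged context data. The taxonomy, item names and function names are hypothetical, and this is not CAS-Mine's lazy taxonomy evaluation itself.

```python
def extend_with_ancestors(transaction, taxonomy):
    """Add every taxonomy ancestor of each item (child -> parent map, assumed acyclic)."""
    extended = set(transaction)
    for item in transaction:
        while item in taxonomy:
            item = taxonomy[item]
            extended.add(item)
    return frozenset(extended)

def generalized_support(itemset, transactions, taxonomy):
    """Fraction of extended transactions containing every item of itemset."""
    ext = [extend_with_ancestors(t, taxonomy) for t in transactions]
    target = frozenset(itemset)
    return sum(target <= t for t in ext) / len(ext)

def generalized_confidence(antecedent, consequent, transactions, taxonomy):
    both = generalized_support(set(antecedent) | set(consequent), transactions, taxonomy)
    ante = generalized_support(antecedent, transactions, taxonomy)
    return both / ante if ante else 0.0

if __name__ == "__main__":
    # Hypothetical geographic hierarchy on locations and a service classification.
    taxonomy = {"cell_17": "downtown", "cell_18": "downtown", "downtown": "city_A",
                "bus_timetable": "transport_info", "train_timetable": "transport_info"}
    logs = [{"cell_17", "bus_timetable", "morning"}, {"cell_18", "train_timetable", "morning"},
            {"cell_17", "weather", "evening"}, {"cell_99", "train_timetable", "morning"}]
    print(generalized_support({"cell_17", "bus_timetable"}, logs, taxonomy))   # specific level
    print(generalized_support({"downtown", "transport_info"}, logs, taxonomy)) # generalized level
```

The specific itemset occurs in only one log entry, while its generalization ("downtown" users requesting "transport_info") occurs in two: this is the high-level abstraction of user habits that generalization makes visible.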
A Survey on Actionable Knowledge
Actionable Knowledge Discovery (AKD) is a crucial aspect of data mining that is gaining popularity and being applied in a wide range of domains, because AKD can extract valuable insights and information, also known as knowledge, from large datasets. The goal of this paper is to examine research studies that focus on various domains and have different objectives, and to review and discuss the methods used in these studies in detail. AKD is the process of identifying and extracting actionable insights from data, which can be used to make informed decisions and improve business outcomes. It is a powerful tool for uncovering patterns and trends in data that can be used for applications such as customer relationship management, marketing, and fraud detection. The research studies reviewed in this paper explore different techniques and approaches for AKD in domains such as healthcare, finance, and telecommunications. The paper provides a thorough analysis of the current state of AKD, reviews the main methods used by various research studies, evaluates the advantages and disadvantages of each method, and discusses novel solutions presented in the field. Overall, this paper aims to provide a comprehensive overview of the methods and techniques used in AKD and their impact on different domains.
Developing A New Decision Support System for University Student Recruitment
This paper investigates the practical issues surrounding the development and implementation of Decision Support Systems (DSS). The paper describes traditional development approaches, analyzes their drawbacks, and introduces a new DSS development methodology.
The proposed DSS methodology is based upon four modules: needs analysis, data warehouse (DW), knowledge discovery in databases (KDD), and a DSS module. The proposed methodology is applied to, and evaluated using, the admission and registration functions in Egyptian universities. The paper investigates the organizational requirements that underpin these functions in Egyptian universities. These requirements were identified through an in-depth survey of the recruitment process in Egyptian universities. This survey employed a multi-part admission and registration DSS questionnaire (ARDSSQ) to identify the required data sources together with the likely users and their information needs. The questionnaire was sent to senior managers in Egyptian universities (both private and government) with responsibility for student recruitment, in particular admission and registration.
Further, access to a large database made it possible to evaluate the practical suitability of using a DW structure and knowledge management tools within the decision-making framework. A total of 2,000 records were used to build and test the data mining techniques within the KDD process. The records were drawn from the students’ database (DB) of the Arab Academy for Science and Technology and Maritime Transport (AASTMT).
Moreover, the paper analyzes the key characteristics of DWs and explores the advantages and disadvantages of such data structures. This evaluation was used to build a DW that handles the admission- and registration-related archival data of Egyptian universities. The potential benefits of the DW for decision makers in the student recruitment process will also be explored.
The proposed admission and registration DSS (ARDSS) will be designed, developed and tested using the Cool:Gen (5.0) CASE tool by Computer Associates (CA), connected to MS-SQL Server (6.5), in a Windows NT (4.0) environment. Crystal Reports (4.6) by Seagate will be used as the report generation tool, and CLUSTAN Graphics (5.0) by CLUSTAN software as the clustering package.
The ARDSS software could be adapted for the same purpose in other countries; it is also scalable to new decision situations and can be integrated with other systems.
Predicate based association rules mining with new interestingness measure
Association Rule Mining (ARM) is one of the fundamental components of data mining; it discovers frequent itemsets and interesting relationships for predicting the associative and correlative behaviours of new data. However, traditional ARM techniques are based on the support-confidence framework, which discovers interesting association rules (ARs) using predefined minimum support (minsupp) and minimum confidence (minconf) thresholds. In addition, traditional AR techniques only consider frequent items while ignoring rare ones. Thus, a new parameter-less, predicate-based ARM technique that handles frequent and rare items at the same time was proposed to address these limitations. Furthermore, a new interestingness measure, called the g measure, was developed to select only highly interesting rules. In the proposed technique, interesting combinations were first selected by considering both the frequent and the rare items in a dataset. They were then mapped to pseudo-implications using predefined logical conditions. Next, inference rules were used to validate the pseudo-implications and discover rules within the set of mapped pseudo-implications. The resulting set of interesting rules is referred to as the predicate-based association rules. The zoo, breast cancer, and car evaluation datasets were used for the experiments. The experimental results were evaluated by comparison with various classification techniques, a traditional ARM technique and the coherent rule mining technique. The predicate-based rule mining approach achieved an accuracy of 93.33%. In addition, the results of the g measure were compared with the h value, a state-of-the-art interestingness measure developed for coherent rule mining.
Predicate rules were discovered with an average confidence value of 0.754 for the zoo dataset and 0.949 for the breast cancer dataset, while the average confidence of the predicate rules found in the car evaluation dataset was 0.582. The results of this study show that a set of interesting and highly reliable rules was discovered, including frequent, rare and negative association rules with higher confidence values. This research resulted in a rule mining methodology that does not rely on minsupp and minconf thresholds. Also, the proposed technique discovers a complete set of association rules. Finally, the property of the interestingness measure used to select combinations from the datasets makes it possible to reduce the exponential search for rules.
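The abstract does not define its logical conditions or the g measure. As a rough illustration of how a rule can be accepted without minsupp or minconf, the sketch below counts the four cells of the contingency table for itemsets A and C and applies the pseudo-implication-of-equivalence conditions as we understand them from coherent rule mining (the technique the abstract compares against). Treat these conditions, and all names here, as assumptions rather than the authors' predicate-based method.

```python
def contingency(transactions, a, c):
    """Return (q1, q2, q3, q4) = counts of (A and C), (A, not C), (not A, C), (not A, not C)."""
    q = [0, 0, 0, 0]
    a, c = set(a), set(c)
    for t in transactions:
        t = set(t)
        has_a, has_c = a <= t, c <= t
        q[(0 if has_a else 2) + (0 if has_c else 1)] += 1
    return tuple(q)

def is_pseudo_implication(q):
    """Coherent-rule-style test (assumption): only relative comparisons, no user thresholds."""
    q1, q2, q3, q4 = q
    return q1 > q2 and q1 > q3 and q4 > q2 and q4 > q3

def confidence(q):
    """Ordinary confidence of A => C from the same table."""
    q1, q2 = q[0], q[1]
    return q1 / (q1 + q2) if q1 + q2 else 0.0

if __name__ == "__main__":
    animals = [{"feathers", "eggs"}, {"feathers", "eggs"}, {"hair", "milk"},
               {"hair", "milk"}, {"hair", "milk"}, {"scales", "eggs"}]
    for a, c in [({"feathers"}, {"eggs"}), ({"hair"}, {"eggs"})]:
        q = contingency(animals, a, c)
        print(a, "=>", c, q, is_pseudo_implication(q), confidence(q))
```

Here "feathers => eggs" is accepted although "feathers" appears in only a third of the records, because the decision compares table cells against each other rather than against a fixed support threshold; this is what allows rare items to survive.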