18 research outputs found

    Data mining using rule extraction from Kohonen self-organising maps

    Get PDF
    The Kohonen self-organising feature map (SOM) has several important properties that can be used within the data mining/knowledge discovery and exploratory data analysis process. A key characteristic of the SOM is its topology preserving ability to map a multi-dimensional input into a two-dimensional form. This feature is used for classification and clustering of data. However, a great deal of effort is still required to interpret the cluster boundaries. In this paper we present a technique which can be used to extract propositional IF..THEN type rules from the SOM network’s internal parameters. Such extracted rules can provide a human understandable description of the discovered clusters

    Mining Implicit Patterns of Customer Purchasing Behavior Based On The Consideration Of RFM Model

    Get PDF
    Association rules have been developed for years and applied successfully for market basket analysis and cross selling among other business applications. One of the most used approaches in association rules is the Apriori algorithm. However the Apriori algorithm, has long known for its weaknesses that generate enormous amount of rules and alreadyknown facts. In this study, we integrate the RFM attributes with the classical association rule mining, Apriori. Based on RFM model, two indicators, RF score and Sale ratio, are used as measure of interestingness. We propose two algorithms, DWRF and DWRFE, to mine for implicit pattern. In our experimental evaluation, the performance of Apriori, DWRF and DWRFE are compared. The result of our algorithms offers an effective measurement of interesting patterns. Moreover, the DWRF algorithm that uses the RF score as a measure of interestingness seems to be able to promptly reflect the fast-changing customer’s purchase patterns

    Building and Querying Large Modelbases

    Get PDF
    Model building is one of the most important objectives of data mining and data analysis. As many data mining applications, such as personalization, bioinformatics and some large enterprise-wide business applications, become increasingly complex and require a very large number of models, it is becoming progressively more difficult for data analysts to built and to manage a large number of models in these applications on their own. Therefore, development of software tools helping data analysts in these tasks is becoming a pressing issue. This paper presents a model management system supporting various types of data mining models. It describes how to build and populate large heterogeneous modelbases. It also presents a query language for querying these modelbases and examines performance results for some of the queries.Information Systems Working Papers Serie

    Unexpected rules using a conceptual distance based on fuzzy ontology

    Get PDF
    AbstractOne of the major drawbacks of data mining methods is that they generate a notably large number of rules that are often obvious or useless or, occasionally, out of the user’s interest. To address such drawbacks, we propose in this paper an approach that detects a set of unexpected rules in a discovered association rule set. Generally speaking, the proposed approach investigates the discovered association rules using the user’s domain knowledge, which is represented by a fuzzy domain ontology. Next, we rank the discovered rules according to the conceptual distances of the rules

    CAS-MINE: Providing personalized services in context-aware applications by means of generalized rules

    Get PDF
    Context-aware systems acquire and exploit information on the user context to tailor services to a particular user, place, time, and/or event. Hence, they allowservice providers to adapt their services to actual user needs, by offering personalized services depending on the current user context. Service providers are usually interested in profiling users both to increase client satisfaction and to broaden the set of offered services. Novel and efficient techniques are needed to tailor service supply to the user (or the user category) and to the situation inwhich he/she is involved. This paper presents the CAS-Mine framework to efficiently discover relevant relationships between user context data and currently asked services for both user and service profiling. CAS-Mine efficiently extracts generalized association rules, which provide a high-level abstraction of both user habits and service characteristics depending on the context. A lazy (analyst-provided) taxonomy evaluation performed on different attributes (e.g., a geographic hierarchy on spatial coordinates, a classification of provided services) drives the rule generalization process. Extracted rules are classified into groups according to their semantic meaning and ranked by means of quality indices, thus allowing a domain expert to focus on the most relevant patterns. Experiments performed on three context-aware datasets, obtained by logging user requests and context information for three real applications, show the effectiveness and the efficiency of the CAS-Mine framework in mining different valuable types of correlations between user habits, context information, and provided services

    A Survey on Actionable Knowledge

    Full text link
    Actionable Knowledge Discovery (AKD) is a crucial aspect of data mining that is gaining popularity and being applied in a wide range of domains. This is because AKD can extract valuable insights and information, also known as knowledge, from large datasets. The goal of this paper is to examine different research studies that focus on various domains and have different objectives. The paper will review and discuss the methods used in these studies in detail. AKD is a process of identifying and extracting actionable insights from data, which can be used to make informed decisions and improve business outcomes. It is a powerful tool for uncovering patterns and trends in data that can be used for various applications such as customer relationship management, marketing, and fraud detection. The research studies reviewed in this paper will explore different techniques and approaches for AKD in different domains, such as healthcare, finance, and telecommunications. The paper will provide a thorough analysis of the current state of AKD in the field and will review the main methods used by various research studies. Additionally, the paper will evaluate the advantages and disadvantages of each method and will discuss any novel or new solutions presented in the field. Overall, this paper aims to provide a comprehensive overview of the methods and techniques used in AKD and the impact they have on different domains

    Developing A New Decision Support System for University Student Recruitment

    Get PDF
    This paper investigates the practical issues surrounding the development and implementation of Decision Support Systems (DSS). The paper describes the traditional development approaches analyzing their drawbacks and introduces a new DSS development methodology. The proposed DSS methodology is based upon four modules; needs’ analysis, data warehouse (DW), knowledge discovery in database (KDD), and a DSS module. The proposed DSS methodology is applied to and evaluated using the admission and registration functions in Egyptian Universities. The paper investigates the organizational requirements that are required to underpin these functions in Egyptian Universities. These requirements have been identified following an in-depth survey of the recruitment process in the Egyptian Universities. This survey employed a multi-part admission and registration DSS questionnaire (ARDSSQ) to identify the required data sources together with the likely users and their information needs. The questionnaire was sent to senior managers within the Egyptian Universities (both private and government) with responsibility for student recruitment, in particular admission and registration. Further, access to a large database has allowed the evaluation of the practical suitability of using a DW structure and knowledge management tools within the decision making framework. 2000 records have been used to build and test the data mining techniques within the KDD process. The records were drawn from the Arab Academy for Science and Technology and Maritime Transport (AASTMT) students’ database (DB). Moreover, the paper has analyzed the key characteristics of DW and explored the advantages and disadvantages of such data structures. This evaluation has been used to build a DW for the Egyptian Universities that handle their admission and registration related archival data. The decision makers’ potential benefits of the DW within the student recruitment process will be explored. The design of the proposed admission and registration DSS (ARDSS) will be developed and tested using Cool: Gen (5.0) CASE tools by Computer Associates (CA), connected to a MS-SQL Server (6.5), in a Windows NT (4.0) environment. Crystal Reports (4.6) by Seagate will be used as a report generation tool. CLUSTAN Graphics (5.0) by CLUSTAN software will also be used as a clustering package. The ARDSS software could be adjusted for usage in different countries for the same purpose, it is also scalable to handle new decision situations and can be integrated with other systems

    Building and Querying Large Modelbases

    Get PDF
    Model building is one of the most important objectives of data mining and data analysis. As many data mining applications, such as personalization, bioinformatics and some large enterprise-wide business applications, become increasingly complex and require a very large number of models, it is becoming progressively more difficult for data analysts to built and to manage a large number of models in these applications on their own. Therefore, development of software tools helping data analysts in these tasks is becoming a pressing issue. This paper presents a model management system supporting various types of data mining models. It describes how to build and populate large heterogeneous modelbases. It also presents a query language for querying these modelbases and examines performance results for some of the queries.Information Systems Working Papers Serie

    Predicate based association rules mining with new interestingness measure

    Get PDF
    Association Rule Mining (ARM) is one of the fundamental components in the field of data mining that discovers frequent itemsets and interesting relationships for predicting the associative and correlative behaviours for new data. However, traditional ARM techniques are based on support-confidence that discovers interesting association rules (ARs) using predefined minimum support (minsupp) and minimum confidence (minconf) threshold. In addition, traditional AR techniques only consider frequent items while ignoring rare ones. Thus, a new parameter-less predicated based ARM technique was proposed to address these limitations, which was enhanced to handle the frequent and rare items at the same time. Furthermore, a new interestingness measure, called g measure, was developed to select only highly interesting rules. In this proposed technique, interesting combinations were firstly selected by considering both the frequent and the rare items from a dataset. They were then mapped to the pseudo implications using predefined logical conditions. Later, inference rules were used to validate the pseudo-implications to discover rules within the set of mapped pseudo-implications. The resultant set of interesting rules was then referred to as the predicate based association rules. Zoo, breast cancer, and car evaluation datasets were used for conducting experiments. The results of the experiments were evaluated by its comparison with various classification techniques, traditional ARM technique and the coherent rule mining technique. The predicate-based rule mining approach gained an accuracy of 93.33%. In addition, the results of the g measure were compared with a state-of-the-art interestingness measure developed for a coherent rule mining technique called the h value. Predicate rules were discovered with an average confidence value of 0.754 for the zoo dataset and 0.949 for the breast cancer dataset, while the average confidence of the predicate rules found from the car evaluation dataset was 0.582. Results of this study showed that a set of interesting and highly reliable rules were discovered, including frequent, rare and negative association rules that have a higher confidence value. This research resulted in designing a methodology in rule mining which does not rely on the minsupp and minconf threshold. Also, a complete set of association rules are discovered by the proposed technique. Finally, the interestingness measure property for the selection of combinations from datasets makes it possible to reduce the exponential searching of the rules

    Mining subjectively interesting patterns in rich data

    Get PDF
    corecore