8,484 research outputs found

    FP-tree and COFI Based Approach for Mining of Multiple Level Association Rules in Large Databases

    Full text link
    In recent years, discovery of association rules among itemsets in a large database has been described as an important database-mining problem. The problem of discovering association rules has received considerable research attention and several algorithms for mining frequent itemsets have been developed. Many algorithms have been proposed to discover rules at single concept level. However, mining association rules at multiple concept levels may lead to the discovery of more specific and concrete knowledge from data. The discovery of multiple level association rules is very much useful in many applications. In most of the studies for multiple level association rule mining, the database is scanned repeatedly which affects the efficiency of mining process. In this research paper, a new method for discovering multilevel association rules is proposed. It is based on FP-tree structure and uses cooccurrence frequent item tree to find frequent items in multilevel concept hierarchy.Comment: Pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN 1947 5500, http://sites.google.com/site/ijcsis

    Pattern Detection with Rare Item-set Mining

    Full text link
    The discovery of new and interesting patterns in large datasets, known as data mining, draws more and more interest as the quantities of available data are exploding. Data mining techniques may be applied to different domains and fields such as computer science, health sector, insurances, homeland security, banking and finance, etc. In this paper we are interested by the discovery of a specific category of patterns, known as rare and non-present patterns. We present a novel approach towards the discovery of non-present patterns using rare item-set mining.Comment: 17 pages, 5 figures, International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.1, No.1, August 201

    A Tight Upper Bound on the Number of Candidate Patterns

    Full text link
    In the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing a tight upper bound, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful to reduce the number of database scans

    Mining Generalized Patterns from Large Databases using Ontologies

    Full text link
    Formal Concept Analysis (FCA) is a mathematical theory based on the formalization of the notions of concept and concept hierarchies. It has been successfully applied to several Computer Science fields such as data mining,software engineering, and knowledge engineering, and in many domains like medicine, psychology, linguistics and ecology. For instance, it has been exploited for the design, mapping and refinement of ontologies. In this paper, we show how FCA can benefit from a given domain ontology by analyzing the impact of a taxonomy (on objects and/or attributes) on the resulting concept lattice. We willmainly concentrate on the usage of a taxonomy to extract generalized patterns (i.e., knowledge generated from data when elements of a given domain ontology are used) in the form of concepts and rules, and improve navigation through these patterns. To that end, we analyze three generalization cases and show their impact on the size of the generalized pattern set. Different scenarios of simultaneous generalizations on both objects and attributes are also discusse

    Temporal data mining for root-cause analysis of machine faults in automotive assembly lines

    Full text link
    Engine assembly is a complex and heavily automated distributed-control process, with large amounts of faults data logged everyday. We describe an application of temporal data mining for analyzing fault logs in an engine assembly plant. Frequent episode discovery framework is a model-free method that can be used to deduce (temporal) correlations among events from the logs in an efficient manner. In addition to being theoretically elegant and computationally efficient, frequent episodes are also easy to interpret in the form actionable recommendations. Incorporation of domain-specific information is critical to successful application of the method for analyzing fault logs in the manufacturing domain. We show how domain-specific knowledge can be incorporated using heuristic rules that act as pre-filters and post-filters to frequent episode discovery. The system described here is currently being used in one of the engine assembly plants of General Motors and is planned for adaptation in other plants. To the best of our knowledge, this paper presents the first real, large-scale application of temporal data mining in the manufacturing domain. We believe that the ideas presented in this paper can help practitioners engineer tools for analysis in other similar or related application domains as well

    Abstract Representations and Frequent Pattern Discovery

    Full text link
    We discuss the frequent pattern mining problem in a general setting. From an analysis of abstract representations, summarization and frequent pattern mining, we arrive at a generalization of the problem. Then, we show how the problem can be cast into the powerful language of algorithmic information theory. This allows us to formulate a simple algorithm to mine for all frequent patterns

    Intelligent Search of Correlated Alarms from Database containing Noise Data

    Full text link
    Alarm correlation plays an important role in improving the service and reliability in modern telecommunications networks. Most previous research of alarm correlation didn't consider the effect of noise data in Database. This paper focuses on the method of discovering alarm correlation rules from database containing noise data. We firstly define two parameters Win_freq and Win_add as the measure of noise data and then present the Robust_search algorithm to solve the problem. At different size of Win_freq and Win_add, experiments with alarm data containing noise data show that the Robust_search Algorithm can discover the more rules with the bigger size of Win_add. We also experimentally compare two different interestingness measures of confidence and correlation.Comment: 15 pages,4 figure

    ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg"

    Full text link
    This paper documents the release of the ELKI data mining framework, version 0.7.5. ELKI is an open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. In order to achieve high performance and scalability, ELKI offers data index structures such as the R*-tree that can provide major performance gains. ELKI is designed to be easy to extend for researchers and students in this domain, and welcomes contributions of additional methods. ELKI aims at providing a large collection of highly parameterizable algorithms, in order to allow easy and fair evaluation and benchmarking of algorithms. We will first outline the motivation for this release, the plans for the future, and then give a brief overview over the new functionality in this version. We also include an appendix presenting an overview on the overall implemented functionality

    Efficient Web Log Mining using Doubly Linked Tree

    Full text link
    World Wide Web is a huge data repository and is growing with the explosive rate of about 1 million pages a day. As the information available on World Wide Web is growing the usage of the web sites is also growing. Web log records each access of the web page and number of entries in the web logs is increasing rapidly. These web logs, when mined properly can provide useful information for decision-making. The designer of the web site, analyst and management executives are interested in extracting this hidden information from web logs for decision making. Web access pattern, which is the frequently used sequence of accesses, is one of the important information that can be mined from the web logs. This information can be used to gather business intelligence to improve sales and advertisement, personalization for a user, to analyze system performance and to improve the web site organization. There exist many techniques to mine access patterns from the web logs. This paper describes the powerful algorithm that mines the web logs efficiently. Proposed algorithm firstly converts the web access data available in a special doubly linked tree. Each access is called an event. This tree keeps the critical mining related information in very compressed form based on the frequent event count. Proposed recursive algorithm uses this tree to efficiently find all access patterns that satisfy user specified criteria. To prove that our algorithm is efficient from the other GSP (Generalized Sequential Pattern) algorithms we have done experimental studies on sample data.Comment: 5 pages, International Journal of Computer Science and Information Security, ISSN 1947 5500, Impact Factor 0.42

    A framework for redescription set construction

    Get PDF
    Redescription mining is a field of knowledge discovery that aims at finding different descriptions of similar subsets of instances in the data. These descriptions are represented as rules inferred from one or more disjoint sets of attributes, called views. As such, they support knowledge discovery process and help domain experts in formulating new hypotheses or constructing new knowledge bases and decision support systems. In contrast to previous approaches that typically create one smaller set of redescriptions satisfying a pre-defined set of constraints, we introduce a framework that creates large and heterogeneous redescription set from which user/expert can extract compact sets of differing properties, according to its own preferences. Construction of large and heterogeneous redescription set relies on CLUS-RM algorithm and a novel, conjunctive refinement procedure that facilitates generation of larger and more accurate redescription sets. The work also introduces the variability of redescription accuracy when missing values are present in the data, which significantly extends applicability of the method. Crucial part of the framework is the redescription set extraction based on heuristic multi-objective optimization procedure that allows user to define importance levels towards one or more redescription quality criteria. We provide both theoretical and empirical comparison of the novel framework against current state of the art redescription mining algorithms and show that it represents more efficient and versatile approach for mining redescriptions from data
    corecore