32 research outputs found

    Revisiting Numerical Pattern Mining with Formal Concept Analysis

    In this paper, we investigate the problem of mining numerical data in the framework of Formal Concept Analysis. The usual way is to use a scaling procedure (transforming numerical attributes into binary ones), which leads to a loss of either information or efficiency, in particular w.r.t. the volume of extracted patterns. By contrast, we propose to work directly on numerical data in a more precise and efficient way, and we prove it. To that end, the notions of closed patterns, generators and equivalence classes are revisited in the numerical context. Moreover, two original algorithms are proposed and used in an evaluation involving real-world data, showing the advantages of the present approach.
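The information loss caused by scaling can be illustrated with a minimal sketch. It assumes a simple threshold scaling in which each numerical value becomes one binary attribute per threshold ("value <= t"); the function name `scale_numeric` and the thresholds are hypothetical, and this is not the procedure evaluated in the paper.

```python
def scale_numeric(values, thresholds):
    """Binarize each value into one boolean attribute per threshold (value <= t).

    Hypothetical helper for illustration only.
    """
    return [tuple(v <= t for t in thresholds) for v in values]


# Assumed toy data: three numerical values and two cut points.
rows = scale_numeric([1.0, 2.0, 7.0], thresholds=[3, 5])

# The values 1.0 and 2.0 receive the same binary description,
# so the distinction between them is lost after scaling.
same_description = rows[0] == rows[1]
```

Adding more thresholds reduces this loss but increases the number of binary attributes, which is the trade-off between information and efficiency that the abstract mentions.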

    Survey Paper on Pattern-Enhanced Topic Model for Data Filtering

    Topic modeling has been widely adopted in machine learning and text mining. It was proposed to generate statistical models that classify the topics in a collection of documents. An elementary presumption of those approaches is that the documents in the collection are all about one topic. To represent multiple topics in a collection of documents, the Latent Dirichlet Allocation (LDA) topic modelling technique was proposed; it is also used in the field of information retrieval. However, its effectiveness in information filtering has not been well evaluated. Patterns are usually thought to be more discriminating than single terms for representing documents. Selecting the most representative and discriminating patterns from the huge number of discovered patterns then becomes crucial. To overcome these limitations and problems, a new information model approach is proposed. In the proposed model, the user's information needs are generated in terms of multiple topics, where each topic is represented by patterns. Patterns are generated from topic models and are organized in terms of their statistical and taxonomic features, and the most discriminating and representative patterns are proposed to estimate the relevance of a document to the user's information needs in order to filter out irrelevant documents. To assess the proposed model, the TREC data collection and Reuters Corpus vol. 1 are used for performanc

    Experimental Study of Concise Representations of Concepts and Dependencies

    In this paper we are interested in studying concise representations of concepts and dependencies, i.e., implications and association rules. Such representations are based on equivalence classes and their elements, i.e., minimal generators, minimum generators including keys and passkeys, proper premises, and pseudo-intents. All these sets of attributes are significant and well studied from the computational point of view, while their statistical properties remain to be studied. The purpose of this paper is to study these singular attribute sets and, in parallel, to study how to evaluate the complexity of a dataset from an FCA point of view. We analyze the empirical distributions and the sizes of these particular attribute sets. In addition, we propose several measures of data complexity, such as distributivity, linearity, size of concepts, and size of minimum generators, for the analysis of real-world and synthetic datasets.

    CORON: A Framework for Levelwise Itemset Mining Algorithms

    CORON is a framework for levelwise algorithms that are designed to find frequent and/or frequent closed itemsets in binary contexts. Datasets can differ greatly in size, number of objects, number of attributes, density, etc. As there is no single best algorithm for arbitrary datasets, we want to let users try different algorithms and choose the one that best suits their needs.
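As a rough illustration of what a levelwise search looks like, the following is a minimal Apriori-style sketch. It is not CORON's code or API; the function name `levelwise_frequent_itemsets`, the toy transactions and the support threshold are all assumptions made for the example.

```python
from itertools import combinations


def levelwise_frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset with count >= min_support.

    Hypothetical helper: candidates of size k are built only from
    frequent itemsets of size k - 1 (the levelwise principle).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets,
        # keeping only candidates whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Counting step: one pass over the data per level.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


# Assumed toy binary context: five objects over the attributes a, b, c.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = levelwise_frequent_itemsets(data, min_support=3)
```

With this toy data every single item and every pair reaches the threshold, while the triple `{a, b, c}` occurs only twice and is therefore not returned.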

    A MATRIX MODEL FOR MINING FREQUENT PATTERNS IN LARGE DATABASES

    This paper proposes a model for mining frequent patterns in large databases by implementing a matrix approach. The whole database is scanned only once and the data is compressed in the form of a matrix. The frequent patterns are then mined from this compressed database, which brings efficiency in data mining, as the number of database scans is effectively less than two. The computation time is reduced as some of the patterns are mined simultaneously and searching is minimized. Appropriate mathematical operations are designed and performed on matrices to achieve this efficiency.
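The abstract does not specify the matrix operations, so the following is only a sketch of the general idea under stated assumptions: a single scan builds a binary item-by-transaction matrix, stored here as one integer bit mask per item, and the support of an itemset is obtained by AND-ing the corresponding rows. The names `build_item_columns` and `support` are hypothetical.

```python
def build_item_columns(transactions):
    """Scan the database once and store one bit mask per item.

    Bit i of columns[item] is set when transaction i contains the item.
    Hypothetical representation, not the paper's matrix layout.
    """
    columns = {}
    for row, transaction in enumerate(transactions):
        for item in transaction:
            columns[item] = columns.get(item, 0) | (1 << row)
    return columns


def support(itemset, columns, n_rows):
    """Count the transactions containing every item, without rescanning the data."""
    mask = (1 << n_rows) - 1  # start with all transactions selected
    for item in itemset:
        mask &= columns.get(item, 0)
    return bin(mask).count("1")


# Assumed toy database with three transactions.
db = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
cols = build_item_columns(db)
ab_support = support({"a", "b"}, cols, len(db))
```

Once the masks are built, every support query works on the compressed representation only, which is the property the abstract attributes to the single-scan matrix model.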

    An Improved Association Rule Mining Technique Using Transposed Database

    Discovering association rules in large databases is the most important feature of data mining. Many algorithms have been introduced by various researchers for finding association rules. Among these algorithms, the FP-growth method is the most proficient: it mines the frequent itemsets without candidate set generation. The drawbacks of FP-growth are that it requires two scans of the whole database and uses a large number of conditional FP-trees to generate frequent itemsets. To overcome these limitations, a new approach named TransTrie is proposed, which uses the reduced, sorted, transposed database. It then scans the database and generates a trie; in the same step it also computes the occurrences of each item. Then, using depth-first traversal, it identifies the maximal itemsets, from which all frequent itemsets are derived using the Apriori property. It also counts the support of the frequent itemsets, which are used to find the valuable association rules.
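The last derivation step relies on the Apriori property: every non-empty subset of a frequent itemset is itself frequent, so the frequent itemsets are exactly the non-empty subsets of the maximal ones. The sketch below shows only that step; it does not reproduce the TransTrie structure, and the function name `frequent_from_maximal` and the input itemsets are assumptions.

```python
from itertools import combinations


def frequent_from_maximal(maximal_itemsets):
    """Enumerate all non-empty subsets of the given maximal frequent itemsets.

    Hypothetical helper; supports still have to be counted separately.
    """
    frequent = set()
    for maximal in maximal_itemsets:
        items = sorted(maximal)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                frequent.add(frozenset(combo))
    return frequent


# Assumed maximal itemsets for illustration.
derived = frequent_from_maximal([{"a", "b", "c"}, {"c", "d"}])
```

The subsets alone do not carry support values, which is why the abstract mentions a separate support count before the association rules are generated.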

    An Experiment on Mining Chemical Reaction Databases

    International conference with proceedings and peer review. In this paper, we present an experiment on knowledge discovery in chemical reaction databases. Chemical reactions are the main elements on which synthesis in organic chemistry relies, and this is why chemical reaction databases are of primary importance. From a problem-solving perspective, synthesis in organic chemistry must be considered at several levels of abstraction: mainly a strategic level, where general synthesis methods are involved, and a tactical level, where actual chemical reactions are applied. The research work presented in this paper aims at discovering general synthesis methods from chemical reaction databases in order to design generic and reusable synthesis plans. The knowledge discovery process relies on frequent levelwise itemset search and association rule extraction, but also on chemical knowledge involved within every step of the knowledge discovery process. Moreover, the overall process is supervised by a domain expert.

    A Fast Algorithm For Data Mining

    In the past few years, there has been a keen interest in mining frequent itemsets in large data repositories. Frequent itemsets correspond to the sets of items that occur frequently in transactions in a database. Several novel algorithms have been developed recently to mine closed frequent itemsets; these itemsets are a subset of the frequent itemsets. These algorithms are of practical value: they can be applied to real-world applications to extract patterns of interest in data repositories. However, prior to using an algorithm in practice, it is necessary to know its performance as well as its implementation issues. In this project, we address such a need for the algorithm “Using Attribute Value Lattice to Find Frequent Itemsets” that was developed by Lin et al. We clarify some aspects of the algorithm, develop an implementation of the algorithm, and present the results of a performance study. In our experiments we find that the running time of the algorithm for certain input datasets grows exponentially. To address this problem, we develop a novel procedure for binning the data. Our results show that with binned data, the running time of the algorithm grows linearly. This allows one to obtain trends for the dataset.
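To make the notion of a closed itemset concrete: an itemset is closed when no proper superset has the same support, which is equivalent to the itemset being equal to the intersection of all transactions that contain it. The sketch below illustrates that definition only; it is not the attribute value lattice algorithm of Lin et al., and the names `closure` and `is_closed` and the toy data are assumptions.

```python
def closure(itemset, transactions):
    """Return the intersection of all transactions containing the itemset.

    Hypothetical helper; returns the itemset itself when nothing contains it.
    """
    itemset = frozenset(itemset)
    covering = [frozenset(t) for t in transactions if itemset <= frozenset(t)]
    if not covering:
        return itemset
    return frozenset.intersection(*covering)


def is_closed(itemset, transactions):
    """An itemset is closed when it equals its own closure."""
    return closure(itemset, transactions) == frozenset(itemset)


# Assumed toy database: "a" never occurs without "b".
db = [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "d"}]
a_closure = closure({"a"}, db)
```

In this toy database `{a}` and `{a, b}` have the same support, so only `{a, b}` is closed; keeping closed itemsets alone therefore gives a smaller set that still determines the support of every frequent itemset.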