
Revisiting Numerical Pattern Mining with Formal Concept Analysis

Authors: Kaytoue Mehdi, Kuznetsov Sergei O., Napoli Amedeo
Publication date: 01/06/2011

In this paper, we investigate the problem of mining numerical data in the framework of Formal Concept Analysis. The usual way is to use a scaling procedure (transforming numerical attributes into binary ones), which leads to a loss of either information or efficiency, in particular with respect to the volume of extracted patterns. By contrast, we propose to work directly on numerical data in a more precise and efficient way, and we prove it. To this end, the notions of closed patterns, generators and equivalence classes are revisited in the numerical context. Moreover, two original algorithms are proposed and used in an evaluation involving real-world data, showing the superiority of the present approach.
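To make the idea of working directly on numerical data concrete, here is a minimal Python sketch, assuming the interval pattern structure setting: a pattern is one interval per numerical attribute, and the similarity of several object descriptions is their per-attribute convex hull. The toy data and the names `DATA`, `intent`, `extent` and `closure` are hypothetical; the two algorithms proposed in the paper are not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]
Pattern = Tuple[Interval, ...]

# Hypothetical toy numerical context: object -> one value per attribute (m1, m2, m3).
DATA: Dict[str, Tuple[float, ...]] = {
    "g1": (5.0, 7.0, 6.0),
    "g2": (6.0, 8.0, 4.0),
    "g3": (4.0, 8.0, 5.0),
    "g4": (4.0, 9.0, 8.0),
    "g5": (5.0, 8.0, 5.0),
}


def intent(objects: Sequence[str]) -> Pattern:
    """Similarity (meet) of object descriptions: per attribute, the smallest
    interval containing the values of all given objects."""
    if not objects:
        raise ValueError("need at least one object")
    rows = [DATA[g] for g in objects]
    return tuple((min(col), max(col)) for col in zip(*rows))


def extent(pattern: Pattern) -> List[str]:
    """Objects whose values fall inside every interval of the pattern."""
    return [
        g
        for g, row in DATA.items()
        if all(lo <= v <= hi for v, (lo, hi) in zip(row, pattern))
    ]


def closure(pattern: Pattern) -> Pattern:
    """Closed interval pattern: the most specific description of the pattern's extent."""
    return intent(extent(pattern))


p = intent(["g1", "g2"])
print(p)          # ((5.0, 6.0), (7.0, 8.0), (4.0, 6.0))
print(extent(p))  # ['g1', 'g2', 'g5']
```

Closing the pattern shared by g1 and g2 pulls in g5, whose values already lie inside every interval. This is the numerical counterpart of a closed itemset, obtained without any binary scaling.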

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Survey Paper on Pattern-Enhanced Topic Model for Data Filtering

Authors: Chandrakant S. Aher, Dr. Rekha Rathore
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 30/04/2017

Topic modelling has been widely adopted in machine learning and text mining. Topic modelling was proposed to generate statistical models that classify the various topics in a collection of documents. A basic assumption of those approaches is that the documents in the collection are all about one topic. To represent multiple topics in a collection of documents, the Latent Dirichlet Allocation (LDA) topic modelling technique was proposed; it is also used in information retrieval, but its effectiveness in information filtering has not been well evaluated. Patterns are usually considered more discriminating than single terms for representing documents, so selecting the most representative and discriminating patterns from the huge number of discovered patterns becomes crucial. To overcome these limitations and problems, a new information filtering model is proposed. In the proposed model, user information needs are represented in terms of multiple topics, where each topic is represented by patterns. Patterns are generated from topic models and organized according to their statistical and taxonomic features, and the most discriminating and representative patterns are used to estimate the relevance of documents to the user's information needs in order to filter out irrelevant documents. To evaluate the proposed model, the TREC data collection and Reuters Corpus Volume 1 are used for performance…
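The filtering step can be pictured with a small sketch. The abstract does not give the relevance formula, so the weighted sum below (topic weight times the summed weights of the patterns a document contains) is an assumption, as are the names `relevance`, `filter_documents` and the toy profile. Generating the patterns from an LDA model is not shown.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical user profile: topic -> (topic weight, {pattern: pattern weight}).
Profile = Dict[str, Tuple[float, Dict[FrozenSet[str], float]]]


def relevance(doc_terms: Set[str], profile: Profile) -> float:
    """Assumed score: weighted sum over topics of the weights of patterns contained in the document."""
    score = 0.0
    for topic_weight, patterns in profile.values():
        matched = sum(w for pattern, w in patterns.items() if pattern <= doc_terms)
        score += topic_weight * matched
    return score


def filter_documents(docs: Dict[str, Set[str]], profile: Profile, threshold: float) -> List[str]:
    """Keep documents whose relevance reaches the threshold; the rest are treated as irrelevant."""
    return [name for name, terms in docs.items() if relevance(terms, profile) >= threshold]


profile: Profile = {
    "t1": (0.75, {frozenset({"stock", "market"}): 2.0, frozenset({"trade"}): 1.0}),
    "t2": (0.25, {frozenset({"oil", "price"}): 3.0}),
}
docs = {
    "d1": {"stock", "market", "trade", "today"},
    "d2": {"oil", "price", "market"},
    "d3": {"weather"},
}
print(filter_documents(docs, profile, threshold=1.0))  # ['d1']
```

Matching whole patterns rather than single terms is what makes d2 score low here: it contains "market" but not the full pattern {stock, market}.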

International Journal on Recent and Innovation Trends in Computing and Communication

Experimental Study of Concise Representations of Concepts and Dependencies

Authors: Buzmakov Aleksey, Dudyrev Egor, Kuznetsov Sergei O., Makhalova Tatiana, Napoli Amedeo
Publication date: 20/06/2022

In this paper, we are interested in studying concise representations of concepts and dependencies, i.e., implications and association rules. Such representations are based on equivalence classes and their elements, i.e., minimal generators, minimum generators including keys and passkeys, proper premises, and pseudo-intents. All these sets of attributes are significant and well studied from the computational point of view, while their statistical properties remain to be studied. The purpose of this paper is to study these particular attribute sets and, in parallel, to study how to evaluate the complexity of a dataset from an FCA point of view. We analyze the empirical distributions and the sizes of these attribute sets. In addition, we propose several measures of data complexity, such as distributivity, linearity, size of concepts, and size of minimum generators, for the analysis of real-world and synthetic datasets.
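The notions of closure and minimal generator mentioned above can be illustrated by brute force on a tiny binary context. This sketch is for illustration only (it is exponential in the size of the intent), and the context and function names are hypothetical; it is not the experimental setup of the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical toy binary context: object -> attributes it has.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "o1": frozenset("abc"),
    "o2": frozenset("ab"),
    "o3": frozenset("bc"),
    "o4": frozenset("acde"),
}
ATTRIBUTES: FrozenSet[str] = frozenset().union(*CONTEXT.values())


def closure(attrs: FrozenSet[str]) -> FrozenSet[str]:
    """attrs'': the attributes shared by all objects that have every attribute in attrs."""
    rows = [row for row in CONTEXT.values() if attrs <= row]
    if not rows:
        return ATTRIBUTES  # empty extent: the closure is the whole attribute set
    return frozenset.intersection(*rows)


def minimal_generators(intent: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Subsets of a closed intent that have the same closure and are minimal for inclusion."""
    generators: List[FrozenSet[str]] = []
    for size in range(len(intent) + 1):  # smaller candidates first
        for combo in combinations(sorted(intent), size):
            candidate = frozenset(combo)
            if closure(candidate) != intent:
                continue
            # Any generator that is a proper subset was already found at a smaller size.
            if not any(g < candidate for g in generators):
                generators.append(candidate)
    return generators


print(minimal_generators(frozenset("acde")))  # [frozenset({'d'}), frozenset({'e'})]
```

Here the concept with intent {a, c, d, e} has two minimal generators, {d} and {e}. Both have size one, so both are also minimum generators.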

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

CORON: A Framework for Levelwise Itemset Mining Algorithms

Authors: Napoli Amedeo, Szathmary Laszlo
Publication venue: HAL CCSD
Publication date: 01/02/2005

CORON is a framework for levelwise algorithms designed to find frequent and/or frequent closed itemsets in binary contexts. Datasets can differ greatly in size, number of objects, number of attributes, density, etc. As no single algorithm is best for arbitrary datasets, we want to give users the possibility to try different algorithms and choose the one that best suits their needs.
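As an illustration of the levelwise strategy referred to above, the following sketch implements an Apriori-style search in plain Python. It is not CORON's code; the function name, the absolute support threshold and the toy transactions are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[FrozenSet[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Levelwise search: frequent k-itemsets are combined into (k+1)-candidates."""

    def support(itemset: FrozenSet[str]) -> int:
        # One scan of the transactions per candidate, kept simple for illustration.
        return sum(1 for t in transactions if itemset <= t)

    level = [frozenset([item]) for item in sorted(set().union(*transactions))]
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while level:
        # Keep the size-k candidates that reach the support threshold.
        counts = {c: support(c) for c in level}
        current = {c: s for c, s in counts.items() if s >= min_support}
        frequent.update(current)
        # Join step: union of two frequent k-itemsets sharing k - 1 items.
        keys = list(current)
        candidates: Set[FrozenSet[str]] = set()
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                union = keys[i] | keys[j]
                # Prune step: every k-subset of a candidate must itself be frequent.
                if len(union) == k + 1 and all(
                    frozenset(sub) in current for sub in combinations(union, k)
                ):
                    candidates.add(union)
        level = list(candidates)
        k += 1
    return frequent


transactions = [frozenset(t) for t in ["abc", "ab", "ac", "bc", "abcd"]]
result = apriori(transactions, min_support=3)
print(sorted("".join(sorted(s)) for s in result))  # ['a', 'ab', 'ac', 'b', 'bc', 'c']
```

The prune step relies on anti-monotonicity: an itemset cannot be frequent if one of its subsets is not, so such candidates are never counted.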

INRIA a CCSD electronic archive server

A Matrix Model for Mining Frequent Patterns in Large Databases

Authors: Divya Bhatnagar, Kamalraj Pardasani
Publication date: 02/04/2020

This paper proposes a model for mining frequent patterns in large databases using a matrix approach. The whole database is scanned only once and the data is compressed into a matrix. The frequent patterns are then mined from this compressed database, which makes mining more efficient, as the number of database scans is effectively less than two. Computation time is reduced because some patterns are mined simultaneously and searching is minimized. Appropriate mathematical operations are designed and performed on the matrices to achieve this efficiency.
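The abstract does not spell out its matrix operations, so the sketch below assumes one common realization: a single scan builds one bit vector per item (a column of the transaction-by-item matrix, stored as a Python integer), and the support of an itemset is the number of set bits in the AND of its columns. All names and data are hypothetical.

```python
from functools import reduce
from operator import and_
from typing import Dict, Iterable, List


def build_item_columns(transactions: List[Iterable[str]]) -> Dict[str, int]:
    """Single scan: bit t of columns[item] is 1 iff transaction t contains item."""
    columns: Dict[str, int] = {}
    for t_index, transaction in enumerate(transactions):
        for item in transaction:
            columns[item] = columns.get(item, 0) | (1 << t_index)
    return columns


def support(columns: Dict[str, int], itemset: Iterable[str]) -> int:
    """Number of transactions containing every item: popcount of the ANDed columns."""
    masks = [columns.get(item, 0) for item in itemset]
    if not masks:
        raise ValueError("itemset must be non-empty")
    return bin(reduce(and_, masks)).count("1")


columns = build_item_columns([["bread", "milk"], ["bread", "butter"],
                              ["bread", "milk", "butter"], ["milk"]])
print(support(columns, ["bread", "milk"]))  # 2
```

After the single scan, every support query touches only the compressed columns, never the original transactions.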

CiteSeerX

An Improved Association Rule Mining Technique Using Transposed Database

Authors: Garg Kanwal, Yadav Ruchika
Publication venue: American Academic Scientific Research Journal for Engineering, Technology, and Sciences
Publication date: 04/08/2015

Discovering association rules in large databases is the most important task of data mining. Many algorithms have been introduced by various researchers for finding association rules. Among these algorithms, the FP-growth method is the most efficient: it mines frequent itemsets without candidate generation. The drawbacks of FP-growth are that it requires two scans of the entire database and uses a large number of conditional FP-trees to generate frequent itemsets. To overcome these limitations, a new approach named TransTrie is proposed, which uses a reduced, sorted, transposed database. It then scans the database and builds a trie, computing the occurrences of each item in the same step. Next, using depth-first traversal, it identifies the maximal itemsets, from which all frequent itemsets are derived using the Apriori property. It also counts the support of the frequent itemsets, which is used to find valuable association rules.
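Two ingredients of this description can be sketched independently of the trie itself: a reduced, sorted, transposed database (item to transaction ids), and the Apriori-property step that derives all frequent itemsets from the maximal ones. What "reduced" and "sorted" mean exactly is not stated in the abstract; dropping infrequent items and ordering by descending support is an assumption here, as are all function names. The trie construction and depth-first traversal are not shown.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def transpose(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Transposed (vertical) database: item -> ids of the transactions containing it."""
    tidsets: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def reduce_and_sort(tidsets: Dict[str, Set[int]], min_support: int) -> List[Tuple[str, Set[int]]]:
    """Assumed meaning of 'reduced, sorted': drop infrequent items, order by descending support."""
    kept = [(item, tids) for item, tids in tidsets.items() if len(tids) >= min_support]
    return sorted(kept, key=lambda pair: (-len(pair[1]), pair[0]))


def support(tidsets: Dict[str, Set[int]], itemset: Iterable[str]) -> int:
    """Support of a non-empty itemset = size of the intersection of its items' tidsets."""
    return len(set.intersection(*(tidsets[item] for item in itemset)))


def frequent_from_maximal(maximal: List[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Apriori property: every non-empty subset of a frequent (here maximal) itemset is frequent."""
    result: Set[FrozenSet[str]] = set()
    for itemset in maximal:
        for size in range(1, len(itemset) + 1):
            result.update(frozenset(c) for c in combinations(sorted(itemset), size))
    return result


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
tidsets = transpose(db)
print([item for item, _ in reduce_and_sort(tidsets, 2)])  # ['a', 'b', 'c']
```

With a threshold of 2, the maximal frequent itemsets of `db` are {a, b} and {a, c}; `frequent_from_maximal` expands them into the five frequent itemsets a, b, c, ab and ac, whose supports can then be counted with `support`.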

American Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS)

An Experiment on Mining Chemical Reaction Databases

Authors: Berasaluce Sandra, Laurenço Claude, Napoli Amedeo, Niel Gilles
Publication venue: Hermes Science Publishing, London
Publication date: 01/01/2004

International conference with proceedings and peer review; international audience.

In this paper, we present an experiment on knowledge discovery in chemical reaction databases. Chemical reactions are the main elements on which synthesis in organic chemistry relies, which is why chemical reaction databases are of primary importance. From a problem-solving perspective, synthesis in organic chemistry must be considered at several levels of abstraction: mainly a strategic level, where general synthesis methods are involved, and a tactical level, where actual chemical reactions are applied. The research presented in this paper aims at discovering general synthesis methods from chemical reaction databases in order to design generic and reusable synthesis plans. The knowledge discovery process relies on levelwise frequent itemset search and association rule extraction, but also on chemical knowledge involved in every step of the process. Moreover, the overall process is supervised by a domain expert.
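The association rule extraction step can be sketched generically: from the supports of frequent itemsets, each itemset Z and non-empty proper subset X yield a rule X → Z \ X with confidence supp(Z) / supp(X). The chemical knowledge and expert supervision described in the paper are not modeled; the function name and the toy supports are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def association_rules(supports: Dict[FrozenSet[str], int], min_conf: float) -> List[Rule]:
    """Emit X -> Z - X for each frequent itemset Z and non-empty proper subset X
    whose confidence supp(Z) / supp(X) reaches min_conf. Assumes every subset of
    a frequent itemset is also a key of `supports`, as in levelwise search output."""
    rules: List[Rule] = []
    for itemset, supp in supports.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                confidence = supp / supports[antecedent]
                if confidence >= min_conf:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


supports = {frozenset("x"): 4, frozenset("y"): 3, frozenset("xy"): 3}
print(association_rules(supports, min_conf=0.8))  # [(frozenset({'y'}), frozenset({'x'}), 1.0)]
```

Only y → x survives a 0.8 threshold: every transaction with y also has x, whereas x → y holds in three of four cases.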

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

A Fast Algorithm For Data Mining

Author: Raghu Aarathi
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2006

In the past few years, there has been keen interest in mining frequent itemsets in large data repositories. Frequent itemsets are sets of items that occur frequently in the transactions of a database. Several novel algorithms have recently been developed to mine closed frequent itemsets, which are a subset of the frequent itemsets. These algorithms are of practical value: they can be applied to real-world applications to extract patterns of interest from data repositories. However, before using an algorithm in practice, it is necessary to know its performance as well as its implementation issues. In this project, we address this need for the algorithm “Using Attribute Value Lattice to Find Frequent Itemsets” developed by Lin et al. We clarify some aspects of the algorithm, develop an implementation of it, and present the results of a performance study. In our experiments, we find that the running time of the algorithm grows exponentially for certain input datasets. To address this problem, we develop a novel procedure for binning the data. Our results show that with binned data, the running time of the algorithm grows linearly. This allows one to obtain trends for the dataset.
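The abstract does not describe the binning procedure itself. As an assumption, the sketch below uses equal-width binning, which replaces each numerical value by one of `n_bins` interval labels and so reduces the number of distinct attribute values an itemset miner has to handle; the function name is hypothetical.

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Map each value to a bin index in [0, n_bins - 1] using equal-width intervals."""
    if n_bins < 1:
        raise ValueError("n_bins must be >= 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: a single bin
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin instead of index n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(equal_width_bins([1.0, 2.0, 5.5, 9.0, 10.0], 3))  # [0, 0, 1, 2, 2]
```

Equal-width bins are the simplest choice; equal-frequency bins would be an alternative when values are heavily skewed.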

SJSU ScholarWorks