
13 research outputs found

Computing iceberg concept lattices with Titanic

Author: Bastide Yves
Lakhal Lotfi
Pasquier Nicolas
Stumme Gerd
Taouil Rafik
Publication venue: 'Elsevier BV'
Publication date: 01/08/2002

We introduce the notion of iceberg concept lattices and show their use in knowledge discovery in databases. Iceberg lattices are a conceptual clustering method, which is well suited for analyzing very large databases. They also serve as a condensed representation of frequent itemsets, as a starting point for computing bases of association rules, and as a visualization method for association rules. Iceberg concept lattices are based on the theory of Formal Concept Analysis, a mathematical theory with applications in data analysis, information retrieval, and knowledge discovery. We present a new algorithm called TITANIC for computing (iceberg) concept lattices. It is based on data mining techniques with a level-wise approach. In fact, TITANIC can be used for a more general problem: computing arbitrary closure systems when the closure operator comes along with a so-called weight function. The use of weight functions for computing closure systems has not been discussed in the literature up to now. Applications providing such a weight function include association rule mining, functional dependencies in databases, conceptual clustering, and ontology engineering. The algorithm is experimentally evaluated and compared with Ganter's Next-Closure algorithm. The evaluation shows an important gain in efficiency, especially for weakly correlated data.
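The closure operator and the support-style weight function mentioned in this abstract can be illustrated with a small sketch. This is not the TITANIC algorithm: it is a brute-force enumeration over a hypothetical four-object context, and the names `CONTEXT`, `extent`, `support`, `closure`, and `iceberg_intents` are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

# Hypothetical toy context: each object (transaction) maps to its attributes (items).
CONTEXT = {
    "t1": {"a", "b", "c"},
    "t2": {"a", "b"},
    "t3": {"a", "c"},
    "t4": {"a"},
}

def extent(itemset):
    """Objects that contain every attribute of the itemset."""
    return {obj for obj, attrs in CONTEXT.items() if itemset <= attrs}

def support(itemset):
    """Weight function: fraction of objects that contain the itemset."""
    return len(extent(itemset)) / len(CONTEXT)

def closure(itemset):
    """Attributes shared by all objects that contain the itemset."""
    objs = extent(itemset)
    if not objs:
        # No object matches: by convention the closure is the full attribute set.
        return set().union(*CONTEXT.values())
    return set.intersection(*(CONTEXT[o] for o in objs))

def iceberg_intents(min_support):
    """Brute force: closures of all frequent itemsets (assumed stand-in, not TITANIC)."""
    items = sorted(set().union(*CONTEXT.values()))
    closed = set()
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            candidate = set(combo)
            if support(candidate) >= min_support:
                closed.add(frozenset(closure(candidate)))
    return closed
```

In this toy context, `closure({"b"})` adds the attribute `a`, because every object containing `b` also contains `a`. The brute-force loop is exponential in the number of attributes, which is the cost a level-wise algorithm is meant to reduce.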

HAL-UNICE

HAL AMU

HAL Clermont Université

INRIA a CCSD electronic archive server

Closed Association Rules

Author: Szathmáry László
Publication venue: 'Annales Mathematicae et Informaticae - AMI'
Publication date: 01/01/2020

In this paper we present a new basis for association rules called Closed Association Rules (CR). This basis contains all valid association rules that can be generated from frequent closed itemsets. CR is a lossless representation of all association rules. Regarding the number of rules, our basis lies between all association rules (AR) and minimal non-redundant association rules (MNR), filling a gap between them. The new basis provides a framework for some other bases, and we show that MNR is a subset of CR. Our experiments show that CR is a good alternative to all association rules. The number of generated rules can be much smaller, and nothing besides the frequent closed itemsets is required.

Repository of the Academy's Library

Set Representation for Rule Generation Algorithms

Author: Kharkongor Carynthia
Nath Bhabesh
Publication venue: 'AGH University of Science and Technology Press'
Publication date: 01/01/2022

Association rule mining has become one of the most widely used pattern discovery methods in Knowledge Discovery in Databases (KDD). One part of this task is representing itemsets in memory. The representation of an itemset largely depends on the type of data structure used to store it, and the mining process in turn determines the memory and time requirements for the itemsets. As the dimensionality and size of datasets increase, mining such large volumes of data becomes difficult because the itemsets cannot all be held in main memory. Since the representation of an itemset greatly affects the efficiency of association rule mining, a compact and compressed representation is needed. In this paper, a set representation is introduced which is more memory and cost efficient. Bitmap representation takes one byte for an element but the set representation uses one bit. The set representation is incorporated in the Apriori algorithm and is also tested with different rule generation algorithms. The complexities of these rule generation algorithms using the set representation are compared in terms of memory use and execution time.
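The one-bit-per-element idea in this abstract can be sketched with Python integers used as bit vectors. This is an assumed illustration and not the paper's data structure: the item names and the helpers `to_bitset` and `support_count` are hypothetical.

```python
# Hypothetical item universe; each item gets one bit position.
ITEMS = ["bread", "milk", "eggs", "tea"]
INDEX = {item: position for position, item in enumerate(ITEMS)}

def to_bitset(items, index=INDEX):
    """Encode an itemset as an integer with one bit per item."""
    bits = 0
    for item in items:
        bits |= 1 << index[item]
    return bits

def support_count(candidate, transactions):
    """Count transactions that contain every item of the candidate."""
    # A transaction contains the candidate when all candidate bits are set in it.
    return sum(1 for t in transactions if t & candidate == candidate)

# Hypothetical transaction database, encoded once up front.
TRANSACTIONS = [
    to_bitset(["bread", "milk"]),
    to_bitset(["bread", "eggs", "tea"]),
    to_bitset(["bread", "milk", "eggs"]),
    to_bitset(["milk"]),
]
```

With this encoding, the subset test used when counting candidate support in an Apriori-style pass reduces to a single bitwise AND and comparison per transaction.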

AGH (Akademia Górniczo-Hutnicza) University of Science and Technology: Journals

Computer Science Journal (AGH University of Science and Technology, Krakow)

Biblioteka Nauki - repozytorium artykułów

What did I do Wrong in my MOBA Game?: Mining Patterns Discriminating Deviant Behaviours

Author: Boulicaut Jean-François
Cavadenti Olivier
Codocedo Victor
Kaytoue Mehdi
Publication venue: HAL CCSD
Publication date: 17/10/2016

The success of electronic sports (eSports), where professional gamers participate in competitive leagues and tournaments, brings new challenges for the video game industry. Other than fun, games must be difficult and challenging for eSports professionals but still easy and enjoyable for amateurs. In this article, we consider Multi-player Online Battle Arena games (MOBA) and particularly "Defense of the Ancients 2", commonly known simply as DOTA2. In this context, a challenge is to propose data analysis methods and metrics that help players to improve their skills. We design a data mining-based method that discovers strategic patterns from historical behavioral traces: given a model encoding an expected way of playing (the norm), we are interested in patterns deviating from the norm that may explain a game outcome and from which players can learn more efficient ways of playing. The method is formally introduced and shown to be adaptable to different scenarios. Finally, we provide an experimental evaluation over a dataset of 10,000 behavioral game traces.

Crossref

Closed Association Rules

Author: Szathmáry László

EKE Repository of Publications

Scalable And Efficient Outlier Detection In Large Distributed Data Sets With Mixed-type Attributes

Author: Koufakou Anna
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2009

An important problem that appears often when analyzing data involves identifying irregular or abnormal data points called outliers. This problem broadly arises under two scenarios: when outliers are to be removed from the data before analysis, and when useful information or knowledge can be extracted from the outliers themselves. Outlier detection in the context of the second scenario is a research field that has attracted significant attention in a broad range of useful applications. For example, in credit card transaction data, outliers might indicate potential fraud; in network traffic data, outliers might represent potential intrusion attempts. The basis for deciding whether a data point is an outlier is often some measure or notion of dissimilarity between the data point under consideration and the rest. Traditional outlier detection methods assume numerical or ordinal data, and compute pair-wise distances between data points. However, the notion of distance or similarity for categorical data is more difficult to define. Moreover, the size of currently available data sets dictates the need for fast and scalable outlier detection methods, thus precluding distance computations. Additionally, these methods must be applicable to data which might be distributed among different locations. In this work, we propose novel strategies to efficiently deal with large distributed data containing mixed-type attributes. Specifically, we first propose a fast and scalable algorithm for categorical data (AVF), and its parallel version based on MapReduce (MR-AVF). We extend AVF and introduce a fast outlier detection algorithm for large distributed data with mixed-type attributes (ODMAD). Finally, we modify ODMAD in order to deal with very high-dimensional categorical data. Experiments with large real-world and synthetic data show that the proposed methods exhibit large performance gains and high scalability compared to the state-of-the-art, while achieving similar detection accuracy.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

IDEAS-1997-2021-Final-Programs

Author: Desai Bipin C.
Publication date: 31/08/2021

This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE (1997-2007) or ACM (2008-2021).

Concordia University Research Repository

Annales Mathematicae et Informaticae (51.)

Author: Bácsó Sándor
Gorjanc Sonja
Gyimóthy Tibor
Hoffmann Miklós
Holovács József
Juhász Tibor
Kovács László
Kovásznai Gergely
Kozma László
Liptai Kálmán
Luca Florian
Mastroianni Giuseppe
Mátyás Ferenc
Pintér Ákos
Rontó Miklós
Szalay László
Sztrik János
Walsh Gary
Publication venue: Eszterházy Károly University - Institute of Mathematics and Informatics

EKE Repository of Publications