13 research outputs found

    Computing iceberg concept lattices with Titanic

    Get PDF
    International audienceWe introduce the notion of iceberg concept lattices and show their use in knowledge discovery in databases. Iceberg lattices are a conceptual clustering method, which is well suited for analyzing very large databases. They also serve as a condensed representation of frequent itemsets, as starting point for computing bases of association rules, and as a visualization method for association rules. Iceberg concept lattices are based on the theory of Formal Concept Analysis, a mathematical theory with applications in data analysis, information retrieval, and knowledge discovery. We present a new algorithm called TITANIC for computing (iceberg) concept lattices. It is based on data mining techniques with a level-wise approach. In fact, TITANIC can be used for a more general problem: Computing arbitrary closure systems when the closure operator comes along with a so-called weight function. The use of weight functions for computing closure systems has not been discussed in the literature up to now. Applications providing such a weight function include association rule mining, functional dependencies in databases, conceptual clustering, and ontology engineering. The algorithm is experimentally evaluated and compared with Ganter's Next-Closure algorithm. The evaluation shows an important gain in efficiency, especially for weakly correlated data

    Closed Association Rules

    Get PDF
    In this paper we present a new basis for association rules called Closed Association Rules (CR). This basis contains all valid association rules that can be generated from frequent closed itemsets. CR is a lossless representation of all association rules. Regarding the number of rules, our basis is between all association rules (AR) and minimal non-redundant association rules (MNR), filling a gap between them. The new basis provides a framework for some other bases and we show that MNR is a subset of CR. Our experiments show that CR is a good alternative for all association rules. The number of generated rules can be much less, and beside frequent closed itemsets nothing else is required

    Set Representation for Rule Generation Algorithms

    Get PDF
    The task of mining the association rule has become one of the most widely used discovery pattern methods in Knowledge Discovery in Databases (KDD). One such task is to represent the itemset in the memory. The representation of the itemset largely depend on the type of data structure that is used for storing them. Computing the process of mining the association rule im- pacts the memory and time requirement of the itemset. With the increase in the dimensionality of data and datasets, mining such large volume of datasets will be difficult since all these itemsets cannot be placed in the main memory. As representation of an itemset greatly affects the efficiency of the rule mining association, a compact and compress representation of an itemset is needed. In this paper, a set representation is introduced which is more memory and cost efficient. Bitmap representation takes one byte for an element but the set representation uses one bit. The set representation is being incorporated in Apriori Algorithm. Set representation is also being tested for different rule generation algorithms. The complexities of these different rule generation algorithms using set representation are being compared in terms of memory and time execution

    What did I do Wrong in my MOBA Game?: Mining Patterns Discriminating Deviant Behaviours

    Get PDF
    International audienceThe success of electronic sports (eSports), where professional gamers participate in competitive leagues and tournaments , brings new challenges for the video game industry. Other than fun, games must be difficult and challenging for eSports professionals but still easy and enjoyable for amateurs. In this article, we consider Multi-player Online Battle Arena games (MOBA) and particularly, " Defense of the Ancients 2 " , commonly known simply as DOTA2. In this context, a challenge is to propose data analysis methods and metrics that help players to improve their skills. We design a data mining-based method that discovers strategic patterns from historical behavioral traces: Given a model encoding an expected way of playing (the norm), we are interested in patterns deviating from the norm that may explain a game outcome from which player can learn more efficient ways of playing. The method is formally introduced and shown to be adaptable to different scenarios. Finally, we provide an experimental evaluation over a dataset of 10, 000 behavioral game traces

    Closed Association Rules

    Get PDF
    In this paper we present a new basis for association rules called Closed Association Rules (CR). This basis contains all valid association rules that can be generated from frequent closed itemsets. CR is a lossless representation of all association rules. Regarding the number of rules, our basis is between all association rules (AR) and minimal non-redundant association rules (MNR), filling a gap between them. The new basis provides a framework for some other bases and we show that MNR is a subset of CR. Our experiments show that CR is a good alternative for all association rules. The number of generated rules can be much less, and beside frequent closed itemsets nothing else is required

    Scalable And Efficient Outlier Detection In Large Distributed Data Sets With Mixed-type Attributes

    Get PDF
    An important problem that appears often when analyzing data involves identifying irregular or abnormal data points called outliers. This problem broadly arises under two scenarios: when outliers are to be removed from the data before analysis, and when useful information or knowledge can be extracted by the outliers themselves. Outlier Detection in the context of the second scenario is a research field that has attracted significant attention in a broad range of useful applications. For example, in credit card transaction data, outliers might indicate potential fraud; in network traffic data, outliers might represent potential intrusion attempts. The basis of deciding if a data point is an outlier is often some measure or notion of dissimilarity between the data point under consideration and the rest. Traditional outlier detection methods assume numerical or ordinal data, and compute pair-wise distances between data points. However, the notion of distance or similarity for categorical data is more difficult to define. Moreover, the size of currently available data sets dictates the need for fast and scalable outlier detection methods, thus precluding distance computations. Additionally, these methods must be applicable to data which might be distributed among different locations. In this work, we propose novel strategies to efficiently deal with large distributed data containing mixed-type attributes. Specifically, we first propose a fast and scalable algorithm for categorical data (AVF), and its parallel version based on MapReduce (MR-AVF). We extend AVF and introduce a fast outlier detection algorithm for large distributed data with mixed-type attributes (ODMAD). Finally, we modify ODMAD in order to deal with very high-dimensional categorical data. Experiments with large real-world and synthetic data show that the proposed methods exhibit large performance gains and high scalability compared to the state-of-the-art, while achieving similar accuracy detection rates

    IDEAS-1997-2021-Final-Programs

    Get PDF
    This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE(1997-2007) or ACM(2008-2021)
    corecore