    Efficient construction of the lattice of frequent closed patterns and simultaneous extraction of generic bases of rules

    In the last few years, the amount of data collected in various computer science applications has grown considerably. These large volumes of data need to be analyzed in order to extract useful hidden knowledge. This work focuses on association rule extraction, one of the most popular techniques in data mining. However, the number of extracted association rules is often very high, and many of them are redundant. In this paper, we propose a new algorithm called PRINCE. Its main feature is the construction of a partially ordered structure from which subsets of association rules, called generic bases, are extracted. These subsets form a lossless representation of the whole association rule set. To reduce the cost of this construction, the partially ordered structure is built from the minimal generators associated with the frequent closed patterns. The frequent closed patterns are then derived simultaneously with the generic bases through a simple bottom-up traversal of the resulting structure. Experiments carried out on benchmark and "worst case" contexts showed the efficiency of the proposed algorithm compared with algorithms such as CLOSE, A-CLOSE and TITANIC. Comment: 50 pages, in French
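    The abstract relies on two notions, frequent closed patterns and their minimal generators, from which the exact rules of a generic basis are read off. The Python sketch below illustrates these notions by brute force on a small hypothetical context. It is not PRINCE, which builds a partially ordered structure from the minimal generators instead of enumerating every itemset, and all function names are hypothetical. Exact rules follow the usual definition of the generic basis of exact rules: g ⇒ (c \ g) for each minimal generator g of a closed pattern c with g ≠ c.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def closure(itemset, transactions):
    """Galois closure: the items shared by all transactions containing `itemset`.
    Assumes `itemset` occurs in at least one transaction."""
    covering = [t for t in transactions if itemset <= t]
    result = covering[0]
    for t in covering[1:]:
        result = result & t
    return result

def frequent_closed_with_generators(transactions, minsup):
    """Brute-force map: frequent closed pattern -> list of its minimal generators.
    `minsup` is an absolute count (assumed >= 1)."""
    items = sorted(frozenset().union(*transactions))
    generators = {}
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            sup = support(candidate, transactions)
            if sup < minsup:
                continue
            # A minimal generator is strictly less frequent than each of its
            # immediate subsets (dropping any single item raises the support).
            if all(support(candidate - {i}, transactions) > sup for i in candidate):
                closed = closure(candidate, transactions)
                generators.setdefault(closed, []).append(candidate)
    return generators

def exact_generic_basis(generators):
    """Exact rules g => (closed minus g) for each minimal generator g != closed."""
    return [(g, closed - g)
            for closed, gens in generators.items()
            for g in gens if g != closed]

# Small hypothetical context: five transactions over items A..E.
CONTEXT = [frozenset("ACD"), frozenset("BCE"), frozenset("ABCE"),
           frozenset("BE"), frozenset("ABCE")]

if __name__ == "__main__":
    gens = frequent_closed_with_generators(CONTEXT, minsup=2)
    for closed, gs in gens.items():
        print(sorted(closed), [sorted(g) for g in gs])
    for premise, conclusion in exact_generic_basis(gens):
        print(sorted(premise), "=>", sorted(conclusion))
```

    With minsup = 2, this context yields six frequent closed patterns (including the empty set) and seven exact rules, for example A ⇒ C and B ⇒ E. The enumeration is exponential in the number of items, so the sketch only suits toy data.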

    Feature Extraction and Duplicate Detection for Text Mining: A Survey

    Text mining, also known as Intelligent Text Analysis, is an important research area. Because text data are high-dimensional, it is very difficult to focus on the most relevant information. Feature extraction is one of the important data-reduction techniques for discovering the most important features. Processing massive amounts of data stored in unstructured form is a challenging task, and several pre-processing methods and algorithms are needed to extract useful features from such data. The survey covers text summarization, classification and clustering methods for discovering useful features, as well as methods for discovering query facets: multiple groups of words or phrases that explain and summarize the content covered by a query, thereby reducing the time the user spends. When dealing with collections of text documents, it is also very important to filter out duplicate data; once duplicates are deleted, it is recommended to replace the removed duplicates. Hence we also review the literature on duplicate detection and data fusion (removing and replacing duplicates). The survey presents existing text mining techniques that extract relevant features, detect duplicates and replace duplicate data in order to give the user fine-grained knowledge.
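    The survey does not commit to a single duplicate-detection method. One common baseline for near-duplicate text detection compares documents by the Jaccard similarity of their word k-shingles; the Python sketch below shows that idea under stated assumptions: the threshold, shingle length, example documents and function names are all hypothetical. It compares every pair of documents, so its cost is quadratic; larger collections usually switch to hashing schemes such as MinHash.

```python
import re
from itertools import combinations

def shingles(text, k=3):
    """Set of word k-grams (shingles) of a lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts: fall back to a single shingle holding all words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Size of the intersection over size of the union; 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(docs, threshold=0.4, k=3):
    """Return (id1, id2, score) for document pairs whose shingle sets
    have a Jaccard similarity of at least `threshold`."""
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    pairs = []
    for x, y in combinations(sorted(sets), 2):
        score = jaccard(sets[x], sets[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs

# Hypothetical example documents.
DOCS = {
    "d1": "Text mining extracts useful features from unstructured documents.",
    "d2": "Text mining extracts useful features from large unstructured documents.",
    "d3": "Association rules summarize frequent co-occurrences.",
}

if __name__ == "__main__":
    for x, y, score in near_duplicates(DOCS):
        print(x, y, round(score, 3))
```

    Here d1 and d2 share 4 of their 9 distinct 3-shingles (similarity 4/9), so only that pair is reported; the replace step of data fusion (keeping or merging one representative) is left out of the sketch.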

    Efficient construction of the lattice of frequent closed patterns and simultaneous extraction of generic bases of rules

    In recent years, the amount of data collected in various computer science application domains has become increasingly large. These volumes call for analysis and interpretation in order to extract useful knowledge. In this work, we focus on the technique of extracting association rules from large contexts, which is among the most frequently used techniques in data mining. However, the number of extracted rules is generally large, and many of them are redundant. In this article, we propose a new algorithm called PRINCE, whose main originality is to build a partially ordered structure (called the Iceberg lattice) in order to extract reduced sets of rules, called generic bases. These bases form a subset of the association rules without loss of information. To reduce the cost of this construction, the Iceberg lattice is computed from the minimal generators associated with the frequent closed patterns. The frequent closed patterns are then derived simultaneously with the generic bases through a simple bottom-up traversal of the constructed structure. The experiments we carried out on benchmark and "worst case" contexts showed the efficiency of the proposed algorithm compared with algorithms such as CLOSE, A-CLOSE and TITANIC.

    Multidimensional process discovery

    Annales Mathematicae et Informaticae 2020

    What did I do Wrong in my MOBA Game?: Mining Patterns Discriminating Deviant Behaviours

    The success of electronic sports (eSports), where professional gamers participate in competitive leagues and tournaments, brings new challenges for the video game industry. Besides being fun, games must be difficult and challenging for eSports professionals but still easy and enjoyable for amateurs. In this article, we consider Multiplayer Online Battle Arena (MOBA) games, and in particular "Defense of the Ancients 2", commonly known as DOTA2. In this context, one challenge is to propose data analysis methods and metrics that help players improve their skills. We design a data mining-based method that discovers strategic patterns from historical behavioral traces: given a model encoding an expected way of playing (the norm), we look for patterns that deviate from the norm and may explain a game outcome, so that players can learn more efficient ways of playing. The method is formally introduced and shown to be adaptable to different scenarios. Finally, we provide an experimental evaluation on a dataset of 10,000 behavioral game traces.
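    The abstract does not state which measure is used to decide that a pattern "may explain a game outcome". As a hedged illustration of that scoring step only, the Python sketch below rates a sequential pattern (an ordered list of actions, gaps allowed) by weighted relative accuracy (WRAcc), a standard subgroup-discovery measure of how much more often a pattern co-occurs with a target outcome than chance would predict. The norm model is omitted entirely, and the action names, traces and function names are hypothetical.

```python
def is_subsequence(pattern, trace):
    """True if the events of `pattern` appear in `trace` in order (gaps allowed)."""
    remaining = iter(trace)
    # `event in remaining` consumes the iterator up to the first match,
    # so each later event must be found after the earlier ones.
    return all(event in remaining for event in pattern)

def wracc(pattern, traces, target):
    """Weighted relative accuracy of `pattern` for traces labelled `target`:
    P(pattern) * (P(target | pattern) - P(target))."""
    if not traces:
        return 0.0
    labels_covered = [label for trace, label in traces
                      if is_subsequence(pattern, trace)]
    if not labels_covered:
        return 0.0
    n = len(traces)
    p_pattern = len(labels_covered) / n
    p_target = sum(1 for _, label in traces if label == target) / n
    p_target_given_pattern = labels_covered.count(target) / len(labels_covered)
    return p_pattern * (p_target_given_pattern - p_target)

# Hypothetical action traces, each labelled with the game outcome.
TRACES = [
    (("farm", "fight", "push"), "win"),
    (("farm", "push", "fight"), "win"),
    (("fight", "fight", "die"), "loss"),
    (("farm", "fight", "die"), "loss"),
]

if __name__ == "__main__":
    for pattern in [("fight", "die"), ("farm", "push")]:
        print(pattern, wracc(pattern, TRACES, target="loss"))
```

    For the target "loss", ("fight", "die") covers only lost games and scores 0.25, while ("farm", "push") covers only won games and scores -0.25; a positive score flags a behaviour associated with losing.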