    Efficient Incremental Breadth-Depth XML Event Mining

    Many applications continuously log large numbers of events. Extracting interesting knowledge from logged events is an active, emerging research area in data mining. In this context, we propose an approach for mining frequent events and association rules from events logged in XML format. The approach consists of two main phases: I) constructing a novel tree structure called the Frequency XML-based Tree (FXT), which contains the frequency of the events to be mined; II) querying the constructed FXT using XQuery to discover frequent itemsets and association rules. The FXT is constructed in a single pass over the logged data. We implement the proposed algorithm and study various performance issues. The performance study shows that the algorithm is efficient both for constructing the FXT and for discovering association rules.
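The two phases described above can be illustrated with a minimal Python sketch. It is not the FXT and does not use XQuery: a plain counter stands in for the frequency tree, and the log layout (`<session>` elements grouping `<event name="..."/>` elements), the function names, and the thresholds are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical log layout: each <session> groups the events logged together.
LOG = """
<log>
  <session><event name="login"/><event name="search"/><event name="buy"/></session>
  <session><event name="login"/><event name="search"/></session>
  <session><event name="login"/><event name="buy"/></session>
  <session><event name="search"/></session>
</log>
"""

def count_itemsets(xml_text, max_size=2):
    """Phase I stand-in: one pass over the sessions, counting event sets."""
    counts = Counter()
    n_sessions = 0
    for session in ET.fromstring(xml_text).iter("session"):
        n_sessions += 1
        # Deduplicate and sort so each itemset has one canonical tuple form.
        events = sorted({e.get("name") for e in session.iter("event")})
        for size in range(1, max_size + 1):
            for itemset in combinations(events, size):
                counts[itemset] += 1
    return counts, n_sessions

def frequent_itemsets(counts, n_sessions, min_support):
    """Phase II stand-in: keep itemsets whose support reaches min_support."""
    return {s: c / n_sessions for s, c in counts.items()
            if c / n_sessions >= min_support}

def association_rules(counts, n_sessions, min_support, min_confidence):
    """Derive rules lhs -> rhs from frequent pairs of events."""
    rules = []
    for itemset, c in counts.items():
        if len(itemset) != 2 or c / n_sessions < min_support:
            continue
        for lhs, rhs in (itemset, itemset[::-1]):
            confidence = c / counts[(lhs,)]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, confidence))
    return sorted(rules)

counts, n = count_itemsets(LOG)
print(frequent_itemsets(counts, n, min_support=0.5))
print(association_rules(counts, n, min_support=0.5, min_confidence=0.8))
```

Counting only itemsets of size one and two keeps the sketch short; the abstract does not state how the FXT bounds or organizes the itemsets it stores.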

    Auto-administration des entrepôts de données complexes (Self-administration of complex data warehouses)

    Queries defined on data warehouses are often complex and use several join operations that are costly in computing time. In complex data warehousing, the adaptations made to classical warehouse schemas induce additional joins when data are accessed. This cost grows further when queries operate on very large volumes of data. It is therefore essential to reduce this computing time. To do so, data warehouse administrators generally use indexing techniques such as star join indexes or bitmap join indexes. This nevertheless remains complex and tedious. The solution we propose takes a self-administration approach to data warehouses. Within this framework, we propose an automatic index selection strategy. To this end, we use a data mining technique, specifically frequent pattern mining, to determine a set of candidate indexes from a given workload. We then propose cost models that select, among these indexes, those that yield the best profit. In particular, these cost models evaluate data access time through bitmap join indexes, as well as the maintenance and storage costs of these indexes.
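The candidate-generation step can be sketched as follows, under stated assumptions: each query in the workload is reduced to the set of attributes it uses, and every attribute set that occurs in a large enough share of queries becomes a candidate index. The workload, attribute names, and threshold are hypothetical, and the cost models mentioned in the abstract are not reproduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical workload: the attributes each query restricts or joins on.
WORKLOAD = [
    {"customer.city", "time.year"},
    {"customer.city", "time.year", "product.category"},
    {"customer.city", "product.category"},
    {"time.year"},
]

def candidate_indexes(workload, min_support):
    """Return attribute sets used by at least min_support of the queries."""
    counts = Counter()
    for attrs in workload:
        ordered = sorted(attrs)
        # Brute-force enumeration of subsets; fine for a handful of attributes.
        for size in range(1, len(ordered) + 1):
            for subset in combinations(ordered, size):
                counts[subset] += 1
    threshold = min_support * len(workload)
    return sorted(s for s, c in counts.items() if c >= threshold)

for candidate in candidate_indexes(WORKLOAD, min_support=0.5):
    print(candidate)
```

A real frequent pattern miner would prune the search space rather than enumerate every subset; the enumeration here only shows what "frequent" means for a workload.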

    Processing and Managing Complex Data for Decision Support

    Nowadays, the data management community acknowledges that data are not only numerical or symbolic, but that they may be: represented in various formats (databases, texts, images, sounds, videos...); diversely structured (relational databases, XML document repositories...); originating from several different sources (distributed databases, the Web...); described through several channels or points of view (radiographies and audio diagnosis of a physician, data expressed in different scales or languages...); changing in terms of definition or value (temporal databases, periodical surveys...). Data that fall into several of the above categories may be termed complex data. Managing such data involves many different issues regarding their structure, storage and processing. However, in many decision support fields (CRM, marketing, competition monitoring, medicine...), they are the real data that need to be exploited. Now that most decision support technologies such as data warehousing, on-line analysis (OLAP) or data mining have proven valuable on simple data, the issue of complex data must be addressed.
    Overall objective of the book: The objective of this book is to provide an overall view of the field of complex data processing by bringing together various research studies, presumably in different subfields, and underlining the similarities between the different data, issues and approaches. The idea is also to show that many applications can benefit from exploiting data other than those they usually deal with.