157 research outputs found

    Hierarchies of Weighted Closed Partially-Ordered Patterns for Enhancing Sequential Data Analysis

    Get PDF
    International audienceDiscovering sequential patterns in sequence databases is an important data mining task. Recently, hierarchies of closed partially-ordered patterns (cpo-patterns), built directly using Relational Concept Analysis (RCA), have been proposed to simplify the interpretation step by highlighting how cpo-patterns relate to each other. However, there are practical cases (e.g. choosing interesting navigation paths in the obtained hierarchies) when these hierarchies are still insufficient for the expert. To address these cases, we propose to extract hierarchies of more informative cpo-patterns, namely weighted cpo-patterns (wcpo-patterns), by extending the RCA-based approach. These wcpo-patterns capture and explicitly show not only the order on itemsets but also their different influence on the analysed sequences. We illustrate how the proposed wcpo-patterns can enhance sequential data analysis on a toy example

    Mining complex structured data: Enhanced methods and applications

    Get PDF
    Conventional approaches to analysing complex business data typically rely on process models, which are difficult to construct and use. This thesis addresses this issue by converting semi-structured event logs to a simpler flat representation without any loss of information, which then enables direct applications of classical data mining methods. The thesis also proposes an effective and scalable classification method which can identify distinct characteristics of a business process for further improvements

    Large-Scale Pattern-Based Information Extraction from the World Wide Web

    Get PDF
    Extracting information from text is the task of obtaining structured, machine-processable facts from information that is mentioned in an unstructured manner. It thus allows systems to automatically aggregate information for further analysis, efficient retrieval, automatic validation, or appropriate visualization. This work explores the potential of using textual patterns for Information Extraction from the World Wide Web

    Enhancing operational performance of AHUs through an advanced fault detection and diagnosis process based on temporal association and decision rules

    Get PDF
    The pervasive monitoring of HVAC systems through Building Energy Management Systems (BEMSs) is enabling the full exploitation of data-driven based methodologies for performing advanced energy management strategies. In this context, the implementation of Automated Fault Detection and Diagnosis (AFDD) based on collected operational data of Air Handling Units (AHUs) proved to be particularly effective to prevent anomalous running modes which can lead to significant energy waste over time and discomfort conditions in the built environment. The present work proposes a novel methodology for performing AFDD, based on both unsupervised and supervised data-driven methods tailored according to the operation of an AHU during transient and non-transient periods. The whole process is developed and tested on a sample of real data gathered from monitoring campaigns on two identical AHUs in the framework of the Research Project ASHRAE RP-1312. During the start-up period of operation, the methodology exploits Temporal Association Rules Mining (TARM) algorithm for an early detection of faults, while during non-transient period a number of classification models are developed for the identification of the deviation from the normal operation. The proposed methodology, conceived for quasi real-time implementation, proved to be capable of robustly and promptly identifying the presence of typical faults in AHUs

    Representation learning in complex data via pattern discovery

    Full text link
    This study proposes effective methods to learn meaningful representations for complex data such as sequences and graphs. It combines two important techniques in data mining and machine learning: pattern discovery and representation learning. The proposed methods can be applied to different real-world problems including healthcare analysis, business marketing, and bioinformatic

    Pattern Mining and Sense-Making Support for Enhancing the User Experience

    Get PDF
    While data mining techniques such as frequent itemset and sequence mining are well established as powerful pattern discovery tools in domains from science, medicine to business, a detriment is the lack of support for interactive exploration of high numbers of patterns generated with diverse parameter settings and the relationships among the mined patterns. To enhance the user experience, real-time query turnaround times and improved support for interactive mining are desired. There is also an increasing interest in applying data mining solutions for mobile data. Patterns mined over mobile data may enable context-aware applications ranging from automating frequently repeated tasks to providing personalized recommendations. Overall, this dissertation addresses three problems that limit the utility of data mining, namely, (a.) lack of interactive exploration tools for mined patterns, (b.) insufficient support for mining localized patterns, and (c.) high computational mining requirements prohibiting mining of patterns on smaller compute units such as a smartphone. This dissertation develops interactive frameworks for the guided exploration of mined patterns and their relationships. Contributions include the PARAS pre- processing and indexing framework; enabling analysts to gain key insights into rule relationships in a parameter space view due to the compact storage of rules that enables query-time reconstruction of complete rulesets. Contributions also include the visual rule exploration framework FIRE that presents an interactive dual view of the parameter space and the rule space, that together enable enhanced sense-making of rule relationships. This dissertation also supports the online mining of localized association rules computed on data subsets by selectively deploying alternative execution strategies that leverage multidimensional itemset-based data partitioning index. Finally, we designed OLAPH, an on-device context-aware service that learns phone usage patterns over mobile context data such as app usage, location, call and SMS logs to provide device intelligence. Concepts introduced for modeling mobile data as sequences include compressing context logs to intervaled context events, adding generalized time features, and identifying meaningful sequences via filter expressions

    Mining behavioral sequence constraints for classification

    Get PDF

    Mining subjectively interesting patterns in rich data

    Get PDF
    • …
    corecore