
    Interactive Interpretation of Serial Episodes: Experiments in musical analysis

    The context of this work is the study of sequential data that can be represented as sequences of timestamped events. The aim is to explore these sequences with sequence mining to discover serial episodes, i.e. event subsequences that occur frequently in the data (Mannila et al., 1997). The domain of melodic analysis is studied in this work: the aim is to highlight the structure of a musical piece by discovering its main melodic patterns. The episodes produced by the miner are examined by a user, generally a domain expert, who has to identify relevant episodes and interpret them. During the interpretation step, the user faces a recurrent overabundance of mining results, which makes it difficult to identify the interesting ones. There is a real need for a rigorous approach to methodically manage this step and assist the user's work. To this end, we propose a visual and interactive approach to assist the interpretation of serial episodes. The interpretation task is assisted by managing combinatorial redundancy in order to focus on relevant episodes. The assistance iteratively combines ranking and filtering of useless episodes to help the user focus on relevant ones. It has been exemplified in the Transmute prototype, a web-based application enabling user interaction with event sequences and serial episodes, which are represented graphically on a timeline with customisable icons. The interpretation process consists of three main iterative steps: ranking, selection, and filtering. The user can choose measures to rank episodes and then select among them to display their occurrences in the sequence. When a choice is made, a filtering process is triggered to remove episodes that can no longer be selected given the user's previous selections.
Finally, the user can interpret the episodes by attaching annotations to them and recording the resulting model in a knowledge base. The ranking of episodes is performed using several objective interestingness measures, which estimate the relative importance and compactness of the episodes in the sequence. The first measure is the event coverage indicator, the number of distinct events belonging to the occurrences of an episode. The second measure is the spreading indicator, the number of events of the sequence falling within the time intervals of the episode occurrences. The noise indicator is the difference between these two indicators and corresponds to the number of events within the occurrence intervals that do not belong to the episode itself. Temporal measures may also be used when event durations are known. The selection of an episode by the user triggers the filtering process, which is based on the event coverage of the selected episode. The remaining episodes are examined, and occurrences having at least one event in common with that coverage are discarded. Support counts are then updated, and episodes whose support falls below the given frequency threshold are removed. This removes combinatorial redundancy around the chosen episode and gradually reduces the number of remaining episodes, allowing the user to focus on the others.
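The filtering step described above can be sketched in a few lines. This is a minimal illustration, not the Transmute implementation, and it assumes a hypothetical representation in which each episode maps to a list of occurrences and each occurrence is the set of sequence-event indices it covers:

```python
def filter_episodes(episodes, selected, min_support):
    """Discard occurrences overlapping the selected episode's event
    coverage, then drop episodes whose support falls below threshold."""
    # event coverage = distinct events of all occurrences of the selection
    coverage = set().union(*episodes[selected])
    remaining = {}
    for name, occurrences in episodes.items():
        if name == selected:
            continue
        # an occurrence sharing at least one event with the coverage is discarded
        kept = [occ for occ in occurrences if not occ & coverage]
        if len(kept) >= min_support:  # support = occurrences still valid
            remaining[name] = kept
    return remaining

episodes = {
    "A": [{0, 1}, {4, 5}],          # the episode the user selects
    "B": [{1, 2}, {6, 7}, {8, 9}],  # loses one occurrence, stays frequent
    "C": [{0, 3}, {5, 6}],          # loses all occurrences, is removed
}
print(filter_episodes(episodes, "A", min_support=2))
```

After selecting "A", episode "C" disappears entirely and "B" keeps only its two non-overlapping occurrences, which is exactly the gradual diminution the abstract describes.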

    Discovering Predictive Event Sequences in Criminal Careers

    In this work, we consider the problem of predicting criminal behavior and propose a method for discovering predictive patterns in criminal histories. Quantitative criminal career analysis typically involves clustering individuals according to the frequency of a particular event type over time, using cluster membership as a basis for comparison. We demonstrate the effectiveness of hazard pattern mining for discovering relationships between different types of events that may occur in criminal careers. Hazard pattern mining is an extension of event sequence mining, with the additional restriction that each event in the pattern is the first subsequent event of the specified type. This restriction facilitates the application of established time-based measures such as those used in survival analysis. We evaluate hazard patterns using a relative risk model and an accelerated failure time model. The results show that hazard patterns can reliably capture unexpected relationships between events of different types.
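The "first subsequent event" restriction can be illustrated with a short sketch. This is not the paper's implementation; it assumes events are given as (timestamp, type) pairs sorted by time, and returns the matched timestamps, from which survival-style waiting times could be derived:

```python
def hazard_match(events, pattern):
    """Match `pattern` (a list of event types) against `events`
    (sorted (timestamp, type) pairs) under the hazard restriction:
    each element must be matched by the *first* subsequent event of
    its type. Returns matched timestamps, or None on failure."""
    times = []
    i = 0
    for wanted in pattern:
        # scan forward to the first subsequent event of the wanted type
        while i < len(events) and events[i][1] != wanted:
            i += 1
        if i == len(events):
            return None  # no subsequent event of this type exists
        times.append(events[i][0])
        i += 1
    return times

history = [(0, "arrest"), (2, "conviction"), (3, "arrest"), (7, "prison")]
print(hazard_match(history, ["arrest", "prison"]))   # [0, 7]
```

Because each match is forced to the first subsequent event, the embedding is unique, so time gaps between matched events are well defined, which is what makes survival-analysis measures applicable.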

    Discovering human activities from binary data in smart homes

    With the rapid development of sensing technology, data mining, and machine learning for human health monitoring, it has become possible to monitor personal motion and vital signs in a manner that minimizes the disruption of an individual’s daily routine and assists individuals who have difficulty living independently at home. A primary difficulty that researchers confront is acquiring an adequate amount of labeled data for model training and validation. Activity discovery therefore addresses the problem of unavailable activity labels using approaches based on sequence mining and clustering. In this paper, we introduce an unsupervised method for discovering activities from a network of motion detectors in a smart home setting. First, we present an intra-day clustering algorithm to find frequent sequential patterns within a day. As a second step, we present an inter-day clustering algorithm to find the common frequent patterns between days. Furthermore, we refine the patterns to obtain more compressed and better-defined cluster characterizations. Finally, we track the occurrences of various regular routines to monitor the functional health reflected in an individual’s patterns and lifestyle. We evaluate our methods on two public data sets captured in real-life settings from two apartments during seven-month and three-month periods.
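The intra-day step above can be caricatured with a tiny sketch. This is a deliberate simplification (not the paper's algorithm): one day's binary sensor firings are treated as an ordered list of sensor IDs, and recurring contiguous n-grams stand in for frequent sequential patterns:

```python
from collections import Counter

def frequent_day_patterns(firings, length, min_count):
    """firings: ordered list of sensor IDs fired during one day.
    Returns contiguous patterns of the given length occurring at
    least min_count times that day."""
    grams = Counter(tuple(firings[i:i + length])
                    for i in range(len(firings) - length + 1))
    return {g: c for g, c in grams.items() if c >= min_count}

day = ["kitchen", "stove", "kitchen", "stove", "bed"]
print(frequent_day_patterns(day, 2, 2))  # {('kitchen', 'stove'): 2}
```

The inter-day step would then intersect such per-day pattern sets to keep only the routines common across days.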

    Explainable temporal data mining techniques to support the prediction task in Medicine

    In the last decades, the increasing amount of data available in all fields has raised the necessity to discover new knowledge and explain the hidden information found. On one hand, the rapid increase of interest in, and use of, artificial intelligence (AI) in computer applications has raised a parallel concern about its ability (or lack thereof) to provide understandable, or explainable, results to users. In the biomedical informatics and computer science communities, there is considerable discussion about the "un-explainable" nature of artificial intelligence, where algorithms and systems often leave users, and even developers, in the dark with respect to how results were obtained. Especially in the biomedical context, the necessity to explain the results of an artificial intelligence system is legitimated by the importance of patient safety. On the other hand, current database systems enable us to store huge quantities of data. Their analysis through data mining techniques provides the possibility to extract relevant knowledge and useful hidden information. Relationships and patterns within these data could provide new medical knowledge. The analysis of such healthcare/medical data collections could greatly help to observe the health conditions of the population and extract useful information that can be exploited in the assessment of healthcare/medical processes. In particular, the prediction of medical events is essential for preventing disease, understanding disease mechanisms, and increasing the quality of patient care. In this context, an important aspect is to verify whether the database content supports the capability of predicting future events. In this thesis, we start by addressing the problem of explainability, discussing some of the most significant challenges that need to be addressed with scientific and engineering rigor in a variety of biomedical domains.
We analyze the "temporal component" of explainability, detailing different perspectives such as: the use of temporal data, the temporal task, temporal reasoning, and the dynamics of explainability with respect to the user perspective and to knowledge. Starting from this panorama, we focus our attention on two different temporal data mining techniques. For the first one, based on trend abstractions, starting from the concept of Trend-Event Pattern and moving through the concept of prediction, we propose a new kind of predictive temporal pattern, namely Predictive Trend-Event Patterns (PTE-Ps). The framework aims to combine complex temporal features to extract a compact and non-redundant predictive set of patterns composed of such temporal features. For the second one, based on functional dependencies, we propose a methodology for deriving a new kind of approximate temporal functional dependency, called Approximate Predictive Functional Dependencies (APFDs), based on a three-window framework. We then discuss the concept of approximation, the data complexity of deriving an APFD, the introduction of two new error measures, and finally the quality of APFDs in terms of coverage and reliability. Exploiting these methodologies, we analyze intensive care unit data from the MIMIC dataset.
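The three-window idea above can be sketched under assumed window semantics (the thesis's precise definitions may differ): a patient's history is split into an observation window, from which predictive features are drawn, a gap (waiting) window, and a prediction window, in which the target event is looked for.

```python
def split_windows(events, t0, obs_len, gap_len, pred_len):
    """events: list of (timestamp, label) pairs; t0: start of the
    observation window. Returns (observation labels, prediction labels);
    events in the gap window are ignored."""
    obs, pred = [], []
    pred_start = t0 + obs_len + gap_len
    for t, label in events:
        if t0 <= t < t0 + obs_len:
            obs.append(label)                      # source of features
        elif pred_start <= t < pred_start + pred_len:
            pred.append(label)                     # target events
    return obs, pred

# hypothetical timestamped labs/diagnoses for one patient
history = [(0, "creatinine"), (5, "urea"), (12, "AKI")]
print(split_windows(history, t0=0, obs_len=6, gap_len=3, pred_len=5))
```

Here the labels in the observation window would feed pattern derivation, while the prediction window determines whether the predicted event actually occurred.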

    Mining frequent itemsets: a perspective from operations research

    Many papers on frequent itemsets have been published, and several contests in this field have been held. In the majority of these papers the focus is on speed, and ad hoc algorithms and data structures were introduced. In this paper we put most of the algorithms in one framework, using classical Operations Research paradigms such as backtracking, depth-first and breadth-first search, and branch-and-bound. Moreover, we present experimental results where the different algorithms are implemented under similar designs. Keywords: data mining; operations research; frequent itemsets.
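The backtracking/depth-first paradigm mentioned above can be illustrated with a compact Eclat-style miner. This is a generic sketch in the spirit of that framework, not one of the paper's benchmarked implementations; it represents each candidate item by its tidset (the set of transaction IDs containing it) and prunes branches whose support drops below the threshold:

```python
def eclat(transactions, min_support):
    """Depth-first frequent-itemset mining over tidsets."""
    # map each item to the set of transaction IDs containing it
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def recurse(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_support:
                continue  # prune this branch (branch-and-bound flavour)
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # extend depth-first with the remaining items,
            # intersecting tidsets to get the extension's support
            extensions = [(nxt, tids & ntids)
                          for nxt, ntids in candidates[i + 1:]]
            recurse(itemset, extensions)

    recurse((), sorted(tidsets.items()))
    return frequent

data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
print(eclat(data, min_support=2))
```

The recursion is the backtracking search the abstract alludes to: each call extends a prefix itemset, and returning from the call undoes the extension.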

    Discovering frequent episodes and learning hidden Markov models: a formal connection
