4,763 research outputs found

    Mining top-k regular episodes from sensor streams

    Get PDF
    International audienceThe monitoring of human activities plays an important role in health-care applications and for the data mining community. Existing approaches work on activities recognition occurring in sensor data streams. However, regular behaviors have not been studied. Thus, we here introduce a new approach to discover top-k most regular episodes from sensors streams, TKRES. The top-k approach allows us to control the size of the output, thus preventing overwhelming result analysis for the supervisor. TKRES is based on the use of a simple top-k list and a k-tree structure for maintaining the top-k episodes and their occurrence information. We also investigate and report the performances of TKRES on two real-life smart home datasets

    Mining Weighted Frequent Closed Episodes over Multiple Sequences

    Get PDF
    Frequent episode discovery is introduced to mine useful and interesting temporal patterns from sequential data. The existing episode mining methods mainly focused on mining from a single long sequence consisting of events with time constraints. However, there can be multiple sequences of different importance as the persons or entities associated with each sequence can be of different importance. Aiming to mine episodes in multiple sequences of different importance, we first define a new kind of episodes, i.e., the weighted frequent closed episodes, to take sequence importance, episode distribution and occurrence frequency into account together. Secondly, to facilitate the mining of such new episodes, we present a new concept called maximal duration serial episodes to cut a whole sequence into multiple maximum episodes using duration constraints, and discuss its properties for episode shrinking processing. Finally, based on the theoretical properties, we propose a two-phase approach to efficiently mine these new episodes. In Phase I, we adopt a level-wise episode shrinking framework to discover the candidate frequent closed episodes with the same prefixes, and in Phase II, we match the candidates with different prefixes to find the frequent close episodes. Experiments on simulated and real datasets demonstrate that the proposed episode mining strategy has good mining effectiveness and efficiency

    Predictive Handling of Asynchronous Concept Drifts in Distributed Environments

    Get PDF
    In a distributed computing environment, peers collaboratively learn to classify concepts of interest from each other. When external changes happen and their concepts drift, the peers should adapt to avoid increase in misclassification errors. The problem of adaptation becomes more difficult when the changes are asynchronous, i.e., when peers experience drifts at different times. We address this problem by developing an ensemble approach, PINE, that combines reactive adaptation via drift detection, and proactive handling of upcoming changes via early warning and adaptation across the peers. With empirical study on simulated and real world datasets, we show that PINE handles asynchronous concept drifts better and faster than current state-of-the-art approaches, which have been designed to work in less challenging environments. In addition, PINE is parameter insensitive and incurs less communication cost while achieving better accuracy. Keywords: ClassiÂżcation, Distributed Systems, Concept Drift

    Continuous Estimation of Smoking Lapse Risk from Noisy Wrist Sensor Data Using Sparse and Positive-Only Labels

    Get PDF
    Estimating the imminent risk of adverse health behaviors provides opportunities for developing effective behavioral intervention mechanisms to prevent the occurrence of the target behavior. One of the key goals is to find opportune moments for intervention by passively detecting the rising risk of an imminent adverse behavior. Significant progress in mobile health research and the ability to continuously sense internal and external states of individual health and behavior has paved the way for detecting diverse risk factors from mobile sensor data. The next frontier in this research is to account for the combined effects of these risk factors to produce a composite risk score of adverse behaviors using wearable sensors convenient for daily use. Developing a machine learning-based model for assessing the risk of smoking lapse in the natural environment faces significant outstanding challenges requiring the development of novel and unique methodologies for each of them. The first challenge is coming up with an accurate representation of noisy and incomplete sensor data to encode the present and historical influence of behavioral cues, mental states, and the interactions of individuals with their ever-changing environment. The next noteworthy challenge is the absence of confirmed negative labels of low-risk states and adequate precise annotations of high-risk states. Finally, the model should work on convenient wearable devices to facilitate widespread adoption in research and practice. In this dissertation, we develop methods that account for the multi-faceted nature of smoking lapse behavior to train and evaluate a machine learning model capable of estimating composite risk scores in the natural environment. We first develop mRisk, which combines the effects of various mHealth biomarkers such as stress, physical activity, and location history in producing the risk of smoking lapse using sequential deep neural networks. We propose an event-based encoding of sensor data to reduce the effect of noises and then present an approach to efficiently model the historical influence of recent and past sensor-derived contexts on the likelihood of smoking lapse. To circumvent the lack of confirmed negative labels (i.e., annotated low-risk moments) and only a few positive labels (i.e., sensor-based detection of smoking lapse corroborated by self-reports), we propose a new loss function to accurately optimize the models. We build the mRisk models using biomarker (stress, physical activity) streams derived from chest-worn sensors. Adapting the models to work with less invasive and more convenient wrist-based sensors requires adapting the biomarker detection models to work with wrist-worn sensor data. To that end, we develop robust stress and activity inference methodologies from noisy wrist-sensor data. We first propose CQP, which quantifies wrist-sensor collected PPG data quality. Next, we show that integrating CQP within the inference pipeline improves accuracy-yield trade-offs associated with stress detection from wrist-worn PPG sensors in the natural environment. mRisk also requires sensor-based precise detection of smoking events and confirmation through self-reports to extract positive labels. Hence, we develop rSmoke, an orientation-invariant smoking detection model that is robust to the variations in sensor data resulting from orientation switches in the field. We train the proposed mRisk risk estimation models using the wrist-based inferences of lapse risk factors. To evaluate the utility of the risk models, we simulate the delivery of intelligent smoking interventions to at-risk participants as informed by the composite risk scores. Our results demonstrate the envisaged impact of machine learning-based models operating on wrist-worn wearable sensor data to output continuous smoking lapse risk scores. The novel methodologies we propose throughout this dissertation help instigate a new frontier in smoking research that can potentially improve the smoking abstinence rate in participants willing to quit

    R-CAD: Rare Cyber Alert Signature Relationship Extraction Through Temporal Based Learning

    Get PDF
    The large number of streaming intrusion alerts make it challenging for security analysts to quickly identify attack patterns. This is especially difficult since critical alerts often occur too rarely for traditional pattern mining algorithms to be effective. Recognizing the attack speed as an inherent indicator of differing cyber attacks, this work aggregates alerts into attack episodes that have distinct attack speeds, and finds attack actions regularly co-occurring within the same episode. This enables a novel use of the constrained SPADE temporal pattern mining algorithm to extract consistent co-occurrences of alert signatures that are indicative of attack actions that follow each other. The proposed Rare yet Co-occurring Attack action Discovery (R-CAD) system extracts not only the co-occurring patterns but also the temporal characteristics of the co-occurrences, giving the `strong rules\u27 indicative of critical and repeated attack behaviors. Through the use of a real-world dataset, we demonstrate that R-CAD helps reduce the overwhelming volume and variety of intrusion alerts to a manageable set of co-occurring strong rules. We show specific rules that reveal how critical attack actions follow one another and in what attack speed
    • …
    corecore