
    Relational Algebra for In-Database Process Mining

    The execution logs that are used for process mining in practice are often obtained by querying an operational database and storing the result in a flat file. Consequently, the data processing power of the database system can no longer be used for this information, which constrains flexibility in the definition of mining patterns and limits execution performance when mining large logs. Enabling process mining directly on a database - instead of via intermediate storage in a flat file - therefore provides additional flexibility and efficiency. To help facilitate this ideal of in-database process mining, this paper formally defines a database operator that extracts the 'directly follows' relation from an operational database. This operator can be used both to do in-database process mining and to flexibly evaluate process-mining-related queries, such as: "which employee most frequently changes the 'amount' attribute of a case from one task to the next". We define the operator using the well-known relational algebra that forms the formal underpinning of relational databases. We formally prove equivalence properties of the operator that are useful for query optimization and present time-complexity properties of the operator. By doing so, this paper formally defines the relational algebraic elements of a 'directly follows' operator that are required to implement such an operator in a DBMS.
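
    As a rough illustration of the 'directly follows' relation described in this abstract, the sketch below counts, per case, each pair of activities where the second immediately follows the first. It is not the paper's relational algebra operator: the function name `directly_follows`, the in-memory event tuples `(case_id, activity, timestamp)`, and the sample log are all assumptions made for this example.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def directly_follows(events):
    """Count directly-follows pairs in an event log.

    `events` is an iterable of (case_id, activity, timestamp) tuples.
    This tuple layout is an assumption for the sketch, not the paper's schema.
    """
    # Order events by case, then by timestamp within each case.
    ordered = sorted(events, key=itemgetter(0, 2))
    counts = Counter()
    for _, group in groupby(ordered, key=itemgetter(0)):
        activities = [activity for _, activity, _ in group]
        # Pair each activity with its immediate successor in the same case.
        counts.update(zip(activities, activities[1:]))
    return counts


# Hypothetical log: two cases with integer timestamps.
log = [
    (1, "register", 1),
    (1, "check", 2),
    (1, "pay", 3),
    (2, "register", 1),
    (2, "pay", 2),
]
pairs = directly_follows(log)
```

    Inside a DBMS the same idea would be expressed as a query over the event table rather than in application code, which is the setting the abstract argues for.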

    Co-presence Communities: Using pervasive computing to support weak social networks

    Although the strongest social relationships feature most prominently in our lives, we also maintain a multitude of much weaker connections: the distant colleagues that we share a coffee with in the afternoon; the waitress at our regular sandwich bar; or the ‘familiar stranger’ we meet each morning on the way to work. These are all examples of weak relationships that have a strong spatial-temporal component but few support systems available. This paper explores the idea of ‘Co-presence Communities’ - a probabilistic definition of groups that are regularly co-located - and how they might be used to support weak social networks. An algorithm is presented for mining the Co-presence Community definitions from data collected by Bluetooth-enabled mobile phones. Finally, an example application is introduced which utilises these communities for disseminating information.

    Unfolding-Based Process Discovery

    This paper presents a novel technique for process discovery. In contrast to the current trend, which only considers an event log for discovering a process model, we assume two additional inputs: an independence relation on the set of logged activities, and a collection of negative traces. After deriving an intermediate net unfolding from them, we perform a controlled folding giving rise to a Petri net which contains both the input log and all independence-equivalent traces arising from it. Remarkably, the derived Petri net cannot execute any trace from the negative collection. The entire chain of transformations is fully automated. A tool has been developed and experimental results are provided that witness the significance of the contribution of this paper. Comment: This is the unabridged version of a paper with the same title that appeared in the proceedings of ATVA 201

    Detecting Flow Anomalies in Distributed Systems

    Deep within the networks of distributed systems, one often finds anomalies that affect their efficiency and performance. These anomalies are difficult to detect because the distributed systems may not have sufficient sensors to monitor the flow of traffic within the interconnected nodes of the networks. Without early detection and correction, these anomalies may worsen over time and could cause disastrous outcomes in the system. Using only coarse-grained information from the two end points of network flows, we propose a network transmission model and a localization algorithm to detect the location of anomalies within distributed systems and rank them using a proposed metric. We evaluate our approach on passengers' records of an urbanized city's public transportation system and correlate our findings with passengers' postings on social media microblogs. Our experiments show that the metric derived using our localization algorithm gives a better ranking of anomalies than standard deviation measures from statistical models. Our case studies also demonstrate that transportation events reported in social media microblogs match the locations of our detected anomalies, suggesting that our algorithm performs well in locating the anomalies within distributed systems.

    Temporal decomposition and semantic enrichment of mobility flows

    Mobility data has increasingly grown in volume over the past decade as localisation technologies for capturing mobility flows have become ubiquitous. Novel analytical approaches for understanding and structuring mobility data are now required to support the back end of a new generation of space-time GIS systems. This data has become increasingly important as GIS is now an essential decision support platform in many domains that use mobility data, such as fleet management, accessibility analysis and urban transportation planning. This thesis applies the machine learning method of probabilistic topic modelling to decompose and semantically enrich mobility flow data. This process annotates mobility flows with semantic meaning by fusing them with geographically referenced social media data. This thesis also explores the relationship between causality and correlation, as well as the predictability of semantic decompositions obtained during a case study using a real mobility dataset.

    Spatial And Temporal Patterns Of Geo-Tagged Tweets

    With over 500 million currently registered users and over 500 million tweets per day, Twitter has caught the attention of scientists in various disciplines. As Twitter allows users to send messages with location tags, a massive amount of valuable geo-social knowledge is embedded in tweets, which can provide useful implications for human geography, urban science, location-based services, targeted advertising, and social network studies. This thesis aims to determine the lifestyle patterns of college students by analyzing the spatial and temporal dynamics in their tweets. Geo-tagged tweets were collected over a period of six months for four US Midwestern college cities: 1) West Lafayette, Indiana (Purdue University); 2) Bloomington, Indiana (Indiana University); 3) Ann Arbor, Michigan (University of Michigan); 4) Columbus, Ohio (The Ohio State University). The overall distribution of the tweets was determined for each city, and the spatial patterns of representative individuals were examined as well. Grouping the tweets in time domains, the temporal patterns on an hourly, daily, and monthly basis were analyzed. Utilizing detailed land use data for each city, further insight about the thematic properties of the tweeting locations was obtained, leading to a deeper understanding of the life, mobility and flow patterns of Twitter users. Finally, space-time clusters and anomalies within tweets, which were considered events, were found with space-time statistics. The results generally reflected everyday human activity patterns, including the mobile population in each city as well as the commute behaviors of the representative users. The tweets also consistently revealed the occurrence of anomalies or events. The results of this thesis therefore confirmed the feasibility and promising future of using geo-tagged micro-blogging services such as Twitter in understanding human behavior patterns and in other geo-social studies.

    Modeling Interdependent and Periodic Real-World Action Sequences

    Mobile health applications, including those that track activities such as exercise, sleep, and diet, are becoming widely used. Accurately predicting human actions is essential for targeted recommendations that could improve our health and for personalization of these applications. However, making such predictions is extremely difficult due to the complexities of human behavior, which consists of a large number of potential actions that vary over time, depend on each other, and are periodic. Previous work has not jointly modeled these dynamics and has largely focused on item consumption patterns instead of broader types of behaviors such as eating, commuting or exercising. In this work, we develop a novel statistical model for Time-varying, Interdependent, and Periodic Action Sequences. Our approach is based on personalized, multivariate temporal point processes that model time-varying action propensities through a mixture of Gaussian intensities. Our model captures short-term and long-term periodic interdependencies between actions through Hawkes process-based self-excitations. We evaluate our approach on two activity logging datasets comprising 12 million actions taken by 20 thousand users over 17 months. We demonstrate that our approach allows us to make successful predictions of future user actions and their timing. Specifically, our model improves predictions of actions, and their timing, over existing methods across multiple datasets by up to 156%, and up to 37%, respectively. Performance improvements are particularly large for relatively rare and periodic actions such as walking and biking, improving over baselines by up to 256%. This demonstrates that explicit modeling of dependencies and periodicities in real-world behavior enables successful predictions of future actions, with implications for modeling human behavior, app personalization, and targeting of health interventions. Comment: Accepted at WWW 201
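
    To make the notion of Hawkes-style self-excitation mentioned in this abstract more concrete, the sketch below evaluates a textbook univariate intensity: a constant base rate plus a decaying contribution from each past event. It is not the paper's model, which is multivariate and uses Gaussian mixture intensities; the exponential kernel, the function name `hawkes_intensity`, and the parameter values `base_rate`, `alpha`, and `beta` are all assumptions chosen for illustration.

```python
import math


def hawkes_intensity(t, history, base_rate=0.1, alpha=0.5, beta=1.0):
    """Intensity at time t of a univariate Hawkes process.

    Assumes an exponential kernel alpha * exp(-beta * dt); the kernel and
    all default parameter values are illustrative, not taken from the paper.
    """
    # Each past event raises the intensity, with an effect that decays over time.
    excitation = sum(
        alpha * math.exp(-beta * (t - s)) for s in history if s < t
    )
    return base_rate + excitation


# Hypothetical history: a single event at time 0.
soon_after = hawkes_intensity(0.5, [0.0])
much_later = hawkes_intensity(5.0, [0.0])
no_history = hawkes_intensity(5.0, [])
```

    With these assumed parameters the intensity is highest shortly after an event and decays back toward the base rate, which is the self-exciting behaviour the abstract refers to.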