Relational Algebra for In-Database Process Mining
The execution logs that are used for process mining in practice are often
obtained by querying an operational database and storing the result in a flat
file. Consequently, the data processing power of the database system cannot be
used anymore for this information, leading to constrained flexibility in the
definition of mining patterns and limited execution performance in mining large
logs. Enabling process mining directly on a database - instead of via
intermediate storage in a flat file - therefore provides additional flexibility
and efficiency. To help facilitate this ideal of in-database process mining,
this paper formally defines a database operator that extracts the 'directly
follows' relation from an operational database. This operator can be used both
to perform in-database process mining and to flexibly evaluate process mining
related queries, such as: "which employee most frequently changes the 'amount'
attribute of a case from one task to the next". We define the operator using
the well-known relational algebra that forms the formal underpinning of
relational databases. We formally prove equivalence properties of the operator
that are useful for query optimization and present time-complexity properties
of the operator. By doing so, this paper formally defines the necessary
relational algebraic elements of a 'directly follows' operator, which are
required for implementation of such an operator in a DBMS.
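As an illustration of the 'directly follows' relation mentioned in this abstract, the following minimal Python sketch counts, per case, how often one activity is immediately followed by another. It is not the paper's relational-algebra operator. The function name `directly_follows`, the `(case_id, activity, timestamp)` tuple layout, and the sample log are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count directly-follows pairs in an event log.

    `events` is an iterable of (case_id, activity, timestamp) tuples;
    this layout is an assumption for the sketch, not the paper's schema.
    """
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((timestamp, activity))

    counts = Counter()
    for trace in by_case.values():
        trace.sort(key=lambda e: e[0])  # order each case by timestamp
        # Pair every event with its immediate successor in the same case.
        for (_, a), (_, b) in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical two-case log used only to exercise the function.
log = [
    ("c1", "register", 1),
    ("c1", "check", 2),
    ("c1", "pay", 3),
    ("c2", "register", 1),
    ("c2", "pay", 2),
]
df = directly_follows(log)
```

In a database setting the same pairs would typically be obtained by joining the event table with itself on the case identifier, which is the kind of expression the abstract proposes to capture as a single operator.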
Co-presence Communities: Using pervasive computing to support weak social networks
Although the strongest social relationships feature most prominently in our lives, we also maintain a multitude of much weaker connections: the distant colleagues that we share a coffee with in the afternoon; the waitress at our regular sandwich bar; or the ‘familiar stranger’ we meet each morning on the way to work. These are all examples of weak relationships which have a strong spatial-temporal component but with few support systems available. This paper explores the idea of ‘Co-presence Communities’ - a probabilistic definition of groups that are regularly collocated - and how they might be used to support weak social networks. An algorithm is presented for mining the Co-presence Community definitions from data collected by Bluetooth-enabled mobile phones. Finally, an example application is introduced which utilises these communities for disseminating information.
Unfolding-Based Process Discovery
This paper presents a novel technique for process discovery. In contrast to
the current trend, which only considers an event log for discovering a process
model, we assume two additional inputs: an independence relation on the set of
logged activities, and a collection of negative traces. After deriving an
intermediate net unfolding from them, we perform a controlled folding giving
rise to a Petri net which contains both the input log and all
independence-equivalent traces arising from it. Remarkably, the derived Petri
net cannot execute any trace from the negative collection. The entire chain of
transformations is fully automated. A tool has been developed and experimental
results are provided that witness the significance of the contribution of this
paper. Comment: This is the unabridged version of a paper with the same title
that appeared in the proceedings of ATVA 201
Detecting Flow Anomalies in Distributed Systems
Deep within the networks of distributed systems, one often finds anomalies
that affect their efficiency and performance. These anomalies are difficult to
detect because the distributed systems may not have sufficient sensors to
monitor the flow of traffic within the interconnected nodes of the networks.
Without early detection and correction, these anomalies may worsen
over time and could cause disastrous outcomes in the system in the
future. Using only coarse-grained information from the two end
points of network flows, we propose a network transmission model and a
localization algorithm, to detect the location of anomalies and rank them using
a proposed metric within distributed systems. We evaluate our approach on
passengers' records of an urbanized city's public transportation system and
correlate our findings with passengers' postings on social media microblogs.
Our experiments show that the metric derived using our localization algorithm
gives a better ranking of anomalies as compared to standard deviation measures
from statistical models. Our case studies also demonstrate that transportation
events reported in social media microblogs match the locations of our detected
anomalies, suggesting that our algorithm performs well in locating the
anomalies within distributed systems.
Temporal decomposition and semantic enrichment of mobility flows
Mobility data has increasingly grown in volume over the past decade as
localisation technologies for capturing mobility flows have become ubiquitous.
Novel analytical approaches for understanding and structuring mobility data
are now required to support the back end of a new generation of space-time GIS
systems. This data has become increasingly important as GIS is now an essential
decision support platform in many domains that use mobility data, such as
fleet management, accessibility analysis and urban transportation planning.
This thesis applies the machine learning method of probabilistic topic
modelling to decompose and semantically enrich mobility flow data. This process
annotates mobility flows with semantic meaning by fusing them with
geographically referenced social media data. This thesis also explores the
relationship between causality and correlation, as well as the predictability
of semantic decompositions obtained during a case study using a real mobility
dataset.
Spatial And Temporal Patterns Of Geo-Tagged Tweets
With over 500 million current registered users and over 500 million tweets per day, Twitter has caught the attention of scientists in various disciplines. As Twitter allows users to send messages with location tags, a massive amount of valuable geo-social knowledge is embedded in tweets, which can provide useful implications for human geography, urban science, location-based services, targeted advertising, and social network studies. This thesis aims to determine the lifestyle patterns of college students by analyzing the spatial and temporal dynamics in their tweets. Geo-tagged tweets are collected over a period of six months for four US Midwestern college cities: 1) West Lafayette, Indiana (Purdue University); 2) Bloomington, Indiana (Indiana University); 3) Ann Arbor, Michigan (University of Michigan); 4) Columbus, Ohio (The Ohio State University). The overall distribution of the tweets was determined for each city, and the spatial patterns of representative individuals were examined as well. Grouping the tweets in time domains, the temporal patterns on an hourly, daily, and monthly basis were analyzed. Utilizing detailed land use data for each city, further insight about the thematic properties of the tweeting locations was obtained, leading to a deeper understanding of the life, mobility and flow patterns of Twitter users. Finally, space-time clusters and anomalies within tweets, which were considered events, were found with space-time statistics. The results generally reflected everyday human activity patterns, including the mobile population in each city as well as the commute behaviors of the representative users. The tweets also consistently revealed the occurrence of anomalies or events. The results of this thesis therefore confirmed the feasibility and promising future of using geo-tagged micro-blogging services such as Twitter in understanding human behavior patterns and in other geo-social studies.
Modeling Interdependent and Periodic Real-World Action Sequences
Mobile health applications, including those that track activities such as
exercise, sleep, and diet, are becoming widely used. Accurately predicting
human actions is essential for targeted recommendations that could improve our
health and for personalization of these applications. However, making such
predictions is extremely difficult due to the complexities of human behavior,
which consists of a large number of potential actions that vary over time,
depend on each other, and are periodic. Previous work has not jointly modeled
these dynamics and has largely focused on item consumption patterns instead of
broader types of behaviors such as eating, commuting or exercising. In this
work, we develop a novel statistical model for Time-varying, Interdependent,
and Periodic Action Sequences. Our approach is based on personalized,
multivariate temporal point processes that model time-varying action
propensities through a mixture of Gaussian intensities. Our model captures
short-term and long-term periodic interdependencies between actions through
Hawkes process-based self-excitations. We evaluate our approach on two activity
logging datasets comprising 12 million actions taken by 20 thousand users over
17 months. We demonstrate that our approach allows us to make successful
predictions of future user actions and their timing. Specifically, our model
improves predictions of actions, and their timing, over existing methods across
multiple datasets by up to 156%, and up to 37%, respectively. Performance
improvements are particularly large for relatively rare and periodic actions
such as walking and biking, improving over baselines by up to 256%. This
demonstrates that explicit modeling of dependencies and periodicities in
real-world behavior enables successful predictions of future actions, with
implications for modeling human behavior, app personalization, and targeting of
health interventions. Comment: Accepted at WWW 201
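To make the idea of Hawkes-style self-excitation concrete, here is a minimal Python sketch of the textbook univariate Hawkes conditional intensity with an exponential kernel, lambda(t) = mu + sum over past events s < t of alpha * exp(-beta * (t - s)). This is not the model from the abstract, which is personalized, multivariate, and uses a mixture of Gaussian intensities. The function name `hawkes_intensity` and the parameter values for `mu`, `alpha`, and `beta` are assumptions chosen only for illustration.

```python
import math


def hawkes_intensity(t, history, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a univariate exponential-kernel Hawkes process.

    mu    : constant base rate (assumed value)
    alpha : jump in intensity caused by each past event (assumed value)
    beta  : exponential decay rate of that excitation (assumed value)
    Only events strictly before time t contribute.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - s)) for s in history if s < t
    )
    return mu + excitation


# Hypothetical event times: each past action temporarily raises the rate.
past_events = [1.0, 1.5]
rate_soon_after = hawkes_intensity(2.0, past_events)
rate_much_later = hawkes_intensity(10.0, past_events)
```

Shortly after a burst of events the intensity is elevated, and it decays back toward the base rate `mu` as time passes, which is the qualitative behavior that self-excitation is meant to capture.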