
    Discovering Unexpected Patterns in Temporal Data Using Temporal Logic

    Much attention has recently been given to the task of finding interesting patterns in temporal databases. Because there are many different approaches to discovering temporal patterns, we first present a characterization of the different discovery tasks and then focus on one of them: discovering interesting patterns of events in temporal sequences. Given an (infinite) temporal database or a sequence of events, one can in general discover an infinite number of temporal patterns, so it is important to specify a measure of interestingness for discovered patterns and to select only the patterns that are interesting according to this measure. We present a probabilistic measure of interestingness based on unexpectedness, whereby a pattern P is deemed interesting if the ratio of the actual number of occurrences of P to the expected number of occurrences of P exceeds a user-defined threshold. We then use a subset of propositional linear temporal logic and present an efficient algorithm that discovers unexpected patterns in temporal data. Finally, we apply this algorithm to synthetic data, UNIX operating system calls, and Web log files, and present the results of these experiments. (Information Systems Working Papers Series)
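The interestingness test above can be sketched in a few lines. This is a minimal illustration only: it counts contiguous occurrences of an event subsequence and computes the expected count under a simple independence assumption, whereas the paper works with temporal-logic patterns and its own probabilistic model; all names are illustrative.

```python
from collections import Counter

def unexpectedness(sequence, pattern, threshold):
    """Return (ratio, is_interesting) for a contiguous event pattern.

    Expected count assumes events occur independently with their
    empirical frequencies -- a simplification for illustration, not
    the paper's exact probabilistic model.
    """
    n, k = len(sequence), len(pattern)
    freq = Counter(sequence)
    # Actual number of contiguous occurrences of the pattern.
    actual = sum(1 for i in range(n - k + 1)
                 if tuple(sequence[i:i + k]) == tuple(pattern))
    # Expected occurrences under independence: (n - k + 1) * prod p(e).
    p = 1.0
    for e in pattern:
        p *= freq[e] / n
    expected = (n - k + 1) * p
    ratio = actual / expected if expected > 0 else float("inf")
    return ratio, ratio > threshold

# "ab" occurs 4 times; under independence about 1.44 are expected.
ratio, interesting = unexpectedness(list("ababababcc"), ("a", "b"), 2.0)
```

A pattern is thus reported only when it occurs markedly more often than chance would suggest, which filters out the infinitely many unremarkable patterns any sequence contains.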

    Spatio-temporal pattern mining from global positioning systems (GPS) trajectories dataset

    Dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science in Geospatial Technologies.
    The increasing use of location-acquisition technology such as the Global Positioning System is leading to the collection of large spatio-temporal datasets. These datasets offer the prospect of discovering usable knowledge about movement behaviour, including interesting relationships and user characteristics that may exist implicitly in spatial databases; spatial data mining is therefore emerging as a novel area of research. In this study, the experiments followed the Knowledge Discovery in Databases (KDD) process model, which starts with selection of the datasets. The GPS trajectory dataset for this research was collected from the Microsoft Research Asia GeoLife project. The data were then preprocessed; the major preprocessing activities included filling in missing values, removing outliers, resolving inconsistencies, integrating labelled and unlabelled data, dimensionality reduction, size reduction, and data transformation such as discretization. A total of 4,273 trajectory records were used for training the models, and a separate set of 1,018 records was used as a testing set to validate the performance of the selected model. To build the spatio-temporal model, the K-nearest neighbours (KNN), decision tree, and Bayes algorithms were tested as supervised approaches. The model created using 10-fold cross-validation with K = 11 and otherwise default parameter values showed the best classification accuracy, with a prediction accuracy of 98.5% on the training set and 93.12% on the test set when classifying new instances into the bike, bus, car, subway, train, and walk classes.
    The findings of this study show that spatio-temporal data mining methods help to classify users' transportation modes. Future research directions towards an applicable system in the study area are suggested.
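The core of the KNN approach described above can be sketched as follows. The feature choice (mean speed and acceleration per trajectory segment), the training points, and k=3 are illustrative assumptions, not the study's actual 4,273-record setup with K = 11.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify a feature vector by majority vote among its k nearest
    labelled neighbours under Euclidean distance."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Illustrative trajectory features: (mean speed m/s, mean |accel| m/s^2).
train = [((1.3, 0.2), "walk"), ((1.5, 0.3), "walk"), ((1.1, 0.1), "walk"),
         ((4.5, 0.8), "bike"), ((5.0, 0.9), "bike"), ((4.2, 0.7), "bike"),
         ((12.0, 1.5), "bus"), ((13.5, 1.2), "bus"), ((11.0, 1.4), "bus")]

mode = knn_predict(train, (4.8, 0.85), k=3)
```

In practice such a model would be wrapped in 10-fold cross-validation over the labelled trajectories, as the study does, to choose K and estimate accuracy.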

    On Unexpectedness in Recommender Systems: Or How to Better Expect the Unexpected

    Although recommender systems have achieved broad social and business success across several domains, there is still a long way to go in terms of user satisfaction. One of the key dimensions for significant improvement is the concept of unexpectedness. In this paper, we propose a method to improve user satisfaction by generating unexpected recommendations based on the utility theory of economics. In particular, we propose a new concept of unexpectedness: recommending to users those items that depart from what they expect from the system. We define and formalize the concept of unexpectedness and discuss how it differs from the related notions of novelty, serendipity, and diversity. In addition, we suggest several mechanisms for specifying the users' expectations and propose specific performance metrics to measure the unexpectedness of recommendation lists. We also take the quality of recommendations into consideration using certain utility functions and present an algorithm for providing users with unexpected recommendations of high quality that are hard to discover but closely match their interests. Finally, we conduct several experiments on real-world data sets to compare our recommendation results with several standard baseline methods. The proposed approach outperforms these baselines in terms of unexpectedness and other important metrics, such as coverage and aggregate diversity, while avoiding any accuracy loss.
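The idea of trading predicted quality against departure from a user's expected set can be sketched as below. The distance measure, the linear utility with weight `alpha`, and all item names are illustrative assumptions; the paper defines its own expectation mechanisms and utility functions.

```python
def unexpectedness(item_vec, expected_set, dist):
    """Unexpectedness of an item: its distance from the set of items
    the user already expects (here, minimum distance to that set)."""
    return min(dist(item_vec, e) for e in expected_set)

def hybrid_utility(quality, unexp, alpha=0.5):
    """Blend predicted quality with unexpectedness; alpha is an
    illustrative trade-off weight, not taken from the paper."""
    return (1 - alpha) * quality + alpha * unexp

dist = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
expected = [(1.0, 0.0), (0.9, 0.1)]            # items the user expects
candidates = {"safe": ((0.95, 0.05), 4.5),     # (feature vec, rating /5)
              "surprise": ((0.2, 0.8), 4.0)}
ranked = sorted(
    candidates,
    key=lambda name: hybrid_utility(candidates[name][1] / 5.0,
                                    unexpectedness(candidates[name][0],
                                                   expected, dist)),
    reverse=True)
```

A purely accuracy-driven ranker would prefer the higher-rated "safe" item; the blended utility promotes the still-good but unexpected one, which is the paper's central trade-off.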

    Peeking into the other half of the glass: handling polarization in recommender systems.

    This dissertation is about filtering and discovering information online while using recommender systems. In the first part of our research, we study the phenomenon of polarization and its impact on filtering and discovering information. Polarization is a social phenomenon with serious real-life consequences, particularly on social media; it is therefore important to understand how machine learning algorithms, especially recommender systems, behave in polarized environments. We study polarization within the context of users' interactions with a space of items and how this affects recommender systems. We first formalize the concept of polarization based on item ratings and then relate it to item reviews, when available. We then propose a domain-independent data science pipeline to automatically detect polarization using the ratings rather than the properties typically used to detect polarization, such as an item's content or social network topology. We perform an extensive comparison of polarization measures on several benchmark data sets and show that our polarization detection framework can detect different degrees of polarization and outperforms existing measures in capturing an intuitive notion of polarization. We also investigate and uncover certain peculiar patterns that are characteristic of environments where polarization emerges: a machine learning algorithm finds it easier to learn discriminating models in polarized environments, since the models quickly learn to keep each user in the safety of their preferred viewpoint, essentially giving rise to filter bubbles and making them easier to learn. After quantifying the extent of polarization in current recommender system benchmark data, we propose new counter-polarization approaches for existing collaborative filtering recommender systems, focusing particularly on state-of-the-art models based on matrix factorization.
    Our work represents an essential step toward a new research area concerned with quantifying, detecting, and counteracting polarization in human-generated data and machine learning algorithms. We also present a theoretical analysis of how polarization affects the learning of latent factor models, and how counter-polarization affects these models. In the second part of the dissertation, we investigate the problem of discovering related information through tag recommendation on social media micro-blogging platforms. Real-time micro-blogging services such as Twitter have recently witnessed exponential growth, with millions of active web users who generate billions of micro-posts daily to share information, opinions, and personal viewpoints. However, these posts are inherently noisy and unstructured because they can be in any format, which makes them difficult to organize for the retrieval of relevant information. One way to address this problem is hashtags, which are quickly becoming the standard approach for annotating information on social media, so that varied posts about the same or related topics are annotated with the same hashtag. However, hashtags are not used consistently and, most importantly, are completely optional, which makes them unreliable as the sole mechanism for finding relevant information. We investigate mechanisms for consolidating the hashtag space using recommender systems. Our methods are general enough to be used for hashtag annotation in various social media services such as Twitter, as well as for general item recommendation in systems that rely on implicit user interest data, such as e-learning and news sites, or on explicit user ratings, such as e-commerce and online entertainment sites. Finally, we propose a methodology to extract stories based on two types of hashtag co-occurrence graphs.
    Our research in hashtag recommendation was able to exploit the textual content available as part of user messages or posts, resulting in hybrid recommendation strategies. Using content in this context can bridge polarization boundaries. However, when content is unavailable, missing, or unreliable, as on platforms rich in multimedia and multilingual posts, the content option becomes less powerful and pure collaborative filtering regains its important role, along with the challenges of polarization.
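Ratings-based polarization detection, as opposed to content- or network-based detection, can be illustrated with a crude proxy: polarized items attract ratings at the extremes of the scale rather than around a consensus. This is only a sketch of the intuition, not the dissertation's actual measure or pipeline.

```python
def polarization_score(ratings):
    """Fraction of ratings at the extremes (1 or 5) of a 1-5 scale.

    A crude proxy for the bimodal, love-it-or-hate-it rating
    distributions associated with polarized items; the dissertation
    defines and compares more refined measures.
    """
    extreme = sum(1 for r in ratings if r in (1, 5))
    return extreme / len(ratings)

polarized = [1, 1, 5, 5, 1, 5, 5, 1]   # sharply divided audience
consensus = [3, 4, 3, 3, 4, 3, 2, 3]   # broadly agreed-upon item
```

Scoring every item this way over a ratings matrix yields a content-independent signal that can then be thresholded or compared across data sets.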

    Discovering temporal patterns for interval-based events.

    Kam, Po-shan. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 89-97). Abstracts in English and Chinese.
    Contents: 1. Introduction (data mining; temporal data management; temporal reasoning and temporal semantics; temporal data mining; motivation; approach: focus and objectives, experimental setup; outline and contributions). 2. Relevant work (data mining: association rules, classification, clustering; sequential patterns: frequent patterns, interesting patterns, granularity; temporal databases; temporal reasoning: natural language expression, temporal logic approach; temporal data mining: framework, temporal association rules, attribute-oriented induction, time series analysis). 3. Discovering temporal patterns for interval-based events (temporal database; Allen's taxonomy of temporal relationships; mining the temporal patterns AppSeq and LinkSeq: A1 and A2 temporal patterns, the second temporal pattern LinkSeq; overview of the framework). 4. Mining temporal pattern I, AppSeq (problem statement; mining A1 temporal patterns: candidate generation, large k-item generation; mining A2 temporal patterns: candidate generation, generating large 2k-items; modified AppOne and AppTwo; performance study: experimental setup, experimental results, medical data). 5. Mining temporal pattern II, LinkSeq (problem statement; first method, LinkApp; second method, LinkTwo; alternative method, LinkTree: sequence tree design, construction of the seq-tree, mining LinkSeq using the seq-tree; performance study; discussions). 6. Conclusion and future work. Bibliography.
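Interval-based patterns of the kind this thesis mines are built on Allen's taxonomy of temporal relationships (chapter 3 above). A minimal sketch of classifying the relation between two closed intervals, covering a subset of Allen's thirteen relations and folding the inverses into one catch-all case:

```python
def allen_relation(a, b):
    """Classify the Allen relation of interval a=(s1,e1) to b=(s2,e2).

    Covers before, meets, equals, starts, finishes, during, overlaps;
    the remaining (inverse) relations are lumped together for brevity.
    """
    s1, e1 = a
    s2, e2 = b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2 and e1 < e2:
        return "starts"
    if e1 == e2 and s1 > s2:
        return "finishes"
    if s1 > s2 and e1 < e2:
        return "during"
    if s1 < s2 < e1 < e2:
        return "overlaps"
    return "inverse"  # b relates to a by one of the relations above

rel = allen_relation((1, 3), (3, 6))
```

Pairwise relations like these are the alphabet from which interval-based temporal patterns such as AppSeq and LinkSeq are composed.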

    Event-Oriented Dynamic Adaptation of Workflows: Model, Architecture and Implementation

    Workflow management is widely accepted as a core technology for supporting long-term business processes in heterogeneous and distributed environments. However, conventional workflow management systems do not provide sufficient flexibility to cope with the broad range of failure situations that may occur during workflow execution. In particular, most systems do not allow a workflow to be adapted dynamically in response to a failure, e.g., by dropping or inserting execution steps. As a contribution to overcoming these limitations, this dissertation introduces the agent-based workflow management system AgentWork. AgentWork supports the definition, the execution and, as its main contribution, the event-oriented and semi-automated dynamic adaptation of workflows. Two strategies for automatic workflow adaptation are provided. Predictive adaptation adapts workflow parts affected by a failure in advance (predictively), typically as soon as the failure is detected; this is advantageous in many situations and leaves enough time to meet organizational constraints for the adapted workflow parts. Reactive adaptation is typically performed when predictive adaptation is not possible; in this case, adaptation is performed when the affected workflow part is about to be executed, e.g., before an activity is executed it is checked whether it is subject to a workflow adaptation such as dropping, postponement, or replacement. In particular, AgentWork provides the following contributions. A formal model for workflow definition, execution, and estimation: AgentWork first provides an object-oriented workflow definition language that allows the definition of a workflow's control and data flow; furthermore, a workflow's cooperation with other workflows or workflow systems can be specified. Second, AgentWork provides a precise workflow execution model.
    This is necessary because a running workflow is usually a complex collection of concurrent activities and data flow processes, and because failure situations and dynamic adaptations affect running workflows. Furthermore, mechanisms for estimating a workflow's future execution behavior are provided; these are of particular importance for predictive adaptation. Mechanisms for determining and processing failure events and failure actions: AgentWork provides mechanisms to decide whether an event constitutes a failure situation and what has to be done to cope with it. This is achieved formally by evaluating event-condition-action rules, where the event-condition part describes under which condition an event is to be viewed as a failure event and the action part represents the actions needed to cope with the failure. To support the temporal dimension of events and actions, the dissertation provides a novel event-condition-action model based on a temporal object-oriented logic. Mechanisms for the adaptation of affected workflows: in case of a failure situation, it has to be decided how an affected workflow is to be dynamically adapted at the node and edge level. AgentWork provides a novel approach that combines the two principal strategies, reactive and predictive adaptation, selecting the appropriate strategy depending on the context of the failure. Furthermore, control flow adaptation operators are provided that translate failure actions into structural control flow adaptations, and data flow operators adapt the data flow after a control flow adaptation if necessary. Mechanisms for handling inter-workflow implications of failure situations: AgentWork provides novel mechanisms to decide whether a failure situation occurring in one workflow affects other workflows that communicate and cooperate with it.
    In particular, AgentWork derives the temporal implications of a dynamic adaptation by estimating the duration needed to process the changed workflow definition (in comparison with the original definition). Furthermore, qualitative implications of the dynamic change are determined; for this purpose, so-called quality measuring objects are introduced. All mechanisms provided by AgentWork allow users to interact during the failure handling process; in particular, the user may reject or modify suggested workflow adaptations. A prototypical implementation: finally, a prototypical CORBA-based implementation of AgentWork is described. This implementation supports the integration of AgentWork into the distributed and heterogeneous environments of real-world organizations such as hospitals or insurance enterprises.
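The event-condition-action mechanism described above can be sketched as a small rule base: each rule's condition decides whether an event constitutes a failure, and its action names the workflow adaptation (drop, postpone, replace). The rule contents, field names, and the hospital-flavoured example event are illustrative assumptions, not AgentWork's actual rule language.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ECARule:
    """Event-condition-action rule: if `condition` classifies the event
    as a failure, `action` yields the required workflow adaptation."""
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]

rules = [
    # Hypothetical rule: an out-of-range lab result invalidates a step.
    ECARule(condition=lambda e: e["type"] == "lab_result"
                                and e["value"] > e["threshold"],
            action=lambda e: f"drop activity {e['activity']}"),
    # Hypothetical rule: a missing resource delays a step.
    ECARule(condition=lambda e: e["type"] == "resource_unavailable",
            action=lambda e: f"postpone activity {e['activity']}"),
]

def handle_event(event, rules):
    """Evaluate the rule base; collect adaptations of matching rules."""
    return [r.action(event) for r in rules if r.condition(event)]

adaptations = handle_event(
    {"type": "lab_result", "value": 9.1, "threshold": 7.0,
     "activity": "chemotherapy_dose"}, rules)
```

In a predictive-adaptation setting such rules would fire as soon as the failure event arrives, so the affected downstream workflow parts can be rewritten before they are reached.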