    Querying Spatio-temporal Patterns in Mobile Phone-Call Databases

    Call Detail Record (CDR) databases contain millions of records with information about cell phone calls, including the position of the user when the call was made or received. This huge amount of spatiotemporal data opens the door to the study of human trajectories on a large scale without the bias that other sources (like GPS or WLAN networks) introduce in the population studied. It also provides a platform for a wide variety of studies, ranging from the spread of diseases to the planning of public transport. Nevertheless, previous work on spatiotemporal queries does not provide a framework flexible enough to express the complexity of human trajectories. In this paper we present the Spatiotemporal Pattern System (STPS) to query spatiotemporal patterns in very large CDR databases. STPS defines an intuitive regular-expression query language that allows any combination of spatial and temporal predicates with constraints, including the use of variables. The design of the language takes into account the layout of the areas covered by the cellular towers, as well as "areas" that label places of interest (e.g., neighborhoods, parks) and topological operators. STPS includes an underlying indexing structure and algorithms for query processing using different evaluation strategies. A full implementation of STPS is currently running on real, very large CDR databases at Telefónica Research Labs. An extensive performance evaluation shows that STPS can efficiently find complex mobility patterns in large CDR databases.
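To make the idea of a regular-expression query over call records concrete, here is a minimal sketch. It is not STPS's language or engine: it assumes a hypothetical mapping from cell IDs to one-letter area labels, runs Python's `re` over the resulting label string, and uses a named group plus backreference to play the role of a query variable ("leave area x, visit a park, return to the same x"). A maximum time span on the match stands in for a temporal constraint.

```python
import re
from datetime import datetime, timedelta

# Hypothetical mapping from cell-tower IDs to one-letter area labels
# (H = home area, W = work area, P = park); not part of STPS itself.
CELL_TO_AREA = {"c1": "H", "c2": "H", "c3": "P", "c4": "W"}


def encode(trajectory):
    """Encode a time-ordered list of (timestamp, cell_id) calls as one label per call."""
    return "".join(CELL_TO_AREA.get(cell, "?") for _, cell in trajectory)


def find_pattern(trajectory, regex, max_span):
    """Return (first, last) call indices of regex matches lasting at most max_span.

    The regex runs over area labels (spatial predicates); the time check on
    the matched span is a simple temporal constraint. re.finditer reports
    non-overlapping matches only, which is a simplification.
    """
    encoded = encode(trajectory)
    hits = []
    for m in re.finditer(regex, encoded):
        first, last = m.start(), m.end() - 1
        if trajectory[last][0] - trajectory[first][0] <= max_span:
            hits.append((first, last))
    return hits


# A variable: bind area x, visit the park one or more times, return to the same x.
LEAVE_PARK_RETURN = r"(?P<x>[HW])P+(?P=x)"
```

On a toy trajectory encoded as `"HPHWPW"`, tightening `max_span` keeps only the short home-park-home excursion. A real system would evaluate such patterns against an index rather than scanning each user's string.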

    ZStream: A cost-based query processor for adaptively detecting composite events

    Composite (or Complex) event processing (CEP) systems search sequences of incoming events for occurrences of user-specified event patterns. Recently, they have gained more attention in a variety of areas due to their powerful and expressive query languages and their performance potential. Sequentiality (temporal ordering) is the primary way in which CEP systems relate events to each other. In this paper, we present a CEP system called ZStream to efficiently process such sequential patterns. Besides simple sequential patterns, ZStream is also able to detect other patterns, including conjunction, disjunction, negation, and Kleene closure. Unlike most recently proposed CEP systems, which use non-deterministic finite automata (NFAs) to detect patterns, ZStream uses tree-based query plans for both the logical and physical representation of query patterns. By carefully designing the underlying infrastructure and algorithms, ZStream is able to unify the evaluation of sequence, conjunction, disjunction, negation, and Kleene closure as variants of the join operator. Under this framework, a single pattern in ZStream may have several equivalent physical tree plans with different evaluation costs. We propose a cost model to estimate the computation cost of a plan. We show that our cost model can accurately capture the actual runtime behavior of a plan, and that choosing the optimal plan can yield a speedup of a factor of four or more over an NFA-based approach. Based on this cost model and using a simple set of statistics about operator selectivity and data rates, ZStream is able to adaptively and seamlessly adjust, on the fly, the order in which it detects patterns. Finally, we describe a dynamic programming algorithm used in our cost model to efficiently search for an optimal query plan for a given pattern. National Natural Science Foundation (Grant number NETS-NOSS 0520032).
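The plan search described above can be illustrated with a small dynamic program in the style of matrix-chain ordering. This is a sketch under stated assumptions, not ZStream's actual cost model: each sub-plan joins two adjacent sub-sequences of a pattern `E0 ; E1 ; ... ; En-1`, the hypothetical cost of a join is the product of its input cardinalities, and `rates` and `sel` are hypothetical per-window event counts and adjacent-pair selectivities.

```python
from functools import lru_cache


def best_plan(rates, sel):
    """Pick a minimum-cost binary tree plan for the sequence E0 ; E1 ; ... ; En-1.

    rates[i]: expected events of class Ei per window (hypothetical statistic).
    sel[i]:   selectivity of the predicate linking Ei and Ei+1 (hypothetical).
    Simplified cost model: joining sub-plans L and R costs |L| * |R| and
    produces |L| * |R| * s results, where s links the two adjacent parts.
    """
    n = len(rates)

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Returns (cost, output cardinality, plan string) for Ei..Ej.
        if i == j:
            return 0.0, float(rates[i]), f"E{i}"
        best = None
        for k in range(i, j):  # split into Ei..Ek and Ek+1..Ej
            lcost, lcard, lplan = solve(i, k)
            rcost, rcard, rplan = solve(k + 1, j)
            cost = lcost + rcost + lcard * rcard
            card = lcard * rcard * sel[k]
            if best is None or cost < best[0]:
                best = (cost, card, f"({lplan} ; {rplan})")
        return best

    cost, _, plan = solve(0, n - 1)
    return plan, cost
```

With `rates=[100, 10, 100]`, a highly selective `E0 ; E1` link (`sel=[0.01, 0.5]`) favors joining `E0` and `E1` first; swapping the selectivities flips the preferred tree. Restricting splits to adjacent sub-sequences keeps the search at O(n^3) sub-problem combinations.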

    Jumping the ORDER BY Barrier in Large-Scale Pattern Matching

    Event-series pattern matching is a major component of large-scale data analytics pipelines, enabling a wide range of system diagnostics tasks. A precursor to pattern matching is an expensive "shuffle the world" stage wherein data are ordered by time and shuffled across the network. Because many existing systems treat the pattern matching engine as a black box, they are unable to optimize the entire data analytics pipeline and, in particular, this costly shuffle. This paper demonstrates how to optimize such queries. We first translate an expressive class of regular-expression-like patterns into relational queries so that they can benefit from decades of progress in relational optimizers. We then introduce the technique of abstract pattern matching, a linear-time preprocessing step which, adapting ideas from symbolic execution and abstract interpretation, discards input events that are guaranteed not to appear in successful matches. Abstract pattern matching first computes a conservative representation of the output-relevant domain of every transition in a pattern based on the (unary) predicates of that transition. It then further refines these domains based on the structure of the pattern (i.e., paths through the pattern) as well as any of the pattern's join predicates across transitions. The outcome is an abstract filter that, when applied to the original stream, excludes events that are guaranteed not to participate in a match. We implemented abstract pattern matching in COSMOS/Scope and applied it to an industrial benchmark, where we obtained up to 3 orders of magnitude reduction in shuffled data and a 1.23x average speedup in total processing time.
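A toy version of the abstract filter helps show why it is safe to apply before the time-ordered shuffle. The sketch below is not the paper's COSMOS/Scope implementation; the `Event` and `Transition` classes are hypothetical, each transition's domain is approximated by a set of event kinds and a latency interval, and the only join refinement modeled is an equality join on `machine` shared by all transitions, assuming every transition is mandatory (no alternation or optional steps).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int
    kind: str
    machine: str
    latency_ms: float


@dataclass(frozen=True)
class Transition:
    """Conservative domain of one transition's unary predicate (hypothetical form)."""
    kinds: frozenset  # event kinds the predicate may accept
    lo: float         # inclusive lower bound on latency_ms
    hi: float         # inclusive upper bound on latency_ms

    def admits(self, e):
        return e.kind in self.kinds and self.lo <= e.latency_ms <= self.hi


def abstract_filter(events, transitions):
    """Drop events that cannot take part in any match; needs no time ordering.

    Assumes every transition is mandatory and all transitions are joined on
    equal `machine`: a simplification of the paper's general refinement.
    """
    # Pass 1: keys of events admitted by each transition's unary predicate.
    keys = [set() for _ in transitions]
    for e in events:
        for i, t in enumerate(transitions):
            if t.admits(e):
                keys[i].add(e.machine)
    # Refinement via the equality join: a match needs a candidate for every transition.
    live = set.intersection(*keys) if keys else set()
    # Pass 2: keep events admitted by some transition whose key is still live.
    return [e for e in events
            if e.machine in live and any(t.admits(e) for t in transitions)]
```

Both passes are linear in the number of events (times the number of transitions), so the filter can run where the data already sits, shrinking what the "shuffle the world" stage must move.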

    Time Series Management Systems: A Survey

    The volume of collected time series data is increasing as more monitoring and automation are deployed. These deployments range in scale from an Internet of Things (IoT) device located in a household to enormous distributed Cyber-Physical Systems (CPSs) producing large volumes of data at high velocity. To store and analyze these vast amounts of data, specialized Time Series Management Systems (TSMSs) have been developed to overcome the limitations of general-purpose Database Management Systems (DBMSs) for time series management. In this paper, we present a thorough analysis and classification of TSMSs developed through academic or industrial research and documented through publications. Our classification is organized into categories based on the architectures observed during our analysis. In addition, we provide an overview of each system with a focus on the motivational use case that drove its development, the functionality for storing and querying time series that it implements, the components it is composed of, and its capabilities with regard to Stream Processing and Approximate Query Processing (AQP). Finally, we provide a summary of research directions proposed by other researchers in the field and present our vision for a next-generation TSMS. Comment: 20 pages, 15 figures, 2 tables. Accepted for publication in IEEE TKD

    Tesouraria previsional numa indústria de laticínios: uma abordagem com o Power BI (Forecast treasury in a dairy company: an approach using Power BI)

    In recent years, technological progress has brought an exponential increase in the volume of available data. With this paradigm shift, collecting and analyzing data has become essential for companies to make more efficient and effective decisions. Data analysis gives organizations a complete and up-to-date view of their performance, identifying trends, opportunities, and challenges, and aligning their actions to maximize results and secure a competitive advantage. Key Performance Indicators and Business Intelligence play a central role as fundamental elements in the assessment of organizational performance. This study is devoted to the development of a Business Intelligence system using Microsoft Power BI. Throughout the project, the tool's advanced data visualization features were explored, including drill through, for more detailed analyses. The system will integrate data from several sources, enabling a complete analysis of forecast treasury.

    Model-Based Time Series Management at Scale

    Complex Event Processing with XChangeEQ

    The emergence of event-driven architectures, the automation of business processes, drastic cost reductions in sensor technology, and a growing need to monitor IT systems (as well as other systems) for legal, contractual, or operational reasons are leading to the generation of ever more events. This development is accompanied by a growing demand for managing and processing events in an automated and systematic way. Complex Event Processing (CEP) encompasses the (automatable) tasks involved in making sense of all events in a system by deriving higher-level knowledge from lower-level events while the events occur, i.e., in a timely, online fashion and permanently. At the core of CEP are queries that monitor streams of "simple" events for so-called complex events, that is, events or situations that manifest themselves in certain combinations of several events occurring (or not occurring) over time and that cannot be detected by looking only at single events. Querying events is fundamentally different from traditional querying and reasoning with database or Web data, since event queries are standing queries that are evaluated permanently over time against incoming streams of event data. To express complex events of interest to a particular application or user in a convenient, concise, cost-effective, and maintainable manner, special-purpose Event Query Languages (EQLs) are needed. This thesis investigates practical and theoretical issues related to querying complex events, covering the spectrum from language design through declarative semantics to operational semantics for incremental query evaluation. Its central topic is the development of the high-level event query language XChangeEQ.
In contrast to previous data stream and event query languages, XChangeEQ's language design recognizes the four querying dimensions of data extraction, event composition, temporal relationships, and, for non-monotonic queries involving negation or aggregation, event accumulation. XChangeEQ deals with complex structured data in event messages, thus addressing the need to query events communicated in XML formats over the Web. It supports deductive rules as an abstraction and reasoning mechanism for events. To achieve full coverage of the four querying dimensions, it builds on a separation of concerns among them, which makes it easy to use and highly expressive. A recurrent theme in the formal foundations of XChangeEQ is that, despite the fundamental differences between traditional database queries and event queries, many well-known results from databases and logic programming are, with some important changes, applicable to event queries. Declarative semantics for XChangeEQ are given as a (Tarski-style) model theory with an accompanying fixpoint theory. This approach accounts well for (1) data in events and (2) deductive rules defining new events from existing ones, two aspects often neglected in previous work on the semantics of EQLs. For the evaluation of event queries, this work introduces operational semantics based on an extended and tailored form of relational algebra and query plans with materialization points. Materialization points store and maintain information about those received events that are relevant for, i.e., can contribute to, future query answers, and they enable an incremental evaluation that avoids recomputing certain intermediate results. Efficient state maintenance in incremental evaluation is addressed by "differentiating" algebra expressions, i.e., by deriving expressions that compute only the changes to materialization points.
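The idea of "differentiating" an algebra expression can be seen on the simplest case, an equality join of two event streams. The sketch below is a generic delta rule, not XChangeEQ's extended algebra: the stored R and S events act as materialization points, and each step emits only Δ(R ⋈ S) = ΔR ⋈ S ∪ R ⋈ ΔS ∪ ΔR ⋈ ΔS before folding the new events into the stored state. The class name and the `(key, payload)` event shape are hypothetical.

```python
from collections import defaultdict


class IncrementalJoin:
    """Incrementally maintain R ⋈ S on an equality key (hypothetical sketch).

    self.r and self.s are the materialization points; step() emits only the
    new join results, so earlier results are never recomputed.
    """

    def __init__(self):
        self.r = defaultdict(list)  # key -> stored R payloads
        self.s = defaultdict(list)  # key -> stored S payloads

    def step(self, delta_r, delta_s):
        """delta_r, delta_s: lists of (key, payload) events that just arrived."""
        dr, ds = defaultdict(list), defaultdict(list)
        for k, v in delta_r:
            dr[k].append(v)
        for k, v in delta_s:
            ds[k].append(v)
        out = []
        # ΔR ⋈ (S ∪ ΔS): new R events against old and new S events.
        for k, rs in dr.items():
            for a in rs:
                for b in self.s.get(k, []) + ds.get(k, []):
                    out.append((k, a, b))
        # R ⋈ ΔS: new S events against old R events only (avoids double counting).
        for k, ss in ds.items():
            for b in ss:
                for a in self.r.get(k, []):
                    out.append((k, a, b))
        # Update the materialization points only after the delta is computed.
        for k, rs in dr.items():
            self.r[k].extend(rs)
        for k, ss in ds.items():
            self.s[k].extend(ss)
        return out
```

Summed over all steps, the emitted tuples equal a full recomputation of R ⋈ S, while each step touches only the new events and the matching stored ones.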
Knowing how long an event remains relevant is a prerequisite for garbage collection during event query evaluation and is also of central importance for developing cost-based query planners. To this end, this thesis introduces a notion of relevance of events (to a given query plan) and develops methods for determining temporal relevance, a particularly useful form of relevance based on time-related information.
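Temporal relevance can be illustrated on a single hand-written query, "an A followed by a B within `window` time units". The sketch is an assumption-laden special case, not the general method of the thesis (which derives relevance from a query plan): it assumes events arrive in non-decreasing timestamp order, so an A event at time t can contribute to a new answer only while `now - t <= window`; once the clock passes that bound, the event is garbage-collected. `WithinJoin` and its methods are hypothetical names.

```python
from collections import deque


class WithinJoin:
    """Detect "A then B within `window`" and garbage-collect irrelevant A events.

    Assumes non-decreasing timestamps; an A event at time t is temporally
    relevant only while now - t <= window.
    """

    def __init__(self, window):
        self.window = window
        self.pending_a = deque()  # timestamps of stored A events, oldest first

    def _collect_garbage(self, now):
        # Oldest events expire first, so popping from the left suffices.
        while self.pending_a and now - self.pending_a[0] > self.window:
            self.pending_a.popleft()

    def push(self, kind, ts):
        """Process one event; return the (ts_a, ts_b) answers it completes."""
        self._collect_garbage(ts)
        if kind == "A":
            self.pending_a.append(ts)
            return []
        if kind == "B":
            # Every A still stored is within the window; require strict order.
            return [(a, ts) for a in self.pending_a if a < ts]
        return []
```

With `window=10`, pushing A at 0 and 5 and then B at 12 yields only `(5, 12)` and leaves one stored A; a further B at 20 yields nothing and empties the buffer. The stored state is thus bounded by the number of A events in one window, which is also the kind of estimate a cost-based planner needs.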