5 research outputs found

    Efficiently correlating complex events over live and archived data streams

    Correlating complex events over live and archived data streams, which we call Pattern Correlation Queries (PCQs), provides many benefits for domains that need real-time forecasting of events or identification of causal dependencies while handling data at high rates and in massive amounts, such as financial or medical settings. Existing work has focused either on complex event processing over a single type of stream source (i.e., either live or archived), or on simple stream correlation queries (e.g., live events triggering a database lookup). In this paper, we specifically focus on recency-based PCQs and provide clear, useful, and optimizable semantics for them. PCQs raise a number of challenges in optimizing data management and query processing, which we address in the setting of the DejaVu complex event processing system. More specifically, we propose three complementary optimizations: recent input buffering, query result caching, and join source ordering. Furthermore, we capture the relevant query processing tradeoffs in a cost model. An extensive performance study on synthetic and real-life data sets not only validates this cost model, but also shows that our optimizations are very effective, achieving more than two orders of magnitude throughput improvement and much better scalability compared to a conventional approach.

    Statistical mechanics and learning problems in neural networks

    My PhD thesis is based on Statistical Mechanics themes and their applications. In the second chapter I test the inverse problem method for a class of monomer-dimer statistical mechanics models that also contain an attractive potential and display a mean-field critical point at a boundary of a coexistence line. I obtain the inversion by analytically identifying the parameters in terms of the correlation functions and via the maximum-likelihood method. The precision is tested in the whole phase space and, close to the coexistence line, the algorithm is used together with a clustering method to handle the possible underlying ambiguity of the inversion. In the third chapter I analyze the statistical properties of the observed mobility of Drosophila expressing different kinds of proteins. In the fourth chapter I give an overview of the existing Replicated Belief Propagation (RBP) algorithm, analyzing in depth the equations that define the model. In the fifth chapter I apply RBP to predict congestion formation in the framework of complex systems physics. Traffic is a complex system where vehicle interactions and finite volume effects produce different collective regimes and phase transition phenomena. Such prediction can be a difficult problem due to the heterogeneous behavior of drivers when the vehicle density increases. I propose a novel pipeline to classify traffic slowdowns by analyzing the features extracted from the fundamental diagram of traffic. I train the RBP and provide a forewarning time of prediction related to the training set size. Then I compare my results with those of the most common classifiers used in machine learning analysis.

    A Data-Descriptive Feedback Framework for Data Stream Management Systems

    Data Stream Management Systems (DSMSs) provide support for continuous query evaluation over data streams. Data streams pose processing challenges due to their unbounded nature and varying characteristics, such as rate and density fluctuations. DSMSs need to adapt stream processing to these changes within certain constraints, such as available computational resources and minimum latency requirements in producing results. The proposed research develops an inter-operator feedback framework, where opportunities for run-time adaptation of stream processing are expressed in terms of descriptions of substreams and actions applicable to the substreams, called feedback punctuations. Both the discovery of adaptation opportunities and the exploitation of these opportunities are performed in the query operators. DSMSs are also concerned with state management, in particular, state derived from tuple processing. The proposed research also introduces the Contracts Framework, which provides execution guarantees about state purging in continuous query evaluation for systems with and without inter-operator feedback. This research provides both theoretical and design contributions. The research also includes an implementation and evaluation of the feedback techniques in the NiagaraST DSMS, and a reference implementation of the Contracts Framework.

    Ant Colony Optimization

    Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics, are presented.

    Travel Time Estimation Using NiagaraST and latte

    To address increasing traffic congestion and its associated consequences, traffic managers are turning to intelligent transportation management. The latte project is extending data stream technology to handle queries that combine live streams with large data archives, motivated by needs in the Intelligent Transportation Systems (ITS) domain. We demonstrate such stream-archive queries via the travel-time estimation problem. The demonstration uses the new latte system, which has been developed using the NiagaraST stream processing system and the PORTAL transportation data archive.