5 research outputs found
Efficiently correlating complex events over live and archived data streams
Correlating complex events over live and archived data streams, which we call Pattern Correlation Queries (PCQs), provides many benefits for domains which need real-time forecasting of events or identification of causal dependencies, while handling data at high rates and in massive amounts, like in financial or medical settings. Existing work has focused either on complex event processing over a single type of stream source (i.e., either live or archived), or on simple stream correlation queries (e.g., live events triggering a database lookup). In this paper, we specifically focus on recency-based PCQs and provide clear, useful, and optimizable semantics for them. PCQs raise a number of challenges in optimizing data management and query processing, which we address in the setting of the DejaVu complex event processing system. More specifically, we propose three complementary optimizations: recent input buffering, query result caching, and join source ordering. Furthermore, we capture the relevant query processing tradeoffs in a cost model. An extensive performance study on synthetic and real-life data sets not only validates this cost model, but also shows that our optimizations are very effective, achieving more than two orders of magnitude throughput improvement and much better scalability compared to a conventional approach.
Statistical mechanics and learning problems in neural networks
My PhD thesis is based on Statistical Mechanics themes and their applications. In the second chapter I test the inverse problem method for a class of monomer-dimer statistical mechanics models that also contain an attractive potential and display a mean-field critical point at a boundary of a coexistence line. I obtain the inversion by analytically identifying the parameters in terms of the correlation functions and via the maximum-likelihood method. The precision is tested in the whole phase space and, when close to the coexistence line, the algorithm is used together with a clustering method to take care of the possible underlying ambiguity of the inversion. In the third chapter I analyze the observed mobility of drosophilas expressing different kinds of proteins in order to characterize its statistical properties. In the fourth chapter I give an overview of the existing Replicated Belief Propagation (RBP) algorithm, analyzing in depth the equations which define the model. In the fifth chapter I apply RBP to predict congestion formation in the framework of complex systems physics. Traffic is a complex system where vehicle interactions and finite volume effects produce different collective regimes and phase transition phenomena. Such prediction can be a difficult problem due to the heterogeneous behavior of drivers when the vehicle density increases. We propose a novel pipeline to classify traffic slowdowns by analyzing the features extracted from the fundamental diagram of traffic. I train the RBP and provide a forewarning time of prediction related to the training set size. Then I compare my results with those of the most common classifiers used in machine learning analysis.
A Data-Descriptive Feedback Framework for Data Stream Management Systems
Data Stream Management Systems (DSMSs) provide support for continuous query evaluation over data streams. Data streams pose processing challenges due to their unbounded nature and varying characteristics, such as rate and density fluctuations. DSMSs need to adapt stream processing to these changes within certain constraints, such as available computational resources and minimum latency requirements in producing results. The proposed research develops an inter-operator feedback framework, where opportunities for run-time adaptation of stream processing are expressed in terms of descriptions of substreams and actions applicable to the substreams, called feedback punctuations. Both the discovery of adaptation opportunities and the exploitation of these opportunities are performed in the query operators. DSMSs are also concerned with state management, in particular, state derived from tuple processing. The proposed research also introduces the Contracts Framework, which provides execution guarantees about state purging in continuous query evaluation for systems with and without inter-operator feedback. This research provides both theoretical and design contributions. The research also includes an implementation and evaluation of the feedback techniques in the NiagaraST DSMS, and a reference implementation of the Contracts Framework.
Ant Colony Optimization
Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics, are presented.
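As a rough illustration of the kind of algorithm the abstract above refers to, the following is a minimal sketch of a basic ant-system loop for the travelling salesman problem. It is not taken from the book: the function names (`aco_tsp`, `tour_length`), the parameter values (`alpha`, `beta`, `rho`, number of ants and iterations), and the symmetric pheromone update are all illustrative assumptions, and the cities are assumed to be pairwise distinct so that no distance is zero.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that visits the cities in the given order."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def aco_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Basic ant-system sketch (all parameter defaults are assumptions).

    alpha weights the pheromone, beta weights the inverse-distance heuristic,
    and rho is the fraction of pheromone that evaporates each iteration.
    """
    rng = random.Random(seed)
    n = len(dist)
    pher = [[1.0] * n for _ in range(n)]  # uniform initial pheromone
    best_tour, best_len = None, math.inf

    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            # Each ant builds a tour city by city, choosing the next city
            # with probability proportional to pheromone^alpha * (1/d)^beta.
            start = rng.randrange(n)
            tour = [start]
            unvisited = set(range(n)) - {start}
            while unvisited:
                cur = tour[-1]
                cands = sorted(unvisited)
                weights = [
                    (pher[cur][j] ** alpha) * ((1.0 / dist[cur][j]) ** beta)
                    for j in cands
                ]
                nxt = rng.choices(cands, weights=weights, k=1)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length

        # Evaporate pheromone on every edge.
        for i in range(n):
            for j in range(n):
                pher[i][j] *= 1.0 - rho
        # Deposit pheromone along each ant's tour; shorter tours deposit more.
        for tour, length in tours:
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pher[a][b] += 1.0 / length
                pher[b][a] += 1.0 / length

    return best_tour, best_len


# Example: four cities on the corners of a unit square.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
dist = [[math.hypot(px - qx, py - qy) for (qx, qy) in points] for (px, py) in points]
best_tour, best_len = aco_tsp(dist)
```

The sketch keeps the two ingredients usually associated with ACO, probabilistic tour construction biased by pheromone and distance, followed by evaporation and reinforcement. Practical variants add elements such as elitism, pheromone bounds, or local search.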
Travel Time Estimation Using NiagaraST and latte
To address increasing traffic congestion and its associated consequences, traffic managers are turning to intelligent transportation management. Motivated by needs in the Intelligent Transportation Systems (ITS) domain, the latte project is extending data stream technology to handle queries that combine live data streams with large data archives. We demonstrate such stream-archive queries via the travel-time estimation problem. The demonstration uses the new latte system, which has been developed using the NiagaraST stream processing system and the PORTAL transportation data archive.
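To make the idea of a stream-archive query concrete, here is a small sketch that joins live speed readings with an archive of historical speeds to estimate segment travel times. It does not reflect how latte, NiagaraST, or PORTAL actually work: the tuple layout `(segment_id, hour_of_day, speed_kmh)`, the function names, and the weighted blend controlled by `live_weight` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean


def build_archive_index(archive_rows):
    """Average archived speed per (segment_id, hour_of_day).

    archive_rows: iterable of (segment_id, hour_of_day, speed_kmh) tuples
    (a hypothetical layout, not the PORTAL schema).
    """
    buckets = defaultdict(list)
    for seg, hour, speed in archive_rows:
        buckets[(seg, hour)].append(speed)
    return {key: mean(values) for key, values in buckets.items()}


def estimate_travel_times(live_stream, archive_index, segment_lengths_km,
                          live_weight=0.7):
    """Yield (segment_id, estimated_minutes) for each live reading.

    Each live reading is blended with the historical average for the same
    segment and hour; if the archive has no match, the live speed is used alone.
    """
    for seg, hour, live_speed in live_stream:
        hist = archive_index.get((seg, hour))
        if hist is None:
            speed = live_speed
        else:
            speed = live_weight * live_speed + (1.0 - live_weight) * hist
        yield seg, segment_lengths_km[seg] / speed * 60.0


# Example with made-up data.
archive = [("A", 8, 40.0), ("A", 8, 60.0)]
live = [("A", 8, 100.0), ("B", 8, 60.0)]
lengths = {"A": 15.0, "B": 6.0}
index = build_archive_index(archive)
estimates = list(estimate_travel_times(live, index, lengths, live_weight=0.5))
```

Because `estimate_travel_times` is a generator, it can consume an unbounded live stream one reading at a time, while the archive is summarized once up front. A real system would also need to handle archive updates and windowing.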