8,452 research outputs found
Analysing Temporal Relations – Beyond Windows, Frames and Predicates
This article proposes an approach to rely on the standard
operators of relational algebra (including grouping and ag-
gregation) for processing complex event without requiring
window specifications. In this way the approach can pro-
cess complex event queries of the kind encountered in appli-
cations such as emergency management in metro networks.
This article presents Temporal Stream Algebra (TSA) which
combines the operators of relational algebra with an analy-
sis of temporal relations at compile time. This analysis de-
termines which relational algebra queries can be evaluated
against data streams, i. e. the analysis is able to distinguish
valid from invalid stream queries. Furthermore the analysis
derives functions similar to the pass, propagation and keep
invariants in Tucker's et al. \Exploiting Punctuation Seman-
tics in Continuous Data Streams". These functions enable
the incremental evaluation of TSA queries, the propagation
of punctuations, and garbage collection. The evaluation of
TSA queries combines bulk-wise and out-of-order processing
which makes it tolerant to workload bursts as they typically
occur in emergency management. The approach has been
conceived for efficiently processing complex event queries on
top of a relational database system. It has been deployed
and tested on MonetDB
A Survey on IT-Techniques for a Dynamic Emergency Management in Large Infrastructures
This deliverable is a survey on the IT techniques that are relevant to the three use cases of the project EMILI. It describes the state-of-the-art in four complementary IT areas: Data cleansing, supervisory control and data acquisition, wireless sensor networks and complex event processing. Even though the deliverable’s authors have tried to avoid a too technical language and have tried to explain every concept referred to, the deliverable might seem rather technical to readers so far little familiar with the techniques it describes
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
The EDAM Project: Mining Atmospheric Aerosol Datasets
Data mining has been a very active area of research in the database, machine learning, and mathematical programming communities in recent years. EDAM (Exploratory Data Analysis and Management) is a joint project between researchers in Atmospheric Chemistry and Computer Science at Carleton College and the University of Wisconsin-Madison that aims to develop data mining techniques for advancing the state of the art in analyzing atmospheric aerosol datasets. There is a great need to better understand the sources, dynamics, and compositions of atmospheric aerosols. The traditional approach for particle measurement, which is the collection of bulk samples of particulates on filters, is not adequate for studying particle dynamics and real-time correlations. This has led to the development of a new generation of real-time instruments that provide continuous or semi-continuous streams of data about certain aerosol properties. However, these instruments have added a significant level of complexity to atmospheric aerosol data, and dramatically increased the amounts of data to be collected, managed, and analyzed. Our abilit y to integrate the data from all of these new and complex instruments now lags far behind our data-collection capabilities, and severely limits our ability to understand the data and act upon it in a timely manner. In this paper, we present an overview of the EDAM project. The goal of the project, which is in its early stages, is to develop novel data mining algorithms and approaches to managing and monitoring multiple complex data streams. An important objective is data quality assurance, and real-time data mining offers great potential. The approach that we take should also provide good techniques to deal with gas-phase and semi-volatile data. While atmospheric aerosol analysis is an important and challenging domain that motivates us with real problems and serves as a concrete test of our results, our objective is to develop techniques that have broader applicability, and to explore some fundamental challenges in data mining that are not specific to any given application domain
MorphIC: A 65-nm 738k-Synapse/mm Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning
Recent trends in the field of neural network accelerators investigate weight
quantization as a means to increase the resource- and power-efficiency of
hardware devices. As full on-chip weight storage is necessary to avoid the high
energy cost of off-chip memory accesses, memory reduction requirements for
weight storage pushed toward the use of binary weights, which were demonstrated
to have a limited accuracy reduction on many applications when
quantization-aware training techniques are used. In parallel, spiking neural
network (SNN) architectures are explored to further reduce power when
processing sparse event-based data streams, while on-chip spike-based online
learning appears as a key feature for applications constrained in power and
resources during the training phase. However, designing power- and
area-efficient spiking neural networks still requires the development of
specific techniques in order to leverage on-chip online learning on binary
weights without compromising the synapse density. In this work, we demonstrate
MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a
stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning
rule and a hierarchical routing fabric for large-scale chip interconnection.
The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF)
neurons and more than two million plastic synapses for an active silicon area
of 2.86mm in 65nm CMOS, achieving a high density of 738k synapses/mm.
MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy
tradeoff on the MNIST classification task compared to previously-proposed SNNs,
while having no penalty in the energy-accuracy tradeoff.Comment: This document is the paper as accepted for publication in the IEEE
Transactions on Biomedical Circuits and Systems journal (2019), the
fully-edited paper is available at
https://ieeexplore.ieee.org/document/876400
Metadata-Aware Query Processing over Data Streams
Many modern applications need to process queries over potentially infinite data streams to provide answers in real-time. This dissertation proposes novel techniques to optimize CPU and memory utilization in stream processing by exploiting metadata on streaming data or queries. It focuses on four topics: 1) exploiting stream metadata to optimize SPJ query operators via operator configuration, 2) exploiting stream metadata to optimize SPJ query plans via query-rewriting, 3) exploiting workload metadata to optimize parameterized queries via indexing, and 4) exploiting event constraints to optimize event stream processing via run-time early termination. The first part of this dissertation proposes algorithms for one of the most common and expensive query operators, namely join, to at runtime identify and purge no-longer-needed data from the state based on punctuations. Exploitations of the combination of punctuation and commonly-used window constraints are also studied. Extensive experimental evaluations demonstrate both reduction on memory usage and improvements on execution time due to the proposed strategies. The second part proposes herald-driven runtime query plan optimization techniques. We identify four query optimization techniques, design a lightweight algorithm to efficiently detect the optimization opportunities at runtime upon receiving heralds. We propose a novel execution paradigm to support multiple concurrent logical plans by maintaining one physical plan. Extensive experimental study confirms that our techniques significantly reduce query execution times. The third part deals with the shared execution of parameterized queries instantiated from a query template. We design a lightweight index mechanism to provide multiple access paths to data to facilitate a wide range of parameterized queries. To withstand workload fluctuations, we propose an index tuning framework to tune the index configurations in a timely manner. Extensive experimental evaluations demonstrate the effectiveness of the proposed strategies. The last part proposes event query optimization techniques by exploiting event constraints such as exclusiveness or ordering relationships among events extracted from workflows. Significant performance gains are shown to be achieved by our proposed constraint-aware event processing techniques
AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments
This report considers the application of Articial Intelligence (AI) techniques to
the problem of misuse detection and misuse localisation within telecommunications
environments. A broad survey of techniques is provided, that covers inter alia
rule based systems, model-based systems, case based reasoning, pattern matching,
clustering and feature extraction, articial neural networks, genetic algorithms, arti
cial immune systems, agent based systems, data mining and a variety of hybrid
approaches. The report then considers the central issue of event correlation, that
is at the heart of many misuse detection and localisation systems. The notion of
being able to infer misuse by the correlation of individual temporally distributed
events within a multiple data stream environment is explored, and a range of techniques,
covering model based approaches, `programmed' AI and machine learning
paradigms. It is found that, in general, correlation is best achieved via rule based approaches,
but that these suffer from a number of drawbacks, such as the difculty of
developing and maintaining an appropriate knowledge base, and the lack of ability
to generalise from known misuses to new unseen misuses. Two distinct approaches
are evident. One attempts to encode knowledge of known misuses, typically within
rules, and use this to screen events. This approach cannot generally detect misuses
for which it has not been programmed, i.e. it is prone to issuing false negatives.
The other attempts to `learn' the features of event patterns that constitute normal
behaviour, and, by observing patterns that do not match expected behaviour, detect
when a misuse has occurred. This approach is prone to issuing false positives,
i.e. inferring misuse from innocent patterns of behaviour that the system was not
trained to recognise. Contemporary approaches are seen to favour hybridisation,
often combining detection or localisation mechanisms for both abnormal and normal
behaviour, the former to capture known cases of misuse, the latter to capture
unknown cases. In some systems, these mechanisms even work together to update
each other to increase detection rates and lower false positive rates. It is concluded
that hybridisation offers the most promising future direction, but that a rule or state
based component is likely to remain, being the most natural approach to the correlation
of complex events. The challenge, then, is to mitigate the weaknesses of
canonical programmed systems such that learning, generalisation and adaptation
are more readily facilitated
- …