Event Correlation and Forecasting over High Dimensional Streaming Sensor Data
As technology advances, the need to detect and predict events in real time or
near real time intensifies. In this thesis, we analyze what constitutes an event
for each individual data stream and how specially designed algorithms make it
possible to predict future events. These predictions are feasible because of
the correlations between events, and their accuracy improves as a wider range
of previous events (states) is taken into account. In real-world applications,
these past states remain relevant over time, but with a probability of
occurrence that diminishes as time passes, which the algorithms we developed
take into account. Because we were handling Big Data, we implemented the
algorithms in Python with the NumPy and Pandas libraries, which offer ease of
use and fast run times for such problems. To present results that are as
realistic as possible, we varied a number of program parameters and selected
the values that yield the best precision and recall.
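To make the last three points concrete, here is a minimal NumPy sketch built on our own assumptions, not on the thesis code. An event is taken to be a reading above a fixed threshold. Past events on a source stream influence the prediction for a target stream with a weight that decays exponentially with their age, exp(-decay * lag). The decay rate, the look-back window and the score cutoff are then chosen by a grid search. Ranking candidates by F1, the harmonic mean of precision and recall, is our choice. All function names and the exponential form are hypothetical.

```python
import numpy as np

def detect_events(readings, threshold):
    """Hypothetical event definition: 1 where a reading exceeds `threshold`, else 0."""
    return (np.asarray(readings, dtype=float) > threshold).astype(int)

def decayed_score(source_events, decay, max_lag):
    """score[t] = sum over k = 1..max_lag of exp(-decay * k) * source_events[t - k].

    Older source events contribute less, modelling a relevance that fades with time.
    """
    e = np.asarray(source_events, dtype=float)
    score = np.zeros_like(e)
    for k in range(1, max_lag + 1):
        if k >= len(e):
            break  # no earlier events exist at this lag
        score[k:] += np.exp(-decay * k) * e[:-k]
    return score

def precision_recall(predicted, actual):
    """Precision and recall of binary predictions (0.0 when undefined)."""
    predicted = np.asarray(predicted, dtype=bool)
    actual = np.asarray(actual, dtype=bool)
    tp = int(np.sum(predicted & actual))
    fp = int(np.sum(predicted & ~actual))
    fn = int(np.sum(~predicted & actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def grid_search(source_events, target_events, decays, max_lags, cutoffs):
    """Return (f1, decay, max_lag, cutoff, precision, recall) with the highest F1."""
    best = None
    for decay in decays:
        for max_lag in max_lags:
            score = decayed_score(source_events, decay, max_lag)
            for cutoff in cutoffs:
                # Predict a target event whenever the decayed evidence reaches the cutoff.
                p, r = precision_recall(score >= cutoff, target_events)
                f1 = 2 * p * r / (p + r) if p + r else 0.0
                if best is None or f1 > best[0]:
                    best = (f1, decay, max_lag, cutoff, p, r)
    return best
```

If the target stream simply repeats the source one step later, the search picks max_lag = 1 and reaches perfect precision and recall. With real streams, the decay rate controls how quickly old states stop counting as evidence.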
A framework for automated anomaly detection in high frequency water-quality data from in situ sensors
River water-quality monitoring is increasingly conducted using automated in
situ sensors, enabling timelier identification of unexpected values. However,
anomalies caused by technical issues confound these data, while the volume and
velocity of data prevent manual detection. We present a framework for automated
anomaly detection in high-frequency water-quality data from in situ sensors,
using turbidity, conductivity and river level data. After identifying end-user
needs and defining anomalies, we ranked their importance and selected suitable
detection methods. High priority anomalies included sudden isolated spikes and
level shifts, most of which were classified correctly by regression-based
methods such as autoregressive integrated moving average models. However, using
other water-quality variables as covariates reduced performance due to complex
relationships among variables. Classification of drift and periods of
anomalously low or high variability improved when we replaced anomalous
measurements with forecasts, but this inflated false positive rates.
Feature-based methods also performed well on high priority anomalies, but were
less proficient at detecting lower priority anomalies, resulting in high
false negative rates. Unlike regression-based methods, all feature-based
methods produced low false positive rates and did not require training or
optimization. Rule-based methods successfully detected impossible values and
missing observations. Thus, we recommend using a combination of methods to
improve anomaly detection performance, whilst minimizing false detection rates.
Furthermore, our framework emphasizes the importance of communication between
end-users and analysts for optimal outcomes with respect to both detection
performance and end-user needs. Our framework is applicable to other types of
high frequency time-series data and anomaly detection applications.
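The abstract recommends combining rule-based checks with regression-based detection. The sketch below illustrates that combination under our own assumptions, using NumPy. A first-order autoregressive model, AR(1), is fitted by ordinary least squares on a segment assumed to be anomaly-free. It is a deliberately simpler stand-in for the ARIMA models evaluated in the study. A point is flagged when its one-step-ahead forecast error exceeds k times a robust residual scale, defined as 1.4826 times the median absolute deviation. The function names, k, and the plausible-range bounds are hypothetical.

```python
import numpy as np

def rule_based_flags(x, lower, upper):
    """Rule-based checks: missing observations (NaN) and values outside [lower, upper]."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    with np.errstate(invalid="ignore"):
        impossible = (x < lower) | (x > upper)  # NaN compares False, so it is not double-counted
    return missing, impossible

def fit_ar1(train):
    """Least-squares AR(1) fit x[t] = c + phi * x[t-1] on a segment assumed anomaly-free."""
    train = np.asarray(train, dtype=float)
    X = np.column_stack([np.ones(len(train) - 1), train[:-1]])
    y = train[1:]
    (c, phi), *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - (c + phi * train[:-1])
    # 1.4826 * MAD approximates the standard deviation for Gaussian noise.
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    return c, phi, sigma

def ar1_anomalies(x, model, k=6.0, replace=False):
    """Flag points whose one-step-ahead forecast error exceeds k * sigma."""
    c, phi, sigma = model
    x = np.asarray(x, dtype=float).copy()
    flags = np.zeros(len(x), dtype=bool)
    for t in range(1, len(x)):
        forecast = c + phi * x[t - 1]
        if abs(x[t] - forecast) > k * sigma:
            flags[t] = True
            if replace:
                # Substitute the forecast so the anomaly does not distort the next forecast.
                x[t] = forecast
    return flags
```

Fitting on a clean segment reflects the abstract's remark that regression-based methods need training. Without replacement, a single spike also triggers a flag at the following point, because the next forecast starts from the spike. With replace=True that second flag disappears. After a genuine level shift, however, every later point is compared against the old level and gets flagged. This is one plausible mechanism behind the trade-off the abstract reports between better drift detection and inflated false positive rates.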
Stepwise correlation of multivariate IoT event data based on first-order Markov chains
Correlating events in complex and dynamic IoT environments is a challenging
task not only because of the amount of available data that needs to be
processed but also because of the need for time-efficient data processing. In this
paper, we discuss the major steps that should be performed in real- or near
real-time event management focusing on event detection and event correlation.
We investigate the adoption of a univariate change detection algorithm for
real-time event detection and we propose a stepwise event correlation scheme
based on a first-order Markov model. The proposed theory is applied to the
maritime domain and is validated through extensive experimentation with real
sensor streams originating from large-scale sensor networks deployed in a
maritime fleet of ships.
Comment: arXiv admin note: substantial text overlap with arXiv:1803.0563
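The abstract names two steps: univariate change detection, followed by event correlation with a first-order Markov model. Both can be sketched in plain Python. This is our illustration, not the paper's scheme. The abstract does not name its change-detection algorithm, so a standard two-sided CUSUM detector is swapped in here. The event labels, thresholds and helper names are hypothetical. Transition probabilities are maximum-likelihood estimates, i.e. normalized counts of consecutive event pairs. Multi-step prediction is obtained by chaining one-step transitions.

```python
from collections import Counter, defaultdict

def cusum_changes(x, target, drift, threshold):
    """Two-sided CUSUM: indices where a shift away from `target` is signalled."""
    s_pos = s_neg = 0.0
    alarms = []
    for i, v in enumerate(x):
        s_pos = max(0.0, s_pos + (v - target - drift))  # evidence of an upward shift
        s_neg = max(0.0, s_neg + (target - v - drift))  # evidence of a downward shift
        if s_pos > threshold or s_neg > threshold:
            alarms.append(i)
            s_pos = s_neg = 0.0  # restart accumulation after an alarm
    return alarms

def transition_matrix(sequence):
    """Maximum-likelihood first-order estimates of P(next event | current event)."""
    sequence = list(sequence)
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }

def predict_next(P, current):
    """Most probable next event, or None if `current` was never followed by anything."""
    if current not in P:
        return None
    return max(P[current], key=P[current].get)

def step_distribution(P, current, steps):
    """Distribution over events `steps` transitions ahead, propagated one step at a time.

    Probability mass reaching a state with no observed successor is dropped.
    """
    dist = {current: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for state, p in dist.items():
            for successor, q in P.get(state, {}).items():
                nxt[successor] += p * q
        dist = dict(nxt)
    return dist
```

In a pipeline of this kind, each CUSUM alarm on a sensor stream would become a labeled event, for example a hypothetical "speed_drop". The time-ordered sequence of labels across streams is then what transition_matrix consumes.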
Network Inference via the Time-Varying Graphical Lasso
Many important problems can be modeled as a system of interconnected
entities, where each entity is recording time-dependent observations or
measurements. In order to spot trends, detect anomalies, and interpret the
temporal dynamics of such data, it is essential to understand the relationships
between the different entities and how these relationships evolve over time. In
this paper, we introduce the time-varying graphical lasso (TVGL), a method of
inferring time-varying networks from raw time series data. We cast the problem
in terms of estimating a sparse time-varying inverse covariance matrix, which
reveals a dynamic network of interdependencies between the entities. Since
dynamic network inference is a computationally expensive task, we derive a
scalable message-passing algorithm based on the Alternating Direction Method of
Multipliers (ADMM) to solve this problem in an efficient way. We also discuss
several extensions, including a streaming algorithm to update the model and
incorporate new observations in real time. Finally, we evaluate our TVGL
algorithm on both real and synthetic datasets, obtaining interpretable results
and outperforming state-of-the-art baselines in terms of both accuracy and
scalability.
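In the TVGL formulation, each time window i contributes a Gaussian log-likelihood term for its precision (inverse covariance) matrix Theta_i. It also carries an l1 penalty, weighted by lambda, on the off-diagonal entries of Theta_i. A further penalty beta * psi(Theta_i - Theta_{i-1}) discourages abrupt changes between neighbouring windows. The NumPy sketch below is a deliberately simplified stand-in, not TVGL itself. It drops the temporal term (beta = 0) and solves each non-overlapping window independently with the standard ADMM splitting for the graphical lasso. That splitting introduces a copy Z of Theta, updates Theta in closed form via an eigendecomposition, and updates Z by soft-thresholding. The window length, lambda, rho and the iteration count are hypothetical choices.

```python
import numpy as np

def soft_threshold(A, kappa):
    """Elementwise shrinkage toward zero by kappa."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def graphical_lasso_admm(S, lam, rho=1.0, n_iter=200):
    """Sparse precision estimate for one window:
    minimize -log det(Theta) + tr(S @ Theta) + lam * sum of |off-diagonal entries|,
    using ADMM with the constraint Theta = Z."""
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))
    off_diag = ~np.eye(p, dtype=bool)
    for _ in range(n_iter):
        # Theta-step: solve rho * Theta - inv(Theta) = rho * (Z - U) - S by eigendecomposition.
        w, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (w + np.sqrt(w ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_eig) @ Q.T
        # Z-step: shrink off-diagonal entries only; the diagonal is not penalized.
        A = Theta + U
        Z = A.copy()
        Z[off_diag] = soft_threshold(A[off_diag], lam / rho)
        # Dual update on the scaled multiplier.
        U = U + Theta - Z
    return Z  # zeros in Z are the missing edges of the inferred network

def windowed_networks(X, window, lam):
    """One sparse precision matrix per non-overlapping window of rows of X (T x p)."""
    nets = []
    for start in range(0, X.shape[0] - window + 1, window):
        S = np.cov(X[start:start + window], rowvar=False, bias=True)
        nets.append(graphical_lasso_admm(S, lam))
    return nets
```

With beta = 0, the windows share no information, so an edge that is weak in the data can appear in one window and vanish in the next. Suppressing that flicker is exactly the purpose of TVGL's temporal penalty and its message-passing ADMM solver.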