3,143 research outputs found

    Event Correlation and Forecasting over High Dimensional Streaming Sensor Data

    As technology advances, the need to detect and predict events in real time, or near real time, intensifies. In this thesis, we analyze what constitutes an event for each individual data stream and how the next events can be predicted successfully using purpose-built algorithms. These predictions are feasible because of the correlations between events, and their accuracy increases as a wider range of previous events and states is taken into account. In real-world applications, these states remain relevant with a probability that diminishes over time, which the algorithms we developed take into account. Because the work involves managing Big Data, we chose a Python implementation together with the NumPy and Pandas libraries, which provide ease of use and the best run time we could achieve for such problems. To present results that are as realistic as possible, we varied a number of program variables and selected the values that gave the best precision and recall.
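The idea of correlated events whose relevance fades over time can be illustrated with a minimal sketch. It is not the thesis's algorithm: the exponential decay, the `decay` and `horizon` parameters, and the function names are all assumptions made for illustration, and it uses only the Python standard library rather than NumPy or Pandas.

```python
import math
from collections import defaultdict


def learn_decayed_weights(events, decay=0.1, horizon=10.0):
    """Accumulate time-decayed co-occurrence weights between event labels.

    events: list of (timestamp, label) pairs sorted by timestamp.
    decay and horizon are hypothetical tuning parameters.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for i, (t_i, a) in enumerate(events):
        for t_j, b in events[i + 1:]:
            dt = t_j - t_i
            if dt > horizon:
                break  # later events are too far away to count
            # A follower that arrives sooner contributes more weight.
            weights[a][b] += math.exp(-decay * dt)
    return weights


def predict_next(weights, recent, now, decay=0.1):
    """Return the label with the highest decayed score, or None."""
    scores = defaultdict(float)
    for t, a in recent:
        age = now - t
        for b, w in weights.get(a, {}).items():
            # Older evidence is discounted by its age.
            scores[b] += w * math.exp(-decay * age)
    return max(scores, key=scores.get) if scores else None


# Usage with made-up events: "B" tends to follow "A" closely.
events = [(0, "A"), (1, "B"), (5, "A"), (6, "B"), (10, "A"), (11, "C")]
weights = learn_decayed_weights(events)
print(predict_next(weights, recent=[(12, "A")], now=12))
```

Varying `decay` and `horizon` and comparing precision and recall would be one way to mirror the parameter search the abstract describes.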

    A framework for automated anomaly detection in high frequency water-quality data from in situ sensors

    River water-quality monitoring is increasingly conducted using automated in situ sensors, enabling timelier identification of unexpected values. However, anomalies caused by technical issues confound these data, while the volume and velocity of data prevent manual detection. We present a framework for automated anomaly detection in high-frequency water-quality data from in situ sensors, using turbidity, conductivity and river level data. After identifying end-user needs and defining anomalies, we ranked their importance and selected suitable detection methods. High priority anomalies included sudden isolated spikes and level shifts, most of which were classified correctly by regression-based methods such as autoregressive integrated moving average models. However, using other water-quality variables as covariates reduced performance due to complex relationships among variables. Classification of drift and periods of anomalously low or high variability improved when we replaced anomalous measurements with forecasts, but this inflated false positive rates. Feature-based methods also performed well on high priority anomalies, but were less proficient at detecting lower priority anomalies, resulting in high false negative rates. Unlike regression-based methods, all feature-based methods produced low false positive rates and did not require training or optimization. Rule-based methods successfully detected impossible values and missing observations. Thus, we recommend using a combination of methods to improve anomaly detection performance, whilst minimizing false detection rates. Furthermore, our framework emphasizes the importance of communication between end-users and analysts for optimal outcomes with respect to both detection performance and end-user needs. Our framework is applicable to other types of high-frequency time-series data and anomaly detection applications.
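A minimal sketch of combining rule-based checks with a forecast-based check might look as follows. It is not the paper's framework: the naive "last accepted value" forecast stands in for a fitted regression model, and the `lower`, `upper` and `max_jump` thresholds and the function name are hypothetical.

```python
def detect_anomalies(values, lower=0.0, upper=1000.0, max_jump=50.0):
    """Label each reading as 'ok', 'missing', 'impossible' or 'spike'.

    Rule-based checks catch missing and out-of-range readings. A naive
    one-step forecast (the last accepted value) catches sudden spikes.
    Flagged readings are never used as the next forecast, which mimics
    replacing anomalous measurements with forecasts.
    """
    labels = []
    last_good = None
    for x in values:
        if x is None:
            labels.append("missing")
        elif not (lower <= x <= upper):
            labels.append("impossible")
        elif last_good is not None and abs(x - last_good) > max_jump:
            labels.append("spike")
        else:
            labels.append("ok")
            last_good = x  # only accepted readings update the forecast
    return labels


# Usage with made-up turbidity readings.
readings = [10, 12, None, -5, 11, 300, 13, 14]
print(detect_anomalies(readings))
```

Because the forecast is only updated by accepted readings, a genuine level shift would keep being flagged, which is one way the false positive inflation mentioned in the abstract can arise.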

    Stepwise correlation of multivariate IoT event data based on first-order Markov chains

    Correlating events in complex and dynamic IoT environments is a challenging task, not only because of the amount of available data that needs to be processed but also because of the need for time-efficient data processing. In this paper, we discuss the major steps that should be performed in real- or near real-time event management, focusing on event detection and event correlation. We investigate the adoption of a univariate change detection algorithm for real-time event detection and we propose a stepwise event correlation scheme based on a first-order Markov model. The proposed theory is applied to the maritime domain and is validated through extensive experimentation with real sensor streams originating from large-scale sensor networks deployed in a maritime fleet of ships. Comment: arXiv admin note: substantial text overlap with arXiv:1803.0563
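The first-order Markov assumption says that the next event depends only on the current one. A minimal sketch of estimating such transition probabilities from an event sequence is given below. It is not the paper's stepwise scheme, and the function name and event labels are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sequence):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


# Usage with made-up event labels from a single sensor stream.
stream = ["engine_alarm", "speed_drop", "engine_alarm", "speed_drop",
          "engine_alarm", "fuel_alarm"]
model = fit_transitions(stream)
print(model["engine_alarm"])
```

Events whose estimated transition probability is high can then be treated as correlated candidates, under the assumption that the observed sequence is representative.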

    Network Inference via the Time-Varying Graphical Lasso

    Many important problems can be modeled as a system of interconnected entities, where each entity is recording time-dependent observations or measurements. In order to spot trends, detect anomalies, and interpret the temporal dynamics of such data, it is essential to understand the relationships between the different entities and how these relationships evolve over time. In this paper, we introduce the time-varying graphical lasso (TVGL), a method of inferring time-varying networks from raw time series data. We cast the problem in terms of estimating a sparse time-varying inverse covariance matrix, which reveals a dynamic network of interdependencies between the entities. Since dynamic network inference is a computationally expensive task, we derive a scalable message-passing algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve this problem efficiently. We also discuss several extensions, including a streaming algorithm to update the model and incorporate new observations in real time. Finally, we evaluate our TVGL algorithm on both real and synthetic datasets, obtaining interpretable results and outperforming state-of-the-art baselines in terms of both accuracy and scalability.
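The estimation problem described above is usually written as a penalized likelihood. The abstract does not state the formula, so the block below is a sketch under assumed notation: $\Theta_i$ is the inverse covariance matrix at time step $i$, $S_i$ the empirical covariance of the $n_i$ observations at that step, $\lambda$ a sparsity weight on the off-diagonal entries, $\beta$ a temporal-consistency weight, and $\psi$ a penalty on the change between consecutive time steps.

```latex
\min_{\Theta_1,\dots,\Theta_T \succ 0}\;
\sum_{i=1}^{T} \Bigl[ -\ell_i(\Theta_i)
  + \lambda \,\lVert \Theta_i \rVert_{\mathrm{od},1} \Bigr]
+ \beta \sum_{i=2}^{T} \psi\bigl(\Theta_i - \Theta_{i-1}\bigr),
\qquad
\ell_i(\Theta_i) = n_i \bigl( \log\det \Theta_i
  - \operatorname{Tr}(S_i \Theta_i) \bigr)
```

The first sum fits each time step and encourages sparsity, while the second sum ties neighbouring time steps together. The choice of $\psi$ determines what kind of network evolution is favoured.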