51 research outputs found

    Online Unsupervised State Recognition in Sensor Data

    Get PDF
    Smart sensors, such as smart meters or smart phones, are nowadays ubiquitous. To be "smart", however, they need to process their input data with limited storage and computational resources. In this paper, we convert the stream of sensor data into a stream of symbols, and further, to higher level symbols in such a way that common analytical tasks such as anomaly detection, forecasting or state recognition, can still be carried out on the transformed data with almost no loss of accuracy, and using far fewer resources. We identify states of a monitored system and convert them into symbols (thus, reducing data size), while keeping "interesting" events, such as anomalies or transition between states, as it is. Our algorithm is able to find states of various length in an online and unsupervised way, which is crucial since behavior of the system is not known beforehand. We show the effectiveness of our approach using real-world datasets and various application scenarios

    Non-intrusive load monitoring solutions for low- and very low-rate granularity

    Get PDF
    Strathclyde theses - ask staff. Thesis no. : T15573Large-scale smart energy metering deployment worldwide and the integration of smart meters within the smart grid are enabling two-way communication between the consumer and energy network, thus ensuring an improved response to demand. Energy disaggregation or non-intrusive load monitoring (NILM), namely disaggregation of the total metered electricity consumption down to individual appliances using purely algorithmic tools, is gaining popularity as an added-value that makes the most of meter data.In this thesis, the first contribution tackles low-rate NILM problem by proposing an approach based on graph signal processing (GSP) that does not require any training.Note that Low-rate NILM refers to NILM of active power measurements only, at rates from 1 second to 1 minute. Adaptive thresholding, signal clustering and pattern matching are implemented via GSP concepts and applied to the NILM problem. Then for further demonstration of GSP potential, GSP concepts are applied at both, physical signal level via graph-based filtering and data level, via effective semi-supervised GSP-based feature matching. The proposed GSP-based NILM-improving methods are generic and can be used to improve the results of various event-based NILM approaches. NILM solutions for very low data rates (15-60 min) cannot leverage on low to highrates NILM approaches. Therefore, the third contribution of this thesis comprises three very low-rate load disaggregation solutions, based on supervised (i) K-nearest neighbours relying on features such as statistical measures of the energy signal, time usage profile of appliances and reactive power consumption (if available); unsupervised(ii) optimisation performing minimisation of error between aggregate and the sum of estimated individual loads, where energy consumed by always-on load is heuristically estimated prior to further disaggregation and appliance models are built only by manufacturer information; and (iii) GSP as a variant of aforementioned GSP-based solution proposed for low-rate load disaggregation, with an additional graph of time-of-day information.Large-scale smart energy metering deployment worldwide and the integration of smart meters within the smart grid are enabling two-way communication between the consumer and energy network, thus ensuring an improved response to demand. Energy disaggregation or non-intrusive load monitoring (NILM), namely disaggregation of the total metered electricity consumption down to individual appliances using purely algorithmic tools, is gaining popularity as an added-value that makes the most of meter data.In this thesis, the first contribution tackles low-rate NILM problem by proposing an approach based on graph signal processing (GSP) that does not require any training.Note that Low-rate NILM refers to NILM of active power measurements only, at rates from 1 second to 1 minute. Adaptive thresholding, signal clustering and pattern matching are implemented via GSP concepts and applied to the NILM problem. Then for further demonstration of GSP potential, GSP concepts are applied at both, physical signal level via graph-based filtering and data level, via effective semi-supervised GSP-based feature matching. The proposed GSP-based NILM-improving methods are generic and can be used to improve the results of various event-based NILM approaches. NILM solutions for very low data rates (15-60 min) cannot leverage on low to highrates NILM approaches. Therefore, the third contribution of this thesis comprises three very low-rate load disaggregation solutions, based on supervised (i) K-nearest neighbours relying on features such as statistical measures of the energy signal, time usage profile of appliances and reactive power consumption (if available); unsupervised(ii) optimisation performing minimisation of error between aggregate and the sum of estimated individual loads, where energy consumed by always-on load is heuristically estimated prior to further disaggregation and appliance models are built only by manufacturer information; and (iii) GSP as a variant of aforementioned GSP-based solution proposed for low-rate load disaggregation, with an additional graph of time-of-day information

    Hierarchical feature extraction from spatiotemporal data for cyber-physical system analytics

    Get PDF
    With the advent of ubiquitous sensing, robust communication and advanced computation, data-driven modeling is increasingly becoming popular for many engineering problems. Eliminating difficulties of physics-based modeling, avoiding simplifying assumptions and ad hoc empirical models are significant among many advantages of data-driven approaches, especially for large-scale complex systems. While classical statistics and signal processing algorithms have been widely used by the engineering community, advanced machine learning techniques have not been sufficiently explored in this regard. This study summarizes various categories of machine learning tools that have been applied or may be a candidate for addressing engineering problems. While there are increasing number of machine learning algorithms, the main steps involved in applying such techniques to the problems consist in: data collection and pre-processing, feature extraction, model training and inference for decision-making. To support decision-making processes in many applications, hierarchical feature extraction is key. Among various feature extraction principles, recent studies emphasize hierarchical approaches of extracting salient features that is carried out at multiple abstraction levels from data. In this context, the focus of the dissertation is towards developing hierarchical feature extraction algorithms within the framework of machine learning in order to solve challenging cyber-physical problems in various domains such as electromechanical systems and agricultural systems. Furthermore, the feature extraction techniques are described using the spatial, temporal and spatiotemporal data types collected from the systems. The wide applicability of such features in solving some selected real-life domain problems are demonstrated throughout this study

    Data Challenges and Data Analytics Solutions for Power Systems

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Energy-efficient Continuous Context Sensing on Mobile Phones

    Get PDF
    With the ever increasing adoption of smartphones worldwide, researchers have found the perfect sensor platform to perform context-based research and to prepare for context-based services to be also deployed for the end-users. However, continuous context sensing imposes a considerable challenge in balancing the energy consumption of the sensors, the accuracy of the recognized context and its latency. After outlining the common characteristics of continuous sensing systems, we present a detailed overview of the state of the art, from sensors sub-systems to context inference algorithms. Then, we present the three main contribution of this thesis. The first approach we present is based on the use of local communications to exchange sensing information with neighboring devices. As proximity, location and environmental information can be obtained from nearby smartphones, we design a protocol for synchronizing the exchanges and fairly distribute the sensing tasks. We show both theoretically and experimentally the reduction in energy needed when the devices can collaborate. The second approach focuses on the way to schedule mobile sensors, optimizing for both the accuracy and energy needs. We formulate the optimal sensing problem as a decision problem and propose a two-tier framework for approximating its solution. The first tier is responsible for segmenting the sensor measurement time series, by fitting various models. The second tier takes care of estimating the optimal sampling, selecting the measurements that contributes the most to the model accuracy. We provide near-optimal heuristics for both tiers and evaluate their performances using environmental sensor data. In the third approach we propose an online algorithm that identifies repeated patterns in time series and produces a compressed symbolic stream. The first symbolic transformation is based on clustering with the raw sensor data. Whereas the next iterations encode repetitive sequences of symbols into new symbols. We define also a metric to evaluate the symbolization methods with regard to their capacity at preserving the systems' states. We also show that the output of symbols can be used directly for various data mining tasks, such as classification or forecasting, without impacting much the accuracy, but greatly reducing the complexity and running time. In addition, we also present an example of application, assessing the user's exposure to air pollutants, which demonstrates the many opportunities to enhance contextual information when fusing sensor data from different sources. On one side we gather fine grained air quality information from mobile sensor deployments and aggregate them with an interpolation model. And, on the other side, we continuously capture the user's context, including location, activity and surrounding air quality. We also present the various models used for fusing all these information in order to produce the exposure estimation

    Detecting anomalous electrical appliance behavior based on motif transition likelihood matrices

    Get PDF
    With the rise of the Internet of Things, innumerable sensors and actuators are expected to find their way into homes, office spaces, and beyond. Electric power metering equipment, mostly present in the form of smart meters and smart plugs, can thus be anticipated to be installed widely. While smart plugs (i.e., individual power monitors attachable to wall outlets) primarily cater to the user comfort by enabling legacy devices to be controlled remotely, their power measurement capability also serves as an enabler for novel context-based services. We realize one such functionality in this paper, namely the recognition of unexpected behavior and appliance faults based on observed power consumption sequences. To this end, we present a system that extracts characteristic power consumption transitions and their temporal dependencies from previously collected measurements. When queried with a power data sequence collected from the appliance under consideration, it returns the likelihood that the collected power data indicate normal behavior of this appliance. We evaluate our system using real-world appliance-level consumption data. Our results confirm the individual nature of consumption patterns and show that the system can reliably detect errors introduced in the data within close temporal proximity to their time of occurrence

    Enhancing Computer Network Security through Improved Outlier Detection for Data Streams

    Get PDF
    V několika posledních letech se metody strojového učení (zvláště ty zabývající se detekcí odlehlých hodnot - OD) v oblasti kyberbezpečnosti opíraly o zjišťování anomálií síťového provozu spočívajících v nových schématech útoků. Detekce anomálií v počítačových sítích reálného světa se ale stala stále obtížnější kvůli trvalému nárůstu vysoce objemných, rychlých a dimenzionálních průběžně přicházejících dat (SD), pro která nejsou k dispozici obecně uznané a pravdivé informace o anomalitě. Účinná detekční schémata pro vestavěná síťová zařízení musejí být rychlá a paměťově nenáročná a musejí být schopna se potýkat se změnami konceptu, když se vyskytnou. Cílem této disertace je zlepšit bezpečnost počítačových sítí zesílenou detekcí odlehlých hodnot v datových proudech, obzvláště SD, a dosáhnout kyberodolnosti, která zahrnuje jak detekci a analýzu, tak reakci na bezpečnostní incidenty jako jsou např. nové zlovolné aktivity. Za tímto účelem jsou v práci navrženy čtyři hlavní příspěvky, jež byly publikovány nebo se nacházejí v recenzním řízení časopisů. Zaprvé, mezera ve volbě vlastností (FS) bez učitele pro zlepšování již hotových metod OD v datových tocích byla zaplněna navržením volby vlastností bez učitele pro detekci odlehlých průběžně přicházejících dat označované jako UFSSOD. Následně odvozujeme generický koncept, který ukazuje dva aplikační scénáře UFSSOD ve spojení s online algoritmy OD. Rozsáhlé experimenty ukázaly, že UFSSOD coby algoritmus schopný online zpracování vykazuje srovnatelné výsledky jako konkurenční metoda upravená pro OD. Zadruhé představujeme nový aplikační rámec nazvaný izolovaný les založený na počítání výkonu (PCB-iForest), jenž je obecně schopen využít jakoukoliv online OD metodu založenou na množinách dat tak, aby fungovala na SD. Do tohoto algoritmu integrujeme dvě varianty založené na klasickém izolovaném lese. Rozsáhlé experimenty provedené na 23 multidisciplinárních datových sadách týkajících se bezpečnostní problematiky reálného světa ukázaly, že PCB-iForest jasně překonává už zavedené konkurenční metody v 61 % případů a dokonce dosahuje ještě slibnějších výsledků co do vyváženosti mezi výpočetními náklady na klasifikaci a její úspěšností. Zatřetí zavádíme nový pracovní rámec nazvaný detekce odlehlých hodnot a rozpoznávání schémat útoku proudovým způsobem (SOAAPR), jenž je na rozdíl od současných metod schopen zpracovat výstup z různých online OD metod bez učitele proudovým způsobem, aby získal informace o nových schématech útoku. Ze seshlukované množiny korelovaných poplachů jsou metodou SOAAPR vypočítány tři různé soukromí zachovávající podpisy podobné otiskům prstů, které charakterizují a reprezentují potenciální scénáře útoku s ohledem na jejich komunikační vztahy, projevy ve vlastnostech dat a chování v čase. Evaluace na dvou oblíbených datových sadách odhalila, že SOAAPR může soupeřit s konkurenční offline metodou ve schopnosti korelace poplachů a významně ji překonává z hlediska výpočetního času . Navíc se všechny tři typy podpisů ve většině případů zdají spolehlivě charakterizovat scénáře útoků tím, že podobné seskupují k sobě. Začtvrté představujeme algoritmus nepárového kódu autentizace zpráv (Uncoupled MAC), který propojuje oblasti kryptografického zabezpečení a detekce vniknutí (IDS) pro síťovou bezpečnost. Zabezpečuje síťovou komunikaci (autenticitu a integritu) kryptografickým schématem s podporou druhé vrstvy kódy autentizace zpráv, ale také jako vedlejší efekt poskytuje funkcionalitu IDS tak, že vyvolává poplach na základě porušení hodnot nepárového MACu. Díky novému samoregulačnímu rozšíření algoritmus adaptuje svoje vzorkovací parametry na základě zjištění škodlivých aktivit. Evaluace ve virtuálním prostředí jasně ukazuje, že schopnost detekce se za běhu zvyšuje pro různé scénáře útoku. Ty zahrnují dokonce i situace, kdy se inteligentní útočníci snaží využít slabá místa vzorkování.ObhájenoOver the past couple of years, machine learning methods - especially the Outlier Detection (OD) ones - have become anchored to the cyber security field to detect network-based anomalies rooted in novel attack patterns. Due to the steady increase of high-volume, high-speed and high-dimensional Streaming Data (SD), for which ground truth information is not available, detecting anomalies in real-world computer networks has become a more and more challenging task. Efficient detection schemes applied to networked, embedded devices need to be fast and memory-constrained, and must be capable of dealing with concept drifts when they occur. The aim of this thesis is to enhance computer network security through improved OD for data streams, in particular SD, to achieve cyber resilience, which ranges from the detection, over the analysis of security-relevant incidents, e.g., novel malicious activity, to the reaction to them. Therefore, four major contributions are proposed, which have been published or are submitted journal articles. First, a research gap in unsupervised Feature Selection (FS) for the improvement of off-the-shell OD methods in data streams is filled by proposing Unsupervised Feature Selection for Streaming Outlier Detection, denoted as UFSSOD. A generic concept is retrieved that shows two application scenarios of UFSSOD in conjunction with online OD algorithms. Extensive experiments have shown that UFSSOD, as an online-capable algorithm, achieves comparable results with a competitor trimmed for OD. Second, a novel unsupervised online OD framework called Performance Counter-Based iForest (PCB-iForest) is being introduced, which generalized, is able to incorporate any ensemble-based online OD method to function on SD. Two variants based on classic iForest are integrated. Extensive experiments, performed on 23 different multi-disciplinary and security-related real-world data sets, revealed that PCB-iForest clearly outperformed state-of-the-art competitors in 61 % of cases and even achieved more promising results in terms of the tradeoff between classification and computational costs. Third, a framework called Streaming Outlier Analysis and Attack Pattern Recognition, denoted as SOAAPR is being introduced that, in contrast to the state-of-the-art, is able to process the output of various online unsupervised OD methods in a streaming fashion to extract information about novel attack patterns. Three different privacy-preserving, fingerprint-like signatures are computed from the clustered set of correlated alerts by SOAAPR, which characterize and represent the potential attack scenarios with respect to their communication relations, their manifestation in the data's features and their temporal behavior. The evaluation on two popular data sets shows that SOAAPR can compete with an offline competitor in terms of alert correlation and outperforms it significantly in terms of processing time. Moreover, in most cases all three types of signatures seem to reliably characterize attack scenarios to the effect that similar ones are grouped together. Fourth, an Uncoupled Message Authentication Code algorithm - Uncoupled MAC - is presented which builds a bridge between cryptographic protection and Intrusion Detection Systems (IDSs) for network security. It secures network communication (authenticity and integrity) through a cryptographic scheme with layer-2 support via uncoupled message authentication codes but, as a side effect, also provides IDS-functionality producing alarms based on the violation of Uncoupled MAC values. Through a novel self-regulation extension, the algorithm adapts its sampling parameters based on the detection of malicious actions on SD. The evaluation in a virtualized environment clearly shows that the detection rate increases over runtime for different attack scenarios. Those even cover scenarios in which intelligent attackers try to exploit the downsides of sampling
    corecore