267 research outputs found

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity by various systems, applications, and activities. This increases the demand for stream and time series analysis that reacts to changing conditions in real time, enhancing efficiency and quality of service delivery as well as safety and security in the private and public sectors. Despite its very rich history, time series anomaly detection remains a vital topic in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting a model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic faces challenges such as complex latent patterns, concept drift, and overfitting, which may mislead a model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into opaque and inexplicable black boxes. Contrary to this trend, end-users expect transparency and verifiability before they trust a model and the outcomes it produces. Pointing users to the most anomalous or malicious regions of a time series, and to the causal features, can also save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline for stream-based anomaly detection through three essential phases: behavior prediction, inference, and interpretation. The first step is devising a time series model that yields high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that use the related context to reclassify observations and post-prune unjustified events. Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its results, expressed in concepts understandable by a human. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation supporting the economy, security, and even the safety and health of societies worldwide. We believe the proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains such as cybersecurity and health.
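
    The reclassification-and-pruning idea lends itself to a short illustration. Below is a minimal sketch, assuming a simple rolling-mean forecaster in place of the thesis's learned time series model; all function names, window sizes, and thresholds are illustrative, not taken from the work.

```python
import numpy as np

def residual_scores(x, window=32):
    """Score each point by its deviation from a rolling-mean forecast
    (a stand-in for a learned time series model)."""
    scores = np.zeros_like(x, dtype=float)
    for t in range(window, len(x)):
        hist = x[t - window:t]
        mu, sigma = hist.mean(), hist.std() + 1e-8
        scores[t] = abs(x[t] - mu) / sigma          # z-score of the forecast residual
    return scores

def prune_unjustified(alarms, context=5, min_support=2):
    """Keep an alarm only if enough neighbouring points are also anomalous,
    i.e. use the surrounding context to post-prune isolated false alarms."""
    kept = np.zeros_like(alarms)
    for i in np.flatnonzero(alarms):
        lo, hi = max(0, i - context), min(len(alarms), i + context + 1)
        if alarms[lo:hi].sum() >= min_support:      # contextual support check
            kept[i] = 1
    return kept

rng = np.random.default_rng(0)
x = rng.normal(size=500)
x[300:305] += 6                                     # injected anomalous burst
alarms = (residual_scores(x) > 4).astype(int)
print(np.flatnonzero(prune_unjustified(alarms)))    # indices of retained alarms
```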

    Virtual reality for training and fitness assessments for construction safety

    Reducing the accident rate is a primary goal of construction safety. In this paper, we present a large-scale study of using virtual reality technology for safety training. Beyond training, a technology framework is proposed to assess the fitness of construction workers (e.g., the suitability of people with underlying health conditions to work in particular construction environments). The new virtual construction system consists of a Brain-Computer Interface (BCI) that continuously captures the electroencephalography (EEG) signals of users during virtual simulation training to achieve user profiling. For real-time assessment of a worker's accident susceptibility under various construction environments, a deep neural network is trained to process the EEG crops, and a clipping training algorithm that classifies small segments of the EEG dataset is used to improve the computational performance of the system. Physiological data of the person during training, i.e., blood pressure and heart rate, are also recorded. Based on the EEG data and the physiological data, a statistical model is used in the safety assessment framework to set the risk standard. The study tested 117 workers employed at construction sites in Shanghai. Workers who tested into the risk group further underwent medical examinations for risk-related medical conditions deemed unsuitable for working on construction sites. Results show that six of the nine workers identified by the VR system were medically confirmed as unsuitable, giving over 80% accuracy for our virtual reality training and assessment system. Our proposed system can be used as a tool for understanding the risk conditions of workers and for safety training.
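
    To illustrate the clipping/cropping idea, here is a minimal sketch that slices recordings into short overlapping segments, trains a per-crop classifier, and aggregates crop predictions by majority vote. The paper uses a deep neural network on EEG crops; the logistic regression, data shapes, and labels below are hypothetical stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_crops(eeg, crop_len=128, stride=64):
    """Slice a (channels, time) EEG recording into small overlapping crops,
    mirroring the 'clipping' idea of classifying short segments."""
    crops = [eeg[:, s:s + crop_len].ravel()
             for s in range(0, eeg.shape[1] - crop_len + 1, stride)]
    return np.stack(crops)

rng = np.random.default_rng(1)
# Hypothetical data: 40 recordings, 8 channels, 1024 samples; label 1 = at-risk
recs = rng.normal(size=(40, 8, 1024))
labels = rng.integers(0, 2, size=40)
recs[labels == 1] += 0.3                         # toy class difference

X = np.concatenate([make_crops(r) for r in recs])
y = np.repeat(labels, len(make_crops(recs[0])))  # each crop inherits its recording's label
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Classify a recording by majority vote over its crops
votes = clf.predict(make_crops(recs[0]))
print("at-risk" if votes.mean() > 0.5 else "fit")
```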

    Multi-Level Data-Driven Battery Management: From Internal Sensing to Big Data Utilization

    The battery management system (BMS) is essential for the safety and longevity of lithium-ion battery (LIB) utilization. With the rapid development of new sensing techniques and artificial intelligence, and the availability of huge amounts of battery operational data, data-driven battery management has attracted ever-widening attention as a promising solution. This review article surveys the recent progress and future trends of data-driven battery management from a multi-level perspective. The widely explored data-driven methods relying on routine measurements of current, voltage, and surface temperature are reviewed first. At a deeper, microscopic level, emerging management strategies that exploit multi-dimensional battery data provided by new sensing techniques are then reviewed. Enabled by the fast growth of big data technologies and platforms, the efficient use of battery big data for enhanced battery management is further surveyed; this constitutes the upper, macroscopic level of the data-driven BMS framework. With this endeavor, we aim to motivate new insights into the future development of next-generation data-driven battery management.

    Anomalous behaviour detection using heterogeneous data

    Anomaly detection is one of the most important methods for processing data and finding abnormal instances, as it can distinguish between normal and abnormal behaviour. It has been applied in many areas, such as the medical sector, fraud detection in finance, fault detection in machines, intrusion detection in networks, surveillance systems for security, and forensic investigations. Abnormal behaviour can provide information or answer questions when an investigator is performing an investigation. Anomaly detection is one way to simplify big data by focusing on data that have been grouped or clustered by the anomaly detection method. Forensic data usually consist of heterogeneous data taking several forms or types, such as qualitative or quantitative, structured or unstructured, and primary or secondary. For example, when a crime takes place, the evidence can come in various types of data, and combining all the data types can produce rich information insights. Nowadays, data has become ‘big’ because it is generated every second of every day, and processing it has become time-consuming and tedious. Therefore, in this study, a new method to detect abnormal behaviour is proposed that uses heterogeneous data and combines the data with a data fusion technique. VAST Challenge data and image data are used to demonstrate the heterogeneous data. The first contribution of this study is applying heterogeneous data to detect anomalies. The recently introduced anomaly detection technique known as Empirical Data Analytics (EDA) is applied to detect abnormal behaviour in the data sets. Standardised eccentricity (a measure newly introduced within EDA, offering a simplified form of the well-known Chebyshev inequality) can be applied to any data distribution. The second contribution is applying image data: the image data are processed using a pre-trained deep learning network, and classification is done using a support vector machine (SVM). The last contribution is combining the anomaly results from heterogeneous data and image recognition using a new data fusion technique. There are five types of data with three different modalities and different dimensionalities, so the data cannot simply be combined and integrated. The new data fusion technique therefore first analyses the abnormality in each data type separately, determines a degree of suspicion between 0 and 1, and afterwards sums up all the degrees of suspicion. This method is not intended to be a fully automatic system that resolves investigations, which would likely be unacceptable in any case. The aim is rather to simplify the role of the humans so that they can focus on a small number of cases to be looked at in more detail. The proposed approach does simplify the processing of such huge amounts of data, and it can assist human experts in their investigations and in making final decisions.
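
    As an illustration of the scoring and fusion steps, the sketch below computes standardised eccentricity in its common Euclidean form and sums per-modality suspicion degrees. The exact recursive EDA formulation used in the thesis may differ; the formula and the mapping to [0, 1] here are assumptions.

```python
import numpy as np

def standardized_eccentricity(X):
    """Standardised eccentricity per sample, in the Euclidean form
    eps(x) = 1 + ||x - mu||^2 / sigma^2 (an assumption; see Angelov's
    EDA papers for the exact recursive formulation)."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).sum(axis=1).mean() + 1e-12
    return 1.0 + ((X - mu) ** 2).sum(axis=1) / sigma2

def suspicion(X, n_sigma=3):
    """Map eccentricity to a [0, 1] suspicion degree. By the Chebyshev-style
    bound, eps > n^2 + 1 plays the role of 'more than n sigma away'."""
    eps = standardized_eccentricity(X)
    return np.clip((eps - 1) / n_sigma**2, 0, 1)

# Fuse per-modality suspicion degrees by summation, as the abstract describes
rng = np.random.default_rng(2)
modal_a = rng.normal(size=(100, 3)); modal_a[7] += 8   # planted outlier
modal_b = rng.normal(size=(100, 5)); modal_b[7] += 6
fused = suspicion(modal_a) + suspicion(modal_b)
print(fused.argmax())  # -> 7, the most suspicious case across modalities
```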

    Utilization Of A Large-Scale Wireless Sensor Network For Intrusion Detection And Border Surveillance

    To control borders more effectively, countries may deploy a detection system that enables real-time surveillance of border integrity. Events such as border crossings need to be monitored in real time so that border security forces can note any entries and mark destinations for apprehension. Wireless Sensor Networks (WSNs) are promising for border security surveillance because they enable enforcement teams to monitor events in the physical environment. In this work, probabilistic models are presented to investigate sensor deployment schemes while considering the environmental factors that affect sensor performance. Simulation studies have been carried out using OPNET to verify the theoretical analysis and to find a node deployment scheme that is robust and efficient by incorporating geographical coordination in the design. Measures such as adding a camera and a range-extending antenna to each node have been investigated to improve system performance. A prototype WSN-based surveillance system has been developed to verify the proposed approach.
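
    A toy example of the kind of probabilistic analysis described above: the sketch assumes a generic exponential-decay sensing model (an assumption, not necessarily the model used in this work) and estimates the probability that an intruder crosses a sensor field undetected.

```python
import math, random

def detection_prob(distance, r=20.0, alpha=0.08):
    """Probabilistic sensing model: certain detection inside radius r,
    exponentially decaying detection probability beyond it."""
    return 1.0 if distance <= r else math.exp(-alpha * (distance - r))

def breach_prob(sensors, path):
    """Probability an intruder traverses 'path' undetected by all sensors,
    assuming independent detection events."""
    p_miss = 1.0
    for px, py in path:
        for sx, sy in sensors:
            d = math.hypot(px - sx, py - sy)
            p_miss *= 1.0 - detection_prob(d)
    return p_miss

random.seed(3)
# 50 nodes scattered over a 1000 m x 100 m border strip
sensors = [(random.uniform(0, 1000), random.uniform(0, 100)) for _ in range(50)]
path = [(500.0, float(y)) for y in range(0, 101, 10)]   # straight crossing
print(f"P(undetected crossing) = {breach_prob(sensors, path):.3e}")
```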

    Spatiotemporal anomaly detection: streaming architecture and algorithms

    Anomaly detection is the science of identifying one or more rare or unexplainable samples or events in a dataset or data stream. The field has been extensively studied by mathematicians, statisticians, economists, engineers, and computer scientists. One open research question remains the design of distributed cloud-based architectures and algorithms that can accurately identify anomalies in previously unseen, unlabeled, streaming, multivariate spatiotemporal data. With streaming data, time is of the essence, and insights are perishable. Real-world streaming spatiotemporal data originate from many sources, including mobile phones, supervisory control and data acquisition (SCADA) devices, the internet-of-things (IoT), distributed sensor networks, and social media. Baseline experiments are performed on four non-streaming, static multivariate anomaly detection datasets using unsupervised offline traditional machine learning (TML) and unsupervised neural network techniques. Multiple architectures, including autoencoders, generative adversarial networks, convolutional networks, and recurrent networks, are adapted for experimentation. Extensive experimentation demonstrates that neural networks produce superior detection accuracy over TML techniques. These same neural network architectures can be extended to process unlabeled spatiotemporal streams using online learning. Space and time relationships are further exploited to provide additional insights and increased anomaly detection accuracy. A novel domain-independent architecture and set of algorithms called the Spatiotemporal Anomaly Detection Environment (STADE) is formulated. STADE is based on a federated learning architecture. The STADE streaming algorithms are based on geographically unique, persistently executing neural networks trained with online stochastic gradient descent (SGD). STADE is designed to be pluggable, meaning that alternative algorithms may be substituted or combined to form an ensemble. STADE incorporates a Stream Anomaly Detector (SAD) and a Federated Anomaly Detector (FAD). The SAD executes at multiple locations on streaming data, while the FAD executes at a single server and identifies global patterns and relationships among the site anomalies. Each STADE site streams anomaly scores to the centralized FAD server for further spatiotemporal dependency analysis and logging. The FAD is based on recent advances in DNN-based federated learning. A STADE testbed is implemented to facilitate globally distributed experimentation using low-cost, commercial cloud infrastructure provided by Microsoft™. STADE testbed sites are situated in the cloud within each continent: Africa, Asia, Australia, Europe, North America, and South America. Communication occurs over the commercial internet. Three STADE case studies are investigated: the first processes commercial air traffic flows, the second processes global earthquake measurements, and the third processes social media (i.e., Twitter™) feeds. These case studies confirm that STADE is a viable architecture for the near real-time identification of anomalies in streaming data originating from (possibly) computationally disadvantaged, geographically dispersed sites. Moreover, the addition of the FAD provides enhanced anomaly detection capability. Since STADE is domain-independent, these findings can be easily extended to additional application domains and use cases.
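
    The site-level detector can be sketched in a few lines. The toy below uses a tied-weight linear autoencoder updated with online SGD and scores each sample by its reconstruction error; STADE's actual per-site networks, thresholds, and score streaming are richer, so everything here is illustrative.

```python
import numpy as np

class StreamAnomalyDetector:
    """Toy stand-in for a STADE-style site detector: a tied-weight linear
    autoencoder updated with online SGD; reconstruction error is the score."""

    def __init__(self, dim, hidden=4, lr=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(hidden, dim))  # encoder; decoder = W.T
        self.lr = lr

    def score_and_update(self, x):
        z = self.W @ x                        # encode
        err = self.W.T @ z - x                # reconstruction residual
        score = float(err @ err)              # anomaly score for this sample
        # One SGD step on the reconstruction loss w.r.t. the tied weights
        grad = np.outer(z, err) + self.W @ np.outer(err, x)
        self.W -= self.lr * grad
        return score

rng = np.random.default_rng(4)
sad = StreamAnomalyDetector(dim=8)
for t in range(1000):
    x = rng.normal(size=8)
    if t == 900:
        x += 10                               # anomalous observation
    s = sad.score_and_update(x)
    if s > 200:                               # a site would stream this score to the FAD
        print(f"t={t}: anomaly score {s:.1f}")
```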

    Outlier Detection for Shape Model Fitting

    Medical image analysis applications often benefit from having a statistical shape model in the background. Statistical shape models are generative models which can generate shapes from the same family and assign a likelihood to each generated shape. In an analysis-by-synthesis approach to medical image analysis, the target shape to be segmented, registered, or completed must first be reconstructed by the statistical shape model. Shape models accomplish this by acting either as regression models, used to obtain the reconstruction, or as regularizers, used to limit the space of possible reconstructions. However, the accuracy of these models is not guaranteed for targets that lie outside the modeled distribution of the statistical shape model. Targets with pathologies are an example of out-of-distribution data: the target shape to be reconstructed has deformations caused by pathologies that do not exist in the healthy data used to build the model. Added and missing regions may lead to false correspondences, which act as outliers and influence the reconstruction result. Robust fitting is necessary to decrease the influence of outliers on the fitting solution, but it often comes at the cost of decreased accuracy in the inlier region. Robust techniques often presuppose knowledge of outlier characteristics to build a robust cost function, or knowledge of the correct regressed function to filter the outliers. This thesis proposes strategies to obtain the outliers and the reconstruction simultaneously, without prior knowledge of either. The assumptions are that a statistical shape model representing the healthy variations of the target organ is available, and that some landmarks exist on the model reference annotating locations with correspondence to the target. The first strategy uses an EM-like algorithm to obtain the sampling posterior. This is a global reconstruction approach that requires classical noise assumptions on the outlier distribution. The second strategy uses Bayesian optimization to infer the closed-form predictive posterior distribution and to estimate a label map of the outliers. The underlying regression model is a Gaussian Process Morphable Model (GPMM). To make the reconstruction obtained through Bayesian optimization robust, a novel acquisition function is proposed. The acquisition function uses the posterior and predictive posterior distributions to avoid choosing outliers as next query points. The algorithms output a label map and a posterior distribution that can be used to choose the most likely reconstruction. To obtain the label map, the first strategy uses Bayesian classification to separate inliers and outliers, while the second strategy annotates all query points as inliers and unused model vertices as outliers. The proposed solutions are compared to the literature, evaluated through their sensitivity and breakdown points, and tested on publicly available datasets and in-house clinical examples. The thesis contributes to shape model fitting to pathological targets by showing that:
- through the use of outlier detection, accurate inlier reconstruction and outlier detection are possible without case-specific manual thresholds or input label maps;
- outlier detection makes the algorithms agnostic to pathology type, i.e., they are suitable for both sparse and grouped outliers, which appear as holes and bumps, the severity of which influences the results;
- using the GPMM-based sequential Bayesian optimization approach, the closed-form predictive posterior distribution can be obtained despite the presence of outliers, because the Gaussian noise assumption is valid for the query points;
- using sequential Bayesian optimization instead of traditional optimization for shape model fitting brings several advantages that had not been previously explored: fitting can be driven by different reconstruction goals such as speed, location-dependent accuracy, or robustness;
- defining pathologies as outliers opens the door to general pathology segmentation solutions for medical data, in which segmentation algorithms need not depend on imaging modality, target pathology type, or training datasets for pathology labeling.
The thesis highlights the importance of outlier-based definitions of pathologies in medical data that are independent of pathology type and imaging modality. Developing such standards would not only simplify the comparison of different pathology segmentation algorithms on unlabeled datasets, but also push forward standard algorithms able to deal with general pathologies instead of data-driven definitions of pathologies. This comes with theoretical as well as clinical advantages. Practical applications are shown on shape reconstruction and labeling tasks using publicly available challenge datasets: one for cranium implant reconstruction, one for kidney tumor detection, and one for liver shape reconstruction. Further clinical applications are shown on in-house examples of a femur and a mandible with artifacts and missing parts. The results focus on shape modeling but can be extended in future work to include intensity information and inner-volume pathologies.
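
    The outlier-labeling step of the second strategy can be illustrated with generic Gaussian process regression standing in for a GPMM: fit the model to trusted landmarks, then flag correspondences that fall outside the n-sigma predictive interval. The kernel, noise level, and threshold below are assumptions made for the sketch, not the thesis's choices.

```python
import numpy as np

def gp_posterior(X_train, y_train, X_query, length=1.0, noise=0.05):
    """Closed-form GP predictive posterior with an RBF kernel (a generic
    GP regression sketch, standing in for the thesis's GPMM over shapes)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length**2)
    K = k(X_train, X_train) + noise**2 * np.eye(len(X_train))
    Ks = k(X_query, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha                           # predictive mean
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - (v**2).sum(0) + noise**2        # predictive variance
    return mean, np.sqrt(var)

# Label a correspondence as an outlier if its observed value falls outside
# the 3-sigma predictive interval of the model fitted to trusted landmarks.
rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(30, 1)); y = np.sin(X[:, 0])
Xq = rng.uniform(0, 10, size=(10, 1)); yq = np.sin(Xq[:, 0])
yq[3] += 2.0                                    # simulated pathological deviation
mean, std = gp_posterior(X, y, Xq)
print(np.flatnonzero(np.abs(yq - mean) > 3 * std))  # expected: [3] on this toy data
```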

    Context Awareness for Navigation Applications

    This thesis examines the topic of context awareness for navigation applications and asks the question, “What are the benefits and constraints of introducing context awareness in navigation?” Context awareness can be defined as a computer’s ability to understand the situation or context in which it is operating. In particular, we are interested in how context awareness can be used to understand the navigation needs of people using mobile computers, such as smartphones, but context awareness can also benefit other types of navigation users, such as maritime navigators. There are countless other potential applications of context awareness, but this thesis focuses on applications related to navigation. For example, if a smartphone-based navigation system can understand when a user is walking, driving a car, or riding a train, then it can adapt its navigation algorithms to improve positioning performance. We argue that the primary set of tools available for generating context awareness is machine learning. Machine learning is, in fact, a collection of many different algorithms and techniques for developing “computer systems that automatically improve their performance through experience” [1]. This thesis systematically examines the ability of existing machine learning algorithms to endow computing systems with context awareness. Specifically, we apply machine learning techniques to tackle three different tasks related to context awareness, each with applications in the field of navigation: (1) recognizing the activity of a smartphone user in an indoor office environment, (2) recognizing the mode of motion that a smartphone user is undergoing outdoors, and (3) determining the optimal path of a ship traveling through ice-covered waters. The diversity of these tasks was chosen intentionally to demonstrate the breadth of problems encompassed by the topic of context awareness. During the course of studying context awareness, we adopted two conceptual “frameworks” that we find useful for solidifying the abstract concepts of context and context awareness. The first framework is based strongly on the writings of a rhetorician from Hellenistic Greece, Hermagoras of Temnos, who defined seven elements of “circumstance”; we adopt these seven elements to describe contextual information. The second framework, which we dub the “context pyramid”, describes the processing of raw sensor data into contextual information in terms of six different levels. At the top of the pyramid is “rich context”, where the information is expressed in prose and the goal for the computer is to mimic the way a human would describe a situation. We are still a long way from computers matching a human’s ability to understand and describe context, but this thesis improves the state of the art in context awareness for navigation applications. For some particular tasks, machine learning has succeeded in outperforming humans, and in the future there are likely to be tasks in navigation where computers outperform humans. One example might be the route optimization task described above: many different types of information must be fused in non-obvious ways, and it may be that computer algorithms can find better routes through ice-covered waters than even well-trained human navigators. This thesis provides only preliminary evidence of this possibility, and future work is needed to further develop the techniques outlined here. The same can be said of the other two navigation-related tasks examined in this thesis.
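
    Task (2), mode-of-motion recognition, is a good example of how such context awareness is typically built with machine learning: extract summary features over sliding sensor windows and train a classifier. The features, synthetic data, and model below are illustrative, not the thesis's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(acc, win=100):
    """Summary features over sliding windows of 3-axis accelerometer data,
    a common recipe for mode-of-motion recognition."""
    feats = []
    for s in range(0, len(acc) - win + 1, win):
        mag = np.linalg.norm(acc[s:s + win], axis=1)   # acceleration magnitude
        feats.append([mag.mean(), mag.std(), mag.max(), mag.min()])
    return np.array(feats)

rng = np.random.default_rng(6)
modes = {"still": 0.02, "walking": 0.8, "driving": 0.3}  # toy vibration levels
X, y = [], []
for label, (mode, scale) in enumerate(modes.items()):
    acc = rng.normal(scale=scale, size=(5000, 3)) + [0, 0, 9.81]  # gravity on z
    f = window_features(acc)
    X.append(f); y += [label] * len(f)
X = np.vstack(X)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.score(X, y))   # training accuracy on the toy data
```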