
    A Power-Enhanced Algorithm for Spatial Anomaly Detection in Binary Labelled Point Data Using the Spatial Scan Statistic [postprint]

    This paper presents a novel modification to an existing algorithm for spatial anomaly detection in binary labelled point data sets, using the Bernoulli version of the spatial scan statistic. We identify a potential ambiguity in p-values produced by Monte Carlo testing, which (by selection of the most conservative p-value) can lead to sub-optimal power. When such ambiguity occurs, the modification uses an inexpensive secondary test to suggest a less conservative p-value. Using benchmark tests, we show that this appears to restore power to the expected level, whilst having similar retest variance to the original. The modification also appears to produce a small but significant improvement in overall detection performance when multiple anomalies are present.
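    One common source of such ambiguity is ties between the observed statistic and the null replicates: any p-value between a liberal bound (ties ignored) and a conservative bound (ties counted against the observed value) is defensible. A minimal sketch of the two bounds (function names are illustrative, not the paper's algorithm):

```python
def monte_carlo_p(observed_stat, null_stats):
    """Standard (conservative) Monte Carlo p-value: replicates that tie
    with the observed statistic count against it."""
    ge = sum(1 for s in null_stats if s >= observed_stat)
    return (ge + 1) / (len(null_stats) + 1)

def p_value_range(observed_stat, null_stats):
    """Liberal and conservative Monte Carlo p-value bounds.
    They differ only when null replicates tie with the observed value."""
    gt = sum(1 for s in null_stats if s > observed_stat)
    ties = sum(1 for s in null_stats if s == observed_stat)
    R = len(null_stats)
    return (gt + 1) / (R + 1), (gt + ties + 1) / (R + 1)
```

The conservative estimator corresponds to `monte_carlo_p`; the paper's contribution is a cheap secondary test for choosing a less conservative value when the two bounds disagree.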

    A pilot inference study for a beta-Bernoulli spatial scan statistic

    The Bernoulli spatial scan statistic is used to detect localised clusters in binary labelled point data, such as that used in spatial or spatio-temporal case/control studies. We test the inferential capability of a recently developed beta-Bernoulli spatial scan statistic, which adds a beta prior to the original statistic. This pilot study, which includes two test scenarios with 6,000 data sets each, suggests a marked increase in power for a given false alert rate. We suggest a more extensive study would be worthwhile to corroborate the findings. We also speculate on an explanation for the observed improvement.
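    The effect of the prior can be seen in the marginal likelihood that replaces the plain Bernoulli likelihood: with a Beta(α, β) prior on the success probability, the marginal of k cases in n points is B(k+α, n−k+β)/B(α, β). A sketch of that standard beta-binomial marginal (not the paper's exact statistic):

```python
from math import lgamma

def log_beta(a, b):
    """log of the Beta function via log-gamma, for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_beta_bernoulli(cases, total, alpha=1.0, beta=1.0):
    """Log marginal likelihood of `cases` successes in `total` Bernoulli
    trials after integrating out the success probability under a
    Beta(alpha, beta) prior."""
    return log_beta(cases + alpha, total - cases + beta) - log_beta(alpha, beta)
```

A scan statistic built on this marginal would compare it inside and outside each candidate zone, just as the Bernoulli version compares maximised likelihoods.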

    Large-scale inference in the focally damaged human brain

    Clinical outcomes in focal brain injury reflect the interactions between two distinct anatomically distributed patterns: the functional organisation of the brain and the structural distribution of injury. The challenge of understanding the functional architecture of the brain is familiar; that of understanding the lesion architecture is barely acknowledged. Yet, models of the functional consequences of focal injury are critically dependent on our knowledge of both. The studies described in this thesis seek to show how machine learning-enabled high-dimensional multivariate analysis powered by large-scale data can enhance our ability to model the relation between focal brain injury and clinical outcomes across an array of modelling applications. All studies are conducted on the largest internationally available set of MR imaging data of focal brain injury in the context of acute stroke (N=1333) and employ kernel machines as the principal modelling architecture. First, I examine lesion-deficit prediction, quantifying the ceiling on achievable predictive fidelity for high-dimensional and low-dimensional models, demonstrating the former to be substantially higher than the latter. Second, I determine the marginal value of adding unlabelled imaging data to predictive models within a semi-supervised framework, quantifying the benefit of assembling unlabelled collections of clinical imaging. Third, I compare high- and low-dimensional approaches to modelling response to therapy in two contexts: quantifying the effect of treatment at the population level (therapeutic inference) and predicting the optimal treatment in an individual patient (prescriptive inference). I demonstrate the superiority of the high-dimensional approach in both settings.
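    As a rough illustration of the kernel-machine architecture such models build on, here is a minimal RBF kernel ridge regressor; this is a generic sketch, not the thesis code, and the hyperparameters are arbitrary:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit_predict(X_train, y_train, X_test, lam=1e-6, gamma=1.0):
    """Fit kernel ridge regression in dual form and predict on X_test.
    lam is the ridge penalty; prediction is K_test @ alpha."""
    K = rbf_kernel(X_train, X_train, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```

The appeal in high-dimensional imaging settings is that the model size scales with the number of subjects, not the number of voxels.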

    What happens where during disasters? A Workflow for the multifaceted characterization of crisis events based on Twitter data

    Twitter data are a valuable source of information for rescue and helping activities in case of natural disasters and technical accidents. Several methods for disaster- and event-related tweet filtering and classification are available to analyse social media streams. Rather than processing single tweets, taking space and time into account is likely to reveal even more insights regarding local event dynamics and impacts on population and environment. This study focuses on the design and evaluation of a generic workflow for Twitter data analysis that leverages that additional information to characterize crisis events more comprehensively. The workflow covers data acquisition, analysis and visualization, and aims at the provision of a multifaceted and detailed picture of events that happen in affected areas. This is approached by utilizing agile and flexible analysis methods providing different and complementary views on the data. Utilizing state-of-the-art deep learning and clustering methods, we investigate whether our workflow is suitable to reconstruct and picture the course of events during major natural disasters from Twitter data. Experimental results obtained with a data set acquired during Hurricane Florence in September 2018 demonstrate the effectiveness of the applied methods, but also indicate further interesting research questions and directions.
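    The core idea of going beyond single tweets can be sketched as coarse space-time binning of geotagged messages before any clustering or classification; the grid size and window length below are illustrative defaults, not the workflow's actual parameters:

```python
from collections import Counter

def spatiotemporal_bins(tweets, cell_deg=0.5, window_hours=6):
    """Aggregate (lat, lon, hour) tweet records into coarse grid-cell x
    time-window bins, so that local event dynamics show up as counts
    rather than isolated messages."""
    bins = Counter()
    for lat, lon, hour in tweets:
        key = (int(lat // cell_deg), int(lon // cell_deg),
               int(hour // window_hours))
        bins[key] += 1
    return bins
```

Hotspot bins (high counts per cell and window) would then be the units passed to downstream characterization, rather than individual tweets.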

    Sparse representation based hyperspectral image compression and classification

    This thesis presents a research work on applying sparse representation to lossy hyperspectral image compression and hyperspectral image classification. The proposed lossy hyperspectral image compression framework introduces two types of dictionaries distinguished by the terms sparse representation spectral dictionary (SRSD) and multi-scale spectral dictionary (MSSD), respectively. The former is learnt in the spectral domain to exploit the spectral correlations, and the latter in the wavelet multi-scale spectral domain to exploit both spatial and spectral correlations in hyperspectral images. To alleviate the computational demand of dictionary learning, either a base dictionary trained offline or an update of the base dictionary is employed in the compression framework. The proposed compression method is evaluated in terms of different objective metrics, and compared to selected state-of-the-art hyperspectral image compression schemes, including JPEG 2000. The numerical results demonstrate the effectiveness and competitiveness of both SRSD and MSSD approaches. For the proposed hyperspectral image classification method, we utilize the sparse coefficients for training support vector machine (SVM) and k-nearest neighbour (kNN) classifiers. In particular, the discriminative character of the sparse coefficients is enhanced by incorporating contextual information using local mean filters. The classification performance is evaluated and compared to a number of similar or representative methods. The results show that our approach can outperform other approaches based on SVM or sparse representation. This thesis makes the following contributions. It provides a relatively thorough investigation of applying sparse representation to lossy hyperspectral image compression. Specifically, it reveals the effectiveness of sparse representation for the exploitation of spectral correlations in hyperspectral images. In addition, we have shown that the discriminative character of sparse coefficients can lead to superior performance in hyperspectral image classification.
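    Sparse coding over a learnt dictionary, the operation underlying both SRSD and MSSD, can be sketched with greedy orthogonal matching pursuit; the thesis may use a different solver, so this is a generic illustration:

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: approximate signal x as a k-sparse
    combination of the columns of dictionary D. Greedily picks the atom
    most correlated with the residual, then refits on the support."""
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    z = np.zeros(D.shape[1])
    z[support] = coef
    return z
```

For compression, only the support indices and their coefficients need to be stored per pixel spectrum; for classification, the coefficient vectors `z` serve as features for the SVM/kNN stage.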

    Exploring the application of ultrasonic phased arrays for industrial process analysis

    This thesis was previously held under moratorium from 25/11/19 to 25/11/21. Typical industrial process analysis techniques require an optical path to exist between the measurement sensor and the process to acquire data used to optimise and control an industrial process. Ultrasonic sensing is a well-established method for measuring into optically opaque structures, and highly focussed images can be generated using multiple-element transducer arrays. In this thesis, such arrays are explored as a real-time imaging tool for industrial process analysis. A novel methodology is proposed to characterise the variation between consecutive ultrasonic data sets deriving from the ultrasonic hardware. The pulse-echo response corresponding to a planar back-wall acoustic interface is used to infer the bandwidth, pulse length and sensitivity of each array element. This led to the development of a calibration methodology to enhance the accuracy of experimentally generated ultrasonic images. An algorithm enabling non-invasive through-steel imaging of an industrial process is demonstrated using a simulated data set. Using principal component analysis, signals corresponding to reverberations in the steel vessel wall are identified and deselected from the ultrasonic data set prior to image construction. This facilitates the quantification of process information from the image. An image processing and object tracking algorithm are presented to quantify the bubble size distribution (BSD) and bubble velocity from ultrasonic images. When tested under controlled dynamic conditions, the mean value of the BSD was predicted within 50% at 100 mm s⁻¹ and the velocity within 30% at 100 mm s⁻¹. However, these algorithms were sensitive to the ability of the input image to represent the true bubble shape. The consolidation of these techniques demonstrates successful application of ultrasonic phased array imaging, both invasively and non-invasively, to a dynamic process stream. Key to industrial uptake of the technology are data throughput and processing, which currently limit its applicability to real-time process analysis, and low sensitivity for some non-invasive applications.
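    The PCA-based deselection step can be sketched as projecting out the leading principal components of a stack of pulse-echo signals, on the assumption that strong, repeatable wall reverberations dominate those components; this is a simplification of the thesis method, with illustrative names:

```python
import numpy as np

def remove_dominant_components(signals, n_remove=1):
    """Remove the first n_remove principal components from a stack of
    signals (rows = acquisitions, columns = time samples). Repeatable
    wall reverberations concentrate in the leading components, so what
    remains is dominated by the process echoes."""
    mean = signals.mean(axis=0)
    X = signals - mean
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    keep = Vt[n_remove:]          # basis of the retained subspace
    return X @ keep.T @ keep + mean
```

The cleaned stack would then be passed to image construction (e.g. delay-and-sum beamforming) so that the wall signature does not mask bubbles inside the vessel.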

    Contextual Anomaly Detection Framework for Big Sensor Data

    Performing predictive modelling, such as anomaly detection, in Big Data is a difficult task. This problem is compounded as more and more sources of Big Data are generated from environmental sensors, logging applications, and the Internet of Things. Further, most current techniques for anomaly detection only consider the content of the data source, i.e. the data itself, without concern for the context of the data. As data becomes more complex, it is increasingly important to bias anomaly detection techniques for the context, whether it is spatial, temporal, or semantic. The work proposed in this thesis outlines a contextual anomaly detection framework for use in Big sensor Data systems. The framework uses a well-defined content anomaly detection algorithm for real-time point anomaly detection. Additionally, we present a post-processing context-aware anomaly detection algorithm based on sensor profiles, which are groups of contextually similar sensors generated by a multivariate clustering algorithm. The contextual anomaly detection framework is evaluated with respect to two different Big sensor Data sets: one for electrical sensors, and another for temperature sensors within a building.
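    The profile-based contextual check can be sketched as follows; here the sensor-to-profile assignment is given directly rather than produced by the multivariate clustering step, and all names are illustrative rather than taken from the framework:

```python
from statistics import mean, pstdev

def build_profiles(sensor_histories, groups):
    """Pool readings from contextually similar sensors into per-profile
    (mean, std) summaries. `groups` maps sensor id -> profile id; in the
    framework these groups come from multivariate clustering."""
    pooled = {}
    for sid, readings in sensor_histories.items():
        pooled.setdefault(groups[sid], []).extend(readings)
    return {g: (mean(v), pstdev(v)) for g, v in pooled.items()}

def is_contextual_anomaly(value, sensor_id, groups, profiles, z=3.0):
    """Flag a reading that deviates from its sensor's profile by more
    than z standard deviations."""
    mu, sigma = profiles[groups[sensor_id]]
    return abs(value - mu) > z * max(sigma, 1e-9)
```

A point anomaly detector would run on the raw stream first; this contextual check then acts as the post-processing filter, suppressing alerts that are normal for the sensor's profile.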