Data Driven Nonparametric Detection
The major goal of signal detection is to distinguish between hypotheses about the state of events based on observations. Typically, signal detection can be categorized into centralized detection, where all observed data are available for making a decision, and decentralized detection, where only quantized data from distributed sensors are forwarded to a fusion center for decision making. While these problems have been intensively studied under parametric and semi-parametric models, in which the underlying distributions are fully or partially known, nonparametric scenarios are not yet well understood. This thesis mainly explores nonparametric models with unknown underlying distributions, as well as semi-parametric models as an intermediate step toward solving nonparametric problems.
One major topic of this thesis is nonparametric decentralized detection, in which the joint distribution of the state of an event and the sensor observations is not known, and only some training data are available. A kernel-based nonparametric approach has been proposed by Nguyen, Wainwright and Jordan, in which the sensors' quality is treated equally. We study heterogeneous sensor networks and propose a weighted kernel, so that weight parameters are used to selectively incorporate the sensors' information into the fusion center's decision rule based on the quality of the sensors' observations. Furthermore, the weight parameters also serve as sensor selection parameters, with nonzero parameters corresponding to the sensors being selected. Sensor selection is performed jointly with the decision rules of the sensors and the fusion center, and the resulting optimal decision rule has only a sparse set of nonzero weight parameters. A gradient projection algorithm and a Gauss-Seidel algorithm are developed to solve the risk minimization problem, which is non-convex, and both algorithms are shown to converge to critical points.
The other major topic of this thesis is composite outlier detection in centralized scenarios. The goal is to detect the existence of data streams drawn from outlying distributions among data streams drawn from a typical distribution. We study both the semi-parametric model, with a known typical distribution and unknown outlying distributions, and the nonparametric model, with unknown typical and outlying distributions. For both models, we construct generalized likelihood ratio tests (GLRT) and show that, with knowledge of the KL divergence between the outlying and typical distributions, GLRT is exponentially consistent (i.e., the error risk function decays exponentially fast). We also show that, with knowledge of the Chernoff distance between the outlying and typical distributions, GLRT for the semi-parametric model achieves the same risk decay exponent as the parametric model, and GLRT for the nonparametric model achieves the same performance when the number of data streams gets asymptotically large. We further show that, for both models, no exponentially consistent test exists without any knowledge of the distance between the distributions. However, GLRT with a diminishing threshold can still be consistent.
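To make the semi-parametric setting above more concrete, here is a minimal sketch in Python. It is not the GLRT constructed in the thesis: it simply thresholds the KL divergence between each stream's empirical distribution and a known typical distribution over a finite alphabet. The function names, the finite-alphabet assumption, the full-support assumption on the typical distribution, and the `threshold` value are all hypothetical choices made for illustration.

```python
import numpy as np

def empirical_pmf(samples, alphabet_size):
    """Empirical distribution of integer symbols in {0, ..., alphabet_size - 1}."""
    counts = np.bincount(samples, minlength=alphabet_size)
    return counts / counts.sum()

def kl_divergence(p, q):
    """KL divergence D(p || q) in nats; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def flag_outlying_streams(streams, typical_pmf, threshold):
    """Return indices of streams whose empirical KL divergence from the
    known typical distribution exceeds the (hypothetical) threshold."""
    flagged = []
    for index, samples in enumerate(streams):
        p_hat = empirical_pmf(samples, len(typical_pmf))
        if kl_divergence(p_hat, typical_pmf) > threshold:
            flagged.append(index)
    return flagged

# Toy data: a binary alphabet with a uniform typical distribution (assumed).
typical_pmf = np.array([0.5, 0.5])
demo_streams = [
    np.array([0, 1, 0, 1]),  # empirical pmf matches the typical distribution
    np.array([0, 0, 0, 0]),  # empirical pmf is concentrated on one symbol
]
flagged = flag_outlying_streams(demo_streams, typical_pmf, threshold=0.3)
```

In this toy example the second stream has empirical divergence log 2 from the uniform distribution, so it is the only one flagged. Letting the threshold shrink as the streams grow longer loosely mirrors the diminishing-threshold idea mentioned in the abstract, but the specific schedule would have to come from the thesis itself.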
Active Anomaly Detection in Heterogeneous Processes
An active inference problem of detecting anomalies among heterogeneous
processes is considered. At each time, a subset of processes can be probed. The
objective is to design a sequential probing strategy that dynamically
determines which processes to observe at each time and when to terminate the
search so that the expected detection time is minimized under a constraint on
the probability of misclassifying any process. This problem falls into the
general setting of sequential design of experiments pioneered by Chernoff in
1959, in which a randomized strategy, referred to as the Chernoff test, was
proposed and shown to be asymptotically optimal as the error probability
approaches zero. For the problem considered in this paper, a low-complexity
deterministic test is shown to enjoy the same asymptotic optimality while
offering significantly better performance in the finite regime and faster
convergence to the optimal rate function, especially when the number of
processes is large. The computational complexity of the proposed test is also
of a significantly lower order. Comment: This work has been accepted for publication in IEEE Transactions on
Information Theor
Nonparametric Anomaly Detection and Secure Communication
Two major security challenges in information systems are detection of anomalous data patterns that reflect malicious intrusions into data storage systems and protection of data from malicious eavesdropping during data transmissions. The first problem typically involves design of statistical tests to identify data variations, and the second problem generally involves design of communication schemes to transmit data securely in the presence of malicious eavesdroppers. The main theme of this thesis is to exploit information theoretic and statistical tools to address the above two security issues in order to provide information theoretically provable security, i.e., anomaly detection with vanishing probability of error and guaranteed secure communication with vanishing leakage rate at eavesdroppers.
First, the anomaly detection problem is investigated, in which typical and anomalous patterns (i.e., distributions that generate data) are unknown a priori. Two types of problems are investigated. The first problem considers detection of the existence of anomalous geometric structures over networks, and the second problem considers the detection of a set of anomalous data streams out of a large number of data streams. In both problems, anomalous data are assumed to be generated by a distribution $p$, which is different from a distribution $q$ generating typical samples. For both problems, kernel-based tests are proposed, which are based on maximum mean discrepancy (MMD), a measure of the distance between mean embeddings of distributions into a reproducing kernel Hilbert space. These tests are nonparametric, in that they do not exploit any information about $p$ and $q$, and are universally applicable to arbitrary $p$ and $q$. Furthermore, these tests are shown to be statistically consistent under certain conditions on the parameters of the problems. These conditions are further shown to be necessary or nearly necessary, which implies that the MMD-based tests are order-level optimal or nearly order-level optimal. Numerical results are provided to demonstrate the performance of the proposed tests.
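As an illustration of the MMD quantity mentioned above, the following Python sketch computes the standard biased (V-statistic) estimate of the squared MMD between two samples. The Gaussian kernel, the `bandwidth` parameter, and the synthetic data are assumptions made for this example; the abstract does not specify the kernel or the test thresholds used in the thesis.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())

# Synthetic example: two samples from the same distribution and one shifted sample.
rng = np.random.default_rng(0)
typical_a = rng.normal(0.0, 1.0, size=(200, 1))
typical_b = rng.normal(0.0, 1.0, size=(200, 1))
anomalous = rng.normal(3.0, 1.0, size=(200, 1))

mmd_same = mmd_squared(typical_a, typical_b)
mmd_shifted = mmd_squared(typical_a, anomalous)
```

Here `mmd_shifted` should come out noticeably larger than `mmd_same`, which is the behavior a kernel-based test relies on. Turning the estimate into a decision rule requires a threshold, and the conditions under which such a rule is consistent are the subject of the thesis rather than of this sketch.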
The secure communication problem is then investigated, with a focus on degraded broadcast channels. In such channels, one transmitter sends messages to multiple receivers whose channel qualities can be ordered. Two specific models are studied. In the first model, layered decoding and layered secrecy are required, i.e., each receiver decodes one more message than the receiver with one level worse channel quality, and this message should be kept secure from all receivers with worse channel qualities. In the second model, secrecy only outside a bounded range is required, i.e., each message is required to be kept secure from the receiver with two-level worse channel quality. Communication schemes for both models are designed, and the corresponding achievable rate regions (i.e., inner bounds on the capacity region) are characterized. Furthermore, outer bounds on the capacity region are developed that match the inner bounds, and hence the secrecy capacity regions are established for both models.
A Survey on Explainable Anomaly Detection
In the past two decades, most research on anomaly detection has focused on
improving the accuracy of the detection, while largely ignoring the
explainability of the corresponding methods and thus leaving the explanation of
outcomes to practitioners. As anomaly detection algorithms are increasingly
used in safety-critical domains, providing explanations for the high-stakes
decisions made in those domains has become an ethical and regulatory
requirement. Therefore, this work provides a comprehensive and structured
survey on state-of-the-art explainable anomaly detection techniques. We propose
a taxonomy based on the main aspects that characterize each explainable anomaly
detection technique, aiming to help practitioners and researchers find the
explainable anomaly detection method that best suits their needs. Comment: Paper accepted by the ACM Transactions on Knowledge Discovery from
Data (TKDD) for publication (preprint version