An Improved Composite Hypothesis Test for Markov Models with Applications in Network Anomaly Detection
Recent work has proposed the use of a composite hypothesis Hoeffding test for
statistical anomaly detection. Setting an appropriate threshold for the test
given a desired false alarm probability involves approximating the false alarm
probability. To that end, a large-deviations asymptotic is typically used;
however, it often sets the threshold inaccurately,
especially for relatively small sample sizes. This, in turn, yields an
anomaly detection test that does not adequately control false alarms. In this
paper, we develop a tighter approximation using the Central Limit Theorem (CLT)
under Markovian assumptions. We apply our result to a network anomaly detection
application and demonstrate its advantages over earlier work.
Comment: 6 pages, 6 figures; final version for CDC 201
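To make the threshold-setting issue concrete, the sketch below compares the two kinds of threshold for the Hoeffding test in its simplest setting: i.i.d. observations on a finite alphabet of size N with null distribution pi, rather than the Markov chains treated in the paper. The large-deviations threshold is eta ≈ -log(beta)/n. The CLT-style alternative uses the known weak limit 2n·D(Gamma_n || pi) → chi-square with N - 1 degrees of freedom under the null, and sets eta ≈ chi2_{N-1, 1-beta}/(2n). The chi-square quantile uses the Wilson-Hilferty approximation so the code needs only the Python standard library. All function names are illustrative assumptions, and the Markov-chain threshold estimator derived in the paper is not reproduced here.

```python
import math
from collections import Counter
from statistics import NormalDist


def empirical_distribution(samples, alphabet_size):
    """Empirical pmf Gamma_n of samples drawn from {0, ..., alphabet_size - 1}."""
    counts = Counter(samples)
    n = len(samples)
    return [counts.get(k, 0) / n for k in range(alphabet_size)]


def kl_divergence(mu, pi):
    """K-L divergence D(mu || pi) in nats; symbols with mu(k) = 0 contribute zero."""
    return sum(m * math.log(m / p) for m, p in zip(mu, pi) if m > 0)


def chi2_quantile_wilson_hilferty(dof, prob):
    """Approximate chi-square quantile via the Wilson-Hilferty cube-root transform."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def threshold_large_deviations(n, beta):
    """Threshold from the large-deviations (Sanov) asymptotic: eta = -log(beta) / n."""
    return -math.log(beta) / n


def threshold_clt_iid(n, beta, alphabet_size):
    """Threshold from the i.i.d. weak limit 2n D(Gamma_n || pi) -> chi2_{N-1}."""
    return chi2_quantile_wilson_hilferty(alphabet_size - 1, 1.0 - beta) / (2.0 * n)


def hoeffding_test(samples, pi, eta):
    """Flag an anomaly when D(Gamma_n || pi) exceeds the threshold eta."""
    gamma = empirical_distribution(samples, len(pi))
    return kl_divergence(gamma, pi) > eta
```

For example, with N = 4, beta = 0.01 and n = 100, the large-deviations threshold is about 0.0461 while the chi-square threshold is about 0.0568. Under the chi-square approximation, the smaller large-deviations threshold would therefore raise alarms more often than the nominal 1% rate, which is the kind of small-sample miscalibration the abstract describes.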
Universal and Composite Hypothesis Testing via Mismatched Divergence
For the universal hypothesis testing problem, where the goal is to decide
between the known null hypothesis distribution and some other unknown
distribution, Hoeffding proposed a universal test in the 1960s.
Hoeffding's universal test statistic can be written in terms of
Kullback-Leibler (K-L) divergence between the empirical distribution of the
observations and the null hypothesis distribution. In this paper a modification
of Hoeffding's test is considered based on a relaxation of the K-L divergence
test statistic, referred to as the mismatched divergence. The resulting
mismatched test is shown to be a generalized likelihood-ratio test (GLRT) for
the case where the alternate distribution lies in a parametric family of
distributions characterized by a finite-dimensional parameter, i.e., it is a
solution to the corresponding composite hypothesis testing problem. For certain
choices of the alternate distribution, it is shown that both the Hoeffding test
and the mismatched test have the same asymptotic performance in terms of error
exponents. A consequence of this result is that the GLRT is optimal in
differentiating a particular distribution from others in an exponential family.
It is also shown that the mismatched test has a significant advantage over the
Hoeffding test in terms of finite sample size performance. This advantage is
due to the difference in the asymptotic variances of the two test statistics
under the null hypothesis. In particular, the variance of the K-L divergence
grows linearly with the alphabet size, making the test impractical for
applications involving large alphabet distributions. The variance of the
mismatched divergence, on the other hand, grows linearly with the dimension of
the parameter space, and can hence be controlled through a prudent choice of
the function class defining the mismatched divergence.
Comment: Accepted to IEEE Transactions on Information Theory, July 201
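The relaxation can be illustrated numerically. A minimal sketch, assuming a linear function class f_r = r·psi with a hypothetical feature map psi, evaluates the mismatched divergence D^MM(mu || pi) = sup_r { r·mu(psi) - log pi(exp(r·psi)) } by gradient ascent, which works because the objective is concave in r. Since the K-L divergence is the supremum of the same expression over all functions (the Donsker-Varadhan variational formula), D^MM never exceeds D(mu || pi), and choosing psi as the indicators of N - 1 symbols recovers D(mu || pi) exactly. The step size and iteration count are illustrative assumptions, not values from the paper.

```python
import math


def kl_divergence(mu, pi):
    """K-L divergence D(mu || pi) in nats; symbols with mu(k) = 0 contribute zero."""
    return sum(m * math.log(m / p) for m, p in zip(mu, pi) if m > 0)


def mismatched_divergence(mu, pi, features, step=0.5, iters=5000):
    """sup_r { r . mu(psi) - log pi(exp(r . psi)) } for the linear class f_r = r . psi.

    features[k] is the (hypothetical) feature vector psi(k) of symbol k. The
    objective is concave in r, so plain gradient ascent with a small step
    increases it monotonically toward the supremum.
    """
    d = len(features[0])
    mu_psi = [sum(m * f[j] for m, f in zip(mu, features)) for j in range(d)]
    r = [0.0] * d

    def tilted_weights(r):
        # Unnormalized tilted pmf: pi(k) * exp(r . psi(k)).
        return [p * math.exp(sum(rj * fj for rj, fj in zip(r, f)))
                for p, f in zip(pi, features)]

    for _ in range(iters):
        w = tilted_weights(r)
        z = sum(w)
        tilted_psi = [sum(wk * f[j] for wk, f in zip(w, features)) / z for j in range(d)]
        # Gradient of the objective is mu(psi) - pi_r(psi).
        r = [rj + step * (a - b) for rj, a, b in zip(r, mu_psi, tilted_psi)]

    log_z = math.log(sum(tilted_weights(r)))
    return sum(rj * a for rj, a in zip(r, mu_psi)) - log_z
```

With pi = (0.4, 0.3, 0.2, 0.1) and mu uniform, D(mu || pi) ≈ 0.1218. The one-dimensional feature psi(x) = x gives a strictly smaller value, while the three indicator features reproduce 0.1218. Only d parameters are fitted instead of N - 1, which is the mechanism behind the variance comparison in the abstract: under the null and suitable regularity conditions, n·D(Gamma_n || pi) behaves asymptotically like half a chi-square variable with N - 1 degrees of freedom, whereas n·D^MM(Gamma_n || pi) behaves like half a chi-square variable with d degrees of freedom.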
Detection and optimization problems with applications in smart cities
This dissertation proposes solutions to a selected set of detection and optimization problems, whose applications are focused on transportation systems. The goal is to help build smarter and more efficient transportation systems, hence smarter cities.
Problems whose dynamics evolve on two different time scales are considered:
(1) On a fast time scale, the dissertation considers the problem of detection, especially statistical anomaly detection in real time. From a theoretical perspective and under Markovian assumptions, novel threshold estimators are derived for the widely used Hoeffding test, yielding a test that controls false alarms much better while maintaining a high detection rate. From a practical perspective, the improved test is applied to detecting atypical traffic jams in the Boston road network using real traffic data reported by the Waze smartphone navigation application. The detection results can alert drivers to reroute around the affected areas and can give the Transportation department and/or emergency services the most urgent "targets" for intervening to remedy the underlying causes of these jams, thus improving transportation systems and contributing to the smart city agenda.
(2) On a slower time scale, the dissertation investigates a host of optimization problems, including estimation and adjustment of Origin-Destination (OD) demand, traffic assignment, recovery of travel cost functions, and joint recovery of travel cost functions and OD demand (the joint problem). Integrating these problems leads to a data-driven predictive model that serves to diagnose, control, and optimize the transportation network. To ensure good accuracy of the predictive model and increase its robustness and consistency, several novel formulations are proposed for the travel cost function recovery problem and the joint problem. A data-driven framework is proposed to evaluate the Price-of-Anarchy (PoA), a metric assessing the degree of congestion under selfish user-centric routing vs. socially optimal system-centric routing. For cases where the PoA is larger than expected, three viable strategies are proposed to reduce it. To demonstrate the effectiveness and efficiency of the proposed approaches, case studies are conducted on three benchmark transportation networks using synthetic data and on an actual road network in Eastern Massachusetts (EMA) using real traffic data. Moreover, to facilitate research in the transportation community, the largest highway subnetwork of EMA has been released as a new benchmark network.
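The PoA described above is a ratio of total travel costs: the cost under user equilibrium (selfish routing) divided by the cost under the system optimum. As an illustration only, and not the data-driven framework of the dissertation, the sketch below computes it for a hypothetical network of two parallel links with affine link costs c_i(x) = a_i + b_i·x and fixed demand. It assumes the standard characterizations that user equilibrium minimizes the Beckmann potential and the system optimum minimizes total cost; both are convex in the flow on one link, so a ternary search suffices. Function names and parameters are hypothetical.

```python
def ternary_minimize(fn, lo, hi, iters=200):
    """Approximate minimizer of a convex function of one variable on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fn(m1) <= fn(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


def price_of_anarchy(a, b, demand):
    """PoA for two parallel links with hypothetical affine costs c_i(x) = a[i] + b[i] * x."""

    def total_cost(x2):
        # System cost: sum over links of flow times per-unit travel cost.
        x1 = demand - x2
        return x1 * (a[0] + b[0] * x1) + x2 * (a[1] + b[1] * x2)

    def beckmann(x2):
        # Beckmann potential: sum over links of the integral of c_i from 0 to x_i.
        x1 = demand - x2
        return a[0] * x1 + b[0] * x1 ** 2 / 2.0 + a[1] * x2 + b[1] * x2 ** 2 / 2.0

    x2_ue = ternary_minimize(beckmann, 0.0, demand)    # user equilibrium flow on link 2
    x2_so = ternary_minimize(total_cost, 0.0, demand)  # system-optimal flow on link 2
    return total_cost(x2_ue) / total_cost(x2_so)
```

With a = (1, 0), b = (0, 1) and unit demand (the classic Pigou example), selfish routing sends all traffic over the second link at total cost 1, whereas the system optimum splits it evenly at total cost 0.75, so the PoA is 4/3. A symmetric network such as a = (0, 0), b = (1, 1) gives a PoA of 1, since selfish and optimal routing coincide.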