
    Testing (subclasses of) halfspaces

    We address the problem of testing whether a Boolean-valued function f is a halfspace, i.e. a function of the form f(x) = sgn(w · x − θ). We consider halfspaces over the continuous domain R^n (endowed with the standard multivariate Gaussian distribution) as well as halfspaces over the Boolean cube {−1,1}^n (endowed with the uniform distribution). In both cases we give an algorithm that distinguishes halfspaces from functions that are ε-far from any halfspace using only poly(1/ε) queries, independent of the dimension n. In contrast to the case of general halfspaces, we show that testing natural subclasses of halfspaces can be markedly harder; for the class of {−1,1}-weight halfspaces, we show that a tester must make at least Ω(log n) queries. We complement this lower bound with an upper bound showing that O(√n) queries suffice. National Basic Research Program of China (grant 2007CB807900); National Basic Research Program of China (grant 2007CB807901); National Natural Science Foundation (China) (grant 60553001)
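    The setup in this abstract can be made concrete with a short sketch. The code below is an illustration only, not the paper's dimension-independent tester: it defines a halfspace over R^n and estimates the disagreement probability that underlies the notion of being ε-far by sampling from the standard Gaussian distribution. The helper names, sample sizes, and example weights are hypothetical.

    # Illustrative sketch only: the halfspace definition and the sampling-based
    # notion of epsilon-distance used in property testing. Not the tester from
    # the paper; names and parameters are hypothetical.
    import numpy as np

    def halfspace(w, theta):
        """Return the Boolean-valued function f(x) = sgn(w . x - theta)."""
        def f(x):
            return 1 if np.dot(w, x) - theta >= 0 else -1
        return f

    def estimated_distance(f, g, n, num_samples=10000, rng=None):
        """Estimate Pr_x[f(x) != g(x)] under the standard Gaussian on R^n.

        A function f is epsilon-far from a class C if this disagreement
        probability is at least epsilon for every g in C.
        """
        rng = rng or np.random.default_rng(0)
        xs = rng.standard_normal((num_samples, n))
        return np.mean([f(x) != g(x) for x in xs])

    # Example: two halfspaces over R^3 that differ only in the threshold.
    n = 3
    f = halfspace(np.array([1.0, -2.0, 0.5]), 0.0)
    g = halfspace(np.array([1.0, -2.0, 0.5]), 0.3)
    print(estimated_distance(f, g, n))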

    Testing ±1-Weight Halfspaces

    We consider the problem of testing whether a Boolean function f: {−1,1}^n → {−1,1} is a ±1-weight halfspace, i.e. a function of the form f(x) = sgn(w_1 x_1 + w_2 x_2 + ⋯ + w_n x_n) where the weights w_i take values in {−1,1}. We show that the complexity of this problem is markedly different from the problem of testing whether f is a general halfspace with arbitrary weights. While the latter can be done with a number of queries that is independent of n [7], to distinguish whether f is a ±1-weight halfspace versus ε-far from all such halfspaces we prove that nonadaptive algorithms must make Ω(log n) queries. We complement this lower bound with a sublinear upper bound showing that O(√n · poly(1/ε)) queries suffice.
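    For concreteness, the sketch below shows what a ±1-weight halfspace looks like on the Boolean cube (with all weights +1 and odd n it is the majority function) and how disagreement under the uniform distribution can be estimated from random queries. It illustrates the testing setting only, not the paper's Ω(log n) lower bound or its O(√n · poly(1/ε)) tester; the helper names are hypothetical.

    # Illustrative sketch only: a +/-1-weight halfspace on {-1,1}^n and
    # query-based estimation of disagreement under the uniform distribution.
    import numpy as np

    def pm1_weight_halfspace(signs):
        """f(x) = sgn(w_1 x_1 + ... + w_n x_n) with each w_i in {-1, +1}."""
        signs = np.asarray(signs)
        def f(x):
            return 1 if np.dot(signs, x) >= 0 else -1
        return f

    def estimated_disagreement(f, g, n, num_queries=5000, rng=None):
        """Estimate Pr_x[f(x) != g(x)] for uniform x in {-1,1}^n via random queries."""
        rng = rng or np.random.default_rng(0)
        xs = rng.choice([-1, 1], size=(num_queries, n))
        return np.mean([f(x) != g(x) for x in xs])

    n = 7
    majority = pm1_weight_halfspace(np.ones(n))            # all weights +1
    flipped = pm1_weight_halfspace([-1] + [1] * (n - 1))   # one weight negated
    print(estimated_disagreement(majority, flipped, n))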

    A Comparison of Generalizability for Anomaly Detection

    In security-related areas there is concern over the novel “zero-day” attack that penetrates system defenses and wreaks havoc. The best methods for countering these threats are recognizing “non-self” as in an Artificial Immune System or recognizing “self” through clustering. In either case, the concern remains that something that looks similar to self could be missed. Given this situation, one could logically assume that a tighter fit to self, rather than generalizability, is important for false-positive reduction in this type of learning problem. This article shows that a tight fit, although important, does not supersede having some model generality. This is shown using three systems. The first two use sphere and ellipsoid clusters with a k-means algorithm modified to work on the one-class/blind classification problem. The third is based on wrapping the self points with a multidimensional convex hull (polytope) algorithm capable of learning disjunctive concepts via a thresholding constant. All three of these algorithms are tested on an intrusion detection problem and a steganalysis problem, with results exceeding published results using an Artificial Immune System.
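    As a rough illustration of the sphere-based variant described above, the sketch below fits k-means to “self” data only and flags any point whose distance to its nearest centroid exceeds a learned radius. The cluster count, the quantile used as the radius (the tightness-of-fit knob), and the toy data are all hypothetical assumptions; the ellipsoid and convex-hull versions and the article's thresholding constant are not reproduced.

    # A minimal one-class k-means sketch, assuming spherical clusters:
    # train on "self" only, then flag points far from every centroid.
    import numpy as np
    from sklearn.cluster import KMeans

    class OneClassKMeans:
        def __init__(self, n_clusters=5, quantile=0.99):
            self.n_clusters = n_clusters
            self.quantile = quantile   # how tightly the spheres wrap "self"

        def fit(self, X_self):
            self.km = KMeans(n_clusters=self.n_clusters, n_init=10,
                             random_state=0).fit(X_self)
            d = np.min(self.km.transform(X_self), axis=1)  # distance to nearest centroid
            self.radius = np.quantile(d, self.quantile)    # learned radius
            return self

        def predict(self, X):
            """Return True for points flagged as anomalous (non-self)."""
            d = np.min(self.km.transform(X), axis=1)
            return d > self.radius

    rng = np.random.default_rng(0)
    self_data = rng.normal(0, 1, size=(500, 4))         # "self" training data
    probe = np.vstack([rng.normal(0, 1, size=(5, 4)),   # normal-looking points
                       rng.normal(6, 1, size=(5, 4))])  # anomalous points
    print(OneClassKMeans().fit(self_data).predict(probe))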

    Depth functions as measures of representativeness

    Data depth provides a natural means to rank multivariate vectors with respect to an underlying multivariate distribution. Most existing depth functions emphasize a centre-outward ordering of data points, which may not provide a useful geometric representation of certain distributional features, such as multimodality, of concern to some statistical applications. Such inadequacy motivates us to develop a device for ranking data points according to their “representativeness” rather than “centrality” with respect to an underlying distribution of interest. Derived essentially from a choice of goodness-of-fit test statistic, our device calls for a new interpretation of “depth” more akin to the concept of density than location. It copes particularly well with multivariate data exhibiting multimodality. In addition to providing depth values for individual data points, depth functions derived from goodness-of-fit tests also extend naturally to provide depth values for subsets of data points, a concept new to the data-depth literature.
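    The contrast between centre-outward ordering and “representativeness” can be seen on a small bimodal example. The sketch below is an illustration only, not the goodness-of-fit construction proposed in the paper: a Mahalanobis-based, centre-outward depth scores the empty region between two modes as deepest, whereas a kernel-density ranking scores points by how representative they are of where the data actually lie. All function names and the simulated data are assumptions made for this example.

    # Illustrative contrast: centre-outward depth vs. density-style ranking
    # on a bimodal sample in R^2. Not the paper's method.
    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    # Bimodal sample: two well-separated clusters.
    X = np.vstack([rng.normal(-3, 0.5, size=(200, 2)),
                   rng.normal(+3, 0.5, size=(200, 2))])

    def mahalanobis_depth(points, X):
        """Centre-outward depth: larger = closer to the overall mean."""
        mu, cov_inv = X.mean(axis=0), np.linalg.inv(np.cov(X.T))
        d2 = np.einsum('ij,jk,ik->i', points - mu, cov_inv, points - mu)
        return 1.0 / (1.0 + d2)

    def density_rank(points, X):
        """Density-style ranking: larger = more representative of the sample."""
        return gaussian_kde(X.T)(points.T)

    midpoint = np.array([[0.0, 0.0]])   # between the two modes
    mode = np.array([[-3.0, -3.0]])     # near one mode
    for name, p in [("midpoint", midpoint), ("mode", mode)]:
        print(name, mahalanobis_depth(p, X)[0], density_rank(p, X)[0])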