7 research outputs found

    Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search

    Vector quantization (VQ) techniques are widely used in similarity search for data compression, fast metric computation, and related tasks. Originally designed for Euclidean distance, existing VQ techniques (e.g., PQ, AQ) explicitly or implicitly minimize the quantization error. In this paper, we present a new angle to analyze the quantization error, which decomposes the quantization error into norm error and direction error. We show that quantization errors in norm have a much higher influence on inner products than quantization errors in direction, and that a small quantization error does not necessarily lead to good performance in maximum inner product search (MIPS). Based on this observation, we propose norm-explicit quantization (NEQ), a general paradigm that improves existing VQ techniques for MIPS. NEQ quantizes the norms of items in a dataset explicitly to reduce errors in norm, which is crucial for MIPS. For the direction vectors, NEQ can simply reuse an existing VQ technique to quantize them without modification. We conducted extensive experiments on a variety of datasets and parameter configurations. The experimental results show that NEQ improves the performance of various VQ techniques for MIPS, including PQ, OPQ, RQ, and AQ.
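    The norm/direction decomposition lends itself to a short sketch. Below is a minimal NumPy illustration of the idea, assuming a quantile-based scalar codebook for norms as an illustrative simplification; `quantize_direction` and `reconstruct_direction` are hypothetical stand-ins for whichever existing VQ technique (PQ, OPQ, RQ, AQ) is reused unmodified for the unit-norm directions.

```python
# Sketch of the norm/direction split behind NEQ (illustrative, not the
# paper's exact construction). Norms get their own explicit codebook;
# directions are handled by any off-the-shelf VQ technique.
import numpy as np

def build_norm_codebook(norms: np.ndarray, num_centroids: int = 256) -> np.ndarray:
    """Scalar codebook over item norms (quantiles here, for brevity)."""
    qs = np.linspace(0.0, 1.0, num_centroids)
    return np.quantile(norms, qs)

def encode(x, norm_codebook, quantize_direction):
    norm = np.linalg.norm(x)
    direction = x / norm                       # unit-norm direction vector
    norm_code = int(np.argmin(np.abs(norm_codebook - norm)))
    dir_code = quantize_direction(direction)   # reuse an existing VQ technique
    return norm_code, dir_code

def approx_inner_product(q, code, norm_codebook, reconstruct_direction):
    norm_code, dir_code = code
    # <q, x> = ||x|| * <q, x/||x||>: the norm enters multiplicatively,
    # which is why NEQ spends an explicit codebook on it.
    return float(norm_codebook[norm_code] * np.dot(q, reconstruct_direction(dir_code)))
```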

    Multi-Resolution Hashing for Fast Pairwise Summations

    A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector $y$ (query) that is unknown a priori. Given a set of points $X \subset \mathbb{R}^d$ and a pairwise function $w : \mathbb{R}^d \times \mathbb{R}^d \to [0,1]$, we study the problem of designing a data structure that enables sublinear-time approximation of the summation $Z_w(y) = \frac{1}{|X|} \sum_{x \in X} w(x, y)$ for any query $y \in \mathbb{R}^d$. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is a collection of $T \geq 1$ hashing schemes with collision probabilities $p_1, \ldots, p_T$ such that $\sup_{t \in [T]} \{p_t(x, y)\} = \Theta(\sqrt{w(x, y)})$. This leads to a data structure that approximates $Z_w(y)$ using a sub-linear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumuller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for any log-convex function $w(x, y) = e^{\phi(\langle x, y \rangle)}$ of the inner product on the unit sphere $x, y \in \mathcal{S}^{d-1}$. Our method leads to data structures with sub-linear query time that significantly improve upon random sampling and can be used for Kernel Density or Partition Function Estimation. We provide extensions of our result from the sphere to $\mathbb{R}^d$ and from scalar functions to vector functions. (Comment: 39 pages, 3 figures)
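    The collision-probability condition is easiest to see in the single-scheme setting this paper generalizes. Below is a toy hashing-based estimator in the spirit of [Charikar, Siminelakis FOCS'17], using SimHash (random hyperplanes) on the unit sphere; the kernel `w`, the parameters `k` and `reps`, and all function names are illustrative assumptions, not the paper's multi-resolution construction.

```python
# Toy hashing-based estimator (HBE). With k concatenated random
# hyperplanes, two unit vectors collide with probability
# p(x, y) = (1 - arccos(<x, y>)/pi)^k, and the estimator
# (|B|/n) * w(x, y) / p(x, y), for x drawn uniformly from the query's
# bucket B, is unbiased for Z_w(y).
import numpy as np

rng = np.random.default_rng(0)

def w(x, y):
    # Example log-convex kernel of the inner product; maps into [0, 1].
    return np.exp(np.dot(x, y) - 1.0)

def collision_prob(x, y, k):
    theta = np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))
    return (1.0 - theta / np.pi) ** k

def hbe_estimate(X, y, k=4, reps=200):
    n, d = X.shape
    estimates = []
    for _ in range(reps):
        H = rng.standard_normal((k, d))      # k random hyperplanes
        codes = X @ H.T > 0                  # n x k sign patterns
        y_code = H @ y > 0
        bucket = np.flatnonzero((codes == y_code).all(axis=1))
        if bucket.size == 0:
            estimates.append(0.0)            # empty bucket contributes 0
            continue
        x = X[rng.choice(bucket)]            # uniform point from y's bucket
        estimates.append(bucket.size / n * w(x, y) / collision_prob(x, y, k))
    return float(np.mean(estimates))         # unbiased estimate of Z_w(y)
```

    Averaging over independent hash tables reduces the variance; the paper's framework can be read as choosing a collection of schemes whose best collision probability tracks $\sqrt{w(x, y)}$, which is what keeps that variance small across a much wider class of kernels than a single scheme allows.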

    Artificial intelligence methods for security and cyber security systems

    This research addresses threat analysis and countermeasures employing Artificial Intelligence (AI) methods within the civilian domain, where safety and mission-critical aspects are essential. AI faces challenges of repeatable determinism and decision explanation. This research proposed methods for dense and convolutional networks that provide repeatable determinism. In dense networks, the proposed alternative method achieved equal performance with more structured learnt weights. The proposed method also showed earlier learning and higher accuracy in convolutional networks: when demonstrated on colour image classification, first-epoch accuracy improved to 67%, from 29% under the existing scheme. Examined in transfer learning with the Fast Gradient Sign Method (FGSM) as an analytical means of controlling the distortion of dissimilarity, the proposed method showed significantly better retention of the learnt model, with 31% accuracy instead of 9%. The research also proposed a threat analysis method with set-mappings and first-principles analytical steps applied to a Symbolic AI method, using an algebraic expert system with virtualized neurons. The neural expert system method demonstrated the infilling of parameters by calculating beamwidths under varying uncertainty about the antenna type. Combined with a proposed formula extraction method, it offers the potential for machine learning of new rules as a Neuro-Symbolic AI method. The proposed method allocates extra weights to neuron input value ranges as activation strengths, simplifying the learnt representation and reducing model depth, with correspondingly less potential for dropout. Finally, an image classification method for emitter identification is proposed, together with a synthetic dataset generation method; it accurately distinguishes fourteen radar emission modes with high ambiguity between them, achieving 99.8% accuracy. That method would serve as a mechanism to recognize non-threat civilian radars, raising a threat alert when deviations from those civilian emitters are detected.
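    Since the abstract leans on FGSM as an analytical tool, a compact reminder of the standard attack may help. This is the textbook FGSM (Goodfellow et al.) in PyTorch, not the thesis's proposed method; `model`, `label`, and `epsilon = 0.03` are illustrative assumptions.

```python
# Standard FGSM: perturb the input in the direction of the sign of the
# loss gradient; epsilon controls the distortion magnitude (the
# "control of dissimilarity" role the abstract assigns to the method).
import torch

def fgsm_attack(model: torch.nn.Module, x: torch.Tensor, label: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    # Clamping to [0, 1] assumes inputs are normalized images.
    return x_adv.clamp(0.0, 1.0).detach()
```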