Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search
Vector quantization (VQ) techniques are widely used in similarity search for
data compression, fast metric computation, and so on. Originally designed for
Euclidean distance, existing VQ techniques (e.g., PQ, AQ) explicitly or
implicitly minimize the quantization error. In this paper, we present a new
angle to analyze the quantization error, which decomposes the quantization
error into norm error and direction error. We show that quantization errors in
norm have much higher influence on inner products than quantization errors in
direction, and small quantization error does not necessarily lead to good
performance in maximum inner product search (MIPS). Based on this observation,
we propose norm-explicit quantization (NEQ), a general paradigm that
improves existing VQ techniques for MIPS. NEQ quantizes the norms of items in a
dataset explicitly to reduce errors in norm, which is crucial for MIPS. For the
direction vectors, NEQ can simply reuse an existing VQ technique to quantize
them without modification. We conducted extensive experiments on a variety of
datasets and parameter configurations. The experimental results show that NEQ
improves the performance of various VQ techniques for MIPS, including PQ, OPQ,
RQ and AQ.
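The norm/direction split described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes plain k-means as a stand-in for the "existing VQ technique" (PQ, OPQ, RQ, or AQ in the paper) and a one-dimensional k-means codebook for the norms. The helper names `kmeans`, `neq_encode`, and `neq_inner_products` are hypothetical, as are the codebook sizes.

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means (a stand-in quantizer); returns (centroids, codes)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid.
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        codes = d2.argmin(axis=1)
        for j in range(k):
            members = data[codes == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Final assignment against the last centroid update.
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)

def neq_encode(items, n_dir_codes=16, n_norm_codes=8):
    """Quantize norms and directions separately (illustrative simplification)."""
    norms = np.linalg.norm(items, axis=1)
    directions = items / np.maximum(norms, 1e-12)[:, None]
    # Direction part: any vector quantizer could be used here.
    dir_codebook, dir_codes = kmeans(directions, n_dir_codes)
    # Assumption: rescale codewords to unit length so that the norm of a
    # reconstructed item is exactly its quantized norm.
    lengths = np.linalg.norm(dir_codebook, axis=1)
    dir_codebook = dir_codebook / np.maximum(lengths, 1e-12)[:, None]
    # Norm part: an explicit scalar codebook.
    norm_codebook, norm_codes = kmeans(norms[:, None], n_norm_codes)
    return dir_codebook, dir_codes, norm_codebook[:, 0], norm_codes

def neq_inner_products(query, dir_codebook, dir_codes, norm_codebook, norm_codes):
    """Approximate <query, item> as quantized_norm * <query, direction codeword>."""
    table = dir_codebook @ query  # one inner product per direction codeword
    return norm_codebook[norm_codes] * table[dir_codes]
```

Calling `neq_encode(items)` on an `(n, d)` array and passing its outputs to `neq_inner_products` gives one approximate inner product per item. Because the norm has its own codebook, its error can be controlled separately from the direction error.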
Multi-Resolution Hashing for Fast Pairwise Summations
A basic computational primitive in the analysis of massive datasets is
summing simple functions over a large number of objects. Modern applications
pose an additional challenge in that such functions often depend on a parameter
vector (query) that is unknown a priori. Given a set of points and a pairwise
function, we study the problem of designing a data structure that enables
sublinear-time approximation of the summation of that function over the point
set for any query. By combining ideas from Harmonic Analysis (partitions of unity
and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis
FOCS'17], we provide a general framework for designing such data structures
through hashing that reaches far beyond what previous techniques allowed.
A key design principle is a collection of hashing schemes with suitably
constrained collision probabilities. This leads to a data structure
that approximates the summation using a sub-linear number of samples from each
hash family. Using this new framework along with Distance Sensitive Hashing
[Aumuller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection
can be constructed and evaluated efficiently for any log-convex function
of the inner product on the unit sphere.
Our method leads to data structures with sub-linear query time that
significantly improve upon random sampling and can be used for Kernel Density
or Partition Function Estimation. We provide extensions of our result beyond the
sphere and from scalar functions to vector functions.
Comment: 39 pages, 3 figures
Artificial intelligence methods for security and cyber security systems
This research addresses threat analysis and countermeasures that employ Artificial Intelligence (AI) methods within the civilian domain, where safety and mission-critical aspects are essential. AI faces challenges of repeatable determinism and decision explanation. This research proposed methods for dense and convolutional networks that provided repeatable determinism. In dense networks, the proposed alternative method performed equally well, with more structured learnt weights. In convolutional networks, the proposed method also learnt earlier and reached higher accuracy: when demonstrated in colour image classification, accuracy in the first epoch improved to 67%, from 29% in the existing scheme. When examined in transfer learning with the Fast Gradient Sign Method (FGSM) as an analytical method to control distortion of dissimilarity, the proposed method retained significantly more of the learnt model, with 31% accuracy instead of 9%. The research also proposed a threat analysis method with set-mappings and first-principle analytical steps applied to a Symbolic AI method using an algebraic expert system with virtualized neurons. The neural expert system method demonstrated the infilling of parameters by calculating beamwidths with variations in the uncertainty of the antenna type. When combined with a proposed formula extraction method, it provides the potential for machine learning of new rules as a Neuro-Symbolic AI method. The proposed method uses extra weights allocated to neuron input value ranges as activation strengths. The method simplifies the learnt representation and reduces model depth, and thus has less significant dropout potential. Finally, an image classification method for emitter identification is proposed together with a synthetic dataset generation method; it accurately distinguishes fourteen radar emission modes with high ambiguity between them, achieving 99.8% accuracy. That method would be a mechanism to recognize non-threat civil radars, with the aim of raising a threat alert when deviations from those civilian emitters are detected.