The similarity metric
A new class of distances appropriate for measuring similarity relations
between sequences, say one type of similarity per distance, is studied. We
propose a new ``normalized information distance'', based on the noncomputable
notion of Kolmogorov complexity, and show that it is in this class and it
minorizes every computable distance in the class (that is, it is universal in
that it discovers all computable similarities). We demonstrate that it is a
metric and call it the {\em similarity metric}. This theory forms the
foundation for a new practical tool. To evidence generality and robustness we
give two distinctive applications in widely divergent areas using standard
compression programs like gzip and GenCompress. First, we compare whole
mitochondrial genomes and infer their evolutionary history. This results in a
first completely automatically computed whole mitochondrial phylogeny tree.
Secondly, we fully automatically compute the language tree of 52 different
languages.
Comment: 13 pages, LaTeX, 5 figures. Part of this work appeared in Proc. 14th
ACM-SIAM Symp. Discrete Algorithms, 2003. This is the final, corrected
version to appear in IEEE Trans Inform. T
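The abstract above says the theory is applied with standard compressors such as gzip. A minimal sketch of that idea follows, assuming the commonly used compression-based approximation NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. Using Python's zlib (DEFLATE, the same family as gzip) and the helper names `compressed_size` and `ncd` are assumptions made for illustration, not the authors' implementation.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Compression-based approximation of the normalized information distance.

    Values near 0 suggest the inputs share most of their information;
    values near 1 suggest they share little. A real compressor only
    approximates Kolmogorov complexity, so results are estimates.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 50
    b = b"an entirely different sentence about compression " * 50
    print(ncd(a, a), ncd(a, b))
```

The choice of compressor matters: a compressor whose window is smaller than the inputs cannot see the redundancy between x and y, so the estimate degrades on long sequences.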
On The Similarity Metric
In mathematics, and more specifically in topology, the notion of a distance metric has been well known since the nineteenth century. It is used to measure the difference between two objects. When it comes to characterizing the similarity between two objects, a similarity metric is needed. Although widely used in computer science, such a metric is not clearly defined mathematically. We fill the existing gap in the literature concerning similarity metrics, connecting them to the well-known notion of partial metrics in general topology.
Information similarity metrics in information security and forensics
We study two information similarity measures, relative entropy and the similarity metric, and methods for estimating them. Relative entropy can be readily estimated with existing algorithms based on compression. The similarity metric, based on algorithmic complexity, proves to be more difficult to estimate because algorithmic complexity itself is not computable. We again turn to compression for estimating the similarity metric. Previous studies rely on the compression ratio as an indicator for choosing compressors to estimate the similarity metric. This assumption, however, is fundamentally flawed. We propose a new method to benchmark compressors for estimating the similarity metric. To demonstrate its use, we propose to quantify the security of a stegosystem using the similarity metric. Unlike other measures of steganographic security, the similarity metric is not only a true distance metric, but it is also universal in the sense that it is asymptotically minimal among all computable metrics between two objects. Therefore, it accounts for all similarities between two objects. In contrast, relative entropy, a widely accepted steganographic security definition, only takes into consideration the statistical similarity between two random variables. As an application, we present a general method for benchmarking stegosystems. The method is general in the sense that it is not restricted to any covertext medium and can therefore be applied to a wide range of stegosystems. For demonstration, we analyze several image stegosystems using the newly proposed similarity metric as the security metric. The results show the true security limits of stegosystems regardless of the chosen security metric or the existence of steganalysis detectors. In other words, this makes it possible to show that a stegosystem with a large similarity metric is inherently insecure, even if it has not yet been broken.
On Empirical Entropy
We propose a compression-based version of the empirical entropy of a finite
string over a finite alphabet. Whereas previously one considers the naked
entropy of (possibly higher order) Markov processes, we consider the sum of the
description of the random variable involved plus the entropy it induces. We
assume only that the distribution involved is computable. To test the new
notion we compare the Normalized Information Distance (the similarity metric)
with a related measure based on Mutual Information in Shannon's framework. This
way the similarities and differences of the last two concepts are exposed.
Comment: 14 pages, LaTe
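For orientation, the classical quantity that the abstract above contrasts against can be sketched as follows. This is the plain zeroth-order empirical entropy of a string, H0(s) = -sum over symbols c of (n_c / n) * log2(n_c / n), and not the compression-based variant the authors propose. The function name `empirical_entropy` is hypothetical.

```python
from collections import Counter
from math import log2


def empirical_entropy(s: str) -> float:
    """Zeroth-order empirical entropy of s, in bits per symbol.

    Uses the observed symbol frequencies as the probability estimate.
    This is the classical baseline only; it ignores the cost of
    describing the distribution itself.
    """
    n = len(s)
    if n == 0:
        return 0.0
    counts = Counter(s)
    return -sum((c / n) * log2(c / n) for c in counts.values())


if __name__ == "__main__":
    for text in ("aaaa", "aabb", "abcd"):
        print(text, empirical_entropy(text))
```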
Deep Metric Learning via Facility Location
Learning the representation and the similarity metric in an end-to-end
fashion with deep networks has demonstrated outstanding results for clustering
and retrieval. However, these recent approaches still suffer from the
performance degradation stemming from the local metric training procedure which
is unaware of the global structure of the embedding space.
We propose a global metric learning scheme for optimizing the deep metric
embedding with the learnable clustering function and the clustering metric
(NMI) in a novel structured prediction framework.
Our experiments on CUB200-2011, Cars196, and Stanford online products
datasets show state of the art performance both on the clustering and retrieval
tasks measured in the NMI and Recall@K evaluation metrics.
Comment: Submission accepted at CVPR 201
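The Recall@K retrieval metric named above can be sketched as follows, assuming its usual definition in the metric-learning literature: the fraction of queries whose K nearest neighbors (excluding the query itself) contain at least one item with the same label. The function name `recall_at_k`, the squared Euclidean distance, and the toy data are assumptions for illustration; the paper's own evaluation code is not shown in the abstract.

```python
from typing import Sequence


def recall_at_k(embeddings: Sequence[Sequence[float]],
                labels: Sequence[int], k: int) -> float:
    """Fraction of queries with a same-label item among their k nearest neighbors."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Squared Euclidean distance from the query to every other item.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(query, other)), j)
            for j, other in enumerate(embeddings)
            if j != i
        ]
        dists.sort()
        if any(labels[j] == labels[i] for _, j in dists[:k]):
            hits += 1
    return hits / len(embeddings)


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (10, 10), (10, 11), (6, 6)]
    classes = [0, 0, 1, 1, 0]
    for k in (1, 2, 3):
        print(k, recall_at_k(points, classes, k))
```

This brute-force version is quadratic in the number of items, which is adequate for a small check but not for datasets of the size mentioned in the abstract.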
Learning Deep Similarity Metric for 3D MR-TRUS Registration
Purpose: The fusion of transrectal ultrasound (TRUS) and magnetic resonance
(MR) images for guiding targeted prostate biopsy has significantly improved the
biopsy yield of aggressive cancers. A key component of MR-TRUS fusion is image
registration. However, it is very challenging to obtain a robust automatic
MR-TRUS registration due to the large appearance difference between the two
imaging modalities. The work presented in this paper aims to tackle this
problem by addressing two challenges: (i) the definition of a suitable
similarity metric and (ii) the determination of a suitable optimization
strategy.
Methods: This work proposes the use of a deep convolutional neural network to
learn a similarity metric for MR-TRUS registration. We also use a composite
optimization strategy that explores the solution space in order to search for a
suitable initialization for the second-order optimization of the learned
metric. Further, a multi-pass approach is used in order to smooth the metric
for optimization.
Results: The learned similarity metric outperforms the classical mutual
information and also the state-of-the-art MIND feature based methods. The
results indicate that the overall registration framework has a large capture
range. The proposed deep similarity metric based approach obtained a mean TRE
of 3.86mm (with an initial TRE of 16mm) for this challenging problem.
Conclusion: A similarity metric that is learned using a deep neural network
can be used to assess the quality of any given image registration and can be
used in conjunction with the aforementioned optimization framework to perform
automatic registration that is robust to poor initialization.
Comment: To appear on IJCAR
Experimental study of a similarity metric for retrieving pieces from structured plan cases: its role in the originality of plan case solutions
This paper describes a quantitative similarity metric and its
contribution to achieving original plan solutions. This similarity metric is
used by an iterative process of piece retrieval from structured plan cases.
Within our approach plan cases are tree-like networks of pieces (goals and
actions). These case pieces are interrelated by links
(explanations). These links may be classified as hierarchical or temporal,
antecedent or consequent, and explicit or implicit. Besides links, each case
piece has also information about its properties (the attributes-value pairs),
its hierarchical and temporal position in the case (the address), and about its
constraints in the relationship with others (the constraints). The similarity
metric computes a similarity value between two case pieces taking into
account similarities between these case pieces' information types. Each
time a problem is proposed, different weights are given to some of those
similarities, with the aim of solving it with an original solution. This
similarity metric is used by the system INSPIRER (ImagiNation taking as
Source Past and Imperfectly RElated Reasonings). We illustrate the role of
the similarity metric in the creativity of solutions, focusing especially on their
originality, with the presentation of the experimental results obtained in
the musical composition domain, which is considered by us as a planning
domain.