Ptolemaic Indexing
This paper discusses a new family of bounds for use in similarity search,
related to those used in metric indexing, but based on Ptolemy's inequality,
rather than the metric axioms. Ptolemy's inequality holds for the well-known
Euclidean distance, but is also shown here to hold for quadratic form metrics
in general, with Mahalanobis distance as an important special case. The
inequality is examined empirically on both synthetic and real-world data sets
and is also found to hold approximately, with a very low degree of error, for
important distances such as the angular pseudometric and several Lp norms.
Indexing experiments demonstrate a highly increased filtering power compared to
existing, triangular methods. It is also shown that combining the Ptolemaic and
triangular filtering can lead to better results than using either approach on
its own.
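The two kinds of filtering bound mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of `math.dist` (Euclidean distance) and the brute-force loop over pivot pairs are all assumptions made for the example. For a query q, an object o and two pivots p and s, Ptolemy's inequality gives d(q,p)·d(o,s) <= d(q,o)·d(p,s) + d(q,s)·d(o,p), which rearranges into a lower bound on d(q,o).

```python
import math
from itertools import combinations


def triangle_lower_bound(d_qp, d_op):
    """Pivot bound from the triangle inequality: d(q,o) >= |d(q,p) - d(o,p)|."""
    return abs(d_qp - d_op)


def ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps):
    """Pivot-pair bound from Ptolemy's inequality:
    d(q,o) >= |d(q,p)*d(o,s) - d(q,s)*d(o,p)| / d(p,s).
    """
    if d_ps == 0:
        return 0.0  # coincident pivots carry no information
    return abs(d_qp * d_os - d_qs * d_op) / d_ps


def best_lower_bound(q, o, pivots, dist=math.dist):
    """Largest lower bound on dist(q, o) obtainable from the given pivots.

    Hypothetical helper: a real index would precompute the object-to-pivot
    distances at build time; they are recomputed here to stay self-contained.
    """
    d_q = [dist(q, p) for p in pivots]
    d_o = [dist(o, p) for p in pivots]
    # Triangular filtering: one bound per pivot.
    bound = max(triangle_lower_bound(a, b) for a, b in zip(d_q, d_o))
    # Ptolemaic filtering: one bound per pair of pivots.
    for i, j in combinations(range(len(pivots)), 2):
        d_ps = dist(pivots[i], pivots[j])
        bound = max(
            bound,
            ptolemaic_lower_bound(d_q[i], d_q[j], d_o[i], d_o[j], d_ps),
        )
    return bound
```

In a range query with radius r, an object can be discarded without computing its true distance whenever `best_lower_bound(q, o, pivots) > r`. Taking the maximum over both families of bounds corresponds to the combined filtering the abstract describes.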
One-class classifiers based on entropic spanning graphs
One-class classifiers offer valuable tools to assess the presence of outliers
in data. In this paper, we propose a design methodology for one-class
classifiers based on entropic spanning graphs. Our approach can also process
non-numeric data by means of an embedding procedure. The spanning graph is
learned on the embedded input data and the resulting partition of vertices
defines the classifier. The final partition is
derived by exploiting a criterion based on mutual information minimization.
Here, we compute the mutual information by using a convenient formulation
provided in terms of the α-Jensen difference. Once training is
completed, in order to associate a confidence level with the classifier
decision, a graph-based fuzzy model is constructed. The fuzzification process
is based only on topological information of the vertices of the entropic
spanning graph. As such, the proposed one-class classifier is suitable also for
data characterized by complex geometric structures. We provide experiments on
well-known benchmarks containing both feature vectors and labeled graphs. In
addition, we apply the method to the protein solubility recognition problem by
considering several representations for the input samples. Experimental results
demonstrate the effectiveness and versatility of the proposed method with
respect to other state-of-the-art approaches.
Comment: Extended and revised version of the paper "One-Class Classification
Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN,
Vancouver, Canada.
Using CBR to improve the usability of numerical models
In this thesis we show that CBR systems can be constructed from numerical models, so as to improve their usability. It is shown that CBR models may be queried in a flexible manner, and that the user may formulate queries consisting of constraints over both “input” and “output” variables of the numerical model. It is also shown that the constraints may be formulated using either nominal or continuous variables. A generalization of the CBR retrieval process to include constraints over unified “input-output” space is formulated as a framework for the method.
The method is illustrated with practical engineering models: the pneumatic conveyor problem and the projectile problem. Comparisons are made on usability of CBR and numerical models for specific problems. It is shown that CBR models can answer questions difficult or impossible to formulate using numerical models, and that CBR models can be faster.
The thesis also addresses a latent problem with the general method, one that is of wider importance: interpolation over nominal values in unified space. A novel method is proposed for interpolation over nominal values, termed the Generalised Shepard Nearest Neighbour method (GSNN). GSNN can utilise distance metrics defined on the solution space of a CBR system.
The properties and advantages of GSNN are examined in the thesis. A comparison is made with other CBR retrieval methods, using several examples, including the travel domain case base. It is shown that GSNN can out-perform conventional nearest neighbour methods. It is shown that GSNN has advantages in that it can find solutions not in the case base and it can find solutions not in the retrieval set. It is also shown that the performance of GSNN can be improved further by using it in conjunction with a diversity algorithm. The merit of using GSNN as a case selection component is examined, and it is shown that it can give good results in sparse case bases.
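The idea of interpolating over nominal values with a solution-space metric can be sketched as below. This is an illustrative reading of the abstract, not the thesis's definition of GSNN: the function name `gsnn_predict`, its parameters, the inverse-distance (Shepard-style) weights and the squared-distance cost are assumptions made for the example. The sketch retrieves the k nearest cases, weights them by inverse problem-space distance, and returns the candidate solution that minimises the weighted sum of squared solution-space distances to the retrieved solutions.

```python
def gsnn_predict(query, cases, candidates, case_dist, solution_dist,
                 k=3, power=2.0):
    """Pick the candidate solution closest, in a weighted sense, to the
    solutions of the k nearest cases (hypothetical sketch).

    cases         -- list of (problem, solution) pairs
    candidates    -- every solution value that may be returned
    case_dist     -- distance between two problems
    solution_dist -- distance between two (possibly nominal) solutions
    """
    neighbours = sorted(cases, key=lambda c: case_dist(query, c[0]))[:k]

    weights = []
    for problem, solution in neighbours:
        d = case_dist(query, problem)
        if d == 0:
            return solution  # exact match: reuse the stored solution
        weights.append(1.0 / d ** power)  # Shepard-style inverse-distance weight

    def cost(y):
        return sum(w * solution_dist(y, s) ** 2
                   for w, (_, s) in zip(weights, neighbours))

    return min(candidates, key=cost)


# Toy example with nominal solutions placed on an assumed ordinal scale.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}


def size_dist(a, b):
    return abs(SIZE_RANK[a] - SIZE_RANK[b])


def num_dist(a, b):
    return abs(a - b)


toy_cases = [(0.0, "small"), (2.0, "large")]
prediction = gsnn_predict(1.0, toy_cases, list(SIZE_RANK), num_dist,
                          size_dist, k=2)
```

In the toy example the query lies midway between the two stored cases, so the selected value is "medium", a solution that appears in neither the retrieval set nor the case base. This matches the property claimed for GSNN above, under the assumptions stated.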
Finally, the thesis concludes with a survey of numerical models where CBR construction can be useful, and where benefits can be expected.