Search CORE

21 research outputs found

Assessing metric structures on GPGPU environments

Author: Santos Eder dos
Sofía Albert A. O.
Uribe Paredes Roberto
Publication venue
Publication date: 01/10/2015
Field of study

Similarity search consists of retrieving objects from a database that are similar or relevant to a particular query. It is a topic of great interest to the scientific community because of its many fields of application, such as searching for words and images on the World Wide Web, pattern recognition, plagiarism detection, and multimedia databases, among others. It is modeled through metric spaces, in which objects are treated as a black box that exposes only the distances between objects; calculating the distance function is costly, and search systems operate at a high query rate. Metric structures have been developed to optimize this process; such structures work as indexes and preprocess the data to reduce the number of distance evaluations during the search. Processing large volumes of data makes the use of such structures infeasible without parallel processing environments. Technologies based on multi-CPU and GPU architectures are among the most prevalent due to their cost and performance.

XV Workshop de Procesamiento Distribuido y Paralelo (WPDP), Red de Universidades con Carreras en Informática (RedUNCI)
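The idea of an index that preprocesses data to save distance evaluations can be sketched with a pivot table and the triangle inequality. This is only an illustrative, sequential CPU sketch and not the structures or GPU kernels evaluated by the authors; the function names `build_pivot_table` and `range_search` and the choice of pivots are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Distance = Callable[[T, T], float]


def build_pivot_table(objects: Sequence[T], pivots: Sequence[T],
                      dist: Distance) -> List[List[float]]:
    """Preprocessing step: store the distance from every object to every pivot."""
    return [[dist(obj, p) for p in pivots] for obj in objects]


def range_search(query: T, radius: float, objects: Sequence[T],
                 pivots: Sequence[T], table: List[List[float]],
                 dist: Distance) -> Tuple[List[T], int]:
    """Return objects within `radius` of `query` and the number of
    distance evaluations performed at query time."""
    q_to_p = [dist(query, p) for p in pivots]
    evaluations = len(pivots)
    hits = []
    for obj, row in zip(objects, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for every pivot p.
        lower_bound = max(abs(qp - op) for qp, op in zip(q_to_p, row))
        if lower_bound > radius:
            continue  # discarded without evaluating d(q, o)
        evaluations += 1
        if dist(query, obj) <= radius:
            hits.append(obj)
    return hits, evaluations


if __name__ == "__main__":
    # Hypothetical toy data: points on a line with absolute difference as metric.
    data = [0.0, 1.0, 5.0, 9.0, 10.0]
    pivots = [0.0, 10.0]
    metric = lambda a, b: abs(a - b)
    table = build_pivot_table(data, pivots, metric)
    print(range_search(4.5, 1.0, data, pivots, table, metric))
```

In a setting where the distance function is expensive, the saving comes from the objects that are discarded using only the precomputed table.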

Servicio de Difusión de la Creación Intelectual

Pivot Selection for Median String Problem

Author: Abreu José
Mirabal Pedro
Pedreira Oscar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/03/2020
Field of study

The Median String Problem is W[1]-hard under the Levenshtein distance; thus, approximation heuristics are used. Perturbation-based heuristics have proven very competitive in terms of the trade-off between approximation accuracy and convergence speed. However, the computational burden increases with the size of the set. In this paper, we explore the idea of reducing the size of the problem by selecting a subset of representative elements, i.e., pivots, that are used to compute the approximate median instead of the whole set. We aim to reduce the computation time through a reduction of the problem size while achieving similar approximation accuracy. We explain how we find those pivots and how to compute the median string from them. Results on commonly used test data suggest that our approach can reduce the computational requirements (measured in computed edit distances) by 8% with approximation accuracy as good as the state-of-the-art heuristic. This work has been supported in part by CONICYT-PCHA/Doctorado Nacional/2014-63140074 through a Ph.D. Scholarship; Universidad Católica de la Santísima Concepción through the research project DIN-01/2016; European Union's Horizon 2020 under the Marie Skłodowska-Curie grant agreement 690941; Millennium Institute for Foundational Research on Data (IMFD); FONDECYT-CONICYT grant number 1170497; and for O. Pedreira, Xunta de Galicia/FEDER-UE refs. CSI ED431G/01 and GRC: ED431C 2017/58.
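The general idea of evaluating median candidates against a small pivot set rather than the whole set can be sketched as follows. The abstract does not say how pivots are chosen or how the median is refined, so the farthest-first selection and the set-median step below are stand-ins chosen for illustration, not the paper's heuristics; all function names are hypothetical.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (unit costs)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def farthest_first_pivots(strings: Sequence[str], k: int) -> List[str]:
    """Assumed pivot selection: repeatedly pick the string farthest
    from the pivots chosen so far."""
    pivots = [strings[0]]
    min_d = [levenshtein(s, pivots[0]) for s in strings]
    while len(pivots) < min(k, len(strings)):
        idx = max(range(len(strings)), key=min_d.__getitem__)
        if min_d[idx] == 0:
            break  # every remaining string duplicates a pivot
        pivots.append(strings[idx])
        min_d = [min(m, levenshtein(s, strings[idx]))
                 for m, s in zip(min_d, strings)]
    return pivots


def set_median(candidates: Sequence[str], references: Sequence[str]) -> str:
    """Candidate minimising the sum of edit distances to `references`."""
    return min(candidates,
               key=lambda c: sum(levenshtein(c, r) for r in references))


if __name__ == "__main__":
    words = ["abc", "abd", "abe", "xyz"]
    pivots = farthest_first_pivots(words, 2)
    # Scoring candidates against the pivots costs len(words) * len(pivots)
    # edit distances instead of len(words) ** 2.
    print(pivots, set_median(words, pivots))
```

The saving is in the number of edit distances computed per candidate; whether accuracy is preserved depends on how representative the pivots are.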

arXiv.org e-Print Archive

Crossref

Localization of seizure sources using blind identification and a new clustering algorithm

Author: Boostani R
Jarchi D
Sanei Saeid
Taheri M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

Crossref

University of Surrey

Surrey Research Insight

Ptolemaic Indexing

Author: Hetland Magnus Lie
Publication venue
Publication date: 01/01/2015
Field of study

This paper discusses a new family of bounds for use in similarity search, related to those used in metric indexing, but based on Ptolemy's inequality, rather than the metric axioms. Ptolemy's inequality holds for the well-known Euclidean distance, but is also shown here to hold for quadratic form metrics in general, with Mahalanobis distance as an important special case. The inequality is examined empirically on both synthetic and real-world data sets and is also found to hold approximately, with a very low degree of error, for important distances such as the angular pseudometric and several Lp norms. Indexing experiments demonstrate a highly increased filtering power compared to existing, triangular methods. It is also shown that combining the Ptolemaic and triangular filtering can lead to better results than using either approach on its own
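For readers unfamiliar with the bound, Ptolemy's inequality states that for any four points x, y, u, v, d(x,v)·d(y,u) <= d(x,y)·d(u,v) + d(x,u)·d(y,v). Applied to a query q, an object o and two pivots p and s, it yields the lower bound d(q,o) >= |d(q,p)·d(o,s) - d(q,s)·d(o,p)| / d(p,s). The sketch below compares this with the usual triangular bound under the Euclidean distance; the function names and the sample points are assumptions made for the example.

```python
import math
from typing import Sequence

Point = Sequence[float]


def ptolemaic_lower_bound(d_qp: float, d_qs: float, d_op: float,
                          d_os: float, d_ps: float) -> float:
    """Lower bound on d(q, o) from Ptolemy's inequality with pivots p and s."""
    return abs(d_qp * d_os - d_qs * d_op) / d_ps


def triangular_lower_bound(d_qp: float, d_qs: float,
                           d_op: float, d_os: float) -> float:
    """Lower bound on d(q, o) from the triangle inequality with the same pivots."""
    return max(abs(d_qp - d_op), abs(d_qs - d_os))


if __name__ == "__main__":
    # Hypothetical points: the four corners of a 4 x 3 rectangle.
    p, s = (0.0, 0.0), (4.0, 0.0)   # pivots
    q, o = (0.0, 3.0), (4.0, 3.0)   # query and database object
    d = math.dist                   # Euclidean distance (Python 3.8+)
    pto = ptolemaic_lower_bound(d(q, p), d(q, s), d(o, p), d(o, s), d(p, s))
    tri = triangular_lower_bound(d(q, p), d(q, s), d(o, p), d(o, s))
    print(f"true={d(q, o)} ptolemaic={pto} triangular={tri}")
```

Both values are valid lower bounds in Euclidean space, so taking the larger of the two gives a filter at least as strong as either alone.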

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

Journal of Computational Geometry (JoCG - Carleton University, Computational Geometry Lab)

NORA - Norwegian Open Research Archives

Assessing metric structures on GPGPU environments

Author: Santos Eder dos
Sofía Albert A. O.
Uribe Paredes Roberto
Publication venue
Publication date: 14/12/2015
Field of study

Servicio de Difusión de la Creación Intelectual

SProt: sphere-based protein structure similarity algorithm

Author: Galgonek Jakub
Hoksza David
Skopal Tomáš
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Crossref

Springer - Publisher Connector

PubMed Central

Exploiting subspace distance equalities in high-dimensional data for kNN queries

Author: Broneske David
Köpen Veit
Saake Gunter
Schäler Martin
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2018
Field of study

Efficient k-nearest neighbor computation for high-dimensional data is an important yet challenging task. The response times of state-of-the-art indexing approaches depend heavily on factors such as the distribution of the data. For clustered data, such approaches are several factors faster than a sequential scan. However, if various dimensions contain uniform or Gaussian data, they tend to be clearly outperformed by a simple sequential scan. Hence, we require an approach that generally delivers good response times, independent of the data distribution. As a solution, we propose to exploit a novel concept to efficiently compute nearest neighbors. We name it sub-space distance equality, which aims at reducing the number of distance computations independent of the data distribution. We integrate knn computing algorithms into the Elf index structure, allowing us to study the sub-space distance equality concept in isolation and in combination with a main-memory optimized storage layout. In a large comparative study with twelve data sets, our results indicate that indexes based on sub-space distance equalities compute the fewest distances. For clustered data, our Elf knn algorithm delivers a performance increase of at least a factor of two and up to two orders of magnitude, without losing the performance gain compared to sequential scans for uniform or Gaussian data.

KITopen