
3,054 research outputs found

Pivot Selection for Median String Problem

Author: Abreu José
Mirabal Pedro
Pedreira Oscar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/03/2020

The Median String Problem is W[1]-hard under the Levenshtein distance, so approximation heuristics are used. Perturbation-based heuristics have proved very competitive in terms of the trade-off between approximation accuracy and convergence speed. However, the computational burden increases with the size of the set. In this paper, we explore the idea of reducing the size of the problem by selecting a subset of representative elements, i.e. pivots, that are used to compute the approximate median instead of the whole set. We aim to reduce the computation time through a reduction of the problem size while achieving similar approximation accuracy. We explain how we find those pivots and how to compute the median string from them. Results on commonly used test data suggest that our approach can reduce the computational requirements (measured in computed edit distances) by 8% with approximation accuracy as good as the state-of-the-art heuristic. This work has been supported in part by CONICYT-PCHA/Doctorado Nacional/2014-63140074 through a Ph.D. Scholarship; Universidad Católica de la Santísima Concepción through the research project DIN-01/2016; European Union's Horizon 2020 under the Marie Skłodowska-Curie grant agreement 690941; Millennium Institute for Foundational Research on Data (IMFD); FONDECYT-CONICYT grant number 1170497; and for O. Pedreira, Xunta de Galicia/FEDER-UE refs. CSI ED431G/01 and GRC: ED431C 2017/58.
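
As an illustration only, the following Python sketch shows the general shape of a pivot-based approximation for the median string entry above: compute edit distances, pick a few representative strings, and choose the candidate that minimizes the summed distance to those pivots. The farthest-first pivot rule, the function names, and the use of a set median (a string taken from the input set, rather than the perturbation-based heuristic the abstract refers to) are assumptions made for this sketch, not the authors' method.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def farthest_first_pivots(strings: Sequence[str], k: int) -> List[str]:
    """Hypothetical pivot rule: start from the first string, then
    repeatedly add the string farthest from the pivots chosen so far."""
    pivots = [strings[0]]
    dist = [levenshtein(s, pivots[0]) for s in strings]
    while len(pivots) < min(k, len(strings)):
        idx = max(range(len(strings)), key=dist.__getitem__)
        if dist[idx] == 0:  # every remaining string equals some pivot
            break
        pivots.append(strings[idx])
        dist = [min(d, levenshtein(s, strings[idx]))
                for d, s in zip(dist, strings)]
    return pivots


def set_median(candidates: Sequence[str], reference: Sequence[str]) -> str:
    """Candidate with the smallest summed distance to the reference strings."""
    return min(candidates,
               key=lambda c: sum(levenshtein(c, r) for r in reference))
```

Calling `set_median(strings, farthest_first_pivots(strings, k))` evaluates each candidate against `k` pivots instead of the whole set, which is where the saving in computed edit distances would come from in a scheme of this kind.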

arXiv.org e-Print Archive

Crossref

Indexability, concentration, and VC theory

Author: Pestov Vladimir
Publication venue: 'Elsevier BV'
Publication date: 21/05/2011

Degrading performance of indexing schemes for exact similarity search in high dimensions has long since been linked to histograms of distributions of distances and other 1-Lipschitz functions getting concentrated. We discuss this observation in the framework of the phenomenon of concentration of measure on the structures of high dimension and the Vapnik-Chervonenkis theory of statistical learning. Comment: 17 pages, final submission to J. Discrete Algorithms (an expanded, improved and corrected version of the SISAP'2010 invited paper, this e-print, v3

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Engineering Parallel String Sorting

Author: Bingmann Timo
Eberle Andreas
Sanders Peter
Publication date: 09/03/2014

We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our implementations scale very well on real-world inputs and modern machines. Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115

arXiv.org e-Print Archive

CiteSeerX

KITopen

Parallel String Sample Sort

Author: J. Kärkkäinen
J. Wassenberg
K. Mehlhorn
P. Sanders
P.M. McIlroy
R. Sinha
T. Hagerup
W. Ng
Publication date: 01/01/2013

arXiv.org e-Print Archive

CiteSeerX

Crossref

KITopen

SlowFuzz: Automated Domain-Independent Detection of Algorithmic Complexity Vulnerabilities

Author: Betts A.
Crosby S. A.
Godefroid P.
Haller I.
Shenoy G. S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 28/08/2017

Algorithmic complexity vulnerabilities occur when the worst-case time/space complexity of an application is significantly higher than the respective average case for particular user-controlled inputs. When such conditions are met, an attacker can launch Denial-of-Service attacks against a vulnerable application by providing inputs that trigger the worst-case behavior. Such attacks have been known to have serious effects on production systems, take down entire websites, or lead to bypasses of Web Application Firewalls. Unfortunately, existing detection mechanisms for algorithmic complexity vulnerabilities are domain-specific and often require significant manual effort. In this paper, we design, implement, and evaluate SlowFuzz, a domain-independent framework for automatically finding algorithmic complexity vulnerabilities. SlowFuzz automatically finds inputs that trigger worst-case algorithmic behavior in the tested binary. SlowFuzz uses resource-usage-guided evolutionary search techniques to automatically find inputs that maximize computational resource utilization for a given application. Comment: ACM CCS '17, October 30-November 3, 2017, Dallas, TX, US
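
To make the idea of resource-usage-guided evolutionary search concrete, here is a toy Python sketch. It is not SlowFuzz itself, which works on compiled binaries; the cost function (swap count of an insertion sort), the single-position mutation, and all names below are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def insertion_sort_steps(values: Sequence[int]) -> int:
    """Toy resource metric: number of swaps insertion sort performs."""
    a = list(values)
    steps = 0
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
            steps += 1
    return steps


def evolve_slow_input(cost: Callable[[Sequence[int]], int],
                      seed_input: Sequence[int],
                      generations: int = 2000,
                      rng: Optional[random.Random] = None,
                      ) -> Tuple[List[int], int]:
    """Mutate one position at a time and keep any candidate whose
    measured cost is at least as high as the best seen so far."""
    rng = rng or random.Random(0)
    best = list(seed_input)
    best_cost = cost(best)
    for _ in range(generations):
        candidate = list(best)
        candidate[rng.randrange(len(candidate))] = rng.randrange(256)
        candidate_cost = cost(candidate)
        if candidate_cost >= best_cost:  # accept ties to keep exploring
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

Starting from an already sorted seed such as `list(range(8))`, the search tends to drift toward inputs with many inversions, which is the slow case for insertion sort; the measured cost can never exceed `n * (n - 1) / 2` swaps for `n` elements.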

arXiv.org e-Print Archive

Crossref

Impact of the initialization in tree-based fast similarity search techniques

Author: Micó Luisa
Oncina Jose
Serrano Díaz-Carrasco Aureo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Many fast similarity search techniques rely on the use of pivots (specially selected points in the data set). Using these points, specific structures (indexes) are built that speed up the search when querying. Usually, pivot selection techniques are incremental, with the first pivot chosen randomly. This article explores several techniques to choose the first pivot in a tree-based fast similarity search technique. We provide experimental results showing that an adequate choice of this pivot leads to significant reductions in distance computations and time complexity. Moreover, most pivot tree-based indexes emphasize building balanced trees. We provide experimental and theoretical support that very unbalanced trees can be a better choice than balanced ones. The authors thank the Spanish CICyT for partial support of this work through projects TIN2009-14205-C04-C1, the Ist Programme of the European Community, under the Pascal Network of Excellence, (Ist-2006-216886), and the program Consolider Ingenio 2010 (Csd2007-00018)
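
The role pivots play in saving distance computations can be sketched as follows. This is a flat, table-based filter assumed here for brevity, not the tree-based index studied in the article; the function names and the returned distance counter are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Metric = Callable[[T, T], float]


def build_pivot_table(data: Sequence[T], pivots: Sequence[T],
                      dist: Metric) -> List[List[float]]:
    """Precompute the distance from every data point to every pivot."""
    return [[dist(x, p) for p in pivots] for x in data]


def range_search(query: T, radius: float, data: Sequence[T],
                 pivots: Sequence[T], table: Sequence[Sequence[float]],
                 dist: Metric) -> Tuple[List[T], int]:
    """Return points within `radius` of `query` and the number of
    distance computations performed at query time."""
    q_to_p = [dist(query, p) for p in pivots]
    computed = len(pivots)
    hits: List[T] = []
    for x, row in zip(data, table):
        # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x), so a
        # lower bound above the radius rules x out without computing d(q, x).
        if any(abs(qp - xp) > radius for qp, xp in zip(q_to_p, row)):
            continue
        computed += 1
        if dist(query, x) <= radius:
            hits.append(x)
    return hits, computed
```

With `data = [1, 2, 10, 11, 20]`, the single pivot `1`, and absolute difference as the metric, a query at `10.5` with radius `1` computes three distances instead of five. How many points a pivot can discard depends on where it sits relative to the data, which is why the choice of pivots matters.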

Repositorio Institucional de la Universidad de Alicante

Simple data analysis for biologists

Author: Baran Eric
Warry Fiona
Publication venue: WorldFish Center and the Fisheries Administration
Publication date: 01/01/2008

This document provides a simple introduction to research methods and analysis tools for biologists or environmental scientists, with particular emphasis on fish biology in developing countries.

Aquatic Commons