Pivot Selection for Median String Problem
The Median String Problem is W[1]-hard under the Levenshtein distance; thus,
approximation heuristics are used. Perturbation-based heuristics have
proved very competitive in the trade-off between approximation
accuracy and convergence speed. However, the computational burden increases with the
size of the set. In this paper, we explore the idea of reducing the size of the
problem by selecting a subset of representative elements, i.e. pivots, that are
used to compute the approximate median instead of the whole set. We aim to
reduce the computation time through a reduction of the problem size while
achieving similar approximation accuracy. We explain how we find those pivots
and how to compute the median string from them. Results on commonly used test
data suggest that our approach can reduce the computational requirements
(measured in computed edit distances) by \% with approximation accuracy as
good as the state-of-the-art heuristic.
This work has been supported in part by CONICYT-PCHA/Doctorado
Nacional/ through a Ph.D. Scholarship; Universidad Católica
de la Santísima Concepción through the research project DIN-01/2016;
European Union's Horizon 2020 under the Marie Skłodowska-Curie grant
agreement ; Millennium Institute for Foundational Research on Data
(IMFD); FONDECYT-CONICYT grant number ; and for O. Pedreira, Xunta de
Galicia/FEDER-UE refs. CSI ED431G/01 and GRC: ED431C 2017/58
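The pivot idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not say how pivots are chosen, so `select_pivots` below is a hypothetical placeholder that takes a uniform random sample, and the median is restricted to a set median (a member of the input set). The example strings are invented.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def set_median(candidates, references):
    """Candidate with the smallest summed distance to the references."""
    return min(candidates,
               key=lambda c: sum(levenshtein(c, r) for r in references))


def select_pivots(strings, k, seed=0):
    """HYPOTHETICAL pivot rule (assumption): uniform random sample of size k."""
    rng = random.Random(seed)
    return rng.sample(strings, min(k, len(strings)))


strings = ["kitten", "sitten", "mitten", "kitchen", "bitten", "written"]
pivots = select_pivots(strings, k=3)
approx = set_median(strings, pivots)   # k distances per candidate
exact = set_median(strings, strings)   # n distances per candidate
```

Scoring each candidate against k pivots instead of all n strings is where the saving in computed edit distances would come from; how close `approx` gets to `exact` depends on the pivot rule.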
An Efficient Rank Based Approach for Closest String and Closest Substring
This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we address are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results.
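For readers unfamiliar with rank distance, the sketch below uses one common formulation for strings, which is an assumption here since the abstract does not define it: each character is tagged with its occurrence index, shared tagged characters contribute the absolute difference of their 1-based positions, and unshared ones contribute their position. The function names are illustrative.

```python
from collections import defaultdict


def annotate(s):
    """Map each (character, occurrence index) pair to its 1-based position."""
    seen = defaultdict(int)
    positions = {}
    for pos, ch in enumerate(s, start=1):
        seen[ch] += 1
        positions[(ch, seen[ch])] = pos
    return positions


def rank_distance(u, v):
    """Rank distance between two strings under the assumed formulation."""
    pu, pv = annotate(u), annotate(v)
    total = 0
    for key in pu.keys() | pv.keys():
        if key in pu and key in pv:
            total += abs(pu[key] - pv[key])        # shared: position offset
        else:
            total += pu.get(key, 0) + pv.get(key, 0)  # unshared: its position
    return total
```

Unlike Levenshtein distance, this needs no dynamic programming table: it runs in time linear in the string lengths, which matters when it is evaluated inside a genetic algorithm's fitness function.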
Dynamics of quantum adiabatic evolution algorithm for Number Partitioning
We have developed a general technique to study the dynamics of the quantum
adiabatic evolution algorithm applied to random combinatorial optimization
problems in the asymptotic limit of large problem size. We use as an
example the NP-complete Number Partitioning problem and map the algorithm
dynamics to that of an auxiliary quantum spin glass system with a slowly
varying Hamiltonian. We use a Green function method to obtain the adiabatic
eigenstates and the minimum excitation gap,
corresponding to the exponential complexity of the algorithm for Number
Partitioning. The key element of the analysis is the conditional energy
distribution computed for the set of all spin configurations generated from a
given (ancestor) configuration by simultaneous flipping of a fixed number of
spins. For the problem in question this distribution is shown to depend on the
ancestor spin configuration only via a certain parameter related to the energy
of the configuration. As a result, the algorithm dynamics can be described in
terms of one-dimensional quantum diffusion in the energy space. This effect
provides a general limitation on the power of a quantum adiabatic computation
in random optimization problems. Analytical results are in agreement with the
numerical simulation of the algorithm.
Comment: 32 pages, 5 figures, 3 Appendices; List of additions compared to v.3:
(i) numerical solution of the stationary Schroedinger equation for the
adiabatic eigenstates and eigenvalues; (ii) connection between the scaling
law of the minimum gap with the problem size and the shape of the
coarse-grained distribution of the adiabatic eigenvalues at the
avoided-crossing poin
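The classical side of the Number Partitioning problem mentioned above can be sketched in a few lines. The sketch assumes the usual spin encoding, where each number is assigned a spin of +1 or -1 and the energy is the absolute difference between the two subset sums; the function names and the example numbers are illustrative, and nothing here models the quantum dynamics.

```python
from itertools import product


def partition_energy(numbers, spins):
    """Energy of a spin configuration: |sum_j a_j * s_j| with s_j in {-1, +1}."""
    return abs(sum(a * s for a, s in zip(numbers, spins)))


def flip(spins, indices):
    """Descendant configuration obtained by flipping the spins at `indices`."""
    flipped = list(spins)
    for i in indices:
        flipped[i] = -flipped[i]
    return tuple(flipped)


def brute_force_minimum(numbers):
    """Exhaustive search over all 2^n configurations (exponential in n)."""
    return min(product((-1, 1), repeat=len(numbers)),
               key=lambda s: partition_energy(numbers, s))


numbers = [4, 5, 6, 7, 8]
best = brute_force_minimum(numbers)
ancestor = (1, 1, 1, 1, 1)
descendant = flip(ancestor, [3, 4])  # flip a fixed number of spins (here 2)
```

The `flip` helper mirrors the "ancestor configuration" construction in the abstract: the conditional energy distribution is taken over all descendants obtained by flipping a fixed number of spins.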
The Sketching Complexity of Graph and Hypergraph Counting
Subgraph counting is a fundamental primitive in graph processing, with
applications in social network analysis (e.g., estimating the clustering
coefficient of a graph), database processing and other areas. The space
complexity of subgraph counting has been studied extensively in the literature,
but many natural settings are still not well understood. In this paper we
revisit the subgraph (and hypergraph) counting problem in the sketching model,
where the algorithm's state as it processes a stream of updates to the graph is
a linear function of the stream. This model has recently received a lot of
attention in the literature, and has become a standard model for solving
dynamic graph streaming problems.
In this paper we give a tight bound on the sketching complexity of counting
the number of occurrences of a small subgraph in a bounded degree graph
presented as a stream of edge updates. Specifically, we show that the space
complexity of the problem is governed by the fractional vertex cover number of
the subgraph being counted. Our subgraph counting algorithm implements a natural vertex
sampling approach, with sampling probabilities governed by the vertex cover of
that subgraph. Our main technical contribution lies in a new set of Fourier analytic
tools that we develop to analyze multiplayer communication protocols in the
simultaneous communication model, allowing us to prove a tight lower bound. We
believe that our techniques are likely to find applications in other settings.
Besides giving tight bounds for all graphs, both our algorithm and lower
bounds extend to the hypergraph setting, albeit with some loss in space
complexity.
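The vertex sampling idea can be illustrated offline for the simplest case, triangles. The sketch below is an assumption-laden toy: it samples every vertex with the same probability `p` (the abstract says the actual probabilities are governed by the vertex cover), it stores the whole edge list rather than maintaining a linear sketch of a stream, and all names are illustrative.

```python
import random
from itertools import combinations


def count_triangles(edges, vertices):
    """Exact triangle count in the subgraph induced by `vertices`."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])


def estimate_triangles(edges, p, seed=0):
    """Keep each vertex with probability p, count, and rescale by 1/p^3."""
    rng = random.Random(seed)
    vertices = sorted({x for e in edges for x in e})
    sample = {v for v in vertices if rng.random() < p}
    # A triangle survives only if all three of its vertices are sampled,
    # which happens with probability p^3, so dividing by p^3 is unbiased.
    return count_triangles(edges, sample) / p ** 3


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # complete graph on 4 vertices
exact = estimate_triangles(k4, p=1.0)
```

With `p = 1.0` every vertex is kept and the estimate equals the exact count; smaller `p` trades space for variance.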
Genetic-algorithm-based design of groundwater quality monitoring system
This research builds on the work of Meyer and Brill [1988] and subsequent work by Meyer et al. [1990], Meyer et al. [1992], and Meyer [1992] on the optimal location of a network of groundwater monitoring wells under conditions of uncertainty. A method of optimization is developed using genetic algorithms (GAs) which allows consideration of the two objectives of Meyer et al. [1992], maximizing reliability and minimizing contaminated area, separately yet simultaneously. The GA-based solution method can generate both convex and non-convex points of the tradeoff curve, can accommodate non-linearities in the two objective functions, and is not restricted to the peculiarities of a weighted objective function. Furthermore, GAs can generate large portions of the tradeoff curve in a single iteration and may be more efficient than methods that generate only a single point at a time. Four multi-objective GA formulations are investigated and their performance in generating the multi-objective tradeoff curve is evaluated for the groundwater monitoring problem using two example data sets. The GA formulations are compared to each other and to simulated annealing on both performance and computational intensity. The simulated annealing based technique used by Meyer et al. [1992] relies on a weighted objective function which finds only a single point along the tradeoff curve for each iteration, while the multiple-objective GA formulations are able to find many convex and non-convex points along the tradeoff curve in a single iteration. Each iteration of simulated annealing is approximately five times faster than an iteration of the genetic algorithm, but several simulated annealing iterations are required to generate the tradeoff curve.
GAs are able to find a larger number of non-dominated points on the tradeoff curve in a single iteration, and are therefore just as computationally efficient as simulated annealing in terms of generating the tradeoff curves. None of the GA formulations demonstrate the ability to generate the entire tradeoff curve in a single iteration, but they yield either a good estimation of all regions of the tradeoff curve except the very highest and very lowest reliability ends or a good estimation of the high reliability end alone. U.S. Department of the Interior; U.S. Geological Surve
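The notion of non-dominated points on the tradeoff curve can be made concrete with a small filter. This is a generic sketch, not the formulation used in the study: each candidate well network is reduced to a hypothetical pair (reliability, contaminated area), with reliability maximized and area minimized, and the sample values are invented for illustration.

```python
def dominates(a, b):
    """True if design a is at least as good as b on both objectives and
    strictly better on one. Points are (reliability, contaminated_area)."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(points):
    """Points that no other point dominates (the tradeoff curve estimate)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented (reliability, contaminated area) pairs, for illustration only.
designs = [(0.90, 50), (0.80, 30), (0.70, 40), (0.95, 80), (0.80, 35)]
front = non_dominated(designs)
```

A weighted objective function collapses the two values into one score and so returns a single point per run, whereas a population-based method can keep the whole `front` at once.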
Boosting Perturbation-Based Iterative Algorithms to Compute the Median String
The most competitive heuristics for calculating the median string are those that use perturbation-based iterative algorithms. Given the complexity of this problem, which under many formulations is NP-hard, the computational cost involved in the exact solution is not affordable. In this work, the heuristic algorithms that solve this problem are addressed, emphasizing their initialization and the policy to order possible editing operations. Both factors have a significant weight in the solution of this problem. Initial string selection influences the algorithm's speed of convergence, as does the criterion chosen to select the modification to be made in each iteration of the algorithm. To obtain the initial string, we use the median of a subset of the original dataset; to obtain this subset, we apply the Half Space Proximal (HSP) test to the median of the dataset. This test provides sufficient diversity within the members of the subset while at the same time fulfilling the centrality criterion. Similarly, we provide an analysis of the stop condition of the algorithm, improving its performance without substantially damaging the quality of the solution. To analyze the results of our experiments, we computed the execution time of each proposed modification of the algorithms, the number of computed editing distances, and the quality of the solution obtained. With these experiments, we empirically validated our proposal. This work was supported in part by the Comisión Nacional de Investigación Científica y Tecnológica - Programa de Formación de Capital Humano Avanzado (CONICYT-PCHA)/Doctorado Nacional/2014-63140074 through the Ph.D. Scholarship, in part by the European Union's Horizon 2020 under the Marie Skłodowska-Curie Grant 690941, in part by the Millennium Institute for Foundational Research on Data (IMFD), and in part by the FONDECYT-CONICYT under Grant 1170497.
The work of Óscar Pedreira was supported in part by the Xunta de Galicia/FEDER-UE under Grant CSI ED431G/01 and Grant GRC: ED431C 2017/58, in part by the Office of the Vice President for Research and Postgraduate Studies of the Universidad Católica de Temuco, VIPUCT Project 2020EM-PS-08, and in part by the FEQUIP 2019-INRN-03 of the Universidad Católica de Temuco.
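A perturbation-based iterative search of the kind the abstract refers to can be sketched as a greedy loop over single edit operations. This is a simplified illustration under stated assumptions: the starting string is passed in by hand (no HSP-based initialization), candidate edits are tried in a fixed arbitrary order rather than ranked by any policy, and the loop stops only when no single edit improves the total distance. The example strings are invented.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def total_distance(candidate, strings):
    """Objective: sum of edit distances from the candidate to every string."""
    return sum(levenshtein(candidate, s) for s in strings)


def neighbours(s, alphabet):
    """All strings one edit operation away from s."""
    for i in range(len(s)):
        yield s[:i] + s[i + 1:]                  # deletion
        for ch in alphabet:
            if ch != s[i]:
                yield s[:i] + ch + s[i + 1:]     # substitution
    for i in range(len(s) + 1):
        for ch in alphabet:
            yield s[:i] + ch + s[i:]             # insertion


def perturb_median(strings, start):
    """Greedy search: accept the first single edit that lowers the objective."""
    alphabet = sorted(set("".join(strings)))
    best, best_cost = start, total_distance(start, strings)
    improved = True
    while improved:
        improved = False
        for cand in neighbours(best, alphabet):
            cost = total_distance(cand, strings)
            if cost < best_cost:
                best, best_cost, improved = cand, cost, True
                break
    return best, best_cost


data = ["abcd", "abce", "abcf", "xbcd"]
median, cost = perturb_median(data, start="xbce")
```

Each accepted edit strictly lowers an integer objective, so the loop terminates; both the starting string and the order in which `neighbours` yields edits affect how many distance computations that takes, which is the cost the abstract targets.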