
    Pivot Selection for Median String Problem

    The Median String Problem is W[1]-hard under the Levenshtein distance, so approximation heuristics are used. Perturbation-based heuristics have proved very competitive in terms of the trade-off between approximation accuracy and convergence speed. However, their computational burden increases with the size of the set. In this paper, we explore the idea of reducing the size of the problem by selecting a subset of representative elements, i.e., pivots, that are used to compute the approximate median instead of the whole set. We aim to reduce the computation time through a reduction of the problem size while achieving similar approximation accuracy. We explain how we find those pivots and how to compute the median string from them. Results on commonly used test data suggest that our approach can reduce the computational requirements (measured in computed edit distances) by 88% with approximation accuracy as good as the state-of-the-art heuristic. This work has been supported in part by CONICYT-PCHA/Doctorado Nacional/2014-63140074 through a Ph.D. Scholarship; Universidad Católica de la Santísima Concepción through the research project DIN-01/2016; European Union's Horizon 2020 under the Marie Skłodowska-Curie grant agreement 690941; Millennium Institute for Foundational Research on Data (IMFD); FONDECYT-CONICYT grant number 1170497; and, for O. Pedreira, Xunta de Galicia/FEDER-UE refs. CSI ED431G/01 and GRC: ED431C 2017/58.
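    To make the pivot idea concrete, here is a minimal Python sketch, not the paper's method: it selects pivots with a simple greedy farthest-point rule (a hypothetical stand-in, since the abstract does not state the selection rule) and then returns the set median (medoid) of those pivots under the Levenshtein distance as the approximate median. The names `select_pivots` and `approximate_median` are illustrative.

```python
# Illustrative sketch only: the abstract does not specify the pivot-selection
# rule, so a greedy farthest-point heuristic is used here as a stand-in.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def select_pivots(strings, k):
    """Greedy farthest-point pivot selection (hypothetical stand-in)."""
    k = min(k, len(strings))
    pivots = [strings[0]]
    while len(pivots) < k:
        # take the string farthest from its nearest already-chosen pivot
        nxt = max(strings, key=lambda s: min(levenshtein(s, p) for p in pivots))
        pivots.append(nxt)
    return pivots

def approximate_median(strings, k=5):
    """Set median (medoid) of the pivots, used as a cheap median estimate."""
    pivots = select_pivots(strings, k)
    return min(pivots, key=lambda c: sum(levenshtein(c, s) for s in pivots))
```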

    An Efficient Rank Based Approach for Closest String and Closest Substring

    This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results.
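    For readers unfamiliar with rank distance, a small Python sketch follows. It uses the commonly cited occurrence-indexed formulation (an assumption here, not taken from the abstract): each character is annotated with its occurrence number, and the distance sums position differences for shared annotated symbols plus the positions of unmatched ones.

```python
# Hedged sketch of rank distance for strings under the occurrence-indexed
# definition assumed above; function names are illustrative.
from collections import defaultdict

def annotate(s):
    """Map each annotated symbol, e.g. ('a', 1), ('a', 2), to its 1-based position."""
    seen = defaultdict(int)
    positions = {}
    for pos, ch in enumerate(s, 1):
        seen[ch] += 1
        positions[(ch, seen[ch])] = pos
    return positions

def rank_distance(u: str, v: str) -> int:
    pu, pv = annotate(u), annotate(v)
    dist = 0
    for sym in pu.keys() | pv.keys():
        if sym in pu and sym in pv:
            dist += abs(pu[sym] - pv[sym])   # shared symbol: position difference
        else:
            dist += pu.get(sym, 0) + pv.get(sym, 0)  # unmatched symbol: its position
    return dist

# A GA fitness for closest string could then be, for example:
# fitness(candidate) = -max(rank_distance(candidate, s) for s in dna_set)
```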

    Dynamics of quantum adiabatic evolution algorithm for Number Partitioning

    We have developed a general technique to study the dynamics of the quantum adiabatic evolution algorithm applied to random combinatorial optimization problems in the asymptotic limit of large problem size n. We use as an example the NP-complete Number Partitioning problem and map the algorithm dynamics onto that of an auxiliary quantum spin glass system with a slowly varying Hamiltonian. We use a Green function method to obtain the adiabatic eigenstates and the minimum excitation gap, g_min = O(n 2^{-n/2}), corresponding to the exponential complexity of the algorithm for Number Partitioning. The key element of the analysis is the conditional energy distribution computed for the set of all spin configurations generated from a given (ancestor) configuration by simultaneous flipping of a fixed number of spins. For the problem in question this distribution is shown to depend on the ancestor spin configuration only via a certain parameter related to the energy of the configuration. As a result, the algorithm dynamics can be described in terms of one-dimensional quantum diffusion in the energy space. This effect provides a general limitation on the power of quantum adiabatic computation in random optimization problems. Analytical results are in agreement with the numerical simulation of the algorithm. Comment: 32 pages, 5 figures, 3 appendices; list of additions compared to v.3: (i) numerical solution of the stationary Schroedinger equation for the adiabatic eigenstates and eigenvalues; (ii) connection between the scaling law of the minimum gap with the problem size and the shape of the coarse-grained distribution of the adiabatic eigenvalues at the avoided-crossing point.
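    As a concrete reference point for the spin-glass mapping mentioned above, the sketch below shows the classical cost usually associated with Number Partitioning on Ising spins; it illustrates the problem's energy landscape only, not the paper's quantum analysis, and the brute-force search is just a reminder of the exponential configuration space.

```python
# Sketch of the classical cost commonly used when Number Partitioning is cast
# as a spin-glass problem: spins s_i = +/-1 assign each number a_i to one of
# the two partitions, and the "energy" is the squared partition imbalance.
import itertools

def partition_energy(numbers, spins):
    """E(s) = (sum_i a_i * s_i)^2, zero iff the two subsets balance exactly."""
    return sum(a * s for a, s in zip(numbers, spins)) ** 2

def brute_force_minimum(numbers):
    """Exhaustive search over the 2^n spin configurations (exponential cost)."""
    n = len(numbers)
    return min(partition_energy(numbers, spins)
               for spins in itertools.product((-1, 1), repeat=n))

print(brute_force_minimum([4, 5, 6, 7, 8]))  # 0: {4, 5, 6} vs {7, 8}
```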

    The Sketching Complexity of Graph and Hypergraph Counting

    Subgraph counting is a fundamental primitive in graph processing, with applications in social network analysis (e.g., estimating the clustering coefficient of a graph), database processing and other areas. The space complexity of subgraph counting has been studied extensively in the literature, but many natural settings are still not well understood. In this paper we revisit the subgraph (and hypergraph) counting problem in the sketching model, where the algorithm's state as it processes a stream of updates to the graph is a linear function of the stream. This model has recently received a lot of attention in the literature, and has become a standard model for solving dynamic graph streaming problems. In this paper we give a tight bound on the sketching complexity of counting the number of occurrences of a small subgraph H in a bounded degree graph G presented as a stream of edge updates. Specifically, we show that the space complexity of the problem is governed by the fractional vertex cover number of the graph H. Our subgraph counting algorithm implements a natural vertex sampling approach, with sampling probabilities governed by the vertex cover of H. Our main technical contribution lies in a new set of Fourier analytic tools that we develop to analyze multiplayer communication protocols in the simultaneous communication model, allowing us to prove a tight lower bound. We believe that our techniques are likely to find applications in other settings. Besides giving tight bounds for all graphs H, both our algorithm and lower bounds extend to the hypergraph setting, albeit with some loss in space complexity.
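    The following sketch illustrates the generic vertex-sampling idea in its simplest uniform form; it is not the paper's estimator (which tunes sampling probabilities via a fractional vertex cover of H), and the triangle-counting example and function name are assumptions made purely for illustration.

```python
# Hedged illustration of uniform vertex sampling for subgraph counting:
# sample each vertex independently with probability p, count copies of H
# among sampled vertices, and rescale by p^(-|V(H)|). Shown for H = triangle.
import random
from itertools import combinations

def estimate_triangles(edges, n, p=0.3, seed=0):
    rng = random.Random(seed)
    sampled = {v for v in range(n) if rng.random() < p}
    kept = [(u, v) for (u, v) in edges if u in sampled and v in sampled]
    adj = {v: set() for v in sampled}
    for u, v in kept:
        adj[u].add(v)
        adj[v].add(u)
    count = sum(1 for a, b, c in combinations(sorted(sampled), 3)
                if b in adj[a] and c in adj[a] and c in adj[b])
    return count / p**3  # each triangle survives with probability p^3
```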

    Genetic-algorithm-based design of groundwater quality monitoring system

    This research builds on the work of Meyer and Brill [1988] and subsequent work by Meyer et al. [1990], Meyer et al. [1992], and Meyer [1992] on the optimal location of a network of groundwater monitoring wells under conditions of uncertainty. A method of optimization is developed using genetic algorithms (GAs) which allows consideration of the two objectives of Meyer et al. [1992], maximizing reliability and minimizing contaminated area, separately yet simultaneously. The GA-based solution method can generate both convex and non-convex points of the tradeoff curve, can accommodate non-linearities in the two objective functions, and is not restricted to the peculiarities of a weighted objective function. Furthermore, GAs can generate large portions of the tradeoff curve in a single iteration and may be more efficient than methods that generate only a single point at a time. Four multi-objective GA formulations are investigated and their performance in generating the multi-objective tradeoff curve is evaluated for the groundwater monitoring problem using two example data sets. The GA formulations are compared to each other and to simulated annealing on both performance and computational intensity. The simulated annealing based technique used by Meyer et al. [1992] relies on a weighted objective function which finds only a single point along the tradeoff curve for each iteration, while the multiple-objective GA formulations are able to find many convex and nonconvex points along the tradeoff curve in a single iteration. Each iteration of simulated annealing is approximately five times faster than an iteration of the genetic algorithm, but several simulated annealing iterations are required to generate the tradeoff curve. GAs are able to find a larger number of non-dominated points on the tradeoff curve in a single iteration, and are therefore just as computationally efficient as simulated annealing in terms of generating the tradeoff curves. None of the GA formulations demonstrate the ability to generate the entire tradeoff curve in a single iteration, but they yield either a good estimation of all regions of the tradeoff curve except the very highest and very lowest reliability ends or a good estimation of the high reliability end alone. U.S. Department of the Interior; U.S. Geological Survey.
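    The tradeoff-curve language above comes down to Pareto non-domination. The sketch below shows the filter a multi-objective GA would apply to its population to extract the curve; the objective pair (reliability, contaminated area) follows the abstract, but the numbers and function names are purely illustrative and not taken from the study.

```python
# Minimal Pareto filter: given candidate monitoring designs scored as
# (reliability, contaminated_area), keep only the non-dominated points,
# i.e. the points of the reliability/area tradeoff curve.

def dominates(a, b):
    """a dominates b if it is no worse in both objectives (higher reliability,
    lower contaminated area) and strictly better in at least one."""
    rel_a, area_a = a
    rel_b, area_b = b
    return (rel_a >= rel_b and area_a <= area_b) and (rel_a > rel_b or area_a < area_b)

def pareto_front(scored_designs):
    """Return the non-dominated subset of (reliability, area) pairs."""
    return [p for p in scored_designs
            if not any(dominates(q, p) for q in scored_designs if q != p)]

print(pareto_front([(0.70, 80.0), (0.85, 110.0), (0.95, 160.0), (0.90, 200.0)]))
# -> [(0.7, 80.0), (0.85, 110.0), (0.95, 160.0)]; (0.90, 200.0) is dominated.
```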

    Boosting Perturbation-Based Iterative Algorithms to Compute the Median String

    [Abstract] The most competitive heuristics for calculating the median string are those that use perturbation-based iterative algorithms. Given the complexity of this problem, which under many formulations is NP-hard, the computational cost involved in the exact solution is not affordable. In this work, the heuristic algorithms that solve this problem are addressed, with emphasis on their initialization and on the policy used to order the possible editing operations. Both factors have a significant weight in the solution of this problem. Initial string selection influences the algorithm's speed of convergence, as does the criterion chosen to select the modification to be made in each iteration of the algorithm. To obtain the initial string, we use the median of a subset of the original dataset; to obtain this subset, we apply the Half Space Proximal (HSP) test to the median of the dataset. This test provides sufficient diversity among the members of the subset while at the same time fulfilling the centrality criterion. Similarly, we provide an analysis of the stop condition of the algorithm, improving its performance without substantially damaging the quality of the solution. To analyze the results of our experiments, we computed the execution time of each proposed modification of the algorithms, the number of computed edit distances, and the quality of the solution obtained. With these experiments, we empirically validated our proposal. This work was supported in part by the Comisión Nacional de Investigación Científica y Tecnológica - Programa de Formación de Capital Humano Avanzado (CONICYT-PCHA)/Doctorado Nacional/2014-63140074 through the Ph.D. Scholarship, in part by the European Union's Horizon 2020 under the Marie Sklodowska-Curie Grant 690941, in part by the Millennium Institute for Foundational Research on Data (IMFD), and in part by FONDECYT-CONICYT under Grant 1170497. The work of Óscar Pedreira was supported in part by Xunta de Galicia/FEDER-UE under Grant CSI ED431G/01 and Grant GRC: ED431C 2017/58, in part by the Office of the Vice President for Research and Postgraduate Studies of the Universidad Católica de Temuco, VIPUCT Project 2020EM-PS-08, and in part by FEQUIP 2019-INRN-03 of the Universidad Católica de Temuco.
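    For context, here is a minimal sketch of the Half Space Proximal (HSP) test as it is usually described in the metric-space literature (the exact variant used in the paper may differ): starting from a center string q, repeatedly take the closest remaining string as an HSP neighbour and discard every string that lies closer to that neighbour than to q. The surviving neighbours are diverse yet centred on q; per the abstract, their median would then serve as the initial string for the perturbation-based refinement. `dist` stands in for the edit distance and the function name is illustrative.

```python
# Hedged sketch of the HSP test around a center element q.
def hsp_neighbours(q, strings, dist):
    candidates = sorted((s for s in strings if s != q), key=lambda s: dist(q, s))
    neighbours = []
    while candidates:
        p = candidates.pop(0)        # closest remaining string becomes a neighbour
        neighbours.append(p)
        # keep only strings that remain at least as close to q as to the new neighbour
        candidates = [x for x in candidates if dist(q, x) <= dist(p, x)]
    return neighbours
```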