CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads
Index tuning, i.e., selecting the indexes appropriate for a workload, is a crucial problem in database system tuning. In this paper, we solve index tuning for large problem instances that are common in practice, e.g., thousands of queries in the workload, thousands of candidate indexes, and several hard and soft constraints. Our work is the first to reveal that the index tuning problem has a well-structured space of solutions, and that this space can be explored efficiently with well-known techniques from linear optimization. Experimental results demonstrate that our approach outperforms state-of-the-art commercial and research techniques by a significant margin (up to an order of magnitude).
Bi-directional Search for Robust Routes in Time-dependent Bi-criteria Road Networks
Based on time-dependent travel times for N past days, we consider the computation of robust routes according to the min-max relative regret criterion: we seek a path minimizing its maximum weight over the N days, where each day's weight is normalized by the weight of an optimal path for that day. To speed up this computationally demanding approach, we observe that its output belongs to the Pareto front of the network with time-dependent multi-criteria edge weights. We adapt a well-known algorithm for computing Pareto fronts in time-dependent graphs and apply the bi-directional search technique to it. We also show how to parametrize this algorithm by a value K to compute a K-approximate Pareto front. An experimental evaluation for the cases N = 2 and N = 3 indicates a considerable speed-up of the bi-directional search over the uni-directional one.
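Concretely, writing w_d(P) for the weight of path P on day d and w_d* for the optimal weight on day d, the criterion selects argmin_P max_d w_d(P) / w_d*. A minimal sketch over an explicit candidate set (the paper searches the graph directly; here the candidate paths and their per-day weights are assumed given):

```python
# Minimal sketch: min-max relative regret over an explicit candidate set.
# weights[p][d] = travel time of path p on day d (assumed precomputed).
weights = {
    "via_highway": [30, 70, 32],   # fast except when congested on day 1
    "via_city":    [40, 42, 41],   # consistently mediocre
}
num_days = 3
optimum = [min(w[d] for w in weights.values()) for d in range(num_days)]

def max_relative_regret(path):
    # worst-case ratio of this path's weight to the per-day optimum
    return max(weights[path][d] / optimum[d] for d in range(num_days))

robust = min(weights, key=max_relative_regret)
print(robust, max_relative_regret(robust))  # via_city, ~1.33 (vs ~1.67)
```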
Seeding the Initial Population of Multi-Objective Evolutionary Algorithms: A Computational Study
Most experimental studies initialize the population of evolutionary
algorithms with random genotypes. In practice, however, optimizers are
typically seeded with good candidate solutions either previously known or
created according to some problem-specific method. This "seeding" has been
studied extensively for single-objective problems. For multi-objective
problems, however, very little literature is available on the approaches to
seeding and their individual benefits and disadvantages. In this article, we narrow this gap via a comprehensive computational study on common real-valued test functions. We investigate the effect of two seeding techniques for five algorithms on 48 optimization problems with 2, 3, 4, 6, and 8 objectives. We observe that some functions (e.g., DTLZ4 and the LZ family) benefit significantly from seeding, while others (e.g., WFG) profit less. The advantage of seeding also depends on the examined algorithm.
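Operationally, seeding just replaces part of the random initial population with known genotypes. A minimal sketch, with the seeding ratio and real-valued encoding as illustrative assumptions rather than the study's setup:

```python
import random

def init_population(pop_size, dim, bounds, seeds, seed_fraction=0.1):
    """Seed the initial population: copy up to seed_fraction * pop_size
    known-good genotypes, fill the rest with uniform random ones."""
    lo, hi = bounds
    n_seeds = min(len(seeds), int(seed_fraction * pop_size))
    population = [list(s) for s in seeds[:n_seeds]]
    while len(population) < pop_size:
        population.append([random.uniform(lo, hi) for _ in range(dim)])
    return population

# e.g., two previously known good solutions for a 4-variable problem on [0, 1]
known = [[0.5, 0.5, 0.5, 0.5], [0.2, 0.8, 0.2, 0.8]]
pop = init_population(pop_size=20, dim=4, bounds=(0.0, 1.0), seeds=known)
```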
Approximation of Multiobjective Optimization Problems
We study optimization problems with multiple objectives. Such problems are pervasive across many diverse disciplines -- in economics, engineering, healthcare, and biology, to name but a few -- and heuristic approaches to solve them have already been deployed in several areas, in both academia and industry. Hence, there is a real need for a rigorous investigation of the relevant questions. In such problems we are interested not in a single optimal solution, but in the tradeoff between the different objectives. This is captured by the tradeoff or Pareto curve: the set of all feasible solutions whose vector of objective values is not dominated by any other solution. Typically, we have a small number of objectives and we wish to plot the tradeoff curve to get a sense of the design space. Unfortunately, the tradeoff curve typically has exponential size for discrete optimization problems, even for two objectives (and is typically infinite for continuous problems). Hence, a natural goal in this setting is, given an instance of a multiobjective problem, to efficiently obtain a "good" approximation to the entire solution space with "few" solutions. This has been the underlying goal in much of the research in the multiobjective area, with many heuristics proposed for this purpose, typically, however, without any performance guarantees or complexity analysis.

We develop efficient algorithms for the succinct approximation of the Pareto set for a large class of multiobjective problems. First, we investigate the problem of computing a minimum set of solutions that approximates within a specified accuracy the Pareto curve of a multiobjective optimization problem. We provide approximation algorithms with tight performance guarantees for bi-objective problems and make progress for the more challenging case of three and more objectives. Subsequently, we propose and study the notion of the approximate convex Pareto set: a novel notion of approximation to the Pareto set, appropriate for the convex setting. We characterize when such an approximation can be efficiently constructed and investigate the problem of computing minimum-size approximate convex Pareto sets, both for discrete and convex problems. Finally, we turn to the problem of approximating the Pareto set as efficiently as possible. To this end, we analyze the Chord algorithm, a popular and simple method for the succinct approximation of curves, which is widely used, under different names, in a variety of areas, such as multiobjective and parametric optimization, computational geometry, and graphics.
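For finite biobjective point sets (both objectives minimized, values positive), an approximate Pareto set in the sense above can be constructed greedily along the exact front: every input point ends up within a factor (1+ε) in both coordinates of some selected point. This is a generic illustration of the concept, not one of the dissertation's algorithms:

```python
def pareto_front(points):
    """Exact Pareto front of a finite biobjective point set (both minimized)."""
    front, best_y = [], float("inf")
    for x, y in sorted(points):          # ascending x, ties broken by y
        if y < best_y:
            front.append((x, y))
            best_y = y
    return front

def eps_pareto(points, eps):
    """Greedy (1+eps)-approximate Pareto set: every input point is within a
    factor (1+eps) in both coordinates of some selected point."""
    front = pareto_front(points)         # y strictly decreases along the front
    out, i = [], 0
    while i < len(front):
        x0 = front[i][0]
        j = i                            # furthest point still within (1+eps)*x0
        while j + 1 < len(front) and front[j + 1][0] <= (1 + eps) * x0:
            j += 1
        out.append(front[j])             # least y among the admissible points
        cy = front[j][1]
        i = j + 1
        while i < len(front) and cy <= (1 + eps) * front[i][1]:
            i += 1                       # already covered by the chosen point
    return out

pts = [(1, 100), (1.04, 60), (1.05, 40), (2, 30), (10, 1)]
print(eps_pareto(pts, 0.1))              # [(1.05, 40), (2, 30), (10, 1)]
```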
On clustering and related problems on curves under the FrƩchet distance
Sensor measurements can be represented as points in R^d. Ordered by the time stamps of these measurements, they yield a time series that can be interpreted as a polygonal curve in the d-dimensional ambient space. The number of vertices is called the complexity of the curve. In this thesis we study several fundamental computational tasks on curves: clustering, simplification, and embedding, under the FrƩchet distance, a popular distance measure for curves, in its continuous and discrete versions.
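For reference, the discrete version of the FrƩchet distance admits a short, standard dynamic program over the vertex sequences of the two curves (a textbook formulation, independent of the thesis):

```python
from math import dist, inf

def discrete_frechet(p, q):
    """Discrete FrƩchet distance between two polygonal curves, given as
    lists of vertex tuples; standard O(len(p) * len(q)) dynamic program."""
    n, m = len(p), len(q)
    dp = [[inf] * m for _ in range(n)]   # dp[i][j]: distance of p[:i+1], q[:j+1]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])         # Euclidean distance of the vertex pair
            if i == 0 and j == 0:
                dp[i][j] = d
            else:
                prev = min(dp[i - 1][j] if i > 0 else inf,
                           dp[i][j - 1] if j > 0 else inf,
                           dp[i - 1][j - 1] if i > 0 and j > 0 else inf)
                dp[i][j] = max(prev, d)
    return dp[-1][-1]

# two curves in R (1-d vertices as 1-tuples): distance 0.1
print(discrete_frechet([(0.0,), (2.0,), (0.0,)], [(0.0,), (1.9,), (0.1,)]))
```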
We focus on curves in the one-dimensional ambient space R. We study clustering of curves in R under the FrĆ©chet distance, in particular the following variations of the well-known k-center and k-median problems. Given a set P of n curves in R, each of complexity at most m, the goal is to find k curves in R, not necessarily from P, called cluster centers, each of complexity at most ā„“. In the (k, ā„“)-center problem, the maximum distance of an element of P to its nearest cluster center is minimized. In the (k, ā„“)-median problem, the sum of these distances is minimized. We show that both problems are NP-hard under both versions of the FrĆ©chet distance if k is part of the input.

Under the continuous FrĆ©chet distance, we give (1 + ε)-approximation algorithms for both the (k, ā„“)-center and the (k, ā„“)-median problem, with running time near-linear in the input size for constant ε, k, and ā„“. Our techniques yield constant-factor approximation algorithms for these problems under the discrete FrĆ©chet distance.

To obtain the (1 + ε)-approximation algorithms for the clustering problems under the continuous FrĆ©chet distance, we develop a new simplification technique for one-dimensional curves, called the Ī“-signature. Signatures always exist, and we can compute them efficiently.
We also study the problem of embedding the FrĆ©chet distance into the space R. We show that, in the worst case and under reasonable assumptions, the discrete FrĆ©chet distance between two polygonal curves of complexity m in R^d, where 2 ≤ d ≤ 7, degrades by a factor linear in m with constant probability when the curves are projected onto a randomly chosen line. We show upper and lower bounds on the distortion.
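That projection experiment is easy to mimic: draw a random unit direction, project both curves onto it, and compare the original distance to the projected one. A rough sketch reusing discrete_frechet from the earlier block; the curves and the dimension are illustrative:

```python
import math, random

def random_unit_vector(d):
    v = [random.gauss(0.0, 1.0) for _ in range(d)]   # Gaussian => uniform direction
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def project(curve, u):
    """1-d image of a curve in R^d on the line spanned by unit vector u."""
    return [(sum(x * c for x, c in zip(vertex, u)),) for vertex in curve]

p = [(0.0, 0.0), (1.0, 3.0), (2.0, 0.0)]
q = [(0.0, 0.5), (1.0, 2.5), (2.0, 0.5)]
u = random_unit_vector(2)
d_orig = discrete_frechet(p, q)
d_proj = discrete_frechet(project(p, u), project(q, u))
print(d_orig / max(d_proj, 1e-12))   # distortion incurred by this projection
```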
Sensor measurements can also define a discrete distribution over possible locations of a point in R^d. Then the input consists of n probabilistic points. We study the probabilistic 1-center problem in Euclidean space R^d, also known as the probabilistic smallest enclosing ball (pSEB) problem. To improve the best existing algorithm for the pSEB problem by reducing its exponential dependence on the dimension to linear, we study the deterministic set median problem, which generalizes both the 1-center and the 1-median problems. We present a (1 + ε)-approximation algorithm for the set median problem, using a novel combination of sampling techniques and stochastic subgradient descent.
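The flavor of that last combination can be seen in the classic stochastic subgradient method for the deterministic 1-median (the geometric median), one of the two problems the set median generalizes. A generic sketch, not the thesis's algorithm:

```python
import random

def geometric_median(points, steps=5000, r0=1.0):
    """Stochastic subgradient descent for the 1-median (geometric median):
    minimize f(c) = sum_i ||c - p_i|| over points p_i in R^d."""
    d = len(points[0])
    c = list(points[0])                      # start at an input point
    for t in range(1, steps + 1):
        p = random.choice(points)            # sample one term of the sum
        g = [c[k] - p[k] for k in range(d)]  # gradient direction of ||c - p||
        norm = sum(x * x for x in g) ** 0.5
        if norm == 0:
            continue                         # subgradient 0 is valid at c == p
        step = r0 / t ** 0.5                 # diminishing step size
        c = [c[k] - step * g[k] / norm for k in range(d)]
    return c

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(geometric_median(pts))  # close to the Fermat point of the triangle
```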
Our (1 + ε)-approximation algorithm for the pSEB problem takes time linear in d and n, making the pSEB algorithm applicable to shape fitting problems in Hilbert spaces of unbounded dimension using kernel functions. We present an exemplary application by extending the support vector data description (SVDD) shape fitting method to the probabilistic case.
Corridor Location: Generating Competitive and Efficient Route Alternatives
The problem of transmission line corridor location can be considered, at best, a "wicked" public systems decision problem. It requires balancing numerous objectives and the priorities of a variety of stakeholders, and designers must be prepared to develop diverse non-inferior route alternatives that can stand up to the scrutiny of a public forum. Political elements aside, the underlying geographical computational problems that must be solved to provide a set of high-quality alternatives are no easier, as they require solving difficult spatial optimization problems on massive GIS terrain-based raster data sets.

Transmission line siting methodologies have previously been developed to guide designers in this endeavor, but close scrutiny of these methodologies shows many shortcomings in their approaches. The main goal of this dissertation is to take a fresh look at the process of corridor location and to develop a set of algorithms that compute path alternatives on a foundation of solid geographical theory, offering designers better tools for developing quality alternatives that consider the entire spectrum of viable solutions. Just as importantly, as data sets become increasingly massive and computationally challenging, algorithms must be efficient and able to take advantage of parallel computing resources.

A common approach to simplifying a problem with numerous objectives is to combine the cost layers into a single composite raster grid with a priori weights. This dissertation examines new methods for determining a spatially diverse set of near-optimal alternatives, and develops parallel computing techniques for brute-force near-optimal path enumeration, as well as more elegant methods that exploit the hierarchical structure of the underlying path-tree computation to select sets of spatially diverse near-optimal paths.

Another approach to corridor location is to consider all objectives simultaneously and determine the set of Pareto-optimal solutions. This amounts to solving a discrete multi-objective shortest path problem, for which computing the full set of non-inferior solutions is NP-hard. Given this difficulty, the dissertation develops an approximation heuristic that computes near-optimal path sets in a fraction of the time required by exact algorithms. This method is then applied as an upper bound within an exact enumerative approach, resulting in significant performance speedups. Finally, as analytic computing continues to move toward distributed clusters, it is important that algorithms take full advantage of parallel computing. To that end, the dissertation develops a scalable parallel framework that efficiently solves for the supported (convex) solutions of a biobjective shortest path problem. The framework is equally applicable to other biobjective network optimization problems, providing a powerful tool for the next generation of location analysis and geographical optimization models.
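The supported (convex) solutions mentioned last are exactly those obtainable by weighted-sum scalarizations, and the classic dichotomic scheme enumerates them by recursing between adjacent solutions. Below is a minimal serial sketch of that scheme (the dissertation's framework parallelizes this search); the graph encoding and helper names are illustrative:

```python
import heapq

def dijkstra(graph, src, dst, lam):
    """Shortest s-t path under the scalar weight lam[0]*c1 + lam[1]*c2.
    graph: {u: [(v, c1, c2), ...]}; returns the path's (c1, c2) totals."""
    dist, cost = {src: 0.0}, {src: (0.0, 0.0)}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return cost[u]
        if d > dist[u]:
            continue
        for v, c1, c2 in graph.get(u, []):
            nd = d + lam[0] * c1 + lam[1] * c2
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                cost[v] = (cost[u][0] + c1, cost[u][1] + c2)
                heapq.heappush(pq, (nd, v))
    return None

def supported_solutions(graph, src, dst):
    """Dichotomic scheme: solve the two near-lexicographic extremes, then
    recurse between adjacent solutions with the weight vector normal to
    the segment joining them; stop when no new point lies below it."""
    a = dijkstra(graph, src, dst, (1.0, 1e-6))   # favor objective 1
    b = dijkstra(graph, src, dst, (1e-6, 1.0))   # favor objective 2
    if a == b:
        return [a]
    sols = [a]
    def recurse(p, q):
        lam = (p[1] - q[1], q[0] - p[0])         # normal to segment p-q
        r = dijkstra(graph, src, dst, lam)
        if lam[0] * r[0] + lam[1] * r[1] < lam[0] * p[0] + lam[1] * p[1] - 1e-9:
            recurse(p, r)
            sols.append(r)                       # new supported solution
            recurse(r, q)
    recurse(a, b)
    sols.append(b)
    return sols

g = {"s": [("a", 1, 4), ("b", 3, 1)], "a": [("t", 1, 4)], "b": [("t", 3, 1)]}
print(supported_solutions(g, "s", "t"))          # [(2.0, 8.0), (6.0, 2.0)]
```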