Local search: A guide for the information retrieval practitioner
There are a number of combinatorial optimisation problems in information retrieval for which the use of local search methods is worthwhile. The purpose of this paper is to show how local search can be used to solve some well-known tasks in information retrieval (IR), to show how previous research in the field is piecemeal, bereft of structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to IR problems. We provide a query-based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide to the pitfalls and problems facing IR practitioners who wish to use local search in their research, and gives practical advice on the use of such methods. The query-based taxonomy is a novel structure which the IR practitioner can use to examine the use of local search in IR.
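To make the notion of local search concrete, here is a minimal sketch (not taken from the paper) of first-improvement hill climbing over binary vectors, as might be used for a term-selection task; the fitness function is a hypothetical placeholder.

```python
import random

def hill_climb(n_bits, fitness, max_iters=1000, seed=0):
    """First-improvement hill climbing over binary vectors.

    `fitness` maps a tuple of 0/1 values to a score to maximise.
    """
    rng = random.Random(seed)
    current = tuple(rng.randint(0, 1) for _ in range(n_bits))
    best = fitness(current)
    for _ in range(max_iters):
        improved = False
        for i in range(n_bits):  # scan the 1-bit-flip neighbourhood
            neighbour = current[:i] + (1 - current[i],) + current[i + 1:]
            score = fitness(neighbour)
            if score > best:     # first improvement: move immediately
                current, best = neighbour, score
                improved = True
                break
        if not improved:         # local optimum reached
            break
    return current, best

# Hypothetical fitness: reward selecting terms with positive weights.
weights = [0.3, -0.2, 0.7, 0.1, -0.5]
solution, score = hill_climb(len(weights),
                             lambda x: sum(w * b for w, b in zip(weights, x)))
print(solution, score)
```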
Combinatorial optimization and metaheuristics
Today, combinatorial optimization is one of the youngest and most active areas of discrete mathematics. It is a branch of optimization in applied mathematics and computer science, related to operational research, algorithm theory and computational complexity theory, and it sits at the intersection of several fields, including artificial intelligence, mathematics and software engineering. Interest in it keeps growing because a large number of scientific and industrial problems can be formulated as abstract combinatorial optimization problems, through graphs and/or (integer) linear programs. Some of these problems have polynomial-time (“efficient”) algorithms, while most of them are NP-hard, i.e. no polynomial-time algorithm for them is known. In practice, this means that one cannot guarantee that an exact solution will be found in reasonable time, and one has to settle for an approximate solution with known performance guarantees. Indeed, the goal of approximate methods is to find, “quickly” (in reasonable run-times) and with “high” probability, provably “good” solutions (with low error relative to the true optimum). In the last 20 years, a new kind of algorithm, commonly called metaheuristics, has emerged in this class; metaheuristics try to combine heuristics in high-level frameworks aimed at efficiently and effectively exploring the search space. This report briefly outlines the components, concepts, advantages and disadvantages of different metaheuristic approaches from a conceptual point of view, in order to analyze their similarities and differences. The two significant forces of intensification and diversification, which mainly determine the behavior of a metaheuristic, will be pointed out. The report concludes by exploring the importance of hybridization and integration methods.
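As an illustration of the intensification/diversification trade-off, the following is a minimal sketch (not from the report) of iterated local search: the local-search step intensifies around the incumbent, while a random perturbation diversifies the search. The objective function is a hypothetical placeholder.

```python
import random

def iterated_local_search(f, x0, n_iters=100, step=0.1, kick=1.0, seed=0):
    """Minimise f over a list of floats via iterated local search."""
    rng = random.Random(seed)

    def local_search(x):
        # Intensification: greedy coordinate steps around the incumbent.
        x = list(x)
        for _ in range(50):
            moved = False
            for i in range(len(x)):
                for delta in (step, -step):
                    cand = x[:]
                    cand[i] += delta
                    if f(cand) < f(x):  # accept improving moves only
                        x = cand
                        moved = True
            if not moved:
                break
        return x

    best = local_search(x0)
    for _ in range(n_iters):
        # Diversification: a random "kick" to escape the current basin.
        perturbed = [xi + rng.uniform(-kick, kick) for xi in best]
        cand = local_search(perturbed)
        if f(cand) < f(best):
            best = cand
    return best

# Hypothetical multimodal objective with minima at x = (+/-2, 0).
f = lambda x: (x[0] ** 2 - 4) ** 2 + x[1] ** 2
print(iterated_local_search(f, [3.0, 3.0]))
```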
Feature-based tuning of simulated annealing applied to the curriculum-based course timetabling problem
We consider the university course timetabling problem, which is one of the most studied problems in educational timetabling. In particular, we focus our attention on the formulation known as the curriculum-based course timetabling problem, which has been tackled by many researchers and for which many benchmarks are available.

The contribution of this paper is twofold. First, we propose an effective and robust single-stage simulated annealing method for solving the problem. Second, we design and apply an extensive and statistically principled methodology for the parameter tuning procedure. The outcome of this analysis is a methodology for modeling the relationship between search method parameters and instance features that allows us to set the parameters for unseen instances on the basis of a simple inspection of the instance itself. Using this methodology, our algorithm, despite its apparent simplicity, has been able to achieve high-quality results on a set of popular benchmarks.

A final contribution of the paper is a novel set of real-world instances, which could be used as a benchmark for future comparisons.
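A minimal sketch of the feature-based tuning idea, under assumed details: a generic simulated annealing loop whose starting temperature is set from an instance feature by a purely hypothetical linear model. The paper's actual feature set, fitted model, cost function and neighbourhood are more elaborate; everything below is illustrative.

```python
import math, random

def starting_temperature(instance_size):
    # Hypothetical feature-based model: T0 as a linear function of one
    # instance feature (size). The paper fits such relationships from a
    # statistically principled tuning experiment; these coefficients
    # are made up for illustration.
    return 1.0 + 0.05 * instance_size

def simulated_annealing(cost, neighbour, x0, t0,
                        cooling=0.995, n_iters=10000, seed=0):
    rng = random.Random(seed)
    x, t, best = x0, t0, x0
    for _ in range(n_iters):
        y = neighbour(x, rng)
        delta = cost(y) - cost(x)
        # Metropolis acceptance: always take improvements; accept
        # worsening moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = y
        if cost(x) < cost(best):
            best = x
        t *= cooling  # geometric cooling schedule
    return best

# Toy instance: minimise a quadratic over the integers.
cost = lambda x: (x - 17) ** 2
neighbour = lambda x, rng: x + rng.choice([-1, 1])
t0 = starting_temperature(instance_size=50)
print(simulated_annealing(cost, neighbour, 0, t0))
```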
Maximin design on non hypercube domain and kernel interpolation
In the paradigm of computer experiments, the choice of an experimental design is an important issue. When no information is available about the black-box function to be approximated, an exploratory design has to be used. In this context, two dispersion criteria are usually considered: the minimax and the maximin ones. In the case of a hypercube domain, a standard strategy consists of taking the maximin design within the class of Latin hypercube designs. However, in a non-hypercube context, it does not make sense to use the Latin hypercube strategy. Moreover, whatever the design is, the black-box function is typically approximated by kernel interpolation. Here, we first provide a theoretical justification of the maximin criterion with respect to kernel interpolation. Then, we propose simulated annealing algorithms to determine maximin designs in any bounded connected domain, and we prove the convergence of the different schemes.
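A minimal sketch, under assumed details, of searching for a maximin design by simulated annealing in a non-hypercube domain (here, the unit disc): the criterion being maximised is the minimum pairwise distance between design points, and proposed moves that leave the domain are rejected. The cooling schedule and move sizes are illustrative, not those of the paper.

```python
import math, random

def min_pairwise_distance(points):
    return min(math.dist(p, q)
               for i, p in enumerate(points)
               for q in points[i + 1:])

def in_unit_disc(p):
    return p[0] ** 2 + p[1] ** 2 <= 1.0  # the non-hypercube domain

def maximin_design_sa(n_points, n_iters=20000, t0=0.1,
                      cooling=0.9995, sigma=0.05, seed=0):
    rng = random.Random(seed)
    # Start from random points inside the disc (rejection sampling).
    pts = []
    while len(pts) < n_points:
        p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if in_unit_disc(p):
            pts.append(p)
    crit, t = min_pairwise_distance(pts), t0
    for _ in range(n_iters):
        i = rng.randrange(n_points)  # move one point at a time
        cand = (pts[i][0] + rng.gauss(0, sigma),
                pts[i][1] + rng.gauss(0, sigma))
        if not in_unit_disc(cand):   # stay inside the domain
            continue
        old, pts[i] = pts[i], cand
        new_crit = min_pairwise_distance(pts)
        # Maximising the criterion: accept worsening moves with SA probability.
        if new_crit >= crit or rng.random() < math.exp((new_crit - crit) / t):
            crit = new_crit
        else:
            pts[i] = old             # revert the move
        t *= cooling
    return pts, crit

pts, crit = maximin_design_sa(7)
print(f"min pairwise distance: {crit:.3f}")
```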
Gaussian process hyper-parameter estimation using parallel asymptotically independent Markov sampling
Gaussian process emulators of computationally expensive computer codes provide fast statistical approximations of physical processes. The training of these surrogates depends on the set of design points chosen to run the simulator. Due to computational cost, such a training set is bound to be limited, and quantifying the resulting uncertainty in the hyper-parameters of the emulator by uni-modal distributions is likely to induce bias. In order to quantify this uncertainty, this paper proposes a computationally efficient sampler based on an extension of Asymptotically Independent Markov Sampling, a recently developed algorithm for Bayesian inference. Structural uncertainty of the emulator is obtained as a by-product of the Bayesian treatment of the hyper-parameters. Additionally, the user can choose to perform stochastic optimisation to sample from a neighbourhood of the Maximum a Posteriori estimate, even in the presence of multimodality. Model uncertainty is also acknowledged through numerical stabilisation measures, by including a nugget term in the formulation of the probability model. The efficiency of the proposed sampler is illustrated in examples where multi-modal distributions are encountered. For the purposes of reproducibility, further development, and use in other applications, the code used to generate the examples is freely available for download at https://github.com/agarbuno/paims_codes (published in Computational Statistics & Data Analysis, Volume 103).
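For context, a minimal sketch of the kind of probability model whose hyper-parameter surface such a sampler explores: the log marginal likelihood of a zero-mean Gaussian process with an RBF kernel plus a nugget term on the diagonal for numerical stabilisation. The kernel choice and the data below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def gp_log_marginal_likelihood(theta, X, y):
    """Log marginal likelihood of a zero-mean GP on 1-D inputs.

    theta = (lengthscale, signal variance, nugget). The nugget adds
    jitter to the diagonal, stabilising the Cholesky factorisation.
    """
    lengthscale, sig2, nugget = theta
    d2 = (X[:, None] - X[None, :]) ** 2  # squared pairwise distances
    K = sig2 * np.exp(-0.5 * d2 / lengthscale ** 2) + nugget * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(X) * np.log(2 * np.pi))

# Illustrative data; in practice X, y come from runs of the simulator.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
print(gp_log_marginal_likelihood((1.0, 1.0, 1e-4), X, y))
```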