SUNNY-CP and the MiniZinc Challenge
In Constraint Programming (CP), a portfolio solver combines a variety of
different constraint solvers to tackle a given problem. This fairly recent
approach can significantly boost the performance of individual solvers,
especially when multicore architectures are exploited. In this work we give a
brief overview of the portfolio solver sunny-cp, and we discuss its performance
in the MiniZinc Challenge, the annual international competition for CP
solvers, where it won two gold medals in 2015 and 2016. Under consideration in
Theory and Practice of Logic Programming (TPLP).
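The core portfolio idea is simple: launch several solvers on the same instance, in parallel when cores allow, and return as soon as the first one answers. A minimal sketch in Python, where the solver commands and the instance file are hypothetical placeholders rather than sunny-cp's actual configuration:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

# Hypothetical solver commands on a hypothetical FlatZinc instance; a real
# portfolio such as sunny-cp invokes actual CP solvers with tuned options.
SOLVERS = [
    ["solver_a", "problem.fzn"],
    ["solver_b", "problem.fzn"],
    ["solver_c", "problem.fzn"],
]

def run(cmd):
    # Each solver runs as an independent OS process, so on a multicore
    # machine the portfolio members genuinely execute in parallel.
    out = subprocess.run(cmd, capture_output=True, text=True)
    return cmd[0], out.stdout

with ThreadPoolExecutor(max_workers=len(SOLVERS)) as pool:
    futures = [pool.submit(run, cmd) for cmd in SOLVERS]
    # Return the first answer that arrives; a real portfolio would also
    # terminate the now-redundant solver processes.
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    name, answer = next(iter(done)).result()
    print(f"first answer from {name}: {answer}")
```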
SUNNY-CP: a Portfolio Solver for Constraint Programming
In Constraint Programming (CP), a portfolio solver combines a variety of different constraint solvers to tackle a given problem. This fairly recent approach can significantly boost the performance of individual solvers, especially when multicore architectures are exploited. In this work we give a brief overview of the portfolio solver sunny-cp, and we discuss its performance in the last MiniZinc Challenge, the annual international competition for CP solvers, where it won a gold medal.
Universal performance bounds of restart
As has long been known to computer scientists, the performance of
probabilistic algorithms characterized by relatively large runtime fluctuations
can be improved by applying a restart, i.e., an episodic interruption of a
randomized computational procedure followed by the initialization of a new,
statistically independent realization. A similar restart-induced acceleration
could potentially arise in the context of enzymatic reactions, where
dissociation of the enzyme-substrate intermediate corresponds to restarting the
catalytic step of the reaction. To date, a significant number of analytical
results have been obtained in physics and computer science regarding the effect
of restart on completion time statistics in various model problems; however,
the fundamental limits of restart efficiency remain unknown. Here we derive a
range of universal statistical inequalities that constrain the effect restart
can have on the completion time of a generic stochastic process. The
corresponding bounds are expressed via simple statistical metrics of the
original process, such as the harmonic mean, median and mode, and are thus
remarkably practical. We test our analytical predictions with multiple
numerical examples and discuss the implications arising from them, as well as
important avenues for future work.
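To make the restart mechanism concrete, here is a small Monte Carlo sketch (not from the paper; the heavy-tailed runtime distribution and the fixed restart period are illustrative assumptions) showing how periodically restarting a process with large runtime fluctuations can reduce its mean completion time:

```python
import random

def completion_time():
    # Illustrative heavy-tailed runtime: Pareto fluctuations make occasional
    # realizations extremely long, which is exactly where restart pays off.
    return random.paretovariate(1.1)

def restarted_time(period):
    # Restart protocol: abort any realization that exceeds `period` and
    # launch a fresh, statistically independent one.
    total = 0.0
    while True:
        t = completion_time()
        if t <= period:
            return total + t
        total += period  # time spent before giving up on this realization

N = 100_000
mean_plain = sum(completion_time() for _ in range(N)) / N
mean_restart = sum(restarted_time(period=2.0) for _ in range(N)) / N
print(f"mean completion time, no restart:   {mean_plain:.2f}")
print(f"mean completion time, with restart: {mean_restart:.2f}")
```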
sunny-as2: Enhancing SUNNY for Algorithm Selection
SUNNY is an Algorithm Selection (AS) technique originally tailored for
Constraint Programming (CP). From a portfolio of solvers, SUNNY schedules a
subset of solvers to be run on a given CP problem. This approach has proved to
be effective for CP problems, and its parallel version won many gold medals in
the Open category of the MiniZinc Challenge, the yearly international
competition for CP solvers. In 2015, the ASlib benchmarks were released for
comparing AS systems coming from disparate fields (e.g., ASP, QBF, and SAT),
and SUNNY was extended to deal with generic AS problems. This led to the
development of sunny-as2, an algorithm selector based on SUNNY for ASlib
scenarios. A preliminary version of sunny-as2 was submitted to the Open
Algorithm Selection Challenge (OASC) in 2017, where it turned out to be the
best approach for the runtime minimization of decision problems. In this work,
we present the technical advancements of sunny-as2, including: (i)
wrapper-based feature selection; (ii) a training approach combining feature
selection and neighbourhood size configuration; (iii) the application of nested
cross-validation. We show how the performance of sunny-as2 varies across the
AS scenarios considered, and we discuss its strengths and weaknesses. Finally,
we show how sunny-as2 improves on its preliminary version submitted to the
OASC.
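A simplified sketch of the SUNNY-style scheduling idea, assuming a precomputed runtime table as found in ASlib scenarios. The data are made up, and the real algorithm is more involved (minimal set cover, backup solver); this only illustrates the k-nearest-neighbours step and the proportional time allocation:

```python
import math

# Toy ASlib-style data: per-instance features and solver runtimes in seconds
# (None = timeout). Real scenarios provide these tables; values here are made up.
TIMEOUT = 100.0
train = {
    "i1": {"features": [0.1, 3.0], "runtimes": {"s1": 5.0,  "s2": None, "s3": 40.0}},
    "i2": {"features": [0.2, 2.5], "runtimes": {"s1": None, "s2": 8.0,  "s3": None}},
    "i3": {"features": [5.0, 0.4], "runtimes": {"s1": None, "s2": None, "s3": 12.0}},
}
SOLVERS = ["s1", "s2", "s3"]

def sunny_schedule(features, k=2):
    # 1) find the k training instances closest to the new one in feature space;
    # 2) count how many neighbours each solver solves;
    # 3) split the time budget proportionally among the useful solvers.
    neigh = sorted(train, key=lambda i: math.dist(train[i]["features"], features))[:k]
    solved = {s: sum(1 for i in neigh if train[i]["runtimes"][s] is not None)
              for s in SOLVERS}
    total = sum(solved.values()) or 1
    return {s: TIMEOUT * n / total for s, n in solved.items() if n > 0}

print(sunny_schedule([0.15, 2.8]))  # each useful solver gets a share of 100s
```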
On the enhancement of Big Data Pipelines through Data Preparation, Data Quality, and the distribution of Optimisation Problems
Nowadays, data are fundamental for companies, providing operational support by facilitating daily
transactions. Data have also become the cornerstone of strategic decision-making processes in
businesses. For this purpose, there are numerous techniques that make it possible to extract
knowledge and value from data. For example, optimisation algorithms excel at supporting
decision-making processes to improve the use of resources, time and costs in the organisation. In
the current industrial context, organisations usually rely on business processes to orchestrate their
daily activities while collecting large amounts of information from heterogeneous sources. The
support of Big Data technologies (which are based on distributed environments) is therefore
required, given the volume, variety and velocity of data. Then, in order to extract value from the
data, a set of techniques or activities is applied in an orderly way and at different stages. This set
of techniques or activities, which facilitate the acquisition, preparation, and analysis of data, is
known in the literature as a Big Data pipeline.
In this thesis, the improvement of three stages of Big Data pipelines is tackled: Data
Preparation, Data Quality assessment, and Data Analysis. These improvements can be
addressed from an individual perspective, by focussing on each stage, or from a more complex
and global perspective, involving the coordination of these stages to create data workflows.
The first stage to improve is Data Preparation, by supporting the preparation of data with
complex structures (i.e., data with various levels of nested structures, such as arrays).
Shortcomings have been found in the literature and in current technologies when it comes to
transforming complex data in a simple way. Therefore, this thesis aims to improve the Data
Preparation stage through Domain-Specific Languages (DSLs). Specifically, two DSLs are
proposed for different use cases. While one of them is a general-purpose data transformation
language, the other is a DSL aimed at extracting event logs in a standard format for process
mining algorithms.
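As an illustration of the kind of transformation such a DSL has to express, here is a minimal PySpark sketch that flattens an array of nested records; the column names and nesting are invented for this example, and the thesis's DSLs themselves are not shown:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, col

spark = SparkSession.builder.appName("flatten-demo").getOrCreate()

# Hypothetical nested input: one row per customer, each with an array of orders.
data = [("alice", [{"item": "cpu", "qty": 2}, {"item": "ram", "qty": 4}]),
        ("bob",   [{"item": "ssd", "qty": 1}])]
df = spark.createDataFrame(
    data, "customer string, orders array<struct<item:string,qty:int>>")

# Flattening: one output row per (customer, order), with the struct unpacked.
flat = (df.withColumn("order", explode(col("orders")))
          .select("customer", col("order.item"), col("order.qty")))
flat.show()
```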
The second area for improvement is related to the assessment of Data Quality. Depending on the
type of Data Analysis algorithm, poor-quality data can seriously skew the results. A clear example
is optimisation algorithms: if the data are not sufficiently accurate and complete, the search
space can be severely affected. Therefore, this thesis formulates a methodology for modelling
Data Quality rules adjusted to the context of use, as well as a tool that facilitates the automation
of their assessment. This makes it possible to discard the data that do not meet the quality criteria
defined by the organisation. In addition, the proposal includes a framework that helps to select
actions to improve the usability of the data.
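A minimal sketch of what an automated quality-rule check might look like; the rules, column names and thresholds are invented for illustration, and the thesis proposes a full modelling methodology and tool rather than this ad hoc filter:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("dq-demo").getOrCreate()

# Hypothetical sensor readings; m2 is incomplete, m3 is implausible.
df = spark.createDataFrame(
    [("m1", 21.5), ("m2", None), ("m3", -40.0)], "sensor string, temp double")

# Context-dependent quality rules: completeness (no missing temperature) and
# accuracy (a plausible range for this domain). Violating rows are discarded.
rules = [col("temp").isNotNull(), col("temp").between(-30.0, 60.0)]
valid = df
for rule in rules:
    valid = valid.filter(rule)
valid.show()  # only m1 satisfies both rules
```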
The third and last proposal involves the Data Analysis stage. In this case, the thesis faces the
challenge of supporting the use of optimisation problems in Big Data pipelines. There is a lack of
methodological solutions for computing exhaustive optimisation problems in distributed
environments (i.e., optimisation problems that guarantee finding an optimal solution by exploring
the whole search space). Solving this type of problem in the Big Data context is computationally
complex, and can be NP-complete, for two reasons. On the one hand, the search space can grow
significantly as the amount of data to be processed by the optimisation algorithms increases;
this challenge is addressed through a technique to generate and group problems with distributed
data. On the other hand, processing optimisation problems with complex models and large
search spaces in distributed environments is not trivial, so a proposal is presented for a
particular case of this type of scenario.
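To illustrate the generate-and-group idea, here is a hypothetical sketch with Spark: partition the data, build one small exhaustive subproblem per partition, solve the subproblems in parallel, and combine the results. The subproblem itself (a tiny brute-force knapsack) is purely illustrative, and the combination step shown here yields the best per-partition optimum rather than a global one:

```python
from itertools import combinations
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dist-opt-demo").getOrCreate()
sc = spark.sparkContext

# Illustrative items: (value, weight). Each partition becomes one subproblem.
items = [(10, 4), (7, 3), (12, 6), (5, 2), (9, 5), (6, 2)]
CAPACITY = 8

def solve_exhaustively(part):
    # Brute force over all subsets of this partition's items: an "exhaustive"
    # subproblem, guaranteed optimal within the partition.
    part = list(part)
    best = (0, ())
    for r in range(len(part) + 1):
        for subset in combinations(part, r):
            value = sum(x[0] for x in subset)
            weight = sum(x[1] for x in subset)
            if weight <= CAPACITY and value > best[0]:
                best = (value, subset)
    yield best

local_optima = (sc.parallelize(items, numSlices=2)
                  .mapPartitions(solve_exhaustively)
                  .collect())
print(max(local_optima))  # best of the per-partition optima, not global
```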
As a result, this thesis develops methodologies that have been published in scientific journals and
conferences. The methodologies have been implemented in software tools that integrate with
the Apache Spark data processing engine. The solutions have been validated through tests and use cases with real datasets.