4 research outputs found
FABIOLA: Defining the Components for Constraint Optimization Problems in Big Data Environment
The optimization problems can be found in several examples within companies, such as the minimization of the production costs, the faults produced, or the maximization of customer loyalty. The resolution of them is a challenge that entails an extra effort. In addition, many of today鈥檚 enterprises are encountering the Big Data problems added to these optimization problems. Unfortunately, to tackle this challenge by medium and small companies is extremely difficult or even impossible. In this paper, we propose a framework that isolates companies from how the optimization problems are solved. More specifically, we solve optimization problems where the data is heterogeneous, distributed and of a huge volume. FABIOLA (FAst BIg cOstraint LAb) framework enables to describe the distributed and structured data used in optimization problems that can be parallelized (the variables are not shared between the various optimization problems), and obtains a solution using Constraint Programming Techniques
Developing 5GL Concepts from User Interactions
In the fulfilling of the contracts generated in Test Driven Development, a developer could be said to act as a constraint solver, similar to those used by a 5th Generation Language(5GL). This thesis presents the hypothesis that 5GL linguistic mechanics, such as facts, rules and goals, will be emergent in the communications of developer pairs performing Test Driven Development, validating that 5GL syntax is congruent with the ways that practitioners communicate. Along the way, nomenclatures and linguistic patterns may be observed that could inform the design of future 5GL languages
On the enhancement of Big Data Pipelines through Data Preparation, Data Quality, and the distribution of Optimisation Problems
Nowadays, data are fundamental for companies, providing operational support by facilitating daily
transactions. Data has also become the cornerstone of strategic decision-making processes in
businesses. For this purpose, there are numerous techniques that allow to extract knowledge and
value from data. For example, optimisation algorithms excel at supporting decision-making
processes to improve the use of resources, time and costs in the organisation. In the current
industrial context, organisations usually rely on business processes to orchestrate their daily
activities while collecting large amounts of information from heterogeneous sources. Therefore,
the support of Big Data technologies (which are based on distributed environments) is required
given the volume, variety and speed of data. Then, in order to extract value from the data, a set
of techniques or activities is applied in an orderly way and at different stages. This set of
techniques or activities, which facilitate the acquisition, preparation, and analysis of data, is known
in the literature as Big Data pipelines.
In this thesis, the improvement of three stages of the Big Data pipelines is tackled: Data
Preparation, Data Quality assessment, and Data Analysis. These improvements can be
addressed from an individual perspective, by focussing on each stage, or from a more complex
and global perspective, implying the coordination of these stages to create data workflows.
The first stage to improve is the Data Preparation by supporting the preparation of data with
complex structures (i.e., data with various levels of nested structures, such as arrays).
Shortcomings have been found in the literature and current technologies for transforming complex
data in a simple way. Therefore, this thesis aims to improve the Data Preparation stage through
Domain-Specific Languages (DSLs). Specifically, two DSLs are proposed for different use cases.
While one of them is a general-purpose Data Transformation language, the other is a DSL aimed
at extracting event logs in a standard format for process mining algorithms.
The second area for improvement is related to the assessment of Data Quality. Depending on the
type of Data Analysis algorithm, poor-quality data can seriously skew the results. A clear example
are optimisation algorithms. If the data are not sufficiently accurate and complete, the search
space can be severely affected. Therefore, this thesis formulates a methodology for modelling
Data Quality rules adjusted to the context of use, as well as a tool that facilitates the automation
of their assessment. This allows to discard the data that do not meet the quality criteria defined
by the organisation. In addition, the proposal includes a framework that helps to select actions to
improve the usability of the data.
The third and last proposal involves the Data Analysis stage. In this case, this thesis faces the
challenge of supporting the use of optimisation problems in Big Data pipelines. There is a lack of
methodological solutions that allow computing exhaustive optimisation problems in distributed
environments (i.e., those optimisation problems that guarantee the finding of an optimal solution
by exploring the whole search space). The resolution of this type of problem in the Big Data
context is computationally complex, and can be NP-complete. This is caused by two different
factors. On the one hand, the search space can increase significantly as the amount of data to
be processed by the optimisation algorithms increases. This challenge is addressed through a
technique to generate and group problems with distributed data. On the other hand, processing
optimisation problems with complex models and large search spaces in distributed environments
is not trivial. Therefore, a proposal is presented for a particular case in this type of scenario.
As a result, this thesis develops methodologies that have been published in scientific journals and
conferences.The methodologies have been implemented in software tools that are integrated with
the Apache Spark data processing engine. The solutions have been validated through tests and use cases with real datasets
Opportunities and Challenges for Constraint Programming
Constraint programming has become an important technology for solving hard combinatorial problems in a diverse range of application domains. It has its roots in artificial intelligence, mathematical programming, op- erations research, and programming languages. This paper gives a perspective on where constraint programming is today, and discusses a number of opportunities and challenges that could provide focus for the research community into the future