Configuration Analysis for Large Scale Feature Models: Towards Speculative-Based Solutions
High-variability systems are software systems in which variability management is a central activity. Current examples of high-variability systems include the Drupal web content management system, the Linux kernel, and the Debian Linux distributions.
Configuration in high-variability systems is the selection of configuration options according to their configuration constraints and the user's requirements. Feature models are a de facto standard for modeling the common and variable functionality of high-variability systems. However, the large number of components and configurations that a feature model can contain makes manual analysis of these models very costly and error-prone. This gave rise to the automated analysis of feature models, which uses computer-assisted mechanisms and tools to extract information from these models. Traditional solutions for the automated analysis of feature models follow a sequential computing approach that uses a single central processing unit and memory. These solutions are adequate for small-scale systems. However, they incur high computational costs when working with large-scale, high-variability systems. Although computing resources exist to improve the performance of computational solutions, every solution that follows a sequential computing approach must be adapted to use these resources efficiently and optimize its computational performance. Examples of such resources are multi-core technology for parallel computing and network technology for distributed computing.
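To make the configuration problem concrete, the following minimal sketch encodes a small, entirely hypothetical feature model (feature names and constraints are invented for illustration, not taken from the thesis) as Boolean constraints and enumerates its valid configurations by brute force. The exhaustive loop over all 2^n assignments is the kind of sequential computation that stops scaling for large models.

```python
from itertools import product

# Hypothetical toy feature model (not from the thesis): root "cms" with a
# mandatory child "db", an optional child "cache", an alternative group
# {"mysql", "pgsql"} under "db", and one cross-tree constraint.
FEATURES = ["cms", "db", "cache", "mysql", "pgsql"]

def is_valid(cfg: dict[str, bool]) -> bool:
    """Check one configuration against the model's constraints."""
    constraints = [
        cfg["cms"],                                         # root is always selected
        cfg["db"] == cfg["cms"],                            # mandatory child
        (not cfg["cache"]) or cfg["cms"],                   # optional child needs its parent
        (cfg["mysql"] + cfg["pgsql"]) == (1 if cfg["db"] else 0),  # exactly one if "db"
        (not cfg["cache"]) or cfg["pgsql"],                 # cross-tree: cache requires pgsql
    ]
    return all(constraints)

def valid_configurations() -> list[dict[str, bool]]:
    """Brute force over all 2^n assignments; only viable for tiny models."""
    result = []
    for values in product([False, True], repeat=len(FEATURES)):
        cfg = dict(zip(FEATURES, values))
        if is_valid(cfg):
            result.append(cfg)
    return result

if __name__ == "__main__":
    print(len(valid_configurations()))
```

For this toy model the enumeration finds three valid configurations; every additional independent optional feature would double the search space.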
This thesis explores the adaptation and scalability of solutions for the automated analysis of large-scale feature models. First, we present the use of speculative programming to parallelize solutions. In addition, we approach a configuration problem from a different perspective and solve it by adapting and applying a non-traditional solution. Then, we validate the scalability and computational performance improvements of these solutions for the automated analysis of large-scale feature models.
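Speculative programming runs work before it is known to be needed. A minimal sketch of the general idea, assuming a costly Boolean check decides which of two continuations runs next: both continuations start alongside the check, and the result of the wrong one is discarded. The function `speculate` and its parameters are hypothetical; threads keep the example short, but CPU-bound checks in CPython would need processes because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def speculate(check: Callable[[T], bool],
              on_true: Callable[[T], R],
              on_false: Callable[[T], R],
              x: T,
              pool: ThreadPoolExecutor) -> R:
    """Evaluate a costly branch condition and both continuations concurrently,
    then keep only the continuation that matches the condition's outcome."""
    cond = pool.submit(check, x)
    if_true = pool.submit(on_true, x)
    if_false = pool.submit(on_false, x)
    if cond.result():
        if_false.cancel()  # no effect if already running; its result is ignored
        return if_true.result()
    if_true.cancel()
    return if_false.result()

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=3) as pool:
        print(speculate(lambda n: n % 2 == 0, lambda n: n // 2,
                        lambda n: 3 * n + 1, 6, pool))
```

The trade-off is wasted work: one continuation is always thrown away, so speculation pays off only when idle cores are available and the check dominates the running time.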
Specifically, the main contributions of this thesis are:
• Speculative programming for minimal and preferred conflict detection. Minimal conflict detection algorithms determine the minimal set of conflicting constraints responsible for faulty behavior in the model under analysis. We propose a solution that uses speculative programming to execute in parallel, and thereby reduce the execution time of, the computationally expensive operations that determine the control flow of minimal and preferred conflict detection in large-scale feature models.
• Speculative programming for minimal and preferred diagnosis. Minimal diagnosis algorithms determine a minimal set of constraints whose state, when appropriately adapted, yields a consistent (conflict-free) model. This work presents a solution for minimal and preferred diagnosis in large-scale feature models that executes the computationally expensive operations determining the control flow speculatively and in parallel, thereby reducing the solution's execution time.
• Minimal and preferred completion of a model configuration by diagnosis. Solutions for completing a partial configuration determine a set of options, not necessarily minimal or preferred, needed to obtain a complete configuration. This thesis solves the minimal and preferred completion of a model configuration using techniques previously applied in the context of feature model diagnosis.
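Minimal conflict detection is often implemented with Junker's QuickXPlain, a divide-and-conquer search over the constraint list. The sketch below is a sequential version under stated assumptions: constraints are Python predicates over Boolean assignments, the list order stands in for the preference order, and `make_consistent` is a hypothetical brute-force stand-in for a SAT or CSP solver. It does not reproduce the thesis's speculative parallelization; it only shows the `consistent(...)` calls that decide the control flow, which are the calls a speculative version would precompute in parallel.

```python
from itertools import product
from typing import Callable, List, Optional

# A constraint is a predicate over a complete Boolean assignment.
Constraint = Callable[[dict], bool]

def make_consistent(variables: List[str]) -> Callable[[List[Constraint]], bool]:
    """Toy consistency check: brute-force search for a satisfying assignment."""
    def consistent(cs: List[Constraint]) -> bool:
        for values in product([False, True], repeat=len(variables)):
            assignment = dict(zip(variables, values))
            if all(c(assignment) for c in cs):
                return True
        return False
    return consistent

def quickxplain(background, constraints, consistent) -> Optional[List[Constraint]]:
    """Minimal conflict among `constraints`, assuming `background` alone is
    consistent. Returns None if background + constraints is consistent."""
    if not constraints or consistent(background + constraints):
        return None
    return _qx(background, False, constraints, consistent)

def _qx(background, delta_nonempty, constraints, consistent):
    # If the constraints just added to the background already make it
    # inconsistent, none of `constraints` is needed for the conflict.
    if delta_nonempty and not consistent(background):
        return []
    if len(constraints) == 1:
        return list(constraints)
    k = len(constraints) // 2
    c1, c2 = constraints[:k], constraints[k:]
    d2 = _qx(background + c1, len(c1) > 0, c2, consistent)
    d1 = _qx(background + d2, len(d2) > 0, c1, consistent)
    return d1 + d2
```

For the constraints `x`, `y`, `not x`, `not y` (in that order) over variables `x` and `y`, the sketch returns the conflict `[x, not x]` rather than `[y, not y]`, because the split follows the list order.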
This thesis verifies that all of our solutions preserve the expected output values and also deliver performance improvements in the automated analysis of large-scale feature models for the operations described.
Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
Topic models, and more specifically the class of Latent Dirichlet Allocation
(LDA), are widely used for probabilistic modeling of text. MCMC sampling from
the posterior distribution is typically performed using a collapsed Gibbs
sampler. We propose a parallel sparse partially collapsed Gibbs sampler and
compare its speed and efficiency to state-of-the-art samplers for topic models
on five well-known text corpora of differing sizes and properties. In
particular, we propose and compare two different strategies for sampling the
parameter block with latent topic indicators. The experiments show that the
increase in statistical inefficiency from only partial collapsing is smaller
than commonly assumed, and can be more than compensated by the speedup from
parallelization and sparsity on larger corpora. We also prove that the
partially collapsed samplers scale well with the size of the corpus. The
proposed algorithm is fast, efficient, exact, and can be used in more modeling
situations than the ordinary collapsed sampler. Comment: Accepted for publication in Journal of Computational and Graphical
Statistics
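A partially collapsed Gibbs sampler for LDA integrates out the document-topic proportions Θ but samples the topic-word matrix Φ explicitly. The minimal sketch below (dense, sequential, standard-library Python; all names hypothetical) shows one sweep: Φ is drawn from its Dirichlet full conditional given the topic-word counts, and each token's topic indicator is then drawn with probability proportional to (n_dk + α)·φ_{k,w}, where n_dk excludes the current token. Because documents are conditionally independent given Φ, the second step could be split across workers; the paper's sparsity optimizations are not shown.

```python
import random

def sample_dirichlet(alpha, rng):
    """Dirichlet draw via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def partially_collapsed_sweep(docs, z, K, V, alpha, beta, rng):
    """One sweep: sample Phi | z, then z | Phi with Theta integrated out.
    docs[d] is a list of word ids; z[d] holds their topics (updated in place)."""
    # 1) Topic-word counts, then Phi from its Dirichlet full conditional.
    n_kv = [[0] * V for _ in range(K)]
    for doc, zd in zip(docs, z):
        for w, k in zip(doc, zd):
            n_kv[k][w] += 1
    phi = [sample_dirichlet([beta + c for c in row], rng) for row in n_kv]
    # 2) Topic indicators; documents are independent given Phi (parallelizable).
    for doc, zd in zip(docs, z):
        n_dk = [0] * K
        for k in zd:
            n_dk[k] += 1
        for i, w in enumerate(doc):
            n_dk[zd[i]] -= 1  # exclude the current token from the counts
            weights = [(n_dk[k] + alpha) * phi[k][w] for k in range(K)]
            zd[i] = rng.choices(range(K), weights=weights)[0]
            n_dk[zd[i]] += 1
    return phi
```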
Automated model-based spreadsheet debugging
Spreadsheets are interactive data organization and calculation programs that are developed in spreadsheet environments like Microsoft Excel or LibreOffice Calc. They are probably the most successful example of end-user developed software and are utilized in almost all branches and at all levels of companies. Although spreadsheets often support important decision making processes, they are, like all software, prone to error. In several cases, faults in spreadsheets have caused severe losses of money.
Spreadsheet developers are usually not educated in the practices of software development. As they are thus not familiar with quality control methods like systematic testing or debugging, they have to be supported by the spreadsheet environment itself in searching for faults in their calculations, in order to ensure the correctness and improve the overall quality of the developed spreadsheets.
This thesis by publication introduces several approaches to locate faults in spreadsheets. The presented approaches are based on the principles of Model-Based Diagnosis (MBD), a technique for finding the possible reasons why a system does not behave as expected. Several new algorithmic enhancements of the general MBD approach are combined in this thesis to allow spreadsheet users to debug their spreadsheets and to efficiently find the reasons for the observed unexpected output values. To ensure seamless integration into the environment spreadsheet developers already know, the presented approaches are implemented as an extension for Microsoft Excel.
The first part of the thesis outlines the different algorithmic approaches that are introduced in this thesis and summarizes the improvements that were achieved over the general MBD approach. In the second part, the appendix, a selection of the author's publications is presented. These publications comprise (a) a survey of the research in the area of spreadsheet quality assurance, (b) a work describing how to adapt the general MBD approach to spreadsheets, (c) two new algorithmic improvements of the general technique to speed up the calculation of the possible reasons for an observed fault, (d) a new concept and algorithm to efficiently determine questions that a user can be asked during debugging in order to reduce the number of possible reasons for the observed unexpected output values, and (e) a new method to find faults in a set of spreadsheets and a new corpus of real-world spreadsheets containing faults that can be used to evaluate the proposed debugging approaches.
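In model-based diagnosis, diagnoses are commonly computed as the minimal hitting sets of the conflict sets (sets of components that cannot all be correct at once). The sketch below applies that textbook relationship to spreadsheet cells; the cell names and conflicts are hypothetical, and the brute-force search by increasing cardinality stands in for the optimized algorithms the thesis describes.

```python
from itertools import combinations
from typing import List, Optional, Set

def minimal_hitting_sets(conflicts: List[Set[str]],
                         max_size: Optional[int] = None) -> List[Set[str]]:
    """All subset-minimal hitting sets, searched by increasing cardinality."""
    universe = sorted(set().union(*conflicts)) if conflicts else []
    limit = len(universe) if max_size is None else max_size
    found: List[Set[str]] = []
    for size in range(limit + 1):
        for cand in combinations(universe, size):
            s = set(cand)
            if any(f <= s for f in found):
                continue  # superset of a smaller diagnosis: not minimal
            if all(s & c for c in conflicts):  # intersects every conflict
                found.append(s)
    return found

if __name__ == "__main__":
    # Hypothetical conflicts among cells feeding a wrong output value.
    print(minimal_hitting_sets([{"B2", "B5"}, {"B5", "C3"}, {"B2", "C3", "D1"}]))
```

For these three conflicts the search yields four diagnoses, {B2, B5}, {B2, C3}, {B5, C3}, and {B5, D1}; no single cell explains all conflicts on its own.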
Do We Really Sample Right In Model-Based Diagnosis?
Statistical samples, in order to be representative, have to be drawn from a
population in a random and unbiased way. Nevertheless, it is common practice in
the field of model-based diagnosis to make estimations from (biased) best-first
samples. One example is the computation of a few most probable possible fault
explanations for a defective system and the use of these to assess which aspect
of the system, if measured, would bring the highest information gain.
In this work, we scrutinize whether these statistically unfounded
conventions, to which diagnosis researchers and practitioners alike have adhered
for decades, are indeed reasonable. To this end, we empirically analyze various
sampling methods that generate fault explanations. We study the
representativeness of the produced samples in terms of their estimations about
fault explanations and how well they guide diagnostic decisions, and we
investigate the impact of sample size, the optimal trade-off between sampling
efficiency and effectiveness, and how approximate sampling techniques compare to
exact ones.
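One common criterion for choosing the next measurement is to minimize the expected entropy of the diagnosis distribution after the measurement. The sketch below assumes that each diagnosis deterministically predicts a Boolean measurement outcome and uses hypothetical diagnoses and probabilities; it estimates the expected entropy from a sample of diagnoses whose probabilities are renormalized over that sample, which is where a biased sample can distort the estimate.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Tuple

Diagnosis = FrozenSet[str]

def expected_posterior_entropy(sample: List[Tuple[Diagnosis, float]],
                               predicts: Callable[[Diagnosis], bool]) -> float:
    """Expected entropy (bits) over the sampled diagnoses after a measurement;
    predicts(d) is the outcome that diagnosis d implies."""
    total = sum(p for _, p in sample)
    groups: Dict[bool, List[float]] = {}
    for d, p in sample:
        groups.setdefault(predicts(d), []).append(p / total)
    h = 0.0
    for probs in groups.values():
        q = sum(probs)  # estimated probability of this outcome
        h += q * -sum((p / q) * math.log2(p / q) for p in probs if p > 0)
    return h

def best_measurement(sample: List[Tuple[Diagnosis, float]],
                     measurements: Dict[str, Callable[[Diagnosis], bool]]) -> str:
    """Measurement with minimal expected posterior entropy on this sample."""
    return min(measurements,
               key=lambda m: expected_posterior_entropy(sample, measurements[m]))
```

With diagnoses {A}, {B}, {C}, {D} of probability 0.4, 0.3, 0.2, 0.1 and two hypothetical measurements, `m_A` (true iff A is faulty) and `m_AD` (true iff A or D is faulty), the full set prefers `m_AD` (about 0.85 bits versus about 0.88 bits). On a best-first sample of only {A} and {B}, both measurements look perfect (expected entropy 0). This toy case only illustrates how a biased sample can mislead the estimate; it is not a result from the paper.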
Analysis of 3D Cone-Beam CT Image Reconstruction Performance on a FPGA
Efficient and accurate tomographic image reconstruction has been an intensive topic of research due to its increasing everyday use in areas such as radiology, biology, and materials science. Computed tomography (CT) scans are used to analyze internal structures by capturing x-ray images. Cone-beam CT scans project a cone-shaped x-ray beam from a single focal point that rotates around the object, capturing 2D image data. CT scans are prone to multiple artifacts, including motion blur, streaks, and pixel irregularities, and therefore must be run through image reconstruction software to reduce visual artifacts. The most common algorithm used is the Feldkamp, Davis, and Kress (FDK) backprojection algorithm. The algorithm is computationally intensive due to the O(n^4) backprojection step: it runs slowly on CPUs for large CT data files, but exceptionally well on GPUs because of its parallel nature. This thesis analyzes the performance of 3D cone-beam CT image reconstruction implemented in OpenCL on an FPGA embedded into a Power System.
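The O(n^4) cost of the backprojection step comes from visiting every one of n^3 voxels for each of roughly n projection angles. The minimal voxel-driven sketch below makes that loop nest visible. It assumes projections that are already weighted and ramp-filtered (the filtering step is omitted), a full circular scan with equally spaced angles, a flat detector, nearest-neighbour detector lookup, and hypothetical parameter names; it is not the thesis's OpenCL kernel.

```python
import math
from typing import List

def fdk_backproject(proj: List[List[List[float]]], angles: List[float], n: int,
                    voxel: float, d_so: float, d_sd: float,
                    du: float, dv: float) -> List[List[List[float]]]:
    """Voxel-driven FDK backprojection (sketch).
    proj[a][iv][iu]: filtered projection at angles[a], detector row iv, column iu.
    d_so: source-to-axis distance; d_sd: source-to-detector distance."""
    nv, nu = len(proj[0]), len(proj[0][0])
    vol = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    d_beta = 2 * math.pi / len(angles)
    c = (n - 1) / 2  # volume centre in voxel units
    for a, beta in enumerate(angles):            # ~n projection angles ...
        cb, sb = math.cos(beta), math.sin(beta)
        for ix in range(n):                      # ... times n^3 voxels: O(n^4)
            x = (ix - c) * voxel
            for iy in range(n):
                y = (iy - c) * voxel
                s = x * cb + y * sb              # coordinate towards the source
                t = -x * sb + y * cb             # lateral coordinate
                mag = d_sd / (d_so - s)          # magnification onto the detector
                w = (d_so / (d_so - s)) ** 2     # FDK distance weight 1/U^2
                iu = round(t * mag / du + (nu - 1) / 2)
                if not 0 <= iu < nu:
                    continue
                for iz in range(n):
                    z = (iz - c) * voxel
                    iv = round(z * mag / dv + (nv - 1) / 2)
                    if 0 <= iv < nv:
                        vol[ix][iy][iz] += 0.5 * w * proj[a][iv][iu] * d_beta
    return vol
```

Every voxel update is independent of the others for a given angle, which is why the inner loops map naturally onto GPU or FPGA parallelism.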
Explanation in constraint satisfaction: A survey
Much of the work on explanation in artificial intelligence has focused on machine learning methods and, in particular, on concepts produced by advanced methods such as neural networks and deep learning. However, there has been a long history of explanation generation in the general field of constraint satisfaction, one of AI's most ubiquitous subfields. In this paper we survey the major seminal papers on explanation and constraints, as well as some more recent works. The survey sets out to unify many disparate lines of work in areas such as model-based diagnosis, constraint programming, Boolean satisfiability, truth maintenance systems, quantified logics, and related areas.
Geometric Inhomogeneous Random Graphs for Algorithm Engineering
The design and analysis of graph algorithms is heavily based on the worst case.
In practice, however, many algorithms perform much better than the worst case would suggest.
Furthermore, various problems can be tackled more efficiently if one assumes the input to be, in a sense, realistic.
The field of network science, which studies the structure and emergence of real-world networks, identifies locality and heterogeneity as two frequently occurring properties.
A popular model that captures these properties is the geometric inhomogeneous random graph (GIRG), a generalization of hyperbolic random graphs (HRGs).
Aside from their importance to network science, GIRGs can be an immensely valuable tool in algorithm engineering.
Since they convincingly mimic real-world networks, guarantees about quality and performance of an algorithm on instances of the model can be transferred to real-world applications.
They have model parameters to control the amount of heterogeneity and locality, which makes it possible to evaluate those properties in isolation while keeping the rest fixed.
Moreover, they can be generated efficiently, which allows for experimental analysis.
While realistic instances are often rare, generated instances are readily available.
Furthermore, the underlying geometry of GIRGs helps to visualize the network, e.g., for debugging or to improve understanding of its structure.
The aim of this work is to demonstrate the capabilities of geometric inhomogeneous random graphs in algorithm engineering and establish them as routine tools to replace previous models like the Erdős-Rényi model, where each edge exists with equal probability.
We utilize geometric inhomogeneous random graphs to design, evaluate, and optimize efficient algorithms for realistic inputs.
In detail, we provide the currently fastest sequential generator for GIRGs and HRGs and describe algorithms for maximum flow, directed spanning arborescence, cluster editing, and hitting set.
For all four problems, our implementations beat the state of the art on realistic inputs.
On top of providing crucial benchmark instances, GIRGs allow us to obtain valuable insights.
Most notably, our efficient generator allows us to experimentally show the sublinear running time of our flow algorithm, investigate the solution structure of cluster editing, complement our benchmark set of arborescence instances with a density for which no real-world networks are available, and generate networks with adjustable locality and heterogeneity to reveal the effects of these properties on our algorithms.
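To make the two knobs concrete, the sketch below is a naive O(n^2) GIRG generator, not the fast generator described above. It assumes the common definition: power-law weights with exponent `ple` (controlling heterogeneity), uniform positions on the d-dimensional unit torus, and edge probability min{1, (w_u·w_v / (W·dist^d))^α} with α controlling locality, where W is the total weight, dist is the maximum-norm torus distance, and constant factors are set to 1. All function and parameter names are hypothetical.

```python
import random

def torus_dist(p, q):
    """Maximum-norm distance on the unit torus [0, 1)^d."""
    return max(min(abs(a - b), 1 - abs(a - b)) for a, b in zip(p, q))

def naive_girg(n, d=1, ple=2.5, alpha=2.0, seed=None):
    """Naive quadratic-time GIRG sampler (sketch); returns weights, positions, edges."""
    rng = random.Random(seed)
    # paretovariate(k) has density ~ x^-(k+1), so k = ple - 1 gives exponent ple.
    weights = [rng.paretovariate(ple - 1) for _ in range(n)]
    positions = [[rng.random() for _ in range(d)] for _ in range(n)]
    W = sum(weights)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            dist = torus_dist(positions[u], positions[v])
            if dist == 0:
                p = 1.0
            else:
                r = weights[u] * weights[v] / (W * dist ** d)
                p = 1.0 if r >= 1 else r ** alpha  # avoids overflow for tiny dist
            if rng.random() < p:
                edges.append((u, v))
    return weights, positions, edges
```

Larger `alpha` makes edges depend more strongly on geometric distance (more locality), while `ple` closer to 2 produces more heavy-weight hubs (more heterogeneity).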