238 research outputs found

    Configuration Analysis for Large Scale Feature Models: Towards Speculative-Based Solutions

    Get PDF
    Los sistemas de alta variabilidad son sistemas de software en los que la gestión de la variabilidad es una actividad central. Algunos ejemplos actuales de sistemas de alta variabilidad son el sistema web de gesión de contenidos Drupal, el núcleo de Linux, y las distribuciones Debian de Linux. La configuración en sistemas de alta variabilidad es la selección de opciones de configuración según sus restricciones de configuración y los requerimientos de usuario. Los modelos de características son un estándar “de facto” para modelar las funcionalidades comunes y variables de sistemas de alta variabilidad. No obstante, el elevado número de componentes y configuraciones que un modelo de características puede contener hacen que el análisis manual de estos modelos sea una tarea muy costosa y propensa a errores. Así nace el análisis automatizado de modelos de características con mecanismos y herramientas asistidas por computadora para extraer información de estos modelos. Las soluciones tradicionales de análisis automatizado de modelos de características siguen un enfoque de computación secuencial para utilizar una unidad central de procesamiento y memoria. Estas soluciones son adecuadas para trabajar con sistemas de baja escala. Sin embargo, dichas soluciones demandan altos costos de computación para trabajar con sistemas de gran escala y alta variabilidad. Aunque existan recusos informáticos para mejorar el rendimiento de soluciones de computación, todas las soluciones con un enfoque de computación secuencial necesitan ser adaptadas para el uso eficiente de estos recursos y optimizar su rendimiento computacional. Ejemplos de estos recursos son la tecnología de múltiples núcleos para computación paralela y la tecnología de red para computación distribuida. Esta tesis explora la adaptación y escalabilidad de soluciones para el analisis automatizado de modelos de características de gran escala. En primer lugar, nosotros presentamos el uso de programación especulativa para la paralelización de soluciones. Además, nosotros apreciamos un problema de configuración desde otra perspectiva, para su solución mediante la adaptación y aplicación de una solución no tradicional. Más tarde, nosotros validamos la escalabilidad y mejoras de rendimiento computacional de estas soluciones para el análisis automatizado de modelos de características de gran escala. Concretamente, las principales contribuciones de esta tesis son: • Programación especulativa para la detección de un conflicto mínimo y 1 2 preferente. Los algoritmos de detección de conflictos mínimos determinan el conjunto mínimo de restricciones en conflicto que son responsables de comportamiento defectuoso en el modelo en análisis. Nosotros proponemos una solución para, mediante programación especulativa, ejecutar en paralelo y reducir el tiempo de ejecución de operaciones de alto costo computacional que determinan el flujo de acción en la detección de conflicto mínimo y preferente en modelos de características de gran escala. • Programación especulativa para un diagnóstico mínimo y preferente. Los algoritmos de diagnóstico mínimo determinan un conjunto mínimo de restricciones que, por una adecuada adaptación de su estado, permiten conseguir un modelo consistente o libre de conflictos. Este trabajo presenta una solución para el diagnóstico mínimo y preferente en modelos de características de gran escala mediante la ejecución especulativa y paralela de operaciones de alto costo computacional que determinan el flujo de acción, y entonces disminuir el tiempo de ejecución de la solución. • Completar de forma mínima y preferente una configuración de modelo por diagnóstico. Las soluciones para completar una configuración parcial determinan un conjunto no necesariamente mínimo ni preferente de opciones para obtener una completa configuración. Esta tesis soluciona el completar de forma mínima y preferente una configuración de modelo mediante técnicas previamente usadas en contexto de diagnóstico de modelos de características. Esta tesis evalua que todas nuestras soluciones preservan los valores de salida esperados, y también presentan mejoras de rendimiento en el análisis automatizado de modelos de características con modelos de gran escala en las operaciones descrita

    Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models

    Full text link
    Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler.Comment: Accepted for publication in Journal of Computational and Graphical Statistic

    Automated model-based spreadsheet debugging

    Get PDF
    Spreadsheets are interactive data organization and calculation programs that are developed in spreadsheet environments like Microsoft Excel or LibreOffice Calc. They are probably the most successful example of end-user developed software and are utilized in almost all branches and at all levels of companies. Although spreadsheets often support important decision making processes, they are, like all software, prone to error. In several cases, faults in spreadsheets have caused severe losses of money. Spreadsheet developers are usually not educated in the practices of software development. As they are thus not familiar with quality control methods like systematic testing or debugging, they have to be supported by the spreadsheet environment itself to search for faults in their calculations in order to ensure the correctness and a better overall quality of the developed spreadsheets. This thesis by publication introduces several approaches to locate faults in spreadsheets. The presented approaches are based on the principles of Model-Based Diagnosis (MBD), which is a technique to find the possible reasons why a system does not behave as expected. Several new algorithmic enhancements of the general MBD approach are combined in this thesis to allow spreadsheet users to debug their spreadsheets and to efficiently find the reason of the observed unexpected output values. In order to assure a seamless integration into the environment that is well-known to the spreadsheet developers, the presented approaches are implemented as an extension for Microsoft Excel. The first part of the thesis outlines the different algorithmic approaches that are introduced in this thesis and summarizes the improvements that were achieved over the general MBD approach. In the second part, the appendix, a selection of the author's publications are presented. These publications comprise (a) a survey of the research in the area of spreadsheet quality assurance, (b) a work describing how to adapt the general MBD approach to spreadsheets, (c) two new algorithmic improvements of the general technique to speed up the calculation of the possible reasons of an observed fault, (d) a new concept and algorithm to efficiently determine questions that a user can be asked during debugging in order to reduce the number of possible reasons for the observed unexpected output values, and (e) a new method to find faults in a set of spreadsheets and a new corpus of real-world spreadsheets containing faults that can be used to evaluate the proposed debugging approaches

    Do We Really Sample Right In Model-Based Diagnosis?

    Full text link
    Statistical samples, in order to be representative, have to be drawn from a population in a random and unbiased way. Nevertheless, it is common practice in the field of model-based diagnosis to make estimations from (biased) best-first samples. One example is the computation of a few most probable possible fault explanations for a defective system and the use of these to assess which aspect of the system, if measured, would bring the highest information gain. In this work, we scrutinize whether these statistically not well-founded conventions, that both diagnosis researchers and practitioners have adhered to for decades, are indeed reasonable. To this end, we empirically analyze various sampling methods that generate fault explanations. We study the representativeness of the produced samples in terms of their estimations about fault explanations and how well they guide diagnostic decisions, and we investigate the impact of sample size, the optimal trade-off between sampling efficiency and effectivity, and how approximate sampling techniques compare to exact ones

    Analysis of 3D Cone-Beam CT Image Reconstruction Performance on a FPGA

    Get PDF
    Efficient and accurate tomographic image reconstruction has been an intensive topic of research due to the increasing everyday usage in areas such as radiology, biology, and materials science. Computed tomography (CT) scans are used to analyze internal structures through capture of x-ray images. Cone-beam CT scans project a cone-shaped x-ray to capture 2D image data from a single focal point, rotating around the object. CT scans are prone to multiple artifacts, including motion blur, streaks, and pixel irregularities, therefore must be run through image reconstruction software to reduce visual artifacts. The most common algorithm used is the Feldkamp, Davis, and Kress (FDK) backprojection algorithm. The algorithm is computationally intensive due to the O(n4) backprojection step, running slowly with large CT data files on CPUs, but exceptionally well on GPUs due to the parallel nature of the algorithm. This thesis will analyze the performance of 3D cone-beam CT image reconstruction implemented in OpenCL on a FPGA embedded into a Power System

    Explanation in constraint satisfaction: A survey

    Get PDF
    Much of the focus on explanation in the field of artificial intelligence has focused on machine learning methods and, in particular, concepts produced by advanced methods such as neural networks and deep learning. However, there has been a long history of explanation generation in the general field of constraint satisfaction, one of the AI's most ubiquitous subfields. In this paper we survey the major seminal papers on the explanation and constraints, as well as some more recent works. The survey sets out to unify many disparate lines of work in areas such as model-based diagnosis, constraint programming, Boolean satisfiability, truth maintenance systems, quantified logics, and related areas

    Geometric Inhomogeneous Random Graphs for Algorithm Engineering

    Get PDF
    The design and analysis of graph algorithms is heavily based on the worst case. In practice, however, many algorithms perform much better than the worst case would suggest. Furthermore, various problems can be tackled more efficiently if one assumes the input to be, in a sense, realistic. The field of network science, which studies the structure and emergence of real-world networks, identifies locality and heterogeneity as two frequently occurring properties. A popular model that captures these properties are geometric inhomogeneous random graphs (GIRGs), which is a generalization of hyperbolic random graphs (HRGs). Aside from their importance to network science, GIRGs can be an immensely valuable tool in algorithm engineering. Since they convincingly mimic real-world networks, guarantees about quality and performance of an algorithm on instances of the model can be transferred to real-world applications. They have model parameters to control the amount of heterogeneity and locality, which allows to evaluate those properties in isolation while keeping the rest fixed. Moreover, they can be efficiently generated which allows for experimental analysis. While realistic instances are often rare, generated instances are readily available. Furthermore, the underlying geometry of GIRGs helps to visualize the network, e.g.,~for debugging or to improve understanding of its structure. The aim of this work is to demonstrate the capabilities of geometric inhomogeneous random graphs in algorithm engineering and establish them as routine tools to replace previous models like the Erd\H{o}s-R{\\u27e}nyi model, where each edge exists with equal probability. We utilize geometric inhomogeneous random graphs to design, evaluate, and optimize efficient algorithms for realistic inputs. In detail, we provide the currently fastest sequential generator for GIRGs and HRGs and describe algorithms for maximum flow, directed spanning arborescence, cluster editing, and hitting set. For all four problems, our implementations beat the state-of-the-art on realistic inputs. On top of providing crucial benchmark instances, GIRGs allow us to obtain valuable insights. Most notably, our efficient generator allows us to experimentally show sublinear running time of our flow algorithm, investigate the solution structure of cluster editing, complement our benchmark set of arborescence instances with a density for which there are no real-world networks available, and generate networks with adjustable locality and heterogeneity to reveal the effects of these properties on our algorithms
    corecore