
    Boosting Answer Set Optimization with Weighted Comparator Networks

    Answer set programming (ASP) is a paradigm for modeling knowledge-intensive domains and solving challenging reasoning problems. In ASP solving, a typical strategy is to preprocess problem instances by rewriting complex rules into simpler ones. Normalization is a rewriting process that removes extended rule types altogether in favor of normal rules. Recently, such techniques led to optimization rewriting in ASP, where the goal is to boost answer set optimization by refactoring the optimization criteria of interest. In this paper, we present a novel, general, and effective technique for optimization rewriting based on comparator networks, which are specific kinds of circuits for reordering the elements of vectors. The idea is to connect an ASP encoding of a comparator network to the literals being optimized and to redistribute the weights of these literals over the structure of the network. The encoding captures information about the weight of an answer set in auxiliary atoms in a structured way that is proven to yield exponential improvements during branch-and-bound optimization on an infinite family of example programs. The comparator network used can be tuned freely, e.g., to find the best size for a given benchmark class. Experiments show accelerated optimization performance on several benchmark problems.
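
    To make concrete what such a circuit does, the following Python sketch (purely illustrative; the function and variable names are assumptions, and this is not the paper's ASP encoding) applies a small 4-wire sorting network to a vector of weights. A comparator network is a fixed sequence of wire pairs; each comparator performs a compare-exchange, so the network reorders its input without any data-dependent control flow.

        def apply_network(values, comparators):
            """Run a comparator network over a list of values and return the result."""
            v = list(values)
            for i, j in comparators:
                if v[i] > v[j]:          # compare-exchange: smaller value goes to wire i
                    v[i], v[j] = v[j], v[i]
            return v

        # Optimal 5-comparator sorting network for 4 wires (Batcher-style).
        NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

        weights = [7, 1, 5, 3]                      # e.g., weights of the optimized literals
        print(apply_network(weights, NETWORK_4))    # -> [1, 3, 5, 7]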

    Hexarray: A Novel Self-Reconfigurable Hardware System

    Evolvable hardware (EHW) is a powerful autonomous system for adapting and finding solutions within a changing environment. EHW consists of two main components: a reconfigurable hardware core and an evolutionary algorithm. The majority of prior research focuses on improving either the reconfigurable hardware or the evolutionary algorithm in place, but not both. Thus, current implementations suffer from being application-oriented and having slow reconfiguration times, low efficiencies, and limited routing flexibility. In this work, a novel evolvable hardware platform is proposed that combines a novel reconfigurable hardware core and a novel evolutionary algorithm. The proposed reconfigurable hardware core is a systolic array, which is called HexArray. HexArray was constructed using processing elements with a redesigned architecture, called HexCells, which provide routing flexibility and support for hybrid reconfiguration schemes. The improved evolutionary algorithm is a genome-aware genetic algorithm (GAGA) that accelerates evolution. Guided by a fitness function, the GAGA utilizes context-aware genetic operators to evolve solutions. The operators are genome-aware constrained (GAC) selection, genome-aware mutation (GAM), and genome-aware crossover (GAX). The GAC selection operator improves parallelism and reduces redundant evaluations. The GAM operator restricts mutation to the part of the genome that affects the selected output. The GAX operator cascades, interleaves, or parallel-recombines genomes at the cell level to generate better genomes. These operators improve evolution without preventing the algorithm from exploring all areas of the solution space. The system was implemented on an SoC that includes programmable logic (i.e., a field-programmable gate array) to realize the HexArray and a processing system to execute the GAGA. A computationally intensive application that evolves adaptive filters for image processing was chosen as a case study and used to conduct a set of experiments to demonstrate the robustness of the developed system. Through an iterative process using the genetic operators and a fitness function, the EHW system configures and adapts itself to evolve fitter solutions. In a relatively short time (e.g., seconds), HexArray is able to evolve autonomously to the desired filter. By exploiting the routing flexibility in the HexArray architecture, the EHW has a simple yet effective mechanism to detect and tolerate faulty cells, which improves system reliability. Finally, a mechanism that accelerates the evolution process by hiding the reconfiguration time in an “evolve-while-reconfigure” process is presented. In this process, the GAGA utilizes the array's routing flexibility to bypass cells that are being configured and evaluates several genomes in parallel.
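
    As an illustration of the genome-aware idea, the following Python sketch (all names, the genome layout, and the configuration count are assumptions made for this example, not the HexArray/GAGA implementation) restricts mutation to the genes that influence the currently selected output, in the spirit of the GAM operator.

        import random

        def genome_aware_mutation(genome, active_genes, rate=0.05, n_configs=16):
            """Mutate only the genes on the routing path to the selected output.

            genome       -- list of integer cell configurations
            active_genes -- indices of genes that affect the evaluated output
            rate         -- per-gene mutation probability
            n_configs   -- number of admissible configurations per cell
            """
            child = list(genome)
            for idx in active_genes:
                if random.random() < rate:
                    child[idx] = random.randrange(n_configs)
            return child

        # If only cells 2, 5, and 7 feed the selected output, the rest are left untouched.
        parent = [3, 1, 4, 1, 5, 9, 2, 6]
        child = genome_aware_mutation(parent, active_genes=[2, 5, 7], rate=0.5)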

    Iterative Schedule Optimization for Parallelization in the Polyhedron Model

    In high-performance computing, one primary objective is to exploit the performance that the given target hardware can deliver to the fullest. Compilers that have the ability to automatically optimize programs for specific target hardware can be highly useful in this context. Iterative (or search-based) compilation requires little or no prior knowledge and can adapt more easily to concrete programs and target hardware than static cost models and heuristics. Thereby, iterative compilation helps in situations in which static heuristics do not reflect the combination of input program and target hardware well. Moreover, iterative compilation may enable the derivation of more accurate cost models and heuristics for optimizing compilers. In this context, the polyhedron model is of help as it provides not only a mathematical representation of programs but, more importantly, a uniform representation of complex sequences of program transformations by schedule functions. The latter facilitates the systematic exploration of the set of legal transformations of a given program. Early approaches to purely iterative schedule optimization in the polyhedron model do not limit their search to schedules that preserve program semantics and, thereby, suffer from the need to explore large numbers of illegal schedules. More recent research ensures the legality of program transformations but presumes a sequential rather than a parallel execution of the transformed program. Other approaches do not perform a purely iterative optimization. We propose an approach to iterative schedule optimization for parallelization and tiling in the polyhedron model. Our approach targets loop programs that profit from data locality optimization and coarse-grained loop parallelization. The schedule search space can be explored either randomly or by means of a genetic algorithm. To determine a schedule's profitability, we rely primarily on measuring the transformed code's execution time. While benchmarking is accurate, it increases the time and resource consumption of program optimization tremendously and can even make it impractical. We address this limitation by proposing to learn surrogate models from schedules generated and evaluated in previous runs of the iterative optimization and to replace benchmarking by performance prediction to the extent possible. Our evaluation on the PolyBench 4.1 benchmark set reveals that, in a given setting, iterative schedule optimization yields significantly higher speedups in the execution of the program to be optimized. Surrogate performance models learned from training data that was generated during previous iterative optimizations can reduce the benchmarking effort without strongly impairing the optimization result. A prerequisite for this approach is a sufficient similarity between the training programs and the program to be optimized.
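
    The surrogate-model idea can be sketched in Python as follows (the feature set, the regressor choice, and all names are assumptions for illustration, not the thesis' actual setup): a regressor is trained on schedule features and measured runtimes from earlier iterative optimizations, and during the search candidate schedules are ranked by predicted runtime so that only the most promising ones need to be benchmarked.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        # Training data from previous runs: each row is a schedule feature vector
        # (e.g., tile sizes, parallel loop depth, fusion decisions); y is runtime in seconds.
        X_train = np.array([[32, 1, 0], [64, 2, 1], [16, 1, 1], [128, 2, 0]], dtype=float)
        y_train = np.array([4.1, 1.9, 3.2, 2.4])

        surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
        surrogate.fit(X_train, y_train)

        # During the search, rank candidates by predicted runtime instead of measuring all of them.
        candidates = np.array([[32, 2, 1], [64, 1, 0]], dtype=float)
        predicted = surrogate.predict(candidates)
        best = candidates[np.argmin(predicted)]   # benchmark only the most promising schedule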

    Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs

    The Deep Learning (DL) community has found pruning techniques to be a good way to reduce models' resource and energy consumption. These techniques lead to smaller sparse models, but sparse computations on GPUs only outperform their dense counterparts for extremely high levels of sparsity. However, pruning up to such sparsity levels can seriously harm the accuracy of the Neural Networks (NNs). To alleviate this, novel performance-aware pruning techniques favor the generation of more regular sparse matrices that can improve the exploitation of the underlying hardware. Nevertheless, an important drawback is that these techniques heavily condition the location of the non-pruned values, which can strongly degrade the accuracy of the models. This paper focuses on improving the performance of the SpMM routine on DL workloads by combining performance-aware pruning with pruning-independent SpMM kernels to relax input-format constraints. We start with a microarchitecture-level performance study of state-of-the-art (SOTA) SpMM implementations to identify their main bottlenecks and flaws. Then, the paper centers on maximizing the performance of the routine by adjusting the parameters of performance-aware pruning techniques to the hardware properties. This second study explains the intrinsic causes of the observed performance results. We show that, following this approach, a generic SpMM routine can perform up to 49% and 77% better for half and single precision, respectively, than using non-performance-aware pruning, providing speedups over cuBLAS of up to 1.87× and 4.20×, respectively. Additionally, the performance achieved on half precision is boosted with a new Ampere-ready specialized implementation for the column-vector sparse format, CLASP, which achieves a 2.42× speedup over cuBLAS. Finally, we also introduce ad-colPrune, a novel pruning technique that widens the design space of possible trade-offs between performance and accuracy. © 2022 Association for Computing Machinery. This research was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00, AEI, 10.13039/501100011033), the Ministry of Education (predoctoral grant of Roberto L. Castro, FPU19/03974), and by Xunta de Galicia under the Consolidation Program of Competitive Reference Groups (ED431C 2021/30). We also acknowledge the support from CITIC, funded by Xunta de Galicia and FEDER funds of the EU (Centro de Investigación de Galicia accreditation 2019-2022, ED431G 2019/01). Finally, we acknowledge the Centro de Supercomputación de Galicia (CESGA) for the use of their computers.
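
    For intuition about performance-aware pruning at a vector granularity, the following Python sketch (the block size, selection criterion, and names are assumptions for illustration; it is not the paper's CLASP or ad-colPrune scheme) zeroes whole short column vectors within row blocks based on their L2 norm, producing a more regular sparsity pattern than unstructured magnitude pruning.

        import numpy as np

        def colvector_prune(weights, vec_len=4, sparsity=0.75):
            """Zero the lowest-norm length-vec_len column vectors in each row block."""
            rows, cols = weights.shape
            assert rows % vec_len == 0
            pruned = weights.copy()
            for r0 in range(0, rows, vec_len):
                block = pruned[r0:r0 + vec_len, :]        # vec_len x cols view
                norms = np.linalg.norm(block, axis=0)     # one norm per column vector
                drop = np.argsort(norms)[:int(sparsity * cols)]
                block[:, drop] = 0.0                      # prune the weakest column vectors
            return pruned

        W = np.random.randn(8, 16).astype(np.float32)
        W_sparse = colvector_prune(W, vec_len=4, sparsity=0.75)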

    Clay Minerals in Hydrothermal Systems

    The study of active and fossil hydrothermal systems shows clay minerals to be a fundamental tool for the identification and characterization of hydrothermal alteration facies. The occurrence and composition of hydrothermal alteration facies could provide useful information on the physicochemical conditions of the hydrothermal activity affecting a rock volume. In particular, clay minerals (i.e., smectite group, chlorite, illite, kaolin group, pyrophyllite, biotite) are pivotal for extrapolating important parameters that strongly affect the development of water/rock interaction processes, such as the temperature and pH of the hydrothermal environment. This work aims to give a general reference scheme concerning the occurrence of clay minerals in hydrothermal alteration parageneses, their significance, and the information that can be deduced from their presence and chemical composition, with some examples from active and fossil hydrothermal systems around the world. The main mineralogical geothermometers based on chlorite and illite composition are presented, together with the use of hydrogen and oxygen isotope investigation of clay minerals in hydrothermal systems. These techniques provide a useful tool for the reconstruction of the origin and evolution of fluids involved in hydrothermal alteration. Finally, a list of oxygen and hydrogen fractionation factor equations between the main clay minerals and water is also provided.
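
    For reference, mineral-water oxygen and hydrogen fractionation factor equations of the kind mentioned above are commonly expressed in the general temperature-dependent form below (the coefficients A and B are empirical, mineral- and isotope-specific values and are not reproduced here; T is the absolute temperature in kelvin):

        % general form of a mineral-water isotope fractionation equation
        1000\,\ln \alpha_{\text{mineral--water}} = A\,\frac{10^{6}}{T^{2}} + B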

    On continuous maximum flow image segmentation algorithm

    In recent years, with the advance of computing equipment and image acquisition techniques, the sizes, dimensions, and content of acquired images have increased considerably. Unfortunately, as time passes, there is a steadily increasing gap between the classical and parallel programming paradigms and their actual performance on modern computer hardware. In this thesis, we consider in depth one particular algorithm, the continuous maximum flow computation. We review in detail why this algorithm is useful and interesting, and we propose efficient and portable implementations on various architectures, from single-processor machines to SMP and NUMA systems, as well as massively parallel GPGPU architectures. We also examine how it performs, in terms of segmentation quality and speed, on large images from recent problems in materials science and nano-scale biology.
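
    A minimal numerical sketch of a continuous maximum flow iteration is given below in Python (an Appleton-Talbot-style pressure/flow system; the step size, the metric g, the periodic boundary handling via np.roll, and the fixed iteration count are illustrative assumptions, not the implementations developed in the thesis). A scalar pressure field P and a flow field F = (Fx, Fy) are alternately updated, with F projected back onto the capacity constraint |F| <= g derived from the image.

        import numpy as np

        def continuous_max_flow(g, src_mask, snk_mask, tau=0.2, n_iter=500):
            """Iterate a continuous max-flow system on a 2D image; return a binary segmentation."""
            h, w = g.shape
            P = np.zeros((h, w))                      # pressure field
            Fx = np.zeros((h, w))                     # horizontal flow component
            Fy = np.zeros((h, w))                     # vertical flow component
            for _ in range(n_iter):
                # Pressure update: P <- P - tau * div F (backward differences).
                divF = (Fx - np.roll(Fx, 1, axis=1)) + (Fy - np.roll(Fy, 1, axis=0))
                P -= tau * divF
                P[src_mask], P[snk_mask] = 1.0, 0.0   # source/sink boundary conditions
                # Flow update: F <- F - tau * grad P (forward differences).
                Fx -= tau * (np.roll(P, -1, axis=1) - P)
                Fy -= tau * (np.roll(P, -1, axis=0) - P)
                # Project the flow back onto the capacity constraint |F| <= g.
                mag = np.maximum(np.sqrt(Fx**2 + Fy**2), 1e-12)
                scale = np.minimum(1.0, g / mag)
                Fx *= scale
                Fy *= scale
            return P > 0.5                            # thresholded pressure = segmentation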