1,448 research outputs found

    Application-Tailored Linear Algebra Algorithms: A Search-Based Approach

    In this paper, we tackle the problem of automatically generating algorithms for linear algebra operations by taking advantage of problem-specific knowledge. In most situations, users possess much more information about the problem at hand than what current libraries and computing environments accept; evidence shows that, if properly exploited, such information leads to uncommon/unexpected speedups. We introduce a knowledge-aware linear algebra compiler that allows users to input matrix equations together with properties about the operands and the problem itself; for instance, they can specify that the equation is part of a sequence, and how successive instances are related to one another. The compiler exploits all this information to guide the generation of algorithms, to limit the size of the search space, and to avoid redundant computations. We applied the compiler to equations arising as part of sensitivity and genome studies; the algorithms produced exhibit, respectively, 100- and 1000-fold speedups.
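
    To make the sequence exploitation concrete, here is a minimal sketch assuming an illustrative least-squares equation solved for a sequence of right-hand sides; the equation and names are stand-ins, not the paper's compiler or its output. Declaring that X is fixed across the sequence lets a generator hoist the expensive factorization out of the per-instance loop, which is the kind of saving behind the reported speedups.

```python
# A minimal sketch, assuming an illustrative sequence b_i = (X^T X)^{-1} X^T y_i
# with X fixed across instances; equation and names are stand-ins, not the
# paper's actual compiler output.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def solve_sequence_naive(X, ys):
    # One instance at a time: re-forms and re-solves X^T X for every y_i.
    return [np.linalg.solve(X.T @ X, X.T @ y) for y in ys]

def solve_sequence_aware(X, ys):
    # Sequence-aware version: X^T X is symmetric positive definite, so
    # factor it once with Cholesky and reuse the factor for every instance.
    factor = cho_factor(X.T @ X)
    return [cho_solve(factor, X.T @ y) for y in ys]

X = np.random.rand(500, 50)
ys = [np.random.rand(500) for _ in range(100)]
assert np.allclose(solve_sequence_naive(X, ys[:3]), solve_sequence_aware(X, ys[:3]))
```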

    Inverting Cryptographic Hash Functions via Cube-and-Conquer

    MD4 and MD5 are seminal cryptographic hash functions proposed in the early 1990s. MD4 consists of 48 steps and produces a 128-bit hash given a message of arbitrary finite size. MD5 is a more secure 64-step extension of MD4. Both MD4 and MD5 are vulnerable to practical collision attacks, yet it is still not realistic to invert them, i.e. to find a message given a hash. In 2007, the 39-step version of MD4 was inverted by reducing to SAT and applying a CDCL solver along with Dobbertin's constraints. As for MD5, in 2012 its 28-step version was inverted via a CDCL solver for one specified hash without adding any additional constraints. In this study, Cube-and-Conquer (a combination of CDCL and lookahead) is applied to invert step-reduced versions of MD4 and MD5. For this purpose, two algorithms are proposed. The first one generates inversion problems for MD4 by gradually modifying Dobbertin's constraints. The second algorithm runs the cubing phase of Cube-and-Conquer with different cutoff thresholds to find the one with the minimal runtime estimate of the conquer phase. This algorithm operates in two modes: (i) estimating the hardness of a given propositional Boolean formula; (ii) incomplete SAT-solving of a given satisfiable propositional Boolean formula. While the first algorithm is focused on inverting step-reduced MD4, the second one is not area-specific and so is applicable to a variety of classes of hard SAT instances. In this study, 40-, 41-, 42-, and 43-step MD4 are inverted for the first time via the first algorithm and the estimating mode of the second algorithm. 28-step MD5 is inverted for four hashes via the incomplete SAT-solving mode of the second algorithm; for three of these hashes, this is done for the first time.
    Comment: 40 pages, 11 figures. A revised submission to JAI
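
    The cutoff search of the second algorithm can be outlined generically. In this hedged Python sketch, run_cuber and solve_cube are placeholders for a real lookahead cuber (such as march_cu) and a CDCL solver; their interfaces here are assumptions, not actual tool APIs. The conquer phase is estimated from a uniform sample of cubes, in the spirit of the estimating mode described above.

```python
# A hedged outline of the cutoff search; the two placeholder functions must
# be wired up to real tools before this can run end to end.
import random
import time

def run_cuber(cnf, cutoff):
    # Placeholder: invoke a lookahead cuber at the given cutoff threshold
    # and return the list of cubes it produced.
    raise NotImplementedError

def solve_cube(cnf, cube):
    # Placeholder: run a CDCL solver on cnf with the cube's literals assumed.
    raise NotImplementedError

def estimate_conquer_runtime(cnf, cubes, sample_size=20):
    # Extrapolate the total conquer time from a uniform sample of cubes.
    sample = random.sample(cubes, min(sample_size, len(cubes)))
    start = time.monotonic()
    for cube in sample:
        solve_cube(cnf, cube)
    mean = (time.monotonic() - start) / len(sample)
    return mean * len(cubes)

def best_cutoff(cnf, thresholds):
    # Try the cubing phase at each threshold; keep the one whose estimated
    # conquer phase is cheapest.
    estimates = {n: estimate_conquer_runtime(cnf, run_cuber(cnf, n))
                 for n in thresholds}
    return min(estimates, key=estimates.get)
```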

    Farms, pipes, streams and reforestation: reasoning about structured parallel processes using types and hylomorphisms

    The increasing importance of parallelism has motivated the creation of better abstractions for writing parallel software, including structured parallelism using nested algorithmic skeletons. Such approaches provide high-level abstractions that avoid common problems, such as race conditions, and often allow strong cost models to be defined. However, choosing a combination of algorithmic skeletons that yields good parallel speedups for a program on some specific parallel architecture remains a difficult task. In order to achieve this, it is necessary to simultaneously reason both about the costs of different parallel structures and about the semantic equivalences between them. This paper presents a new type-based mechanism that enables strong static reasoning about these properties. We exploit well-known properties of a very general recursion pattern, the hylomorphism, and give a denotational semantics for structured parallel processes in terms of hylomorphisms. Using our approach, it is possible to determine formally whether a desired parallel structure can be introduced into a program without altering its functional behaviour, and also to choose a version of that parallel structure that minimises some given cost model.
    Postprint
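
    For readers unfamiliar with the pattern, a minimal Python rendering of a hylomorphism follows (the paper itself works in a typed functional setting): an unfold builds a virtual call tree and a fold consumes it, fused into a single recursion. Divide-and-conquer skeletons are instances of this shape.

```python
# A minimal sketch of the hylomorphism pattern; names are illustrative.
def hylo(unfold, fold, seed):
    # unfold: seed -> (value, [child seeds]); fold: (value, [results]) -> result
    value, child_seeds = unfold(seed)
    return fold(value, [hylo(unfold, fold, s) for s in child_seeds])

# Example: mergesort, with splitting as the unfold and merging as the fold.
def split(xs):
    if len(xs) <= 1:
        return xs, []                     # leaf: nothing left to divide
    mid = len(xs) // 2
    return None, [xs[:mid], xs[mid:]]     # internal node: two subproblems

def merge(leaf, sorted_parts):
    if not sorted_parts:
        return leaf
    a, b = sorted_parts
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

assert hylo(split, merge, [3, 1, 2]) == [1, 2, 3]
```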

    MetaFork: A Compilation Framework for Concurrency Models Targeting Hardware Accelerators

    Parallel programming is gaining ground in various domains due to the tremendous computational power that it brings; however, it also requires a substantial code crafting effort to achieve performance improvement. Unfortunately, in most cases, performance tuning has to be accomplished manually by programmers. We argue that automated tuning is necessary due to the combination of the following factors. First, code optimization is machine-dependent: an optimization preferred on one machine may not be suitable for another. Second, as the possible optimization search space increases, manually finding an optimized configuration is hard. Therefore, developing new compiler techniques for optimizing applications is of considerable interest. This thesis aims to develop new techniques that help programmers produce efficient algorithms and code targeting hardware acceleration technologies more effectively. Our work is organized around a compilation framework, called MetaFork, for concurrency platforms and its application to automatic parallelization. MetaFork is a high-level programming language extending C/C++, which combines several models of concurrency including fork-join, SIMD and pipelining parallelism. MetaFork is also a compilation framework which aims at facilitating the design and implementation of concurrent programs through four key features which make MetaFork unique and novel: (1) Perform automatic code translation between concurrency platforms targeting multi-core architectures. (2) Provide a high-level language for expressing concurrency in the fork-join model, the SIMD paradigm and pipelining parallelism. (3) Generate parallel code from serial code with an emphasis on code depending on machine or program parameters (e.g. cache size, number of processors, number of threads per thread block). (4) Optimize code depending on parameters that are unknown at compile-time.
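
    As background for the fork-join model that MetaFork combines with SIMD and pipelining, the following is a sketch of fork-join in Python's standard library, not of MetaFork's own C/C++ syntax: the computation forks independent tasks and joins on their results.

```python
# The fork-join model in miniature; a sketch of the model, not of MetaFork.
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(xs, chunk=2048):
    chunks = [xs[i:i + chunk] for i in range(0, len(xs), chunk)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(sum, chunks))   # fork: one task per chunk
    return sum(partials)                         # join: combine partial sums

assert parallel_sum(list(range(100000))) == sum(range(100000))
```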

    Structured Parallel Programming Using Trees

    High-level abstractions for parallel programming are still immature. Computations on complicated data structures such as pointer structures are considered irregular algorithms. General graph structures, which irregular algorithms typically deal with, are difficult to divide and conquer. Because the divide-and-conquer paradigm is essential for load balancing in parallel algorithms and a key to parallel programming, general graphs are correspondingly difficult. Trees, however, lead to divide-and-conquer computations by definition and are sufficiently general and powerful as a programming tool. We therefore deal with abstractions of tree-based computations. Our study started from Matsuzaki's work on tree skeletons. We have improved the usability of tree skeletons by enriching their implementation aspects. Specifically, we have dealt with two issues. First, we implemented a loose coupling between skeletons and data structures and developed a flexible tree skeleton library. Second, we implemented a parallelizer that transforms sequential recursive functions in C into parallel programs that use tree skeletons implicitly. This parallelizer hides the complicated API of tree skeletons and enables programmers to use tree skeletons without burden. The practicality of tree skeletons, however, has not thereby improved. On the basis of observations from the practice of tree skeletons, we deal with two application domains: program analysis and neighborhood computation. In the domain of program analysis, compilers treat input programs as control-flow graphs (CFGs) and perform analysis on CFGs; program analysis is therefore difficult to divide and conquer. To resolve this problem, we developed divide-and-conquer methods for program analysis in a syntax-directed manner on the basis of Rosen's high-level approach. Specifically, we dealt with data-flow analysis based on Tarjan's formalization and value-graph construction based on a functional formalization. In the domain of neighborhood computations, a primary issue is locality: a naive parallel neighborhood computation without locality enhancement causes many cache misses. The divide-and-conquer paradigm is known to be useful for locality enhancement as well. We therefore applied algebraic formalizations and a tree-segmenting technique derived from tree skeletons to the locality enhancement of neighborhood computations.
    University of Electro-Communications, 201
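
    As an illustration of the tree skeletons this work builds on, here is a sequential Python sketch of tree map and tree reduce; the tuple encoding and names are assumptions for illustration. A skeleton library provides parallel implementations behind interfaces like these, exploiting the independence of the two subtrees.

```python
# Sequential sketches of the two basic tree skeletons over binary trees.
def tree_map(f, t):
    # t is either a leaf value or a (value, left, right) triple.
    if not isinstance(t, tuple):
        return f(t)
    value, left, right = t
    return (f(value), tree_map(f, left), tree_map(f, right))

def tree_reduce(op, t):
    # op combines a node's value with the results of its two subtrees,
    # which a parallel implementation can reduce concurrently.
    if not isinstance(t, tuple):
        return t
    value, left, right = t
    return op(value, tree_reduce(op, left), tree_reduce(op, right))

tree = (1, (2, 3, 4), 5)
doubled = tree_map(lambda x: 2 * x, tree)
assert tree_reduce(lambda v, l, r: v + l + r, doubled) == 30
```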

    Faster convergence in seismic history matching by dividing and conquering the unknowns

    The aim in reservoir management is to control field operations to maximize both the short- and long-term recovery of hydrocarbons. This often comprises continuous optimization based on reservoir simulation models, once the significant unknown parameters have been updated by history matching, in which they are conditioned to all available data. However, history matching of what is usually a high-dimensional problem requires expensive computing and commercial software resources. Many models must be generated, particularly if there are interactions between the properties being updated and their effects on the misfit that measures the difference between model predictions and observed data. In this work, a novel 'divide and conquer' approach is developed for the seismic history matching method, which efficiently searches for the best values of uncertain parameters such as barrier transmissibilities, net:gross, and permeability by matching well and 4D seismic predictions to observed data. The 'divide' is carried out by applying a second-order polynomial regression analysis to identify independent sub-volumes of the parameter hyperspace. These are then 'conquered' by searching separately but simultaneously with an adapted version of the quasi-global stochastic neighbourhood algorithm. This 'divide and conquer' approach is applied to the seismic history matching of the Schiehallion field, located on the UK continental shelf. The field model, supplied by the operator, contained a large number of barriers that affect flow at different times during production, and their transmissibilities were largely unknown. There was also some uncertainty in the petrophysical parameters that controlled permeability and net:gross. Application of the method was possible because the misfit function could be successfully represented as sub-misfits, each dependent on changes in a smaller number of parameters, which could then be searched separately but simultaneously. Ultimately, the number of models required to find a good match was reduced by an order of magnitude. Experimental design contributed to the efficiency, and the 'divide and conquer' approach was also able to separate the misfit on a spatial basis by using time-lapse seismic data in the misfit. The method has provided greater insight into reservoir behaviour and has been able to predict flow more accurately with a very efficient 'divide and conquer' approach.
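
    The 'divide' step can be sketched as follows, under illustrative assumptions about the sampling scheme and threshold: fit a second-order polynomial to sampled parameter/misfit pairs and group parameters connected by significant cross terms, yielding the independent sub-volumes to be conquered separately.

```python
# A hedged sketch of the 'divide' step; the tolerance and design matrix
# construction are illustrative assumptions, not the paper's exact method.
import numpy as np
from itertools import combinations

def interaction_groups(samples, misfits, tol=1e-3):
    X = np.asarray(samples, dtype=float)      # shape: (n_samples, n_params)
    n, p = X.shape
    pairs = list(combinations(range(p), 2))
    cross = [X[:, i] * X[:, j] for i, j in pairs]
    design = np.column_stack([np.ones(n), X, X**2] + cross)
    coef, *_ = np.linalg.lstsq(design, np.asarray(misfits), rcond=None)
    cross_coef = coef[1 + 2 * p:]             # coefficients of x_i * x_j terms
    # Union-find over parameters linked by significant interactions.
    parent = list(range(p))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for c, (i, j) in zip(cross_coef, pairs):
        if abs(c) > tol:
            parent[find(i)] = find(j)
    groups = {}
    for k in range(p):
        groups.setdefault(find(k), []).append(k)
    return list(groups.values())              # independent sub-volumes to search
```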

    Knowledge-Based Automatic Generation of Linear Algebra Algorithms and Code

    This dissertation focuses on the design and the implementation of domain-specific compilers for linear algebra matrix equations. The development of efficient libraries for such equations, which lie at the heart of most software for scientific computing, is a complex process that requires expertise in a variety of areas, including the application domain, algorithms, numerical analysis and high-performance computing. Moreover, the process involves the collaboration of several people for a considerable amount of time. With our compilers, we aim to relieve developers from both designing algorithms and writing code, and to generate routines that match or even surpass the performance of those written by human experts.
    Comment: Dissertation
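
    A minimal sketch of the kind of property-driven choice such a compiler automates (not the dissertation's actual system; names are illustrative): the same equation AX = B is lowered to different algorithms depending on a declared property of the operand A.

```python
# Property-directed algorithm selection, sketched with scipy's factorizations.
import numpy as np
from scipy.linalg import cho_factor, cho_solve, lu_factor, lu_solve

def generate_solver(properties):
    if "spd" in properties:
        # Symmetric positive definite: the cheaper Cholesky route applies.
        return lambda A, B: cho_solve(cho_factor(A), B)
    # General case: fall back to an LU factorization.
    return lambda A, B: lu_solve(lu_factor(A), B)

A = np.random.rand(50, 50)
spd = A @ A.T + 50 * np.eye(50)               # manufacture an SPD operand
B = np.random.rand(50, 3)
assert np.allclose(generate_solver({"spd"})(spd, B), np.linalg.solve(spd, B))
```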

    On the Interoperability of Programming Languages based on the Fork-Join Parallelism Model

    This thesis describes the implementation of MetaFork, a meta-language for concurrency platforms targeting multicore architectures. First of all, MetaFork is a multithreaded language based on the fork-join model of concurrency: it allows the programmer to express parallel algorithms assuming that tasks are dynamically scheduled at run-time. While MetaFork makes no assumption about the run-time system, it formally defines the serial C-elision of a MetaFork program. In addition, MetaFork is a suite of source-to-source compilers permitting the automatic translation of multithreaded programs between programming languages based on the fork-join model. Currently, this compilation framework supports the OpenMP and CilkPlus concurrency platforms. The implementation of those compilers explicitly manages parallelism according to the directives specified in MetaFork, OpenMP and CilkPlus. We evaluate the benefits of MetaFork experimentally. First, we show that this framework can be used to perform comparative implementations of a given multithreaded algorithm, so as to narrow down performance bottlenecks in one implementation of that algorithm. Second, we show that translating hand-written and highly optimized code with MetaFork generally produces code with performance comparable to the original.
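
    The serial C-elision that MetaFork formally defines can be illustrated with a toy filter: erasing the fork-join keywords of a CilkPlus-style program should leave a valid serial C program with the same semantics. A real translator operates on an AST; this line-oriented sketch only shows the idea.

```python
# Toy keyword-erasure sketch of a serial C-elision for CilkPlus-style code.
import re

ELISION = [
    (re.compile(r"\bcilk_spawn\s+"), ""),     # fork point -> plain call
    (re.compile(r"\bcilk_sync\s*;"), ";"),    # join point -> empty statement
    (re.compile(r"\bcilk_for\b"), "for"),     # parallel loop -> serial loop
]

def c_elision(source):
    for pattern, replacement in ELISION:
        source = pattern.sub(replacement, source)
    return source

parallel = "int x = cilk_spawn fib(n - 1);\nint y = fib(n - 2);\ncilk_sync;"
print(c_elision(parallel))    # -> the serial program, keywords erased
```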

    A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms

    The benefits of automating design cycles for Bayesian inference-based algorithms are becoming increasingly recognized by the machine learning community. As a result, interest in probabilistic programming frameworks has increased considerably over the past few years. This paper explores a specific probabilistic programming paradigm, namely message passing in Forney-style factor graphs (FFGs), in the context of the automated design of efficient Bayesian signal processing algorithms. To this end, we developed "ForneyLab" (https://github.com/biaslab/ForneyLab.jl), a Julia toolbox for message passing-based inference in FFGs. We show by example how ForneyLab enables the automatic derivation of Bayesian signal processing algorithms, including algorithms for parameter estimation and model comparison. Crucially, due to the modular makeup of the FFG framework, both the model specification and the inference methods are readily extensible in ForneyLab. To test this framework, we compared variational message passing as implemented by ForneyLab with automatic differentiation variational inference (ADVI) and Monte Carlo methods as implemented by the state-of-the-art tools "Edward" and "Stan". In terms of performance, extensibility, and stability, ForneyLab appears to enjoy an edge over its competitors for automated inference in state-space models.
    Comment: Accepted for publication in the International Journal of Approximate Reasoning
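
    To make message passing on a factor graph concrete, here is a tiny sum-product computation for a discrete chain x -> y (prior, transition, observation) in plain numpy rather than ForneyLab's Julia API; the model and numbers are illustrative only.

```python
# Sum-product on a two-variable chain: forward and backward messages.
import numpy as np

prior = np.array([0.6, 0.4])                  # p(x)
trans = np.array([[0.7, 0.3],                 # p(y | x), rows indexed by x
                  [0.2, 0.8]])
likelihood = np.array([0.9, 0.1])             # p(obs | y) for the observed value

# Forward message through the transition factor, combined with the backward
# message from the observation factor, yields the marginal over y.
forward = prior @ trans                       # p(y) before seeing the data
posterior_y = forward * likelihood
posterior_y /= posterior_y.sum()

# The backward message through the transition factor updates x as well.
backward = trans @ likelihood
posterior_x = prior * backward
posterior_x /= posterior_x.sum()
print(posterior_x, posterior_y)               # [0.792 0.208] [0.9 0.1]
```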