
    Design and optimisation of scientific programs in a categorical language

    This thesis presents an investigation into the use of advanced computer languages for scientific computing, an examination of the performance issues that arise from using such languages for such a task, and a step toward achieving portable performance from compilers by attacking these problems in a way that compensates for the complexity of and differences between modern computer architectures. The language employed is Aldor, a functional language from computer algebra, and the scientific computing area is a subset of the family of iterative linear equation solvers applied to sparse systems. The linear equation solvers that are considered have much common structure, and this is factored out and represented explicitly in the language as a framework, by means of categories and domains. The flexibility introduced by decomposing the algorithms and the objects they act on into separate modules has a strong performance impact due to its negative effect on temporal locality. This necessitates breaking the barriers between modules to perform cross-component optimisation. In this instance the task reduces to one of collective loop fusion and array contraction.
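
    To make the optimisation concrete, here is a minimal sketch in Python (an invented kernel with hypothetical names, not the thesis's Aldor framework): the two routines compute the same value, but the second fuses the two loops so the temporary vector t contracts to a scalar, restoring the temporal locality that the modular decomposition destroys.

        # Hypothetical solver fragment: compute r = (A@x + y) . y.
        def unfused(A, x, y):
            n = len(x)
            t = [0.0] * n
            for i in range(n):                    # pass 1 materialises t
                t[i] = sum(A[i][j] * x[j] for j in range(n)) + y[i]
            r = 0.0
            for i in range(n):                    # pass 2 re-reads t from memory
                r += t[i] * y[i]
            return r

        def fused(A, x, y):
            n = len(x)
            r = 0.0
            for i in range(n):                    # loops fused into one
                ti = sum(A[i][j] * x[j] for j in range(n)) + y[i]
                r += ti * y[i]                    # array t contracted to scalar ti
            return r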

    Parallelization of dynamic programming recurrences in computational biology

    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years the DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15x and 130x faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3-GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
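
    As a reference point for the recurrences being accelerated, the following is a hedged plain-Python rendering of the Nussinov base-pair maximisation (the dependence pattern only, not the FPGA design). Each cell N[i][j] depends only on cells of smaller span, so all cells on one anti-diagonal are mutually independent; this is the fine-grained parallelism that polyhedral analysis schedules onto the array.

        def nussinov(seq, min_loop=1):
            """Maximum number of base pairs in an RNA secondary structure."""
            pairs = {("A", "U"), ("U", "A"), ("G", "C"),
                     ("C", "G"), ("G", "U"), ("U", "G")}
            n = len(seq)
            N = [[0] * n for _ in range(n)]
            for span in range(min_loop + 1, n):      # fill by increasing span
                for i in range(n - span):            # independent across i
                    j = i + span
                    best = max(N[i + 1][j],          # i left unpaired
                               N[i][j - 1],          # j left unpaired
                               N[i + 1][j - 1] + ((seq[i], seq[j]) in pairs))
                    for k in range(i + 1, j):        # bifurcation
                        best = max(best, N[i][k] + N[k + 1][j])
                    N[i][j] = best
            return N[0][n - 1]

        print(nussinov("GGGAAAUCC"))   # -> 3 pairs for this toy sequence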

    Beyond shared memory loop parallelism in the polyhedral model

    With the introduction of multi-core processors, motivated by power and energy concerns, parallel processing has become mainstream. Parallel programming is much more difficult than sequential programming because of the bugs that arise from its non-deterministic nature. One solution is automatic parallelization, where it is entirely up to the compiler to efficiently parallelize sequential programs. However, automatic parallelization is very difficult, and only a handful of successful techniques are available, even after decades of research. Automatic parallelization for distributed memory architectures is even more problematic in that it requires explicit handling of data partitioning and communication. Since data must be partitioned among multiple nodes that do not share memory, the original memory allocation of sequential programs cannot be directly used. One of the main contributions of this dissertation is the development of techniques for generating distributed memory parallel code with parametric tiling. Our approach builds on important contributions to the polyhedral model, a mathematical framework for reasoning about program transformations. We show that many affine control programs can be uniformized using only simple techniques. Being able to assume uniform dependences significantly simplifies distributed memory code generation, and also enables parametric tiling. Our approach is implemented in the AlphaZ system, a system for prototyping analyses, transformations, and code generators in the polyhedral model. The key features of AlphaZ are memory re-allocation and explicit representation of reductions. We evaluate our approach on a collection of polyhedral kernels from the PolyBench suite, and show that it scales as well as PLuTo, a state-of-the-art shared memory automatic parallelizer based on the polyhedral model. Automatic parallelization is only one approach to dealing with the non-deterministic nature of parallel programming, and it leaves the difficulty entirely to the compiler. Another approach is to develop novel parallel programming languages. These languages, such as X10, aim to provide a highly productive parallel programming environment by building parallelism into the language design. However, even in these languages, parallel bugs remain an important issue that hinders programmer productivity. Another contribution of this dissertation is the extension of array dataflow analysis to a subset of X10 programs. We apply the results of the dataflow analysis to statically guarantee determinism. Providing static guarantees can significantly increase programmer productivity by catching questionable implementations at compile time, or even while programming.
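
    The following is a minimal sketch of parametric tiling on a toy 1-D stencil (plain Python with invented names, not AlphaZ-generated code): the tile size b remains a run-time parameter rather than a compile-time constant, and each tile is a natural unit of work to assign to a distributed-memory node.

        def stencil_tiled(a, steps, b):
            """Jacobi-style sweeps over a, with the space loop tiled by b."""
            n = len(a)
            for _ in range(steps):          # time loop kept sequential here
                nxt = a[:]
                for lo in range(0, n, b):   # tiles: candidates for distribution
                    for i in range(max(lo, 1), min(lo + b, n - 1)):
                        nxt[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
                a = nxt
            return a

        print(stencil_tiled([0.0, 0.0, 1.0, 0.0, 0.0], steps=2, b=2))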

    Programming Languages and Systems

    This open access book constitutes the proceedings of the 30th European Symposium on Programming, ESOP 2021, which was held from March 27 to April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg but changed to an online format due to the COVID-19 pandemic. The 24 papers included in this volume were carefully reviewed and selected from 79 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.

    Exact Bayesian Inference for Loopy Probabilistic Programs

    We present an exact Bayesian inference method for inferring posterior distributions encoded by probabilistic programs featuring possibly unbounded looping behaviors. Our method is built on an extended denotational semantics represented by probability generating functions, which resolves semantic intricacies induced by intertwining discrete probabilistic loops with conditioning (for encoding posterior observations). We implement our method in a tool called Prodigy; it augments existing computer algebra systems with the theory of generating functions for the (semi-)automatic inference and quantitative verification of conditioned probabilistic programs. Experimental results show that Prodigy can handle various infinite-state loopy programs and outperforms state-of-the-art exact inference tools over benchmarks of loop-free programs.
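
    To make the generating-function semantics concrete, here is a hedged sketch in plain sympy (not Prodigy's actual interface) for the loop "k := 0; while flip(p): k := k + 1", where flip(p) succeeds with probability p. Since P(k = m) = p^m (1 - p), the loop's PGF is G(x) = (1 - p)/(1 - p*x), and conditioning, e.g. observing that k is even, acts directly on the PGF.

        import sympy as sp

        p, x = sp.symbols("p x", positive=True)
        G = (1 - p) / (1 - p * x)                     # PGF of k after the loop
        assert sp.simplify(G.subs(x, 1) - 1) == 0     # mass 1: a.s. termination
        print(sp.simplify(sp.diff(G, x).subs(x, 1)))  # E[k] = p/(1 - p)

        # observe(k is even): keep even coefficients, then renormalise.
        G_even = (G + G.subs(x, -x)) / 2
        posterior = sp.simplify(G_even / G_even.subs(x, 1))
        print(posterior)                              # exact posterior PGF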

    The Challenges of Non-linear Parameters and Variables in Automatic Loop Parallelisation

    With the rise of manycore processors, parallelism is becoming a mainstream necessity. Unfortunately, parallel programming is inherently more difficult than sequential programming; therefore, techniques for automatic parallelisation will become indispensable. We aim at extending the well-known polyhedron model, which promises this automation, beyond some of its current restrictions. Up to now, loop bounds and array subscripts in the modelled codes must be expressions linear in both the variables and the parameters. We lift this restriction and allow certain polynomial expressions instead of linear ones. With our extensions, we are able to handle more programs in all phases of the parallelisation process (dependence analysis, transformation of the program model, code generation). We extend Banerjee's classical dependence analysis to handle one non-linear parameter p, i.e., we are able to determine precisely the solutions of the system of conflict equalities for input programs with non-linear array accesses like A[p*i], depending on the residue class of p. We make contributions to three transformations desirable in automatic parallelisation. First, we show that, using a generalised Simplex algorithm we have developed, schedules with non-linear parameters like theta(i) = floor(i/n) can be computed. Such schedules can also be expressed easily as a quantifier elimination problem, but this approach turns out to be computationally less efficient with the available implementation. As a second transformation, we study parametric tiling, which is used to adapt a parallelised program to the number of available processors at run time. Third, we present a localisation technique to exploit scratchpad memories on architectures on which data caching has to be handled by software. We transform a given code such that values which are reused in successive iterations of a sequential loop are kept in the scratchpad, so that an access to a value written in an earlier iteration is served from the scratchpad to accelerate the access. In general, this transformation introduces non-linear loop bounds in the transformed model. Finally, we present an algorithm for generating code for arbitrary semi-algebraic iteration sets, i.e., for iteration sets described by polynomial inequalities in the variables and parameters. This is a vast generalisation of existing polyhedral code generation techniques. Although our algorithm is less efficient than polyhedral code generators, it paves the way for a code generator that can handle arbitrary parametric tilings and other transformations which introduce non-linear parameters (like non-linear schedules and the localisation we present) or even non-linear variables.
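
    To see why a single non-linear parameter changes dependence analysis, consider this hedged toy example (invented accesses, not taken from the thesis): with a write to A[p*i] and a read from A[p*j + 1], the conflict equality p*i = p*j + 1 requires p to divide 1, so whether any dependence exists hinges on the residue class of p, which is what the extended Banerjee analysis determines symbolically.

        def conflicts(p, n):
            """Brute-force the conflict equality p*i == p*j + 1 on 0 <= i, j < n."""
            return [(i, j) for i in range(n) for j in range(n) if p * i == p * j + 1]

        for p in (1, 2, 3):
            print(p, conflicts(p, 8))  # p=1: distance-1 dependences; p>=2: none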

    Verifying and Synthesizing Constant-Resource Implementations with Types

    We propose a novel type system for verifying that programs correctly implement constant-resource behavior. Our type system extends recent work on automatic amortized resource analysis (AARA), a set of techniques that automatically derive provable upper bounds on the resource consumption of programs. We devise new techniques that build on the potential method to achieve compositionality, precision, and automation. A strict global requirement that a program always maintains constant resource usage is too restrictive for most practical applications. It is sufficient to require that the program's resource behavior remain constant with respect to an attacker who is only allowed to observe part of the program's state and behavior. To account for this, our type system incorporates information flow tracking into its resource analysis. This allows our system to certify programs that need to violate the constant-time requirement in certain cases, as long as doing so does not leak confidential information to attackers. We formalize this guarantee by defining a new notion of resource-aware noninterference, and prove that our system enforces it. Finally, we show how our type inference algorithm can be used to synthesize a constant-time implementation from one that cannot be verified as secure, effectively repairing insecure programs automatically. We also show how a second novel AARA system that computes lower bounds on resource usage can be used to derive quantitative bounds on the amount of information that a program leaks through its resource use. We implemented each of these systems in Resource Aware ML, and show that they can be applied to verify constant-time behavior in a number of applications including encryption and decryption routines, database queries, and other resource-aware functionality.
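
    As a small illustration of the property being certified (plain Python, not Resource Aware ML or its type system), the first comparison below leaks the position of the first mismatching byte through its running time, while the second performs the same work on every byte.

        def leaky_equal(secret: bytes, guess: bytes) -> bool:
            if len(secret) != len(guess):
                return False
            for s, g in zip(secret, guess):
                if s != g:            # early exit: timing reveals mismatch index
                    return False
            return True

        def constant_time_equal(secret: bytes, guess: bytes) -> bool:
            if len(secret) != len(guess):   # length treated as public here
                return False
            diff = 0
            for s, g in zip(secret, guess):
                diff |= s ^ g         # no data-dependent branch on secret bytes
            return diff == 0

    Note that the early exit on mismatched lengths is itself a timing difference; if lengths are public, this leaks nothing confidential, which is the kind of selective relaxation that resource-aware noninterference makes precise.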

    Computer Aided Verification

    This open access two-volume set LNCS 13371 and 13372 constitutes the refereed proceedings of the 34th International Conference on Computer Aided Verification, CAV 2022, which was held in Haifa, Israel, in August 2022. The 40 full papers presented together with 9 tool papers and 2 case studies were carefully reviewed and selected from 209 submissions. The papers were organized in the following topical sections: Part I: invited papers; formal methods for probabilistic programs; formal methods for neural networks; software verification and model checking; hyperproperties and security; formal methods for hardware, cyber-physical, and hybrid systems. Part II: probabilistic techniques; automata and logic; deductive verification and decision procedures; machine learning; synthesis and concurrency. This is an open access book.

    Assessment of a multi-measure functional connectivity approach

    Efforts to find differences in the brain activity patterns of subjects with neurological and psychiatric disorders that could help in their diagnosis and prognosis have been increasing in recent years and promise to revolutionise clinical practice and our understanding of such illnesses in the future. Resting-state functional magnetic resonance imaging (rs-fMRI) data has been increasingly used to evaluate said activity and to characterize the connectivity between distinct brain regions, commonly organized in functional connectivity (FC) matrices. Here, machine learning methods were used to assess the extent to which using multiple FC matrices, each determined with a different statistical method, could change classification performance relative to using only one matrix, as is common practice. The statistical methods used include correlation, coherence, mutual information, transfer entropy and non-linear correlation, as implemented in the MULAN toolbox. Classification was performed using random forests and support vector machine (SVM) classifiers. Besides the previously mentioned objective, this study had three other goals: to investigate which of these statistical methods individually yielded better classification performance, to confirm the importance of the blood-oxygen-level-dependent (BOLD) signal in the frequency range 0.009-0.08 Hz for FC-based classifications, and to assess the impact of feature selection in SVM classifiers. Publicly available rs-fMRI data from the Addiction Connectome Preprocessed Initiative (ACPI) and ADHD-200 databases was used to classify controls vs. subjects with Attention-Deficit/Hyperactivity Disorder (ADHD). Maximum accuracy and macro-averaged f-measure values of 0.744 and 0.677, respectively, were achieved on the ACPI dataset, and of 0.678 and 0.648 on the ADHD-200 dataset. The results show that combining matrices can significantly improve classification accuracy and macro-averaged f-measure when feature selection is performed. They also suggest that mutual information methods may play an important role in FC-based classifications, at least when classifying subjects with ADHD.
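
    The following is a hedged, minimal sketch of such a pipeline (synthetic data and scikit-learn; the study used the MULAN toolbox on ACPI and ADHD-200 data and combined five measures, whereas only correlation is shown here): one FC matrix per subject is vectorised into features and fed to an SVM.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n_subjects, n_regions, n_timepoints = 40, 10, 120

        def fc_features(ts):
            """Upper triangle of a correlation FC matrix for one subject."""
            fc = np.corrcoef(ts)                # combining measures would stack
            iu = np.triu_indices(len(fc), k=1)  # more such feature vectors
            return fc[iu]

        X = np.stack([fc_features(rng.standard_normal((n_regions, n_timepoints)))
                      for _ in range(n_subjects)])
        y = rng.integers(0, 2, n_subjects)      # toy controls-vs.-ADHD labels
        print(cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean())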