414 research outputs found

Separation logic for high-level synthesis

Author: Winterstein Felix
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/05/2016

High-level synthesis (HLS) promises a significant shortening of the digital hardware design cycle by raising the abstraction level of the design entry to high-level languages such as C/C++. However, applications using dynamic, pointer-based data structures remain difficult to implement well, yet such constructs are widely used in software. Automated optimisations that leverage the memory bandwidth of dedicated hardware implementations by distributing the application data over separate on-chip memories and parallelising the implementation are often ineffective in the presence of dynamic data structures, due to the lack of an automated analysis that disambiguates pointer-based memory accesses. This thesis takes a step towards closing this gap. We explore recent advances in separation logic, a rigorous mathematical framework that enables formal reasoning about the memory access of heap-manipulating programs. We develop a static analysis that automatically splits heap-allocated data structures into provably disjoint regions. Our algorithm focuses on dynamic data structures accessed in loops and is accompanied by automated source-to-source transformations which enable loop parallelisation and physical memory partitioning by off-the-shelf HLS tools. We then extend the scope of our technique to pointer-based memory-intensive implementations that require access to an off-chip memory. The extended HLS design aid generates parallel on-chip multi-cache architectures. It uses the disjointness property of memory accesses to support non-overlapping memory regions by private caches. It also identifies regions which are shared after parallelisation and which are supported by parallel caches with a coherency mechanism and synchronisation, resulting in automatically specialised memory systems. We show up to 15x acceleration from heap partitioning, parallelisation and the insertion of the custom cache system in demonstrably practical applications.

Spiral - Imperial College Digital Repository

A virtualisation framework for embedded systems

Author: Penneman Niels
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2015

Ghent University Academic Bibliography

Abstraction Raising in General-Purpose Compilers

Author: Chelini Lorenzo
Publication venue: Eindhoven University of Technology
Publication date: 31/08/2021

Pure OAI Repository

Natural language software registry (second edition)

Author: Hinkelman Elizabeth
Jung Christoph
Vonerden Markus
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1993

Universaar

Q(sqrt(-3))-Integral Points on a Mordell Curve

Author: Bianchi Francesca
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

We use an extension of quadratic Chabauty to number fields, recently developed by the author with Balakrishnan, Besser and Müller, combined with a sieving technique, to determine the integral points over Q(√−3) on the Mordell curve y^2 = x^3 − 4.
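As a rough illustration of what "integral points" means for this curve, the sketch below runs a bounded brute-force search for ordinary integer solutions of y^2 = x^3 − 4. It is not the quadratic Chabauty and sieving method described in the abstract, and it only covers rational integers, not the full ring of integers of Q(√−3). The function name `integer_points` and the search bound are illustrative assumptions.

```python
from math import isqrt


def integer_points(bound):
    """Return integer pairs (x, y) with |x| <= bound and y^2 = x^3 - 4.

    Illustrative brute force only: it searches rational integers,
    not integral points over Q(sqrt(-3)).
    """
    points = []
    for x in range(-bound, bound + 1):
        rhs = x**3 - 4
        if rhs < 0:
            continue  # y^2 cannot be negative
        y = isqrt(rhs)
        if y * y == rhs:
            points.append((x, y))
            if y != 0:
                points.append((x, -y))
    return points


print(integer_points(1000))
```

Within this bound the search finds (2, ±2) and (5, ±11). A finite search like this cannot show that no further points exist, which is the kind of statement the techniques in the abstract are designed to establish.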

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Learning to see across domains and modalities

Author: Carlucci Fabio Maria
Publication date: 13/02/2019

Deep learning has recently raised hopes and expectations as a general solution for many applications (computer vision, natural language processing, speech recognition, etc.); indeed it has proven effective, but it also showed a strong dependence on large quantities of data. Generally speaking, deep learning models are especially susceptible to overfitting, due to their large number of internal parameters. Luckily, it has also been shown that, even when data is scarce, a successful model can be trained by reusing prior knowledge. Thus, developing techniques for transfer learning (as this process is known), in its broadest definition, is a crucial element towards the deployment of effective and accurate intelligent systems into the real world. This thesis will focus on a family of transfer learning methods applied to the task of visual object recognition, specifically image classification. The visual recognition problem is central to computer vision research: many desired applications, from robotics to information retrieval, demand the ability to correctly identify categories, places, and objects. Transfer learning is a general term, and specific settings have been given specific names: when the learner has access to only unlabeled data from the target domain (where the model should perform) and labeled data from a different domain (the source), the problem is called unsupervised domain adaptation (DA). The first part of this thesis will focus on three methods for this setting. The three presented techniques for domain adaptation are fully distinct: the first one proposes the use of Domain Alignment layers to structurally align the distributions of the source and target domains in feature space. While the general idea of aligning feature distribution is not novel, we distinguish our method by being one of the very few that do so without adding losses.
The second method is based on GANs: we propose a bidirectional architecture that jointly learns how to map the source images into the target visual style and vice-versa, thus alleviating the domain shift at the pixel level. The third method features an adversarial learning process that transforms both the images and the features of both domains in order to map them to a common, agnostic, space. While the first part of the thesis presented general-purpose DA methods, the second part will focus on the real-life issues of robotic perception, specifically RGB-D recognition. Robotic platforms are usually not limited to color perception; very often they also carry a Depth camera. Unfortunately, the depth modality is rarely used for visual recognition due to the lack of pretrained models from which to transfer and little data to train one on from scratch. We will first explore the use of synthetic data as proxy for real images by training a Convolutional Neural Network (CNN) on virtual depth maps, rendered from 3D CAD models, and then testing it on real robotic datasets. The second approach leverages the existence of RGB pretrained models, by learning how to map the depth data into the most discriminative RGB representation and then using existing models for recognition. This second technique is actually a pretty generic Transfer Learning method which can be applied to share knowledge across modalities.

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

A Dynamically Scheduled HLS Flow in MLIR

Author: Petersen Morten Borup
Publication date: 21/02/2022

In High-Level Synthesis (HLS), we consider abstractions that span from software to hardware and target heterogeneous architectures. Therefore, managing the complexity introduced by this is key to implementing good, maintainable, and extendible HLS compilers. Traditionally, HLS flows have been built on top of software compilation infrastructure such as LLVM, with hardware aspects of the flow existing peripherally to the core of the compiler. Through this work, we aim to show that MLIR, a compiler infrastructure with a focus on domain-specific intermediate representations (IR), is a better infrastructure for HLS compilers. Using MLIR, we define HLS and hardware abstractions as first-class citizens of the compiler, simplifying analysis, transformations, and optimization. To demonstrate this, we present a C-to-RTL, dynamically scheduled HLS flow. We find that our flow generates circuits comparable to those of an equivalent LLVM-based HLS compiler. Notably, we achieve this while lacking key optimization passes typically found in HLS compilers and through the use of an experimental front-end. To this end, we show that significant improvements in the generated RTL are but low-hanging fruit, requiring engineering effort to attain. We believe that our flow is more modular and more extendible than comparable open-source HLS compilers and is thus a good candidate as a basis for future research. Apart from the core HLS flow, we provide MLIR-based tooling for C-to-RTL cosimulation and visual debugging, with the ultimate goal of building an MLIR-based HLS infrastructure that will drive innovation in the field.

Infoscience - École polytechnique fédérale de Lausanne

A framework for network traffic analysis using GPUs

Author: Suñé Clos Marc
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2010

In recent years, computer networks have become an important part of our society. Networks have kept growing in size and complexity, which makes their management, traffic monitoring and analysis more complex because of the huge amount of data and calculations involved. In the last decade, several researchers have found it effective to use graphics processing units (GPUs) rather than traditional processors (CPUs) to speed up the execution of some algorithms not related to graphics (GPGPU). In 2006 the GPU chip manufacturer NVIDIA launched CUDA, a library that allows software developers to use its GPUs to perform general-purpose computations using the C programming language. This thesis presents a framework that aims to simplify the task of programming network traffic analysis with CUDA for software developers. The objectives of the framework are to abstract the task of obtaining network packets, to simplify the creation of network analysis programs using CUDA, and to offer an easy way to reuse the analysis code. Several network traffic analyses have also been developed.

UPCommons. Portal del coneixement obert de la UPC

Towards justifying computer algebra algorithms in Isabelle/HOL

Author: Li Wenda
Publication venue: University of Cambridge
Publication date: 05/02/2019

As verification efforts using interactive theorem proving grow, we are in need of certified algorithms in computer algebra to tackle problems over the real numbers. This is important because uncertified procedures can drastically increase the size of the trust base and undermine the overall confidence established by interactive theorem provers, which usually rely on a small kernel to ensure the soundness of derived results. This thesis describes an ongoing effort using the Isabelle theorem prover to certify the cylindrical algebraic decomposition (CAD) algorithm, which has been widely implemented to solve non-linear problems in various engineering and mathematical fields. Because of the sophistication of this algorithm, people are in doubt of the correctness of its implementation when deploying it to safety-critical verification projects, and such doubts motivate this thesis. In particular, this thesis proposes a library of real algebraic numbers, whose distinguishing features include a modular architecture and a sign determination algorithm requiring only rational arithmetic. With this library, an Isabelle tactic based on univariate CAD has been built in a certificate-based way: external, untrusted code delivers solutions in the form of certificates that are checked within Isabelle. To lay the foundation for the multivariate case, I have formalised various analytical results including Cauchy’s residue theorem and the bivariate case of the projection theorem of CAD. During this process, I have also built a tactic to evaluate winding numbers through Cauchy indices and verified procedures to count complex roots in some domains. The formalisation effort in this thesis can be considered as the first step towards a certified computer algebra system inside a theorem prover, so that various engineering projections and mathematical calculations can be carried out in a high-confidence framework.
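To give a feel for root counting that needs only rational arithmetic, the sketch below uses the classical Sturm sequence. This is a swapped-in textbook technique chosen for illustration. It is not the thesis's Isabelle/HOL formalisation, its sign determination algorithm, or its Cauchy index tactic, and the code is unverified. All function names are hypothetical, and polynomials are assumed to be given as coefficient lists from the constant term upwards.

```python
from fractions import Fraction


def _trim(p):
    """Drop trailing zero coefficients in place and return p."""
    while p and p[-1] == 0:
        p.pop()
    return p


def _remainder(a, b):
    """Remainder of polynomial a divided by b (coefficients low to high)."""
    a = a[:]
    while a and len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= factor * c
        _trim(a)  # the leading coefficient is now exactly zero
    return a


def sturm_chain(coeffs):
    """Sturm chain of a non-constant polynomial, using exact rationals."""
    p = _trim([Fraction(c) for c in coeffs])
    dp = _trim([i * c for i, c in enumerate(p)][1:])  # derivative
    chain = [p, dp]
    while True:
        r = _remainder(chain[-2], chain[-1])
        if not r:
            return chain
        chain.append([-c for c in r])  # negated remainder


def _evaluate(p, x):
    """Evaluate p at x with Horner's rule."""
    value = Fraction(0)
    for c in reversed(p):
        value = value * x + c
    return value


def _sign_changes(chain, x):
    """Count sign changes in the chain evaluated at x, skipping zeros."""
    values = [v for v in (_evaluate(p, x) for p in chain) if v != 0]
    return sum(1 for u, v in zip(values, values[1:]) if (u < 0) != (v < 0))


def count_real_roots(coeffs, a, b):
    """Distinct real roots in (a, b), assuming neither endpoint is a root."""
    chain = sturm_chain(coeffs)
    return _sign_changes(chain, Fraction(a)) - _sign_changes(chain, Fraction(b))


print(count_real_roots([-2, 0, 1], 0, 2))  # x^2 - 2 on (0, 2)
```

Every intermediate value here is a `Fraction`, so the count is exact even though the roots themselves (±√2 in the example) are irrational. That is the general appeal of sign-based methods for real algebraic numbers.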

Apollo (Cambridge)