Cilk: efficient multithreaded computing
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 170-179). By Keith H. Randall.
Streamroller: A Unified Compilation and Synthesis System for Streaming Applications
The growing complexity of applications has increased the need for higher processing power. In the embedded domain, the convergence of audio, video, and networking on handheld devices has prompted the need for low-cost, low-power, and high-performance implementations of these applications in the form of custom hardware. In a more mainstream domain like gaming consoles, the move towards more realism in physics simulations and graphics has pushed the industry towards multicore systems. Many of the applications in these domains are streaming in nature. The key challenges are to derive efficient custom-hardware implementations from these applications and to map them efficiently onto multicore architectures.
This dissertation presents a unified methodology, referred to as Streamroller, that can be applied both to the problem of scheduling stream programs onto multicore architectures and to the problem of automatically synthesizing custom hardware for stream applications. First, a method called stream-graph modulo scheduling is presented, which maps stream programs effectively onto a multicore architecture. Many aspects of a real system, such as limited memory and explicit DMAs, are modeled in the scheduler. The scheduler is evaluated for a set of stream programs on IBM's Cell processor.
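The core idea behind mapping a stream graph onto cores, as the abstract describes it, can be sketched minimally: assign kernels to cores so that the most heavily loaded core, which bounds the pipeline's steady-state initiation interval, is as light as possible. The kernel names, costs, and the greedy heuristic below are illustrative assumptions, not the thesis's actual scheduler.

```python
# Hypothetical sketch (not the thesis's algorithm): partition the kernels of a
# stream graph across cores so that the maximum per-core work -- which bounds
# the steady-state initiation interval -- is minimised.

def partition_kernels(kernel_costs, n_cores):
    """Greedy longest-processing-time partition of kernels onto cores."""
    cores = [{"load": 0, "kernels": []} for _ in range(n_cores)]
    # Place the most expensive kernels first, each on the least-loaded core.
    for name, cost in sorted(kernel_costs.items(), key=lambda kv: -kv[1]):
        core = min(cores, key=lambda c: c["load"])
        core["kernels"].append(name)
        core["load"] += cost
    return cores

# A toy stream pipeline: estimated cycles per firing of each kernel (invented).
costs = {"split": 10, "fir1": 40, "fir2": 35, "join": 15}
cores = partition_kernels(costs, n_cores=2)
ii = max(c["load"] for c in cores)  # bound on the steady-state initiation interval
```

A real stream-graph modulo scheduler must additionally respect the producer-consumer edges, local-store capacity, and DMA latency that the abstract mentions; the load-balancing objective above is only the skeleton.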
Second, an automated high-level synthesis system for creating custom hardware for stream applications is presented. The template for the custom hardware is a pipeline of accelerators. The synthesis involves designing loop accelerators for individual kernels, instantiating buffers to store data passed between kernels, and linking these building blocks to form a pipeline. A unique aspect of this system is the use of multifunction accelerators, which reduce cost by efficiently sharing hardware among multiple kernels.
Finally, a method is presented to improve the integer linear program formulations used in the schedulers by exploiting symmetry in the solution space. Symmetry-breaking constraints are added to the formulation, and the performance of the solver is evaluated.
Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/61662/1/kvman_1.pd
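The symmetry-breaking idea from the abstract above can be illustrated without any ILP solver: when machines (or cores) are interchangeable, every permutation of their labels yields an equivalent schedule, and ordering the machine loads keeps one representative per symmetry class. The toy task set and makespan bound below are invented for illustration.

```python
# Brute-force illustration (not the thesis's formulation) of why symmetry-
# breaking constraints shrink the search space: with identical machines,
# relabelling machines gives equivalent schedules, so we keep only
# assignments whose machine loads are in non-increasing order.

from itertools import product

tasks = [4, 3, 2, 1]     # task durations (toy data)
n_machines = 2

def loads(assignment):
    l = [0] * n_machines
    for t, m in zip(tasks, assignment):
        l[m] += t
    return l

all_feasible = [a for a in product(range(n_machines), repeat=len(tasks))
                if max(loads(a)) <= 6]          # makespan bound of 6
symmetry_free = [a for a in all_feasible
                 if loads(a) == sorted(loads(a), reverse=True)]
```

In an actual ILP, the ordering condition becomes linear constraints (load of machine i ≥ load of machine i+1), which prunes symmetric branches from the solver's tree rather than from an explicit enumeration.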
Learning for Optimization with Virtual Savant
Optimization problems arising in multiple fields of study demand efficient algorithms that can exploit modern parallel computing platforms. The remarkable development of machine learning offers an opportunity to incorporate learning into optimization algorithms to efficiently solve large and complex problems. This thesis explores Virtual Savant, a paradigm that combines machine learning and parallel computing to solve optimization problems. Virtual Savant is inspired by Savant Syndrome, a mental condition in which patients excel at a specific ability far above the average. In analogy to Savant Syndrome, Virtual Savant extracts patterns from previously-solved instances to learn how to solve a given optimization problem in a massively-parallel fashion. In this thesis, Virtual Savant is applied to three optimization problems related to software engineering, task scheduling, and public transportation. The efficacy of Virtual Savant is evaluated on different computing platforms, and the experimental results are compared against exact and approximate solutions for both synthetic and realistic instances of the studied problems. Results show that Virtual Savant can find accurate solutions, scale effectively in the problem dimension, and take advantage of the availability of multiple computing resources.
Fundación Carolina
Agencia Nacional de Investigación e Innovación (ANII, Uruguay)
Universidad de Cádiz
Universidad de la Repúblic
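The learn-from-solved-instances idea that the Virtual Savant abstract describes can be sketched in miniature. The 1-nearest-neighbour "learner", the feature vectors, and the decision vectors below are all invented assumptions; the actual thesis uses proper machine-learning models and massively-parallel evaluation.

```python
# Deliberately tiny sketch of the Virtual Savant idea: reuse knowledge from
# previously-solved instances to propose a solution for a new instance.
# Many such proposals can then be evaluated independently, hence in parallel.

def nearest_solved(instance, solved):
    """Return the known solution of the most similar solved instance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(solved, key=lambda pair: dist(pair[0], instance))[1]

# Solved instances: (feature vector, known-good decision vector) -- toy data.
solved = [((1, 9), [0, 1]), ((8, 2), [1, 0])]
prediction = nearest_solved((7, 3), solved)  # (7, 3) is closest to (8, 2)
```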
Lossy Polynomial Datapath Synthesis
The design of the compute elements of hardware, its datapath, plays a crucial role in determining the speed, area, and power consumption of a device. The building blocks of a datapath are polynomial in nature. Research into the implementation of adders and multipliers has a long history, and developments in this area will continue. Despite such efficient building-block implementations, correctly determining the necessary precision of each building block within a design is a challenge. Typically, standard or uniform precisions are chosen, such as the IEEE floating-point precisions. The hardware quality of the datapath is inextricably linked to the precisions of which it is composed. There is, however, another essential element that determines hardware quality, namely the accuracy of the components. If one were to implement each of the official IEEE rounding modes, significant differences in hardware quality would be found. But just as standard precisions may be chosen unnecessarily, components are typically constructed to return one of these correctly rounded results even where such accuracy is far from necessary. Unfortunately, if a lesser accuracy is permissible, the existing techniques that reduce hardware implementation cost by exploiting this freedom invariably produce an error with properties that are extremely difficult to determine.
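The precision/accuracy distinction drawn above can be made concrete with a small numeric experiment: for a fixed-point result with f fractional bits, truncation (which simply drops low bits, needing no rounding adder) has roughly twice the worst-case error of round-to-nearest. The grid of sample values below is an assumption for illustration only.

```python
# Sketch of the accuracy trade-off the abstract describes: truncation to f
# fractional bits is cheaper in hardware than round-to-nearest, but its
# worst-case error approaches 2^-f rather than 2^-(f+1).

from fractions import Fraction

def trunc(x, f):
    """Truncate to f fractional bits: drop the low bits (no rounding adder)."""
    q = Fraction(1, 2 ** f)
    return (x // q) * q

def round_nearest(x, f):
    """Round to the nearest multiple of 2^-f (costs an extra carry-propagate add)."""
    q = Fraction(1, 2 ** f)
    return int(x / q + Fraction(1, 2)) * q

f = 4
grid = [Fraction(k, 128) for k in range(257)]   # 0 .. 2 in steps of 2^-7
worst_trunc = max(abs(x - trunc(x, f)) for x in grid)
worst_rn = max(abs(x - round_nearest(x, f)) for x in grid)
```

A lossy-synthesis flow, in the spirit of this thesis, would accept the larger truncation error wherever a global error budget permits it, and spend the saved hardware elsewhere.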
This thesis addresses the problem of how to construct hardware that efficiently implements fixed- and floating-point polynomials while exploiting a global error freedom. This is a form of lossy synthesis. The fixed-point contributions include resource minimisation when implementing mutually exclusive polynomials, the construction of minimal lossy components with a guaranteed worst-case error, and a technique for efficient composition of such components. Contributions are also made to how a floating-point polynomial can be implemented with a guaranteed relative error.
Energy efficient hardware acceleration of multimedia processing tools
The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents on modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at the algorithmic level in order to design re-usable optimised hardware acceleration cores.
To prove these conclusions, the work in this thesis focuses on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism, and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature.
The SA-DCT/IDCT technologies are instances of a more general computation: both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution-search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early-exit mechanism that achieves large search-space reductions. Results show an improvement over state-of-the-art algorithms, with future potential for even greater savings.
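The reason CMM blocks are attractive targets for hardware optimisation, as in the abstract above, is that multiplication by a constant reduces to shifts and adds, and different matrix entries can share intermediate results. The sketch below shows only the baseline shift-and-add decomposition (one add per set bit of each constant); it is an illustration, not the thesis's genetic-programming search, and the toy matrix is invented.

```python
# Illustrative sketch: a constant multiplication y = c * x can be realised in
# hardware as shifts and adds, and a CMM block is a set of such dot products.
# Optimising which shifted terms are shared is the hard search problem that
# the thesis attacks with genetic programming.

def shift_add_terms(c):
    """Decompose a positive constant into powers of two (one add per set bit)."""
    terms, bit = [], 0
    while c:
        if c & 1:
            terms.append(bit)
        c >>= 1
        bit += 1
    return terms

def cmm(matrix, xs):
    """Evaluate y = M*x using only shifts and adds, as a datapath would."""
    ys = []
    for row in matrix:
        acc = 0
        for c, x in zip(row, xs):
            for s in shift_add_terms(c):
                acc += x << s
        ys.append(acc)
    return ys

M = [[3, 5], [7, 1]]    # toy constant matrix
y = cmm(M, [2, 4])      # 3*2 + 5*4 and 7*2 + 1*4
```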
Numerical simulations of instabilities in general relativity
General relativity, one of the pillars of our understanding of the universe, has been a remarkably successful theory. It has stood the test of time for more than 100 years and has passed all experimental tests so far. Most recently, the LIGO collaboration made the first-ever direct detection of gravitational waves, confirming a long-standing prediction of general relativity. Despite this, several fundamental mathematical questions remain unanswered, many of which relate to the global existence and the stability of solutions to Einstein's equations. This thesis presents our efforts to use numerical relativity to investigate some of these questions.
We present a complete picture of the end points of black ring instabilities in five dimensions. Fat rings collapse to Myers-Perry black holes. For intermediate rings, we discover a previously unknown instability that stretches the ring without changing its thickness and causes it to collapse to a Myers-Perry black hole. Most importantly, however, we find that for very thin rings, the Gregory-Laflamme instability dominates and causes the ring to break. This provides the first concrete evidence that in higher dimensions, the weak cosmic censorship conjecture may be violated even in asymptotically flat spacetimes.
For Myers-Perry black holes, we investigate instabilities in five and six dimensions. In six dimensions, we demonstrate that both axisymmetric and non-axisymmetric instabilities can cause the black hole to pinch off, and we study the approach to the naked singularity in detail.
Another question that has attracted intense interest recently is the instability of anti-de Sitter space. In this thesis, we explore how breaking spherical symmetry in gravitational collapse in anti-de Sitter space affects black hole formation.
These findings were made possible by our new open-source general relativity code, GRChombo, whose adaptive mesh capabilities allow accurate simulations of phenomena in which new length scales are produced dynamically. In this thesis, we describe GRChombo in detail and analyse its performance on the latest supercomputers. Furthermore, we outline numerical advances that were necessary for simulating higher-dimensional black holes stably and efficiently.
My PhD was funded by an STFC studentship initially and by the European Research Council Grant No. ERC-2014-StG 639022-NewNGR in my final year. Furthermore, I received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant agreement No. 690904.
The simulations presented in this thesis were carried out on the following supercomputers:
*) The COSMOS Shared Memory system at DAMTP, University of Cambridge, operated on behalf of the STFC DiRAC HPC Facility. This system is funded by BIS National E-infrastructure capital Grant No. ST/J005673/1 and STFC Grants No. ST/H008586/1 and No. ST/K00333X/1.
*) MareNostrum III and MareNostrum IV at the Barcelona Supercomputing Centre through the grants FI-2016-3-0006 and PRACE Tier-0 PPFPWG respectively.
*) Stampede and Stampede2 at the Texas Advanced Computing Center, University of Texas at Austin, through the NSF-XSEDE grant No. PHY-090003 and an allocation provided by Intel for their Parallel Computing Centres.
*) SuperMike-II at Louisiana State University under allocation NUMREL06.
*) Cartesius, SURFsara, in the Netherlands through the PRACE DECI grant NRBA
MIMO Systems
In recent years, it has become clear that MIMO communication systems are inevitable in the accelerated evolution of high-data-rate applications, due to their potential to dramatically increase spectral efficiency while simultaneously sending individual information to the corresponding users in wireless systems. This book intends to provide highlights of current research topics in the field of MIMO systems and to offer a snapshot of the recent advances and major issues faced today by researchers in MIMO-related areas. The book is written by specialists working in universities and research centers all over the world and covers the fundamental principles and main advanced topics of high-data-rate wireless communication systems over MIMO channels. Moreover, the book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.
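The spectral-efficiency claim above can be checked numerically with the standard MIMO capacity formula, C = log2 det(I + (SNR/Nt) H Hᴴ) bit/s/Hz. The 2x2 real-valued channel below is an invented ideal case (real channels are random and complex-valued); it simply shows capacity roughly doubling relative to a single-antenna link at the same SNR.

```python
import math

# Numerical sketch of why MIMO boosts spectral efficiency: capacity of an
# Nt x Nr channel grows with min(Nt, Nr). Toy real-valued 2x2 channel only.

def mimo_capacity_2x2(H, snr):
    """C = log2 det(I + (snr/Nt) * H * H^T) for a real 2x2 channel, bit/s/Hz."""
    g = snr / 2  # equal power split over Nt = 2 transmit antennas
    # hht = H * H^T, written out for the 2x2 case.
    hht = [[sum(H[i][k] * H[j][k] for k in range(2)) for j in range(2)]
           for i in range(2)]
    a = [[(1 if i == j else 0) + g * hht[i][j] for j in range(2)]
         for i in range(2)]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return math.log2(det)

H = [[1.0, 0.0], [0.0, 1.0]]             # ideal orthogonal channel
c_mimo = mimo_capacity_2x2(H, snr=10.0)  # two parallel spatial streams
c_siso = math.log2(1 + 10.0)             # single antenna at the same SNR
# c_mimo = 2*log2(6) ~ 5.17 bit/s/Hz versus c_siso = log2(11) ~ 3.46 bit/s/Hz
```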