Cilk: efficient multithreaded computing
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 170-179). By Keith H. Randall.
Streamroller: A Unified Compilation and Synthesis System for Streaming Applications
The growing complexity of applications has increased the need for higher processing power. In the embedded domain, the convergence of audio, video, and networking on handheld devices has prompted the need for low-cost, low-power, and high-performance implementations of these applications in the form of custom hardware. In a more mainstream domain like gaming consoles, the move towards more realism in physics simulations and graphics has pushed the industry towards multicore systems. Many of the applications in these domains are streaming in nature. The key challenges are to derive efficient custom-hardware implementations from these applications and to map them efficiently onto multicore architectures.
This dissertation presents a unified methodology, referred to as Streamroller, that can be applied both to the problem of scheduling stream programs onto multicore architectures and to the problem of automatically synthesizing custom hardware for stream applications. First, a method called stream-graph modulo scheduling is presented, which maps stream programs effectively onto a multicore architecture. Many aspects of a real system, such as limited memory and explicit DMAs, are modeled in the scheduler. The scheduler is evaluated for a set of stream programs on IBM's Cell processor.
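The core idea behind mapping a stream graph onto cores, as the abstract describes it, can be sketched minimally: assign kernels to cores so that the most heavily loaded core, which bounds the pipeline's steady-state initiation interval, is as light as possible. The kernel names, costs, and the greedy heuristic below are illustrative assumptions, not the thesis's actual scheduler.

```python
# Hypothetical sketch (not the thesis's algorithm): partition the kernels of a
# stream graph across cores so that the maximum per-core work -- which bounds
# the steady-state initiation interval -- is minimised.

def partition_kernels(kernel_costs, n_cores):
    """Greedy longest-processing-time partition of kernels onto cores."""
    cores = [{"load": 0, "kernels": []} for _ in range(n_cores)]
    # Place the most expensive kernels first, each on the least-loaded core.
    for name, cost in sorted(kernel_costs.items(), key=lambda kv: -kv[1]):
        core = min(cores, key=lambda c: c["load"])
        core["kernels"].append(name)
        core["load"] += cost
    return cores

# A toy stream pipeline: estimated cycles per firing of each kernel (invented).
costs = {"split": 10, "fir1": 40, "fir2": 35, "join": 15}
cores = partition_kernels(costs, n_cores=2)
ii = max(c["load"] for c in cores)  # bound on the steady-state initiation interval
```

A real stream-graph modulo scheduler must additionally respect the producer-consumer edges, local-store capacity, and DMA latency that the abstract mentions; the load-balancing objective above is only the skeleton.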
Second, an automated high-level synthesis system for creating custom hardware for stream applications is presented. The template for the custom hardware is a pipeline of accelerators. The synthesis involves designing loop accelerators for individual kernels, instantiating buffers to store data passed between kernels, and linking these building blocks to form a pipeline. A unique aspect of this system is the use of multifunction accelerators, which reduce cost by efficiently sharing hardware among multiple kernels.
Finally, a method is presented to improve the integer linear program formulations used in the schedulers by exploiting symmetry in the solution space. Symmetry-breaking constraints are added to the formulation, and the performance of the solver is evaluated.
Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/61662/1/kvman_1.pd
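The symmetry-breaking idea from the abstract above can be illustrated without any ILP solver: when machines (or cores) are interchangeable, every permutation of their labels yields an equivalent schedule, and ordering the machine loads keeps one representative per symmetry class. The toy task set and makespan bound below are invented for illustration.

```python
# Brute-force illustration (not the thesis's formulation) of why symmetry-
# breaking constraints shrink the search space: with identical machines,
# relabelling machines gives equivalent schedules, so we keep only
# assignments whose machine loads are in non-increasing order.

from itertools import product

tasks = [4, 3, 2, 1]     # task durations (toy data)
n_machines = 2

def loads(assignment):
    l = [0] * n_machines
    for t, m in zip(tasks, assignment):
        l[m] += t
    return l

all_feasible = [a for a in product(range(n_machines), repeat=len(tasks))
                if max(loads(a)) <= 6]          # makespan bound of 6
symmetry_free = [a for a in all_feasible
                 if loads(a) == sorted(loads(a), reverse=True)]
```

In an actual ILP, the ordering condition becomes linear constraints (load of machine i ≥ load of machine i+1), which prunes symmetric branches from the solver's tree rather than from an explicit enumeration.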
Learning for Optimization with Virtual Savant
Optimization problems arising in multiple fields of study demand efficient algorithms that can exploit modern parallel computing platforms. The remarkable development of machine learning offers an opportunity to incorporate learning into optimization algorithms to efficiently solve large and complex problems. This thesis explores Virtual Savant, a paradigm that combines machine learning and parallel computing to solve optimization problems. Virtual Savant is inspired by Savant Syndrome, a mental condition in which patients excel at a specific ability far above the average. In analogy to Savant Syndrome, Virtual Savant extracts patterns from previously-solved instances to learn how to solve a given optimization problem in a massively-parallel fashion. In this thesis, Virtual Savant is applied to three optimization problems related to software engineering, task scheduling, and public transportation. The efficacy of Virtual Savant is evaluated on different computing platforms, and the experimental results are compared against exact and approximate solutions for both synthetic and realistic instances of the studied problems. Results show that Virtual Savant can find accurate solutions, scale effectively in the problem dimension, and take advantage of the availability of multiple computing resources.
Fundación Carolina
Agencia Nacional de Investigación e Innovación (ANII, Uruguay)
Universidad de Cádiz
Universidad de la Repúblic
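The learn-from-solved-instances idea that the Virtual Savant abstract describes can be sketched in miniature. The 1-nearest-neighbour "learner", the feature vectors, and the decision vectors below are all invented assumptions; the actual thesis uses proper machine-learning models and massively-parallel evaluation.

```python
# Deliberately tiny sketch of the Virtual Savant idea: reuse knowledge from
# previously-solved instances to propose a solution for a new instance.
# Many such proposals can then be evaluated independently, hence in parallel.

def nearest_solved(instance, solved):
    """Return the known solution of the most similar solved instance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(solved, key=lambda pair: dist(pair[0], instance))[1]

# Solved instances: (feature vector, known-good decision vector) -- toy data.
solved = [((1, 9), [0, 1]), ((8, 2), [1, 0])]
prediction = nearest_solved((7, 3), solved)  # (7, 3) is closest to (8, 2)
```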
Lossy Polynomial Datapath Synthesis
The design of the compute elements of hardware, its datapath, plays a crucial role in determining the speed, area, and power consumption of a device. The building blocks of a datapath are polynomial in nature. Research into the implementation of adders and multipliers has a long history, and developments in this area will continue. Despite such efficient building-block implementations, correctly determining the necessary precision of each building block within a design is a challenge. Typically, standard or uniform precisions are chosen, such as the IEEE floating-point precisions. The hardware quality of the datapath is inextricably linked to the precisions of which it is composed. There is, however, another essential element that determines hardware quality, namely the accuracy of the components. If one were to implement each of the official IEEE rounding modes, significant differences in hardware quality would be found. But just as standard precisions may be chosen unnecessarily, components are typically constructed to return one of these correctly rounded results even where such accuracy is far from necessary. Unfortunately, if a lesser accuracy is permissible, the existing techniques that reduce hardware implementation cost by exploiting this freedom invariably produce an error with properties that are extremely difficult to determine.
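The precision/accuracy distinction drawn above can be made concrete with a small numeric experiment: for a fixed-point result with f fractional bits, truncation (which simply drops low bits, needing no rounding adder) has roughly twice the worst-case error of round-to-nearest. The grid of sample values below is an assumption for illustration only.

```python
# Sketch of the accuracy trade-off the abstract describes: truncation to f
# fractional bits is cheaper in hardware than round-to-nearest, but its
# worst-case error approaches 2^-f rather than 2^-(f+1).

from fractions import Fraction

def trunc(x, f):
    """Truncate to f fractional bits: drop the low bits (no rounding adder)."""
    q = Fraction(1, 2 ** f)
    return (x // q) * q

def round_nearest(x, f):
    """Round to the nearest multiple of 2^-f (costs an extra carry-propagate add)."""
    q = Fraction(1, 2 ** f)
    return int(x / q + Fraction(1, 2)) * q

f = 4
grid = [Fraction(k, 128) for k in range(257)]   # 0 .. 2 in steps of 2^-7
worst_trunc = max(abs(x - trunc(x, f)) for x in grid)
worst_rn = max(abs(x - round_nearest(x, f)) for x in grid)
```

A lossy-synthesis flow, in the spirit of this thesis, would accept the larger truncation error wherever a global error budget permits it, and spend the saved hardware elsewhere.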
This thesis addresses the problem of how to construct hardware that efficiently implements fixed- and floating-point polynomials while exploiting a global error freedom. This is a form of lossy synthesis. The fixed-point contributions include resource minimisation when implementing mutually exclusive polynomials, the construction of minimal lossy components with a guaranteed worst-case error, and a technique for efficient composition of such components. Contributions are also made to how a floating-point polynomial can be implemented with a guaranteed relative error.
Energy efficient hardware acceleration of multimedia processing tools
The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents on modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at the algorithmic level in order to design re-usable optimised hardware acceleration cores.
To prove these conclusions, the work in this thesis focuses on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism, and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature.
The SA-DCT/IDCT technologies are instances of a more general computation: both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution-search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early-exit mechanism that achieves large search-space reductions. Results show an improvement over state-of-the-art algorithms, with future potential for even greater savings.
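The reason CMM blocks are attractive targets for hardware optimisation, as in the abstract above, is that multiplication by a constant reduces to shifts and adds, and different matrix entries can share intermediate results. The sketch below shows only the baseline shift-and-add decomposition (one add per set bit of each constant); it is an illustration, not the thesis's genetic-programming search, and the toy matrix is invented.

```python
# Illustrative sketch: a constant multiplication y = c * x can be realised in
# hardware as shifts and adds, and a CMM block is a set of such dot products.
# Optimising which shifted terms are shared is the hard search problem that
# the thesis attacks with genetic programming.

def shift_add_terms(c):
    """Decompose a positive constant into powers of two (one add per set bit)."""
    terms, bit = [], 0
    while c:
        if c & 1:
            terms.append(bit)
        c >>= 1
        bit += 1
    return terms

def cmm(matrix, xs):
    """Evaluate y = M*x using only shifts and adds, as a datapath would."""
    ys = []
    for row in matrix:
        acc = 0
        for c, x in zip(row, xs):
            for s in shift_add_terms(c):
                acc += x << s
        ys.append(acc)
    return ys

M = [[3, 5], [7, 1]]    # toy constant matrix
y = cmm(M, [2, 4])      # 3*2 + 5*4 and 7*2 + 1*4
```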
Numerical simulations of instabilities in general relativity
General relativity, one of the pillars of our understanding of the universe, has been a remarkably successful theory. It has stood the test of time for more than 100 years and has passed all experimental tests so far. Most recently, the LIGO collaboration made the first-ever direct detection of gravitational waves, confirming a long-standing prediction of general relativity. Despite this, several fundamental mathematical questions remain unanswered, many of which relate to the global existence and the stability of solutions to Einstein's equations. This thesis presents our efforts to use numerical relativity to investigate some of these questions.
We present a complete picture of the end points of black ring instabilities in five dimensions. Fat rings collapse to Myers-Perry black holes. For intermediate rings, we discover a previously unknown instability that stretches the ring without changing its thickness and causes it to collapse to a Myers-Perry black hole. Most importantly, however, we find that for very thin rings, the Gregory-Laflamme instability dominates and causes the ring to break. This provides the first concrete evidence that in higher dimensions, the weak cosmic censorship conjecture may be violated even in asymptotically flat spacetimes.
For Myers-Perry black holes, we investigate instabilities in five and six dimensions. In six dimensions, we demonstrate that both axisymmetric and non-axisymmetric instabilities can cause the black hole to pinch off, and we study the approach to the naked singularity in detail.
Another question that has attracted intense interest recently is the instability of anti-de Sitter space. In this thesis, we explore how breaking spherical symmetry in gravitational collapse in anti-de Sitter space affects black hole formation.
These findings were made possible by our new open-source general relativity code, GRChombo, whose adaptive mesh capabilities allow accurate simulations of phenomena in which new length scales are produced dynamically. In this thesis, we describe GRChombo in detail and analyse its performance on the latest supercomputers. Furthermore, we outline numerical advances that were necessary for simulating higher-dimensional black holes stably and efficiently.
My PhD was funded by an STFC studentship initially and by the European Research Council Grant No. ERC-2014-StG 639022-NewNGR in my final year. Furthermore, I received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant agreement No. 690904.
The simulations presented in this thesis were carried out on the following supercomputers:
*) The COSMOS Shared Memory system at DAMTP, University of Cambridge, operated on behalf of the STFC DiRAC HPC Facility. This system is funded by BIS National E-infrastructure capital Grant No. ST/J005673/1 and STFC Grants No. ST/H008586/1 and No. ST/K00333X/1.
*) MareNostrum III and MareNostrum IV at the Barcelona Supercomputing Centre through the grants FI-2016-3-0006 and PRACE Tier-0 PPFPWG respectively.
*) Stampede and Stampede2 at the Texas Advanced Computing Center, University of Texas at Austin, through the NSF-XSEDE grant No. PHY-090003 and an allocation provided by Intel for their Parallel Computing Centres.
*) SuperMike-II at Louisiana State University under allocation NUMREL06.
*) Cartesius, SURFsara, in the Netherlands through the PRACE DECI grant NRBA
MIMO Systems
In recent years, it has become clear that MIMO communication systems are inevitable in the accelerated evolution of high-data-rate applications, due to their potential to dramatically increase spectral efficiency while simultaneously sending individual information to the corresponding users in wireless systems. This book intends to provide highlights of current research topics in the field of MIMO systems and to offer a snapshot of the recent advances and major issues faced today by researchers in MIMO-related areas. The book is written by specialists working in universities and research centers all over the world and covers the fundamental principles and main advanced topics of high-data-rate wireless communication systems over MIMO channels. Moreover, the book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.
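The spectral-efficiency claim above can be checked numerically with the standard MIMO capacity formula, C = log2 det(I + (SNR/Nt) H Hᴴ) bit/s/Hz. The 2x2 real-valued channel below is an invented ideal case (real channels are random and complex-valued); it simply shows capacity roughly doubling relative to a single-antenna link at the same SNR.

```python
import math

# Numerical sketch of why MIMO boosts spectral efficiency: capacity of an
# Nt x Nr channel grows with min(Nt, Nr). Toy real-valued 2x2 channel only.

def mimo_capacity_2x2(H, snr):
    """C = log2 det(I + (snr/Nt) * H * H^T) for a real 2x2 channel, bit/s/Hz."""
    g = snr / 2  # equal power split over Nt = 2 transmit antennas
    # hht = H * H^T, written out for the 2x2 case.
    hht = [[sum(H[i][k] * H[j][k] for k in range(2)) for j in range(2)]
           for i in range(2)]
    a = [[(1 if i == j else 0) + g * hht[i][j] for j in range(2)]
         for i in range(2)]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return math.log2(det)

H = [[1.0, 0.0], [0.0, 1.0]]             # ideal orthogonal channel
c_mimo = mimo_capacity_2x2(H, snr=10.0)  # two parallel spatial streams
c_siso = math.log2(1 + 10.0)             # single antenna at the same SNR
# c_mimo = 2*log2(6) ~ 5.17 bit/s/Hz versus c_siso = log2(11) ~ 3.46 bit/s/Hz
```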