Search CORE

622 research outputs found

Recommended from our members

Timing models for high-level synthesis

Author: Chaiyakul Viraphol
Gajski Daniel D.
Wu Allen C.H.
Publication venue: eScholarship, University of California
Publication date: 31/10/1991
Field of study

In this paper, we describe a timing model for clock estimation during high-level synthesis. In order to obtain realistic timing estimates, the proposed model considers all delay elements, including datapath, control and wire delays, and several technology factors, such as layout architecture, technology mapping, buffers insertion and loading effects. The experimental results show that this model can provide much better estimates than previous models. This model is well suited for automatic and interactive synthesis as well as feedback-driven synthesis where performance matrices must be rapidly and incrementally calculated

eScholarship - University of California

Recommended from our members

Harmonic scheduling of linear recurrences in digital filter design

Author: Dutt Nikil
Nicolau Alexandru
Wang Haigeng
Publication venue: eScholarship, University of California
Publication date: 14/02/1992
Field of study

Linear difference equations involving recurrences are fundamental equations that describe many important signal processing applications. For many high sample rate digital filter applications, we need to effectively parallelize the linear difference equations used to describe digital filters - a difficult task due to the recurrences inherent in the data dependences. We present a novel approach, Harmonic Scheduling, that exploits parallelism in these recurrences beyond loop-carried dependencies, and which generates optimal schedules for parallel evaluation of linear difference equations with resource constraints. This approach also enables us to derive a parallel schedule with minimum control overhead, given an execution time with resource constraints. We also present a Harmonic Scheduling algorithm that generates optimal schedules for digital filters described by second-order difference equations with resource constraints

eScholarship - University of California

VLSI design of high-speed adders for digital signal processing applications.

Author: Bazarjani Seyfollah Seyfollahi
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/1987
Field of study

Scholarship at UWindsor

A New Optimization Technique for Improving Resource Exploitation and Critical Path Minimization

Author: Landwehr Birger
Marwedel Peter
Publication venue: Universität Dortmund
Publication date
Field of study

This paper presents a novel approach to algebraic optimization of data-flow graphs in the domain of computationally intensive applications. The presented approach is based upon the paradigm of simulated evolution which has been proven to be a powerful method for solving large non-linear optimization problems. We introduce a genetic algorithm with a new chromosomal representation of data-flow graphs that serves as a basis for preserving the correctness of algebraic transformations and allows an efficient implementation of the genetic operators. Furthermore, we introduce a new class of hardware-related transformation rules which for the first time allow to take existing component libraries into account. The efficiency of our method is demonstrated by encouraging experimental results for several standard benchmarks

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Real-time execution of image change detection

Author: van Lint R.H.
Publication venue
Publication date: 01/01/2012
Field of study

Repository TU/e

Pure OAI Repository

FlexCore: Massively Parallel and Flexible Processing for Large MIMO Access Points

Author: Georgis G
Husmann C
Jamieson K
Nikitopoulos K
Publication venue: 'The Academy of Transdisciplinary Learning and Advanced Studies'
Publication date: 27/03/2017
Field of study

Large MIMO base stations remain among wireless network designers’ best tools for increasing wireless throughput while serving many clients, but current system designs, sacrifice throughput with simple linear MIMO detection algorithms. Higher-performance detection techniques are known, but remain off the table because these systems parallelize their computation at the level of a whole OFDM subcarrier, sufficing only for the less demanding linear detection approaches they opt for. This paper presents FlexCore, the first computational architecture capable of parallelizing the detection of large numbers of mutually-interfering information streams at a granularity below individual OFDM subcarriers, in a nearly-embarrassingly parallel manner while utilizing any number of available processing elements. For 12 clients sending 64-QAM symbols to a 12-antenna base station, our WARP testbed evaluation shows similar network throughput to the state-of-the-art while using an order of magnitude fewer processing elements. For the same scenario, our combined WARP-GPU testbed evaluation demonstrates a 19x computational speedup, with 97% increased energy efficiency when compared with the state of the art. Finally, for the same scenario, an FPGA-based comparison between FlexCore and the state of the art shows that FlexCore can achieve up to 96% better energy efficiency, and can offer up to 32x the processing throughput

UCL Discovery

Efficient Mapping of Neural Network Models on a Class of Parallel Architectures.

Author: Arad Behnam Seyed
Publication venue: LSU Digital Commons
Publication date: 01/01/1997
Field of study

This dissertation develops a formal and systematic methodology for efficient mapping of several contemporary artificial neural network (ANN) models on k-ary n-cube parallel architectures (KNC\u27s). We apply the general mapping to several important ANN models including feedforward ANN\u27s trained with backpropagation algorithm, radial basis function networks, cascade correlation learning, and adaptive resonance theory networks. Our approach utilizes a parallel task graph representing concurrent operations of the ANN model during training. The mapping of the ANN is performed in two steps. First, the parallel task graph of the ANN is mapped to a virtual KNC of compatible dimensionality. This involves decomposing each operation into its atomic tasks. Second, the dimensionality of the virtual KNC architecture is recursively reduced through a sequence of transformations until a desired metric is optimized. We refer to this process as folding the virtual architecture. The optimization criteria we consider in this dissertation are defined in terms of the iteration time of the algorithm on the folded architecture. If necessary, the mapping scheme may utilize a subset of the processors of a given KNC architecture if it results in the most efficient simulation. A unique feature of our mapping is that it systematically selects an appropriate degree of parallelism leading to a highly efficient realization of the ANN model on KNC architectures. A novel feature of our work is its ability to efficiently map unit-allocating ANN\u27s. These networks possess a dynamic structure which grows during training. We present a highly efficient scheme for simulating such networks on existing KNC parallel architectures. We assume an upper bound on size of the neural network We perform the folding such that the iteration time of the largest network is minimized. We show that our mapping leads to near-optimal simulation of smaller instances of the neural network. In addition, based on our mapping no data migration or task rescheduling is needed as the size of network grows

Louisiana State University

Emerging Technologies - NanoMagnets Logic (NML)

Author: Vacca Marco
Publication venue
Publication date: 01/01/2013
Field of study

In the last decades CMOS technology has ruled the electronic scenario thanks to the constant scaling of transistor sizes. With the reduction of transistor sizes circuit area decreases, clock frequency increases and power consumption decreases accordingly. However CMOS scaling is now approaching its physical limits and many believe that CMOS technology will not be able to reach the end of the Roadmap. This is mainly due to increasing difficulties in the fabrication process, that is becoming very expensive, and to the unavoidable impact of leakage losses, particularly thanks to gate tunnel current. In this scenario many alternative technologies are studied to overcome the limitations of CMOS transistors. Among these possibilities, magnetic based technologies, like NanoMagnet Logic (NML) are among the most interesting. The reason of this interest lies in their magnetic nature, that opens up entire new possibilities in the design of logic circuits, like the possibility to mix logic and memory in the same device. Moreover they have no standby power consumption and potentially a much lower power consumption of CMOS transistors. In literature NML logic is well studied and theoretical and experimental proofs of concept were already found. However two important points are not enough considered in the analysis approach followed by most of the work in literature. First of all, no complex circuits are analyzed. NML logic is very different from CMOS technologies, so to completely understand the potential of this technology it is mandatory to investigate complex architectures. Secondly, most of the solutions proposed do not take into account the constraints derived from fabrication process, making them unrealistic and difficult to be fabricated experimentally. This thesis focuses therefore on NML logic keeping into account these two important limitations in the research approach followed in literature. The aim is to obtain a complete and accurate overview of NML logic, finding realistic circuital solutions and trying to improve at the same time their performance. After a brief and complete introduction (Chapter 1), the thesis is divided in two parts, which cover the two fundamental points followed in this three years of research: A circuits architecture analysis and a technological analysis. In the architecture analysis first an innovative VHDL model is described in Chapter 2. This model is extensively used in the analysis because it allows fast simulation of complex circuits, with, at the same time, the possibility to estimate circuit per- formance, like area and power consumption. In Chapter 3 the problem of signals synchronization in complex NML circuits is analyzed and solved, using as benchmark a simple but complete NML microprocessor. Different solutions based on asynchronous logic are studied and a new asynchronous solution, specifically designed to exploit the potential of NML logic, is developed. In Chapter 4 the layout of NML circuits is studied on a more physical level, considering the limitations of fabrication processes. The layout of NML circuits is therefore changed accordingly to these constraints. Secondly CMOS circuits architectures are compared to more simple architectures, evaluating therefore which one is more suited for NML logic. Finally the problem of interconnections in NML technology is analyzed and solutions to improve it are found. In Chapter 5 the problem of feedback signals in heavy pipelined technologies, like NML, is studied. Solutions to improve performances and synchronize signals are developed. Systolic arrays are then analyzed as possible candidate to exploit NML potential. Finally in Chapter 6 ToPoliNano, a simulator dedicated to NML and other emerging technologies, that we are developing, is described. This simulator allows to follow the same top-down approach followed for CMOS technology. The layout generator and the simulation engine are detailed described. In the first chapter of the technological analysis (Chapter 7), the performance of NML logic is explored throughout low level simulations. The aim is to understand if these circuits can be fabricated with optical lithography, allowing therefore the commercial development of NML logic. Basic logic gates and the clock system are there analyzed from a low level perspective. In Chapter 8 an innovative electric clock system for NML technology is shown and the first experimental results are reported. This clock system allows to achieve true low power for NML technology, obtaining a reduction of power consumption of 20 times considering the best CMOS transistors available. This power consumption takes into account all the losses, also the clock system losses. Moreover the solution presented can be fabricated with current technological processes. The research work behind this thesis represents an important breakthrough in NML logic. The solutions here presented allow the design and fabrication of complex NML circuits, considering the particular characteristics of this technology and considerably improving the performance. Moreover the technological solutions here presented allow the design and fabrication of circuits with available fabrication process with a considerable advantage over CMOS in terms of power consumption. This thesis represents therefore a considerable step froward in the study and development of NML technolog

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

A Practical Hierarchical Model of Parallel Computation ll: Binary Tree and FFT Algorithms

Author: Heywood Todd
Ranka Sanjay
Publication venue: SURFACE at Syracuse University
Publication date: 01/10/1991
Field of study

A companion paper has introduced the Hierarchical PRAM (H-PRAM) model of parallel computation, which achieves a good balance between simplicity of usage and reflectivity of realistic parallel computers. In this paper, we demonstrate the usage of the model by designing and analyzing various algorithms for computing the complete binary tree, and the FFT/butterfly graph. By concentrating on two problems, we are able to demonstrate the results of different combinations of organizational strategies and different types of sub-models of the H-PRAM. The philosophy in algorithm design is to maximize the number of processors P that are efficiently usable with respect to an input size N, and to minimize the inefficiency when efficiency is not possible (when P is too large with respect to N). This can be done because of the H-PRAM\u27s representation of general locality, i.e. both strict and neighborhood locality, and results in algorithms that can efficiently employ more processors (and are thus faster) than algorithms for models that only represent strict locality

Syracuse University Research Facility and Collaborative Environment

Characterization and Avoidance of Critical Pipeline Structures in Aggressive Superscalar Processors

Author: Sassone Peter G.
Publication venue: Georgia Institute of Technology
Publication date: 20/07/2005
Field of study

In recent years, with only small fractions of modern processors now accessible in a single cycle, computer architects constantly fight against propagation issues across the die. Unfortunately this trend continues to shift inward, and now the even most internal features of the pipeline are designed around communication, not computation. To address the inward creep of this constraint, this work focuses on the characterization of communication within the pipeline itself, architectural techniques to avoid it when possible, and layout co-design for early detection of problems. I present work in creating a novel detection tool for common case operand movement which can rapidly characterize an applications dataflow patterns. The results produced are suitable for exploitation as a small number of patterns can describe a significant portion of modern applications. Work on dynamic dependence collapsing takes the observations from the pattern results and shows how certain groups of operations can be dynamically grouped, avoiding unnecessary communication between individual instructions. This technique also amplifies the efficiency of pipeline data structures such as the reorder buffer, increasing both IPC and frequency. I also identify the same sets of collapsible instructions at compile time, producing the same benefits with minimal hardware complexity. This technique is also done in a backward compatible manner as the groups are exposed by simple reordering of the binarys instructions. I present aggressive pipelining approaches for these resources which avoids the critical timing often presumed necessary in aggressive superscalar processors. As these structures are designed for the worst case, pipelining them can produce greater frequency benefit than IPC loss. I also use the observation that the dynamic issue order for instructions in aggressive superscalar processors is predictable. Thus, a hardware mechanism is introduced for caching the wakeup order for groups of instructions efficiently. These wakeup vectors are then used to speculatively schedule instructions, avoiding the dynamic scheduling when it is not necessary. Finally, I present a novel approach to fast and high-quality chip layout. By allowing architects to quickly evaluate what if scenarios during early high-level design, chip designs are less likely to encounter implementation problems later in the process.Ph.D.Committee Chair: Scott Wills; Committee Member: David Schimmel; Committee Member: Gabriel Loh; Committee Member: Hsien-Hsin Lee; Committee Member: Yorai Ward

Scholarly Materials And Research @ Georgia Tech