Search CORE

4,931 research outputs found

A Micro Power Hardware Fabric for Embedded Computing

Author: Mehta Gayatri
Publication venue
Publication date: 25/09/2009
Field of study

Field Programmable Gate Arrays (FPGAs) mitigate many of the problemsencountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named as domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Different optimization techniques like local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate and generate a tailored architectural instance of the fabric.The fabric has been synthesized on 160 nm cell-based ASIC fabrication process from OKI and 130 nm from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor

D-Scholarship@Pitt

Time-Space Tradeoffs for the Memory Game

Author: Chakrabarti Amit
Chen Yining
Publication venue
Publication date: 25/01/2018
Field of study

A single-player game of Memory is played with

n

distinct pairs of cards, with the cards in each pair bearing identical pictures. The cards are laid face-down. A move consists of revealing two cards, chosen adaptively. If these cards match, i.e., they bear the same picture, they are removed from play; otherwise, they are turned back to face down. The object of the game is to clear all cards while minimizing the number of moves. Past works have thoroughly studied the expected number of moves required, assuming optimal play by a player has that has perfect memory. In this work, we study the Memory game in a space-bounded setting. We prove two time-space tradeoff lower bounds on algorithms (strategies for the player) that clear all cards in

T

moves while using at most

S

bits of memory. First, in a simple model where the pictures on the cards may only be compared for equality, we prove that

ST = \Omega(n^2 \log n)

. This is tight: it is easy to achieve

ST = O(n^2 \log n)

essentially everywhere on this tradeoff curve. Second, in a more general model that allows arbitrary computations, we prove that

ST^2 = \Omega(n^3)

. We prove this latter tradeoff by modeling strategies as branching programs and extending a classic counting argument of Borodin and Cook with a novel probabilistic argument. We conjecture that the stronger tradeoff

ST = \widetilde{\Omega}(n^2)

in fact holds even in this general model

arXiv.org e-Print Archive

Dartmouth Digital Commons (Dartmouth College)

From Small Space to Small Width in Resolution

Author: Filmus Yuval
Lauria Massimo
Mikša Mladen
Nordström Jakob
Vinyals Marc
Publication venue
Publication date: 01/01/2014
Field of study

In 2003, Atserias and Dalmau resolved a major open question about the resolution proof system by establishing that the space complexity of CNF formulas is always an upper bound on the width needed to refute them. Their proof is beautiful but somewhat mysterious in that it relies heavily on tools from finite model theory. We give an alternative, completely elementary proof that works by simple syntactic manipulations of resolution refutations. As a by-product, we develop a "black-box" technique for proving space lower bounds via a "static" complexity measure that works against any resolution refutation---previous techniques have been inherently adaptive. We conclude by showing that the related question for polynomial calculus (i.e., whether space is an upper bound on degree) seems unlikely to be resolvable by similar methods

arXiv.org e-Print Archive

CiteSeerX

Copenhagen University Research Information System

Dagstuhl Research Online Publication Server

Archivio della ricerca- Università di Roma La Sapienza

Principles for problem aggregation and assignment in medium scale multiprocessors

Author: Nicol David M.
Saltz Joel H.
Publication venue
Publication date
Field of study

One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior

NASA Technical Reports Server

Recommended from our members

EXTEND-L : an input language for extensible register transfer compilation

Author: Dutt Nikil D.
Gajski Daniel D.
Publication venue: eScholarship, University of California
Publication date: 15/04/1988
Field of study

This report discusses the model and input language for EXTEND, a synthesis system that permits extensible register transfer synthesis. EXTEND-L fills the need for a language that bridges the gap between existing behavioral input descriptions, which are too abstract, and structural schematics, which cannot capture the high-level behavior. The report first discusses previous work in behavioral synthesis and summarizes the deficiencies of these behavioral specifications. The report then describes the proposed langauge in detail, and concludes with a few examples that show its utility

eScholarship - University of California

Recommended from our members

A system for microarchitecture and logic optimization

Author: Zanden Nels Vander
Publication venue: eScholarship, University of California
Publication date: 01/01/1991
Field of study

This thesis spans two levels of the design process by examining optimization at both the register-transfer level and at the logic level. More specifically, this thesis addresses the following two problems: 1) performing logic synthesis for custom layout rather than the traditional approach that focuses on synthesis for standard cells, and 2) performing optimization for custom layout from register-transfer level netlists. Thus optimization is performed on the microarchitecture design and at a lower level for individual microarchitecture components.First, techniques are introduced for generating gate-level netlists that take advantage of custom layout capabilities. Such techniques include limiting serial/parallel transistor chains, transistor sizes, and capacitive loads in forming complex gates. These considerations have not been incorporated in previous logic synthesis systems.Second, techniques are introduced for improving the microarchitecture structure and using estimates from lower-level optimization tools to guide microarchitecture design optimizations that attempt to meet user specified area and time constraints. These techniques include the capability for mixing layout styles such as custom layout for random-logic components and bit-slicing for regularly structured components. In this manner the entire design, control logic and datapath, can be optimized at the same time. Further, this paper presents a new methodology for microarchitecture-level optimization that greatly reduces the amount of technology-specific knowledge necessary to perform the optimizations

eScholarship - University of California

Recommended from our members

VSS : a VHDL synthesis system

Author: Gajski Daniel D.
Lis Joseph S.
Publication venue: eScholarship, University of California
Publication date: 06/05/1988
Field of study

This report describes a register transfer synthesis system that allows a designer to interact with the design process. The designer can modify the compiled design by changing the input description, selecting optimization and mapping strategies, or graphically changing the generated design schematic. The VHDL language is used for input and output descriptions. An intermediate representation which incorporates signal typing and component attributes simplifies compilation and facilitates design optimization. The compilation process consists of two phases. First, a design composed of generic components is synthesized from the input description. Second, this design is translated into components from a particular library by a mapper and optimized by a logic optimizer. Redesign to new technologies can be accomplished by changing only the component library

eScholarship - University of California