
    A Micro Power Hardware Fabric for Embedded Computing

    Field Programmable Gate Arrays (FPGAs) mitigate many of the problems encountered in the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks because of their relatively high power consumption and silicon usage overheads compared with a direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics together with FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named the domain-specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Optimization techniques such as local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate the generation of a tailored architectural instance of the fabric. The fabric has been synthesized in a 160 nm cell-based ASIC process from OKI and a 130 nm process from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere, with comparisons to other hardware and software implementations. The optimized fabric implemented in the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA, and 2016X better than an Intel XScale processor.
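
    The abstract names local search and simulated annealing for choosing the interconnect. As a rough illustration only, the sketch below poses a hypothetical version of that choice: each connection an application set needs (a "demand") can be carried by one of several candidate routes, each route is a set of named links, and the cost to minimize is the number of distinct links enabled, a stand-in for area and power. The cost model, the link names, and the parameters `steps`, `t0`, and `cooling` are assumptions for illustration, not the DSF tool's formulation. Simulated annealing accepts a worse move with probability exp(-delta/T) while the temperature T cools geometrically, which lets it leave local optima where a pure local search would stop.

```python
import math
import random
from typing import List, Sequence, Set

# Hypothetical model: routes[d] lists candidate routes for demand d; a route is a set of link names.
Routes = List[List[Set[str]]]


def route_cost(choice: Sequence[int], routes: Routes) -> int:
    """Cost = number of distinct links enabled (a hypothetical area/power proxy)."""
    used: Set[str] = set()
    for demand, r in enumerate(choice):
        used |= routes[demand][r]
    return len(used)


def anneal(routes: Routes, steps: int = 2000, t0: float = 2.0,
           cooling: float = 0.995, seed: int = 0):
    """Simulated annealing over one route choice per demand; returns (best choice, best cost)."""
    rng = random.Random(seed)
    choice = [0] * len(routes)          # start every demand on its first candidate route
    cost = route_cost(choice, routes)
    best, best_cost = list(choice), cost
    t = t0
    for _ in range(steps):
        d = rng.randrange(len(routes))  # neighbour move: re-route one random demand
        old = choice[d]
        choice[d] = rng.randrange(len(routes[d]))
        new_cost = route_cost(choice, routes)
        delta = new_cost - cost
        # Always accept improvements; accept uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(choice), cost
        else:
            choice[d] = old             # reject: undo the move
        t *= cooling                    # geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    # Toy instance: each demand has a private link or a shared two-link route {p, q}.
    toy = [[{"a"}, {"p", "q"}], [{"b"}, {"p", "q"}], [{"c"}, {"p", "q"}]]
    print(anneal(toy))
```

    In the toy instance, switching any single demand to the shared route first raises the cost from 3 to 4, so a local search that accepts only improving moves stays at 3, whereas the annealer can accept that uphill step on the way to the all-shared cost of 2.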

    Time-Space Tradeoffs for the Memory Game

    A single-player game of Memory is played with $n$ distinct pairs of cards, with the cards in each pair bearing identical pictures. The cards are laid face-down. A move consists of revealing two cards, chosen adaptively. If these cards match, i.e., they bear the same picture, they are removed from play; otherwise, they are turned back face-down. The object of the game is to clear all cards while minimizing the number of moves. Past works have thoroughly studied the expected number of moves required, assuming optimal play by a player that has perfect memory. In this work, we study the Memory game in a space-bounded setting. We prove two time-space tradeoff lower bounds on algorithms (strategies for the player) that clear all cards in $T$ moves while using at most $S$ bits of memory. First, in a simple model where the pictures on the cards may only be compared for equality, we prove that $ST = \Omega(n^2 \log n)$. This is tight: it is easy to achieve $ST = O(n^2 \log n)$ essentially everywhere on this tradeoff curve. Second, in a more general model that allows arbitrary computations, we prove that $ST^2 = \Omega(n^3)$. We prove this latter tradeoff by modeling strategies as branching programs and extending a classic counting argument of Borodin and Cook with a novel probabilistic argument. We conjecture that the stronger tradeoff $ST = \widetilde{\Omega}(n^2)$ in fact holds even in this general model.
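
    The upper bound $ST = O(n^2 \log n)$ can be seen at the two ends of the curve with a small simulation. In this minimal Python sketch (our own illustrative strategies, not the paper's construction), pictures are integers, and for the scanning strategy we assume the player can see which positions have already been cleared from the table, so that state is not charged to the player's memory. The scanning strategy keeps only two indices, $O(\log n)$ bits, and uses $O(n^2)$ moves; the perfect-memory strategy remembers every revealed, unmatched card, $O(n \log n)$ bits, and uses at most $2n$ moves. Both products are $O(n^2 \log n)$.

```python
import random


def deal(n, seed=0):
    """Lay out n pairs face-down: a shuffled list of 2n pictures (integers 0..n-1)."""
    cards = [p for p in range(n) for _ in (0, 1)]
    random.Random(seed).shuffle(cards)
    return cards


def scan_strategy(cards):
    """Memory = two indices i, j: reveal each live card i with every later live card j
    until its match turns up. Returns (moves, pairs cleared); moves <= n(2n - 1)."""
    removed = [False] * len(cards)  # table state (assumed visible to the player), not player memory
    moves = cleared = 0
    for i in range(len(cards)):
        if removed[i]:
            continue
        for j in range(i + 1, len(cards)):
            if removed[j]:
                continue
            moves += 1  # reveal cards i and j
            if cards[i] == cards[j]:
                removed[i] = removed[j] = True
                cleared += 1
                break
    return moves, cleared


def perfect_memory_strategy(cards):
    """Remember every revealed, still-unmatched card. Returns (moves, pairs cleared); moves <= 2n."""
    seen = {}  # picture -> position of a revealed but unmatched card
    moves = cleared = 0
    k = 0      # next never-revealed position; every position >= k is still on the table
    while k < len(cards):
        a = k
        k += 1
        if cards[a] in seen:
            # Partner already known: reveal a together with it.
            moves += 1
            cleared += 1
            del seen[cards[a]]
            continue
        b = k  # a's partner is still unrevealed, so position b exists
        k += 1
        moves += 1
        if cards[a] == cards[b]:
            cleared += 1
            continue
        seen[cards[a]] = a
        if cards[b] in seen:
            # b's partner was revealed earlier: clear that pair with one more move.
            moves += 1
            cleared += 1
            del seen[cards[b]]
        else:
            seen[cards[b]] = b
    return moves, cleared
```

    Each unsuccessful move of the perfect-memory strategy reveals two fresh cards, so there are at most $n$ such moves plus exactly $n$ successful ones, giving the $2n$ bound.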

    From Small Space to Small Width in Resolution

    In 2003, Atserias and Dalmau resolved a major open question about the resolution proof system by establishing that the space complexity of CNF formulas is always an upper bound on the width needed to refute them. Their proof is beautiful but somewhat mysterious in that it relies heavily on tools from finite model theory. We give an alternative, completely elementary proof that works by simple syntactic manipulations of resolution refutations. As a by-product, we develop a "black-box" technique for proving space lower bounds via a "static" complexity measure that works against any resolution refutation; previous techniques have been inherently adaptive. We conclude by showing that the related question for polynomial calculus (i.e., whether space is an upper bound on degree) seems unlikely to be resolvable by similar methods.
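
    The measures in this abstract rest on short standard definitions: resolution derives $C \lor D$ from $C \lor x$ and $D \lor \lnot x$, a refutation ends in the empty clause, its width is the size of its largest clause, and clause space counts how many clauses must be kept in memory simultaneously. The minimal Python sketch below checks a given refutation and reports its width. The DIMACS-style integer literals and the `(i, j, v)` step format are our own illustrative choices; the sketch does not compute space and is not the paper's proof technique.

```python
from typing import FrozenSet, List, Tuple

# Literals are nonzero ints; -v is the negation of variable v (DIMACS-style).
Clause = FrozenSet[int]


def resolve(c: Clause, d: Clause, v: int) -> Clause:
    """Resolution rule: from C or v, and D or not-v, derive C or D."""
    if v not in c or -v not in d:
        raise ValueError("clauses do not clash on the given variable")
    return (c - {v}) | (d - {-v})


def refutation_width(axioms: List[Clause], steps: List[Tuple[int, int, int]]) -> int:
    """Check a resolution refutation and return its width.

    Each step (i, j, v) resolves lines i and j (indices into the growing list of
    lines, axioms first) on variable v. Width = size of the largest clause used."""
    lines = list(axioms)
    for i, j, v in steps:
        lines.append(resolve(lines[i], lines[j], v))
    if lines[-1] != frozenset():
        raise ValueError("last line is not the empty clause")
    return max(len(c) for c in lines)


if __name__ == "__main__":
    # All four 2-clauses over x1, x2 form an unsatisfiable CNF.
    axioms = [frozenset({1, 2}), frozenset({1, -2}), frozenset({-1, 2}), frozenset({-1, -2})]
    steps = [(0, 1, 2), (2, 3, 2), (4, 5, 1)]  # derive x1, then not-x1, then the empty clause
    print(refutation_width(axioms, steps))  # prints 2
```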

    Principles for problem aggregation and assignment in medium scale multiprocessors

    One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems that have a high degree of potential fine-grained parallelism and execution requirements that are either unpredictable or too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs between balanced workload and communication/synchronization costs are studied. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
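
    The granularity tradeoff can be made concrete with a toy model. One simple reading of a uniform mapping that exploits locality of workload intensity is a wrap-around (cyclic) assignment of fixed-size aggregates: neighbouring aggregates, which tend to have similar cost, land on different processors. The sketch below assumes that reading, a one-dimensional workload with a positive total, and uses the number of neighbouring cells owned by different processors as a hypothetical proxy for communication/synchronization cost; `cyclic_assign` is an illustrative name, not the paper's scheme.

```python
from typing import List, Tuple


def cyclic_assign(work: List[float], block: int, procs: int) -> Tuple[float, int]:
    """Aggregate consecutive cells into blocks of `block` cells and deal the blocks
    to processors round-robin (block b -> processor b % procs).

    Returns (imbalance, cut): imbalance = max processor load / mean load, and
    cut = number of adjacent cell pairs owned by different processors."""
    loads = [0.0] * procs
    owner = []
    for cell, w in enumerate(work):
        p = (cell // block) % procs
        loads[p] += w
        owner.append(p)
    mean = sum(loads) / procs  # assumes total work > 0
    cut = sum(1 for a, b in zip(owner, owner[1:]) if a != b)
    return max(loads) / mean, cut


if __name__ == "__main__":
    # 64 cells with a hot spot in the first 16; 4 processors.
    work = [10.0] * 16 + [1.0] * 48
    for block in (16, 4, 1):
        print(block, cyclic_assign(work, block, 4))
```

    With this workload, blocks of 16 cells put the whole hot spot on one processor (imbalance 160/52, about 3.08, with 3 cut pairs), while blocks of 4 or 1 cell balance the load exactly but cut 15 and 63 pairs respectively: coarser aggregates communicate less, and aggregates just fine enough to split the hot spot already balance it.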