149,326 research outputs found

    Tree-Based Hardware Recursion for Divide-and-Conquer Algorithms

    Get PDF
    It is well-known that custom hardware accelerators implemented as application-specific integrated circuits (ASICs) or on field-programmable gate arrays (FPGAs) can solve many problems much faster than software running on a central processing unit (CPU). This is because FPGAs and ASICs can have handcrafted data and control paths which exploit parallelism in ways that CPUs cannot. However, designing custom hardware is complicated and implementing algorithms in a way that takes advantage of the desired parallelism can be difficult. One class of algorithms that exemplifies this is divide-and-conquer algorithms. A divide-and-conquer algorithm is a type of recursive algorithm that solves a problem by repeatedly dividing it into smaller sub-problems. These algorithms have a lot of parallelism because they generate a large number of sub-problems that can be computed independently of each other. Unfortunately, traditional stack-based approaches to handling recursion in hardware make exploiting this parallelism difficult. This work proposes a new general-purpose approach to implementing recursive functions in hardware, which we call TreeRecur. TreeRecur uses trees to represent the branching recursive function calls of divide-and-conquer algorithms, which makes it possible to take advantage of their procedure-level parallelism. To allow for design flexibility, TreeRecur executes algorithms using a configurable number of independent function processors. These processors are generated using high-level synthesis, making it easy to implement a variety of different algorithms. Our solution was tested on three different algorithms and compared against software implementations of the same algorithms. Performance results were collected in terms of execution speed and energy consumption. TreeRecur was found to have execution speeds comparable to software when differences in clock speed were accounted for and was found to consume up to 11.2 times less energy

    Pipelined Asynchronous High Level Synthesis for General Programs

    Get PDF
    High-level synthesis (HLS) translates algorithms from software programming language into hardware. We use the dataflow HLS methodology to translate programs into asynchronous circuits by implementing programs using asynchronous dataflow elements as hardware building blocks. We extend the prior work in dataflow synthesis in the following aspects:i) we propose Fluid to synthesize pipelined dataflow circuits for real-world programs with complex control flows, which are not supported in the previous work; ii) we propose PipeLink to permit pipelined access to shared resources in the dataflow circuit. Dataflow circuit results in distributed control and an implicitly pipelined implementation. However, resource sharing in the presence of pipelining is challenging in this context due to the absence of a global scheduler. Traditional solutions to this problem impose restrictions on pipelining to guarantee mutually exclusive access to the shared resource, but PipeLink removes such restrictions and can generate pipelined asynchronous dataflow circuits for shared function calls, pipelined memory accesses and function pointers; iii) we apply several dataflow optimizations to improve the quality of the synthesized dataflow circuits; iv) we implement our system (Fluid + PipeLink) on the LLVM compiler framework, which allows us to take advantage of the optimization efforts from the compiler community; v) we compare our system with a widely-used academic HLS tool and two commercial HLS tools. Compared to commercial (academic) HLS tools, our system results in 12X (20X) reduction in energy, 1.29X (1.64X) improvement in throughput, 1.27X (1.61X) improvement in latency at a cost of 2.4X (1.61X) increase in the area

    Recursive Program Optimization Through Inductive Synthesis Proof Transformation

    Get PDF
    The research described in this paper involved developing transformation techniques which increase the efficiency of the noriginal program, the source, by transforming its synthesis proof into one, the target, which yields a computationally more efficient algorithm. We describe a working proof transformation system which, by exploiting the duality between mathematical induction and recursion, employs the novel strategy of optimizing recursive programs by transforming inductive proofs. We compare and contrast this approach with the more traditional approaches to program transformation, and highlight the benefits of proof transformation with regards to search, correctness, automatability and generality

    Empirical Evidence of Large-Scale Diversity in API Usage of Object-Oriented Software

    Get PDF
    In this paper, we study how object-oriented classes are used across thousands of software packages. We concentrate on "usage diversity'", defined as the different statically observable combinations of methods called on the same object. We present empirical evidence that there is a significant usage diversity for many classes. For instance, we observe in our dataset that Java's String is used in 2460 manners. We discuss the reasons of this observed diversity and the consequences on software engineering knowledge and research
    corecore