Synthesis and analysis of a dual-core 32-bit MIPS processor

Author: Γρατσία Αικατερίνη Ν.
Publication date: 01/01/2011

University of Thessaly Institutional Repository

Recommended from our members

From Functional Programs to Pipelined Dataflow Circuits

Author: Edwards Stephen A.; Kim Martha Allen; Townsend Richard Morse
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2017

We present a translation from programs expressed in a functional IR into dataflow networks as an intermediate step within a Haskell-to-Hardware compiler. Our networks exploit pipeline parallelism, particularly across multiple tail-recursive calls, via non-strict function evaluation. To handle the long-latency memory operations common to our target applications, we employ a latency-insensitive methodology that ensures arbitrary delays do not change the functionality of the circuit. We present empirical results comparing our networks against their strict counterparts, showing that non-strictness can mitigate small increases in memory latency and improve overall performance by up to 2x.

Columbia University Academic Commons

FLiMS: a fast lightweight 2-way merger for sorting

Author: Brooks C; Luk W; Papaphilippou P
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022

In this paper, we present FLiMS, a highly efficient and simple parallel algorithm for merging two sorted lists residing in banked and/or wide memory. On FPGAs, its implementation uses fewer hardware resources than the state-of-the-art alternatives, due to the reduced number of comparators and the elimination of redundant logic found in prior attempts. In combination with the distributed nature of the selector stage, higher performance is achieved for the same amount of parallelism or higher. This is useful in many applications, such as parallel merge trees for high-throughput sorting, where the resource utilisation of the merger is critical for building larger trees and internalising the workload for faster computation. Also presented are efficient variations of FLiMS for optimizing throughput for skewed datasets, achieving stable sorting, or using fewer dequeue signals. FLiMS is also shown to perform well as conventional software on modern CPUs supporting single-instruction multiple-data (SIMD) instructions, surpassing the performance of some standard libraries for sorting.

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

A Streaming High-Throughput Linear Sorter System with Contention Buffering

Author: David Andrews; Jorge Ortiz
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2011

Popular sorting algorithms do not translate well into hardware implementations. Instead, hardware-based solutions like sorting networks, systolic sorters, and linear sorters exploit parallelism to increase sorting efficiency. Linear sorters, built from identical nodes with simple control, have less area and latency than sorting networks, but they are limited in their throughput. We present a system composed of multiple linear sorters acting in parallel to increase overall throughput. Interleaving is used to increase bandwidth and allow sorting of multiple values per clock cycle, and the amount of interleaving and the depth of the linear sorters can be adapted to suit specific applications. Contention for available linear sorters in the system is resolved with buffers that accumulate conflicting requests and dispatch them in bulk to reduce latency penalties. Implementing this system on a field-programmable gate array (FPGA) yields a speedup of 68x over a MicroBlaze processor running quicksort.

Crossref

Directory of Open Access Journals

Compiling Irregular Software to Specialized Hardware

Author: Townsend Richard Morse
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

High-level synthesis (HLS) has simplified the design process for energy-efficient hardware accelerators: a designer specifies an accelerator’s behavior in a “high-level” language, and a toolchain synthesizes register-transfer level (RTL) code from this specification. Many HLS systems produce efficient hardware designs for regular algorithms (i.e., those with limited conditionals or regular memory access patterns), but most struggle with irregular algorithms that rely on dynamic, data-dependent memory access patterns (e.g., traversing pointer-based structures like lists, trees, or graphs). HLS tools typically provide imperative, side-effectful languages to the designer, which makes it difficult to correctly specify and optimize complex, memory-bound applications. In this dissertation, I present an alternative HLS methodology that leverages properties of functional languages to synthesize hardware for irregular algorithms. The main contribution is an optimizing compiler that translates pure functional programs into modular, parallel dataflow networks in hardware. I give an overview of this compiler, explain how its source and target together enable parallelism in the face of irregularity, and present two specific optimizations that further exploit this parallelism. Taken together, this dissertation verifies my thesis that pure functional programs exhibiting irregular memory access patterns can be compiled into specialized hardware and optimized for parallelism. This work extends the scope of modern HLS toolchains. By relying on properties of pure functional languages, our compiler can synthesize hardware from programs containing constructs that commercial HLS tools prohibit, e.g., recursive functions and dynamic memory allocation. Hardware designers may thus use our compiler in conjunction with existing HLS systems to accelerate a wider class of algorithms than before.

Columbia University Academic Commons

Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools

Author: Bingmann Timo
Publication date: 01/01/2018

This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on shared-memory multi-core machines, the second part external memory suffix sorting using the induced sorting principle, and the third part distributed external memory suffix sorting with a new distributed algorithmic big data framework named Thrill.
Comment: 396 pages, dissertation, Karlsruher Instituts für Technologie (2018). arXiv admin note: text overlap with arXiv:1101.3448 by other author.

arXiv.org e-Print Archive

KITopen

Architectures of new switching systems.

Author: Lam Wan
Publication date: 01/01/1998

by Lam Wan.
Thesis submitted in: November 1997.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1998.
Includes bibliographical references (leaves 96-102).
Abstract also in Chinese.

Part I
Chapter 1 --- Introduction to Integrated Intelligent Personal Communication System
Chapter 2 --- The Switching Architecture
Chapter 2.1 --- The Overall Switching Architecture
Chapter 2.2 --- Switching Module
Chapter 2.2.1 --- Traffic Routing in Switching Module
Chapter 2.2.2 --- Structure of Switching Module
Chapter 2.2.3 --- Wireless Base Interface
Chapter 2.2.4 --- Trunk Interface
Chapter 2.2.5 --- Analog Interfaces
Chapter 2.3 --- Network Intelligence
Chapter 2.4 --- Wireless Part
Chapter 2.4.1 --- Call-Setup in IIPCS
Chapter 2.4.2 --- Handoff
Chapter 2.4.3 --- Wireless Base
Chapter 2.5 --- Downstream Wired Extensions
Chapter 2.6 --- Upstream Wired Part
Chapter 2.7 --- Voice System
Chapter 2.8 --- Features of the IIPCS
Chapter 3 --- Concluding Remarks
Chapter 3.1 --- Summary
Chapter 3.2 --- Directions for Further Research

Part II
Chapter 4 --- Introduction to Next-Generation Switch
Chapter 5 --- Architecture of Next-Generation Switch
Chapter 5.1 --- Overall Architecture of Next-Generation Switch
Chapter 5.1.1 --- Interface module
Chapter 5.1.2 --- Packetizer
Chapter 5.2 --- Concentration Fabric
Chapter 5.3 --- Shared-Buffer Memory Switch
Chapter 6 --- Concentration Networks
Chapter 6.1 --- Background of Concentration Networks
Chapter 6.2 --- k-Sorting
Chapter 6.3 --- Concentrator
Chapter 6.3.1 --- Nk-to-k Concentrator
Chapter 6.3.2 --- Match between Circles with Cost Reduction
Chapter 6.4 --- The Structure of a Molecule
Chapter 6.5 --- Summary
Chapter 7 --- Lock-Latch Algorithm
Chapter 8 --- Performance Evaluation
Chapter 9 --- Concluding Remarks
Chapter 9.1 --- LSI Implementation
Chapter 9.2 --- Summary
Bibliography

CUHK Digital Repository