Iris: Automatic Generation of Efficient Data Layouts for High Bandwidth Utilization
Optimizing data movements is becoming one of the biggest challenges in
heterogeneous computing to cope with data deluge and, consequently, big data
applications. When creating specialized accelerators, modern high-level
synthesis (HLS) tools are increasingly efficient in optimizing the
computational aspects, but data transfers have not been adequately improved. To
combat this, novel architectures such as High-Bandwidth Memory with wider data
busses have been developed so that more data can be transferred in parallel.
Designers must tailor their hardware/software interfaces to fully exploit the
available bandwidth. HLS tools can automate this process, but the designer must
follow strict coding-style rules. If the bus width is not evenly divisible by
the data width (e.g., when using custom-precision data types) or if the arrays
are not power-of-two length, the HLS-generated accelerator will likely not
fully utilize the available bandwidth, demanding even more manual effort from
the designer. We propose a methodology to automatically find and implement a
data layout that, when streamed between memory and an accelerator, uses a
higher percentage of the available bandwidth than a naive or HLS-optimized
design. We borrow concepts from multiprocessor scheduling to achieve such high
efficiency.
Comment: Accepted for presentation at ASPDAC'2
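The divisibility problem described in this abstract can be made concrete with a little arithmetic. The sketch below is not the Iris methodology; it only computes the fraction of bus bits that carry payload under two simple layouts. The function names, the 512-bit bus, and the 24-bit element width are illustrative assumptions, as is the premise that a conventional layout pads each element to the next power-of-two width.

```python
def packed_utilization(bus_bits: int, elem_bits: int) -> float:
    """Payload fraction when whole elements are packed back to back
    and no element is allowed to straddle two bus words."""
    per_word = bus_bits // elem_bits  # elements that fit in one bus word
    return per_word * elem_bits / bus_bits


def padded_utilization(bus_bits: int, elem_bits: int) -> float:
    """Payload fraction when each element is first padded to the next
    power-of-two width (an assumed, simplified layout)."""
    padded_bits = 1 << (elem_bits - 1).bit_length()  # e.g. 24 -> 32
    per_word = bus_bits // padded_bits
    return per_word * elem_bits / bus_bits


if __name__ == "__main__":
    bus, elem = 512, 24  # hypothetical bus width and custom-precision type
    print(f"padded: {padded_utilization(bus, elem):.1%}")
    print(f"packed: {packed_utilization(bus, elem):.1%}")
```

Under these assumptions, padding 24-bit elements to 32 bits leaves a quarter of a 512-bit bus unused, while packing whole elements still wastes the 8 bits left over in every word. Recovering that remainder requires letting elements cross word boundaries, which is where a more careful layout becomes necessary.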
Reconfigurable Computing and Hardware/Software Codesign
Article ID 731830 - Editorial. Plaks, T. P.; Santambrogio, Marco Domenico; Sciuto, D.
Exploring the Role of Inter-Organizational Information Systems within SMEs Aggregations
Interorganizational Information Systems (IOIS) will play a relevant role in shaping competition in the coming years. Even though companies have become extremely efficient in managing information and logistics inside their boundaries, communication and coordination among partners are still far from effective. In practice, both obsolete technologies and very little ICT support for interorganizational processes are found. In a global market where the entire supply chain is involved in company success, the proper design and implementation of an IOIS is becoming mandatory. SMEs, and in particular those inside industrial aggregations, could greatly benefit from IOIS implementation; however, a widely accepted IOIS adoption theory is still lacking. Focusing on the description of an industrial aggregation, this paper proposes a framework, its implementation, and a field test on 70 companies belonging to an industrial district, to understand the relationships among the aggregation's main players. The analysis of the results proved that this approach offers useful insight for the comprehension of the aggregation and suggests its use as a pre-design IOIS tool.
6-8 June 200
Dataflow Computing with Polymorphic Registers
Heterogeneous systems are becoming increasingly popular for data processing. They improve the performance of simple kernels applied to large amounts of data. However, sequential data loads may have a negative impact on performance. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. Furthermore, PRF customization exposes specific data path features to the programmer in a convenient way. PRFs allow additional control over the registers' dimensions and the number of elements that can be simultaneously accessed by computational units. This paper shows how PRFs can be integrated into dataflow computational platforms. In particular, starting from annotated source code, we present a compiler-based methodology that automatically generates the customized PRFs and the enhanced computational kernels that efficiently exploit them.
The Case for Polymorphic Registers in Dataflow Computing
Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to efficiently exploit them. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve the throughput up to 56.17X and show that the PRF-augmented system outperforms the GPU for 9×9 or larger mask sizes, even in bandwidth-constrained systems.
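For readers unfamiliar with the case-study operation, the following minimal sketch shows what a separable 2D convolution computes. It is not the article's kernel and does not model PRFs. All function names are hypothetical. The code applies the mask without flipping it, which matches convolution for symmetric masks, and it produces only the "valid" output region.

```python
def conv1d_rows(img, k):
    """Apply 1D mask k along every row (valid region only, no mask flip)."""
    n = len(k)
    return [[sum(row[j + t] * k[t] for t in range(n))
             for j in range(len(row) - n + 1)]
            for row in img]


def transpose(m):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*m)]


def separable_conv2d(img, k_row, k_col):
    """Row pass followed by column pass: kw + kh multiplies per output."""
    tmp = conv1d_rows(img, k_row)
    return transpose(conv1d_rows(transpose(tmp), k_col))


def direct_conv2d(img, k_row, k_col):
    """Reference: full 2D mask k_col[a] * k_row[b], kw * kh multiplies per output."""
    kh, kw = len(k_col), len(k_row)
    h, w = len(img), len(img[0])
    return [[sum(img[i + a][j + b] * k_col[a] * k_row[b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


if __name__ == "__main__":
    image = [[r * 5 + c for c in range(5)] for r in range(5)]
    mask = [1, 2, 1]  # illustrative symmetric 1D mask
    assert separable_conv2d(image, mask, mask) == direct_conv2d(image, mask, mask)
```

Splitting the mask into two 1D passes reduces the work per output element from the product of the mask dimensions to their sum (for a 9×9 mask, 18 multiplications instead of 81), at the cost of an intermediate buffer between the two passes.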
ASSURE: RTL Locking Against an Untrusted Foundry
Semiconductor design companies are integrating proprietary intellectual
property (IP) blocks to build custom integrated circuits (IC) and fabricate
them in a third-party foundry. Unauthorized IC copies cost these companies
billions of dollars annually. While several methods have been proposed for
hardware IP obfuscation, they operate on the gate-level netlist, i.e., after
the synthesis tools embed the semantic information into the netlist. We propose
ASSURE to protect hardware IP modules operating on the register-transfer level
(RTL) description. The RTL approach has three advantages: (i) it allows
designers to obfuscate IP cores generated with many different methods (e.g.,
hardware generators, high-level synthesis tools, and pre-existing IPs); (ii) it
obfuscates the semantics of an IC before logic synthesis; (iii) it does not
require modifications to EDA flows. We perform a cost and security assessment
of ASSURE.
Comment: Submitted to IEEE Transactions on VLSI Systems on 11-Oct-2020,
28-Jan-202