4,581 research outputs found
Recommended from our members
MILO : a microarchitecture and logic optimizer
In this report we discuss strengths and weaknesses of logic synthesis systems and describe a system for microarchitectural and logic optimization. Our system uses a set of algorithms for synthesizing SSI/MSI macros from parameterized microarchitecture components. In addition, it uses rules for optimizing both at the microarchitecture and logic level. The system increases designer productivity and requires less design knowledge and experience from circuit engineers
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications
Heterogeneous High-Performance Computing
(HPC) platforms present a significant programming challenge,
especially because the key users of HPC resources are scientists,
not parallel programmers. We contend that compiler technology
has to evolve to automatically create the best program variant
by transforming a given original program. We have developed a
novel methodology based on type transformations for generating
correct-by-construction design variants, and an associated
light-weight cost model for evaluating these variants for
implementation on FPGAs. In this paper we present a key
enabler of our approach, the cost model. We discuss how we
are able to quickly derive accurate estimates of performance
and resource-utilization from the designâs representation in our
intermediate language. We show results confirming the accuracy
of our cost model by testing it on three different scientific
kernels. We conclude with a case-study that compares a solution
generated by our framework with one from a conventional
high-level synthesis tool, showing better performance and
power-efficiency using our cost model based approach
Optimizing compilation with preservation of structural code coverage metrics to support software testing
Code-coverage-based testing is a widely-used testing strategy with the aim of providing a meaningful decision criterion for the adequacy of a test suite. Code-coverage-based testing is also mandated for the development of safety-critical applications; for example, the DO178b document requires the application of the modified condition/decision coverage. One critical issue of code-coverage testing is that structural code coverage criteria are typically applied to source code whereas the generated machine code may result in a different code structure because of code optimizations performed by a compiler. In this work, we present the automatic calculation of coverage profiles describing which structural code-coverage criteria are preserved by which code optimization, independently of the concrete test suite. These coverage profiles allow to easily extend compilers with the feature of preserving any given code-coverage criteria by enabling only those code optimizations that preserve it. Furthermore, we describe the integration of these coverage profile into the compiler GCC. With these coverage profiles, we answer the question of how much code optimization is possible without compromising the error-detection likelihood of a given test suite. Experimental results conclude that the performance cost to achieve preservation of structural code coverage in GCC is rather low.Peer reviewedSubmitted Versio
Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add
The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and âreal-worldâ application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming active VFU operating at the peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using âreal-worldâ benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion.The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TTIN2015-65316-P.
The work of I. Ratkovic was supported by a FPU research grant from the Spanish MECD.Peer ReviewedPostprint (author's final draft
FlashProfile: A Framework for Synthesizing Data Profiles
We address the problem of learning a syntactic profile for a collection of
strings, i.e. a set of regex-like patterns that succinctly describe the
syntactic variations in the strings. Real-world datasets, typically curated
from multiple sources, often contain data in various syntactic formats. Thus,
any data processing task is preceded by the critical step of data format
identification. However, manual inspection of data to identify the different
formats is infeasible in standard big-data scenarios.
Prior techniques are restricted to a small set of pre-defined patterns (e.g.
digits, letters, words, etc.), and provide no control over granularity of
profiles. We define syntactic profiling as a problem of clustering strings
based on syntactic similarity, followed by identifying patterns that succinctly
describe each cluster. We present a technique for synthesizing such profiles
over a given language of patterns, that also allows for interactive refinement
by requesting a desired number of clusters.
Using a state-of-the-art inductive synthesis framework, PROSE, we have
implemented our technique as FlashProfile. Across tasks over large
real datasets, we observe a median profiling time of only s.
Furthermore, we show that access to syntactic profiles may allow for more
accurate synthesis of programs, i.e. using fewer examples, in
programming-by-example (PBE) workflows such as FlashFill.Comment: 28 pages, SPLASH (OOPSLA) 201
Superlattice Nanowire Pattern Transfer (SNAP)
During the past 15 years or so, nanowires (NWs) have emerged as a new and distinct class of materials. Their novel structural and physical properties separate them from wires that can be prepared using the standard methods for manufacturing electronics. NW-based applications that range from traditional electronic devices (logic and memory) to novel biomolecular and chemical sensors, thermoelectric materials, and optoelectronic devices, all have appeared during the past few years. From a fundamental perspective, NWs provide a route toward the investigation of new physics in confined dimensions.
Perhaps the most familiar fabrication method is the vaporâliquidâsolid (VLS) growth technique, which produces semiconductor nanowires as bulk materials. However, other fabrication methods exist and have their own advantages.
In this Account, I review a particular class of NWs produced by an alternative method called superlattice nanowire pattern transfer (SNAP). The SNAP method is distinct from other nanowire preparation methods in several ways. It can produce large NW arrays from virtually any thin-film material, including metals, insulators, and semiconductors. The dimensions of the NWs can be controlled with near-atomic precision, and NW widths and spacings can be as small as a few nanometers. In addition, SNAP is almost fully compatible with more traditional methods for manufacturing electronics. The motivation behind the development of SNAP was to have a general nanofabrication method for preparing electronics-grade circuitry, but one that would operate at macromolecular dimensions and with access to a broad materials set. Thus, electronics applications, including novel demultiplexing architectures; large-scale, ultrahigh-density memory circuits; and complementary symmetry nanowire logic circuits, have served as drivers for developing various aspects of the SNAP method. Some of that work is reviewed here.
As the SNAP method has evolved into a robust nanofabrication method, it has become an enabling tool for the investigation of new physics. In particular, the application of SNAP toward understanding heat transport in low-dimensional systems is discussed. This work has led to the surprising discovery that Si NWs can serve as highly efficient thermoelectric materials. Finally, we turn toward the application of SNAP to the investigation of quasi-one-dimensional (quasi-1D) superconducting physics in extremely high aspect ratio Nb NWs
- âŠ