OpenPARF: An Open-Source Placement and Routing Framework for Large-Scale Heterogeneous FPGAs with Deep Learning Toolkit
This paper proposes OpenPARF, an open-source placement and routing framework
for large-scale FPGA designs. OpenPARF is implemented with the deep learning
toolkit PyTorch and supports massive parallelization on GPU. The framework
proposes a novel asymmetric multi-electrostatic field system to solve FPGA
placement. It considers fine-grained routing resources inside configurable
logic blocks (CLBs) for FPGA routing and supports large-scale irregular routing
resource graphs. Experimental results on ISPD 2016 and ISPD 2017 FPGA contest
benchmarks and industrial benchmarks demonstrate that OpenPARF can achieve
0.4-12.7% improvement in routed wirelength and more than speedup in
placement. We believe that OpenPARF can pave the road for developing FPGA
physical design engines and stimulate further research on related topics
Serial Biasing Technique for Rapid Single Flux Quantum Circuits
Superconductor electronics based on the Single Flux Quantum (SFQ) technology are considered a strong contender for the ‘beyond CMOS’ future of digital circuits because of the high speed and low power dissipation associated with them. In fact, digital operations beyond tens of GHz have been routinely demonstrated in the SFQ technology. These circuits have widespread applications such as high-speed analog-to-digital conversion, digital signal processing, high speed computing and in emerging topics such as control circuitry for superconducting quantum computing.
Rapid Single Flux Quantum (RSFQ) circuits have emerged as a promising candidate within the SFQ technology, with information encoded in picosecond-wide, millivolt voltage pulses. As is the case with any integrated circuit technology, scalability of RSFQ circuits is essential to realizing their applications. These circuits, based on the Josephson junction, require a DC bias current for correct operation. The DC bias current requirement increases with circuit complexity, and this has multiple implications for circuit operation. Large currents produce magnetic fields that can interfere with logic operation. Furthermore, the heat load delivered to the superconducting chip also increases with current, which could result in the circuit becoming ‘normal’ rather than superconducting. These problems make reduction of the bias current necessary.
Serial Biasing (SB) is a bias current reduction technique that has been proposed in the past. In this technique, a digital circuit is partitioned into multiple identical islands, and bias current is provided to each island in a serial manner. While this scheme is promising, it poses multiple challenges, including the design of a driver-receiver pair circuit with robust and wide operating bias margins and current management on the floating islands.
This thesis investigates SB in a systematic manner, focusing on the design and measurement of the fundamental components of this technique with an emphasis on reliability and scalability. It presents circuit techniques for achieving high-speed serially biased RSFQ circuits with robust operating margins, along with experimental evidence to support the ideas. It develops a framework for serial biasing that could be used by electronic design tools to automate design and synthesis of complex RSFQ circuits. It also investigates Passive Transmission Lines (PTLs) for use as passive interconnects between library cells in a complex design, reducing the DC bias current required by the active circuitry.
Meso-scale FDM material layout design strategies under manufacturability constraints and fracture conditions
In the manufacturability-driven design (MDD) perspective, manufacturability of the product or system is the most important of the design requirements. In addition to being able to ensure that complex designs (e.g., topology optimization) are manufacturable with a given process or process family, MDD also helps mechanical designers to take advantage of unique process-material effects generated during manufacturing. One of the most recognizable examples of this comes from the scanning-type family of additive manufacturing (AM) processes; the most notable and familiar member of this family is the fused deposition modeling (FDM) or fused filament fabrication (FFF) process. This process works by selectively depositing uniform, approximately isotropic beads or elements of molten thermoplastic material (typically structural engineering plastics) in a series of pre-specified traces to build each layer of the part. There are many interesting 2-D and 3-D mechanical design problems that can be explored by designing the layout of these elements. The resulting structured, hierarchical material (which is both manufacturable and customized layer-by-layer within the limits of the process and material) can be defined as a manufacturing process-driven structured material (MPDSM). This dissertation explores several practical methods for designing these element layouts for 2-D and 3-D meso-scale mechanical problems, focusing ultimately on design-for-fracture. Three different fracture conditions are explored: (1) cases where a crack must be prevented or stopped, (2) cases where the crack must be encouraged or accelerated, and (3) cases where cracks must grow in a simple pre-determined pattern. 
Several new design tools were developed and refined to support the design of MPDSMs under fracture conditions: a mapping method for the FDM manufacturability constraints; three major literature reviews; the collection, organization, and analysis of several large (qualitative and quantitative) multi-scale datasets on the fracture behavior of FDM-processed materials; some new experimental equipment; and a refined fast and simple g-code generator based on commercially available software. The refined design method and rules were experimentally validated at the end of the dissertation using a series of case studies involving both design and physical testing of the designs. Finally, a simple design guide for practicing engineers who are experts in neither advanced solid mechanics nor process-tailored materials was developed from the results of this project.
Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules
We target the problem of automatically synthesizing proofs of semantic
equivalence between two programs made of sequences of statements. We represent
programs using abstract syntax trees (AST), where a given set of
semantics-preserving rewrite rules can be applied on a specific AST pattern to
generate a transformed and semantically equivalent program. In our system, two
programs are equivalent if there exists a sequence of application of these
rewrite rules that leads to rewriting one program into the other. We propose a
neural network architecture based on a transformer model to generate proofs of
equivalence between program pairs. The system outputs a sequence of rewrites,
and the validity of the sequence is simply checked by verifying it can be
applied. If no valid sequence is produced by the neural network, the system
reports the programs as non-equivalent, ensuring by design no programs may be
incorrectly reported as equivalent. Our system is fully implemented for a given
grammar which can represent straight-line programs with function calls and
multiple types. To efficiently train the system to generate such sequences, we
develop an original incremental training technique, named self-supervised
sample selection. We extensively study the effectiveness of this novel training
approach on proofs of increasing complexity and length. Our system, S4Eq,
achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent
programs. Comment: 30 pages including appendi
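The checking step described in this abstract (a proposed rewrite sequence is accepted only if every step can actually be applied and the result matches the other program) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from S4Eq: the tuple-based expression trees, the three toy rules (`commute`, `add_zero`, `mul_one`), the path encoding, and the function names are hypothetical, and the sketch handles single expressions rather than full straight-line programs.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple, Union

# Hypothetical AST encoding: a leaf is a string, an inner node is (op, left, right).
Expr = Union[str, tuple]
Rule = Callable[[Expr], Optional[Expr]]


def commute(node: Expr) -> Optional[Expr]:
    """a + b -> b + a (same for *). Returns None when the pattern does not match."""
    if isinstance(node, tuple) and node[0] in ("+", "*"):
        return (node[0], node[2], node[1])
    return None


def add_zero(node: Expr) -> Optional[Expr]:
    """x + 0 -> x."""
    if isinstance(node, tuple) and node[0] == "+" and node[2] == "0":
        return node[1]
    return None


def mul_one(node: Expr) -> Optional[Expr]:
    """x * 1 -> x."""
    if isinstance(node, tuple) and node[0] == "*" and node[2] == "1":
        return node[1]
    return None


# Toy rule set (illustrative only, not the paper's rule set).
RULES: Dict[str, Rule] = {"commute": commute, "add_zero": add_zero, "mul_one": mul_one}


def rewrite_at(expr: Expr, path: Sequence[int], rule: Rule) -> Optional[Expr]:
    """Apply `rule` to the sub-tree reached via `path` (child indices 1 or 2)."""
    if not path:
        return rule(expr)
    if not isinstance(expr, tuple) or path[0] not in (1, 2):
        return None
    idx = path[0]
    child = rewrite_at(expr[idx], path[1:], rule)
    if child is None:
        return None
    return expr[:idx] + (child,) + expr[idx + 1:]


def check_proof(source: Expr, target: Expr,
                proof: Sequence[Tuple[str, Tuple[int, ...]]]) -> bool:
    """Accept the proof only if every step applies and the result equals target."""
    current = source
    for rule_name, path in proof:
        rule = RULES.get(rule_name)
        if rule is None:
            return False
        current = rewrite_at(current, path, rule)
        if current is None:
            return False
    return current == target
```

For example, `check_proof(("*", ("+", "0", "a"), "b"), ("*", "a", "b"), [("commute", (1,)), ("add_zero", (1,))])` accepts, while dropping the `commute` step makes `add_zero` inapplicable and the proof is rejected. Because acceptance depends only on replaying the steps, a checker of this shape cannot report two programs as equivalent unless a valid rewrite sequence was supplied.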
BRAMAC: Compute-in-BRAM Architectures for Multiply-Accumulate on FPGAs
Deep neural network (DNN) inference using reduced integer precision has been
shown to achieve significant improvements in memory utilization and compute
throughput with little or no accuracy loss compared to full-precision
floating-point. Modern FPGA-based DNN inference relies heavily on the on-chip
block RAM (BRAM) for model storage and the digital signal processing (DSP) unit
for implementing the multiply-accumulate (MAC) operation, a fundamental DNN
primitive. In this paper, we enhance the existing BRAM to also compute MAC by
proposing BRAMAC (Compute-in-BRAM Architectures for
Multiply-Accumulate). BRAMAC supports
2's complement 2- to 8-bit MAC in a small dummy BRAM array using a hybrid
bit-serial & bit-parallel data flow. Unlike previous compute-in-BRAM
architectures, BRAMAC allows read/write access to the main BRAM array while
computing in the dummy BRAM array, enabling both persistent and tiling-based
DNN inference. We explore two BRAMAC variants: BRAMAC-2SA (with 2 synchronous
dummy arrays) and BRAMAC-1DA (with 1 double-pumped dummy array).
BRAMAC-2SA/BRAMAC-1DA can boost the peak MAC throughput of a large Arria-10
FPGA by 2.6×/2.1×, 2.3×/2.0×, and 1.9×/1.7× for 2-bit, 4-bit, and 8-bit
precisions, respectively, at the cost of a 6.8%/3.4% increase in the FPGA core
area. By adding BRAMAC-2SA/BRAMAC-1DA to a state-of-the-art tiling-based DNN
accelerator, an average speedup of 2.05×/1.7× and 1.33×/1.52× can be achieved
for AlexNet and ResNet-34, respectively, across different model precisions.
Comment: 11 pages, 13 figures, 3 tables, FCCM conference 202
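As background for the 2's complement bit-serial MAC mentioned above, the following is a small software model of the general idea: the weight is consumed one bit at a time, each set bit adds a shifted copy of the input, and the most significant bit carries a negative weight. This is only a sketch of generic bit-serial arithmetic under assumed names (`bit_serial_multiply`, `bit_serial_mac`); it does not model BRAMAC's hybrid bit-serial and bit-parallel data flow, its dummy array, or any hardware timing.

```python
from typing import List, Sequence


def to_twos_complement_bits(value: int, nbits: int) -> List[int]:
    """Return the nbits-wide 2's complement encoding of value, LSB first."""
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {nbits}-bit 2's complement")
    masked = value & ((1 << nbits) - 1)
    return [(masked >> i) & 1 for i in range(nbits)]


def bit_serial_multiply(x: int, w: int, nbits: int) -> int:
    """Multiply x by w while processing w one bit per step."""
    acc = 0
    for i, bit in enumerate(to_twos_complement_bits(w, nbits)):
        if bit:
            partial = x << i  # input shifted to the weight of bit i
            # In 2's complement the MSB has weight -2^(nbits-1), so subtract it.
            acc += -partial if i == nbits - 1 else partial
    return acc


def bit_serial_mac(inputs: Sequence[int], weights: Sequence[int], nbits: int) -> int:
    """Accumulate the bit-serial products of paired inputs and weights."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = 0
    for x, w in zip(inputs, weights):
        acc += bit_serial_multiply(x, w, nbits)
    return acc
```

With 4-bit weights, `bit_serial_mac([3, -4, 7], [-2, 5, -8], 4)` yields the same value as the ordinary dot product of the two lists. The number of loop steps per product equals the weight precision, which is why lower precisions finish in fewer steps in a bit-serial scheme.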
Cardiovascular diseases prediction by machine learning incorporation with deep learning
The causes of cardiovascular disease (CVD) remain unknown, but it is associated with a high risk of death, as well as severe morbidity and disability. There is an urgent need for AI-based technologies that can promptly and reliably predict the future outcomes of individuals who have cardiovascular disease. The Internet of Things (IoT) is serving as a driving force behind the development of CVD prediction. Machine learning (ML) is used to analyse the data that IoT devices receive and to make predictions from it. Traditional machine learning algorithms are unable to take differences in the data into account and have a low level of accuracy in their model predictions. This research presents a collection of machine learning models that can be used to address this problem. These models take into account the data observation mechanisms and training procedures of a number of different algorithms. To verify the efficacy of our strategy, we combined the Heart Dataset with other classification models. The proposed method achieves nearly 96 percent accuracy, higher than other existing methods, and a complete analysis over several metrics is provided. Research in the field of deep learning will benefit from additional data from a large number of medical institutions, which may be used for the development of artificial neural network structures.
StringENT test suite: ENT battery revisited for efficient P value computation
Random numbers play a key role in a wide variety of applications, ranging from mathematical simulation to cryptography. Generating random or pseudo-random numbers is not an easy task, especially when hardware, time and energy constraints are considered. In order to assess whether generators behave in a random fashion, there are several statistical test batteries. ENT is one of the simplest and most popular, at least in part due to its efficacy and speed. Nonetheless, only one of the tests of this suite provides a p value, which is the most useful and standard way to determine whether the randomness hypothesis holds, for a certain significance level. As a consequence of this, rather arbitrary and at times misleading bounds are set in order to decide which intervals are acceptable for its results. This paper introduces an extension of the battery, named StringENT, which, while sticking to the fast speed that makes ENT popular and useful, still succeeds in providing p values with which sound decisions can be made about the randomness of a sequence. It also highlights a flagrant randomness flaw that the classical ENT battery is not capable of detecting but the new StringENT notices, and introduces two additional tests
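To make concrete how a test statistic is turned into a p value and compared against a significance level, here is a minimal sketch using the well-known monobit frequency test. This is an assumption chosen for illustration: it is not claimed to be one of the ENT or StringENT tests, and the function names and the default significance level of 0.01 are hypothetical.

```python
import math


def monobit_p_value(data: bytes) -> float:
    """p value of the monobit frequency test on the bits of data."""
    n = len(data) * 8
    if n == 0:
        raise ValueError("need at least one byte")
    ones = sum(bin(byte).count("1") for byte in data)
    s = 2 * ones - n  # +1 for every one-bit, -1 for every zero-bit
    s_obs = abs(s) / math.sqrt(n)
    # Under the randomness hypothesis s_obs is approximately half-normal.
    return math.erfc(s_obs / math.sqrt(2))


def passes_monobit(data: bytes, alpha: float = 0.01) -> bool:
    """Fail to reject the randomness hypothesis when p >= alpha."""
    return monobit_p_value(data) >= alpha
```

A sequence of all-zero bytes is rejected, whereas a perfectly balanced but obviously patterned input such as `b"\x0f" * 100` passes this single test. That limitation of any one statistic is one reason batteries combine several tests.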
An Improved Latch for SerDes Interface: Design and Analysis under PVT and AC Noise
Digital subsystems prefer a CMOS process, but it is difficult to manage the trade-off between speed and average power (Pavg) in each era of power supply voltage (Vdd) scaling. Current mode logic (CML) has emerged as an alternative for designing the fundamental block of a SerDes, namely, the latch. However, available CML circuits consume significant Pavg and suffer from rapid input slewing. Typically, fast-switching inputs enable current flow to the effective supply voltage VP and overcharge the output. In fact, VP differs from the externally applied Vdd and oscillates over time whenever an abrupt current is drawn. This affects the delay td and introduces jitter. This work presents a new latch for the SerDes interface that uses a new current steering circuit and is coupled to a power delivery network (PDN). The key goal is to attain an almost constant td, in comparison to conventional designs, while Vdd changes. The post-layout results at 0.09-μm CMOS and 1.1 V Vdd indicate that Pavg and td are 339.5 µW and 61.9 ps, respectively, at 27 °C. Surprisingly, the td variation is noted to be minimal and the power-supply-noise-induced jitter is around 1.5 ns when VP close to the circuit varies due to sudden current
Optimisation of Triboelectric Nanogenerator performance in vertical contact-separation mode
The triboelectric nanogenerator (TENG) is one of the most promising energy harvesters: a technology that uses repeated or reciprocating contact of suitably chosen materials to generate charge via the triboelectric effect (TE) and converts this into usable voltage and current. TENGs are attractive because they can continuously generate charge over a wide range of operating conditions and offer several advantages, such as light weight, simple structure, low cost and high efficiency. TENGs have therefore been explored in a wide range of applications, including self-powered wearable electronics, powering electronics and even harvesting ocean wave/wind energy. One of the major limitations of TENGs is their low power output (usually <500 W/m²). This thesis focuses on a few specific approaches to optimising TENG output performance. It begins by presenting a solution to this challenge: optimizing a low permittivity substrate beneath the tribo-contact layer. The open circuit voltage is found to increase by a factor of 1.3 in moving from PET to the lower permittivity PTFE. TENG performance is also believed to depend on contact force, but the origin of the dependence had not previously been explored. Herein, we show that this behaviour results from a contact-force-dependent real contact area Ar, as governed by surface roughness. The open circuit voltage Voc, short circuit current Isc and Ar for a TENG were found to increase with contact force/pressure. Critically, Voc and Isc saturate at the same contact pressure as Ar, suggesting that electrical output follows the same evolution as Ar. Assuming that tribo charges can only transfer across the interface at areas of real contact, it follows that an increasing Ar with contact pressure should produce a corresponding increase in the electrical output.
These results underline the importance of accounting for real contact area in TENG design, as well as the distinction between real and nominal contact area in tribo-charge density definition. High-performance ferroelectric-assisted TENGs (Fe-TENGs) are developed in this thesis using electrospun fibrous surfaces based on P(VDF-TrFE) with dispersed BaTiO3 (BTO) nanofillers in either cubic (CBTO) or tetragonal (TBTO) form. TENGs with three types of tribo-negative surface were investigated, and output increased progressively. Critically, P(VDF-TrFE)/TBTO produced higher output than P(VDF-TrFE)/CBTO even though permittivity is nearly identical. Thus, it is shown that BTO fillers boost output not just by increasing permittivity, but also by enhancing the crystallinity and amount of the β-phase (as TBTO produced a more crystalline β-phase present in greater amounts).