12 research outputs found
Electronic device comprising a memory accessible via a JTAG interface, and corresponding method of accessing a memory
An electronic device includes a processing unit with a memory, a JTAG interface with test-data-input and test-mode-select lines coupled to the processing unit, a bridge circuit, and a multiplexer circuit. The bridge circuit includes a serial communication interface receiving a serial data input signal which conveys an input serial data frame. The bridge circuit includes a serial-to-parallel converter circuit block receiving the input serial data frame, processing the input serial data frame to read first and second subsets of input binary values therefrom, and transmitting the first subset via a first output signal and the second subset via a second output signal. The multiplexer circuit selectively propagates a received test-data-input signal or the first output signal to the test data input line, and selectively propagates a test-mode-select signal or the second output signal to the test mode select line of the JTAG interface.
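The data path described above can be sketched in software. This is only an illustrative model: the abstract does not state how the two subsets are laid out inside the serial frame, so the interleaved layout below (even positions for test-data-input, odd positions for test-mode-select) and the names `split_frame` and `mux` are assumptions, not the patented circuit.

```python
def split_frame(frame_bits):
    """Split a serial frame into two subsets of bits.

    ASSUMPTION: bits are interleaved, even positions carrying the
    test-data-input (TDI) values and odd positions carrying the
    test-mode-select (TMS) values. The real frame format is not
    given in the abstract.
    """
    if len(frame_bits) % 2 != 0:
        raise ValueError("frame must hold an even number of bits")
    tdi_bits = frame_bits[0::2]  # first subset -> first output signal
    tms_bits = frame_bits[1::2]  # second subset -> second output signal
    return tdi_bits, tms_bits


def mux(select_bridge, pin_value, bridge_value):
    """Two-input multiplexer: forward the bridge output when selected,
    otherwise forward the value received on the external JTAG pin."""
    return bridge_value if select_bridge else pin_value


if __name__ == "__main__":
    tdi, tms = split_frame([1, 0, 0, 1, 1, 1])
    # Drive the JTAG lines from the bridge, one (TDI, TMS) pair per clock.
    for d, m in zip(tdi, tms):
        print("TDI =", mux(True, 0, d), "TMS =", mux(True, 0, m))
```

With `select_bridge` set to `False`, the same `mux` calls would pass the external pin values through unchanged, which corresponds to normal JTAG access.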
Mix & Latch: High-Performance Designs with Single-Clock Mixed-Polarity Latches and Flip-Flops
The abstract is in the attachment.
Design and Optimization of Residual Neural Network Accelerators for Low-Power FPGAs Using High-Level Synthesis
Residual neural networks are widely used in computer vision tasks. They
enable the construction of deeper and more accurate models by mitigating the
vanishing gradient problem. Their main innovation is the residual block which
allows the output of one layer to bypass one or more intermediate layers and be
added to the output of a later layer. Their complex structure and the buffering
required by the residual block make them difficult to implement on
resource-constrained platforms. We present a novel design flow for implementing
deep learning models for field programmable gate arrays optimized for ResNets,
using a strategy to reduce their buffering overhead to obtain a
resource-efficient implementation of the residual layer. Our high-level
synthesis (HLS)-based flow encompasses a thorough set of design principles and
optimization strategies, exploiting in novel ways standard techniques such as
temporal reuse and loop merging to efficiently map ResNet models, and
potentially other skip connection-based NN architectures, onto FPGAs. The models
are quantized to 8-bit integers for both weights and activations, 16-bit for
biases, and 32-bit for accumulations. The experimental results are obtained on
the CIFAR-10 dataset using ResNet8 and ResNet20 implemented with Xilinx FPGAs
using HLS on the Ultra96-V2 and Kria KV260 boards. Compared to the
state-of-the-art on the Kria KV260 board, our ResNet20 implementation achieves
2.88X speedup with 0.5% higher accuracy of 91.3%, while ResNet8 accuracy
improves by 2.8% to 88.7%. The throughputs of ResNet8 and ResNet20 are 12971
FPS and 3254 FPS on the Ultra96 board, and 30153 FPS and 7601 FPS on the Kria
KV260, respectively. They Pareto-dominate state-of-the-art solutions concerning
accuracy, throughput, and energy.
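The bit widths quoted above (8-bit weights and activations, 16-bit biases, 32-bit accumulations) can be illustrated with a small integer model. This is a minimal sketch, not the authors' HLS code: the function names are hypothetical, and the width used for the skip-connection addition in `residual_add` is an assumption, since the abstract does not specify it.

```python
INT8 = (-128, 127)
INT16 = (-32768, 32767)
INT32 = (-2**31, 2**31 - 1)


def check_range(value, bounds, name):
    """Raise if value does not fit the signed integer range in bounds."""
    lo, hi = bounds
    if not lo <= value <= hi:
        raise OverflowError(f"{name}={value} outside [{lo}, {hi}]")
    return value


def quantized_dot(weights, activations, bias):
    """Dot product with int8 operands, an int16 bias, and an int32 accumulator."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    acc = check_range(bias, INT16, "bias")
    for w, a in zip(weights, activations):
        check_range(w, INT8, "weight")
        check_range(a, INT8, "activation")
        # Each int8*int8 product fits in 16 bits; the running sum is kept in 32.
        acc = check_range(acc + w * a, INT32, "accumulator")
    return acc


def residual_add(skip, branch):
    """Skip connection: add the bypassed values to the branch output.

    ASSUMPTION: the sum is checked against a 32-bit range here purely
    for illustration.
    """
    if len(skip) != len(branch):
        raise ValueError("skip and branch must have the same length")
    return [check_range(s + b, INT32, "sum") for s, b in zip(skip, branch)]


if __name__ == "__main__":
    print(quantized_dot([127, -128, 3], [127, 127, -2], 1000))
    print(residual_add([1, 2], [3, -5]))
```

The range checks stand in for the fixed-width types an HLS tool would enforce in hardware; in plain Python, integers never overflow on their own.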
LESS: Low-Power Energy-Efficient Subgraph Isomorphism on FPGA
Low-power energy-efficient subgraph isomorphism (LESS) is an open-source, field-programmable gate array-only, low-memory subgraph matching solver designed for energy efficiency. Depending on the input data graph, the energy consumption of LESS, averaged over diverse queries, is up to 38x and 93x lower than that of CPU and GPU solvers, respectively.
Mix & Latch: An Optimization Flow for High-Performance Designs with Single-Clock Mixed-Polarity Latches and Flip-Flops
Flip-flops are the most used sequential elements in synchronous circuits, but designs based on latches can operate at higher frequencies and occupy less area. Techniques to increase the maximum operating frequency of flip-flop based designs, such as time-borrowing, rely on tight hold constraints that are difficult to satisfy using traditional back-end optimization techniques. We propose Mix & Latch, a methodology to increase the operating frequency of synchronous digital circuits using a single clock tree and a mixed distribution of positive- and negative-edge-triggered flip-flops, and positive- and negative-level-sensitive latches. An efficient mathematical model is proposed to optimize the type and location of the sequential elements of the circuit. We ensure that the initial registers are not moved from their initial location, although they may change type, thus allowing the use of equivalence checking and static timing analysis to formally verify the correctness of the transformation. The technique is validated using a 28nm CMOS FDSOI technology, obtaining 1.33X post-layout average operating frequency improvement on a broad set of benchmarks over a standard commercial design flow. Additionally, the circuit area was also reduced by more than 1.19X on average for the same benchmarks, although the overall area reduction is not a goal of the optimization algorithm. To the best of our knowledge, this is the first work that proposes combining mixed-polarity flip-flops and latches to improve the circuit performance.
Mix & Latch: Comparison with state-of-the-art retiming on a RISC-V benchmark
Flip-flops (FFs) are the most commonly used sequential elements in synchronous circuits, but their timing requirements limit the operating frequency. Borrowing time with a latch-based approach can increase operating frequency, but traditional backend optimization tools struggle to manage hold time requirements. The Mix & Latch technique achieves higher frequencies and often lower area than commercial state-of-the-art retiming by exploiting four types of synchronous sequential gates, namely positive and negative edge-triggered FFs and positive and negative transparent latches, all using a single clock tree. In this paper we first significantly accelerate the Mix & Latch flow convergence with respect to past work, by using a post-synthesis-based timing analysis that eliminates the first placement and routing needed for post-layout timing analysis. Then, by adding tolerance margins to the timing model, the pessimism is reduced to improve both convergence speed and maximum frequency. Finally, we reduce the complexity of the problem by applying the methodology only to the sequential elements belonging to critical paths. The effectiveness of Mix & Latch is then demonstrated on a RISC-V processor core from the Pulp platform using 28nm CMOS FDSOI technology. The results are compared to both the original Mix & Latch flow and a retiming performed with a state-of-the-art tool, showing a 25% frequency improvement over the original flow and 7.5% over the retiming flow. Compared to the retiming flow, we achieve comparable or lower power and area, while preserving the original registers and allowing logic equivalence checking.
Protection and characterization of an open source soft core against radiation effects.
The effects of external radiation on electronic systems are becoming more and more evident as the technologies used to produce integrated circuits scale down; to reduce these effects, specific techniques are applied during the design and production of electronic devices. These problems are crucial in the automotive, space, and high energy physics fields. Because of their low cost and high versatility, FPGAs are replacing custom solutions and radiation-hardened microcontrollers in electronic systems working in harsh radiation environments; in high energy physics experiments in particular, the high bandwidth guaranteed by commercial SRAM-based FPGAs is especially useful for the readout electronics. Despite these advantages, a microcontroller able to easily perform common automation jobs, such as communicating with other systems, may be necessary to avoid growing firmware complexity in FPGA-based systems; for this reason, soft cores are programmed on FPGAs. The use of soft microcontrollers in harsh radiation environments requires specific techniques aimed at reducing the possibility of misbehavior during the operational time of the device. The purpose of this study is, starting from a core already designed and tested at the functional level, to explore the techniques that could be applied to harden the core using both commercial tools and custom approaches. The whole research has been performed in the framework of the ITS (Inner Tracking System) detector update for the ALICE experiment; the design has been developed and tested on a Kintex UltraScale XCKU040 FPGA from Xilinx.
Mix & Latch: Comparison With State-of-the-Art Retiming On a RISC-V Benchmark
Flip-flops (FFs) are the most commonly used sequential elements in synchronous circuits, but their timing requirements limit the operating frequency. Borrowing time with a latch-based approach can increase operating frequency, but traditional backend optimization tools struggle to manage hold time requirements. The Mix & Latch technique achieves higher frequencies and often lower area than commercial state-of-the-art retiming by exploiting four types of synchronous sequential gates, namely positive and negative edge-triggered FFs and positive and negative transparent latches, all using a single clock tree. In this paper we first significantly accelerate the Mix & Latch flow convergence with respect to past work, by using a post-synthesis-based timing analysis that eliminates the first placement and routing needed for post-layout timing analysis. Then, by adding tolerance margins to the timing model, the pessimism is reduced to improve both convergence speed and maximum frequency. Finally, we reduce the complexity of the problem by applying the methodology only to the sequential elements belonging to critical paths. The effectiveness of Mix & Latch is then demonstrated on a RISC-V processor core from the Pulp platform using 28nm CMOS FDSOI technology. The results are compared to both the original Mix & Latch flow and a retiming performed with a state-of-the-art tool, showing a 25% frequency improvement over the original flow and 7.5% over the retiming flow. Compared to the retiming flow, we achieve comparable or lower power and area, while preserving the original registers and allowing logic equivalence checking
Day +60 WT1 assessment on CD34 selected bone marrow better predicts relapse and mortality after allogeneic stem cell transplantation in acute myeloid leukemia patients
The aim of this study was to evaluate the role of WT1 expression after allogeneic stem cell transplantation (alloHSCT) in patients with acute myeloid leukemia (AML). We studied WT1 expression in bone marrow cells from 50 patients in complete remission on day +60 after transplant. WT1 was assessed on unfractionated bone marrow mononuclear cells (MNC) and on CD34+ selected cells (CD34+). A ROC curve analysis identified 800 WT1 copies on CD34+ selected cells as the best cut-off for predicting relapse (AUC 0.842, p=0.0006, 85.7% sensitivity and 81.6% specificity), and 100 copies in MNC (AUC 0.819, p=0.007, 83.3% sensitivity and 88.2% specificity). Using the 800 WT1 copy cut-off in CD34+ cells, the 2-year cumulative incidence of relapse was 12% vs 38% (p=0.005), and 2-year survival was 88% vs 55% (p=0.02). Using the 100 WT1 copy cut-off in unfractionated MNC, the 2-year cumulative incidence of relapse was 13% vs 44% (p=0.01) and 2-year survival was 88% vs 55% (p=0.08). In a multivariate Cox analysis, WT1 expression in CD34+ cells proved to be highly predictive of relapse (p=0.004); WT1 expression on unfractionated cells also predicted relapse (p=0.03). In conclusion, day +60 WT1 expression after allogeneic HSCT is a significant predictor of relapse, particularly when tested on CD34+ selected bone marrow cells.
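To make the sensitivity and specificity figures above concrete, the following minimal sketch shows how both are computed once a cut-off is fixed. The function name, the choice of a strict "greater than" comparison, and the sample values are all assumptions for illustration; the values are invented and are not patient data from the study.

```python
def sensitivity_specificity(values, relapsed, cutoff):
    """Return (sensitivity, specificity) for the rule 'value > cutoff predicts relapse'.

    ASSUMPTION: a measurement strictly above the cut-off counts as a
    positive test; the abstract does not say how ties are handled.
    """
    tp = fn = tn = fp = 0
    for value, did_relapse in zip(values, relapsed):
        test_positive = value > cutoff
        if did_relapse and test_positive:
            tp += 1  # relapse correctly flagged
        elif did_relapse:
            fn += 1  # relapse missed by the test
        elif test_positive:
            fp += 1  # flagged, but no relapse
        else:
            tn += 1  # correctly not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one relapsed and one non-relapsed case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical WT1 copy numbers and outcomes, made up for this example.
    copies = [1200, 950, 300, 100, 850, 50]
    outcome = [True, True, True, False, False, False]
    sens, spec = sensitivity_specificity(copies, outcome, cutoff=800)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

A ROC curve analysis repeats this calculation over many candidate cut-offs and plots sensitivity against one minus specificity; the cut-off reported as "best" depends on the selection criterion, which the abstract does not name.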
Pre-transplant gene profiling characterization by next-generation DNA sequencing might predict relapse occurrence after hematopoietic stem cell transplantation in patients affected by AML
Background: In the last decade, many steps forward have been made in acute myeloid leukemia prognostic stratification, adding next-generation sequencing techniques to the conventional molecular assays. This resulted in the revision of the current risk classification and the introduction of new target therapies.
Aims and methods: We wanted to evaluate the prognostic impact of acute myeloid leukemia (AML) mutational pattern on relapse occurrence and survival after allogeneic stem cell transplantation. A specific next-generation sequencing (NGS) panel containing 26 genes was designed for the study. Ninety-six patients studied with NGS at diagnosis were included and retrospectively studied for post-transplant outcomes.
Results: Only eight patients did not show any mutations. Multivariate Cox regression revealed FLT3 (HR, 3.36; p=0.02), NRAS (HR, 4.78; p=0.01), TP53 (HR, 4.34; p=0.03), and WT1 (HR 5.97; p=0.005) mutations as predictive variables for relapse occurrence after transplantation. Other independent variables for relapse recurrence were donor age (HR, 0.97; p=0.04), the presence of an adverse cytogenetic risk at diagnosis (HR, 3.03; p=0.04), and the obtainment of complete remission of the disease before transplantation (HR, 0.23; p=0.001). Overall survival appeared to be affected only by grade 2-4 acute GvHD occurrence (HR, 2.29; p=0.05) and relapse occurrence (HR, 4.33; p=0.0001) in multivariate analysis.
Conclusions: The small number of patients and the retrospective design of the study might limit the strength of our findings. Although results on TP53, FLT3, and WT1 were comparable to previous reports, the interesting data on NRAS deserve attention.