65 research outputs found

    Architectural Vulnerability Factor (AVF) Assessment of x86 CPUs using Architectural Correct Execution (ACE) analysis in the Gem5 Simulator

    Get PDF
    Ο συντελεστής AVF έχει υπολογιστεί για 10 διαφορετικά προγράμματα τόσο για το αρχείο φυσικών ακέραιων καταχωρητών του Gem5 όσο και για τη μνήμη δεδομένων πρώτου επιπέδου. Για κάθε ένα απο αυτά έχουν υπολογιστεί οι χρόνοι εκτέλεσής τους καθώς και τα ευάλωτα διαστήματά τους.In this study, we computed AVF for ten different benchmarks in two different microarchitectural modules, the integer physical integer register file and the L1 Data Cache of Gem5. For each benchmark statistics about its runtime and ACE interval time are reported

    Efficient design-space exploration of custom instruction-set extensions

    Get PDF
    Customization of processors with instruction set extensions (ISEs) is a technique that improves performance through parallelization with a reasonable area overhead, in exchange for additional design effort. This thesis presents a collection of novel techniques that reduce the design effort and cost of generating ISEs by advancing automation and reconfigurability. In addition, these techniques maximize the perfomance gained as a function of the additional commited resources. Including ISEs into a processor design implies development at many levels. Most prior works on ISEs solve separate stages of the design: identification, selection, and implementation. However, the interations between these stages also hold important design trade-offs. In particular, this thesis addresses the lack of interaction between the hardware implementation stage and the two previous stages. Interaction with the implementation stage has been mostly limited to accurately measuring the area and timing requirements of the implementation of each ISE candidate as a separate hardware module. However, the need to independently generate a hardware datapath for each ISE limits the flexibility of the design and the performance gains. Hence, resource sharing is essential in order to create a customized unit with multi-function capabilities. Previously proposed resource-sharing techniques aggressively share resources amongst the ISEs, thus minimizing the area of the solution at any cost. However, it is shown that aggressively sharing resources leads to large ISE datapath latency. Thus, this thesis presents an original heuristic that can be parameterized in order to control the degree of resource sharing amongst a given set of ISEs, thereby permitting the exploration of the existing implementation trade-offs between instruction latency and area savings. In addition, this thesis introduces an innovative predictive model that is able to quickly expose the optimal trade-offs of this design space. Compared to an exhaustive exploration of the design space, the predictive model is shown to reduce by two orders of magnitude the number of executions of the resource-sharing algorithm that are required in order to find the optimal trade-offs. This thesis presents a technique that is the first one to combine the design spaces of ISE selection and resource sharing in ISE datapath synthesis, in order to offer the designer solutions that achieve maximum speedup and maximum resource utilization using the available area. Optimal trade-offs in the design space are found by guiding the selection process to favour ISE combinations that are likely to share resources with low speedup losses. Experimental results show that this combined approach unveils new trade-offs between speedup and area that are not identified by previous selection techniques; speedups of up to 238% over previous selection thecniques were obtained. Finally, multi-cycle ISEs can be pipelined in order to increase their throughput. However, it is shown that traditional ISE identification techniques do not allow this optimization due to control flow overhead. In order to obtain the benefits of overlapping loop executions, this thesis proposes to carefully insert loop control flow statements into the ISEs, thus allowing the ISE to control the iterations of the loop. The proposed ISEs broaden the scope of instruction-level parallelism and obtain higher speedups compared to traditional ISEs, primarily through pipelining, the exploitation of spatial parallelism, and reducing the overhead of control flow statements and branches. A detailed case study of a real application shows that the proposed method achieves 91% higher speedups than the state-of-the-art, with an area overhead of less than 8% in hardware implementation

    Increasing the efficacy of automated instruction set extension

    Get PDF
    The use of Instruction Set Extension (ISE) in customising embedded processors for a specific application has been studied extensively in recent years. The addition of a set of complex arithmetic instructions to a baseline core has proven to be a cost-effective means of meeting design performance requirements. This thesis proposes and evaluates a reconfigurable ISE implementation called “Configurable Flow Accelerators” (CFAs), a number of refinements to an existing Automated ISE (AISE) algorithm called “ISEGEN”, and the effects of source form on AISE. The CFA is demonstrated repeatedly to be a cost-effective design for ISE implementation. A temporal partitioning algorithm called “staggering” is proposed and demonstrated on average to reduce the area of CFA implementation by 37% for only an 8% reduction in acceleration. This thesis then turns to concerns within the ISEGEN AISE algorithm. A methodology for finding a good static heuristic weighting vector for ISEGEN is proposed and demonstrated. Up to 100% of merit is shown to be lost or gained through the choice of vector. ISEGEN early-termination is introduced and shown to improve the runtime of the algorithm by up to 7.26x, and 5.82x on average. An extension to the ISEGEN heuristic to account for pipelining is proposed and evaluated, increasing acceleration by up to an additional 1.5x. An energyaware heuristic is added to ISEGEN, which reduces the energy used by a CFA implementation of a set of ISEs by an average of 1.6x, up to 3.6x. This result directly contradicts the frequently espoused notion that “bigger is better” in ISE. The last stretch of work in this thesis is concerned with source-level transformation: the effect of changing the representation of the application on the quality of the combined hardwaresoftware solution. A methodology for combined exploration of source transformation and ISE is presented, and demonstrated to improve the acceleration of the result by an average of 35% versus ISE alone. Floating point is demonstrated to perform worse than fixed point, for all design concerns and applications studied here, regardless of ISEs employed

    Methodology for complex dataflow application development

    Get PDF
    This thesis addresses problems inherent to the development of complex applications for reconfig- urable systems. Many projects fail to complete or take much longer than originally estimated by relying on traditional iterative software development processes typically used with conventional computers. Even though designer productivity can be increased by abstract programming and execution models, e.g., dataflow, development methodologies considering the specific properties of reconfigurable systems do not exist. The first contribution of this thesis is a design methodology to facilitate systematic develop- ment of complex applications using reconfigurable hardware in the context of High-Performance Computing (HPC). The proposed methodology is built upon a careful analysis of the original application, a software model of the intended hardware system, an analytical prediction of performance and on-chip area usage, and an iterative architectural refinement to resolve identi- fied bottlenecks before writing a single line of code targeting the reconfigurable hardware. It is successfully validated using two real applications and both achieve state-of-the-art performance. The second contribution extends this methodology to provide portability between devices in two steps. First, additional tool support for contemporary multi-die Field-Programmable Gate Arrays (FPGAs) is developed. An algorithm to automatically map logical memories to hetero- geneous physical memories with special attention to die boundaries is proposed. As a result, only the proposed algorithm managed to successfully place and route all designs used in the evaluation while the second-best algorithm failed on one third of all large applications. Second, best practices for performance portability between different FPGA devices are collected and evaluated on a financial use case, showing efficient resource usage on five different platforms. The third contribution applies the extended methodology to a real, highly demanding emerging application from the radiotherapy domain. A Monte-Carlo based simulation of dose accumu- lation in human tissue is accelerated using the proposed methodology to meet the real time requirements of adaptive radiotherapy.Open Acces

    Real-Time Narrowband and Wideband Beamforming Techniques for Fully-Digital RF Arrays

    Get PDF
    Elemental digital beamforming offers increased flexibility for multi-function radio frequency (RF) systems supporting radar and communications applications. As fully digital arrays, components, and subsystems are becoming more affordable in the military and commercial industries, analog components such as phase shifters, filters, and mixers have begun to be replaced by digital circuits which presents efficiency challenges in power constrained scenarios. Furthermore, multi-function radar and communications systems are exploiting the multiple simultaneous beam capability provided by digital at every element beamforming. Along with further increasing data samples rates and increasing instantaneous bandwidths (IBW), real time processing in the digital domain has become a challenge due to the amount of data produced and processed in current systems. These arrays generate hundreds of gigabits per second of data throughput or more which is costly to send off-chip to an adjunct processor fundamentally limiting the overall performance of an RF array system. In this dissertation, digital filtering techniques and architectures are described which calibrate and beamform both narrowband and wideband RF arrays on receive. The techniques are shown to optimize one or many parameters of the digital transceiver system to improve the overall system efficiency. Digitally beamforming in the beamspace is shown to further increase the processing efficiency of an adaptive system compared to state of the art frequency domain approaches by minimizing major processing bottlenecks of generating adaptive filter coefficients. The techniques discussed are compared and contrasted across different hardware processor modules including field-programmable gate arrays (FPGAs), graphical processing units (GPUs), and central processing units (CPUs)

    Speeding up dynamic compilation: concurrent and parallel dynamic compilation

    Get PDF
    The main challenge faced by a dynamic compilation system is to detect and translate frequently executed program regions into highly efficient native code as fast as possible. To efficiently reduce dynamic compilation latency, a dynamic compilation system must improve its workload throughput, i.e. compile more application hotspots per time. As time for dynamic compilation adds to the overall execution time, the dynamic compiler is often decoupled and operates in a separate thread independent from the main execution loop to reduce the overhead of dynamic compilation. This thesis proposes innovative techniques aimed at effectively speeding up dynamic compilation. The first contribution is a generalised region recording scheme optimised for program representations that require dynamic code discovery (e.g. binary program representations). The second contribution reduces dynamic compilation cost by incrementally compiling several hot regions in a concurrent and parallel task farm. Altogether the combination of generalised light-weight code discovery, large translation units, dynamic work scheduling, and concurrent and parallel dynamic compilation ensures timely and efficient processing of compilation workloads. Compared to state-of-the-art dynamic compilation approaches, speedups of up to 2.08 are demonstrated for industry standard benchmarks such as BioPerf, Spec Cpu 2006, and Eembc. Next, innovative applications of the proposed dynamic compilation scheme to speed up architectural and micro-architectural performance modelling are demonstrated. The main contribution in this context is to exploit runtime information to dynamically generate optimised code that accurately models architectural and micro-architectural components. Consequently, compilation units are larger and more complex resulting in increased compilation latencies. Large and complex compilation units present an ideal use case for our concurrent and parallel dynamic compilation infrastructure. We demonstrate that our novel micro-architectural performance modelling is faster than state-of-the-art Fpga-based simulation, whilst providing the same level of accuracy

    Novel load identification techniques and a steady state self-tuning prototype for switching mode power supplies

    Get PDF
    Control of Switched Mode Power Supplies (SMPS) has been traditionally achieved through analog means with dedicated integrated circuits (ICs). However, as power systems are becoming increasingly complex, the classical concept of control has gradually evolved into the more general problem of power management, demanding functionalities that are hardly achievable in analog controllers. The high flexibility offered by digital controllers and their capability to implement sophisticated control strategies, together with the programmability of controller parameters, make digital control very attractive as an option for improving the features of dcdc converters. On the other side, digital controllers find their major weak point in the achievable dynamic performances of the closed loop system. Indeed, analogto-digital conversion times, computational delays and sampling-related delays strongly limit the small signal closed loop bandwidth of a digitally controlled SMPS. Quantization effects set other severe constraints not known to analog solutions. For these reasons, intensive scientific research activity is addressing the problem of making digital compensator stronger competitors against their analog counterparts in terms of achievable performances. In a wide range of applications, dcdc converters with high efficiency over the whole range of their load values are required. Integrated digital controllers for Switching Mode Power Supplies are gaining growing interest, since it has been shown the feasibility of digital controller ICs specifically developed for high frequency switching converters. One very interesting potential benefit is the use of autotuning of controller parameters (on-line controllers), so that the dynamic response can be set at the software level, independently of output capacitor filters, component variations and ageing. These kind of algorithms are able to identify the output filter configuration (system identification) and then automatically compute the best compensator gains to adjust system margins and bandwidth. In order to be an interesting solution, however, the self-tuning should satisfy two important requirements: it should not heavily affect converter operation under nominal condition and it should be based on a simple and robust algorithm whose complexity does not require a significant increase of the silicon area of the IC controller. The first issue is avoided performing the system identification (SI) with the system open loop configuration, where perturbations can be induced in the system before the start up. Much more challenging is to satisfy this requirement during steady state operations, where perturbations on the output voltage are limited by the regular operations of the converter. The main advantage of steady state SI methods, is the detection of possible non-idealities occurring during the converter operations. In this way, the system dynamics can be consequently adjusted with the compensator parameters tuning. The resource saving issue, requires the development of äd-hocßelf-tuning techniques specifically tailored for integrated digitally controlled converters. Considering the flexibility of digital control, self-tuning algorithms can be studied and easily integrated at hardware level into closed loop SMPS reducing development time and R & D costs. The work of this dissertation finds its origin in this context. Smart power management is accomplished by tuning the controller parameters accordingly to the identified converter configuration. Themain difficult for self-tuning techniques is the identification of the converter output filter configuration. Two novel system identification techniques have been validated in this dissertation. The open loop SI method is based on the system step response, while dithering amplification effects are exploited for the steady state SI method. The open loop method can be used as autotunig approach during or before the system start up, a step evolving reference voltage has been used as system perturbation and to obtain the output filter information with the Power Spectral Density (PSD) computation of the system step response. The use of ¢§ modulator is largely increasing in digital control feedback. During the steady state, the finite resolution introduces quantization effects on the signal path causing low frequency contributes of the digital control word. Through oversampling-dithering capabilities of ¢§ modulators, resolution improvements are obtained. The presented steady state identification techniques demonstrates that, amplifying the dithering effects on the signal path, the output filter information can be obtained on the digital side by processing with the PSD computation the perturbed output voltage. The amount of noise added on the output voltage does not affect the converter operations, mathematical considerations have been addressed and then justified both with a Matlab/Simulink fixed-point and a FPGA-based closed loop system. The load output filter identification of both algorithms, refer to the frequency domain. When the respective perturbations occurs, the system response is observed on the digital side and processed with the PSD computation. The extracted parameters are the resonant frequency ans the possible ESR (Effective Series Resistance) contributes,which can be detected as maximumin the PSD output. The SI methods have been validated for different configurations of buck converters on a fixed-point closed loop model, however, they can be easily applied to further converter configurations. The steady state method has been successfully integrated into a FPGA-based prototype for digitally controlled buck converters, that integrates a PSD computer needed for the load parameters identification. At this purpose, a novel VHDL-coded full-scalable hybrid processor for Constant Geometry FFT (CG-FFT) computation has been designed and integrated into the PSD computation system. The processor is based on a variation of the conventional algorithm used for FFT, which is the Constant-Geometry FFT (CG-FFT).Hybrid CORDIC-LUT scalable architectures, has been introduced as alternative approach for the twiddle factors (phase factors) computation needed during the FFT algorithms execution. The shared core architecture uses a single phase rotator to satisfy all TF requests. It can achieve improved logic saving by trading off with computational speed. The pipelined architecture is composed of a number of stages equal to the number of PEs and achieves the highest possible throughput, at the expense of more hardware usage

    Improving Compute & Data Efficiency of Flexible Architectures

    Get PDF

    Applications in Electronics Pervading Industry, Environment and Society

    Get PDF
    This book features the manuscripts accepted for the Special Issue “Applications in Electronics Pervading Industry, Environment and Society—Sensing Systems and Pervasive Intelligence” of the MDPI journal Sensors. Most of the papers come from a selection of the best papers of the 2019 edition of the “Applications in Electronics Pervading Industry, Environment and Society” (APPLEPIES) Conference, which was held in November 2019. All these papers have been significantly enhanced with novel experimental results. The papers give an overview of the trends in research and development activities concerning the pervasive application of electronics in industry, the environment, and society. The focus of these papers is on cyber physical systems (CPS), with research proposals for new sensor acquisition and ADC (analog to digital converter) methods, high-speed communication systems, cybersecurity, big data management, and data processing including emerging machine learning techniques. Physical implementation aspects are discussed as well as the trade-off found between functional performance and hardware/system costs
    corecore