
    Field-based branch prediction for packet processing engines

    Network processors have exploited many aspects of architectural design, such as multiple cores, multi-threading and hardware accelerators, to support both ever-increasing line rates and the growing complexity of network applications. Micro-architectural techniques such as superscalar execution, deep pipelines and speculative execution offer an excellent way to improve performance without limiting either scalability or flexibility, provided that the branch penalty is well controlled. However, traditional branch predictors struggle to keep increasing accuracy through larger tables, because packet processing exhibits fewer variations in branch patterns. To improve prediction efficiency, we propose a flow-based prediction mechanism that caches the branch histories of packets with similar header fields, since such packets normally follow the same execution path. For packets that cannot find a matching entry in the history table, a fallback gshare predictor provides the branch direction. Simulation results show that our scheme achieves an average hit rate in excess of 97.5% on a selected set of network applications and real-life packet traces, with a chip area similar to that of the branch prediction architectures used in modern microprocessors.
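    The mechanism, as we read the abstract, pairs a flow-keyed history cache with a gshare fallback. Below is a minimal sketch; the 5-tuple flow key, table sizes, and replay-by-branch-index policy are our assumptions, not the paper's exact design.

    ```python
    # Illustrative sketch of flow-based branch prediction with a gshare fallback.
    # The flow key, sizes, and update policy are assumptions, not the paper's design.

    class GsharePredictor:
        def __init__(self, bits=12):
            self.mask = (1 << bits) - 1
            self.history = 0
            self.counters = [2] * (1 << bits)  # 2-bit saturating counters, weakly taken

        def predict(self, pc):
            return self.counters[(pc ^ self.history) & self.mask] >= 2

        def update(self, pc, taken):
            idx = (pc ^ self.history) & self.mask
            if taken:
                self.counters[idx] = min(3, self.counters[idx] + 1)
            else:
                self.counters[idx] = max(0, self.counters[idx] - 1)
            self.history = ((self.history << 1) | int(taken)) & self.mask

    class FlowPredictor:
        """Caches per-flow branch outcomes keyed by packet header fields."""
        def __init__(self):
            self.flow_table = {}          # flow key -> recorded branch outcomes
            self.fallback = GsharePredictor()

        def flow_key(self, pkt):
            # Packets sharing a 5-tuple normally follow the same execution path.
            return (pkt['src'], pkt['dst'], pkt['sport'], pkt['dport'], pkt['proto'])

        def predict(self, pkt, branch_no, pc):
            hist = self.flow_table.get(self.flow_key(pkt))
            if hist is not None and branch_no < len(hist):
                return hist[branch_no]    # replay the cached outcome for this flow
            return self.fallback.predict(pc)

        def update(self, pkt, branch_no, pc, taken):
            hist = self.flow_table.setdefault(self.flow_key(pkt), [])
            if branch_no == len(hist):
                hist.append(taken)        # record the flow's outcome sequence in order
            self.fallback.update(pc, taken)
    ```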

    The weakening of branch predictor performance as an inevitable side effect of exploiting control independence

    Many algorithms are inherently sequential and hard to parallelize explicitly. Cores designed to aggressively handle these problems exhibit deeper pipelines and wider fetch widths to exploit instruction-level parallelism via out-of-order execution. As these parameters increase, so does the number of instructions fetched along an incorrect path when a branch is mispredicted. Many of the instructions squashed after a branch are control independent, meaning they will be fetched regardless of whether the candidate branch is taken. There has been much research into retaining these control independent instructions on misprediction of the candidate branch. This research shows that there is potential for exploiting control independence, since under favorable circumstances many benchmarks can exhibit 30% or more speedup. Though these control independence processors are meant to lessen the damage of misprediction, an inherent side effect of fetching out of order, branch weakening, keeps realized speedup from reaching its potential. This thesis introduces, formally defines, and identifies the types of branch weakening, and provides information useful for developing techniques that reduce it. A classification is provided that measures each type of weakening, to help better determine the potential speedup of control independence processors. Experimentation shows that certain applications suffer greatly from weakening: total branch mispredictions increase by 30% in several cases. Analysis reveals two broad causes of weakening: changes in branch predictor update times and changes in the outcome history used by branch predictors. Each of these broad causes is classified into more specific causes, one of which, the loss of nearby correlation data, cannot be avoided. The classification technique presented in this study measures that 45% of the weakening in the selected SPEC CPU 2000 benchmarks is of this type, while 40% involves other changes in outcome history. The remaining 15% is caused by changes in predictor update times. By applying fundamental techniques that reduce weakening, the Control Independence Aware Branch Predictor is developed. This predictor reduces weakening for the majority of the chosen benchmarks, enabling a control independence processor, snipper, to attain significantly higher speedup for 10 of the 15 studied benchmarks.
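    The two broad causes are easiest to picture against a concrete predictor. Below is a minimal gshare model (ours, not the thesis's simulator): the table index XORs the PC with a global history register, so any change in when, or in what order, that register is updated — as happens when control-independent instructions are retained across a misprediction — moves a branch onto different counters and can lose correlation.

    ```python
    # Minimal gshare model (illustrative, not the thesis's simulator) showing why
    # the outcome history a branch sees is sensitive to update order and timing.

    class Gshare:
        def __init__(self, bits=10):
            self.mask = (1 << bits) - 1
            self.ghr = 0                        # global history register
            self.pht = [2] * (1 << bits)        # 2-bit saturating counters

        def predict_and_update(self, pc, taken):
            idx = (pc ^ self.ghr) & self.mask   # index mixes PC with outcome history
            pred = self.pht[idx] >= 2
            self.pht[idx] = min(3, self.pht[idx] + 1) if taken else max(0, self.pht[idx] - 1)
            self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
            return pred

    g = Gshare()
    # Because the index XORs the PC with the GHR, delaying or reordering history
    # updates (as when control-independent work is retained across a misprediction)
    # shifts a branch onto different counters, illustrating history-induced weakening.
    for pc, taken in [(0x40, True), (0x44, False)] * 4:
        g.predict_and_update(pc, taken)
    ```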

    Recent ASDEX Upgrade research in support of ITER and DEMO

    Recent experiments on the ASDEX Upgrade tokamak aim at improving the physics base for ITER and DEMO, to aid the machine design and prepare efficient operation. Type I edge localized mode (ELM) mitigation using resonant magnetic perturbations (RMPs) has been shown at low pedestal collisionality ($\nu^*_{\mathrm{ped}} < 0.4$). In contrast to the previous high-$\nu^*$ regime, suppression only occurs in a narrow RMP spectral window, indicating a resonant process, and a concomitant confinement drop is observed due to a reduction of pedestal top density and electron temperature. Strong evidence is found for the ion heat flux to be the decisive element for the L–H power threshold. A physics-based scaling of the density at which the minimum $P_{\mathrm{LH}}$ occurs indicates that ITER could take advantage of it to initiate H-mode at lower density than that of the final $Q = 10$ operational point. Core density fluctuation measurements resolved in radius and wave number show that an increase of $R/L_{T_e}$ introduced by off-axis electron cyclotron resonance heating (ECRH) mainly increases the large-scale fluctuations. The radial variation of the fluctuation level is in agreement with simulations using the GENE code. Fast particles are shown to undergo classical slowing down in the absence of large-scale magnetohydrodynamic (MHD) events and for low heating power, but show signs of anomalous radial redistribution at large heating power, consistent with a broadened off-axis neutral-beam-driven current profile under these conditions. Neoclassical tearing mode (NTM) suppression experiments using electron cyclotron current drive (ECCD) with feedback-controlled deposition have allowed several control strategies for ITER to be tested, including automated control of (3,2) and (2,1) NTMs during a single discharge. Disruption mitigation studies using massive gas injection (MGI) show an increased fuelling efficiency with high-field-side injection, but a saturation of the fuelling efficiency is observed at the high injected mass needed for runaway electron suppression. Large locked modes can significantly decrease the fuelling efficiency and increase the asymmetry of radiated power during MGI mitigation. Concerning power exhaust, the partially detached ITER divertor scenario has been demonstrated at $P_{\mathrm{sep}}/R = 10\,\mathrm{MW\,m^{-1}}$ in ASDEX Upgrade, with a peak time-averaged target load around $5\,\mathrm{MW\,m^{-2}}$, fully consistent with the component limits for ITER. Developing this towards DEMO, full detachment was achieved at $P_{\mathrm{sep}}/R = 7\,\mathrm{MW\,m^{-1}}$, and stationary discharges with a core radiation fraction of the order of DEMO requirements (70%, instead of the 30% needed for ITER) were demonstrated. Finally, it remains difficult to establish the standard ITER $Q = 10$ scenario at low $q_{95} = 3$ in the all-tungsten (all-W) ASDEX Upgrade due to the observed poor confinement at low $\beta_N$. This is mainly due to degraded pedestal performance, and hence investigations into shifting the operational point to higher $\beta_N$ by lowering the current have been started. At higher $q_{95}$, pedestal performance can be recovered by seeding N$_2$ as well as CD$_4$, which is interpreted as improved pedestal stability due to the decrease of bootstrap current with increasing $Z_{\mathrm{eff}}$. Concerning advanced scenarios, the upgrade of ECRH power has allowed experiments with central counter-ECCD to modify the $q$-profile in improved H-mode scenarios, showing an increase in confinement at still good MHD stability with flat elevated $q$-profiles at values between 1.5 and 2. European Commission (EUROfusion 633053).
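    For scale, $P_{\mathrm{sep}}/R$ normalizes the power crossing the separatrix by the machine's major radius, which makes divertor loading comparable across devices. A back-of-envelope reading of the quoted figure, assuming ASDEX Upgrade's major radius $R \approx 1.65\,\mathrm{m}$ (our assumption, not stated in the abstract):

    ```latex
    % Scale estimate; R \approx 1.65 m for ASDEX Upgrade is our assumed value.
    \frac{P_{\mathrm{sep}}}{R} = 10\,\mathrm{MW\,m^{-1}}
    \quad\Rightarrow\quad
    P_{\mathrm{sep}} \approx 10\,\mathrm{MW\,m^{-1}} \times 1.65\,\mathrm{m}
    \approx 16.5\,\mathrm{MW}.
    ```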

    Encrypted statistical machine learning: new privacy preserving methods

    We present two new statistical machine learning methods designed to learn on fully homomorphically encrypted (FHE) data. The introduction of FHE schemes following Gentry (2009) opens up the prospect of privacy-preserving statistical machine learning analysis and modelling of encrypted data without compromising security constraints. We propose tailored algorithms for applying extremely random forests, involving a new cryptographic stochastic fraction estimator, and naïve Bayes, involving a semi-parametric model for the class decision boundary, and show how they can be used to learn and predict from encrypted data. We demonstrate that these techniques perform competitively on a variety of classification data sets and provide detailed information about the computational practicalities of these and other FHE methods.
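    The abstract names a "cryptographic stochastic fraction estimator" without detail, so the following is only a plain-domain illustration of why such an estimator is needed: FHE schemes evaluate encrypted additions and multiplications but not division, so a quantity like a leaf's class proportion a/(a+b) must be approached by sampling rather than computed directly. The function below is our toy, not the paper's construction.

    ```python
    # Toy illustration (in the clear, NOT the paper's cryptographic estimator):
    # estimate the fraction a / (a + b) by sampling, so the only division
    # performed is by the publicly known trial count.
    import random

    def stochastic_fraction(a, b, trials=10_000):
        """Estimate a / (a + b) without dividing by the data-dependent total.

        Each trial draws one of the a + b unit 'tokens' uniformly; the hit
        rate converges to the fraction. A cryptographic variant would carry
        out comparable randomized sampling over encrypted counts.
        """
        hits = sum(random.randrange(a + b) < a for _ in range(trials))
        return hits / trials   # division only by the public constant `trials`

    print(stochastic_fraction(3, 7))   # converges to ~0.3
    ```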

    Reducing complexity of processor front ends with static analysis and selective preloading

    General purpose processors were once designed with the major goal of maximizing performance. As power consumption has grown, with the advent of multi-core processors and the rising importance of embedded and mobile devices, the importance of designing efficient and low-cost architectures has increased. This dissertation focuses on reducing the complexity of the front end of the processor, mainly the branch predictors. Branch predictors have traditionally been designed with a focus on improving prediction accuracy so that performance is maximized. To accomplish this, the predictors proposed in the literature and used in real systems have become increasingly complex and large, a trend that is inconsistent with the anticipated trend of simpler and more numerous cores in future processors. Much of the increased complexity in many recently proposed predictors is used to select the part of history most correlated with a branch, which makes them costly, if not impossible, to implement practically. We suggest that these complex decisions do not have to be made in hardware at prediction or run time and can be moved offline: high accuracy can be achieved by making complex prediction decisions in a one-time profile run instead of in complex hardware. We apply these techniques to Spotlight, our own low-cost, low-complexity branch predictor. A static analysis step determines, for each branch, the history segment yielding the highest accuracy; this information is placed in unused instruction space. Spotlight achieves higher accuracy than other implementation-simple predictors such as gshare and YAGS, and matches or outperforms the two complex neural predictors we compare it to. To ensure timely access, we evaluate using a hardware table (called a BIT) to store profile bits after they are extracted from instructions, and the accuracy of using this table. The drawback of a BIT is its size. We introduce a novel technique, Preloading, that places data for an instruction in prior blocks on the path to that instruction; by doing so, it significantly reduces the size of the BIT needed for good performance. We also discuss front-end applications of Preloading beyond branch predictors.
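    A sketch of the offline selection step as described: a one-time profile run scores several candidate history segments per static branch and keeps the most predictive one, which the hardware then uses at run time. The candidate segments, scoring rule, and encoding below are illustrative assumptions, not Spotlight's actual format.

    ```python
    # Illustrative sketch of offline history-segment selection in the spirit of
    # Spotlight; segments, scoring, and encoding are our assumptions.
    from collections import defaultdict

    def segment(history_bits, start, length):
        """Extract a segment of the global history (list of 0/1, newest first)."""
        return tuple(history_bits[start:start + length])

    def profile_best_segment(trace, candidates):
        """One-time profile run: for each static branch, pick the history segment
        whose value best predicts the branch's outcome (majority vote per value)."""
        # stats[pc][seg_id][seg_value] -> [not_taken, taken] counts
        stats = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: [0, 0])))
        history = []
        for pc, taken in trace:
            for seg_id, (start, length) in enumerate(candidates):
                stats[pc][seg_id][segment(history, start, length)][int(taken)] += 1
            history = [int(taken)] + history[:31]   # keep 32 bits of global history
        best = {}
        for pc, per_seg in stats.items():
            def correct(seg_id):
                # Correct predictions if each segment value votes its majority outcome.
                return sum(max(counts) for counts in per_seg[seg_id].values())
            best[pc] = max(per_seg, key=correct)    # would live in unused instruction bits
        return best

    # Candidate (start, length) segments of global history to profile against.
    candidates = [(0, 4), (0, 8), (4, 8), (8, 8)]
    trace = [(0x400, True), (0x404, False)] * 50
    print(profile_best_segment(trace, candidates))  # {pc: best segment id}
    ```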

    Learning from the machine: interpreting machine learning algorithms for point- and extended-source classification

    We investigate star-galaxy classification for astronomical surveys in the context of four methods that enable the interpretation of black-box machine learning systems. The first is outputting and exploring the decision boundaries given by decision-tree-based methods, which enables the visualization of the classification categories. The second is using the Mutual Information based Transductive Feature Selection (MINT) algorithm to perform feature pre-selection: if one would like to provide only a small number of input features to a machine learning classification algorithm, feature pre-selection provides a method to determine which of the many possible input properties should be selected. The third is the use of the tree-interpreter package, which enables popular decision-tree-based ensemble methods to be opened, visualized, and understood. This is done by additional analysis of the tree-based model, determining not only which features are important to the model, but how important a feature is for a particular classification given its value. Lastly, we use decision boundaries from the model to revise an already existing classification method, essentially asking the tree-based method where decision boundaries are best placed and defining a new classification method. We showcase these techniques by applying them to the problem of star-galaxy separation using data from the Sloan Digital Sky Survey (hereafter SDSS). We use the output of MINT and the ensemble methods to demonstrate how more complex decision boundaries improve star-galaxy classification accuracy over the standard SDSS frames approach (reducing misclassifications by up to ≈33%). We then show how tree-interpreter can be used to explore how relevant each photometric feature is when making a classification on an object-by-object basis.
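    For the third method, the tree-interpreter package the paper names decomposes an ensemble's prediction into a bias term plus per-feature contributions. A minimal sketch on toy data follows; the feature names and labels are our placeholders, not the SDSS catalogue.

    ```python
    # Minimal tree-interpreter example: decompose a random forest's prediction
    # into bias + per-feature contributions. Toy data; feature names are ours.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from treeinterpreter import treeinterpreter as ti

    features = ['psfMag_r - cModelMag_r', 'g - r', 'r - i']  # illustrative photometry
    X = np.random.rand(200, 3)
    y = (X[:, 0] > 0.5).astype(int)   # toy stand-in for star (0) vs galaxy (1)

    model = RandomForestClassifier(n_estimators=50).fit(X, y)

    # prediction = bias + sum of per-feature contributions, object by object
    prediction, bias, contributions = ti.predict(model, X[:1])
    for name, contrib in zip(features, contributions[0]):
        print(f'{name:>24}: {contrib}')   # per-class contribution for this object
    ```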