Field-based branch prediction for packet processing engines
Network processors have exploited many aspects of architecture design, such as multiple cores, multithreading and hardware accelerators, to support both ever-increasing line rates and the growing complexity of network applications. Micro-architectural techniques such as superscalar execution, deep pipelining and speculative execution offer an excellent way to improve performance without limiting either scalability or flexibility, provided that the branch penalty is well controlled. However, traditional branch predictors struggle to keep increasing their accuracy through larger tables, because branch patterns in packet processing exhibit relatively few variations. To improve prediction efficiency, we propose a flow-based prediction mechanism that caches the branch histories of packets with similar header fields, since such packets normally follow the same execution path. For packets that cannot find a matching entry in the history table, a fallback gshare predictor provides the branch direction. Simulation results show that our scheme achieves an average hit rate in excess of 97.5% on a selected set of network applications and real-life packet traces, with a chip area similar to that of the branch prediction architectures used in modern microprocessors.
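The mechanism this abstract describes could be sketched roughly as follows: a small table keyed on packet header fields replays the cached branch outcomes of earlier packets from the same flow, and a conventional gshare predictor handles table misses. The class names, table organisation and replay interface below are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of flow-based branch prediction with a gshare fallback.
# Packets whose header fields match a cached flow replay that flow's recorded
# branch outcomes; on a miss, gshare decides. Sizes are illustrative only.

class GsharePredictor:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [2] * (1 << bits)   # 2-bit saturating counters, weakly taken
        self.history = 0

    def predict(self, pc):
        return self.table[(pc ^ self.history) & self.mask] >= 2

    def update(self, pc, taken):
        idx = (pc ^ self.history) & self.mask
        if taken:
            self.table[idx] = min(3, self.table[idx] + 1)
        else:
            self.table[idx] = max(0, self.table[idx] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask

class FlowBasedPredictor:
    def __init__(self):
        self.flow_table = {}          # header-field tuple -> list of outcomes
        self.fallback = GsharePredictor()

    def predict(self, header, branch_index, pc):
        hist = self.flow_table.get(header)
        if hist is not None and branch_index < len(hist):
            return hist[branch_index]        # hit: replay cached direction
        return self.fallback.predict(pc)     # miss: fall back to gshare

    def update(self, header, branch_index, pc, taken):
        hist = self.flow_table.setdefault(header, [])
        if branch_index == len(hist):        # record outcomes in program order
            hist.append(taken)
        self.fallback.update(pc, taken)
```

A later packet carrying the same header fields then hits in the flow table and replays the recorded directions, which is what drives the high hit rates reported in the abstract.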
The weakening of branch predictor performance as an inevitable side effect of exploiting control independence
Many algorithms are inherently sequential and hard to explicitly parallelize. Cores designed to aggressively handle these problems exhibit deeper pipelines and wider fetch widths to exploit instruction-level parallelism via out-of-order execution. As these parameters increase, so does the number of instructions fetched along an incorrect path when a branch is mispredicted. Many of the instructions squashed after a branch are control independent, meaning they will be fetched regardless of whether the candidate branch is taken or not. There has been much research on retaining these control independent instructions on misprediction of the candidate branch. This research shows that there is potential for exploiting control independence since, under favorable circumstances, many benchmarks can exhibit 30% or more speedup. Though these control independence processors are meant to lessen the damage of misprediction, an inherent side effect of fetching out of order, branch weakening, keeps realized speedup from reaching its potential. This thesis introduces, formally defines, and identifies the types of branch weakening. Useful information is provided to develop techniques that may reduce weakening. A classification is provided that measures each type of weakening to help better determine the potential speedup of control independence processors. Experimentation shows that certain applications suffer greatly from weakening: total branch mispredictions increase by 30% in several cases. Analysis has revealed two broad causes of weakening: changes in branch predictor update times and changes in the outcome history used by branch predictors. Each of these broad causes is classified into more specific causes, one of which is due to the loss of nearby correlation data and cannot be avoided. The classification technique presented in this study shows that 45% of the weakening in the selected SPEC CPU 2000 benchmarks is of this type, while 40% involves other changes in outcome history.
The remaining 15% is caused by changes in predictor update times. By applying fundamental techniques that reduce weakening, the Control Independence Aware Branch Predictor is developed. This predictor reduces weakening for the majority of chosen benchmarks, enabling a control independence processor, snipper, to attain significantly higher speedup for 10 of the 15 studied benchmarks.
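One of the two broad causes named above, changes in predictor update times, can be illustrated with a small experiment: run the same gshare predictor over the same branch stream, once with immediate updates and once with updates applied several branches late, as effectively happens while a control independence processor fetches past an unresolved branch. This is an illustrative sketch, not code from the thesis; the stream and delay are synthetic assumptions.

```python
# Illustrative sketch: delaying gshare's table/history updates (as fetching
# past an unresolved branch effectively does) changes its misprediction count
# on the same branch stream.

import random

def run_gshare(stream, delay=0, bits=8):
    """Count mispredictions; updates become visible `delay` branches late."""
    mask = (1 << bits) - 1
    table = [2] * (1 << bits)     # 2-bit saturating counters
    history = 0
    pending = []                  # queued (pc, taken) updates
    wrong = 0
    for pc, taken in stream:
        pred = table[(pc ^ history) & mask] >= 2
        wrong += pred != taken
        pending.append((pc, taken))
        if len(pending) > delay:  # oldest update now resolves
            upc, utaken = pending.pop(0)
            idx = (upc ^ history) & mask
            table[idx] = min(3, table[idx] + 1) if utaken else max(0, table[idx] - 1)
            history = ((history << 1) | int(utaken)) & mask
    return wrong

random.seed(0)
# synthetic correlated stream: each outcome usually repeats the previous one
stream, prev = [], True
for i in range(2000):
    taken = prev if random.random() < 0.9 else not prev
    stream.append((0x40 + (i % 4), taken))
    prev = taken

immediate = run_gshare(stream, delay=0)
delayed = run_gshare(stream, delay=8)
```

Comparing `immediate` and `delayed` on a given trace shows how update timing alone, with an identical predictor and identical outcomes, can shift the misprediction count — the effect the thesis classifies as one form of weakening.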
Recent ASDEX Upgrade research in support of ITER and DEMO
Recent experiments on the ASDEX Upgrade tokamak aim at improving the physics base for ITER and DEMO to aid the machine design and prepare efficient operation. Type I edge localized mode (ELM) mitigation using resonant magnetic perturbations (RMPs) has been shown at low pedestal collisionality (ν*_ped < 0.4). In contrast to the previous high-ν* regime, suppression only occurs in a narrow RMP spectral window, indicating a resonant process, and a concomitant confinement drop is observed due to a reduction of pedestal top density and electron temperature. Strong evidence is found for the ion heat flux to be the decisive element for the L-H power threshold. A physics-based scaling of the density at which the minimum P_LH occurs indicates that ITER could take advantage of it to initiate H-mode at lower density than that of the final Q = 10 operational point. Core density fluctuation measurements resolved in radius and wave number show that an increase of R/L_Te introduced by off-axis electron cyclotron resonance heating (ECRH) mainly increases the large-scale fluctuations. The radial variation of the fluctuation level is in agreement with simulations using the GENE code. Fast particles are shown to undergo classical slowing down in the absence of large-scale magnetohydrodynamic (MHD) events and for low heating power, but show signs of anomalous radial redistribution at large heating power, consistent with a broadened off-axis neutral beam current drive current profile under these conditions. Neoclassical tearing mode (NTM) suppression experiments using electron cyclotron current drive (ECCD) with feedback-controlled deposition have allowed several control strategies for ITER to be tested, including automated control of (3,2) and (2,1) NTMs during a single discharge. Disruption mitigation studies using massive gas injection (MGI) show an increased fuelling efficiency with high-field-side injection, but a saturation of the fuelling efficiency is observed at the high injected mass needed for runaway electron suppression. Large locked modes can significantly decrease the fuelling efficiency and increase the asymmetry of radiated power during MGI mitigation. Concerning power exhaust, the partially detached ITER divertor scenario has been demonstrated at P_sep/R = 10 MW m^-1 in ASDEX Upgrade, with a peak time-averaged target load around 5 MW m^-2, well consistent with the component limits for ITER. Developing this towards DEMO, full detachment was achieved at P_sep/R = 7 MW m^-1, and stationary discharges with a core radiation fraction of the order of DEMO requirements (70% instead of the 30% needed for ITER) were demonstrated. Finally, it remains difficult to establish the standard ITER Q = 10 scenario at low q_95 = 3 in the all-tungsten (all-W) ASDEX Upgrade due to the observed poor confinement at low β_N. This is mainly due to degraded pedestal performance, and hence investigations into shifting the operational point to higher β_N by lowering the current have been started. At higher q_95, pedestal performance can be recovered by seeding N2 as well as CD4, which is interpreted as improved pedestal stability due to the decrease of bootstrap current with increasing Z_eff. Concerning advanced scenarios, the upgrade of ECRH power has allowed experiments with central ctr-ECCD to modify the q-profile in improved H-mode scenarios, showing an increase in confinement at still good MHD stability with flat elevated q-profiles at values between 1.5 and 2.
European Commission (EUROfusion 633053)
Encrypted statistical machine learning: new privacy preserving methods
We present two new statistical machine learning methods designed to learn on
fully homomorphic encrypted (FHE) data. The introduction of FHE schemes
following Gentry (2009) opens up the prospect of privacy preserving statistical
machine learning analysis and modelling of encrypted data without compromising
security constraints. We propose tailored algorithms for applying extremely
random forests, involving a new cryptographic stochastic fraction estimator,
and naïve Bayes, involving a semi-parametric model for the class decision
boundary, and show how they can be used to learn and predict from encrypted
data. We demonstrate that these techniques perform competitively on a variety
of classification data sets and provide detailed information about the
computational practicalities of these and other FHE methods. Comment: 39 pages
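For context, a plaintext analogue of the "extremely random" forest the paper adapts can be sketched in a few lines: every split uses a randomly chosen feature and threshold, so growing the tree needs no data-dependent comparisons — the property that makes the method amenable to encrypted evaluation. The cryptographic stochastic fraction estimator itself is not reproduced here; everything below is an assumed plaintext illustration, not the paper's encrypted algorithm.

```python
# Plaintext sketch of extremely randomized trees: random feature + random
# threshold at each node, leaves store the class fraction (the quantity the
# paper estimates under encryption). Toy data and parameters are illustrative.

import random

def build_tree(X, y, depth, rng):
    if depth == 0 or len(set(y)) <= 1:
        return ("leaf", sum(y) / len(y))          # class-1 fraction in node
    f = rng.randrange(len(X[0]))                  # random feature
    lo, hi = min(x[f] for x in X), max(x[f] for x in X)
    if lo == hi:
        return ("leaf", sum(y) / len(y))
    t = rng.uniform(lo, hi)                       # random threshold
    left = [(x, c) for x, c in zip(X, y) if x[f] < t]
    right = [(x, c) for x, c in zip(X, y) if x[f] >= t]
    if not left or not right:
        return ("leaf", sum(y) / len(y))
    return ("node", f, t,
            build_tree([x for x, _ in left], [c for _, c in left], depth - 1, rng),
            build_tree([x for x, _ in right], [c for _, c in right], depth - 1, rng))

def predict_tree(tree, x):
    while tree[0] == "node":
        _, f, t, l, r = tree
        tree = l if x[f] < t else r
    return tree[1]

def predict_forest(trees, x):
    return sum(predict_tree(t, x) for t in trees) / len(trees)

rng = random.Random(1)
# toy data: class 1 iff the first coordinate exceeds 0.5
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if x[0] > 0.5 else 0 for x in X]
forest = [build_tree(X, y, depth=4, rng=rng) for _ in range(25)]
```

Because the splits are data-independent random draws, the only data-dependent quantities are the per-leaf class fractions, which is why an encrypted fraction estimator suffices to train such a forest on FHE data.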
Reducing complexity of processor front ends with static analysis and selective preloading
General purpose processors were once designed with the major goal of maximizing performance. As power consumption has grown, with the advent of multi-core processors and the rising importance of embedded and mobile devices, designing efficient and low-cost architectures has become increasingly important. This dissertation focuses on reducing the complexity of the front end of the processor, mainly branch predictors. Branch predictors have likewise been designed with a focus on improving prediction accuracy so that performance is maximized. To accomplish this, the predictors proposed in the literature and used in real systems have become increasingly complex and large, a trend that is inconsistent with the anticipated trend of simpler and more numerous cores in future processors. Much of the increased complexity in many recently proposed predictors is used to select the part of history most correlated with a branch. This makes them costly, if not impossible, to implement practically. We suggest that these complex decisions do not have to be made in hardware at prediction or run time and can be moved offline. High accuracy can be achieved by making complex prediction decisions in a one-time profile run instead of using complex hardware. We apply these techniques to Spotlight, our own low-cost, low-complexity branch predictor. A static analysis step determines, for each branch, the history segment yielding the highest accuracy. This information is placed in unused instruction space. Spotlight achieves higher accuracy than other implementation-simple predictors such as Gshare and YAGS, and matches or outperforms the two complex neural predictors that we compare it to. To ensure timely access, we evaluate using a hardware table (called a BIT) to store profile bits after they are extracted from instructions, and measure the accuracy of using this table. The drawback of a BIT is its size.
We introduce a novel technique, Preloading, which places data for an instruction in prior blocks on the path to that instruction. By doing so, it is able to significantly reduce the size of the BIT needed for good performance. We also discuss applications of Preloading to parts of the front end other than branch predictors.
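The offline analysis step described in this abstract might look roughly like the following sketch: replay a profile trace and, for each candidate (start, length) global-history segment, score how well a per-pattern majority vote would have predicted the branch, then keep the best-scoring segment. The segment list, trace format and voting table are illustrative assumptions, not Spotlight's actual profiling pass.

```python
# Hypothetical sketch of per-branch history-segment selection in a one-time
# profile run. history_bits[i] is the global history (as an int) seen just
# before outcome i; prediction is a per-pattern majority vote.

from collections import defaultdict
import random

def best_segment(history_bits, outcomes, segments):
    """Return the (start, length) segment predicting `outcomes` best."""
    best, best_correct = None, -1
    for start, length in segments:
        mask = (1 << length) - 1
        votes = defaultdict(lambda: [0, 0])   # pattern -> [not-taken, taken]
        correct = 0
        for h, taken in zip(history_bits, outcomes):
            pattern = (h >> start) & mask
            nt, t = votes[pattern]
            if nt or t:                        # only score once trained
                correct += (t > nt) == taken
            votes[pattern][int(taken)] += 1
        if correct > best_correct:
            best, best_correct = (start, length), correct
    return best, best_correct

# synthetic profile trace: the branch outcome mirrors bit 3 of an otherwise
# random 8-bit global history
rng = random.Random(7)
hist, hbits, outs = 0, [], []
for _ in range(500):
    taken = bool((hist >> 3) & 1)
    hbits.append(hist)
    outs.append(taken)
    hist = ((hist << 1) | int(rng.random() < 0.5)) & 0xFF

seg, acc = best_segment(hbits, outs, [(0, 4), (3, 1), (2, 4), (0, 8)])
```

On this synthetic trace the single history bit that actually drives the branch wins over longer segments, which mirrors the dissertation's point: the correlated slice of history can be found offline and encoded, rather than searched for in hardware.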
Learning from the machine: interpreting machine learning algorithms for point- and extended- source classification
We investigate star-galaxy classification for astronomical surveys in the
context of four methods enabling the interpretation of black-box machine
learning systems. The first is outputting and exploring the decision boundaries
as given by decision tree based methods, which enables the visualization of the
classification categories. Secondly, we investigate how the Mutual Information
based Transductive Feature Selection (MINT) algorithm can be used to perform
feature pre-selection. If one would like to provide only a small number of
input features to a machine learning classification algorithm, feature
pre-selection provides a method to determine which of the many possible input
properties should be selected. Third is the use of the tree-interpreter package
to enable popular decision tree based ensemble methods to be opened,
visualized, and understood. This is done by additional analysis of the tree
based model, determining not only which features are important to the model,
but how important a feature is for a particular classification given its value.
Lastly, we use decision boundaries from the model to revise an already existing
method of classification, essentially asking the tree based method where
decision boundaries are best placed and defining a new classification method.
We showcase these techniques by applying them to the problem of star-galaxy
separation using data from the Sloan Digital Sky Survey (hereafter SDSS). We
use the output of MINT and the ensemble methods to demonstrate how more complex
decision boundaries improve star-galaxy classification accuracy over the
standard SDSS frames approach (reducing misclassifications by up to …). We then show how tree-interpreter can be used to explore how
relevant each photometric feature is when making a classification on an object
by object basis. Comment: 12 pages, 8 figures, 8 tables
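The decomposition underlying the tree-interpreter approach can be illustrated on a tiny hand-built regression tree: the prediction equals the root's mean value (the bias) plus, for each split on the path to a leaf, the change in node mean attributed to that split's feature. The tree and its values below are purely illustrative, not derived from the SDSS data.

```python
# Sketch of the bias + per-feature contribution decomposition that the
# tree-interpreter package computes for decision-tree models.

def explain(tree, x):
    """Return (bias, {feature: contribution}) for one sample."""
    contributions = {}
    bias = tree["value"]                      # mean value at the root
    node = tree
    while "feature" in node:                  # descend to the leaf
        go_left = x[node["feature"]] <= node["threshold"]
        child = node["left"] if go_left else node["right"]
        delta = child["value"] - node["value"]
        contributions[node["feature"]] = contributions.get(node["feature"], 0.0) + delta
        node = child
    return bias, contributions

# toy regression tree: "value" is the mean target in each node
tree = {
    "feature": 0, "threshold": 0.5, "value": 0.50,
    "left":  {"feature": 1, "threshold": 0.3, "value": 0.20,
              "left":  {"value": 0.10},
              "right": {"value": 0.30}},
    "right": {"value": 0.80},
}

bias, contrib = explain(tree, {0: 0.2, 1: 0.9})
```

Summing the bias and all contributions reconstructs the leaf's prediction exactly, which is what lets the method report how important each photometric feature was for a particular object's classification.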