Branch prediction apparatus, systems, and methods
An apparatus and a system, as well as a method and article, may operate to predict a branch within a first operating context, such as a user context, using a first strategy, and to predict a branch within a second operating context, such as an operating system context, using a second strategy. In some embodiments, apparatus and systems may comprise one or more first storage locations to store branch history information associated with the first operating context, and one or more second storage locations to store branch history information associated with the second operating context.
Board of Regents, University of Texas System
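The core idea, keeping separate prediction state per operating context, can be sketched as a toy model. The class name, table sizes, and gshare-style hashing below are illustrative assumptions, not the patent's specification:

```python
class ContextAwarePredictor:
    """Sketch: separate 2-bit-counter tables and history registers for
    user and kernel contexts, so OS activity does not pollute user-mode
    branch history (and vice versa)."""

    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        # one pattern table and one global-history register per context
        self.tables = {"user": [1] * (1 << bits), "kernel": [1] * (1 << bits)}
        self.history = {"user": 0, "kernel": 0}

    def _index(self, pc, ctx):
        # gshare-style hash of PC with that context's own history
        return (pc ^ self.history[ctx]) & self.mask

    def predict(self, pc, ctx):
        return self.tables[ctx][self._index(pc, ctx)] >= 2  # taken if counter >= 2

    def update(self, pc, ctx, taken):
        i = self._index(pc, ctx)
        c = self.tables[ctx][i]
        self.tables[ctx][i] = min(3, c + 1) if taken else max(0, c - 1)
        self.history[ctx] = ((self.history[ctx] << 1) | int(taken)) & self.mask
```

Training a branch in the user context leaves the kernel context's state untouched, which is the separation the claims describe.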
Control-flow speculation through value prediction for superscalar processors
In this paper, we introduce a new branch predictor that predicts the outcomes of branches by predicting the values of their inputs and performing an early computation of their results according to the predicted values. The design of a hybrid predictor comprising our branch predictor and a correlating branch predictor is presented. We also propose a new selector that chooses the most reliable prediction for each branch; this selector is based on the path followed to reach the branch. Results for immediate updates show a significant improvement over a conventional hybrid predictor across different size configurations. In addition, the proposed hybrid predictor with a size of 8 KB achieves the same miss ratio as a conventional one of 64 KB. Performance evaluation for a dynamically-scheduled superscalar processor, with realistic updates, shows a speed-up of 11% despite its higher latency (up to 4 cycles).
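The path-based selector can be illustrated with a toy chooser table indexed by a hash of recently executed branch PCs. This is a sketch under assumed interfaces; the component predictors, hashing, and table sizes are placeholders, not the paper's design:

```python
class PathBasedSelector:
    """Sketch: choose between two component predictors using a table of
    2-bit counters indexed by a hash of the recent path (branch PCs),
    rather than by the branch PC alone."""

    def __init__(self, p1, p2, bits=10):
        self.p1, self.p2 = p1, p2
        self.mask = (1 << bits) - 1
        self.choice = [1] * (1 << bits)   # < 2 favors p1, >= 2 favors p2
        self.path = 0                     # hash of recently executed branch PCs

    def predict(self, pc):
        i = (self.path ^ pc) & self.mask
        return self.p2.predict(pc) if self.choice[i] >= 2 else self.p1.predict(pc)

    def update(self, pc, taken):
        i = (self.path ^ pc) & self.mask
        c1 = self.p1.predict(pc) == taken
        c2 = self.p2.predict(pc) == taken
        if c1 != c2:  # train the chooser only when the components disagree
            self.choice[i] = min(3, self.choice[i] + 1) if c2 else max(0, self.choice[i] - 1)
        self.p1.update(pc, taken)
        self.p2.update(pc, taken)
        self.path = ((self.path << 4) ^ pc) & self.mask  # fold the PC into the path hash
```

Indexing the chooser by path rather than by PC lets the same static branch select different component predictors depending on how it was reached.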
The predictability of data values
Ultra low power cooperative branch prediction
Branch prediction is a key task in the operation of a high-performance processor. An inaccurate branch predictor results in increased program run-time and a rise in energy consumption. The drive towards processors with limited die-space and tighter energy requirements will continue to intensify over the coming years, as will the shift towards increasingly multicore processors. Both trends make it increasingly important, and increasingly difficult, to find effective and efficient branch predictor designs.
This thesis presents savings in energy and die-space through the use of more efficient cooperative branch predictors, achieved through novel branch prediction designs.
The first contribution is a new take on the problem of a hybrid dynamic-static branch predictor allocating branches to be predicted by one of its sub-predictors. A new bias parameter is introduced as a mechanism for trading off a small amount of performance for savings in die-space and energy. This is achieved by predicting more branches with the static predictor, ensuring that only the branches that will benefit most from the dynamic predictor's resources are predicted dynamically. This reduces pressure on the dynamic predictor's resources, allowing a smaller predictor to achieve very high accuracy. An improvement in run-time of 7-8% over the baseline BTFN predictor is observed, at a branch predictor bit budget of well under 1 KB.
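The bias mechanism described above can be sketched as a profile-guided allocation pass. This is a hypothetical illustration; the function name, the profile format, and the threshold semantics are assumptions, not the thesis's exact algorithm:

```python
def allocate_branches(profile, bias=0.05):
    """Sketch of a biased hybrid allocation.

    `profile` maps branch PC -> (static_accuracy, dynamic_accuracy), as
    measured in a profiling run. A branch goes to the dynamic predictor
    only if it beats the static prediction by more than `bias`; a larger
    bias therefore pushes more branches onto the cheap static predictor,
    shrinking the working set the dynamic tables must cover."""
    allocation = {}
    for pc, (static_acc, dynamic_acc) in profile.items():
        allocation[pc] = "dynamic" if dynamic_acc - static_acc > bias else "static"
    return allocation
```

With `bias=0`, every branch that gains anything at all from dynamic prediction is predicted dynamically; raising the bias trades a little accuracy for dynamic-table pressure.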
Next, a novel approach to branch prediction for multicore data-parallel applications is presented. The Peloton branch prediction scheme uses a pack of cyclists as an illustration of how a group of processors running similar tasks can share branch predictions to improve accuracy and reduce runtime. The results show that sharing updates for conditional branches across the existing interconnect for I-cache and D-cache updates reduces mispredictions by up to 25% and run-time by up to 6%. An energy model built with McPAT suggests these savings come at little to no increase in energy. The technique is then extended to architectures where the size of the branch predictors may differ between cores. The results show that such heterogeneity can dramatically reduce the die-space required for an accurate branch predictor while having little impact on performance, with up to 9% energy savings. The approach can be combined with the Peloton branch prediction scheme for a reduction in branch mispredictions of up to 5%.
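The sharing idea behind the Peloton scheme can be sketched with a toy model in which every conditional-branch outcome is broadcast to all cores' predictors. The broadcast granularity and the simple PC-indexed bimodal tables here are assumptions for illustration:

```python
class PelotonGroup:
    """Sketch: cores running similar data-parallel code broadcast their
    conditional-branch outcomes so every core's predictor warms up from
    the group's shared updates, like cyclists drafting in a peloton."""

    def __init__(self, n_cores, bits=8):
        self.mask = (1 << bits) - 1
        # one PC-indexed table of 2-bit counters per core
        self.tables = [[1] * (1 << bits) for _ in range(n_cores)]

    def predict(self, core, pc):
        return self.tables[core][pc & self.mask] >= 2

    def update(self, core, pc, taken):
        # broadcast the outcome over the (modelled) existing interconnect,
        # updating every core's table, not just the originating core's
        for table in self.tables:
            c = table[pc & self.mask]
            table[pc & self.mask] = min(3, c + 1) if taken else max(0, c - 1)
```

A core that has never executed a branch still benefits: its table has already been trained by its neighbours' broadcasts.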
An innovative approach to performance metrics calculus in cloud computing environments: a guest-to-host oriented perspective
In virtualized systems, the task of profiling and resource monitoring is not straightforward. Many datacenters perform CPU overcommitment using hypervisors, running multiple virtual machines on a single computer where the total number of virtual CPUs exceeds the total number of physical CPUs available.
From a customer's point of view, it is of real interest to know whether the purchased service levels are actually respected by the cloud provider. The innovative approach to performance profiling described in this work is based on the use of virtual performance counters, only recently made available by some hypervisors to their virtual machines, to implement guest-wide profiling. Although the virtual machine cannot access the Virtual Machine Monitor, with this method it is able to gather enough information to deduce the state of resource overcommitment of the virtualization host on which it runs. Tests have been carried out inside the compute nodes of the FIWARE Genoa Node, an instance of a widely distributed federated community cloud based on OpenStack and KVM. AgiLab-DITEN, the laboratory I belonged to and where I conducted my studies, together with TnT-Lab–DITEN and the CNIT-GE-Unit, designed, installed and configured the whole Genoa Node, which was hosted in the DITEN-UniGE equipment rooms. All the software measuring instruments, operating systems and programs used in this research are publicly available and free, and can be easily installed in a micro instance of a virtual machine, rapidly deployable also in public clouds.
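A related guest-side signal, complementary to the virtual-performance-counter approach the work actually uses, is the steal-time field that KVM exposes to Linux guests in `/proc/stat`. The sketch below parses it; the function name is hypothetical and this is not the paper's method:

```python
def steal_ratio(proc_stat_cpu_line):
    """Sketch: estimate host CPU contention from inside a guest by parsing
    the aggregate 'cpu' line of /proc/stat. The 8th tick value (steal)
    counts time the hypervisor ran something else while this vCPU was
    runnable, so steal/total is a rough overcommitment signal."""
    fields = proc_stat_cpu_line.split()
    assert fields[0] == "cpu"
    # field order: user nice system idle iowait irq softirq steal ...
    ticks = [int(x) for x in fields[1:]]
    total = sum(ticks)
    steal = ticks[7] if len(ticks) > 7 else 0
    return steal / total if total else 0.0
```

On a live guest one would read the first line of `/proc/stat` twice and compute the ratio over the delta; a persistently high ratio suggests the host's physical CPUs are oversubscribed.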
Enabling high-performance, mixed-signal approximate computing
For decades, the semiconductor industry enjoyed exponential improvements in microprocessor power and performance with the device scaling of successive technology generations. Scaling limitations at sub-micron technologies, however, have ceased to provide these historical performance improvements within a limited power budget. While device scaling provides a larger number of transistors per chip, for the same chip area, a growing percentage of the chip will have to be powered off at any given time due to power constraints. As such, the architecture community has focused on energy-efficient designs and is looking to specialized hardware to provide gains in performance. A focus on energy efficiency, along with increasingly less reliable transistors due to device scaling, has led to research in the area of approximate computing, where accuracy is traded for energy efficiency when precise computation is not required. There is a growing body of approximation-tolerant applications that, for example, compute on noisy or incomplete data, such as real-world sensor inputs, or make approximations to decrease the computation load in the analysis of cumbersome data sets. These approximation-tolerant applications span application domains such as machine learning, image processing, robotics, and financial analysis, among others. Since the advent of the modern processor, computing models have largely presumed the attribute of accuracy. A willingness to relax accuracy requirements, however, with the goal of gaining energy efficiency, warrants the re-investigation of the potential of analog computing. Analog hardware offers the opportunity for fast and low-power computation; however, it presents challenges in the form of accuracy. Where analog compute blocks have been applied to solve fixed-function problems, general-purpose computing has relied on digital hardware implementations that provide generality and programmability.
The work presented in this thesis aims to answer the following questions: Can analog circuits be successfully integrated into general-purpose computing to provide performance and energy savings? And, what is required to address the historical analog challenges of inaccuracy, programmability, and a lack of generality to enable such an approach? This thesis work investigates a neural approach as a means to address those challenges and to enable the use of analog circuits in general-purpose, high-performance computing. The first piece of this thesis work investigates the use of analog circuits at the microarchitecture level in the form of an analog neural branch predictor. The task of branch prediction can tolerate imprecision, as roll-back mechanisms correct for branch mispredictions, and application-level accuracy remains unaffected. We show that analog circuits enable the implementation of a highly-accurate neural-prediction algorithm that is infeasible to implement in the digital domain. The second piece of this thesis work presents a neural accelerator that targets approximation-tolerant code. Analog neural acceleration provides an application speedup of 3.3x and energy savings of 12.1x with a quality loss of less than 10% for all but one approximation-tolerant benchmark. These results show that, using a neural approach, analog circuits can be applied to provide performance and energy efficiency in high-performance, general-purpose computing.
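The neural-prediction algorithm alluded to is of the perceptron-predictor family, whose core operation is a dot product of per-branch weights with the global history; an analog design can compute that sum with currents instead of digital adders. The digital model below shows only the algorithm, with assumed sizes and training threshold:

```python
class PerceptronPredictor:
    """Sketch of a perceptron branch predictor: each branch hashes to a
    weight vector that is dotted with the global history (+1 taken,
    -1 not taken); the sign of the sum is the prediction. An analog
    implementation replaces the digital adder tree with current summing."""

    def __init__(self, hist_len=16, n_entries=256):
        self.hist_len = hist_len
        self.n_entries = n_entries
        self.theta = int(1.93 * hist_len + 14)  # common training threshold
        self.weights = [[0] * (hist_len + 1) for _ in range(n_entries)]
        self.history = [1] * hist_len           # +1 taken, -1 not taken

    def _dot(self, pc):
        w = self.weights[pc % self.n_entries]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        return self._dot(pc) >= 0

    def update(self, pc, taken):
        y = self._dot(pc)
        t = 1 if taken else -1
        # train on a misprediction, or when confidence is below theta
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.n_entries]
            w[0] += t
            for j in range(self.hist_len):
                w[j + 1] += t * self.history[j]
        self.history = self.history[1:] + [t]
```

The dot product is the expensive step that motivates the analog implementation: its cost grows with history length, while the training rule touches each weight only by plus or minus one.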