Control speculation for energy-efficient next-generation superscalar processors
Conventional front-end designs attempt to maximize the number of "in-flight" instructions in the pipeline. However, branch mispredictions cause the processor to fetch useless instructions that are eventually squashed, increasing front-end energy and issue-queue utilization and thus wasting around 30 percent of the power dissipated by the processor. Furthermore, processor design trends lengthen the pipeline to increase clock frequency, which puts more pressure on the branch prediction engine since branches take longer to resolve. As next-generation high-performance processors become deeply pipelined, the amount of energy wasted on misspeculated instructions will grow. The aim of this work is to reduce the energy consumption of misspeculated instructions. We propose selective throttling, which triggers different power-aware techniques (fetch throttling, decode throttling, or disabling the selection logic) depending on the branch prediction confidence level. Results show that combining fetch-bandwidth reduction with select-logic disabling provides the best results in terms of overall energy reduction and energy-delay product improvement (14 percent and 10 percent, respectively, for a processor with a 22-stage pipeline, and 16 percent and 13 percent, respectively, for a processor with a 42-stage pipeline).
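The core decision in selective throttling can be sketched as a mapping from branch-prediction confidence to a power-saving action. The following is an illustrative sketch only; the thresholds and action names are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of selective throttling: each cycle, the power-saving
# action applied to the front end depends on the confidence the predictor
# assigns to the oldest unresolved branch. Thresholds are illustrative.

def select_throttling_action(confidence: float) -> str:
    """Map branch-prediction confidence to a front-end power-saving action."""
    if confidence >= 0.95:
        return "none"            # high confidence: fetch at full bandwidth
    elif confidence >= 0.80:
        return "fetch_throttle"  # reduce fetch bandwidth
    elif confidence >= 0.60:
        return "decode_throttle" # also throttle decode
    else:
        return "disable_select"  # low confidence: gate the selection logic
```

The progression mirrors the paper's idea that more aggressive (and more performance-costly) throttling should be reserved for branches the predictor is least sure about.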
Software trace cache
We explore compiler optimizations that improve the layout of instructions in memory. The goal is to enable the code to make better use of the underlying hardware resources, regardless of the specific details of the processor/architecture, in order to increase fetch performance. The Software Trace Cache (STC) is a code layout algorithm with a broader target than previous layout optimizations: we aim not only to improve the instruction cache hit rate but also to increase the effective fetch width of the fetch engine. The STC algorithm organizes basic blocks into chains, trying to make sequentially executed basic blocks reside in consecutive memory positions, and then maps the basic block chains in memory to minimize conflict misses in the important sections of the program. We evaluate and analyze in detail the impact of the STC, and of code layout optimizations in general, on the three main aspects of fetch performance: the instruction cache hit rate, the effective fetch width, and the branch prediction accuracy. Our results show that layout-optimized codes have special characteristics that make them more amenable to high-performance instruction fetch: they have a very high rate of not-taken branches and execute long chains of sequential instructions; they also make very effective use of instruction cache lines, mapping only useful instructions that will execute close in time, increasing both spatial and temporal locality.
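The chain-building step described above can be sketched as a greedy traversal of the control-flow graph that always follows the hottest edge to a not-yet-placed block. This is a minimal sketch under assumed data structures (profile counts per block and per edge); the actual STC seed-selection and mapping rules are not reproduced here.

```python
# Illustrative sketch of greedy basic-block chaining: seed chains from the
# hottest unplaced blocks and extend each chain along the most frequently
# executed successor edge, so hot fall-through paths become sequential.

def build_chains(block_counts, successors):
    """block_counts: {block: exec_count}
    successors: {block: [(succ_block, edge_count), ...]}
    Returns a list of chains (lists of blocks)."""
    placed = set()
    chains = []
    # Seed chains starting from the hottest unplaced blocks.
    for block in sorted(block_counts, key=block_counts.get, reverse=True):
        if block in placed:
            continue
        chain = [block]
        placed.add(block)
        current = block
        while True:
            # Follow the hottest edge to a successor not yet placed.
            candidates = [(s, c) for s, c in successors.get(current, [])
                          if s not in placed]
            if not candidates:
                break
            nxt, _ = max(candidates, key=lambda sc: sc[1])
            chain.append(nxt)
            placed.add(nxt)
            current = nxt
        chains.append(chain)
    return chains
```

For example, a graph where A falls through to B 90% of the time would place A and B consecutively, leaving the cold path in a separate chain.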
A hybrid neuro-wavelet predictor for QoS control and stability
For distributed systems to react properly to peaks of requests, their adaptation activities would benefit from an estimate of the volume of incoming requests. This paper proposes a solution that produces a short-term forecast based on data characterising user behaviour of online services. We use \emph{wavelet analysis}, providing compression and denoising of the observed time series of past user requests, and a \emph{recurrent neural network} trained with observed data and designed to provide well-timed estimates of future requests. This ensemble is able to predict the volume of future user requests with a root mean squared error below 0.06\%. Thanks to this prediction, resources can be provisioned in advance for the duration of a request peak, and in just the right amount, hence avoiding over-provisioning and its associated costs. Moreover, reliable provisioning lets users enjoy a level of service availability unaffected by load variations.
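The wavelet-denoising stage of such a pipeline can be illustrated with a one-level Haar transform and soft thresholding of the detail coefficients. This is a generic sketch of wavelet denoising, not the specific transform or network used in the paper; the recurrent-network stage is omitted.

```python
import numpy as np

def haar_denoise(x, threshold):
    """One-level Haar wavelet denoising of a 1-D signal of even length."""
    # Decompose into approximation (low-pass) and detail (high-pass) bands.
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    # Soft-threshold the detail coefficients: shrink small (noisy) details.
    d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)
    # Inverse Haar transform to reconstruct the denoised signal.
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)
    y[1::2] = (a - d) / np.sqrt(2)
    return y
```

The denoised series, rather than the raw one, would then be fed to the predictor, which is the division of labour the abstract describes.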
Magnetic field experiment for Voyagers 1 and 2
The magnetic field experiment to be carried on the Voyager 1 and 2 missions consists of dual low field (LFM) and high field magnetometer (HFM) systems. The dual systems provide greater reliability and, in the case of the LFMs, permit the separation of spacecraft magnetic fields from the ambient fields. Additional reliability is achieved through electronics redundancy. The wide dynamic ranges of plus or minus 0.5 G for the LFMs and plus or minus 20 G for the HFMs, low quantization uncertainty of plus or minus 0.002 gamma in the most sensitive (plus or minus 8 gamma) LFM range, low sensor RMS noise level of 0.006 gamma, and use of data compaction schemes to optimize the experiment information rate all combine to permit the study of a broad spectrum of phenomena during the mission. Planetary fields at Jupiter, Saturn, and possibly Uranus; satellites of these planets; solar wind and satellite interactions with the planetary fields; and the large-scale structure and microscale characteristics of the interplanetary magnetic field are studied. The interstellar field may also be measured.
A marker of biological ageing predicts adult risk preference in European starlings, Sturnus vulgaris
Why are some individuals more prone to gamble than others? Animals often show preferences between two foraging options with the same mean reward but different degrees of variability in the reward, and such risk preferences vary between individuals. Previous attempts to explain variation in risk preference have focused on energy budgets, but with limited empirical support. Here, we consider whether biological ageing, which affects mortality and residual reproductive value, predicts risk preference. We studied a cohort of European starlings (Sturnus vulgaris) in which we had previously measured developmental erythrocyte telomere attrition, an established integrative biomarker of biological ageing. We measured the adult birds’ preferences when choosing between a fixed amount of food and a variable amount with an equal mean. After controlling for change in body weight during the experiment (a proxy for energy budget), we found that birds that had undergone greater developmental telomere attrition were more risk averse as adults than were those whose telomeres had shortened less as nestlings. Developmental telomere attrition was a better predictor of adult risk preference than either juvenile telomere length or early-life food supply and begging effort. Our longitudinal study thus demonstrates that biological ageing, as measured via developmental telomere attrition, is an important source of lasting differences in adult risk preferences.
Asynchronous Data Processing Platforms for Energy Efficiency, Performance, and Scalability
The global technology revolution is changing the integrated circuit industry from one driven by performance to one driven by energy, scalability, and more balanced design goals. Without clock-related issues, asynchronous circuits enable further design tradeoffs and in-operation adaptive adjustments for energy efficiency. This dissertation presents a design methodology for asynchronous circuits using NULL Convention Logic (NCL) and multi-threshold CMOS techniques for energy efficiency and throughput optimization in digital signal processing circuits. Parallel homogeneous and heterogeneous platforms implementing adaptive dynamic voltage scaling (DVS), based on the observation of system fullness and on workload prediction, are developed for balanced control of performance and energy efficiency. Datapath control logic with NULL Cycle Reduction (NCR) and an arbitration network are incorporated in the heterogeneous platform for large-scale cascading. The platforms have been integrated with the data processing units using the IBM 130 nm 8RF process and fabricated using the MITLL 90 nm FDSOI process. Simulation and physical testing results show the energy efficiency advantage of asynchronous designs and the effectiveness of the adaptive DVS mechanism in balancing energy and performance in both platforms.
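The fullness-based adaptive DVS policy mentioned above can be sketched as a simple controller that raises the supply voltage (and hence throughput) when an input buffer fills and lowers it when the buffer drains. The voltage levels and thresholds below are illustrative assumptions, not values from the dissertation.

```python
# Toy sketch of fullness-based adaptive DVS: pick a supply voltage (and thus
# a speed/energy operating point) from the occupancy of the input buffer.
# Voltage levels and thresholds are hypothetical.

VOLTAGE_LEVELS = [0.6, 0.9, 1.2]  # volts, from low-power to high-speed

def choose_voltage(occupancy: int, capacity: int) -> float:
    """Return the supply voltage for the current buffer fullness."""
    fullness = occupancy / capacity
    if fullness > 0.75:
        return VOLTAGE_LEVELS[2]  # nearly full: speed up to avoid overflow
    elif fullness > 0.25:
        return VOLTAGE_LEVELS[1]  # moderate load: mid operating point
    else:
        return VOLTAGE_LEVELS[0]  # nearly empty: slow down to save energy
```

Because asynchronous NCL pipelines tolerate arbitrary delays, such a controller can change the operating point without the resynchronization concerns a clocked design would face.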
Evidence for the contribution of COMT gene Val158/108Met polymorphism (rs4680) to working memory training-related prefrontal plasticity.
Background: Genetic factors have been suggested to affect the efficacy of working memory training. However, few studies have attempted to identify the relevant genes. Methods: In this study, we first performed a randomized controlled trial (RCT) to identify brain regions that were specifically affected by working memory training. Sixty undergraduate students were randomly assigned to either the adaptive training group (N = 30) or the active control group (N = 30). Both groups were trained for 20 sessions over 4 weeks and received fMRI scans before and after the training. Afterward, we combined the data from the 30 participants in the RCT study who received adaptive training with data from 71 additional participants who also received the same adaptive training but were not part of the RCT study (total N = 101) to test the contribution of the COMT Val158/108Met polymorphism to the interindividual difference in the training effect within the identified brain regions. Results: In the RCT study, we found that the adaptive training significantly decreased brain activation in the left prefrontal cortex (TFCE-FWE corrected p = .030). In the genetic study, we found that, compared with the Val allele homozygotes, the Met allele carriers' brain activation decreased more after the training in the left prefrontal cortex (TFCE-FWE corrected p = .025). Conclusions: This study provided evidence for the neural effect of a visual-spatial span training and suggested that genetic factors such as the COMT Val158/108Met polymorphism may have to be considered in future studies of such training.
Aerospace Medicine and Biology. A continuing bibliography with indexes
This bibliography lists 244 reports, articles, and other documents introduced into the NASA scientific and technical information system in February 1981. Aerospace medicine and aerobiology topics are included. Listings for physiological factors, astronaut performance, control theory, artificial intelligence, and cybernetics are included
A classification of techniques for the compensation of time delayed processes. Part 2: Structurally optimised controllers
Following on from Part 1, Part 2 of the paper considers the use of structurally optimised controllers to compensate time-delayed processes.