Efficient resources assignment schemes for clustered multithreaded processors
New feature sizes provide a larger number of transistors per chip that architects can use to further exploit instruction-level parallelism. However, these technologies also bring new challenges that complicate conventional monolithic processor designs. On the one hand, exploiting instruction-level parallelism is yielding diminishing returns, so other sources of parallelism, such as thread-level parallelism, must be exploited to keep raising performance with reasonable hardware complexity. On the other hand, clustered architectures have been widely studied as a way to reduce the inherent complexity of current monolithic processors. This paper studies the synergies and trade-offs between two concepts, clustering and simultaneous multithreading (SMT), in order to understand why conventional SMT resource assignment schemes are not so effective in clustered processors. These trade-offs are used to propose a novel resource assignment scheme that achieves an average speedup of 17.6% over Icount while improving fairness by 24%.
Peer Reviewed. Postprint (published version)
Frontend frequency-voltage adaptation for optimal energy-delay²
In this paper, we present a clustered, multiple-clock domain (CMCD) microarchitecture that combines the benefits of both clustering and globally asynchronous locally synchronous (GALS) designs. We also present a mechanism for dynamically adapting the frequency and voltage of the frontend of the CMCD with the goal of optimizing the energy-delay² product (ED2P). Our mechanism has minimal hardware cost, is entirely self-adjustable, does not depend on any thresholds, and achieves results close to optimal. We evaluate it on 16 SPEC 2000 applications and report a 17.5% ED2P reduction on average (80% of the upper bound).
Peer Reviewed. Postprint (published version)
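As a minimal sketch of the metric named in this abstract, the energy-delay² product can be computed as energy multiplied by the square of execution time. The function names and the sample numbers below are illustrative assumptions and are not taken from the paper.

```python
def ed2p(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product, E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2


def relative_reduction(baseline: float, adapted: float) -> float:
    """Fractional reduction of `adapted` with respect to `baseline`."""
    return (baseline - adapted) / baseline


if __name__ == "__main__":
    # Hypothetical measurements: the adapted frontend runs slightly slower
    # but consumes noticeably less energy.
    base = ed2p(energy_joules=10.0, delay_seconds=1.00)
    adapted = ed2p(energy_joules=7.5, delay_seconds=1.05)
    print(f"ED2P reduction: {relative_reduction(base, adapted):.1%}")
```

Because delay is squared, the metric tolerates only small slowdowns in exchange for energy savings, which is why it is commonly used for high-performance designs.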
The student evaluation of teaching and the competence of students as evaluators
When the college student satisfaction survey is considered in the promotion
and recognition of instructors, a usual complaint is related to the impact that
biased ratings have on the arithmetic mean (used as a measure of teaching
effectiveness). This is especially significant when the number of students
responding to the survey is small. In this work a new methodology, considering
student to student perceptions, is presented. Two different estimators of
student rating credibility, based on centrality properties of the student
social network, are proposed. The method rests on the idea that, in
the case of on-site higher education, students often know which of their peers are
competent to rate the teaching and learning process.
Comment: 20 pages, 2 tables
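The idea of weighting ratings by credibility can be sketched as follows. This is only an illustration: it assumes in-degree centrality (how often a student is named as competent by peers) as the credibility estimator, which may differ from the two estimators proposed in the paper. All names and data are hypothetical.

```python
def in_degree_credibility(nominations: dict[str, list[str]]) -> dict[str, float]:
    """Normalised in-degree: share of peer nominations each student receives."""
    students = set(nominations) | {s for named in nominations.values() for s in named}
    counts = {s: 0 for s in students}
    for nominator, named in nominations.items():
        for s in set(named):
            if s != nominator:  # ignore self-nominations
                counts[s] += 1
    total = sum(counts.values())
    if total == 0:  # nobody nominated anyone: fall back to equal weights
        return {s: 1.0 / len(students) for s in students}
    return {s: c / total for s, c in counts.items()}


def weighted_rating(ratings: dict[str, float], credibility: dict[str, float]) -> float:
    """Credibility-weighted mean of the ratings (plain mean if all weights are 0)."""
    weight_sum = sum(credibility.get(s, 0.0) for s in ratings)
    if weight_sum == 0:
        return sum(ratings.values()) / len(ratings)
    return sum(credibility.get(s, 0.0) * r for s, r in ratings.items()) / weight_sum


if __name__ == "__main__":
    # Hypothetical class of three students.
    nominations = {"ana": ["ben"], "ben": ["ana"], "cy": ["ana"]}
    ratings = {"ana": 4.0, "ben": 5.0, "cy": 1.0}
    print(weighted_rating(ratings, in_degree_credibility(nominations)))
```

With these sample data the outlying rating from the student nobody nominated receives zero weight, which illustrates how such a scheme could dampen biased ratings in small classes.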
Control speculation for energy-efficient next-generation superscalar processors
Conventional front-end designs attempt to maximize the number of "in-flight" instructions in the pipeline. However, branch mispredictions cause the processor to fetch useless instructions that are eventually squashed, increasing front-end energy and issue queue utilization and, thus, wasting around 30 percent of the power dissipated by a processor. Furthermore, processor design trends lead to increasing clock frequencies by lengthening the pipeline, which puts more pressure on the branch prediction engine since branches take longer to be resolved. As next-generation high-performance processors become deeply pipelined, the amount of wasted energy due to misspeculated instructions will go up. The aim of this work is to reduce the energy consumption of misspeculated instructions. We propose selective throttling, which triggers different power-aware techniques (fetch throttling, decode throttling, or disabling the selection logic) depending on the branch prediction confidence level. Results show that combining fetch-bandwidth reduction along with select-logic disabling provides the best performance in terms of overall energy reduction and energy-delay product improvement (14 percent and 10 percent, respectively, for a processor with a 22-stage pipeline and 16 percent and 13 percent, respectively, for a processor with a 42-stage pipeline).
Peer Reviewed. Postprint (published version)
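The selection step described in this abstract, choosing a power-aware technique from the branch prediction confidence level, might be sketched as below. The confidence levels, the fetch widths, and the mapping from level to action are hypothetical choices made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Confidence(Enum):
    """Hypothetical confidence levels reported by a branch confidence estimator."""
    HIGH = auto()
    LOW = auto()
    VERY_LOW = auto()


@dataclass(frozen=True)
class ThrottleAction:
    fetch_width: int      # instructions fetched per cycle after the branch
    select_enabled: bool  # whether selection logic considers post-branch instructions


FULL_FETCH_WIDTH = 8  # assumed machine width, not from the paper


def choose_action(conf: Confidence) -> ThrottleAction:
    """Map a confidence level to a throttling action (illustrative policy)."""
    if conf is Confidence.HIGH:
        # Prediction is trusted: run the front end at full speed.
        return ThrottleAction(FULL_FETCH_WIDTH, select_enabled=True)
    if conf is Confidence.LOW:
        # Mild doubt: reduce fetch bandwidth only.
        return ThrottleAction(FULL_FETCH_WIDTH // 2, select_enabled=True)
    # Strong doubt: reduce fetch bandwidth further and disable selection.
    return ThrottleAction(FULL_FETCH_WIDTH // 4, select_enabled=False)


if __name__ == "__main__":
    for level in Confidence:
        print(level.name, choose_action(level))
```

The most aggressive case combines fetch-bandwidth reduction with select-logic disabling, the pairing the abstract reports as giving the best energy results.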
Virtual-physical registers
A novel dynamic register renaming approach is proposed in this work. The key idea of the novel scheme is to delay the allocation of physical registers until a late stage in the pipeline, instead of doing it in the decode stage as conventional schemes do. In this way, the register pressure is reduced and the processor can exploit more instruction-level parallelism. Delaying the allocation of physical registers requires some additional artifact to keep track of dependences. This is achieved by introducing the concept of virtual-physical registers, which do not require any storage location and are used to identify dependences among instructions that have not yet allocated a register to their destination operands. Two alternative allocation strategies have been investigated that differ in the stage where physical registers are allocated: issue or write-back. The experimental evaluation has confirmed the higher performance of the latter alternative. We have evaluated the novel scheme through a detailed simulation of a dynamically scheduled processor. The results show a significant improvement (e.g., a 19% increase in IPC for a machine with 64 physical registers in each file) when compared with the traditional register renaming approach.
Peer Reviewed. Postprint (published version)
Using MCD-DVS for dynamic thermal management performance improvement
With chip temperature being a major hurdle in microprocessor design, techniques to recover the performance loss due to thermal emergency mechanisms are crucial in order to sustain performance growth. Many past techniques for power reduction, and more recently some for thermal management, have helped alleviate this problem. Probably the most important thermal control technique is dynamic voltage and frequency scaling (DVS), which allows for an almost cubic reduction in power with a worst-case performance penalty that is only linear. So far, DVS techniques for temperature control have been studied at the chip level. Finer-grain DVS is feasible if a globally-asynchronous locally-synchronous (GALS) design style is employed. GALS, also known as multiple-clock domain (MCD), allows for independent voltage and frequency control of each of the clock domains that are part of the chip. There are several studies on DVS for GALS that aim to improve energy and power efficiency but not temperature. This paper proposes and analyses the use of DVS at the domain level to control temperature in a clustered MCD microarchitecture, with the goal of improving the performance of applications that do not meet the thermal constraints imposed by the designers.
Peer Reviewed. Postprint (published version)
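The "almost cubic" claim follows from the standard dynamic power model for CMOS logic. The derivation below is a textbook sketch rather than the paper's own analysis; it assumes that supply voltage scales proportionally with frequency and it ignores leakage power. The symbols $\alpha$ (activity factor), $C$ (switched capacitance), and $s$ (scaling factor) are introduced here for illustration.

```latex
% Dynamic power of CMOS logic
P_{\text{dyn}} = \alpha \, C \, V^{2} f

% Scale frequency and voltage together by a factor 0 < s \le 1
f' = s f, \qquad V' = s V
\quad\Longrightarrow\quad
P'_{\text{dyn}} = \alpha \, C \, (sV)^{2} (sf) = s^{3} P_{\text{dyn}}

% Worst case (fully compute-bound code): execution time grows as 1/s
T' \le \frac{T}{s}
```

In practice voltage cannot be lowered in exact proportion to frequency, which is why the reduction is described as almost cubic.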
Negotiation of meaning in outside of the classroom group assignments: accounting for the how to understand the what of future mathematics teachers' learning
In this paper we illustrate how Wenger’s theory of social learning can be used to account for phenomena of future teachers’ change in settings that are not usually studied, namely the group work that future teachers do on class assignments outside of class. We describe how we adapted Wenger’s theory to the exploration of future mathematics teachers’ learning and illustrate how the analysis of the audiotaped interaction of a group of future teachers working outside the classroom generated conjectures that help to explain the development of their didactic knowledge.
X-ray/gamma-ray flux correlations in the BL Lacs Mrk 421 and 501 using HAWC data
The HAWC gamma ray observatory is located at the Sierra Negra Volcano in
Puebla, Mexico, at an altitude of 4,100 meters. HAWC is a wide field of view
array of 300 water Cherenkov detectors that continuously survey ~2 sr of
the sky and has been operating since March 2015. The large collected data sample allows
HAWC to perform an unbiased monitoring of the BL Lac Mrk 421. This is the
closest and brightest known extragalactic high-synchrotron-peaked BL Lac in the
gamma-ray/X-ray bands and is extensively monitored by the Large Area Telescope
(LAT) on-board the Fermi satellite, and the BAT and XRT instruments of the
Swift satellite. In this work, we use 25 months of HAWC data together with
Swift-XRT data to characterize potential correlations between both wavelengths.
This analysis shows that HAWC and Swift-XRT data are even more strongly correlated
than expected for quasi-simultaneous observations.
Comment: Presented at the 35th International Cosmic Ray Conference (ICRC2017),
Bexco, Busan, Korea. See arXiv:1708.02572 for all HAWC contributions.