204 research outputs found

Synchronized progress in interconnection networks (SPIN), a new approach to address deadlocks

Author: Gratz Paul V.
Publication venue: Barcelona Supercomputing Center
Publication date: 01/01/2019
UPCommons. Portal del coneixement obert de la UPC

Recommended from our members

Method and apparatus for congestion-aware routing in a computer interconnection network

Author: Boris Robert Grot
Paul Gratz
Stephen W. Keckler
Publication venue: United States Patent and Trademark Office
Publication date: 07/04/2014
The present disclosure relates to an example of a method for a first router to adaptively determine status within a network. The network may include the first router, a second router and a third router. The method for the first router may comprise determining status information regarding the second router located in the network, and transmitting the status information to the third router located in the network. The second router and the third router may be indirectly coupled to one another. Board of Regents, University of Texas System

Texas ScholarWorks

Machine Learning for Microprocessor Performance Bug Localization

Author: Barboza Erick Carvajal
Gratz Paul
Hu Jiang
Ketkar Mahesh
Kishinevsky Michael
Publication date: 27/03/2023
The validation process for microprocessors is a very complex task that consumes substantial engineering time during the design process. Bugs that degrade overall system performance without affecting functional correctness are particularly difficult to debug, given the lack of a golden reference for bug-free performance. This work introduces two automated performance bug localization methodologies based on machine learning that aim to aid the debugging process. Our results show that, for the evaluated microprocessor core performance bugs whose average IPC impact is greater than 1%, our best-performing technique is able to localize the exact microarchitectural unit of the bug ~77% of the time, while achieving a top-3 unit accuracy (out of 11 possible locations) of over 90% for bugs with the same average IPC impact. The proposed system in our simulation setup requires only a few seconds to perform a bug location inference, which leads to a reduced debugging time. Comment: 12 pages, 6 figures
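
The top-3 unit accuracy quoted in this abstract can be illustrated with a minimal sketch. It assumes the localizer emits one score per candidate unit for each bug; the function name `top_k_accuracy` and the input layout are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: the function name and data layout are illustrative
# assumptions, not the authors' implementation.
def top_k_accuracy(scores_per_bug, true_units, k):
    """Fraction of bugs whose true unit is among the k highest-scored units."""
    hits = 0
    for scores, truth in zip(scores_per_bug, true_units):
        # Rank unit indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda u: scores[u], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_units)
```

With `k=1` this is exact-unit accuracy; with `k=3` it is the top-3 figure the abstract reports over 11 candidate units.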

arXiv.org e-Print Archive

Page size aware cache prefetching

Author: Casas Guix Marc
Chacon Gino
Gratz Paul V.
Jiménez Daniel A.
Vavouliotis Georgios
Álvarez Martí Lluc
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022
The increase in working set sizes of contemporary applications outpaces the growth in cache sizes, resulting in frequent main memory accesses that deteriorate system performance due to the disparity between processor and memory speeds. Prefetching data blocks into the cache hierarchy ahead of demand accesses has proven successful at attenuating this bottleneck. However, spatial cache prefetchers operating in the physical address space leave significant performance on the table by limiting their pattern detection within 4KB physical page boundaries when modern systems use page sizes larger than 4KB to mitigate the address translation overheads. This paper exploits the high usage of large pages in modern systems to increase the effectiveness of spatial cache prefetching. We design and propose the Page-size Propagation Module (PPM), a µarchitectural scheme that propagates the page size information to the lower-level cache prefetchers, enabling safe prefetching beyond 4KB physical page boundaries when the accessed blocks reside in large pages, at the cost of augmenting the first-level caches’ Miss Status Holding Register (MSHR) entries with one additional bit. PPM is compatible with any cache prefetcher without implying design modifications. We capitalize on PPM’s benefits by designing a module that consists of two page size aware prefetchers that inherently use different page sizes to drive prefetching. The composite module uses adaptive logic to dynamically enable the most appropriate page size aware prefetcher. Finally, we show that the proposed designs are transparent to which cache prefetcher is used. We apply the proposed page size exploitation techniques to four state-of-the-art spatial cache prefetchers. Our evaluation shows that our proposals improve single-core geomean performance by up to 8.1% (2.1% at minimum) over the original implementation of the considered prefetchers, across 80 memory-intensive workloads. In multi-core contexts, we report geomean speedups up to 7.7% across different cache prefetchers and core configurations.

This work is supported by the Spanish Ministry of Science and Technology through the PID2019-107255GB project, the Generalitat de Catalunya (contract 2017-SGR-1414), the European Union Horizon 2020 research and innovation program under grant agreement No 955606 (DEEP-SEA EU project), the National Science Foundation through grants CNS-1938064 and CCF-1912617, and the Semiconductor Research Corporation project GRC 2936.001. Georgios Vavouliotis has been supported by the Spanish Ministry of Economy, Industry, and Competitiveness and the European Social Fund under the FPI fellowship No. PRE2018-087046. Marc Casas has been partially supported by the Grant RYC2017-23269 funded by MCIN/AEI/10.13039/501100011033 and ESF ‘Investing in your future’.

Peer Reviewed. Postprint (author's final draft)
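
The page-boundary check described in the abstract of this record can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's design: the 2MB large-page size, the 64-byte block size, and every name in the code are hypothetical choices made for the example.

```python
# Hypothetical sketch: constants and names are illustrative assumptions,
# not taken from the paper.
PAGE_4KB = 4 * 1024
PAGE_2MB = 2 * 1024 * 1024   # assumed large-page size
BLOCK_SIZE = 64              # assumed cache block size in bytes


def page_base(addr, page_size):
    """Base address of the page containing addr (page_size is a power of two)."""
    return addr & ~(page_size - 1)


def safe_prefetch_targets(trigger_addr, deltas, in_large_page):
    """Keep only candidate blocks that stay inside the trigger's physical page.

    in_large_page models the single page-size bit that the abstract says is
    propagated alongside the miss to the lower-level prefetcher.
    """
    page_size = PAGE_2MB if in_large_page else PAGE_4KB
    base = page_base(trigger_addr, page_size)
    targets = []
    for delta in deltas:
        candidate = trigger_addr + delta * BLOCK_SIZE
        if candidate >= 0 and page_base(candidate, page_size) == base:
            targets.append(candidate)
    return targets
```

For a trigger in the last block of a 4KB region, the sketch drops every forward candidate when the bit is clear and keeps them when the bit signals a large page, which is the extra prefetching opportunity the abstract refers to.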

UPCommons. Portal del coneixement obert de la UPC

Accumulation of promutagenic DNA adducts in the mouse distal colon after consumption of heme does not induce colonic neoplasms in the western diet model of spontaneous colorectal cancer

Author: Conlon Michael A
Gratz Silvia W
Hu Ying
Le Leu Richard Kevin
Winter Jean
Young Graeme Paul
Publication venue: 'Wiley'
Publication date: 01/10/2013
Author version made available in accordance with Publisher copyright policy.

Scope: Red meat is considered a risk factor for colorectal cancer (CRC). Heme is considered to promote colonic hyperproliferation and cell damage. Resistant starch (RS) is a food that ferments in the colon with studies demonstrating protective effects against CRC. By utilizing the western diet model of spontaneous CRC, we determined if feeding heme (as hemin chloride) equivalent to a high red meat diet would increase colonic DNA adducts and CRC and whether RS could abrogate such effects. Methods and results: Four groups of mice: control, heme, RS and heme + RS were fed diets for 1 or 18 months. Colons were analyzed for apoptosis, proliferation, DNA adducts “8-hydroxy-2-deoxyguanosine” and “O6-methyl-2-deoxyguanosine” (O6MeG), and neoplasms. In the short term, heme increased cell proliferation (p < 0.05). Changes from 1 to 18 months showed increased cell proliferation (p < 0.01) and 8-hydroxy-2-deoxyguanosine adducts (p < 0.05) in all groups, but only heme-fed mice showed reduced apoptosis (p < 0.01) and increased O6MeG adducts (p < 0.01). The incidence of colon neoplasms was not different between any interventions. Conclusion: We identified heme to increase proliferation in the short term, inhibit apoptosis over the long term, and increase O6MeG adducts in the colon over time, although these changes did not affect colonic neoplasms within this mouse model.

Funding for this project was provided by the National Health and Medical Research Council of Australia (Project number 535079). We would like to acknowledge the Royal Society of Edinburgh for funding a visit for Dr. Silvia Gratz from UK to Australia to carry out work associated with this project.

Flinders Academic Commons

Spatial Locality Speculation to Reduce Energy in Chip-Multiprocessor Networks-on-Chip

Author: Gratz Paul V.
Grot Boris
Jimenez Daniel A.
Kim Hyungjun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/05/2014
As processor chips become increasingly parallel, an efficient communication substrate is critical for meeting performance and energy targets. In this work, we target the root cause of network energy consumption through techniques that reduce link and router-level switching activity. We specifically focus on memory subsystem traffic, as it comprises the bulk of NoC load in a CMP. By transmitting only the flits that contain words predicted useful using a novel spatial locality predictor, our scheme seeks to reduce network activity. We aim to further lower NoC energy through microarchitectural mechanisms that inhibit datapath switching activity for unused words in individual flits. Using simulation-based performance studies and detailed energy models based on synthesized router designs and different link wire types, we show that 1) the prediction mechanism achieves very high accuracy, with an average rate of false-unused prediction of just 2.5 percent; 2) the combined NoC energy savings enabled by the predictor and microarchitectural support is 36 percent, on average, and up to 57 percent in the best case; and 3) there is no system performance penalty as a result of this technique.
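
The idea of transmitting only flits that contain words predicted useful can be illustrated with a small sketch. It assumes the predictor's output is available as one boolean per word of a cache block; the function name, the 16-word block, and the 4-words-per-flit packing in the usage note are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: names and the block/flit layout are illustrative
# assumptions, not the authors' mechanism.
def flits_to_send(useful_words, words_per_flit):
    """Indices of flits holding at least one word predicted useful.

    useful_words is a list of booleans, one per word of the cache block,
    standing in for the spatial locality predictor's output.
    """
    flits = []
    for start in range(0, len(useful_words), words_per_flit):
        if any(useful_words[start:start + words_per_flit]):
            flits.append(start // words_per_flit)
    return flits
```

For a 16-word block packed 4 words per flit with only words 0 and 9 predicted useful, the sketch selects flits 0 and 2 and skips the other two, which is the source of the reduced link and router activity the abstract describes.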

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX