Predicting Coherence Communication by Tracking Synchronization Points at Run Time
Predicting the target processors that a coherence request must be delivered to can improve the miss handling latency in shared memory systems. In directory coherence protocols, directly communicating with the predicted processors avoids costly indirection through the directory. In snooping protocols, prediction relaxes the high bandwidth requirements by replacing broadcast with multicast. In this work, we propose a new run-time coherence target prediction scheme that exploits the inherent correlation between synchronization points in a program and coherence communication. Our workload-driven analysis shows that by exposing synchronization points to hardware and tracking them at run time, we can simply and effectively track stable and repetitive communication patterns. Based on this observation, we build a predictor that improves the miss latency of a directory protocol by 13%. Compared with existing address- and instruction-based prediction techniques, our predictor achieves comparable performance with substantially smaller power and storage overheads.
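As a rough illustration only (not the paper's design), the C++ sketch below shows one way a sync-point-indexed target predictor could be organized: each core records the last synchronization point it passed, predictions are looked up by that sync point, and the predictor is trained with the targets the directory actually named. The class name SyncTargetPredictor, the 8-bit target mask, and the use of the synchronization instruction's PC as the sync-point identifier are all illustrative assumptions.

#include <cstdint>
#include <unordered_map>

using TargetMask = std::uint8_t;   // one bit per core in an assumed 8-core system

class SyncTargetPredictor {
public:
    // Hardware-exposed hook: called when a core passes a synchronization point
    // (lock acquire, barrier), identified here by the PC of the sync instruction.
    void notifySyncPoint(int core, std::uint64_t syncId) {
        lastSync_[core] = syncId;
    }

    // On a coherence miss, predict which processors the request must reach.
    // A zero mask means "no prediction": fall back to the directory (or broadcast).
    TargetMask predict(int core) const {
        auto s = lastSync_.find(core);
        if (s == lastSync_.end()) return 0;
        auto t = table_.find(s->second);
        return t == table_.end() ? 0 : t->second;
    }

    // Train with the target set the directory actually produced for the miss,
    // so the stable, repetitive pattern tied to this sync point is learned.
    void train(int core, TargetMask actualTargets) {
        auto s = lastSync_.find(core);
        if (s != lastSync_.end()) table_[s->second] |= actualTargets;
    }

private:
    std::unordered_map<int, std::uint64_t> lastSync_;      // core -> last sync point seen
    std::unordered_map<std::uint64_t, TargetMask> table_;  // sync point -> predicted targets
};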
Progressive Hashing for Packet Processing Using Set Associative Memory
As the Internet grows, both the number of rules in packet filtering databases and the number of prefixes in IP lookup tables inside the router are growing. The packet processing engine is a critical part of the Internet router, as it is used to perform packet forwarding (PF) and packet classification (PC). In both applications, processing has to be at wire speed. It is common to use hash-based schemes in packet processing engines; however, the downsides of classic hashing techniques, such as overflow and worst-case memory access time, have to be dealt with. Implementing hash tables using set associative memory has the property that each bucket of a hash table can be searched in one memory cycle, outperforming conventional Ternary CAMs in terms of power and scalability. In this paper we present "Progressive Hashing" (PH), a general open-addressing hash-based packet processing scheme for Internet routers using the set associative memory architecture. Our scheme is an extension of the multiple hashing scheme and is amenable to high-performance hardware implementation with low overflow and low memory access latency. We show by experimenting with real IP lookup tables and synthetic packet filtering databases that PH reduces overflow compared with multiple hashing. The proposed PH processing engine is estimated to achieve an average processing speed of 160 Gbps for the PC application and 320 Gbps for the PF application.
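For context, the sketch below illustrates only the set-associative multiple-hashing baseline that the abstract says PH extends; the PH-specific progressive restriction of hash functions is not modeled, and the bucket/way counts, the toy hash family, and the class name MultiHashTable are assumptions.

#include <cstdint>
#include <functional>
#include <vector>

class MultiHashTable {
public:
    MultiHashTable(std::size_t buckets, std::size_t ways, std::size_t numHashes)
        : table_(buckets), ways_(ways), k_(numHashes) {}

    // Insert places the key in the first of k candidate buckets with a free way;
    // returns false on overflow (no candidate bucket has room).
    bool insert(std::uint32_t key) {
        for (std::size_t i = 0; i < k_; ++i) {
            auto& bucket = table_[hash(key, i)];
            if (bucket.size() < ways_) { bucket.push_back(key); return true; }
        }
        return false;   // would be handled by a small overflow CAM in hardware
    }

    // Lookup probes at most k buckets; with set associative memory each bucket
    // is searched in a single memory cycle regardless of the number of ways.
    bool lookup(std::uint32_t key) const {
        for (std::size_t i = 0; i < k_; ++i)
            for (std::uint32_t stored : table_[hash(key, i)])
                if (stored == key) return true;
        return false;
    }

private:
    // Toy hash family: mixes the key with the hash-function index.
    std::size_t hash(std::uint32_t key, std::size_t i) const {
        return std::hash<std::uint64_t>{}((std::uint64_t(key) << 8) | i) % table_.size();
    }
    std::vector<std::vector<std::uint32_t>> table_;   // one set-associative row per bucket
    std::size_t ways_, k_;
};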
TPTS: A Novel Framework for Very Fast Manycore Processor Architecture Simulation
The slow speed of conventional execution-driven architecture simulators is a serious impediment to obtaining desirable research productivity. This paper proposes and evaluates a fast manycore processor simulation framework called Two-Phase Trace-driven Simulation (TPTS), which splits detailed timing simulation into a trace generation phase and a trace simulation phase. Much of the simulation overhead caused by uninteresting architectural events is incurred only once, during the trace generation phase, and can be omitted in the repeated trace-driven simulations. We design and implement tsim, an event-driven manycore processor simulator that models the memory hierarchy, interconnect, and coherence protocol in detail based on the proposed TPTS framework. By applying aggressive event filtering, tsim achieves an impressive simulation speed of 146 MIPS when running 16-thread parallel applications.
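A minimal sketch of the two-phase split described above, assuming nothing about tsim's actual interfaces: phase 1 runs the slow functional model once and writes only the filtered "interesting" events to a trace, and phase 2 replays that trace against a detailed timing model as many times as needed. TraceRecord, leftPrivateCache, and TimingModel are illustrative names.

#include <cstdint>
#include <fstream>
#include <vector>

// One filtered event; a real trace would also carry instruction counts, etc.
struct TraceRecord {
    std::uint8_t  core;     // issuing core
    bool          isWrite;
    std::uint64_t addr;     // address of the access that left the private cache
    std::uint64_t cycle;    // timestamp from the trace-generation phase
};

// Phase 1: run the functional/architectural model once and keep only the
// interesting events; leftPrivateCache stands in for the real event filter.
void generateTrace(const std::vector<TraceRecord>& rawEvents,
                   bool (*leftPrivateCache)(const TraceRecord&),
                   const char* path) {
    std::ofstream out(path, std::ios::binary);
    for (const auto& ev : rawEvents)
        if (leftPrivateCache(ev))
            out.write(reinterpret_cast<const char*>(&ev), sizeof ev);
}

// Phase 2: replay the trace against a detailed timing model of the memory
// hierarchy, interconnect, and coherence protocol; this phase can be repeated
// for many configurations without regenerating the trace.
template <typename TimingModel>
void replayTrace(const char* path, TimingModel& model) {
    std::ifstream in(path, std::ios::binary);
    TraceRecord ev;
    while (in.read(reinterpret_cast<char*>(&ev), sizeof ev))
        model.access(ev);   // event-driven simulation of one traced event
}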