
1,273 research outputs found

Quantitative performance evaluation of SCI memory hierarchies

Author: Hexsel Roberto A.
Publication venue: The University of Edinburgh
Publication date: 01/01/1994

Backscatter from the Data Plane --- Threats to Stability and Security in Information-Centric Networking

Author: Schmidt Thomas C.
Vahlenkamp Markus
Wählisch Matthias
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012

Information-centric networking proposals attract much attention in the ongoing search for a future communication paradigm of the Internet. Replacing the host-to-host connectivity by a data-oriented publish/subscribe service eases content distribution and authentication by concept, while eliminating threats from unwanted traffic at an end host as are common in today's Internet. However, current approaches to content routing heavily rely on data-driven protocol events and thereby introduce a strong coupling of the control to the data plane in the underlying routing infrastructure. In this paper, threats to the stability and security of the content distribution system are analyzed in theory and practical experiments. We derive relations between state resources and the performance of routers and demonstrate how this coupling can be misused in practice. We discuss new attack vectors present in its current state of development, as well as possibilities and limitations to mitigate them. Comment: 15 page

arXiv.org e-Print Archive

REPOSIT

Discriminative Coherence: Balancing Performance and Latency Bounds in Data-Sharing Multi-Core Real-Time Systems

Author: Hassan Mohamed
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020)
Publication date: 01/01/2020

Dagstuhl Research Online Publication Server

Exploiting commutativity to reduce the cost of updates to shared data in cache-coherent systems

Author: Horn Webb H
Sanchez Daniel
Zhang Guowei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2015

We present Coup, a technique to lower the cost of updates to shared data in cache-coherent systems. Coup exploits the insight that many update operations, such as additions and bitwise logical operations, are commutative: they produce the same final result regardless of the order they are performed in. Coup allows multiple private caches to simultaneously hold update-only permission to the same cache line. Caches with update-only permission can locally buffer and coalesce updates to the line, but cannot satisfy read requests. Upon a read request, Coup reduces the partial updates buffered in private caches to produce the final value. Coup integrates seamlessly into existing coherence protocols, requires inexpensive hardware, and does not affect the memory consistency model. We apply Coup to speed up single-word updates to shared data. On a simulated 128-core, 8-socket system, Coup accelerates state-of-the-art implementations of update-heavy algorithms by up to 2.4×. Funding: Center for Future Architectures Research; National Science Foundation (U.S.) (CAREER-1452994); Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Grier Presidential Fellowship); Microelectronics Advanced Research Corporation; United States. Defense Advanced Research Projects Agency
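The buffer-then-reduce idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware protocol from the paper: the class `CoupLine`, its methods, and the choice of integer addition as the commutative operation are all hypothetical names picked for illustration.

```python
from typing import Dict


class CoupLine:
    """Toy model of one shared cache line holding a single integer.

    Hypothetical illustration only: real Coup is a hardware coherence
    extension; here each "private cache" is just a dictionary entry.
    """

    def __init__(self, value: int = 0) -> None:
        self.value = value                 # last fully reduced value
        self.partial: Dict[int, int] = {}  # core id -> buffered delta

    def add(self, core: int, delta: int) -> None:
        # Update-only permission: the core coalesces its update locally
        # and never needs to see the current value of the line.
        self.partial[core] = self.partial.get(core, 0) + delta

    def read(self) -> int:
        # A read triggers a reduction of all buffered partial updates.
        # Addition is commutative, so the order of reduction is irrelevant.
        self.value += sum(self.partial.values())
        self.partial.clear()
        return self.value


line = CoupLine(10)
line.add(core=0, delta=1)
line.add(core=1, delta=2)
line.add(core=0, delta=3)
print(line.read())
```

In this model, cores 0 and 1 update the line without communicating, and the only synchronization point is the read.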

DSpace@MIT

Crossref

Efficient techniques to provide scalability for token-based cache coherence protocols

Author: Cuesta Sáez Blas Antonio
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 17/07/2009

Cache coherence protocols based on tokens can provide low latency without relying on non-scalable interconnects thanks to the use of efficient requests that are unordered. However, when these unordered requests contend for the same memory block, they may cause protocol races. To resolve the races and ensure the completion of all the cache misses, token protocols use a starvation prevention mechanism that is inefficient and non-scalable in terms of required storage structures and generated traffic. Besides, token protocols use non-silent invalidations, which increase the latency of write misses proportionally to the system size. All these problems make token protocols non-scalable. To overcome the main problems of token protocols and increase their scalability, we propose a new starvation prevention mechanism named Priority Requests. This mechanism resolves contention by an efficient, elegant, and flexible method based on ordered requests. Furthermore, thanks to Priority Requests, efficient techniques can be applied to limit the storage requirements of the starvation prevention mechanism, to reduce the total traffic generated for managing protocol races, and to reduce the latency of write misses. Thus, the main problems of token protocols can be solved, which, in turn, helps to widen their efficiency and scalability. Cuesta Sáez, BA. (2009). Efficient techniques to provide scalability for token-based cache coherence protocols [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/6024 Palanci

Crossref

RiuNet

Flowtune: Flowlet Control for Datacenter Networks

Author: Balakrishnan Hari
Perry Jonathan
Shah Devavrat
Publication date: 15/08/2016

Rapid convergence to a desired allocation of network resources to endpoint traffic has been a long-standing challenge for packet-switched networks. The reason for this is that congestion control decisions are distributed across the endpoints, which vary their offered load in response to changes in application demand and network feedback on a packet-by-packet basis. We propose a different approach for datacenter networks, flowlet control, in which congestion control decisions are made at the granularity of a flowlet, not a packet. With flowlet control, allocations have to change only when flowlets arrive or leave. We have implemented this idea in a system called Flowtune using a centralized allocator that receives flowlet start and end notifications from endpoints. The allocator computes optimal rates using a new, fast method for network utility maximization, and updates endpoint congestion-control parameters. Experiments show that Flowtune outperforms DCTCP, pFabric, sfqCoDel, and XCP on tail packet delays in various settings, converging to optimal rates within a few packets rather than over several RTTs. Our implementation of Flowtune handles 10.4x more throughput per core and scales to 8x more cores than Fastpass, for an 83-fold throughput gain

DSpace@MIT

Parallel and Distributed Computing

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing.

Directory of Open Access Books (DOAB)

Doctor of Philosophy

Author: Awasthi Manu
Publication venue: University of Utah
Publication date: 01/12/2014

Dissertation. In recent years, a number of trends have started to emerge, both in microprocessor and application characteristics. As per Moore's law, the number of cores on chip will keep doubling every 18-24 months. International Technology Roadmap for Semiconductors (ITRS) reports that wires will continue to scale poorly, exacerbating the cost of on-chip communication. Cores will have to navigate an on-chip network to access data that may be scattered across many cache banks. The number of pins on the package, and hence available off-chip bandwidth, will at best increase at sublinear rate and at worst, stagnate. A number of disruptive memory technologies, e.g., phase change memory (PCM), have begun to emerge and will be integrated into the memory hierarchy sooner rather than later, leading to non-uniform memory access (NUMA) hierarchies. This will make the cost of accessing main memory even higher. In previous years, most of the focus has been on deciding the memory hierarchy level where data must be placed (L1 or L2 caches, main memory, disk, etc.). However, in modern and future generations, each level is getting bigger and its design is being subjected to a number of constraints (wire delays, power budget, etc.). It is becoming very important to make an intelligent decision about where data must be placed within a level. For example, in a large non-uniform access cache (NUCA), we must figure out the optimal bank. Similarly, in a multi-dual inline memory module (DIMM) non-uniform memory access (NUMA) main memory, we must figure out the DIMM that is the optimal home for every data page. Studies have indicated that heterogeneous main memory hierarchies that incorporate multiple memory technologies are on the horizon. We must develop solutions for data management that take heterogeneity into account. For these memory organizations, we must again identify the appropriate home for data.
In this dissertation, we attempt to verify the following thesis statement: "Can low-complexity hardware and OS mechanisms manage data placement within each memory hierarchy level to optimize metrics such as performance and/or throughput?" In this dissertation we argue for a hardware-software codesign approach to tackle the above-mentioned problems at different levels of the memory hierarchy. The proposed methods utilize techniques like page coloring and shadow addresses and are able to handle a large number of problems ranging from managing wire delays in large, shared NUCA caches to distributing shared capacity among different cores. We then examine data-placement issues in NUMA main memory for a many-core processor with a moderate number of on-chip memory controllers. Using codesign approaches, we achieve efficient data placement by modifying the operating system's (OS) page allocation algorithm for a wide variety of main memory architectures.
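Page coloring, mentioned in the abstract above, can be sketched generically as follows. The abstract does not describe the dissertation's actual mechanism, so this is only the textbook form of the technique under assumed parameters: the 4 KiB page size, the function names `page_color` and `pick_page`, and the modulo mapping from page frame number to color are all assumptions made for illustration.

```python
from typing import Iterable, Optional

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical parameter)


def page_color(phys_addr: int, num_colors: int) -> int:
    """Return the color (e.g. a cache bank index) of a physical page.

    Assumes the color is taken from the low bits of the page frame number.
    """
    return (phys_addr // PAGE_SIZE) % num_colors


def pick_page(free_pages: Iterable[int], wanted_color: int,
              num_colors: int) -> Optional[int]:
    """Pick the first free page whose color matches, or None if none does.

    An OS allocator steering data to a chosen bank would do something
    similar when servicing a page fault.
    """
    for addr in free_pages:
        if page_color(addr, num_colors) == wanted_color:
            return addr
    return None


free = [0x0000, 0x1000, 0x2000, 0x5000]
print(hex(pick_page(free, wanted_color=2, num_colors=4)))
```

The design point this illustrates is that placement within a level can be controlled purely by which physical page the OS hands out, with no change to the application.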

The University of Utah: J. Willard Marriott Digital Library

Prototyping Methodologies and Design of Communication-centric Heterogeneous Many-core Architectures

Author: Masing Leonard Jannik
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2020

KITopen