
    Cycle-accurate evaluation of reconfigurable photonic networks-on-chip

    There is little doubt that the most important factors limiting the performance of next-generation Chip Multiprocessors (CMPs) will be power efficiency and the available communication speed between cores. Photonic Networks-on-Chip (NoCs) have been suggested as a viable route to relieve the off- and on-chip interconnection bottleneck. Low-loss integrated optical waveguides can transport very high-speed data signals over longer distances than on-chip electrical signaling. In addition, with the development of silicon microrings, photonic switches can be integrated to route signals in a data-transparent way. Although several photonic NoC proposals exist, their use is often limited to the communication of large data messages due to the relatively long set-up time of the photonic channels. In this work, we evaluate a reconfigurable photonic NoC in which the topology is adapted automatically (on a microsecond scale) to the evolving traffic situation by use of silicon microrings. To evaluate the system's performance, the proposed architecture has been implemented in a detailed full-system cycle-accurate simulator capable of generating realistic workloads and traffic patterns. In addition, a model was developed to estimate the power consumption of the full interconnection network, which was compared with other photonic and electrical NoC solutions. We find that our proposed network architecture significantly lowers the average memory access latency (a 35% reduction) while generating only a modest increase in power consumption (20%), compared to a conventional concentrated-mesh electrical signaling approach. Unlike high-speed circuit-switched photonic NoCs, our solution can tolerate long photonic channel set-up times, which makes it directly applicable to current shared-memory CMPs.
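
    The adaptation loop the abstract describes can be pictured with a minimal sketch: periodically (on the microsecond scale mentioned above), a small pool of photonic express channels is re-assigned to the heaviest core-to-core flows observed in the last monitoring interval, with all other traffic falling back to the base network. This is an illustrative Python sketch, not the authors' architecture; the names, the traffic-matrix format, and NUM_CHANNELS are assumptions.

        # Hypothetical sketch (not the paper's implementation): periodically
        # re-assign a small pool of photonic express channels to the heaviest
        # core-to-core flows seen in the last monitoring interval.
        from collections import Counter

        NUM_CHANNELS = 2  # express channels switchable via microrings (assumption)

        def reconfigure(traffic_matrix, num_channels=NUM_CHANNELS):
            """traffic_matrix: {(src_core, dst_core): bytes_in_last_interval}.
            Returns the core pairs that get a dedicated photonic path; all
            other traffic falls back to the base network."""
            ranked = Counter(traffic_matrix).most_common(num_channels)
            return [pair for pair, _ in ranked]

        # Example: two flows dominate the interval and win express channels.
        traffic = {(0, 3): 4_000_000, (2, 7): 2_500_000, (1, 5): 10_000}
        print(reconfigure(traffic))  # [(0, 3), (2, 7)]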

    Cost-aware caching: optimizing cache provisioning and object placement in ICN

    Caching is frequently used by Internet Service Providers as a viable technique to reduce the latency perceived by end users while offloading network traffic. While the cache hit ratio is generally considered in the literature the dominant performance metric for such systems, in this paper we argue that a critical piece has so far been neglected. Adopting a radically different perspective, we explicitly account for the cost of content retrieval, i.e. the cost associated with the external bandwidth an ISP needs to retrieve the contents requested by its customers. Interestingly, we discover that classical cache provisioning techniques that maximize cache efficiency (i.e., the hit ratio) lead to suboptimal solutions with higher overall cost. To show this mismatch, we propose two optimization models that either minimize the overall cost or maximize the hit ratio, jointly providing cache sizing, object placement, and path selection. We formulate a polynomial-time greedy algorithm to solve the two problems and analytically prove its optimality. We provide numerical results and show that significant cost savings are attainable via a cost-aware design.
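
    To make the hit-ratio/cost mismatch concrete, here is a minimal greedy placement sketch in the spirit of (but not taken from) the paper's algorithm: objects are ranked by the external retrieval cost their caching saves per unit of cache space, rather than by popularity alone. The tuple format and the exact objective are assumptions.

        # Illustrative cost-aware greedy placement; names and objective are
        # assumptions, not the authors' formulation.
        def greedy_cost_aware_placement(objects, budget):
            """objects: list of (obj_id, request_rate, retrieval_cost, size).
            Returns the set of objects to cache within the size budget."""
            # Saving per unit of space if cached: every request for the
            # object no longer pays the external retrieval cost.
            ranked = sorted(objects,
                            key=lambda o: (o[1] * o[2]) / o[3],
                            reverse=True)
            placed, used = set(), 0
            for obj_id, rate, cost, size in ranked:
                if used + size <= budget:
                    placed.add(obj_id)
                    used += size
            return placed

        catalog = [("a", 100, 2.0, 1), ("b", 10, 50.0, 1), ("c", 500, 0.1, 1)]
        print(greedy_cost_aware_placement(catalog, budget=2))  # {'a', 'b'} (set; order arbitrary)

    Note that a pure hit-ratio objective would rank by request rate and cache "c" and "a", while the cost-aware ranking prefers the expensive-to-fetch "b"; this divergence is exactly the mismatch the paper highlights.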

    Subversion Over OpenNetInf and CCNx

    We describe experiences and insights from adapting the Subversion version control system to use the network service of two information-centric networking (ICN) prototypes: OpenNetInf and CCNx. The evaluation is done using a local collaboration scenario, common in our own project work, where a group of people meet and share documents through a Subversion repository. The measurements show a performance benefit already with two clients in some of the studied scenarios, despite being run on unoptimised research prototypes. The conclusion is that ICN is clearly beneficial also for non-mass-distribution applications. It was straightforward to adapt Subversion to fetch updated files from the repository using the ICN network service. The adaptation, however, neglected access control, which will need a different approach in ICN than an authenticated SSL tunnel. Another insight from the experiments is that care needs to be taken when implementing the heavy ICN hash and signature calculations: in the prototypes these are done serially, but we see an opportunity for parallelisation on current multi-core processors.
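
    On the parallelisation point, a minimal sketch of the kind of fix suggested: hash content chunks concurrently instead of serially. The chunking scheme and digest choice are assumptions, not details of the OpenNetInf or CCNx prototypes.

        # Sketch: hash object chunks in parallel on a multi-core machine,
        # as an ICN stack might when naming/verifying segments. Chunk size
        # and SHA-256 are illustrative assumptions.
        import hashlib
        from concurrent.futures import ProcessPoolExecutor

        def sha256_chunk(chunk: bytes) -> str:
            return hashlib.sha256(chunk).hexdigest()

        def hash_object_parallel(data: bytes, chunk_size: int = 64 * 1024):
            """Split an object into fixed-size chunks and digest them
            concurrently across worker processes."""
            chunks = [data[i:i + chunk_size]
                      for i in range(0, len(data), chunk_size)]
            with ProcessPoolExecutor() as pool:
                return list(pool.map(sha256_chunk, chunks))

        if __name__ == "__main__":
            digests = hash_object_parallel(b"x" * (1 << 20))  # 1 MiB -> 16 chunks
            print(len(digests), digests[0][:16])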

    Software and hardware methods for memory access latency reduction on ILP processors

    While microprocessors have doubled their speed every 18 months, performance improvement of memory systems has continued to lag behind. To address the speed gap between CPU and memory, a standard multi-level caching organization has been built for fast data accesses before the data have to be fetched from the DRAM core. The existence of these caches in a computer system, such as L1, L2, L3, and DRAM row buffers, does not mean that data locality will be automatically exploited. The effective use of the memory hierarchy mainly depends on how data are allocated and how memory accesses are scheduled. In this dissertation, we propose several novel software and hardware techniques to effectively exploit data locality and significantly reduce memory access latency.

    We first present a case study at the application level that restructures memory-intensive programs by utilizing program-specific knowledge. This study identifies the problem of bit-reversals, a set of data reordering operations used extensively in scientific computing programs such as FFT, whose special data access pattern can cause severe cache conflicts. We propose several software methods, including padding and blocking, to restructure the programs and reduce those conflicts. Our methods outperform existing ones on both uniprocessor and multiprocessor systems.

    The access latency to the DRAM core has become increasingly long relative to CPU speed, causing memory accesses to be an execution bottleneck. In order to reduce the frequency of DRAM core accesses and effectively shorten the overall memory access latency, we have conducted three studies at this level of the memory hierarchy. First, motivated by our evaluation of the DRAM row buffer's performance role and our findings on the causes of its access conflicts, we propose a simple and effective memory interleaving scheme to reduce or even eliminate row buffer conflicts. Second, we propose a fine-grain priority scheduling scheme to reorder the sequence of data accesses on multi-channel memory systems, effectively exploiting the available bus bandwidth and access concurrency. In the final part of the dissertation, we first evaluate the design of cached DRAM and its organization alternatives for ILP processors, and then propose a new memory hierarchy integration that uses cached DRAM to construct a very large off-chip cache. We show that this structure outperforms a standard memory system with an off-chip L3 cache for memory-intensive applications.

    Memory access latency has become a major performance bottleneck for memory-intensive applications. As long as DRAM remains the most cost-effective technology for main memory, the memory performance problem will continue to exist. The studies conducted in this dissertation address this important issue. Our proposed software and hardware schemes are effective and applicable, and can be directly used in real-world memory system designs and implementations. Our studies also provide guidance for application programmers to understand memory performance implications, and for system architects to optimize memory hierarchies.
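
    The padding idea from the first study can be illustrated with a toy model: with a power-of-two array size, the large power-of-two strides of a bit-reversal reordering map to very few sets of a direct-mapped cache, and padding the stride by one element spreads the accesses across all sets. The cache model and parameters below are illustrative, not the dissertation's experimental setup.

        # Toy direct-mapped cache model at element granularity: count how
        # many distinct sets a strided access pattern touches.
        def distinct_sets(stride: int, accesses: int = 64, num_sets: int = 64) -> int:
            return len({(j * stride) % num_sets for j in range(accesses)})

        N = 4096                           # array size, a power of two
        print(distinct_sets(N // 64))      # stride 64: 1 set, every access conflicts
        print(distinct_sets(N // 64 + 1))  # padded stride 65: 64 sets, no conflicts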

    Distributed Selfish Coaching

    Although cooperation generally increases the amount of resources available to a community of nodes, thus improving individual and collective performance, it also allows for the appearance of potential mistreatment problems through the exposure of one node's resources to others. We study such concerns by considering a group of independent, rational, self-aware nodes that cooperate using on-line caching algorithms, where the exposed resource is the storage at each node. Motivated by content networking applications -- including web caching, CDNs, and P2P -- this paper extends our previous work on the on-line version of the problem, which was conducted under a game-theoretic framework and limited to object replication. We identify and investigate two causes of mistreatment: (1) cache state interactions (due to the cooperative servicing of requests) and (2) the adoption of a common scheme for cache management policies. Using analytic models, numerical solutions of these models, as well as simulation experiments, we show that on-line cooperation schemes using caching are fairly robust to mistreatment caused by state interactions. For mistreatment to appear in a substantial manner, the interaction through the exchange of miss-streams has to be very intense, making it feasible for the mistreated nodes to detect and react to the exploitation. This robustness ceases to exist when nodes fetch and store objects in response to remote requests, i.e., when they operate as Level-2 caches (or proxies) for other nodes. Regarding mistreatment due to a common scheme, we show that this can easily take place when the "outlier" characteristics of some of the nodes are overlooked. This finding underscores the importance of allowing cooperative caching nodes the flexibility of choosing from a diverse set of schemes to fit the peculiarities of individual nodes. To that end, we outline an emulation-based framework for the development of mistreatment-resilient distributed selfish caching schemes. Our framework utilizes a simple control-theoretic approach to dynamically parameterize the cache management scheme. We show performance evaluation results that quantify the benefits of instantiating such a framework, which could be substantial under skewed demand profiles. National Science Foundation (CNS Cybertrust 0524477, CNS NeTS 0520166, CNS ITR 0205294, EIA RI 0202067); EU IST (CASCADAS and E-NEXT); Marie Curie Outgoing International Fellowship of the EU (MOIF-CT-2005-007230)
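
    The abstract outlines rather than specifies the control-theoretic framework; its flavor can be conveyed with a minimal sketch in which each node measures a mistreatment proxy and proportionally adjusts one knob of its cache scheme. Both the proxy (the gap between cooperative and standalone hit ratios) and the knob (a fraction of storage reserved for local demand) are assumptions for illustration, not the authors' design.

        # Hypothetical proportional controller: when cooperation hurts the
        # local hit ratio, reserve more cache space for local demand.
        class MistreatmentController:
            def __init__(self, gain: float = 0.5, reserved: float = 0.0):
                self.gain = gain
                self.reserved = reserved  # fraction reserved for local objects

            def update(self, hit_ratio_cooperative: float,
                       hit_ratio_standalone: float) -> float:
                # Positive error = cooperation is hurting us.
                error = hit_ratio_standalone - hit_ratio_cooperative
                self.reserved = min(1.0, max(0.0,
                                             self.reserved + self.gain * error))
                return self.reserved

        ctrl = MistreatmentController()
        print(ctrl.update(0.30, 0.45))  # mistreated: reservation rises to 0.075
        print(ctrl.update(0.50, 0.45))  # cooperation pays off: reservation shrinks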

    NDN content store and caching policies: performance evaluation

    Among the various factors contributing to the performance of named data networking (NDN), the organization of caching is a key one and has benefited from intense study by the networking research community. These studies have aimed at (1) finding the best strategy to adopt for content caching; (2) specifying the best location and number of content stores (CS) in the network; and (3) defining the best cache replacement policy. Assessing and comparing the performance of the proposed solutions is as essential as the development of the proposals themselves. The present work evaluates and compares the behavior of four caching policies (i.e., random, least recently used (LRU), least frequently used (LFU), and first in first out (FIFO)) applied to NDN. Several network scenarios are used for simulation (2 topologies, the percentage of nodes hosting content stores varying from 5 to 100%, 1 and 10 producers, 32 and 41 consumers). Five metrics are considered for the performance evaluation: cache hit ratio (CHR), network traffic, retrieval delay, interest re-transmissions, and the number of upstream hops. The content requests follow the Zipf–Mandelbrot distribution (with skewness factor α=1.1 and α=0.75). LFU presents the best performance in all considered metrics, except on the NDN testbed with 41 consumers, 1 producer, and a content request rate of 100 packets/s. In that scenario, for content store levels from 50% to 100%, LRU presents notably higher performance. Although the network behavior is similar for both skewness factors, when α=0.75 the CHR is significantly reduced, as expected. This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.
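
    The evaluation setup lends itself to a small reproduction sketch: draw requests from a Zipf–Mandelbrot distribution (α as in the paper; the q parameter, catalogue size, and cache capacity are assumptions) and feed them to LRU and LFU content stores, reporting the cache hit ratio (CHR). Under such skewed demand, LFU's frequency bias pays off, consistent with the paper's headline finding.

        # Zipf-Mandelbrot request stream fed to simple LRU and LFU caches.
        # Parameters other than alpha are illustrative assumptions.
        import random
        from collections import Counter, OrderedDict

        def zipf_mandelbrot_requests(catalog, alpha, q=0.7, n=50_000, seed=1):
            weights = [1.0 / (rank + q) ** alpha
                       for rank in range(1, catalog + 1)]
            rng = random.Random(seed)
            return rng.choices(range(catalog), weights=weights, k=n)

        def hit_ratio_lru(requests, capacity):
            cache, hits = OrderedDict(), 0
            for obj in requests:
                if obj in cache:
                    hits += 1
                    cache.move_to_end(obj)
                else:
                    if len(cache) >= capacity:
                        cache.popitem(last=False)  # evict least recently used
                    cache[obj] = True
            return hits / len(requests)

        def hit_ratio_lfu(requests, capacity):
            cache, freq, hits = set(), Counter(), 0
            for obj in requests:
                freq[obj] += 1
                if obj in cache:
                    hits += 1
                else:
                    if len(cache) >= capacity:
                        cache.remove(min(cache, key=freq.__getitem__))
                    cache.add(obj)
            return hits / len(requests)

        reqs = zipf_mandelbrot_requests(catalog=1000, alpha=1.1)
        print(f"LRU CHR: {hit_ratio_lru(reqs, 50):.3f}")
        print(f"LFU CHR: {hit_ratio_lfu(reqs, 50):.3f}")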