20 research outputs found

    Hierarchical Content Stores in High-speed ICN Routers: Emulation and Prototype Implementation

    Get PDF
    Recent work motivates the design of Information-centric rou-ters that make use of hierarchies of memory to jointly scale in the size and speed of content stores. The present paper advances this understanding by (i) instantiating a general purpose two-layer packet-level caching system, (ii) investigating the solution design space via emulation, and (iii) introducing a proof-of-concept prototype. The emulation-based study reveals insights about the broad design space, the expected impact of workload, and gains due to multi-threaded execution. The full-blown system prototype experimentally confirms that, by exploiting both DRAM and SSD memory technologies, ICN routers can sustain cache operations in excess of 10Gbps running on off-the-shelf hardware

    Paralelizando unidades de cache hierárquicas para roteadores ICN

    Get PDF
    Um desafio fundamental em ICN (do inglês Information-Centric Networking) é desenvolver Content Stores (ou seja, unidades de cache) que satisfaçam três requisitos: espaço de armazenamento grande, velocidade de operação rápida e custo acessível. A chamada Hierarchical Content Store (HCS) é uma abordagem promissora para atender a esses requisitos. Ela explora a correlação temporal entre requisições para prever futuras solicitações. Por exemplo, assume-se que um usuário que solicita o primeiro minuto de um filme também solicitará o segundo minuto. Teoricamente, essa premissa permitiria transferir proativamente conteúdos de uma área de cache relativamente grande, mas lenta (Layer 2 - L2), para uma área de cache mais rápida, porém menor (Layer 1 - L1). A estrutura hierárquica tem potencial para incrementar o desempenho da CS em uma ordem de grandeza tanto em termos de vazão como de tamanho, mantendo o custo. Contudo, o desenvolvimento de HCS apresenta diversos desafios práticos. É necessário acoplar as hierarquias de memória L2 e L1 considerando as suas taxas de transferência e tamanhos, que dependem tanto de aspectos de hardware (por exemplo, taxa de leitura da L2, uso de múltiplos SSD físicos em paralelo, velocidade de barramento, etc.), como de software (por exemplo, controlador do SSD, gerenciamento de memória, etc.). Nesse contexto, esta tese apresenta duas contribuições principais. Primeiramente, é proposta uma arquitetura para superar os gargalos inerentes ao sistema através da paralelização de múltiplas HCS. Em resumo, o esquema proposto supera desafios inerentes à concorrência (especificamente, sincronismo) através do particionamento determinístico das requisições de conteúdos entre múltiplas threads. Em segundo lugar, é proposta uma metodologia para investigar o desenvolvimento de HCS explorando técnicas de emulação e modelagem analítica conjuntamente. A metodologia proposta apresenta vantagens em relação a metodologias baseadas em prototipação e simulação. A L2 é emulada para viabilizar a investigação de uma variedade de cenários de contorno (tanto em termos de hardware como de software) maior do que seria possível através de prototipação (considerando as tecnologias atuais). Além disso, a emulação emprega código real de um protótipo para os outros componentes do HCS (por exemplo L1, gerência das camadas e API) para fornecer resultados mais realistas do que seriam obtidos através de simulação.A key challenge in Information Centric Networking (ICN) is to develop cache units (also called Content Store - CS) that meet three requirements: large storage space, fast operation, and affordable cost. The so-called HCS (Hierarchical Content Store) is a promising approach to satisfy these requirements jointly. It explores the correlation between content requests to predict future demands. Theoretically, this idea would enable proactively content transfers from a relatively large but slow cache area (Layer 2 - L2) to a faster but smaller cache area (Layer 1 - L1). Thereby, it would be possible to increase the throughput and size of CS in one order of magnitude, while keeping the cost. However, the development of HCS introduces several practical challenges. HCS requires a careful coupling of L2 and L1 memory levels considering their transfer rates and sizes. This requirement depends on both hardware specifications (e.g., read rate L2, use of multiple physical SSD in parallel, bus speed, etc.), and software aspects (e.g., the SSD controller, memory management, etc.). In this context, this thesis presents two main contributions. First, we propose an architecture for overcoming the HCS bottlenecks by parallelizing multiple HCS. In summary, the proposed scheme overcomes racing condition related challenges through deterministic partitioning of content requests among multiple threads. Second, we propose a methodology to investigate the development of HCS exploiting emulation techniques and analytical modeling jointly. The proposed methodology offers advantages over prototyping and simulation-based methods. We emulate the L2 to enable the investigation of a variety of boundary scenarios that are richer (regarding both hardware and software aspects) than would be possible through prototyping (considering current technologies). Moreover, the emulation employs real code from a prototype for the other components of the HCS (e.g., L1, layers management and API) to provide more realistic results than would be obtained through simulation

    On the design of efficient caching systems

    Get PDF
    Content distribution is currently the prevalent Internet use case, accounting for the majority of global Internet traffic and growing exponentially. There is general consensus that the most effective method to deal with the large amount of content demand is through the deployment of massively distributed caching infrastructures as the means to localise content delivery traffic. Solutions based on caching have been already widely deployed through Content Delivery Networks. Ubiquitous caching is also a fundamental aspect of the emerging Information-Centric Networking paradigm which aims to rethink the current Internet architecture for long term evolution. Distributed content caching systems are expected to grow substantially in the future, in terms of both footprint and traffic carried and, as such, will become substantially more complex and costly. This thesis addresses the problem of designing scalable and cost-effective distributed caching systems that will be able to efficiently support the expected massive growth of content traffic and makes three distinct contributions. First, it produces an extensive theoretical characterisation of sharding, which is a widely used technique to allocate data items to resources of a distributed system according to a hash function. Based on the findings unveiled by this analysis, two systems are designed contributing to the abovementioned objective. The first is a framework and related algorithms for enabling efficient load-balanced content caching. This solution provides qualitative advantages over previously proposed solutions, such as ease of modelling and availability of knobs to fine-tune performance, as well as quantitative advantages, such as 2x increase in cache hit ratio and 19-33% reduction in load imbalance while maintaining comparable latency to other approaches. The second is the design and implementation of a caching node enabling 20 Gbps speeds based on inexpensive commodity hardware. We believe these contributions advance significantly the state of the art in distributed caching systems

    Future of networking is the future of Big Data, The

    Get PDF
    2019 Summer.Includes bibliographical references.Scientific domains such as Climate Science, High Energy Particle Physics (HEP), Genomics, Biology, and many others are increasingly moving towards data-oriented workflows where each of these communities generates, stores and uses massive datasets that reach into terabytes and petabytes, and projected soon to reach exabytes. These communities are also increasingly moving towards a global collaborative model where scientists routinely exchange a significant amount of data. The sheer volume of data and associated complexities associated with maintaining, transferring, and using them, continue to push the limits of the current technologies in multiple dimensions - storage, analysis, networking, and security. This thesis tackles the networking aspect of big-data science. Networking is the glue that binds all the components of modern scientific workflows, and these communities are becoming increasingly dependent on high-speed, highly reliable networks. The network, as the common layer across big-science communities, provides an ideal place for implementing common services. Big-science applications also need to work closely with the network to ensure optimal usage of resources, intelligent routing of requests, and data. Finally, as more communities move towards data-intensive, connected workflows - adopting a service model where the network provides some of the common services reduces not only application complexity but also the necessity of duplicate implementations. Named Data Networking (NDN) is a new network architecture whose service model aligns better with the needs of these data-oriented applications. NDN's name based paradigm makes it easier to provide intelligent features at the network layer rather than at the application layer. This thesis shows that NDN can push several standard features to the network. This work is the first attempt to apply NDN in the context of large scientific data; in the process, this thesis touches upon scientific data naming, name discovery, real-world deployment of NDN for scientific data, feasibility studies, and the designs of in-network protocols for big-data science

    Data Structures and Algorithms for Scalable NDN Forwarding

    Get PDF
    Named Data Networking (NDN) is a recently proposed general-purpose network architecture that aims to address the limitations of the Internet Protocol (IP), while maintaining its strengths. NDN takes an information-centric approach, focusing on named data rather than computer addresses. In NDN, the content is identified by its name, and each NDN packet has a name that specifies the content it is fetching or delivering. Since there are no source and destination addresses in an NDN packet, it is forwarded based on a lookup of its name in the forwarding plane, which consists of the Forwarding Information Base (FIB), Pending Interest Table (PIT), and Content Store (CS). In addition, as an in-network caching element, a scalable Repository (Repo) design is needed to provide large-scale long-term content storage in NDN networks. Scalable NDN forwarding is a challenge. Compared to the well-understood approaches to IP forwarding, NDN forwarding performs lookups on packet names, which have variable and unbounded lengths, increasing the lookup complexity. The lookup tables are larger than in IP, requiring more memory space. Moreover, NDN forwarding has a read-write data plane, requiring per-packet updates at line rates. Designing and evaluating a scalable NDN forwarding node architecture is a major effort within the overall NDN research agenda. The goal of this dissertation is to demonstrate that scalable NDN forwarding is feasible with the proposed data structures and algorithms. First, we propose a FIB lookup design based on the binary search of hash tables that provides a reliable longest name prefix lookup performance baseline for future NDN research. We have demonstrated 10 Gbps forwarding throughput with 256-byte packets and one billion synthetic forwarding rules, each containing up to seven name components. Second, we explore data structures and algorithms to optimize the FIB design based on the specific characteristics of real-world forwarding datasets. Third, we propose a fingerprint-only PIT design that reduces the memory requirements in the core routers. Lastly, we discuss the Content Store design issues and demonstrate that the NDN Repo implementation can leverage many of the existing databases and storage systems to improve performance

    Doctor of Philosophy

    Get PDF
    dissertationAs the base of the software stack, system-level software is expected to provide ecient and scalable storage, communication, security and resource management functionalities. However, there are many computationally expensive functionalities at the system level, such as encryption, packet inspection, and error correction. All of these require substantial computing power. What's more, today's application workloads have entered gigabyte and terabyte scales, which demand even more computing power. To solve the rapidly increased computing power demand at the system level, this dissertation proposes using parallel graphics pro- cessing units (GPUs) in system software. GPUs excel at parallel computing, and also have a much faster development trend in parallel performance than central processing units (CPUs). However, system-level software has been originally designed to be latency-oriented. GPUs are designed for long-running computation and large-scale data processing, which are throughput-oriented. Such mismatch makes it dicult to t the system-level software with the GPUs. This dissertation presents generic principles of system-level GPU computing developed during the process of creating our two general frameworks for integrating GPU computing in storage and network packet processing. The principles are generic design techniques and abstractions to deal with common system-level GPU computing challenges. Those principles have been evaluated in concrete cases including storage and network packet processing applications that have been augmented with GPU computing. The signicant performance improvement found in the evaluation shows the eectiveness and eciency of the proposed techniques and abstractions. This dissertation also presents a literature survey of the relatively young system-level GPU computing area, to introduce the state of the art in both applications and techniques, and also their future potentials

    GPU Accelerated protocol analysis for large and long-term traffic traces

    Get PDF
    This thesis describes the design and implementation of GPF+, a complete general packet classification system developed using Nvidia CUDA for Compute Capability 3.5+ GPUs. This system was developed with the aim of accelerating the analysis of arbitrary network protocols within network traffic traces using inexpensive, massively parallel commodity hardware. GPF+ and its supporting components are specifically intended to support the processing of large, long-term network packet traces such as those produced by network telescopes, which are currently difficult and time consuming to analyse. The GPF+ classifier is based on prior research in the field, which produced a prototype classifier called GPF, targeted at Compute Capability 1.3 GPUs. GPF+ greatly extends the GPF model, improving runtime flexibility and scalability, whilst maintaining high execution efficiency. GPF+ incorporates a compact, lightweight registerbased state machine that supports massively-parallel, multi-match filter predicate evaluation, as well as efficient arbitrary field extraction. GPF+ tracks packet composition during execution, and adjusts processing at runtime to avoid redundant memory transactions and unnecessary computation through warp-voting. GPF+ additionally incorporates a 128-bit in-thread cache, accelerated through register shuffling, to accelerate access to packet data in slow GPU global memory. GPF+ uses a high-level DSL to simplify protocol and filter creation, whilst better facilitating protocol reuse. The system is supported by a pipeline of multi-threaded high-performance host components, which communicate asynchronously through 0MQ messaging middleware to buffer, index, and dispatch packet data on the host system. The system was evaluated using high-end Kepler (Nvidia GTX Titan) and entry level Maxwell (Nvidia GTX 750) GPUs. The results of this evaluation showed high system performance, limited only by device side IO (600MBps) in all tests. GPF+ maintained high occupancy and device utilisation in all tests, without significant serialisation, and showed improved scaling to more complex filter sets. Results were used to visualise captures of up to 160 GB in seconds, and to extract and pre-filter captures small enough to be easily analysed in applications such as Wireshark

    Profiling Large-scale Live Video Streaming and Distributed Applications

    Get PDF
    PhDToday, distributed applications run at data centre and Internet scales, from intensive data analysis, such as MapReduce; to the dynamic demands of a worldwide audience, such as YouTube. The network is essential to these applications at both scales. To provide adequate support, we must understand the full requirements of the applications, which are revealed by the workloads. In this thesis, we study distributed system applications at different scales to enrich this understanding. Large-scale Internet applications have been studied for years, such as social networking service (SNS), video on demand (VoD), and content delivery networks (CDN). An emerging type of video broadcasting on the Internet featuring crowdsourced live video streaming has garnered attention allowing platforms such as Twitch to attract over 1 million concurrent users globally. To better understand Twitch, we collected real-time popularity data combined with metadata about the contents and found the broadcasters rather than the content drives its popularity. Unlike YouTube and Netflix where content can be cached, video streaming on Twitch is generated instantly and needs to be delivered to users immediately to enable real-time interaction. Thus, we performed a large-scale measurement of Twitchs content location revealing the global footprint of its infrastructure as well as discovering the dynamic stream hosting and client redirection strategies that helped Twitch serve millions of users at scale. We next consider applications that run inside the data centre. Distributed computing applications heavily rely on the network due to data transmission needs and the scheduling of resources and tasks. One successful application, called Hadoop, has been widely deployed for Big Data processing. However, little work has been devoted to understanding its network. We found the Hadoop behaviour is limited by hardware resources and processing jobs presented. Thus, after characterising the Hadoop traffic on our testbed with a set of benchmark jobs, we built a simulator to reproduce Hadoops job traffic With the simulator, users can investigate the connections between Hadoop traffic and network performance without additional hardware cost. Different network components can be added to investigate the performance, such as network topologies, queue policies, and transport layer protocols. In this thesis, we extended the knowledge of networking by investigated two widelyused applications in the data centre and at Internet scale. We (i)studied the most popular live video streaming platform Twitch as a new type of Internet-scale distributed application revealing that broadcaster factors drive the popularity of such platform, and we (ii)discovered the footprint of Twitch streaming infrastructure and the dynamic stream hosting and client redirection strategies to provide an in-depth example of video streaming delivery occurring at the Internet scale, also we (iii)investigated the traffic generated by a distributed application by characterising the traffic of Hadoop under various parameters, (iv)with such knowledge, we built a simulation tool so users can efficiently investigate the performance of different network components under distributed applicationQueen Mary University of Londo

    Fourth NASA Goddard Conference on Mass Storage Systems and Technologies

    Get PDF
    This report contains copies of all those technical papers received in time for publication just prior to the Fourth Goddard Conference on Mass Storage and Technologies, held March 28-30, 1995, at the University of Maryland, University College Conference Center, in College Park, Maryland. This series of conferences continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include new storage technology, stability of recorded media, performance studies, storage system solutions, the National Information infrastructure (Infobahn), the future for storage technology, and lessons learned from various projects. There also will be an update on the IEEE Mass Storage System Reference Model Version 5, on which the final vote was taken in July 1994
    corecore