
    Scaling Up Concurrent Analytical Workloads on Multi-Core Servers

    Today, an ever-increasing number of researchers, businesses, and data scientists collect and analyze massive amounts of data in database systems. The database system needs to process the resulting highly concurrent analytical workloads by exploiting modern multi-socket multi-core processor systems with non-uniform memory access (NUMA) architectures and increasing memory sizes. Conventional execution engines, however, are not designed for many cores, and neither scale nor perform efficiently on modern multi-core NUMA architectures. Firstly, their query-centric approach, where each query is optimized and evaluated independently, can result in unnecessary contention for hardware resources due to redundant work found across queries in highly concurrent workloads. Secondly, they are unaware of non-uniform memory access costs and the underlying hardware topology, incurring unnecessarily expensive memory accesses and bandwidth saturation. In this thesis, we show how these scalability and performance impediments can be solved by exploiting sharing among concurrent queries and incorporating NUMA-aware adaptive task scheduling and data placement strategies in the execution engine. Regarding sharing, we identify and categorize state-of-the-art techniques for sharing data and work across concurrent queries at run-time into two categories: reactive sharing, which shares intermediate results across common query sub-plans, and proactive sharing, which builds a global query plan with shared operators to evaluate queries. We integrate the original research prototypes that introduce reactive and proactive sharing, perform a sensitivity analysis, and show how and when each technique benefits performance. Our most significant finding is that reactive and proactive sharing can be combined to exploit the advantages of both sharing techniques for highly concurrent analytical workloads. Regarding NUMA-awareness, we identify, implement, and compare various combinations of task scheduling and data placement strategies under a diverse set of highly concurrent analytical workloads. We develop a prototype based on a commercial main-memory column-store database system. Our most significant finding is that there is no single strategy for task scheduling and data placement that is best for all workloads. Specifically, inter-socket stealing of memory-intensive tasks can hurt overall performance, and unnecessary partitioning of data across sockets incurs overhead. For this reason, we implement algorithms that adapt task scheduling and data placement to the workload at run-time. Our experiments show that both sharing and NUMA-awareness can significantly improve the performance and scalability of highly concurrent analytical workloads on modern multi-core servers. Thus, we argue that sharing and NUMA-awareness are key factors for supporting faster processing of big data analytical applications, fully exploiting the hardware resources of modern multi-core servers, and providing a more responsive user experience.
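
    To make the reactive-sharing idea above concrete, the following minimal Python sketch shows concurrent queries reusing a cached intermediate result for a common sub-plan instead of recomputing it. The class name, key format, and data are illustrative assumptions, not the thesis prototype.

        from threading import Lock

        class SharedSubplanCache:
            """Reactive sharing sketch: intermediate results are cached under a
            canonical sub-plan signature, so concurrent queries that contain
            the same sub-plan compute it only once."""
            def __init__(self):
                self._results = {}
                self._lock = Lock()

            def get_or_compute(self, subplan_key, compute):
                with self._lock:
                    if subplan_key in self._results:
                        return self._results[subplan_key]  # reuse shared work
                result = compute()  # computed outside the lock; two threads may
                with self._lock:    # race, but setdefault keeps a single result
                    return self._results.setdefault(subplan_key, result)

        # Two queries sharing the same filtered scan: the second call reuses
        # the first call's intermediate result.
        rows = list(range(1_000_000))
        cache = SharedSubplanCache()
        q1 = cache.get_or_compute(("scan", "orders", "amount>100"),
                                  lambda: [r for r in rows if r > 100])
        q2 = cache.get_or_compute(("scan", "orders", "amount>100"),
                                  lambda: [r for r in rows if r > 100])
        assert q1 is q2  # one shared result object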

    Towards Scalable Network Traffic Measurement With Sketches

    Driven by the ever-increasing data volume through the Internet, the per-port speed of network devices has reached 400 Gbps, and high-end switches are capable of processing 25.6 Tbps of network traffic. To improve the efficiency and security of the network, network traffic measurement becomes more important than ever. For fast and accurate traffic measurement, managing an accurate working set of active flows (WSAF) at line rates is a key challenge. The WSAF is usually located in high-speed but expensive memory, such as TCAM or SRAM, whose capacity is quite limited. To scale up per-flow measurement, we pursue three thrusts. In the first thrust, we propose to use an in-DRAM WSAF and put a compact data structure (i.e., sketch) called FlowRegulator before the WSAF to compensate for DRAM's slow access time. Our results show that FlowRegulator can substantially reduce massive influxes to the WSAF without compromising measurement accuracy. In the second thrust, we integrate our sketch into a network system and propose an SDN-based WLAN monitoring and management framework called RFlow+, which can overcome the limitations of existing traffic measurement solutions (e.g., OpenFlow and sFlow), such as a limited view, incomplete flow statistics, and a poor trade-off between measurement accuracy and CPU/network overheads. In the third thrust, we introduce a novel sampling scheme to address the poor trade-off offered by standard simple random sampling (SRS). Even though SRS has been widely used in practice because of its simplicity, it provides non-uniform sampling rates for different flows, because it samples packets over an aggregated data flow. Starting with the simple idea that independent per-flow packet sampling provides the most accurate estimation of each flow, we introduce a new concept of per-flow systematic sampling, aiming to provide the same sampling rate across all flows. In addition, we provide a concrete sampling method called SketchFlow, which approximates the idea of per-flow systematic sampling using a sketch saturation event.
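
    The following minimal Python sketch illustrates the per-flow systematic sampling concept described above: each flow keeps its own packet counter, and every k-th packet of each flow is sampled, so small and large flows get the same sampling rate, unlike SRS over the aggregate stream. It uses an exact per-flow dictionary as a stand-in for the compact sketch and saturation mechanism that SketchFlow actually relies on.

        class PerFlowSystematicSampler:
            """Per-flow systematic sampling sketch: sample every k-th packet
            of each individual flow, giving all flows the same 1/k rate."""
            def __init__(self, period):
                self.period = period
                self.counters = {}  # exact per-flow counters (a stand-in for
                                    # SketchFlow's compact sketch)

            def observe(self, flow_id):
                count = self.counters.get(flow_id, 0) + 1
                self.counters[flow_id] = count
                return count % self.period == 0  # True -> sample this packet

        sampler = PerFlowSystematicSampler(period=100)
        picked = sum(sampler.observe("10.0.0.1->10.0.0.2") for _ in range(1000))
        print(picked)  # 10: exactly 1% of this flow, regardless of other flows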

    BloomCasting for publish/subscribe networks

    Publish/subscribe has been proposed as a way of addressing information as the primary named entity in the network. In this thesis, we develop and explore a network architecture built on publish/subscribe primitives, building on our work in the PSIRP project. Our work is divided into two areas: rendezvous and Bloomcasting, i.e., a fast Bloom filter-based forwarding architecture for source-specific multicast. Together these form a publish/subscribe architecture, in which publisher and subscriber matching is done by the rendezvous system, and the Bloom filter-based forwarding fabric is used for multicasting the published content. Our work on inter-domain rendezvous shows that a combination of policy routing at the edges and an overlay based on hierarchical distributed hash tables can overcome problems related to incremental deployment while keeping the stretch of queries small, and that it can solve some policy-related problems that arise from using distributed hash tables in an inter-domain setting. Bloom filters can produce false positives, and we show that when Bloom filters are used for packet forwarding, these false positives can cause network anomalies. We found three such anomalies: packet storms, packet loops, and flow duplication. They can severely disrupt the network infrastructure and be used for denial-of-service attacks against the network or target services. These security and reliability problems can be solved by combining three techniques. Cryptographically computed edge-pair labels ensure that an attacker cannot construct Bloom filter-based path identifiers for a chosen path; varying the Bloom filter parameters locally at each router prevents packet storms; and applying bit permutations to the Bloom filter locally at each router prevents accidental and malicious loops and flow duplication.
    One of the Internet's shortcomings is that there is no way of naming information that is common to all applications. The publish/subscribe model is one proposal for changing the Internet architecture to remedy this shortcoming. In this dissertation I develop a network architecture based on the publish/subscribe model, building on my work in the PSIRP project. The architecture consists of a rendezvous system, which connects publishers and subscribers, and a Bloom filter-based multi-receiver delivery channel, over which published content is delivered to subscribers. An Internet-wide rendezvous system faces demanding requirements. I study two different approaches: one based on local routing policies and one based on distributed hash tables. The challenge for the former is scalability, especially when not all networks in the Internet participate in maintaining the system. The latter is problematic because systems built on it cannot guarantee which route publish and subscribe messages take through the system, so a message may also travel through the network of a publisher's or subscriber's competitor. I propose a method that combines policy-based publish/subscribe routing at the edges and, in the core of the network, connects these separate islands using a hierarchical distributed hash table. For delivering publications to subscribers I use a Bloom filter-based system. I show that using Bloom filters to route packets can cause significant failures in the network, for example packet storms, loops, and duplication of packets belonging to the same flow. These problems cause security and reliability issues for the network, which can be solved with a combination of three techniques. First, the path-segment names placed in Bloom filters are computed using cryptography, so that an attacker cannot compute a Bloom filter for a path of its choosing without the network's help. Second, routers set the Bloom filter parameters locally so that packet storms do not occur. Third, each router rearranges the bits of the Bloom filter, ensuring that the filter is no longer the same if a packet traverses, for example, a loop and returns to the same router.
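
    The forwarding anomalies and the bit-permutation countermeasure can be illustrated with a small Python sketch of Bloom filter-based forwarding: each link is named by a few bit positions, the in-packet filter is the OR of the links on the path, and a packet is forwarded on every link whose bits are all set. The filter width, bits per link, and topology below are illustrative assumptions, not parameters from the thesis.

        import random

        M = 64  # Bloom filter width (illustrative)

        def link_mask(rng, bits_per_link=5):
            """A link is named by a few random bit positions."""
            mask = 0
            for pos in rng.sample(range(M), bits_per_link):
                mask |= 1 << pos
            return mask

        def path_filter(masks):
            """The in-packet forwarding identifier ORs together the path's links."""
            f = 0
            for m in masks:
                f |= m
            return f

        def forwards_on(filter_, mask):
            """Forward on a link iff all of its bits are set; bits contributed by
            other links can cause false positives (storms, loops, duplication)."""
            return filter_ & mask == mask

        def permute_bits(filter_, perm):
            """Per-router bit permutation: a packet returning to the same router
            no longer matches the same links, breaking accidental loops."""
            out = 0
            for src, dst in enumerate(perm):
                if filter_ >> src & 1:
                    out |= 1 << dst
            return out

        rng = random.Random(1)
        links = [link_mask(rng) for _ in range(3)]
        f = path_filter(links)
        assert all(forwards_on(f, m) for m in links)
        perm = random.Random(2).sample(range(M), M)
        f2 = permute_bits(f, perm)  # the permuted filter rarely matches old links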

    Distributed services across the network from edge to core

    The current Internet architecture is evolving from a simple carrier of bits to a platform able to provide multiple complex services running across the entire Network Service Provider (NSP) infrastructure. This calls for increased flexibility in resource management and allocation to provide dedicated, on-demand network services, leveraging a distributed infrastructure consisting of heterogeneous devices. More specifically, NSPs rely on a plethora of low-cost Customer Premise Equipment (CPE), as well as more powerful appliances at the edge of the network and in dedicated data centers. Currently, a great deal of research effort is devoted to providing this flexibility through Fog computing, Network Functions Virtualization (NFV), and data plane programmability. Fog computing, or Edge computing, extends compute and storage capabilities to the edge of the network, closer to the rapidly growing number of connected devices and applications that consume cloud services and generate massive amounts of data. A complementary technology is NFV, a network architecture concept targeting the execution of software Network Functions (NFs) in isolated Virtual Machines (VMs), potentially sharing a pool of general-purpose hosts, rather than running on dedicated hardware (i.e., appliances). Such a solution enables virtual network appliances (i.e., VMs executing network functions) to be provisioned, allocated a different amount of resources, and possibly moved across data centers in little time, which is key in ensuring that the network can keep up with the flexibility in the provisioning and deployment of virtual hosts in today's virtualized data centers. Moreover, recent advances in networking hardware have introduced new programmable network devices that can efficiently execute complex operations at line rate. As a result, NFs can be (partially or entirely) folded into the network, speeding up the execution of distributed services. The work described in this Ph.D. thesis aims at showing how various network services can be deployed throughout the NSP infrastructure, accommodating the different hardware capabilities of various appliances, by applying and extending the above-mentioned solutions. First, we consider a data center environment and the deployment of (virtualized) NFs. In this scenario, we introduce a novel methodology for modeling different NFs, aimed at estimating their performance on different execution platforms. Moreover, we propose to extend the traditional NFV deployment outside of the data center to leverage the entire NSP infrastructure. This can be achieved by integrating native NFs, commonly available in low-cost CPEs, with an existing NFV framework. This facilitates the provision of services that require NFs close to the end user (e.g., an IPsec terminator). On the other hand, resource-hungry virtualized NFs run in the NSP data center, where they can take advantage of its superior computing and storage capabilities. As an application, we also present a novel technique to deploy a distributed service, specifically a web filter, that leverages both the low latency of a CPE and the computational power of a data center. We then show that the core network, today dedicated solely to packet routing, can also be exploited to provide useful services. In particular, we propose a novel method to provide distributed network services in core network devices by means of task distribution and seamless coordination among the peers involved. The aim is to transform existing network nodes (e.g., routers, switches, access points) into a highly distributed data acquisition and processing platform, which significantly reduces the storage requirements at the Network Operations Center and the packet duplication overhead. Finally, we propose to use new programmable network devices in data center networks to provide much-needed services to distributed applications. By offloading part of the computation directly to the networking hardware, we show that it is possible to reduce both the network traffic and the overall job completion time.
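
    As a toy illustration of the placement trade-off described above (lightweight, latency-sensitive NFs on the CPE; resource-hungry NFs in the data center), the following Python sketch picks the lowest-latency platform that can host an NF. The platform names, capacities, and latencies are invented for the example and are not taken from the thesis.

        # Illustrative capacities and user round-trip latencies (assumed values).
        PLATFORMS = {
            "cpe":         {"cpu": 2,   "latency_ms": 1.0},
            "edge":        {"cpu": 16,  "latency_ms": 5.0},
            "data_center": {"cpu": 256, "latency_ms": 20.0},
        }

        def place(cpu_demand, latency_budget_ms):
            """Choose the lowest-latency platform that satisfies both the NF's
            resource demand and its latency budget."""
            feasible = [(p["latency_ms"], name)
                        for name, p in PLATFORMS.items()
                        if p["cpu"] >= cpu_demand
                        and p["latency_ms"] <= latency_budget_ms]
            if not feasible:
                raise ValueError("no platform can host this NF")
            return min(feasible)[1]

        print(place(cpu_demand=1, latency_budget_ms=2.0))    # cpe (e.g., IPsec)
        print(place(cpu_demand=64, latency_budget_ms=50.0))  # data_center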

    FPGA-based architectures for next generation communications networks

    This engineering doctorate concerns the application of Field Programmable Gate Array (FPGA) technology to some of the challenges faced in the design of next generation communications networks. The growth and convergence of such networks has fuelled demand for higher-bandwidth systems and a requirement to support a diverse range of payloads across the network span. The research which follows focuses on the development of FPGA-based architectures for two important paradigms in contemporary networking: Forward Error Correction and Packet Classification. The work seeks to combine analysis of the underlying algorithms and mathematical techniques which drive these applications with an informed approach to the design of efficient FPGA-based circuits.
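
    As a generic illustration of the kind of forward error correction involved (the thesis itself targets FPGA circuits for such codes, not this software model), here is a single-error-correcting Hamming(7,4) encoder and decoder in Python:

        def hamming74_encode(d):
            """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword laid
            out as [p1, p2, d1, p3, d2, d3, d4]."""
            p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
            p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
            p3 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
            return [p1, p2, d[0], p3, d[1], d[2], d[3]]

        def hamming74_decode(c):
            """Correct up to one flipped bit and return the 4 data bits."""
            s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
            s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
            s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
            syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the error
            if syndrome:
                c = c[:]
                c[syndrome - 1] ^= 1
            return [c[2], c[4], c[5], c[6]]

        code = hamming74_encode([1, 0, 1, 1])
        code[3] ^= 1                               # inject a single bit error
        assert hamming74_decode(code) == [1, 0, 1, 1]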

    Efficient tree-based content-based routing schemes

    This thesis is about routing and forwarding for inherently multicast communication, such as the communication typical of information-centric networks. The notion of Information-Centric Networking (ICN) is an evolution of the Internet from the current host-centric architecture to a new architecture in which communication is based on "named information". The ambitious goal of ICN is to effectively support the exchange and use of information in an ever more connected world, with billions of devices, many of which are mobile, producing and consuming large amounts of data. ICN is intended to support scalable content distribution, mobility, and security, for such applications as video on demand and networks of sensors or the so-called Internet of Things. Many ICN architectures have emerged in the past decade, and the ICN community has made significant progress in terms of infrastructure, test-bed deployments, and application case studies. And yet, despite the impressive research effort, the fundamental problems of routing and forwarding remain open. In particular, none of the proposed architectures has developed truly scalable name-based routing schemes and efficient name-based forwarding algorithms. This is not surprising, since the problem of routing based on names, in its most general formulation, is known to be fundamentally difficult. In general, one would want to support application-defined names (as opposed to network-defined addresses) with a compact routing scheme (small routing tables) that uses optimal paths and minimizes congestion, and that admits a fast forwarding algorithm. Furthermore, one would want to construct this routing scheme with a decentralized and incremental protocol for administrative autonomy and efficient dynamic updates. However, there are clear theoretical limits that simply make it impossible to achieve all these goals. In this thesis we explore the design space of routing and forwarding in an information-centric network. Our purpose is to develop routing schemes and forwarding algorithms that combine many desirable properties. We consider two forms of addressing, one tied to network locations, and one based on more expressive content descriptors. We then consider trees as basic routing structures, and with those we develop routing schemes that are intended to minimize path lengths and congestion, separately or together. For one of these schemes based on expressive content descriptors, we also develop a fast forwarding algorithm specialized for massively parallel architectures such as GPUs. In summary, this thesis presents two efficient and scalable routing algorithms for two different types of networks, plus one scalable forwarding algorithm. We summarize each individual contribution below.
    Low-congestion geographic routing for wireless networks. We develop a low-congestion, multicast routing scheme designed specifically for wireless networks. The scheme supports geographical multicast routing, meaning routing to a set of nodes addressed by their physical position. The scheme builds a geometric minimum spanning tree connecting the source to all the destinations. Then, for each edge in this tree, the scheme routes a message through a random intermediate node, chosen independently of the set of multicast requests. The intermediate node is chosen in the vicinity of the corresponding edge such that congestion is reduced without stretching routes by more than a constant factor.
    Multi-tree scheme for content-based routing in ICN. We develop a tree-based routing scheme designed for large-scale wired networks such as the Internet. The scheme supports two forms of addresses: application-defined content descriptors and network-defined locators. We first show that the scheme is effective in terms of stretch and congestion on the current AS-level Internet graph even with only a few spanning trees. Then we show that our content descriptors, which consist of sets of tags and are more expressive than the name prefixes used in mainstream ICN, aggregate well in practice under our scheme. We also explain in detail how to use descriptors and locators, together with unique content identifiers, to support the efficient transmission and sharing of information through scalable and loop-free routes.
    Tag-based forwarding (partial matching) algorithm on GPUs. To accompany our ICN routing scheme, we develop a fast forwarding algorithm that matches incoming packets against forwarding tables with tens of millions of entries. To achieve high performance, we develop a practical solution for the partial matching problem that lies at the heart of this forwarding scheme. This solution amounts to a massively parallel algorithm specifically designed for a hybrid CPU/GPU architecture.
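
    The partial matching problem at the heart of the tag-based forwarding scheme can be sketched in a few lines of Python: a packet is forwarded on every interface whose table entry's tag set is a subset of the packet's descriptor. The tags and interfaces below are illustrative, and the thesis solves this at scale with a massively parallel CPU/GPU algorithm rather than the linear scan shown here.

        # Forwarding table: each entry pairs a tag set with an output interface.
        TABLE = [
            (frozenset({"sport", "video"}), "if0"),
            (frozenset({"news"}),           "if1"),
            (frozenset({"sport", "news"}),  "if2"),
        ]

        def match(descriptor):
            """Partial matching: an entry matches iff its tag set is a subset
            of the packet's content descriptor."""
            return [iface for tags, iface in TABLE if tags <= descriptor]

        print(match({"sport", "video", "hd"}))  # ['if0']
        print(match({"sport", "news"}))         # ['if1', 'if2']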

    A Peer-to-Peer Network Framework Utilising the Public Mobile Telephone Network

    P2P (Peer-to-Peer) technologies are well established and have now become accepted as a mainstream networking approach. However, the explosion of participating users has not been replicated within the mobile networking domain. Until recently, the lack of suitable hardware and wireless network infrastructure to support P2P activities was perceived as contributing to the problem. This has changed with the ready availability of handsets having ample processing resources utilising an almost ubiquitous mobile telephone network. Coupled with this has been a proliferation of software applications written for the more capable 'smartphone' handsets. P2P systems have not naturally integrated and evolved into the mobile telephone ecosystem in the way that 'client-server' operating techniques have. However, as the number of clients for a particular mobile application increases, providing the 'server-side' data storage infrastructure becomes more onerous. P2P systems offer mobile telephone applications a way to circumvent this data storage issue by dispersing it across a network of the participating users' handsets. The main goal of this work was to produce a P2P Application Framework that supports developers in creating mobile telephone applications that use distributed storage. Effort was devoted to determining appropriate design requirements for a mobile handset based P2P system. Some of these requirements relate to the limitations of the host hardware, such as power consumption. Others relate to the network upon which the handsets operate, such as connectivity. The thesis reviews current P2P technologies to assess which was viable to form the technology foundation for the framework. The aim was not to re-invent a P2P system design, but rather to adapt an existing one for mobile operation. Built upon the foundations of a prototype application, the P2P framework resulting from modifications and enhancements grants access via a simple API (Application Programming Interface) to a subset of Nokia 'smartphone' devices. Unhindered operation across all mobile telephone networks is possible through a proprietary application implementing NAT (Network Address Translation) traversal techniques. Recognising that handsets operate with limited resources, further optimisation of the P2P framework was also investigated. Energy consumption was chosen for further examination because of its impact on handset participation time. This work has proven that operating applications in conjunction with a P2P data storage framework, connected via the mobile telephone network, is technically feasible. It also shows that opportunity remains for further research to realise the full potential of this data storage technique.
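
    As an illustration of the general technique of dispersing application data across participating handsets (not the framework's actual design, which adapts an existing P2P system), here is a minimal consistent-hashing store in Python, with invented peer names:

        import hashlib
        from bisect import bisect

        def h(s):
            """Hash a string onto the ring."""
            return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

        class TinyDHT:
            """Each peer (handset) owns a segment of the key space, so data is
            dispersed across participants instead of a central server."""
            def __init__(self, peers):
                self.ring = sorted((h(p), p) for p in peers)
                self.store = {p: {} for p in peers}

            def _owner(self, key):
                i = bisect(self.ring, (h(key), "")) % len(self.ring)
                return self.ring[i][1]

            def put(self, key, value):
                self.store[self._owner(key)][key] = value

            def get(self, key):
                return self.store[self._owner(key)].get(key)

        dht = TinyDHT(["handset-a", "handset-b", "handset-c"])
        dht.put("photo:42", b"...")
        assert dht.get("photo:42") == b"..."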

    Content addressable memory project

    A parameterized version of the tree processor was designed and tested (by simulation). The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the content addressable memory (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) design and analysis of efficient hierarchical interconnection networks.
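
    For readers unfamiliar with the underlying concept, the following tiny Python model shows what a content addressable memory does: it returns the locations whose stored word matches a search key, with all words compared conceptually in parallel. It is purely illustrative and unrelated to the Rutgers hardware design.

        class CAM:
            """Software model of a content addressable memory: look up data by
            value rather than by address."""
            def __init__(self, words):
                self.words = list(words)

            def search(self, key):
                """Return the indices of all matching words (the match lines);
                hardware performs these comparisons in parallel."""
                return [i for i, w in enumerate(self.words) if w == key]

        cam = CAM([0b1010, 0b1100, 0b1010, 0b0001])
        print(cam.search(0b1010))  # [0, 2] -- content found by value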