14 research outputs found

    Octopus: A Heterogeneous In-network Computing Accelerator Enabling Deep Learning for network

    Full text link
    Deep learning (DL) models for networking have achieved excellent performance and are becoming a promising component of future intelligent network systems. Programmable in-network computing devices have great potential to host DL-based network models; however, existing devices cannot afford to run a DL model. The main challenges for data-plane support of DL-based network models lie in computing power, task granularity, model generality, and feature extraction. To address these problems, we propose Octopus: a heterogeneous in-network computing accelerator enabling DL for network models. A feature extractor is designed for fast and efficient feature extraction. A vector accelerator and a systolic array work in a heterogeneous, collaborative way, offering low-latency, high-throughput general computing for packet- and flow-based tasks. Octopus also contains an on-chip memory fabric for storage and interconnection, and a RISC-V core for global control. The proposed Octopus accelerator is implemented on an FPGA. Its functionality and performance are validated in several use cases, achieving 31 Mpkt/s feature extraction, 207 ns packet-based computing latency, and 90 kflow/s flow-based computing throughput.
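    The per-packet feature extraction stage the abstract mentions can be illustrated in software. The sketch below is a hypothetical Python stand-in: the header layout follows standard IPv4, and the chosen feature fields are illustrative assumptions, not Octopus's actual hardware pipeline.

    ```python
    import struct

    # Standard 20-byte IPv4 header layout (RFC 791), network byte order.
    IPV4_HDR = struct.Struct("!BBHHHBBH4s4s")

    def extract_features(packet: bytes) -> dict:
        """Parse an IPv4 header into a small per-packet feature vector.

        A software toy; the real Octopus extractor runs as a hardware
        pipeline on the data plane.
        """
        (ver_ihl, tos, total_len, _ident, _frag,
         ttl, proto, _cksum, src, dst) = IPV4_HDR.unpack_from(packet)
        return {
            "ip_len": total_len,
            "ttl": ttl,
            "proto": proto,
            "src": ".".join(str(b) for b in src),
            "dst": ".".join(str(b) for b in dst),
        }
    ```

    A packet-based task (e.g. per-packet classification) would consume this vector directly; a flow-based task would aggregate vectors keyed by the (src, dst, proto) tuple.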

    Distributed joint optimization of traffic engineering and server selection

    Full text link
    Internet service providers (ISPs) apply traffic engineering (TE) in the underlay network to avoid congestion. Content providers (CPs), on the other hand, use server selection (SS) strategies in the overlay network to reduce delay. It has been shown that jointly optimizing TE and SS benefits performance from both the ISP's and the CP's perspectives. One challenging issue in such a network is to design a distributed protocol that achieves optimality while revealing as little information as possible between ISP and CP. To address this problem, we propose a distributed protocol termed PETS, in which each ISP router makes independent traffic engineering decisions and each CP server makes independent server selection decisions. We prove that PETS achieves optimality for the joint optimization of TE and SS. We also show that PETS significantly reduces message passing and enables the ISP to hide important underlay network information (e.g., topology) from the CP. Furthermore, PETS can be easily extended to handle multiple CPs in the network.
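    The flavor of such alternating, limited-information decisions can be shown with a toy model. Everything below is an illustrative assumption, not the PETS protocol: link delay cost is quadratic, c(x) = x², and each demand re-selects the server with the lowest marginal delay using only aggregate load information.

    ```python
    def marginal_delay(load: float) -> float:
        # Derivative of a convex link-delay cost c(x) = x^2 (toy model).
        return 2.0 * load

    def joint_toy(demands, n_servers=2, rounds=50):
        """Best-response server selection over aggregate loads.

        Each demand only sees per-server load totals, never topology,
        mimicking the limited information exchange the abstract describes.
        """
        assign = [0] * len(demands)
        loads = [0.0] * n_servers
        loads[0] = sum(demands)          # everyone starts on server 0
        for _ in range(rounds):
            changed = False
            for i, d in enumerate(demands):
                s = assign[i]
                # Marginal delay if this demand were moved to server k.
                best = min(range(n_servers),
                           key=lambda k: marginal_delay(loads[k] - (d if k == s else 0.0)))
                if best != s:
                    loads[s] -= d
                    loads[best] += d
                    assign[i] = best
                    changed = True
            if not changed:              # fixed point reached
                break
        return assign, loads
    ```

    With a convex cost, these independent best responses converge to the balanced (socially optimal) load split.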

    Optimal bandwidth assignment for multiple-description-coded video

    No full text
    In video streaming over a multicast network, user bandwidth requirements are often heterogeneous, possibly with orders-of-magnitude differences (say, from hundreds of kb/s for mobile devices to tens of Mb/s for high-definition TV). Multiple description coding (MDC) can be used to address this bandwidth heterogeneity: the video source is encoded into multiple independent descriptions, and a receiver joins as many descriptions as its available bandwidth allows. An important but challenging problem for MDC video multicast is how to assign bandwidth to each description so as to maximize overall user satisfaction. In this paper, we formulate this as an optimization problem whose objective is to maximize user bandwidth experience while accounting for the encoding inefficiency introduced by MDC. We prove that the problem is in general NP-hard. However, if the number of descriptions is at least a certain threshold (for a bandwidth heterogeneity of a factor of 100, the threshold is 7 descriptions), there is an exact and simple solution that achieves maximum user satisfaction, i.e., meets all receiving bandwidth requirements. When the number of descriptions is smaller, we present an efficient heuristic called SAMBA (Simulated Annealing for MDC Bandwidth Assignment) to assign bandwidth to each description given the distribution of user bandwidth requirements. We evaluate our algorithm using simulations: SAMBA achieves virtually the same performance as optimal exhaustive search, and significantly improves user satisfaction compared with other assignment algorithms. We also show that, if coding efficiency decreases with the number of descriptions, there is an optimal number of descriptions that maximizes user satisfaction.
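    The receiver model and the simulated-annealing search can be sketched as follows. This is a hypothetical toy, not the paper's formulation: the greedy subset-joining receiver, the multiplicative perturbation, the cooling schedule, and the `satisfaction` objective are all illustrative assumptions.

    ```python
    import math
    import random

    def delivered(rates, cap):
        """Bandwidth a receiver with capacity `cap` gets: greedily join the
        largest descriptions that still fit (toy receiver model)."""
        got = 0.0
        for r in sorted(rates, reverse=True):
            if got + r <= cap:
                got += r
        return got

    def satisfaction(rates, users):
        # Fraction of total demanded bandwidth actually delivered.
        return sum(delivered(rates, u) for u in users) / sum(users)

    def samba_toy(users, k, iters=2000, temp=1.0, cool=0.995, seed=0):
        """Simulated annealing over the k description rates."""
        rng = random.Random(seed)
        rates = [max(users) / k] * k          # uniform initial assignment
        cur = best = satisfaction(rates, users)
        best_rates = rates[:]
        for _ in range(iters):
            cand = rates[:]
            i = rng.randrange(k)
            cand[i] = max(0.1, cand[i] * rng.uniform(0.5, 1.5))  # perturb one rate
            s = satisfaction(cand, users)
            # Metropolis rule: accept improvements, sometimes worse moves.
            if s >= cur or rng.random() < math.exp((s - cur) / temp):
                rates, cur = cand, s
                if s > best:
                    best, best_rates = s, cand[:]
            temp *= cool
        return best_rates, best
    ```

    For users requiring 1, 4, and 10 units, the uniform start (three descriptions of 10/3 each) leaves the smallest user with nothing; annealing toward rates like (1, 3, 6) lets every user be served exactly.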

    NCP: Finishing Flows Even More Quickly

    Get PDF
    The Transmission Control Protocol (TCP) is the major transport-layer protocol in the Internet today. TCP and its variants have the drawback of not knowing the explicit rate share of flows at bottleneck links. The Rate Control Protocol (RCP) is a major clean-slate congestion control protocol recently proposed to address this drawback: RCP tries to obtain explicit knowledge of flow shares at bottleneck links. However, RCP under- or over-estimates the number of active flows, which it needs in order to compute the fair per-flow rate share. This causes under- or over-utilization of bottleneck link capacity, which in turn can result in very long queues and packet drops that translate into a high average file completion time (AFCT). In this paper we present the design and analysis of the Network congestion Control Protocol (NCP). NCP gives flows their fair share rates and hence achieves the minimum AFCT. Unlike RCP, NCP uses an accurate formula to calculate the number of flows sharing a network link, enabling it to assign fair share rates without over- or under-utilizing bottleneck link capacity. Simulation results confirm the design goals of NCP, achieving minimum AFCT when compared with RCP. (Unpublished; not peer reviewed.)
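    The core difference the abstract describes, exact versus estimated flow counts, can be sketched as a toy per-interval computation at a router. The class and the control-interval model below are hypothetical illustrations, not NCP's actual mechanism.

    ```python
    class FairRateRouter:
        """Toy router that counts active flows exactly per control interval
        and hands each flow its fair share of link capacity.

        RCP instead estimates the flow count from capacity and the previous
        advertised rate (roughly N ~ C / R_prev), which can over- or
        under-shoot; exact counting avoids that.
        """

        def __init__(self, capacity_bps: float):
            self.capacity = capacity_bps
            self.flows = set()

        def on_packet(self, flow_id) -> None:
            # Record which flows were active in this interval.
            self.flows.add(flow_id)

        def fair_rate(self) -> float:
            # Exact flow count -> exact fair share; then start a new interval.
            n = max(1, len(self.flows))
            self.flows.clear()
            return self.capacity / n
    ```

    With four flows seen on a 100 Mb/s link, the advertised rate is exactly 25 Mb/s, fully utilizing the link with no queue buildup in this idealized model.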

    A Vision-based Compression and Dissemination Framework for 3D Tele-immersive System

    Get PDF
    A 3D Tele-Immersion (3DTI) system brings 3D data of people from geographically distributed locations into the same virtual space to enable interaction in 3D. One main obstacle in designing a 3DTI system is its high bandwidth requirement when disseminating 3D data over the network. In this work, we present a novel compression and dissemination framework to reduce bandwidth usage. The main idea is to decompose the 3D scene into separable objects and schedule the dissemination of each object independently, based on the object's importance to the quality of experience. In a real implementation on our 3DTI testbed, the framework is shown to reduce bandwidth usage by 30% to 50% with reasonable degradation of visual quality. (Unpublished; not peer reviewed.)
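    The importance-driven scheduling idea can be illustrated as a greedy selection by importance per byte under a bandwidth budget. This is a hypothetical sketch, not the paper's scheduler; the object tuples and the density heuristic are assumptions.

    ```python
    def schedule(objects, budget):
        """Pick which scene objects to send this frame.

        objects: list of (name, size_bytes, importance) tuples.
        Greedy by importance density (importance / size): high-value
        objects like faces go first; low-value bulk like the background
        is dropped (or, in the real system, sent at lower quality).
        """
        chosen, used = [], 0
        for name, size, imp in sorted(objects, key=lambda o: o[2] / o[1], reverse=True):
            if used + size <= budget:
                chosen.append(name)
                used += size
        return chosen, used
    ```

    With a face (10 bytes, importance 100), a body (50 bytes, importance 80), and a background (100 bytes, importance 10) against a 70-byte budget, the face and body are sent and the background is deferred.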

    Joint Optimization of Content Replication and Server Selection for Video-On-Demand

    No full text
    We study providing large-scale video-on-demand (VoD) service to distributed users. To achieve scalability in user capacity and reduce the load on the core network, local servers with heterogeneous storage are deployed; each server replicates movie segments according to their access probabilities. Considering the realistic scenario where underlay delay is a function of the total traffic in a link (including cross-traffic), we address two important problems for achieving low user interactive delay: 1) which segments each server should replicate, under its capacity constraint, to achieve a network-wide locality effect — the content replication (CR) problem; and 2) given several remote servers holding the requested segment, which one should serve the user — the server selection (SS) problem. The CR and SS problems are coupled with each other. In this paper, we propose a simple, distributed algorithm that seeks to jointly optimize CR and SS. The algorithm, termed CR-SS, achieves good caching locality by adaptively replacing segments and selecting servers with a simple lookup. Simulation results on Internet-like topologies show that CR-SS outperforms existing state-of-the-art approaches by a wide margin, achieving substantially lower user delay.
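    A minimal sketch of the two coupled decisions is given below. The top-k popularity caching policy and the flat per-server delays are illustrative assumptions for this toy, not the CR-SS algorithm itself.

    ```python
    def replicate(access_prob, capacity):
        """Content replication (toy): cache the most popular segments
        that fit within the server's capacity.

        access_prob: {segment: probability}; capacity: segment count.
        """
        ranked = sorted(access_prob, key=access_prob.get, reverse=True)
        return set(ranked[:capacity])

    def select_server(segment, servers, delay):
        """Server selection (toy): among servers caching the segment,
        pick the one with the lowest delay via a simple lookup.

        servers: {name: cached segment set}; delay: {name: delay}.
        Returns None if no server holds the segment (fetch from origin).
        """
        holders = [s for s, cache in servers.items() if segment in cache]
        return min(holders, key=lambda s: delay[s]) if holders else None
    ```

    The coupling is visible even in this toy: what each server chooses to replicate determines which servers are candidates at selection time, so the two decisions must be optimized together rather than in isolation.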