Octopus: A Heterogeneous In-network Computing Accelerator Enabling Deep Learning for network
Deep learning (DL) models for networking have achieved excellent performance in the field and are becoming a promising component of future intelligent network systems. Programmable in-network computing devices have great potential to deploy DL-based network models; however, existing devices cannot afford to run a DL model. The main challenges of data-plane support for DL-based network models lie in computing power, task granularity, model generality, and feature extraction. To address these problems, we propose Octopus: a heterogeneous in-network computing accelerator enabling DL for network models. A feature extractor is designed for fast and efficient feature extraction. A vector accelerator and a systolic array work in a heterogeneous, collaborative way, offering low-latency, high-throughput general computing ability for packet- and flow-based tasks. Octopus also contains an on-chip memory fabric for storage and interconnection, and a RISC-V core for global control. The proposed Octopus accelerator design is implemented on an FPGA. The functionality and performance of Octopus are validated in several use cases, achieving 31 Mpkt/s feature extraction, 207 ns packet-based computing latency, and 90 kflow/s flow-based computing throughput.
Distributed joint optimization of traffic engineering and server selection
Internet service providers (ISPs) apply traffic engineering (TE) in the underlay network to avoid congestion. On the other hand, content providers (CPs) use different server selection (SS) strategies in the overlay network to reduce delay. It has been shown that a joint optimization of TE and SS benefits performance from both the ISP's and the CP's perspectives. One challenging issue in such a network is to design a distributed protocol that achieves optimality while revealing as little information as possible between the ISP and the CP. To address this problem, we propose a distributed protocol termed PETS, in which each router of the ISP makes independent traffic engineering decisions and each server of the CP makes independent server selection decisions. We prove that PETS achieves optimality for the joint optimization of TE and SS. We also show that PETS significantly reduces message passing and enables the ISP to hide important underlay network information (e.g., topology) from the CP. Furthermore, PETS can be easily extended to handle the case of multiple CPs in the network.
Optimal bandwidth assignment for multiple-description-coded video
In video streaming over a multicast network, user bandwidth requirements are often heterogeneous, possibly with orders-of-magnitude differences (say, from hundreds of kb/s for mobile devices to tens of Mb/s for high-definition TV). Multiple description coding (MDC) can be used to address this bandwidth heterogeneity. In MDC, the video source is encoded into multiple independent descriptions. A receiver, depending on its available bandwidth, joins different descriptions to meet its bandwidth requirement. An important but challenging problem for MDC video multicast is how to assign bandwidth to each description in order to maximize overall user satisfaction. In this paper, we investigate this issue by formulating it as an optimization problem, with the objective of maximizing user bandwidth experience while taking into account the encoding inefficiency due to MDC. We prove that the optimization problem is in general NP-hard. However, if the number of descriptions is larger than or equal to a certain threshold (for a bandwidth heterogeneity of a factor of 100, this threshold is 7 descriptions), there is an exact and simple solution that achieves maximum user satisfaction, i.e., meeting all receiving bandwidth requirements. For the case when the number of descriptions is smaller, we present an efficient heuristic called SAMBA (Simulated Annealing for MDC Bandwidth Assignment) to assign bandwidth to each description given the distribution of user bandwidth requirements. We evaluate our algorithm using simulations. SAMBA achieves virtually the same performance as the optimum found by exhaustive search. Compared with other assignment algorithms, SAMBA significantly improves user satisfaction. We also show that, if the coding efficiency decreases with the number of descriptions, there is an optimal number of descriptions that achieves maximal user satisfaction.
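The SAMBA idea above can be illustrated with a small sketch. This is not the paper's implementation: the satisfaction model (a user receives the best-fitting subset of descriptions within its bandwidth), the linear cooling schedule, and all function names are illustrative assumptions.

```python
import math
import random

def satisfaction(bw, rates):
    # Best total rate of any subset of descriptions that fits within bw
    # (assumed model: a receiver may join any subset of descriptions).
    best = 0.0
    n = len(rates)
    for mask in range(1 << n):
        total = sum(r for i, r in enumerate(rates) if mask >> i & 1)
        if total <= bw and total > best:
            best = total
    return best

def avg_satisfaction(users, rates):
    # Average utilized bandwidth over the user population.
    return sum(satisfaction(b, rates) for b in users) / len(users)

def samba(users, k, lo, hi, steps=2000, t0=1.0, seed=0):
    """Simulated-annealing search for k description rates in [lo, hi]
    maximizing average user satisfaction (hypothetical sketch)."""
    rng = random.Random(seed)
    rates = [rng.uniform(lo, hi) for _ in range(k)]
    cur_val = avg_satisfaction(users, rates)
    best, best_val = rates[:], cur_val
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9   # linear cooling
        cand = rates[:]
        i = rng.randrange(k)                 # perturb one description rate
        cand[i] = min(hi, max(lo, cand[i] + rng.gauss(0, 0.1 * (hi - lo))))
        val = avg_satisfaction(users, cand)
        # Metropolis acceptance: always take improvements, sometimes worse moves.
        if val > cur_val or rng.random() < math.exp((val - cur_val) / t):
            rates, cur_val = cand, val
            if val > best_val:
                best, best_val = cand[:], val
    return sorted(best), best_val
```

The exponential subset enumeration in `satisfaction` is only viable because, per the abstract, SAMBA targets small description counts; at or above the threshold the exact assignment applies instead.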
NCP: Finishing Flows Even More Quickly
The transmission control protocol (TCP) is the major transport layer protocol in the Internet today. TCP and its variants have the drawback of not knowing the explicit rate share of flows at bottleneck links. The Rate Control Protocol (RCP) is a major clean-slate congestion control protocol recently proposed to address these drawbacks. RCP tries to obtain explicit knowledge of flow shares at bottleneck links. However, RCP under- or over-estimates the number of active flows, which it needs in order to obtain the fair rate share of flows. This causes under- or over-utilization of bottleneck link capacity, which in turn can result in very high queue lengths and packet drops that translate into a high average file completion time (AFCT).

In this paper we present the design and analysis of a Network congestion Control Protocol (NCP). NCP can give flows their fair share rates, resulting in the minimum AFCT. Unlike RCP, NCP uses an accurate formula to calculate the number of flows sharing a network link. This enables NCP to assign fair share rates to flows without over- or under-utilization of bottleneck link capacities. Simulation results confirm the design goals of NCP in achieving minimum AFCT when compared with RCP.

(Unpublished; not peer-reviewed.)
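The contrast between the two flow-counting approaches can be sketched as follows. This is a hypothetical toy model, not NCP's actual formula: the function names, the packet representation, and the control-interval abstraction are all assumptions for illustration.

```python
def rcp_flow_estimate(link_capacity_bps, prev_rate_bps):
    """RCP-style implicit estimate of active flows: N ~ C / R(t - T),
    i.e. capacity divided by the previously advertised rate. This is the
    estimate that can drift above or below the true flow count."""
    return link_capacity_bps / prev_rate_bps

def fair_share_rate(link_capacity_bps, packets):
    """NCP-style fair rate from an accurate flow count: count distinct
    flow IDs observed in one control interval, then divide the link
    capacity equally among them. packets: list of (flow_id, size_bytes)."""
    flows = {flow_id for flow_id, _ in packets}
    n = max(1, len(flows))          # avoid division by zero on idle links
    return link_capacity_bps / n
```

With an exact count of the flows sharing the link, the advertised rate sums to exactly the link capacity, which is what prevents the under- or over-utilization described above.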
A Vision-based Compression and Dissemination Framework for 3D Tele-immersive System
A 3D Tele-Immersion (3DTI) system brings 3D data of people from geographically distributed locations into the same virtual space to enable interaction in 3D. One main obstacle in designing a 3DTI system is overcoming its high bandwidth requirement when disseminating the 3D data over the network. In this work, we present a novel compression and dissemination framework to reduce the bandwidth usage. The main idea is to decompose the 3D scene into separable objects and schedule the dissemination of each object independently of the others, based on the object's importance to the quality of experience. Through a real implementation on the 3DTI testbed, our framework is shown to reduce bandwidth usage by 30% to 50% with reasonable degradation of visual quality.

(Unpublished; not peer-reviewed.)
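One way to read the importance-based scheduling idea is as a budgeted selection problem. The sketch below is an illustrative greedy heuristic, not the paper's scheduler; the object tuples, the importance-per-byte ranking, and the per-interval byte budget are all assumptions.

```python
def schedule_objects(objects, budget_bytes):
    """Greedy dissemination schedule: send objects in decreasing
    importance-per-byte until the per-interval bandwidth budget is
    exhausted. objects: list of (name, size_bytes, importance)."""
    ranked = sorted(objects, key=lambda o: o[2] / o[1], reverse=True)
    sent, used = [], 0
    for name, size, importance in ranked:
        if used + size <= budget_bytes:   # object fits in remaining budget
            sent.append(name)
            used += size
    return sent, used
```

Under this model, a high-importance object such as a face would be disseminated every interval, while a large low-importance object such as the background would be skipped or sent less often, matching the 30% to 50% bandwidth reduction at modest visual cost reported above.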
Joint Optimization of Content Replication and Server Selection for Video-On-Demand
We study providing large-scale video-on-demand (VoD) service to distributed users. In order to achieve scalability in user capacity and reduce the load on the core network, local servers with heterogeneous storage are deployed. Each server replicates movie segments depending on their access probabilities. Considering the realistic scenario that underlay delay is a function of the total traffic on the link (including cross-traffic), we address two important problems to achieve low user interactive delay: 1) Which segments should each server replicate, under the constraints of its capacity, to achieve a network-wide locality effect? This is the so-called content replication (CR) problem; and 2) Given a number of remote servers with the requested segment, which one should serve the user? This is the so-called server selection (SS) problem. The CR and SS problems are coupled with each other. In this paper, we propose a simple and distributed algorithm that seeks to jointly optimize CR and SS. The algorithm, termed CR-SS, achieves good caching locality by adaptively replacing segments and selecting servers with a simple lookup. Simulation results on Internet-like topologies show that CR-SS outperforms existing state-of-the-art approaches by a wide margin, achieving substantially lower user delay.
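The SS half of the problem can be sketched in a few lines. This is a hypothetical illustration, not the CR-SS algorithm: the M/M/1-style congestion term, the data structures, and all names are assumptions chosen to reflect the abstract's point that delay depends on total link traffic, not just hop count.

```python
def select_server(servers, segment, link_load, capacity, base_delay):
    """Among servers holding `segment`, pick the one with the smallest
    congestion-aware path delay, modeled as base_delay / (1 - load/capacity)
    (an M/M/1-style term over total link traffic, including cross-traffic).
    servers: {name: set of segment ids held}."""
    candidates = [s for s in servers if segment in servers[s]]
    if not candidates:
        return None   # no replica; would be fetched from the origin

    def delay(s):
        util = min(link_load[s] / capacity[s], 0.99)  # cap to keep delay finite
        return base_delay[s] / (1.0 - util)

    return min(candidates, key=delay)
```

The coupling noted above shows up directly here: once replication (CR) changes which servers hold a segment, the candidate set and the resulting traffic, and hence every delay estimate, change too.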