
    Performance Optimization and Dynamics Control for Large-scale Data Transfer in Wide-area Networks

    Transport control plays an important role in the performance of large-scale scientific and media streaming applications involving the transfer of large data sets, media streaming, online computational steering, interactive visualization, and remote instrument control. In general, these applications have two distinctive classes of transport requirements: large-scale scientific applications require high bandwidths to move bulk data across wide-area networks, while media streaming applications require stable bandwidths to ensure smooth media playback. Unfortunately, the widely deployed Transmission Control Protocol is inadequate for such tasks due to its performance limitations. The purpose of this dissertation is to conduct a rigorous analytical study of the design and performance of transport solutions, and to develop an integrated transport solution in a systematic way to overcome the limitations of current transport methods. One of the primary challenges is to explore and compose a set of feasible route options with multiple constraints. Another challenge arises from the randomness inherent in wide-area networks, particularly the Internet. This randomness must be explicitly accounted for to achieve both goodput maximization and stabilization over the constructed routes by suitably adjusting the source rate in response to both network and host dynamics. The superior and robust performance of the proposed transport solution is extensively evaluated in a simulated environment and further verified through real-life implementations and deployments over both Internet and dedicated connections under disparate network conditions, in comparison with existing transport methods.
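    The dissertation publishes no code here; as a rough illustration of the kind of rate adjustment it describes, the Python sketch below climbs a noisy goodput curve by stochastic approximation until the slope flattens, which maximizes and then stabilizes the rate. The function name, gain schedule, and measure_goodput callback are assumptions for illustration, not the author's actual algorithm.

        # Hedged sketch: Kiefer-Wolfowitz-style search over a noisy goodput
        # function; illustrative only, not the dissertation's method.
        def stabilize_rate(measure_goodput, r0=100.0, steps=50):
            """Move the source rate r (Mb/s) toward the point where goodput
            stops growing with rate, using finite-difference slope estimates
            from two probes per step."""
            r = r0
            for k in range(1, steps + 1):
                a = 1.0 / k            # diminishing gain, for stabilization
                c = 1.0 / k ** 0.25    # diminishing probe half-width
                slope = (measure_goodput(r + c) - measure_goodput(r - c)) / (2 * c)
                r = max(1.0, r + a * slope)   # ascend until the slope is ~0
            return r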

    Workload Prediction for Efficient Performance Isolation and System Reliability

    In large-scale distributed systems, such as multi-tier storage systems and cloud data centers, resource sharing among workloads brings multiple benefits while introducing many performance challenges. The key to effective workload multiplexing is accurate workload prediction. This thesis focuses on how to capture the salient characteristics of real-world workloads in order to develop workload prediction methods and to drive scheduling and resource allocation policies, so as to achieve efficient and timely resource isolation among applications. In a multi-tier storage system, high-priority user work is often multiplexed with low-priority background work. This raises the challenge of striking a balance between maintaining user performance and maximizing the amount of completed background work. In this thesis, we propose two resource isolation policies based on different workload prediction methods: one Markovian-model-based and the other neural-network-based. These policies aim, via workload prediction, to discover opportune times to schedule background work with minimum impact on user performance. Trace-driven simulations verify the efficiency of the two proposed resource isolation policies. The Markovian-model-based policy successfully schedules background work during the appropriate periods with small impact on user performance. The neural-network-based policy adaptively schedules user and background work, consistently meeting both performance requirements. This thesis also proposes an accurate yet efficient neural-network-based prediction method for data center usage series, called PRACTISE. Unlike traditional neural networks for time series prediction, PRACTISE selects the most informative features from the past observations of the time series itself. Testing on a large set of usage series from production data centers demonstrates the accuracy (e.g., prediction error) and efficiency (e.g., time cost) of PRACTISE. The superior usage prediction also enables proactive resource management in highly virtualized cloud data centers. In this thesis, we analyze performance tickets in cloud data centers and propose an active sizing algorithm, named ATM, that predicts usage workloads and re-allocates capacity to workloads to avoid VM performance tickets. Moreover, driven by cheap prediction of usage tails, we also present TailGuard, which dynamically clones VMs among co-located boxes in order to efficiently reduce performance violations of physical boxes in cloud data centers.
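    PRACTISE's internals are not reproduced in this abstract; the sketch below only illustrates the general idea of selecting the most informative past observations before fitting a predictor, with a plain least-squares model standing in for the neural network. The function names, the correlation-based ranking, and all parameters are illustrative assumptions.

        # Hedged sketch: lag selection for a usage series, with least squares
        # standing in for PRACTISE's neural network.
        import numpy as np

        def lagged_matrix(series, lags):
            """Design matrix whose columns are the series shifted by each lag."""
            L = max(lags)
            X = np.column_stack([series[L - l:len(series) - l] for l in lags])
            return X, series[L:]

        def top_lags(series, candidates, k=4):
            """Keep the k lags whose values correlate most with the next value."""
            X, y = lagged_matrix(series, candidates)
            score = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(len(candidates))]
            return [candidates[j] for j in np.argsort(score)[::-1][:k]]

        def predict_next(series, lags):
            """Fit y_t ~ w . [y_(t-l)] + b on the selected lags, predict one step."""
            X, y = lagged_matrix(series, lags)
            A = np.column_stack([X, np.ones(len(y))])
            w, *_ = np.linalg.lstsq(A, y, rcond=None)
            return np.array([series[-l] for l in lags] + [1.0]) @ w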

    Discrete-time queueing model for responsive network traffic and bottleneck queues

    The Internet has been used more and more intensively in recent years. Although the network infrastructure has been regularly upgraded and the ability to manage heavy traffic greatly increased, especially on core networks, congestion never ceases to appear, as the amount of traffic flowing on the Internet seems to be increasing at an even faster rate. Thus, congestion control mechanisms play a vital role in the functioning of the Internet. Active Queue Management (AQM) is a popular type of congestion control mechanism implemented on gateways (most notably routers), which can predict and avoid congestion before it happens. When properly configured, AQMs can effectively reduce congestion and alleviate problems such as global synchronisation and unfairness to bursty traffic. However, many problems remain. Most AQM schemes are quite sensitive to their parameter settings, and these parameters may depend heavily on the network traffic profile, of which the administrator may not have detailed knowledge, and which is likely to change over time. When poorly configured, many AQMs perform no better than the basic drop-tail queue. Because of this parameter configuration problem, there is currently no effective method to compare the performance of AQM algorithms. The aim of this research is to propose a new analytical model based mainly on discrete-time queueing theory. A novel transient modification to the conventional equilibrium-based method is proposed and used to develop a dynamic interactive model of responsive traffic and bottleneck queues. Using step-by-step analysis, the model represents the bursty traffic and oscillating queue-length behaviour of practical networks more accurately. It also provides an effective way of predicting the behaviour of a TCP-AQM system, allowing easier parameter optimisation for AQM schemes. Numerical solution using MATLAB and software simulation using NS-2 are used to extensively validate the proposed models, theories, and conclusions.
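    As a toy companion to the TCP-AQM interaction described above (not the thesis's analytical formulation), the slotted-time loop below couples a RED-like drop probability to an additive-increase, multiplicative-decrease source, and reproduces the oscillating queue-length behaviour the abstract mentions. All parameter names and values are assumptions.

        # Hedged sketch: discrete-time (slotted) bottleneck queue under a
        # RED-like AQM, fed by a responsive AIMD source; illustrative only.
        import random

        def simulate(slots=2000, capacity=10, qmin=20, qmax=80, pmax=0.1):
            q, rate, trace = 0, 1.0, []
            for _ in range(slots):
                # Linear RED-style drop profile between the two thresholds.
                if q <= qmin:
                    p = 0.0
                elif q >= qmax:
                    p = 1.0
                else:
                    p = pmax * (q - qmin) / (qmax - qmin)
                arrivals = int(rate)
                accepted = sum(random.random() > p for _ in range(arrivals))
                q = max(0, q + accepted - capacity)   # serve `capacity` pkts/slot
                # Responsive source: halve on any drop, else add one pkt/slot.
                rate = rate / 2 if accepted < arrivals else rate + 1.0
                trace.append(q)
            return trace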

    Proceedings of the Third Edition of the Annual Conference on Wireless On-demand Network Systems and Services (WONS 2006)

    This file gathers into a single document all the papers accepted for the WONS 2006 conference (http://citi.insa-lyon.fr/wons2006/index.html). This year, 56 papers were submitted. From the Open Call submissions we accepted 16 papers as full papers (up to 12 pages) and 8 papers as short papers (up to 6 pages). All the accepted papers will be presented orally in the Workshop sessions. More precisely, the selected papers have been organized into seven sessions: Channel Access and Scheduling, Energy-aware Protocols, QoS in Mobile Ad-Hoc Networks, Multihop Performance Issues, Wireless Internet, Applications, and finally Security Issues. The papers (and authors) come from all parts of the world, confirming the international stature of this Workshop. The majority of the contributions are from Europe (France, Germany, Greece, Italy, Netherlands, Norway, Switzerland, UK); however, a significant number are from Australia, Brazil, Canada, Iran, Korea, and the USA. The proceedings also include two invited papers. We take this opportunity to thank all the authors who submitted their papers to WONS 2006. You helped make this event a success once again.

    Congestion Control in Peer-to-Peer Applications: The Case of LEDBAT

    In recent years, Internet delays have grown considerably, causing a performance deterioration for interactive applications. This phenomenon is worsening with the increasing popularity of bandwidth-intensive applications, such as video streaming, remote backup, and P2P systems. The cause of these delays has been identified as excess buffering inside the network, called “bufferbloat”. Research efforts in this direction head toward active queue management techniques and end-to-end congestion control. In this context, we investigated LEDBAT, a low-priority, delay-based transport protocol introduced by BitTorrent. This protocol is designed to transfer large amounts of data without affecting the delay experienced by other applications or users. First, we analysed the transport-level performance of LEDBAT using experimental measurements, simulations, and an analytical model. Specifically, we evaluated LEDBAT as is, comparing its performance to standard TCP and to other low-priority protocols. We then identified a latecomer advantage, and we proposed fLEDBAT, which re-introduces intra-protocol fairness while maintaining the original LEDBAT objectives. Finally, we studied the impact of the LEDBAT protocol on BitTorrent performance. Through simulations and real network experiments, we analysed how BitTorrent affects the buffer occupancy of the access node. BitTorrent performance was evaluated in terms of completion time, the main metric for assessing the user quality of experience. Results showed that LEDBAT decreases the completion time with respect to standard TCP and significantly reduces the buffer occupancy, which translates into lower delays experienced by competing interactive applications.
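    LEDBAT's controller is specified in RFC 6817; below is a minimal per-ACK sketch of that published update rule (the baseline the thesis studies, not the fLEDBAT variant). The 100 ms target and unit gain follow the RFC's suggested limits; the function and variable names are illustrative.

        # Hedged sketch of the LEDBAT congestion-window update (RFC 6817).
        TARGET = 0.100   # tolerated queuing delay in seconds (RFC caps it at 100 ms)
        GAIN = 1.0       # chosen so ramp-up is no faster than TCP's
        MSS = 1500       # segment size in bytes

        def on_ack(cwnd, bytes_acked, current_owd, base_owd):
            """Update the window from one ACK. base_owd is the minimum one-way
            delay observed so far, so current_owd - base_owd estimates the
            queuing delay that LEDBAT steers toward TARGET."""
            queuing_delay = current_owd - base_owd
            off_target = (TARGET - queuing_delay) / TARGET   # >0: speed up, <0: back off
            cwnd += GAIN * off_target * bytes_acked * MSS / cwnd
            return max(cwnd, MSS)   # never shrink below one segment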

    Buffer De-bloating in Wireless Access Networks

    Excessive buffering introduces a new challenge into networks, known as bufferbloat, which is harmful to delay-sensitive applications. Wireless access networks consist of Wi-Fi and cellular networks. In this thesis, the performance of CoDel and RED is investigated in Wi-Fi networks with different types of traffic. Results show that CoDel and RED work well in Wi-Fi networks, due to the similarity of the protocol structures of Wi-Fi and wired networks. In cellular networks, it is difficult for RED to tune its parameters because of the time-varying channel, and CoDel needs modification because it drops the packet at the head of the queue, which in cellular networks will have been segmented. The major contribution of this thesis is three new AQM algorithms, tailored to cellular networks, that are proposed to alleviate large queuing delays. A channel-quality-aware AQM is proposed using the Channel Quality Indicator (CQI). The proposed algorithm is tested with a single-cell topology, and simulation results show that it reduces the average queuing delay for each user by 40% on average with TCP traffic compared to CoDel. A QoE-aware AQM is proposed for VoIP traffic. Drops and delay are monitored and turned into QoE via mathematical models. The proposed algorithm is tested in NS-3 and compared with CoDel; it enhances the QoE of VoIP traffic, and the average end-to-end delay is reduced by more than 200 ms when multiple users with different CQI compete for the wireless channel. A random back-off AQM is proposed to alleviate the queuing delay created by video traffic in cellular networks. The proposed algorithm monitors the play-out buffer and postpones the request for the next packet. It is tested in various scenarios and outperforms CoDel by 18% in controlling the average end-to-end delay when users have different channel conditions.
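    For reference, the drop decision the thesis starts from is CoDel's sojourn-time test (Nichols and Jacobson); below is a simplified sketch of that baseline logic, not of the three cellular-specific algorithms proposed here. The class layout is invented for illustration; the constants follow CoDel's published defaults (5 ms target, 100 ms interval).

        # Hedged, simplified sketch of CoDel's dequeue-time drop decision.
        import math

        TARGET = 0.005     # sojourn time allowed to stand, in seconds
        INTERVAL = 0.100   # how long it must persist before dropping starts

        class CoDelSketch:
            def __init__(self):
                self.first_above = None   # deadline set when sojourn first exceeds TARGET
                self.dropping = False     # currently in the dropping state?
                self.count = 0            # drops so far in this dropping state
                self.next_drop = 0.0

            def should_drop(self, sojourn, now):
                """Called at dequeue with the head packet's queue sojourn time."""
                if sojourn < TARGET:
                    self.first_above, self.dropping = None, False
                    return False
                if self.first_above is None:          # start the grace period
                    self.first_above = now + INTERVAL
                    return False
                if not self.dropping and now >= self.first_above:
                    self.dropping, self.count = True, 1
                    self.next_drop = now + INTERVAL
                    return True                       # drop the head packet
                if self.dropping and now >= self.next_drop:
                    self.count += 1                   # drop faster each round:
                    self.next_drop = now + INTERVAL / math.sqrt(self.count)
                    return True
                return False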

    High performance communication on reconfigurable clusters

    High Performance Computing (HPC) has matured to where it is an essential third pillar, along with theory and experiment, in most domains of science and engineering. Communication latency is a key factor limiting the performance of HPC, but it can be addressed by integrating communication into accelerators. This integration allows accelerators to communicate with each other without CPU interaction, even bypassing the network stack. Field Programmable Gate Arrays (FPGAs) are the accelerators that currently best integrate communication with computation. The large number of Multi-gigabit Transceivers (MGTs) on most high-end FPGAs can provide high-bandwidth, low-latency inter-FPGA connections. Additionally, the reconfigurable FPGA fabric enables tight coupling between the computation kernel and the network interface. Our thesis is that an application-aware communication infrastructure for a multi-FPGA system makes substantial progress in solving the HPC communication bottleneck. This dissertation aims to provide an application-aware communication infrastructure for FPGA-centric clusters. Specifically, our solution demonstrates application-awareness across multiple levels of the network stack, including low-level link protocols, router microarchitectures, routing algorithms, and applications. We start by investigating the low-level link protocol and the impact of its latency variance on performance. Our results demonstrate that, although some link jitter is always present, we can still assume near-synchronous communication on an FPGA cluster. This provides the necessary condition for statically-scheduled routing. We then propose two novel router microarchitectures for two different kinds of workloads: a wormhole Virtual Channel (VC)-based router for workloads with dynamic communication, and a statically-scheduled Virtual Output Queueing (VOQ)-based router for workloads with static communication. For the first (VC-based) router, we propose a framework that generates application-aware router configurations. Our results show that, by adding application-awareness to the router configuration, the network performance of FPGA clusters can be substantially improved. For the second (VOQ-based) router, we propose a novel offline collective routing algorithm, which shows a significant advantage over a state-of-the-art collective routing algorithm. We apply our communication infrastructure to a critical strong-scaling HPC kernel, the 3D FFT. The experimental results demonstrate that our design is faster than CPUs and GPUs by at least one order of magnitude, achieving strong scaling for the target applications. Surprisingly, the FPGA-cluster performance is similar to that of an ASIC cluster. We also implement the 3D FFT on another multi-FPGA platform, the Microsoft Catapult II cloud; its performance is also comparable or superior to CPU and GPU HPC clusters. The second application we investigate is Molecular Dynamics (MD) simulation. We model MD on both FPGA clouds and clusters. We find that combining processing and general communication in the same device leads to extremely promising performance and the prospect of MD simulations well into the µs/day range with a commodity cloud.
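    The routers here are hardware designs; as a software illustration of what "statically scheduled" means for the VOQ-based router, the toy loop below replays an offline-computed table that names, for each time step, which virtual output queue each output port serves, so no run-time arbitration is needed. The schedule contents, queue setup, and send callback are invented for the example and are not the dissertation's design.

        # Hedged sketch: replaying an offline schedule for a VOQ-style router.
        from collections import deque

        def run_static_schedule(schedule, voqs, steps, send):
            """schedule[t] maps output port -> VOQ id for that time step; the
            table repeats, so forwarding needs no run-time arbitration."""
            for t in range(steps):
                for port, voq in schedule[t % len(schedule)].items():
                    if voqs[voq]:                        # queue non-empty
                        send(t, port, voqs[voq].popleft())

        # Toy usage: two output ports, each alternating between two VOQs.
        voqs = {v: deque([f"{v}{i}" for i in range(2)]) for v in "ABCD"}
        schedule = [{0: "A", 1: "C"}, {0: "B", 1: "D"}]
        run_static_schedule(schedule, voqs, 4, lambda t, p, pkt: print(t, p, pkt))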