Data-aware workflow scheduling in heterogeneous distributed systems
Data transfer in scientific workflows is attracting growing attention because the large volumes of data generated by complex workflows can significantly increase end-to-end turnaround time. It is nearly impossible to compute an optimal, or even approximately optimal, schedule for an end-to-end workflow without accounting for intermediate data movement. To reduce the complexity of the workflow-scheduling problem, most research to date relies on unrealistic simplifying assumptions that lead to suboptimal schedules in practice. One constraint imposed by most existing algorithms is that a computation site may begin executing a new task only after it has finished the current task and delivered the data that task generated. We relax this constraint and allow execution to overlap with data movement, improving task parallelism within the workflow. Furthermore, we generalize the conventional workflow model to allow data to be staged in from, and staged out to, remote data centers, and we design and implement an efficient data-aware scheduling strategy. Experimental results show that applying this strategy significantly reduces turnaround time in heterogeneous distributed systems. To minimize end-to-end workflow turnaround time, it is also crucial to deliver input, output, and intermediate data as quickly as possible. However, a single TCP stream often achieves much lower throughput than expected, leaving the network bandwidth underutilized. Multiple parallel TCP streams improve throughput, but throughput does not increase monotonically with the number of streams.
Based on this observation, we improve existing throughput prediction models and design and implement a TCP throughput estimation and optimization service for distributed systems that determines the optimal configuration of parallel TCP streams. Experimental results show that the proposed service predicts throughput dynamically with high accuracy and increases throughput significantly. Throughput optimization combined with data-aware workflow scheduling allows us to minimize the end-to-end workflow turnaround time.
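The non-monotonic relationship between stream count and throughput can be captured by a simple parametric model. The sketch below fits Th(n) = n / sqrt(a*n^2 + b*n + c) to a few sampled transfers and solves for the peak analytically; this functional form is a common choice in the parallel-stream prediction literature, and it is an assumption here that the dissertation's service uses something similar rather than exactly this model.

```python
import numpy as np

def fit_throughput_model(samples):
    """Fit Th(n) = n / sqrt(a*n^2 + b*n + c) to (streams, throughput) samples.

    Rearranging gives n^2 / Th(n)^2 = a*n^2 + b*n + c, which is linear
    in (a, b, c), so a least-squares solve suffices.
    """
    n = np.array([s for s, _ in samples], dtype=float)
    th = np.array([t for _, t in samples], dtype=float)
    A = np.column_stack([n**2, n, np.ones_like(n)])
    y = n**2 / th**2
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs  # a, b, c

def optimal_streams(a, b, c):
    """Setting dTh/dn = 0 gives n* = -2c / b; a peak exists only if b < 0."""
    if b >= 0:
        return None  # throughput is non-decreasing under this model
    return -2.0 * c / b

def predict(n, a, b, c):
    """Predicted aggregate throughput for n parallel streams."""
    return n / np.sqrt(a * n**2 + b * n + c)
```

Three or four probe transfers with different stream counts are enough to fit the model, after which the service can pick the stream count nearest n* instead of probing exhaustively.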
Application-level optimization of end-to-end data transfer throughput
For large-scale distributed applications, effective use of available network throughput and optimization of data transfer speed are crucial for end-to-end application performance. Today, many regional and national optical networking initiatives such as LONI, ESnet, and TeraGrid provide high-speed network connectivity to their users. However, the majority of users fail to obtain even a fraction of the theoretical speeds these networks promise, due to issues such as sub-optimal protocol tuning, disk bottlenecks on the sending and/or receiving ends, and processor limitations. Having high-speed networks in place is therefore necessary but not sufficient for improving end-to-end data transfer throughput; being able to use these networks effectively is increasingly important. Optimizing the underlying protocol parameters at the application layer (e.g., opening multiple parallel TCP streams, tuning the TCP buffer size and I/O block size) is one way to improve network transfer throughput. On the other hand, on high-performance networking systems the end-to-end throughput bottleneck occurs mostly at the participating storage systems rather than in the network. The performance of a storage system depends heavily on the speed of its disk and CPU subsystems. It is therefore critical to estimate the storage system's bandwidth at both endpoints in addition to the network bandwidth. Disk bottlenecks can be eliminated by using multiple disks (data striping), and CPU bottlenecks by using multiple processors (parallelism). In this dissertation, we develop application-level models that predict the best combination of protocol parameters for optimal network performance, including the number of parallel data streams and the protocol buffer size, and we integrate disk and CPU speed parameters into the performance model to predict the optimal degree of disk and CPU striping for the best end-to-end data throughput.
These models will be made available to the community for use in data transfer tools, schedulers, and high-level planners.
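The core idea above, that the end-to-end rate is capped by the slowest of the network, disk, and CPU subsystems, and that striping lifts the disk and CPU terms, can be sketched as a minimal bottleneck model. The linear-scaling assumption and the parameter names are illustrative simplifications, not the dissertation's exact model (real subsystems saturate sub-linearly).

```python
import math

def end_to_end_throughput(link_mbps, disk_mbps, cpu_mbps, n_disks, n_cpus):
    """Simplest bottleneck model: the transfer runs at the slowest subsystem.

    Disk striping and CPU parallelism are assumed to scale their terms
    linearly, which holds only until another subsystem becomes the limit.
    """
    return min(link_mbps, disk_mbps * n_disks, cpu_mbps * n_cpus)

def required_stripes(link_mbps, disk_mbps, cpu_mbps):
    """Smallest disk/CPU parallelism whose aggregate rate matches the link."""
    return (math.ceil(link_mbps / disk_mbps),
            math.ceil(link_mbps / cpu_mbps))
```

For example, on a 10 Gbps link with 1.2 Gbps disks and 3 Gbps per-CPU copy rates, the model suggests striping over 9 disks and 4 CPUs before the network itself becomes the bottleneck.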
Exploring Computational Intelligence to Improve Network Performance
Several network protocols, services, and applications adjust their operation dynamically based on current network conditions. Consequently, maintaining accurate estimates of network conditions and performance as they fluctuate over time is critical. In this thesis, we explore the use of computational intelligence, in particular machine-learning techniques, to estimate "near-future" network performance based on past network conditions. We call our approach SENSE, for Smart Experts for Network State Estimation. SENSE responds to network dynamics at different time scales, i.e., long- and medium-term fluctuations as well as short-lived variations. Applying SENSE, we then propose a novel algorithm that dynamically enables and disables the IEEE 802.11 DCF RTS/CTS handshake. The algorithm uses the current packet size and transmission rate, together with an estimate of network contention, to decide dynamically whether to use RTS/CTS. To the best of our knowledge, it is the first to enable and disable the RTS/CTS handshake based on a set of current network conditions and to adapt automatically as those conditions change. Simulation results across a variety of WLAN and wireless multi-hop ad-hoc network scenarios, including synthetic and real traffic traces, demonstrate that the proposed approach consistently outperforms current best practices, such as never enabling RTS/CTS or using a pre-specified threshold to switch RTS/CTS on or off. We also propose a modified version of a simple yet effective machine-learning technique, the "Fixed-Share" algorithm, to optimize IEEE 802.11's backoff mechanism. To the best of our knowledge, this is the first approach that uses machine learning to set the IEEE 802.11 contention window dynamically based on past performance.
Through simulations using a variety of network scenarios, we show that our method outperforms IEEE 802.11's original exponential backoff algorithm, as well as an approach that adapts based only on a few recent data-transmission events.
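The Fixed-Share algorithm mentioned above (due to Herbster and Warmuth) maintains a weight per expert, penalizes experts exponentially by their loss, and then "shares" a small fraction of weight uniformly so a recently poor expert can recover quickly when conditions shift. A minimal sketch follows; in the RTS/CTS setting the two experts could be "handshake on" and "handshake off", but that mapping and the loss function are assumptions here, not the thesis's exact formulation.

```python
import math

def fixed_share_update(weights, losses, eta=0.5, alpha=0.05):
    """One round of Fixed-Share over a set of experts.

    1. Exponential loss update: v_i = w_i * exp(-eta * loss_i)
    2. Share step: each expert keeps (1 - alpha) of its weight and
       a fraction alpha is redistributed uniformly across all experts.
    """
    v = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(v)
    v = [x / total for x in v]              # normalize to a distribution
    n = len(v)
    return [(1 - alpha) * x + alpha / n for x in v]

def best_expert(weights):
    """Index of the currently most trusted expert."""
    return max(range(len(weights)), key=lambda i: weights[i])
```

Because the share step never lets any weight reach zero, the algorithm can track the best expert even when that identity changes over time, which is exactly the non-stationary behavior of wireless contention.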
Characterizing and Predicting TCP Throughput on the Wide Area Network
... This paper addresses this issue. We begin by statistically characterizing TCP throughput on the Internet, exploring the strong correlation between TCP flow size and throughput, as well as the transient end-to-end throughput distribution. We then analyze why benchmarking fails to predict the throughput of large transfers, and propose a novel yet simple prediction model based on our observations. Our prototype
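The observation that throughput correlates strongly with flow size suggests predicting from *past transfers of similar size* rather than from small benchmark probes. The sketch below buckets history by log2 of the transfer size and keeps an exponentially weighted moving average per bucket; the bucketing scheme and smoothing factor are illustrative assumptions, not the paper's actual model.

```python
import math

class SizeAwarePredictor:
    """Predict TCP throughput from past transfers of similar size."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha   # EWMA smoothing factor (assumed value)
        self.ewma = {}       # log2-size bucket -> smoothed throughput

    def _bucket(self, size_bytes):
        return int(math.log2(max(size_bytes, 1)))

    def observe(self, size_bytes, throughput):
        """Record a completed transfer's measured throughput."""
        b = self._bucket(size_bytes)
        prev = self.ewma.get(b)
        self.ewma[b] = throughput if prev is None else (
            self.alpha * throughput + (1 - self.alpha) * prev)

    def predict(self, size_bytes):
        """Predict throughput for an upcoming transfer of this size."""
        b = self._bucket(size_bytes)
        if b in self.ewma:
            return self.ewma[b]
        if not self.ewma:
            return None
        # Fall back to the nearest populated size bucket.
        nearest = min(self.ewma, key=lambda k: abs(k - b))
        return self.ewma[nearest]
```

Conditioning on flow size avoids the failure mode the abstract describes: a short benchmark flow never leaves slow start, so its measured rate systematically underestimates what a long bulk transfer will achieve.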