36 research outputs found

    Experimental Analysis of Algorithms for Coflow Scheduling

    Full text link
    Modern data centers face new scheduling challenges in optimizing job-level performance objectives, where a significant challenge is the scheduling of highly parallel data flows with a common performance goal (e.g., the shuffle operations in MapReduce applications). Chowdhury and Stoica introduced the coflow abstraction to capture these parallel communication patterns, and Chowdhury et al. proposed effective heuristics to schedule coflows efficiently. In our previous paper, we considered the strongly NP-hard problem of minimizing the total weighted completion time of coflows with release dates, and developed the first polynomial-time scheduling algorithms with O(1)-approximation ratios. In this paper, we carry out a comprehensive experimental analysis on a Facebook trace and extensive simulated instances to evaluate the practical performance of several algorithms for coflow scheduling, including the approximation algorithms developed in our previous paper. Our experiments suggest that simple algorithms provide effective approximations of the optimal, and that the performance of our approximation algorithms is relatively robust, near optimal, and always among the best compared with the other algorithms, in both the offline and online settings.Comment: 29 pages, 8 figures, 11 table

    A Study on the Improvement of Data Collection in Data Centers and Its Analysis on Deep Learning-based Applications

    Get PDF
    Big data are usually stored in data center networks for processing and analysis through various cloud applications. Such applications are a collection of data-intensive jobs which often involve many parallel flows and are network bound in the distributed environment. The recent networking abstraction, coflow, for data parallel programming paradigm to express the communication requirements has opened new opportunities to network scheduling for such applications. Therefore, I propose coflow based network scheduling algorithm, Coflourish, to enhance the job completion time for such data-parallel applications, in the presence of the increased background traffic to mimic the cloud environment infrastructure. It outperforms Varys, the state-of-the-art coflow scheduling technique, by 75.5% under various workload conditions. However, such technique often requires customized operating systems, customized computing frameworks or external proprietary software-defined networking (SDN) switches. Consequently, in order to achieve the minimal application completion time, through coflow scheduling, coflow routing, and per-rate per-flow scheduling paradigm with minimum customization to the hosts and switches, I propose another scheduling technique, MinCOF which exploits the OpenFlow SDN. MinCOF provides faster deployability and no proprietary system requirements. It also decreases the average coflow completion time by 12.94% compared to the latest OpenFlow-based coflow scheduling and routing framework. Although the challenges related to analysis and processing of big data can be handled effectively through addressing the network issues. Sometimes, there are also challenges to analyze data effectively due to the limited data size. To further analyze such collected data, I use various deep learning approaches. Specifically, I design a framework to collect Twitter data during natural disaster events and then deploy deep learning model to detect the fake news spreading during such crisis situations. The wide-spread of fake news during disaster events disrupts the rescue missions and recovery activities, costing human lives and delayed response. My deep learning model classifies such fake events with 91.47% accuracy and F1 score of 90.89 to help the emergency managers during crisis. Therefore, this study focuses on providing network solutions to decrease the application completion time in the cloud environment, in addition to analyze the data collected using the deployed network framework to further use it to solve the real-world problems using the various deep learning approaches

    A Study of Application-awareness in Software-defined Data Center Networks

    Get PDF
    A data center (DC) has been a fundamental infrastructure for academia and industry for many years. Applications in DC have diverse requirements on communication. There are huge demands on data center network (DCN) control frameworks (CFs) for coordinating communication traffic. Simultaneously satisfying all demands is difficult and inefficient using existing traditional network devices and protocols. Recently, the agile software-defined Networking (SDN) is introduced to DCN for speeding up the development of the DCNCF. Application-awareness preserves the application semantics including the collective goals of communications. Previous works have illustrated that application-aware DCNCFs can much more efficiently allocate network resources by explicitly considering applications needs. A transfer application task level application-aware software-defined DCNCF (SDDCNCF) for OpenFlow software-defined DCN (SDDCN) for big data exchange is designed. The SDDCNCF achieves application-aware load balancing, short average transfer application task completion time, and high link utilization. The SDDCNCF is immediately deployable on SDDCN which consists of OpenFlow 1.3 switches. The Big Data Research Integration with Cyberinfrastructure for LSU (BIC-LSU) project adopts the SDDCNCF to construct a 40Gb/s high-speed storage area network to efficiently transfer big data for accelerating big data related researches at Louisiana State University. On the basis of the success of BIC-LSU, a coflow level application-aware SD- DCNCF for OpenFlow-based storage area networks, MinCOF, is designed. MinCOF incorporates all desirable features of existing coflow scheduling and routing frame- works and requires minimal changes on hosts. To avoid the architectural limitation of the OpenFlow SDN implementation, a coflow level application-aware SDDCNCF using fast packet processing library, Coflourish, is designed. Coflourish exploits congestion feedback assistances from SDN switches in the DCN to schedule coflows and can smoothly co-exist with arbitrary applications in a shared DCN. Coflourish is implemented using the fast packet processing library on an SDN switch, Open vSwitch with DPDK. Simulation and experiment results indicate that Coflourish effectively shortens average application completion time

    Matroid Coflow Scheduling

    Get PDF
    We consider the matroid coflow scheduling problem, where each job is comprised of a set of flows and the family of sets that can be scheduled at any time form a matroid. Our main result is a polynomial-time algorithm that yields a 2-approximation for the objective of minimizing the weighted completion time. This result is tight assuming P != NP. As a by-product we also obtain the first (2+epsilon)-approximation algorithm for the preemptive concurrent open shop scheduling problem

    Fair Coflow Scheduling via Controlled Slowdown

    Get PDF
    The average coflow completion time (CCT) is the standard performance metric in coflow scheduling. However, standard CCT minimization may introduce unfairness between the data transfer phase of different computing jobs. Thus, while progress guarantees have been introduced in the literature to mitigate this fairness issue, the trade-off between fairness and efficiency of data transfer is hard to control. This paper introduces a fairness framework for coflow scheduling based on the concept of slowdown, i.e., the performance loss of a coflow compared to isolation. By controlling the slowdown it is possible to enforce a target coflow progress while minimizing the average CCT. In the proposed framework, the minimum slowdown for a batch of coflows can be determined in polynomial time. By showing the equivalence with Gaussian elimination, slowdown constraints are introduced into primal-dual iterations of the CoFair algorithm. The algorithm extends the class of the σ-order schedulers to solve the fair coflow scheduling problem in polynomial time. It provides a 4-approximation of the average CCT w.r.t. an optimal scheduler. Extensive numerical results demonstrate that this approach can trade off average CCT for slowdown more efficiently than existing state of the art schedulers
    corecore