CloudJet4BigData: Streamlining Big Data via an Accelerated Socket Interface
Big data applications need to deliver fresh processing results to users, and cloud platforms can be used to speed them up. This paper describes CloudJet, a new data communication protocol for long-distance, large-volume big data access operations, designed to alleviate the large latencies encountered when sharing big data resources in the cloud. It encapsulates a dynamic multi-stream/multi-path engine at the socket level that conforms to the Portable Operating System Interface (POSIX) and can therefore accelerate any POSIX-compatible application across IP-based networks. In real-world tests, CloudJet accelerated typical big data applications such as very large databases (VLDB), data mining, media streaming, and office applications by up to tenfold.
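The abstract does not describe CloudJet's internals. As a rough illustration of what striping a single transfer across several streams involves, the sketch below splits a payload into sequence-numbered chunks, deals them round-robin across streams, and lets the receiver restore the original order by sequence number no matter which stream delivers first. Real sockets, path selection, and CloudJet's dynamic adjustment of the stream count are omitted; the chunk format and the names `Chunk`, `stripe`, and `reassemble` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    seq: int     # position of this chunk in the original payload
    data: bytes


def stripe(payload: bytes, n_streams: int, chunk_size: int) -> list[list[Chunk]]:
    """Split payload into sequence-numbered chunks and deal them round-robin over n_streams."""
    if n_streams < 1 or chunk_size < 1:
        raise ValueError("n_streams and chunk_size must be positive")
    streams: list[list[Chunk]] = [[] for _ in range(n_streams)]
    for seq, start in enumerate(range(0, len(payload), chunk_size)):
        streams[seq % n_streams].append(Chunk(seq, payload[start:start + chunk_size]))
    return streams


def reassemble(received: list[Chunk]) -> bytes:
    """Restore the original byte order regardless of arrival order across streams."""
    return b"".join(c.data for c in sorted(received, key=lambda c: c.seq))


if __name__ == "__main__":
    payload = bytes(range(256)) * 40  # 10 KiB of synthetic test data
    streams = stripe(payload, n_streams=4, chunk_size=1000)
    # Simulate streams completing in arbitrary order by shuffling all arrivals.
    arrivals = [c for s in streams for c in s]
    random.shuffle(arrivals)
    assert reassemble(arrivals) == payload
    print("chunks per stream:", [len(s) for s in streams])
```

Sequence numbers are what make the streams independent: each one can run at its own pace (or over its own path), and ordering is recovered only once, at the receiver.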
Distributed Training Large-Scale Deep Architectures
The scale of data and the scale of computation infrastructure together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this paper, we focus on employing the system approach to speed up large-scale training. Drawing on lessons learned from our routine benchmarking effort, we first identify the bottlenecks and overheads that hinder data parallelism. We then devise guidelines that help practitioners configure an effective system and fine-tune parameters to achieve the desired speedup. Specifically, we develop a procedure for setting the minibatch size and choosing computation algorithms. We also derive lemmas for determining the quantity of key components such as the number of GPUs and parameter servers. Experiments and examples show that these guidelines effectively speed up large-scale deep learning training.
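The setting here is data parallelism: each GPU processes a shard of the minibatch, and the resulting gradients are combined (through parameter servers in the configurations the abstract mentions). The minimal sketch below is not the paper's procedure or lemmas. It only shows why the scheme is sound: with equal shard sizes, averaging the per-worker gradients reproduces the full-minibatch gradient exactly. A linear model with mean-squared-error loss stands in for a deep network, and the names `grad_mse` and `data_parallel_grad` are hypothetical.

```python
import math


def grad_mse(w: list[float], xs: list[list[float]], ys: list[float]) -> list[float]:
    """Gradient of the mean squared error (1/n) * sum((w.x - y)^2) with respect to w."""
    n = len(xs)
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj / n
    return g


def data_parallel_grad(
    w: list[float], xs: list[list[float]], ys: list[float], n_workers: int
) -> list[float]:
    """Shard the minibatch, take one gradient per simulated worker, then average.

    The final average plays the role of the aggregation step that parameter
    servers (or an all-reduce) perform in a real data-parallel system.
    """
    if n_workers < 1 or len(xs) % n_workers:
        raise ValueError("minibatch size must be a positive multiple of n_workers")
    shard = len(xs) // n_workers
    grads = [
        grad_mse(w, xs[k * shard:(k + 1) * shard], ys[k * shard:(k + 1) * shard])
        for k in range(n_workers)
    ]
    return [sum(parts) / n_workers for parts in zip(*grads)]


if __name__ == "__main__":
    # Synthetic data: y = 3 + 0.5 * x, with a constant bias feature of 1.0.
    xs = [[1.0, float(i)] for i in range(8)]
    ys = [3.0 + 0.5 * i for i in range(8)]
    w = [0.0, 0.0]
    full = grad_mse(w, xs, ys)
    for k in (1, 2, 4, 8):
        parallel = data_parallel_grad(w, xs, ys, k)
        assert all(math.isclose(a, b) for a, b in zip(full, parallel))
    print("sharded average matches full-minibatch gradient:", full)
```

Because the global minibatch is divided among workers, adding GPUs at a fixed minibatch size shrinks each worker's shard, which is one reason minibatch size and GPU count have to be chosen together.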
Performance Analysis of Multiple Virtualized Servers
Server virtualization is considered one of the most significant changes in IT operations in the past decade, making it possible to manage groups of servers with greater reliability at lower cost. It is driven by the goal of reducing the total number of physical servers in an organization by consolidating multiple applications on shared servers. In this paper, we construct several x86_64 servers based on VMware vSphere and then analyze their performance using the open-source analysis tools Pylot and Curl-loader. The results show that, despite the enormous potential benefits of virtualization techniques, efficiency decreases as the number of virtual machines increases. A trade-off is therefore needed between the number of virtual machines and the expected efficiency of the servers.
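The abstract reports that efficiency falls as virtual machines are added but does not define the metric. One common way to make the trade-off concrete, assumed here rather than taken from the paper, is scaling efficiency E(n) = T(n) / (n · T(1)), where T(n) is the aggregate throughput measured with n VMs on the same host (for example, with a load generator such as Pylot or Curl-loader). The sketch computes E(n) from a table of measurements and picks the largest VM count that stays at or above a chosen target. The function names, input format, and placeholder numbers are hypothetical.

```python
def scaling_efficiency(throughput: dict[int, float]) -> dict[int, float]:
    """Return E(n) = T(n) / (n * T(1)) for each measured VM count n.

    E(n) == 1.0 means n VMs deliver n times the single-VM throughput;
    values below 1.0 quantify the loss from consolidation.
    """
    base = throughput.get(1)
    if base is None or base <= 0:
        raise ValueError("need a positive single-VM baseline at throughput[1]")
    return {n: t / (n * base) for n, t in sorted(throughput.items())}


def max_vms_for_target(throughput: dict[int, float], target: float) -> int:
    """Largest measured VM count whose scaling efficiency is still >= target."""
    eff = scaling_efficiency(throughput)
    candidates = [n for n, e in eff.items() if e >= target]
    if not candidates:
        raise ValueError("no measured VM count meets the target efficiency")
    return max(candidates)


if __name__ == "__main__":
    # Hypothetical placeholder throughputs (requests/s), NOT results from the paper.
    measured = {1: 100.0, 2: 190.0, 4: 320.0, 8: 480.0}
    print(scaling_efficiency(measured))
    print("largest VM count with E(n) >= 0.75:", max_vms_for_target(measured, 0.75))
```

Choosing the target is the trade-off itself: a lower target admits more VMs per host (fewer physical servers) at the cost of less throughput per VM.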
- …