    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process, and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks, and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. We map the proposed taxonomy to various Data Grid systems, both to validate the taxonomy and to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and methodology, which helps in evaluating their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    Application of Linear Programming in Scheduling

    Distributed computing virtually combines scattered, interconnected computing resources to satisfy the demands of compute-bound and data-hungry applications. The paper highlights various Distributed Computing Environments (DCEs), scheduling techniques, and the need to incorporate Dynamic Load Balancing (DLB) into scheduling. The paper also opens a new area of research by introducing Linear Programming as a scheduling technique in DCEs.
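    The abstract names Linear Programming as the proposed scheduling technique but gives no formulation. As a rough, hypothetical illustration of the idea (not the paper's actual model), the sketch below casts scheduling as an LP that fractionally assigns tasks to machines so as to minimize the makespan:

```python
# Hypothetical sketch: fractional task-to-machine assignment as a linear
# program minimizing makespan T, solved with scipy.optimize.linprog.
import numpy as np
from scipy.optimize import linprog

# p[i][j] = processing time of task i on machine j (illustrative data).
p = np.array([[3.0, 5.0],
              [2.0, 4.0],
              [6.0, 3.0]])
n_tasks, n_machines = p.shape

# Variables: x[i, j] flattened row-major, plus the makespan T at the end.
n_x = n_tasks * n_machines

# Objective: minimize T (the last variable).
c = np.zeros(n_x + 1)
c[-1] = 1.0

# Load constraints: for each machine j, sum_i p[i, j] * x[i, j] - T <= 0.
A_ub = np.zeros((n_machines, n_x + 1))
for j in range(n_machines):
    for i in range(n_tasks):
        A_ub[j, i * n_machines + j] = p[i, j]
    A_ub[j, -1] = -1.0
b_ub = np.zeros(n_machines)

# Assignment constraints: each task is fully assigned, sum_j x[i, j] = 1.
A_eq = np.zeros((n_tasks, n_x + 1))
for i in range(n_tasks):
    A_eq[i, i * n_machines:(i + 1) * n_machines] = 1.0
b_eq = np.ones(n_tasks)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n_x + 1))
print("minimum makespan:", res.x[-1])
print("assignment fractions:\n", res.x[:-1].reshape(n_tasks, n_machines))
```

    A real scheduler would need integral assignments (rounding or an integer program); the fractional LP is still useful as a lower bound and a starting schedule.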

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Datacenters provide cost-effective and flexible access to the scalable compute and storage resources that today's cloud computing needs. A typical datacenter is made up of thousands of servers connected by a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and to maximize performance, it is necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements, including user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters, and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters, not to survey all existing solutions, which is virtually impossible given the massive body of existing research. We hope to provide readers with a wide range of options and factors to consider when evaluating traffic control mechanisms. We discuss various aspects of datacenter traffic control, including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks, which connect geographically dispersed datacenters, have been receiving increasing attention recently, and pose interesting and novel research problems. Comment: Accepted for publication in IEEE Communications Surveys and Tutorials
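    To make one of the surveyed mechanisms concrete: a token bucket is the classic traffic-shaping primitive, admitting packets at a sustained rate while permitting short bursts. The sketch below is illustrative only, not code from the paper:

```python
# Minimal token-bucket traffic shaper sketch (illustrative).
import time

class TokenBucket:
    """Admit packets at a sustained `rate` (bytes/s), bursts up to `burst` bytes."""
    def __init__(self, rate: float, burst: float):
        self.rate = rate          # token refill rate, bytes per second
        self.capacity = burst     # maximum burst size in bytes
        self.tokens = burst       # start with a full bucket
        self.last = time.monotonic()

    def allow(self, packet_bytes: int) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes   # conforming: send now
            return True
        return False                      # non-conforming: queue or drop

# Shape a flow to 1 MB/s with 64 KB bursts.
shaper = TokenBucket(rate=1_000_000, burst=64_000)
print(shaper.allow(1500))  # True: the bucket starts full
```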

    Adaptive Multimedia Content Delivery for Scalable Web Servers

    The phenomenal growth in the use of the World Wide Web often places a heavy load on networks and servers, threatening to increase Web server response time and raising scalability issues for both the network and the server. With advances in optical networking and the increasing use of broadband technologies such as cable modems and DSL, the server, not the network, is more likely to be the bottleneck. Many clients are willing to receive a degraded, less resource-intensive version of the requested content as an alternative to connection failures. In this thesis, we present an adaptive content delivery system that transparently switches content depending on the load on the server in order to serve more clients. Our system is designed to work for dynamic Web pages and streaming multimedia traffic, which are not supported by other current adaptive content approaches. We have designed a system capable of quantifying the load on the server and then performing the necessary adaptation, and we designed a streaming MPEG server and client that can react to the server load by scaling the quality of the frames transmitted. The main benefits of our approach include transparent content switching for content adaptation, alleviating server load through a graceful degradation of server performance, and no required modifications to existing server software, browsers, or the HTTP protocol. We experimentally evaluate our adaptive server system and compare it with a non-adaptive server. We find that adaptive content delivery can support as much as 25% more static requests, 15% more dynamic requests, and twice as many multimedia requests as a non-adaptive server. Our client-side experiments, performed on the Internet, show that the response time savings from our system are quite significant.
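    The thesis does not reproduce its adaptation logic here; the following sketch (the variant file names and load thresholds are invented) conveys the basic mechanism of serving a cheaper content variant as measured server load rises, assuming a Unix-like host for os.getloadavg:

```python
# Hypothetical sketch: pick a less resource-intensive content variant
# as server load rises (names and thresholds invented).
import os

# Content variants ordered from richest to cheapest to serve.
VARIANTS = [
    (0.6, "video_high.mpg"),    # full quality below 60% load
    (0.8, "video_medium.mpg"),  # degrade gracefully under moderate load
    (1.0, "video_low.mpg"),     # cheapest variant near saturation
]

def current_load() -> float:
    """Approximate load as the 1-minute load average per CPU (Unix only)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)

def select_variant(load: float) -> str:
    for threshold, path in VARIANTS:
        if load < threshold:
            return path
    return VARIANTS[-1][1]  # at or beyond saturation: lowest quality

print(select_variant(current_load()))
```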

    A New Efficient Cloud Model for Data Intensive Application

    Cloud computing plays an important role in data-intensive applications, since it provides consistent performance over time along with scalability and good fault-tolerance mechanisms. Hadoop provides a scalable, data-intensive MapReduce architecture, but Hadoop map tasks are executed on large clusters and consume a great deal of energy and resources, which is expensive, so minimizing cost and resource usage is critical for a MapReduce application. In this paper, we propose a novel, efficient cloud-structure algorithm for data processing and computation on the Azure cloud: an efficient BSP-based dynamic scheduling algorithm for iterative MapReduce for data-intensive applications on the Microsoft Azure cloud platform. Our framework can be used in different application domains, such as data analysis, medical research, and data mining. We analyze the performance of our system using co-located caching on the worker role and show how it improves the performance of data-intensive applications over Hadoop MapReduce. The experimental results show that our proposed framework properly utilizes cloud infrastructure services while managing overheads and bandwidth bottlenecks, and that it is highly scalable, fault-tolerant, and efficient.
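    The abstract gives no algorithmic detail, so the sketch below is only a generic, single-process illustration of the BSP-style iterative MapReduce pattern the framework builds on (map, shuffle, reduce, then a barrier and convergence test per superstep); it is not the authors' Azure implementation:

```python
# Generic illustration of iterative MapReduce in the BSP style.
from collections import defaultdict

def map_phase(records, map_fn):
    groups = defaultdict(list)
    for rec in records:
        for key, value in map_fn(rec):
            groups[key].append(value)   # local map + shuffle by key
    return groups

def run_iterative(records, map_fn, reduce_fn, converged, max_iters=50):
    state = records
    for step in range(max_iters):       # each pass is one BSP superstep
        groups = map_phase(state, map_fn)
        new_state = [reduce_fn(k, vs) for k, vs in groups.items()]
        if converged(state, new_state):  # barrier + convergence test
            return new_state, step + 1
        state = new_state
    return state, max_iters
```

    In a distributed setting, the map and reduce phases would run across worker roles, with the convergence check acting as the global synchronization barrier between supersteps.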

    Improving the Multi-Channel Hybrid Data Dissemination System

    A major problem with the Internet and web-based applications is the scalable delivery of data. Lack of scalability can hinder performance and decrease the ability of a system to perform as originally designed. One of the most promising solutions to this scalability problem is to use a multiple-channel hybrid data dissemination server to deliver requested information to users. This solution provides the high scalability found in multicast with the low latency found in unicast. A multiple-channel hybrid server works by using a push-based multicast channel to deliver the most popular data to users, reserving the pull-based unicast channel for user requests and delivery of less popular data.

    The adoption of a multiple-channel hybrid data dissemination server, however, introduces a variety of data management problems. In this dissertation, we propose an improved multiple-channel hybrid data dissemination model, along with solutions to three fundamental data management problems that arise in any multiple-channel hybrid scheme: the push popularity problem, the document classification problem, and the bandwidth division problem. We also propose adding a multicast pull channel to the common two-channel hybrid scheme. Our hypothesis that this new channel both improves scalability and decreases variance in response times is confirmed by our extensive experimental results. We develop a fully functioning architecture for our three-channel hybrid scheme. In a real-world environment, our middleware is shown to provide high scalability for overloaded web servers while keeping the response times experienced by clients to a minimum. Further, we demonstrate that the practical impact of this work extends to other broadcast-based environments, such as wireless networks.
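    As a toy illustration of the document classification problem named above (the push_fraction parameter is invented; this is not the dissertation's algorithm), the sketch below ranks documents by observed request count and places the hottest fraction on the push channel:

```python
# Toy push/pull document classification sketch (hypothetical, simplified):
# the hottest documents go on the push (multicast broadcast) channel and
# the rest are served on demand over the pull channels.
from collections import Counter

def classify(request_log, push_fraction=0.1):
    """Return (push_set, pull_set) given a log of requested document IDs."""
    counts = Counter(request_log)
    ranked = [doc for doc, _ in counts.most_common()]
    cutoff = max(1, int(len(ranked) * push_fraction))
    return set(ranked[:cutoff]), set(ranked[cutoff:])

log = ["a", "a", "a", "b", "b", "c", "d", "a", "b", "e"]
push, pull = classify(log, push_fraction=0.2)
print("push channel:", push)   # the most popular 20% of documents
print("pull channel:", pull)
```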