71 research outputs found

    Efficient HTTP based I/O on very large datasets for high performance computing with the libdavix library

    Remote data access for data analysis in high performance computing is commonly done with specialized data access protocols and storage systems. These protocols are highly optimized for high throughput on very large datasets, multiple streams, high availability, low latency and efficient parallel I/O. The purpose of this paper is to describe how we have adapted a generic protocol, the Hypertext Transfer Protocol (HTTP), to make it a competitive alternative for high performance I/O and data analysis applications in a global computing grid: the Worldwide LHC Computing Grid. In this work, we first analyze the design differences between HTTP and the most common high performance I/O protocols, pointing out the main performance weaknesses of HTTP. Then, we describe in detail how we solved these issues. Our solutions have been implemented in a toolkit called davix, available through several recent Linux distributions. Finally, we describe the results of our benchmarks, where we compare the performance of davix against an HPC-specific protocol for a data analysis use case.
    Comment: Presented at: Very large Data Bases (VLDB) 2014, Hangzho

    Performance improvement of an optical network providing services based on multicast

    Operators of networks covering large areas are confronted with demands from some of their customers who are virtual service providers. These providers may call for a connectivity service that fulfils the specific needs of their services, for instance a multicast transmission with allocated bandwidth. On the other hand, network operators want to make a profit by trading connectivity service of the requested quality to their customers while limiting their infrastructure investments (or not investing at all). We focus on circuit-switching optical networks and work on repetitive multicast demands whose source and destinations are a priori known by the operator. The operator may therefore have corresponding trees "ready to be allocated" and adapt the network infrastructure according to these recurrent transmissions. This adjustment consists in placing the available branching routers in selected nodes of a predefined tree. The branching nodes are opto-electronic nodes which are able to duplicate data and retransmit it in several directions. These nodes are, however, more expensive and more energy-consuming than transparent ones. In this paper we are interested in choosing the nodes of a multicast tree where the limited number of branching routers should be located in order to minimize the amount of required bandwidth. After formally stating the problem, we solve it by proposing a polynomial algorithm whose optimality we prove. We perform exhaustive computations to show the operator's gain obtained by using our algorithm. These computations are made for different methods of multicast tree construction. We conclude by giving dimensioning guidelines and outlining our further work.
    Comment: 16 pages, 13 figures, extended version from Conference ISCIS 201

    Comparative Analysis of Cloud Simulators and Authentication Techniques in Cloud Computing

    Cloud computing is the provision of computer hardware and software resources over the internet, so that anyone who is connected to the internet can access them as a service in a seamless way. As we move more and more towards the application of this newly emerging technology, it is essential to study, evaluate and analyze the performance, security and other related problems that might be encountered in cloud computing. Since it is not practicable to directly examine the behavior of the cloud on such problems using real hardware and software resources, owing to the high costs, modeling and simulation have become essential tools to cope with these issues. In this paper, we review, analyse and compare features of the existing cloud computing simulators and various location-based authentication and simulation tools

    A survey of general-purpose experiment management tools for distributed systems

    In the field of large-scale distributed systems, experimentation is particularly difficult. The studied systems are complex, often nondeterministic and unreliable, software is plagued with bugs, and the experiment workflows are unclear and hard to reproduce. These obstacles led many independent researchers to design tools to control their experiments, boost productivity and improve the quality of scientific results. Despite much research in the domain of distributed systems experiment management, the current fragmentation of efforts calls for a general analysis. We therefore propose to build a framework to uncover missing functionality of these tools, enable meaningful comparisons between them and find recommendations for future improvements and research. The contribution of this paper is twofold. First, we provide an extensive list of features offered by general-purpose experiment management tools dedicated to distributed systems research on real platforms. We then use it to assess existing solutions and compare them, outlining possible future paths for improvements

    ISOGA: Integrated Services Optical Grid Architecture for Emerging E-Science Collaborative Applications


    Throughput Optimal On-Line Algorithms for Advanced Resource Reservation in Ultra High-Speed Networks

    Advanced channel reservation is emerging as an important feature of ultra high-speed networks requiring the transfer of large files. Applications include scientific data transfers and database backup. In this paper, we present two new on-line algorithms for advanced reservation, called BatchAll and BatchLim, that are guaranteed to achieve optimal throughput performance, based on multi-commodity flow arguments. Both algorithms are shown to have polynomial-time complexity and provable bounds on the maximum delay for 1+epsilon bandwidth augmented networks. The BatchLim algorithm returns the completion time of a connection immediately as a request is placed, but at the expense of a slightly looser competitive ratio than that of BatchAll. We also present a simple approach that limits the number of parallel paths used by the algorithms while provably bounding the maximum reduction factor in the transmission throughput. We show that, although the number of different paths can be exponentially large, the actual number of paths needed to approximate the flow is quite small and proportional to the number of edges in the network. Simulations for a number of topologies show that, in practice, 3 to 5 parallel paths are sufficient to achieve close to optimal performance. The performance of the competitive algorithms is also compared to a greedy benchmark, both through analysis and simulation.
    Comment: 9 pages, 8 figure

    Data Avenue: Remote Storage Resource Management in WS-PGRADE/gUSE
