Efficient HTTP based I/O on very large datasets for high performance computing with the libdavix library
Remote data access for data analysis in high performance computing is
commonly done with specialized data access protocols and storage systems. These
protocols are highly optimized for high throughput on very large datasets,
multi-streams, high availability, low latency and efficient parallel I/O. The
purpose of this paper is to describe how we have adapted a generic protocol,
the Hypertext Transfer Protocol (HTTP), to make it a competitive alternative
for high performance I/O and data analysis applications in a global computing
grid: the Worldwide LHC Computing Grid. In this work, we first analyze the
design differences between the HTTP protocol and the most common high
performance I/O protocols, pointing out the main performance weaknesses of
HTTP. Then, we describe in detail how we solved these issues. Our solutions
have been implemented in a toolkit called davix, available through several
recent Linux distributions. Finally, we describe the results of our benchmarks
where we compare the performance of davix against an HPC-specific protocol for a
data analysis use case.
Comment: Presented at: Very Large Data Bases (VLDB) 2014, Hangzhou
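The key mechanism that lets a generic protocol like HTTP approach HPC-style parallel I/O is the standard HTTP/1.1 range request (RFC 7233), which allows a client to fetch a large remote file as several concurrent byte ranges. The sketch below is illustrative only, not davix code; the helper names are hypothetical. It shows how a file of known size can be split into ranges and how the corresponding `Range` header is formed:

```python
def split_ranges(size, n):
    """Split the byte interval [0, size) into n contiguous ranges
    suitable for parallel HTTP GETs (inclusive start/end offsets)."""
    chunk = size // n
    ranges = []
    for i in range(n):
        start = i * chunk
        # the last range absorbs any remainder
        end = size - 1 if i == n - 1 else start + chunk - 1
        ranges.append((start, end))
    return ranges

def range_header(start, end):
    """Build an HTTP/1.1 Range header (RFC 7233); offsets are inclusive."""
    return {"Range": f"bytes={start}-{end}"}
```

Each `(start, end)` pair would then be issued as an independent GET on its own connection or stream, so the transfers proceed in parallel and their bodies are reassembled in order.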
ISOGA: Integrated Services Optical Grid Architecture for Emerging E-Science Collaborative Applications
This final report describes the accomplishments of the ISOGA (Integrated Services Optical Grid Architecture) project. ISOGA enables efficient deployment of existing and emerging collaborative grid applications with increasingly diverse multimedia communication requirements over a wide-area multi-domain optical network grid, and provides collaborative scientists with fast retrieval and seamless browsing of distributed scientific multimedia datasets over a wide-area optical network grid. The project focuses on research and development in the following areas: polymorphic optical network control planes to enable multiple switching and communication services simultaneously; an intelligent optical grid user-network interface to enable user-centric network control and monitoring; and a seamless optical grid dataset browsing interface to enable fast retrieval of local and remote datasets for visualization and manipulation.
Performance improvement of an optical network providing services based on multicast
Operators of networks covering large areas are confronted with demands from
some of their customers who are virtual service providers. These providers may
require a connectivity service tailored to the specifics of their own
services, for instance a multicast transmission with allocated bandwidth. On
the other hand, network operators want to make a profit by selling
connectivity of the requested quality to their customers while limiting their
infrastructure investments (or avoiding new investment altogether).
We focus on circuit-switched optical networks and work on repetitive
multicast demands whose source and destinations are a priori known by
the operator, who may therefore keep corresponding trees "ready to be
allocated" and adapt the network infrastructure to these recurrent
transmissions. This adjustment consists of placing the available branching
routers at selected nodes of a predefined tree. The branching nodes are
opto-electronic nodes able to duplicate data and retransmit it in
several directions. These nodes are, however, more expensive and consume more
energy than transparent ones.
In this paper we are interested in the choice of nodes of a multicast tree
where the limited number of branching routers should be located in order to
minimize the amount of required bandwidth. After formally stating the problem
we solve it by proposing a polynomial algorithm whose optimality we prove. We
perform exhaustive computations to show the gain an operator obtains by using
our algorithm. These computations are made for different methods of multicast
tree construction. We conclude by giving dimensioning guidelines and outlining
our further work.
Comment: 16 pages, 13 figures, extended version from Conference ISCIS 201
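To see why branching-node placement drives bandwidth cost, consider a simplified model (an illustrative toy, not the paper's algorithm; the function and variable names are hypothetical): a transparent node must carry one copy of the stream per downstream destination, while a branching node needs only one incoming copy, since it duplicates the data locally.

```python
def total_bandwidth(children, source, branching, destinations):
    """Bandwidth units consumed by a multicast tree rooted at `source`.

    children:     dict node -> list of child nodes (the routing tree)
    branching:    set of nodes able to duplicate data (source included)
    destinations: set of leaf nodes that must receive the stream
    Each edge costs one unit per copy of the stream carried over it.
    """
    total = 0

    def copies_into(v):
        # number of stream copies that must enter v from its parent
        nonlocal total
        kids = children.get(v, [])
        downstream = sum(copies_into(c) for c in kids)
        if v in destinations and not kids:
            need = 1                       # a leaf destination consumes one copy
        elif v in branching:
            need = 1 if downstream else 0  # duplication happens at v itself
        else:
            need = downstream              # transparent: copies pass through
        total += need
        return need

    need_src = copies_into(source)
    return total - need_src  # the source has no incoming edge
```

In a three-edge tree where node `a` fans out to two destinations, making `a` a branching node drops the cost from 4 units (two full copies from the source) to 3 (one shared copy duplicated at `a`), which is exactly the kind of saving the placement problem maximizes under a limited router budget.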
Comparative Analysis of Cloud Simulators and Authentication Techniques in Cloud Computing
Cloud computing is the provisioning of computer hardware and software resources over the internet, so that anyone connected to the internet can access them as a service in a seamless way. As we move more and more towards adopting this newly emerging technology, it is essential to study, evaluate and analyse the performance, security and other related problems that might be encountered in cloud computing. Since it is not practicable to examine the behaviour of a cloud on such problems directly, using real hardware and software resources, due to the high costs involved, modelling and simulation have become essential tools for addressing these issues. In this paper, we review, analyse and compare features of the existing cloud computing simulators and various location-based authentication and simulation tools.
A survey of general-purpose experiment management tools for distributed systems
In the field of large-scale distributed systems, experimentation is particularly difficult. The studied systems are complex, often nondeterministic and unreliable; software is plagued with bugs, and experiment workflows are unclear and hard to reproduce. These obstacles led many independent researchers to design tools to control their experiments, boost productivity and improve the quality of scientific results. Despite much research in the domain of distributed systems experiment management, the current fragmentation of efforts calls for a general analysis. We therefore propose to build a framework to uncover missing functionality of these tools, enable meaningful comparisons between them and find recommendations for future improvements and research. The contribution of this paper is twofold. First, we provide an extensive list of features offered by general-purpose experiment management tools dedicated to distributed systems research on real platforms. We then use it to assess existing solutions and compare them, outlining possible future paths for improvement.
Throughput Optimal On-Line Algorithms for Advanced Resource Reservation in Ultra High-Speed Networks
Advanced channel reservation is emerging as an important feature of ultra
high-speed networks requiring the transfer of large files. Applications include
scientific data transfers and database backup. In this paper, we present two
new, on-line algorithms for advanced reservation, called BatchAll and BatchLim,
that are guaranteed to achieve optimal throughput performance, based on
multi-commodity flow arguments. Both algorithms are shown to have
polynomial-time complexity and provable bounds on the maximum delay for
1+epsilon bandwidth augmented networks. The BatchLim algorithm returns the
completion time of a connection immediately as a request is placed, but at the
expense of a slightly looser competitive ratio than that of BatchAll. We also
present a simple approach that limits the number of parallel paths used by the
algorithms while provably bounding the maximum reduction factor in the
transmission throughput. We show that, although the number of different paths
can be exponentially large, the actual number of paths needed to approximate
the flow is quite small and proportional to the number of edges in the network.
Simulations for a number of topologies show that, in practice, 3 to 5 parallel
paths are sufficient to achieve close to optimal performance. The performance
of the competitive algorithms is also compared to a greedy benchmark, both
through analysis and simulation.
Comment: 9 pages, 8 figures
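The claim that the number of paths needed is proportional to the number of edges follows from the classical flow-decomposition argument: each extracted path saturates, and therefore removes, at least one edge. A minimal sketch of that argument, assuming an acyclic s-t flow given as per-edge amounts (the function name is illustrative, not from the paper):

```python
def decompose_flow(flow, s, t):
    """Greedily decompose an acyclic s-t flow into weighted paths.

    flow: dict (u, v) -> positive flow amount on directed edge (u, v)
    Returns a list of (path, amount) pairs. Each iteration removes at
    least one edge (the bottleneck), so at most |E| paths are produced.
    """
    flow = {e: f for e, f in flow.items() if f > 1e-12}  # work on a copy
    paths = []
    while any(u == s for (u, v) in flow):
        # walk from s to t along positive-flow edges
        path, node = [s], s
        while node != t:
            node = next(v for (u, v) in flow if u == node)
            path.append(node)
        # bottleneck amount along the path
        amt = min(flow[(path[i], path[i + 1])] for i in range(len(path) - 1))
        for i in range(len(path) - 1):
            e = (path[i], path[i + 1])
            flow[e] -= amt
            if flow[e] <= 1e-12:
                del flow[e]  # the saturated edge disappears
        paths.append((path, amt))
    return paths
```

This bounds the decomposition at |E| paths in the worst case; the simulations summarized above suggest that far fewer (3 to 5) already capture most of the throughput in practice.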