10,079 research outputs found
Throughput Optimal On-Line Algorithms for Advanced Resource Reservation in Ultra High-Speed Networks
Advanced channel reservation is emerging as an important feature of ultra
high-speed networks requiring the transfer of large files. Applications include
scientific data transfers and database backup. In this paper, we present two
new, on-line algorithms for advanced reservation, called BatchAll and BatchLim,
that are guaranteed to achieve optimal throughput performance, based on
multi-commodity flow arguments. Both algorithms are shown to have
polynomial-time complexity and provable bounds on the maximum delay for
1+epsilon bandwidth augmented networks. The BatchLim algorithm returns the
completion time of a connection immediately as a request is placed, but at the
expense of a slightly looser competitive ratio than that of BatchAll. We also
present a simple approach that limits the number of parallel paths used by the
algorithms while provably bounding the maximum reduction factor in the
transmission throughput. We show that, although the number of different paths
can be exponentially large, the actual number of paths needed to approximate
the flow is quite small and proportional to the number of edges in the network.
Simulations for a number of topologies show that, in practice, 3 to 5 parallel
paths are sufficient to achieve close to optimal performance. The performance
of the competitive algorithms are also compared to a greedy benchmark, both
through analysis and simulation.Comment: 9 pages, 8 figure
Data transfer scheduling with advance reservation and provisioning
Over the years, scientific applications have become more complex and more data intensive. Although through the use of distributed resources the institutions and organizations gain access to the resources needed for their large-scale applications, complex middleware is required to orchestrate the use of these storage and network resources between collaborating parties, and to manage the end-to-end processing of data. We present a new data scheduling paradigm with advance reservation and provisioning. Our methodology provides a basis for provisioning end-to-end high performance data transfers which require integration between system, storage and network resources, and coordination between reservation managers and data transfer nodes. This allows researchers/users and higher level meta-schedulers to use data placement as a service where they can plan ahead and reserve time and resources for their data movement operations. We present a novel approach for evaluating time-dependent structures with bandwidth guaranteed paths. We present a practical online scheduling model using advance reservation in dynamic network with time constraints. In addition, we report a new polynomial algorithm presenting possible reservation options and alternatives for earliest completion and shortest transfer duration. We enhance the advance network reservation system by extending the underlying mechanism to provide a new service in which users submit their constraints and the system suggests possible reservation requests satisfying users\u27 requirements. We have studied scheduling data transfer operation with resource and time conflicts. We have developed a new scheduling methodology considering resource allocation in client sites and bandwidth allocation on network link connecting resources. Some other major contributions of our study include enhanced reliability, adaptability, and performance optimization of distributed data placement tasks. While designing this new data scheduling architecture, we also developed other important methodologies such as early error detection, failure awareness, job aggregation, and dynamic adaptation of distributed data placement tasks. The adaptive tuning includes dynamically setting data transfer parameters and controlling utilization of available network capacity. Our research aims to provide a middleware to improve the data bottleneck in high performance computing systems
Improving Real-Time Data Dissemination Performance by Multi Path Data Scheduling in Data Grids
The performance of data grids for data intensive, real-time applications is highly dependent on the data dissemination algorithm employed in the system. Motivated by this fact, this study first formally defines the real-time splittable data dissemination problem (RTS/DDP) where data transfer requests can be routed over multiple paths to maximize the number of data transfers to be completed before their deadlines. Since RTS/DDP is proved to be NP-hard, four different heuristic algorithms, namely kSP/ESMP, kSP/BSMP, kDP/ESMP, and kDP/BSMP are proposed. The performance of these heuristic algorithms is analyzed through an extensive set of data grid system simulation scenarios. The simulation results reveal that a performance increase up to 8 % as compared to a very competitive single path data dissemination algorithm is possible
Energy-efficient bandwidth reservation for bulk data transfers in dedicated wired networks
International audienceThe ever increasing number of Internet connected end-hosts call for high performance end-to-end networks leading to an increase in the energy consumed by the networks. Our work deals with the energy consumption issue in dedicated network with bandwidth provisionning and in-advance reservations of network equipments and bandwidth for Bulk Data transfers. First, we propose an end-to-end energy cost model of such networks which described the energy consumed by a transfer for all the crossed equipments. This model is then used to develop a new energy-aware framework adapted to Bulk Data Transfers over dedicated networks. This framework enables switching off unused network portions during certain periods of time to save energy. This framework is also endowed with prediction algorithms to avoid useless switching off and with adaptive scheduling management to optimize the energy used by the transfers. 1 Introductio
Future of networking is the future of Big Data, The
2019 Summer.Includes bibliographical references.Scientific domains such as Climate Science, High Energy Particle Physics (HEP), Genomics, Biology, and many others are increasingly moving towards data-oriented workflows where each of these communities generates, stores and uses massive datasets that reach into terabytes and petabytes, and projected soon to reach exabytes. These communities are also increasingly moving towards a global collaborative model where scientists routinely exchange a significant amount of data. The sheer volume of data and associated complexities associated with maintaining, transferring, and using them, continue to push the limits of the current technologies in multiple dimensions - storage, analysis, networking, and security. This thesis tackles the networking aspect of big-data science. Networking is the glue that binds all the components of modern scientific workflows, and these communities are becoming increasingly dependent on high-speed, highly reliable networks. The network, as the common layer across big-science communities, provides an ideal place for implementing common services. Big-science applications also need to work closely with the network to ensure optimal usage of resources, intelligent routing of requests, and data. Finally, as more communities move towards data-intensive, connected workflows - adopting a service model where the network provides some of the common services reduces not only application complexity but also the necessity of duplicate implementations. Named Data Networking (NDN) is a new network architecture whose service model aligns better with the needs of these data-oriented applications. NDN's name based paradigm makes it easier to provide intelligent features at the network layer rather than at the application layer. This thesis shows that NDN can push several standard features to the network. This work is the first attempt to apply NDN in the context of large scientific data; in the process, this thesis touches upon scientific data naming, name discovery, real-world deployment of NDN for scientific data, feasibility studies, and the designs of in-network protocols for big-data science
Single-path versus multi-path advance reservation in media production networks
In media production, a set of actors works simultaneously on video content from different sources. If the actors are geographically spread, the use of a shared substrate network can improve their collaborative efficiency. In such a network traffic consists mostly of large video files, which need to be transferred respecting strict deadlines. Restrictions on the underlying network can force the use of single-path routing mechanisms over multi-path approaches. In this paper, we investigate the influence of using single-path routing compared to multi-path routing in deadline-aware advance reservation (AR) systems for media production networks. We have modified our previously designed optimal multi-path advance reservation model to incorporate the single-path mechanism and heuristic alternatives are presented and thoroughly evaluated. The experimental results show that the single-path optimal model can only provide satisfactory performance when the network is not in contention. With the heuristic approach, when adequate bandwidth is provided, the multi-path approach outperforms the single-path by up to 7.3%
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
deems necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial
Architectural approaches to a science network software-defined exchange
To interconnect research facilities across wide geographic areas, network operators deploy science networks, also referred to as Research and Education (R&E) networks. These networks allow experimenters to establish dedicated circuits between research facilities for transferring large amounts of data, by using advanced reservation systems. Intercontinental dedicated circuits typically require coordination between multiple administrative domains, which need to reach an agreement on a suitable advance reservation. To enhance provisioning capabilities of multi-domain advance reservations, we propose an architecture for end-to-end service orchestration in multi-domain science networks that leverages software-defined networking (SDN) and software-defined exchanges (SDX) for providing multi-path, multi-domain advance reservations. Our simulations show our orchestration architecture increases the reservation success rate. We evaluate our solution using GridFTP, one of the most popular tools for data transfers in the scientific community. Additionally, we propose an interface that domain scientists can use to request science network services from our orchestration framework. Furthermore, we propose a federated auditing framework (FAS) that allows an SDX to verify whether the configurations requested by a user are correctly enforced by participating SDN domains, whether the configurations requested are correctly removed after their expiration time, and whether configurations exist that are performing non-requested actions. We also propose an architecture for advance reservation access control using SDN and tokens.Ph.D
QoS Provisioning by Meta-Scheduling in Advance within SLA-Based Grid Environments
The establishment of agreements between users and the entities which manage the Grid resources is still a challenging task. On the one hand, an entity in charge of dealing with the communication with the users is needed, with the aim of signing resource usage contracts and also implementing some renegotiation techniques, among others. On the other hand, some mechanisms should be implemented which decide if the QoS requested could be achieved and, in such case, ensuring that the QoS agreement is provided. One way of increasing the probability of achieving the agreed QoS is by performing meta-scheduling of jobs in advance, that is, jobs are scheduled some time before they are actually executed. In this way, it becomes more likely that the appropriate resources are available to run the jobs when needed. So, this paper presents a framework built on top of Globus and the GridWay meta-scheduler to provide QoS by means of performing meta-scheduling in advance. Thanks to this, QoS requirements of jobs are met (i.e. jobs are finished within a deadline). Apart from that, the mechanisms needed to manage the communication between the users and the system are presented and implemented through SLA contracts based on the WS-Agreement specification
- …