A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of the area through which researchers can potentially identify new
issues for investigation. We also hope that the proposed taxonomy and mapping
provide an easy way for new practitioners to understand this complex area of
research.
Comment: 46 pages, 16 figures, Technical Report
Computing server power modeling in a data center: survey, taxonomy and performance evaluation
Data centers are large-scale, energy-hungry infrastructures serving the
increasing computational demands of a world becoming more connected through
smart cities. The emergence of advanced technologies such as cloud-based
services, the Internet of Things (IoT) and big data analytics has accelerated
the growth of global data centers, leading to high energy consumption. This
upsurge in data center energy consumption not only incurs high operational and
maintenance costs but also has an adverse effect on the environment. Dynamic
power management in a data center environment requires cognizance of the
correlation between system- and hardware-level performance counters and power
consumption. Power consumption modeling captures this correlation and is
crucial for designing energy-efficient optimization strategies based on
resource utilization. Several power models have been proposed and used in the
literature. However, these models have been evaluated using different
benchmarking applications, power measurement techniques and error calculation
formulas on different machines. In this work,
we present a taxonomy and evaluation of 24 software-based power models using a
unified environment, benchmarking applications, power measurement technique and
error formula, with the aim of achieving an objective comparison. We use
different server architectures to assess the impact of heterogeneity on the
models' comparison. The performance analysis of these models is elaborated in
the paper.
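As a concrete illustration of utilization-based power modeling and of a unified
error formula, the following is a minimal sketch, not one of the 24 surveyed
models: it fits a linear model P(u) = a + b*u to hypothetical utilization and
power readings and scores it with the mean absolute percentage error (MAPE),
i.e. the average of |measured - predicted| / measured. All sample values and
names are made up for illustration.

# Minimal sketch: a utilization-based linear power model fitted with least
# squares and evaluated with MAPE. The readings below are illustrative only.
import numpy as np

# Hypothetical training data: CPU utilization (0..1) and measured power (W).
util = np.array([0.05, 0.20, 0.40, 0.60, 0.80, 0.95])
power = np.array([62.0, 78.0, 101.0, 124.0, 150.0, 171.0])

# Fit P(u) = a + b * u  (a ~ idle power, a + b ~ full-load power).
A = np.column_stack([np.ones_like(util), util])
(a, b), *_ = np.linalg.lstsq(A, power, rcond=None)

def predict(u):
    """Predicted server power draw in watts for CPU utilization u."""
    return a + b * u

# Mean Absolute Percentage Error over the same samples.
pred = predict(util)
mape = np.mean(np.abs((power - pred) / power)) * 100.0

print(f"P(u) ~= {a:.1f} + {b:.1f} * u (watts), MAPE = {mape:.2f}%")

Richer models in the surveyed literature typically add further counters
(memory, disk and network activity) as extra regressors in the same fashion.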
Survey and Analysis of Production Distributed Computing Infrastructures
This report has two objectives. First, we describe a set of the production
distributed infrastructures currently available, so that the reader has a basic
understanding of them. This includes explaining why each infrastructure was
created and made available and how it has succeeded and failed. The set is not
complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a
combination of how they were designed to be used and how users have found ways
to use them. Applications are often designed and created with specific
infrastructures in mind, with both an appreciation of the existing capabilities
provided by those infrastructures and an anticipation of their future
capabilities. Here, the infrastructures we discuss were often designed and
created with specific applications in mind, or at least specific types of
applications. The reader should understand how the interplay between the
infrastructure providers and the users leads to such usages, which we call
usage modalities. These usage modalities are really abstractions that exist
between the infrastructures and the applications; they influence the
infrastructures by representing the applications, and they influence the
applications by representing the infrastructures.
High-performance and fault-tolerant techniques for massive data distribution in online communities
The amount of digital information produced and consumed is increasing each day.
This rapid growth is driven by advances in computing power and hardware technologies,
and by the popularization of user-generated content networks. New hardware
is able to process larger quantities of data, which makes it possible to obtain finer
results and, as a consequence, generates even more data. In this respect, scientific
applications have evolved to benefit from the new hardware capabilities. This type of
application is characterized by requiring large amounts of information as input and by
generating a significant amount of intermediate data, resulting in large files. Since this
increase appears not only in terms of volume but also in terms of file size, we need
to provide methods that enable efficient and reliable data access. Producing such a
method is a challenging task due to the number of aspects involved. However, we can
leverage the knowledge found in social networks to improve the distribution process. In
this respect, the advent of Web 2.0 has popularized the concept of the social network,
which provides valuable knowledge about the relationships among users and between
users and the data. However, extracting this knowledge and defining ways to
actively use it to improve the performance of a system remains an open research
direction.
Additionally, we must take into account other existing limitations. In particular,
the interconnection between the different elements of the system is one of the key
aspects. The availability of new technologies such as mass-produced multicore
chips, large storage media and better sensors has contributed to the increase in
the amount of data being produced. However, the underlying interconnection
technologies have not improved at the same pace. This leads to a situation where
vast amounts of data can be produced and need to be consumed by a large number
of geographically distributed users, but the interconnection between both ends
cannot satisfy those needs.
In this thesis, we address the problem of efficient and reliable data distribution in
geographically distributed systems. In this respect, we focus on providing a solution
that 1) optimizes the use of existing resources, 2) does not require changes in
the underlying interconnection, and 3) provides fault-tolerant capabilities. In order
to achieve these objectives, we define a generic data distribution architecture composed
of three main components: a community detection module, a transfer scheduling
module, and a distribution controller. The community detection module leverages the
information found in the social network formed by the users requesting files and
produces a set of virtual communities grouping entities with similar interests. The
transfer scheduling module produces a plan to efficiently distribute all requested
files while improving resource utilization. For this purpose, we model the distribution
problem using linear programming (a toy formulation is sketched below) and offer a
method that permits solving the problem in a distributed manner. Finally, the
distribution controller manages the distribution process using the aforementioned
schedule, controls the available server infrastructure, and launches new on-demand
resources when necessary.
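To make the scheduling component more concrete, below is a minimal sketch of a
transportation-style linear program, not the thesis's actual formulation:
servers with limited outbound capacity must satisfy the data demand of each
virtual community at minimum total cost, solved here with
scipy.optimize.linprog. Server names, capacities, demands and costs are all
illustrative assumptions.

# Toy transfer-scheduling LP: decide how much data each server sends to each
# virtual community so that every demand is met, no server exceeds its
# capacity, and total transfer cost is minimized. All numbers are made up.
import numpy as np
from scipy.optimize import linprog

servers = ["s0", "s1"]            # available servers
communities = ["c0", "c1", "c2"]  # virtual communities of users

# cost[i, j]: cost of shipping one GB from server i to community j.
cost = np.array([[1.0, 4.0, 2.0],
                 [3.0, 1.0, 2.5]])
capacity = np.array([60.0, 50.0])      # GB each server can push in the window
demand = np.array([30.0, 40.0, 25.0])  # GB each community has requested

n_s, n_c = cost.shape  # decision variables x[i, j], flattened row-major

# Equality constraints: each community receives exactly its demand.
A_eq = np.zeros((n_c, n_s * n_c))
for j in range(n_c):
    A_eq[j, j::n_c] = 1.0

# Inequality constraints: each server stays within its outbound capacity.
A_ub = np.zeros((n_s, n_s * n_c))
for i in range(n_s):
    A_ub[i, i * n_c:(i + 1) * n_c] = 1.0

res = linprog(cost.ravel(), A_ub=A_ub, b_ub=capacity,
              A_eq=A_eq, b_eq=demand, bounds=(0, None), method="highs")

plan = res.x.reshape(n_s, n_c)
for i, s in enumerate(servers):
    for j, c in enumerate(communities):
        if plan[i, j] > 1e-6:
            print(f"{s} -> {c}: {plan[i, j]:.1f} GB")
print(f"total cost: {res.fun:.1f}")

Decomposing such a program by community or by server is one plausible route to
the distributed solving mentioned in the abstract.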