GraphH: High Performance Big Graph Analytics in Small Clusters
It is common for real-world applications to analyze big graphs using
distributed graph processing systems. Popular in-memory systems require an
enormous amount of resources to handle big graphs. While several out-of-core
approaches have been proposed for processing big graphs on disk, the high disk
I/O overhead could significantly reduce performance. In this paper, we propose
GraphH to enable high-performance big graph analytics in small clusters.
Specifically, we design a two-stage graph partition scheme to evenly divide the
input graph into partitions, and propose a GAB (Gather-Apply-Broadcast)
computation model so that each worker processes one partition in memory at a time.
We use an edge cache mechanism to reduce the disk I/O overhead, and design a
hybrid strategy to improve the communication performance. GraphH can
efficiently process big graphs in small clusters or even a single commodity
server. Extensive evaluations show that GraphH can be up to 7.8x faster than popular in-memory systems such as Pregel+ and PowerGraph when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems such as GraphD and Chaos when processing big graphs.
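As a rough illustration of a Gather-Apply-Broadcast superstep, the following Python sketch shows how a worker could process one in-memory partition (a minimal sketch under assumed data structures, not GraphH's actual implementation; the two-stage partitioning, edge cache, and hybrid communication strategy are elided, and every name here is hypothetical):

    # Hypothetical sketch of one GAB superstep over an in-memory partition.
    def gab_superstep(partition, out_deg, values, incoming):
        """partition: {vertex: [in_neighbors]}; out_deg: {vertex: out-degree};
        values: current vertex values; incoming: values broadcast last step."""
        outgoing = {}
        for v, in_nbrs in partition.items():
            # Gather: combine the values broadcast by all in-neighbors.
            acc = sum(incoming.get(u, 0.0) for u in in_nbrs)
            # Apply: update the vertex value (PageRank-style rule for demo).
            values[v] = 0.15 + 0.85 * acc
            # Broadcast: publish the new value, split across out-edges.
            outgoing[v] = values[v] / max(out_deg.get(v, 1), 1)
        return outgoing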
Dynamic Physiological Partitioning on a Shared-nothing Database Cluster
Traditional DBMS servers are usually over-provisioned for most of their daily
workloads and, because they do not show good-enough energy proportionality,
waste a lot of energy while underutilized. A cluster of small (wimpy) servers,
whose size can be dynamically adjusted to the current workload, offers
better energy characteristics for these workloads. Yet, data migration,
necessary to balance utilization among the nodes, is a non-trivial and
time-consuming task that may consume the energy saved. For this reason, a
sophisticated, easy-to-adjust partitioning scheme fostering dynamic
reorganization is needed. In this paper, we adapt a technique originally
created for SMP systems, called physiological partitioning, to distribute data among nodes; it allows data to be repartitioned easily without interrupting transactions. We dynamically partition DB tables based on the nodes'
utilization and given energy constraints and compare our approach with physical
partitioning and logical partitioning methods. To quantify the possible energy savings and their potential impact on query runtimes, we evaluate our implementation on an experimental cluster and compare the results w.r.t. performance and energy consumption. Depending on the workload, we can save substantial energy without sacrificing too much performance.
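As a toy sketch of the utilization-driven repartitioning decision described above (the greedy policy, threshold, and data layout are illustrative assumptions, not the paper's algorithm):

    # Hypothetical greedy rebalancing step for a shared-nothing cluster:
    # move the hottest partitions off overloaded nodes onto cooler ones.
    def rebalance(nodes, high=0.8):
        """nodes: {node_id: {'util': float, 'partitions': {pid: load}}}."""
        moves = []
        for nid, node in nodes.items():
            while node['util'] > high and node['partitions']:
                # Partition contributing the most load on this node.
                pid, load = max(node['partitions'].items(),
                                key=lambda kv: kv[1])
                others = [m for m in nodes if m != nid]
                if not others:
                    break
                # Least-utilized other node, if it still has headroom.
                tgt = min(others, key=lambda m: nodes[m]['util'])
                if nodes[tgt]['util'] + load > high:
                    break  # no feasible target; leave partition in place
                del node['partitions'][pid]
                node['util'] -= load
                nodes[tgt]['partitions'][pid] = load
                nodes[tgt]['util'] += load
                moves.append((pid, nid, tgt))
        return moves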
Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments
Data centres that use consumer-grade disk drives and distributed
peer-to-peer systems are unreliable environments to archive data without enough
redundancy. Most redundancy schemes are not completely effective for providing
high availability, durability and integrity in the long term. We propose alpha
entanglement codes, a mechanism that creates a virtual layer of highly
interconnected storage devices to propagate redundant information across a
large-scale storage system. Our motivation is to design flexible and practical
erasure codes with high fault-tolerance to improve data durability and
availability even in catastrophic scenarios. By flexible and practical, we mean
code settings that can be adapted to future requirements and practical
implementations with reasonable trade-offs between security, resource usage and
performance. The codes have three parameters. Alpha increases storage overhead
linearly but increases the possible paths to recover data exponentially. Two
other parameters increase fault-tolerance even further without the need for
additional storage. As a result, an entangled storage system can provide high
availability, durability and offer additional integrity: it is more difficult
to modify data undetectably. We evaluate how several redundancy schemes perform
in unreliable environments and show that alpha entanglement codes are flexible
and practical codes. Remarkably, they excel at code locality; hence, they
reduce repair costs and become less dependent on storage locations with poor
availability. Our solution outperforms Reed-Solomon codes in many disaster
recovery scenarios.
Comment: 12 pages, 13 figures. This work was partially supported by the Swiss National Science Foundation (SNSF Doc.Mobility 162014). 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2018).
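To give a feel for the entanglement idea (a toy, single-chain sketch in Python; actual alpha entanglement codes maintain multiple helical strands governed by the three parameters, none of which are modeled here), each new block can be XORed into a running parity chain so that redundant information propagates across all earlier writes:

    # Toy single-chain entanglement: parity p[i] = block[i] XOR p[i-1],
    # so every parity depends on all blocks written before it.
    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def entangle_chain(blocks):
        parity = bytes(len(blocks[0]))  # all-zero seed parity
        chain = []
        for blk in blocks:
            parity = xor_bytes(blk, parity)
            chain.append(parity)
        return chain

    def recover_block(i, chain):
        # A lost data block is recovered from two neighboring parities:
        # block[i] = p[i] XOR p[i-1].
        prev = chain[i - 1] if i > 0 else bytes(len(chain[i]))
        return xor_bytes(chain[i], prev)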
A review of High Performance Computing foundations for scientists
The increase of existing computational capabilities has made simulation
emerge as a third discipline of Science, lying midway between experimental and
purely theoretical branches [1, 2]. Simulation enables the evaluation of
quantities which otherwise would not be accessible, helps to improve
experiments and provides new insights on systems which are analysed [3-6].
Knowing the fundamentals of computation can be very useful for scientists, for
it can help them to improve the performance of their theoretical models and
simulations. This review includes some technical essentials that can be useful
to this end, and it is devised as a complement for researchers whose education
is focused on scientific issues and not on technological aspects. In this
document we attempt to discuss the fundamentals of High Performance Computing
(HPC) [7] in a way which is easy to understand without much previous
background. We sketch the way standard computers and supercomputers work, as
well as distributed computing, and discuss essential aspects to take into account when running scientific calculations on computers.
Comment: 33 pages
Energy-aware Load Balancing Policies for the Cloud Ecosystem
The energy consumption of computer and communication systems does not scale
linearly with the workload. A system uses a significant amount of energy even
when idle or lightly loaded. A widely reported solution to resource management
in large data centers is to concentrate the load on a subset of servers and,
whenever possible, switch the rest of the servers to one of the possible sleep
states. We propose a reformulation of the traditional concept of load balancing
aiming to optimize the energy consumption of a large-scale system: {\it
distribute the workload evenly to the smallest set of servers operating at an
optimal energy level, while observing QoS constraints, such as the response
time.} Our model applies to clustered systems; it also requires that the demand for system resources increase at a bounded rate in each reallocation interval. In this paper, we report the VM migration costs for application scaling.
Comment: 10 pages
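A minimal sketch of this consolidation idea, assuming a fixed energy-optimal utilization level and a first-fit-decreasing packing heuristic (both illustrative choices, not the paper's policy):

    # Hypothetical sketch: pack the workload onto the smallest set of
    # servers running near an assumed optimal utilization; servers beyond
    # len(result) can be switched to a sleep state.
    OPTIMAL_UTIL = 0.8  # assumed energy-optimal operating point

    def consolidate(tasks, capacity):
        budget = OPTIMAL_UTIL * capacity  # per-server load budget
        servers = []  # each entry: [remaining_budget, assigned_tasks]
        for t in sorted(tasks, reverse=True):  # first-fit decreasing
            for s in servers:
                if s[0] >= t:
                    s[1].append(t)
                    s[0] -= t
                    break
            else:
                servers.append([budget - t, [t]])
        return [assigned for _, assigned in servers]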