3,428 research outputs found

    GraphH: High Performance Big Graph Analytics in Small Clusters

    It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core approaches have been proposed for processing big graphs on disk, the high disk I/O overhead can significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model to make each worker process one partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve communication performance. GraphH can efficiently process big graphs in small clusters or even on a single commodity server. Extensive evaluations have shown that GraphH can be up to 7.8x faster than popular in-memory systems, such as Pregel+ and PowerGraph, when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos, when processing big graphs.
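    The abstract does not spell out the GAB model in code, so the following is a minimal single-worker sketch of one Gather-Apply-Broadcast step over an in-memory edge partition, using a PageRank-style update purely for illustration; the function names, signatures and the precomputed out-degree table are assumptions, not GraphH's actual API.

        from collections import defaultdict

        def gab_superstep(partition_edges, vertex_values, out_degree, damping=0.85):
            """One Gather-Apply-Broadcast (GAB) step for a single in-memory partition.

            partition_edges: iterable of (src, dst) pairs held by this worker.
            vertex_values:   dict vertex -> current value (e.g. PageRank mass).
            out_degree:      dict vertex -> global out-degree, precomputed once.
            (Illustrative sketch; not GraphH's actual interface.)
            """
            # Gather: accumulate weighted contributions along the partition's edges.
            gathered = defaultdict(float)
            for src, dst in partition_edges:
                gathered[dst] += vertex_values[src] / out_degree[src]

            # Apply: compute new values for the vertices touched by this partition.
            updated = {v: (1.0 - damping) + damping * s for v, s in gathered.items()}

            # Broadcast: in a cluster these updates would be sent to every worker whose
            # partition references the affected vertices; here we simply return them.
            return updated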

    Dynamic Physiological Partitioning on a Shared-nothing Database Cluster

    Traditional DBMS servers are usually over-provisioned for most of their daily workloads and, because they do not show good-enough energy proportionality, waste a lot of energy while underutilized. A cluster of small (wimpy) servers, whose size can be dynamically adjusted to the current workload, offers better energy characteristics for these workloads. Yet data migration, necessary to balance utilization among the nodes, is a non-trivial and time-consuming task that may consume the energy saved. For this reason, a sophisticated and easy-to-adjust partitioning scheme fostering dynamic reorganization is needed. In this paper, we adapt a technique originally created for SMP systems, called physiological partitioning, to distribute data among nodes in a way that allows data to be repartitioned easily without interrupting transactions. We dynamically partition DB tables based on the nodes' utilization and given energy constraints, and compare our approach with physical and logical partitioning methods. To quantify the possible energy savings and their conceivable drawback on query runtimes, we evaluate our implementation on an experimental cluster and compare the results w.r.t. performance and energy consumption. Depending on the workload, we can save substantial energy without sacrificing too much performance.
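    As a hedged illustration of the utilization-driven repartitioning idea, the greedy planner below moves partitions off overloaded nodes onto the least-loaded ones; the threshold, data structures and policy are assumptions made for the sketch, not the algorithm from the paper.

        def plan_migrations(node_util, partition_load, assignment, high=0.8):
            """Propose partition migrations from hot nodes to cooler ones.

            node_util:      dict node -> current utilization in [0, 1].
            partition_load: dict partition -> utilization it contributes.
            assignment:     dict partition -> node currently holding it.
            Returns a list of (partition, src_node, dst_node) proposals.
            (Illustrative sketch only.)
            """
            moves = []
            util = dict(node_util)
            # Visit partitions hosted on the hottest nodes first.
            for part, src in sorted(assignment.items(),
                                    key=lambda kv: util[kv[1]], reverse=True):
                if util[src] <= high:
                    continue                              # source node is not overloaded
                dst = min(util, key=util.get)             # current least-loaded node
                if dst != src and util[dst] + partition_load[part] <= high:
                    moves.append((part, src, dst))
                    util[src] -= partition_load[part]
                    util[dst] += partition_load[part]
            return moves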

    Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

    Data centres that use consumer-grade disk drives and distributed peer-to-peer systems are unreliable environments in which to archive data without enough redundancy. Most redundancy schemes are not completely effective at providing high availability, durability and integrity in the long term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large-scale storage system. Our motivation is to design flexible and practical erasure codes with high fault tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements and practical implementations with reasonable trade-offs between security, resource usage and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the possible paths to recover data exponentially. Two other parameters increase fault tolerance even further without the need for additional storage. As a result, an entangled storage system can provide high availability and durability and offer additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical codes. Remarkably, they excel at code locality, hence they reduce repair costs and become less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios. Comment: The publication has 12 pages and 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014. 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
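    The encoding rule is not given in the abstract, so the role of the alpha parameter is illustrated below with a much-simplified XOR entanglement sketch; the class and method names are invented for illustration, and the helical strand layout and the two further parameters of the real codes are omitted.

        def xor_blocks(a: bytes, b: bytes) -> bytes:
            return bytes(x ^ y for x, y in zip(a, b))

        class SimpleEntangler:
            """Toy entangler: one strand per unit of alpha, seeded with an all-zero parity."""

            def __init__(self, alpha: int, block_size: int):
                self.strands = [[bytes(block_size)] for _ in range(alpha)]

            def ingest(self, data_block: bytes):
                """Entangle one data block and return the alpha parities to store.

                On each strand the new parity is data XOR previous parity, so the
                block can later be rebuilt on any strand from two consecutive
                parities; every extra strand adds an independent recovery path,
                while storage overhead grows only linearly with alpha.
                """
                parities = []
                for strand in self.strands:
                    p = xor_blocks(data_block, strand[-1])
                    strand.append(p)
                    parities.append(p)
                return parities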

    A review of High Performance Computing foundations for scientists

    The increase of existing computational capabilities has made simulation emerge as a third discipline of Science, lying midway between the experimental and purely theoretical branches [1, 2]. Simulation enables the evaluation of quantities which would otherwise not be accessible, helps to improve experiments and provides new insights into the systems being analysed [3-6]. Knowing the fundamentals of computation can be very useful for scientists, for it can help them to improve the performance of their theoretical models and simulations. This review includes some technical essentials that can be useful to this end, and it is devised as a complement for researchers whose education is focused on scientific issues rather than technological aspects. In this document we attempt to discuss the fundamentals of High Performance Computing (HPC) [7] in a way that is easy to understand without much previous background. We sketch the way standard computers and supercomputers work, as well as distributed computing, and discuss essential aspects to take into account when running scientific calculations on computers. Comment: 33 pages.

    Energy-aware Load Balancing Policies for the Cloud Ecosystem

    The energy consumption of computer and communication systems does not scale linearly with the workload. A system uses a significant amount of energy even when idle or lightly loaded. A widely reported solution to resource management in large data centers is to concentrate the load on a subset of servers and, whenever possible, switch the rest of the servers to one of the available sleep states. We propose a reformulation of the traditional concept of load balancing aiming to optimize the energy consumption of a large-scale system: distribute the workload evenly to the smallest set of servers operating at an optimal energy level, while observing QoS constraints such as the response time. Our model applies to clustered systems; it also requires that the demand for system resources increase at a bounded rate in each reallocation interval. In this paper we report the VM migration costs for application scaling. Comment: 10 pages.
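    A back-of-the-envelope sketch of that policy is given below: pack the workload onto the smallest set of servers running near an assumed energy-optimal utilization and leave the rest as candidates for a sleep state; the optimal-utilization figure and the capacity model are illustrative assumptions, not the paper's exact formulation.

        import math

        def consolidate(total_load, server_capacity, optimal_util=0.7):
            """Return (active_server_count, per_server_load).

            total_load and server_capacity share the same unit (e.g. requests/s);
            optimal_util caps each server near its energy-optimal operating point,
            acting as a crude stand-in for the response-time (QoS) constraint.
            (Illustrative sketch only.)
            """
            usable = server_capacity * optimal_util
            active = max(1, math.ceil(total_load / usable))
            return active, total_load / active

        # Example: consolidate(900.0, 200.0) -> (7, ~128.6), i.e. seven servers stay
        # active at roughly 64% utilization and the remaining servers can sleep.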