
    Window-based Streaming Graph Partitioning Algorithm

    In recent years, the scale of graph datasets has grown to the point where a single machine can no longer process large graphs efficiently, so efficient graph partitioning is necessary for large-graph applications. Traditional graph partitioning generally loads the whole graph into memory before partitioning, which is not only time consuming but also creates memory bottlenecks. These memory and runtime limitations can be addressed by stream-based graph partitioning: a streaming graph partitioning algorithm reads each vertex once and immediately assigns it to a partition, which is why it is also called a one-pass algorithm. This paper proposes an efficient window-based streaming graph partitioning algorithm called WStream. WStream is an edge-cut partitioning algorithm, i.e. it distributes vertices among the partitions. Our results suggest that WStream can partition large graph data efficiently while keeping the load balanced across partitions and communication to a minimum. Evaluation with real workloads also confirms the effectiveness of the proposed algorithm: it achieves a significant reduction in load imbalance and edge-cut across datasets of different sizes.
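    The abstract does not spell out the WStream heuristic itself, so the following is only a generic sketch of the window-based greedy idea it describes: buffer a small window of incoming vertices, commit the most connected buffered vertex first, and place it in the partition holding most of its already-placed neighbors, penalized by load. All names, parameters, and the scoring rule are illustrative assumptions, not the paper's algorithm.

```python
from collections import deque

def window_stream_partition(vertex_stream, k, window_size, capacity):
    """Greedy window-based streaming vertex (edge-cut) partitioning sketch.

    vertex_stream yields (vertex_id, neighbor_list) pairs in a single pass;
    k is the number of partitions, window_size the number of buffered
    vertices considered before committing one, and capacity a soft
    per-partition vertex limit used as a balance penalty.
    """
    assignment = {}              # vertex -> partition
    loads = [0] * k
    window = deque()

    def place(v, nbrs):
        # Prefer the partition already holding most of v's neighbors,
        # penalized by its current load to keep partitions balanced.
        def score(p):
            return sum(1 for u in nbrs if assignment.get(u) == p) - loads[p] / capacity
        best = max(range(k), key=score)
        assignment[v] = best
        loads[best] += 1

    for v, nbrs in vertex_stream:
        window.append((v, nbrs))
        if len(window) >= window_size:
            # Commit the buffered vertex with the most already-placed neighbors.
            i = max(range(len(window)),
                    key=lambda j: sum(u in assignment for u in window[j][1]))
            v_sel, nbrs_sel = window[i]
            del window[i]
            place(v_sel, nbrs_sel)
    while window:                # flush whatever remains in the window
        place(*window.popleft())
    return assignment

# Tiny usage example on a toy stream of (vertex, neighbors) pairs.
stream = iter([(1, [2, 3]), (2, [1, 4]), (3, [1]), (4, [2])])
print(window_stream_partition(stream, k=2, window_size=2, capacity=2))
```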

    New Method to Generate Balanced 2^n-PSK STTCs

    The first theoretical bases of space-time trellis codes (STTCs) were established by Tarokh (1998). Many STTCs have been published using several criteria proposed by Tarokh (1998) and Chen (2001) for slow and fast Rayleigh fading channels. More recently, a new class of codes called balanced codes, which contains all the best STTCs, was presented by Ngo (2007, 2008). These balanced codes share the same property: if the data are generated by a binary memoryless source with equally probable symbols, the used points of the MIMO constellation are generated with the same probability. The systematic search for the best codes can therefore be restricted to this class, which reduces the time needed to find the best codes. A new and general method is proposed to design balanced 2^n-PSK STTCs for several transmit antennas. This new method is simpler and faster than the one previously presented by Ngo (2007), in particular when the number of transmit antennas and n increase.
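    As a concrete illustration of the balance property described above (not of the paper's design method), the sketch below enumerates all input bit patterns of a 2^n-PSK STTC generator matrix and checks whether every used point of the MIMO constellation is produced with the same probability. The example matrix is an arbitrary placeholder, not a code taken from the paper.

```python
from collections import Counter
from itertools import product

import numpy as np

def constellation_usage(G, n):
    """Count how often each MIMO constellation point (one 2^n-PSK symbol per
    transmit antenna) is produced when the encoder input bits are i.i.d.
    uniform. G is an m x n_T integer matrix of generator coefficients in Z_{2^n}."""
    G = np.asarray(G)
    m, _ = G.shape
    counts = Counter()
    for bits in product((0, 1), repeat=m):
        point = tuple(np.mod(np.dot(bits, G), 2 ** n))
        counts[point] += 1
    return counts

def is_balanced(G, n):
    # Balanced: every used constellation point appears equally often.
    return len(set(constellation_usage(G, n).values())) == 1

# Hypothetical 4-row generator matrix for QPSK (n = 2) and two transmit antennas.
G_example = [[0, 2], [2, 0], [1, 0], [0, 1]]
print(is_balanced(G_example, n=2))
```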

    THE ROLE OF HUMAN RESOURCES IN SUSTAINABLE DEVELOPMENT OF THE ENERGY SECTOR

    Sustainable development highlights the importance of the energy sector in any economy by establishing specific targets in the field. Strategies for good human resources management have also been implemented at the national and international levels, showing their importance in the sustainable development of states and global organizations. The role of human resources in the sustainable development of the energy sector can be viewed from two perspectives: on the one hand, the influence of the energy sector on the social dimension, and on the other hand, the influence of human resources on know-how, technologies and innovation in the energy field. This paper analyses some indicators to point out the role of human resources in the energy sector; the results show that labor productivity in Romania is increasing while the population actively employed in the energy industry and supply is decreasing. We conclude by emphasizing the need for educational reform that recognizes both the human role in the economy and the importance of the energy sector.

    Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially.

    Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular k-center variant which, given a set S of points from some metric space and a parameter k < |S|, requires identifying a subset of k centers in S minimizing the maximum distance of any point of S from its closest center. A more general formulation, introduced to deal with noisy datasets, additionally discards a prescribed number of points (outliers) when computing this maximum distance. We present MapReduce and streaming algorithms for both formulations. For any fixed ϵ > 0, the algorithms yield solutions whose approximation ratios are a mere additive term ϵ away from those achievable by the best known polynomial-time sequential algorithms, a result that substantially improves upon the state of the art. Our algorithms are rather simple and adapt to the intrinsic complexity of the dataset, captured by the doubling dimension D of the metric space. Specifically, our analysis shows that the algorithms become very space-efficient for the important case of small (constant) D. These theoretical results are complemented by experiments on real-world and synthetic datasets of up to over a billion points, which show that our algorithms yield better-quality solutions than the state of the art while featuring excellent scalability, and that they also lend themselves to sequential implementations much faster than existing ones.
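    For reference, the sequential baseline against which the approximation guarantee for plain k-center (without outliers) is measured is the classic farthest-first traversal of Gonzalez, a 2-approximation. The sketch below shows that baseline, not the paper's MapReduce/streaming algorithms.

```python
import math
import random

def gonzalez_k_center(points, k, dist=math.dist, seed=0):
    """Farthest-first traversal: the classic sequential 2-approximation for
    k-center (without outliers). Returns the chosen centers and the radius,
    i.e. the maximum distance of any point from its closest center."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    d = [dist(p, centers[0]) for p in points]    # distance to closest center so far
    while len(centers) < k:
        i = max(range(len(points)), key=d.__getitem__)   # farthest point becomes a center
        centers.append(points[i])
        d = [min(d[j], dist(points[j], points[i])) for j in range(len(points))]
    return centers, max(d)

# Tiny usage example with random 2-D points.
pts = [(random.random(), random.random()) for _ in range(1000)]
print(gonzalez_k_center(pts, k=5)[1])
```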

    Low latency via redundancy

    Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency, especially the tail of the latency distribution, can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result that completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with the added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reductions in real-world systems, including DNS queries, database servers, and packet forwarding within networks.
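    A minimal sketch of the first-of-N pattern the paper advocates, written with Python's asyncio against two hypothetical replicas with simulated latencies; the paper's actual case studies (DNS, database servers, in-network forwarding) involve real systems, not this toy.

```python
import asyncio
import random

async def query_replica(replica, request):
    # Placeholder for a real RPC/DNS/database call; latency is simulated here.
    await asyncio.sleep(random.expovariate(1.0) * replica["mean_latency"])
    return f"{replica['name']}: result for {request}"

async def redundant_query(replicas, request):
    """Send the same request to every replica and return the first result,
    cancelling the rest: extra capacity is spent to cut (tail) latency."""
    tasks = [asyncio.create_task(query_replica(r, request)) for r in replicas]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    return done.pop().result()

# Example: two hypothetical replicas with different mean latencies (seconds).
replicas = [{"name": "replica-a", "mean_latency": 0.05},
            {"name": "replica-b", "mean_latency": 0.08}]
print(asyncio.run(redundant_query(replicas, "lookup example.com")))
```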

    Using Trusted Execution Environments for Secure Stream Processing of Medical Data

    Processing sensitive data, such as the readings produced by body sensors, on untrusted third-party clouds without compromising the privacy of the users who generate the data is particularly challenging. Typically, these sensors generate large quantities of continuous data in a streaming fashion. Such vast amounts of data must be processed efficiently and securely, even under strong adversarial models. The recent introduction to the mass market of consumer-grade processors with Trusted Execution Environments (TEEs), such as Intel SGX, paves the way for solutions that overcome less flexible approaches, such as those built atop homomorphic encryption. We present a secure stream processing system built on top of Intel SGX and specifically fitted for medical data, to showcase the viability of this approach. We design and fully implement a prototype system that we evaluate with several realistic datasets. Our experimental results show that the proposed system achieves modest overhead compared to vanilla Spark while offering additional protection guarantees under powerful attackers and threat models.
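    The sketch below only illustrates the general data-flow pattern implied above: sensor data are encrypted at the source, decrypted and processed only inside the trusted boundary, and re-encrypted before leaving it. It uses ordinary AES-GCM in Python purely for readability; an SGX-based system like the one described would run the processing step inside an enclave (typically C/C++ with the SGX SDK) and provision keys via remote attestation, and all names and data here are made up.

```python
import json
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Assumed to be provisioned to the trusted side via remote attestation.
key = AESGCM.generate_key(bit_length=128)

def sensor_encrypt(reading):
    # Untrusted side: the sensor encrypts each reading before upload.
    nonce = os.urandom(12)
    return nonce, AESGCM(key).encrypt(nonce, json.dumps(reading).encode(), None)

def trusted_process(nonce, ciphertext):
    # Trusted side: decrypt, compute, and re-encrypt before the result leaves.
    reading = json.loads(AESGCM(key).decrypt(nonce, ciphertext, None))
    result = {"patient": reading["patient"],
              "avg_hr": sum(reading["hr"]) / len(reading["hr"])}
    out_nonce = os.urandom(12)
    return out_nonce, AESGCM(key).encrypt(out_nonce, json.dumps(result).encode(), None)

nonce, ct = sensor_encrypt({"patient": "p-001", "hr": [72, 75, 71]})
print(trusted_process(nonce, ct))
```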

    Computable bounds in fork-join queueing systems

    In a Fork-Join (FJ) queueing system, an upstream fork station splits incoming jobs into N tasks to be further processed by N parallel servers, each with its own queue; the response time of a job is determined, at a downstream join station, by the maximum of the corresponding tasks' response times. This queueing system is useful for modelling multi-service systems subject to synchronization constraints, such as MapReduce clusters or multipath routing. Despite their apparent simplicity, FJ systems are hard to analyze. This paper provides the first computable stochastic bounds on the waiting- and response-time distributions in FJ systems. We consider four practical scenarios by combining (1a) renewal and (1b) non-renewal arrivals with (2a) non-blocking and (2b) blocking servers. In the case of non-blocking servers we prove that delays scale as O(log N), a law previously known only for first moments under renewal input. In the case of blocking servers, we prove that the same log N factor dictates the stability region of the system. Simulation results indicate that our bounds are tight, especially at high utilizations, in all four scenarios. A remarkable insight gained from our results is that, at moderate to high utilizations, multipath routing 'makes sense' from a queueing perspective for two paths only, i.e., response times drop the most when N = 2; the technical explanation is that the resequencing (delay) price quickly starts to dominate the tempting gain due to multipath transmissions.
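    The paper's bounds are analytic, but the basic fork-join mechanics are easy to mimic in a toy simulation: each job forks one task per server, and its response time is the maximum of its tasks' response times. The sketch below (Poisson arrivals, exponential service, non-blocking FIFO servers; all parameters illustrative) lets one observe how mean response time grows with N.

```python
import random

def simulate_fork_join(n_servers, arrival_rate, service_rate, n_jobs, seed=1):
    """Toy fork-join simulation: every job forks one task per server, and the
    join waits for all of them, so the job's response time is the max of its
    tasks' response times."""
    rng = random.Random(seed)
    t = 0.0
    busy_until = [0.0] * n_servers     # next free time of each FIFO server
    responses = []
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)            # job arrival time
        finishes = []
        for i in range(n_servers):
            start = max(t, busy_until[i])
            busy_until[i] = start + rng.expovariate(service_rate)
            finishes.append(busy_until[i])
        responses.append(max(finishes) - t)           # join waits for all tasks
    return responses

for n in (1, 2, 4, 8):
    r = simulate_fork_join(n, arrival_rate=0.7, service_rate=1.0, n_jobs=100_000)
    print(n, round(sum(r) / len(r), 3))
```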

    Learning Scheduling Algorithms for Data Processing Clusters

    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly efficient policies automatically. Our system, Decima, uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods that cope with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to a 2x improvement during periods of high cluster load.
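    Decima's actual design (graph embeddings of job DAGs, scalable RL models, training under stochastic arrivals) is well beyond an abstract-level sketch, but the underlying idea of learning a scheduling policy from nothing more than a high-level objective can be shown on a toy problem. The sketch below trains a linear softmax policy with REINFORCE to pick which of several pending jobs to run next on a single machine, using total completion time as the cost; everything here is an illustrative assumption, not Decima's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_episode(theta, n_jobs=10):
    """One toy episode: n_jobs jobs with random sizes are run one after another
    on a single machine; at each step a softmax policy over per-job features
    picks the next job. Returns the trajectory and the total completion time."""
    sizes = rng.uniform(1.0, 10.0, size=n_jobs)
    remaining = list(range(n_jobs))
    clock, total_completion, trajectory = 0.0, 0.0, []
    while remaining:
        feats = np.array([[sizes[j] / 10.0, 1.0] for j in remaining])  # (size, bias)
        logits = feats @ theta
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        pick = rng.choice(len(remaining), p=probs)
        trajectory.append((feats, pick, probs))
        clock += sizes[remaining.pop(pick)]
        total_completion += clock
    return trajectory, total_completion

def reinforce(iterations=3000, lr=0.005):
    """Plain REINFORCE with a moving-average baseline on the toy scheduler."""
    theta, baseline = np.zeros(2), None
    for _ in range(iterations):
        trajectory, cost = run_episode(theta)
        baseline = cost if baseline is None else 0.95 * baseline + 0.05 * cost
        advantage = baseline - cost            # lower cost than baseline is good
        for feats, pick, probs in trajectory:
            grad_log_pi = feats[pick] - probs @ feats   # gradient of log softmax
            theta += lr * advantage * grad_log_pi / len(trajectory)
    return theta

theta = reinforce()
# A negative weight on the (normalized) job size means the learned policy
# prefers short jobs, i.e. it tends toward shortest-job-first.
print("learned weights:", theta)
```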