Window-based Streaming Graph Partitioning Algorithm
In recent years, graph datasets have grown to a scale at which a single machine
can no longer process large graphs efficiently. Efficient graph partitioning is
therefore essential for large-graph applications. Traditional graph
partitioning generally loads the whole graph into memory before partitioning;
this is not only time-consuming but also creates memory bottlenecks. These
problems of memory limitation and high time complexity can be addressed with
stream-based graph partitioning. A streaming graph partitioning algorithm reads
each vertex once and immediately assigns it to a partition; such a method is
also called a one-pass algorithm. This paper proposes an efficient window-based
streaming graph partitioning algorithm called WStream. WStream is an edge-cut
partitioning algorithm, which distributes vertices among the partitions. Our
results suggest that WStream can partition large graph data efficiently while
keeping the load balanced across partitions and communication to a minimum.
Evaluation with real workloads also demonstrates the effectiveness of the
proposed algorithm: it achieves a significant reduction in load imbalance and
edge-cut across a range of datasets.
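The one-pass idea can be illustrated with a minimal greedy edge-cut partitioner in the spirit of linear deterministic greedy (LDG). This is a generic sketch for illustration only, not the WStream algorithm itself (in particular, the sliding-window logic is omitted), and all names are ours:

```python
def one_pass_partition(vertex_stream, k, capacity):
    """One-pass greedy edge-cut partitioner (LDG-style illustration).

    Each arriving vertex goes to the partition that already holds most of
    its neighbours, discounted by how full that partition is."""
    assignment = {}                      # vertex -> partition id
    load = [0] * k                       # vertices per partition
    members = [set() for _ in range(k)]  # vertex sets per partition
    for v, neighbors in vertex_stream:
        nset = set(neighbors)
        best, best_score = 0, float("-inf")
        for p in range(k):
            # neighbours already placed on p, weighted by remaining capacity
            score = len(members[p] & nset) * (1.0 - load[p] / capacity)
            if score > best_score:
                best, best_score = p, score
        assignment[v] = best
        members[best].add(v)
        load[best] += 1
    return assignment
```

Each vertex is inspected exactly once and the state kept per partition is small, which is what makes such algorithms attractive when the graph does not fit in memory.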
New Method to Generate Balanced 2n-PSK STTCs
The first theoretical foundations of space-time trellis codes (STTCs) were established by Tarokh (1998). Many STTCs have since been published using design criteria proposed by Tarokh (1998) and Chen (2001) for slow and fast Rayleigh fading channels. More recently, a new class of codes called balanced codes, which contains all the best STTCs, was presented by Ngo (2007, 2008). These balanced codes share the same property: if the data are generated by a binary memoryless source with equally probable symbols, the used points of the MIMO constellation are generated with equal probability. The systematic search for the best codes can therefore be restricted to this class, which reduces the time needed to find them. A new and general method is proposed to design balanced 2^n-PSK STTCs for several transmit antennas. This new method is simpler and faster than the previous one presented by Ngo (2007), in particular when the number of transmit antennas and n increase.
THE ROLE OF HUMAN RESOURCES IN SUSTAINABLE DEVELOPMENT OF THE ENERGY SECTOR
Sustainable development highlights the importance of the energy sector in any economy by establishing specific targets in the field. Strategies for good human resources management have also been implemented at national and international levels, demonstrating their importance for the sustainable development of states and global organizations. The role of human resources in the sustainable development of the energy sector can be viewed from two perspectives: on the one hand, the influence of the energy sector on the social dimension; on the other, the influence of human resources on know-how, technologies and innovation in the energy field. This paper analyses some indicators to point out the role of human resources in the energy sector; the results show that Romania has an increase in labor productivity while the actively employed population in energy industry and supply is decreasing. We conclude by emphasizing the need for educational reform that recognizes both the human role in the economy and the importance of the energy sector.
Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially.
Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular k-center variant which, given a set S of points from some metric space and a parameter k, requires identifying k centers that minimize the maximum distance of any point of S from its closest center. We present MapReduce and streaming algorithms for k-center, with and without outliers, such that, for any fixed ε > 0, the algorithms yield solutions whose approximation ratios are a mere additive term ε away from those achievable by the best known polynomial-time sequential algorithms, a result that substantially improves upon the state of the art. Our algorithms are rather simple and adapt to the intrinsic complexity of the dataset, captured by the doubling dimension D of the metric space. Specifically, our analysis shows that the algorithms become very space-efficient for the important case of small (constant) D. These theoretical results are complemented by experiments on real-world and synthetic datasets of up to over a billion points, which show that our algorithms yield better-quality solutions than the state of the art while featuring excellent scalability, and that they also lend themselves to sequential implementations much faster than existing ones.
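The sequential baseline such distributed algorithms are measured against is Gonzalez's farthest-first traversal, the classic 2-approximation for k-center. A minimal sketch (not the paper's MapReduce or streaming algorithms; `math.dist` requires Python 3.8+):

```python
import math

def gonzalez_k_center(points, k):
    """Farthest-first traversal (Gonzalez): the classic sequential
    2-approximation for k-center. Returns the chosen centers and the
    resulting clustering radius."""
    centers = [points[0]]  # arbitrary first center
    dist = [math.dist(p, centers[0]) for p in points]
    for _ in range(k - 1):
        # Next center: the point farthest from all chosen centers.
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for p, d in zip(points, dist)]
    return centers, max(dist)
```

Each of the k rounds scans all points once, so the running time is O(k·|S|), which is exactly what becomes infeasible at billion-point scale and motivates the distributed variants.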
Low latency via redundancy
Low latency is critical for interactive networked applications. But while we
know how to scale systems to increase capacity, reducing latency --- especially
the tail of the latency distribution --- can be much more difficult. In this
paper, we argue that the use of redundancy is an effective way to convert extra
capacity into reduced latency. By initiating redundant operations across
diverse resources and using the first result that completes, redundancy
improves a system's latency even under exceptional conditions. We study the
tradeoff with added system utilization, characterizing the situations in which
replicating all tasks reduces mean latency. We then demonstrate empirically
that replicating all operations can result in significant mean and tail latency
reductions in real-world systems, including DNS queries, database servers, and
packet forwarding within networks.
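The core mechanism, issuing the same request to several replicas and taking whichever answer arrives first, can be sketched in a few lines. This is our own illustrative sketch, not the paper's implementation; `query` is a hypothetical caller-supplied function, and `cancel_futures` requires Python 3.9+:

```python
import concurrent.futures as cf

def redundant_request(query, replicas):
    """Send the same query to every replica and return the first result
    that completes, abandoning the slower duplicates."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(query, r) for r in replicas]
    # Block only until the fastest replica answers.
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    result = next(iter(done)).result()
    # Don't block on stragglers; cancel any duplicate that hasn't started.
    pool.shutdown(wait=False, cancel_futures=True)
    return result
```

The latency of the combined operation is the minimum over the replicas' latencies, which is precisely why redundancy trims the tail of the distribution at the cost of extra load.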
Using Trusted Execution Environments for Secure Stream Processing of Medical Data
Processing sensitive data, such as those produced by body sensors, on
untrusted third-party clouds without compromising the privacy of the users who
generate them is particularly challenging. Typically, these sensors generate
large quantities of continuous data in a streaming fashion. Such vast amounts of data
must be processed efficiently and securely, even under strong adversarial
models. The recent introduction in the mass-market of consumer-grade processors
with Trusted Execution Environments (TEEs), such as Intel SGX, paves the way to
implement solutions that overcome less flexible approaches, such as those atop
homomorphic encryption. We present a secure streaming processing system built
on top of Intel SGX to showcase the viability of this approach with a system
specifically fitted for medical data. We design and fully implement a prototype
system that we evaluate with several realistic datasets. Our experimental
results show that the proposed system achieves modest overhead compared to
vanilla Spark while offering additional protection guarantees under powerful
attackers and threat models. (19th International Conference on Distributed
Applications and Interoperable Systems)
Computable bounds in fork-join queueing systems
In a Fork-Join (FJ) queueing system an upstream fork station splits incoming jobs into N tasks to be further processed by N parallel servers, each with its own queue; the response time of one job is determined, at a downstream join station, by the maximum of the corresponding tasks' response times. This queueing system is useful to the modelling of multi-service systems subject to synchronization constraints, such as MapReduce clusters or multipath routing. Despite their apparent simplicity, FJ systems are hard to analyze.
This paper provides the first computable stochastic bounds on the waiting- and response-time distributions in FJ systems. We consider four practical scenarios by combining (1a) renewal and (1b) non-renewal arrivals with (2a) non-blocking and (2b) blocking servers. In the case of non-blocking servers, we prove that delays scale as O(log N), a law previously known only for first moments under renewal input. In the case of blocking servers, we prove that the same log N factor dictates the stability region of the system. Simulation results indicate that our bounds are tight, especially at high utilizations, in all four scenarios. A remarkable insight gained from our results is that, at moderate to high utilizations, multipath routing 'makes sense' from a queueing perspective for two paths only, i.e., response times drop the most when N = 2; the technical explanation is that the resequencing (delay) price quickly starts to dominate the tempting gain from multipath transmissions.
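The O(log N) law has a simple closed form in the idealized, unloaded case: if the N task times are i.i.d. Exp(1), the expected job response time is the maximum of N exponentials, whose mean is the N-th harmonic number H_N ≈ ln N. This is a toy illustration of the scaling, not the paper's stochastic bounds:

```python
import math

def expected_fj_response(n):
    """E[max of n i.i.d. Exp(1) task times] = H_n, the n-th harmonic
    number, which grows like ln n: the O(log N) law for unloaded,
    non-blocking fork-join servers."""
    return sum(1.0 / i for i in range(1, n + 1))

# Doubling N adds roughly ln 2 ~ 0.69 to the expected response time:
for n in (2, 4, 8, 16):
    print(n, round(expected_fj_response(n), 3))
```

The takeaway matches the abstract's multipath observation: each doubling of N buys the same additive synchronization penalty, so the marginal benefit of extra paths shrinks quickly.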
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly efficient policies
automatically. Our system, Decima, uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to a 2x improvement during periods of high cluster load.
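The RL formulation can be made concrete with a toy REINFORCE loop, which is emphatically not Decima's design (Decima uses graph neural networks over job dependency graphs); here a softmax policy merely learns, from a reward equal to negative average job completion time, that running the shorter of two jobs first is optimal. All names are ours:

```python
import math
import random

def train_order_policy(episodes=2000, lr=0.1, seed=0):
    """Toy REINFORCE sketch: learn which of two jobs to run first so that
    average job completion time is minimized. Shortest-job-first is
    optimal, so the policy should learn to prefer action 0."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # preferences: 0 = shorter job first, 1 = longer first
    for _ in range(episodes):
        a, b = rng.uniform(1.0, 10.0), rng.uniform(1.0, 10.0)
        short, long_ = min(a, b), max(a, b)
        # Softmax policy over the two orderings.
        z0, z1 = math.exp(theta[0]), math.exp(theta[1])
        p0 = z0 / (z0 + z1)
        action = 0 if rng.random() < p0 else 1
        if action == 0:   # completions at short and short + long_
            reward = -(short + short + long_) / 2.0
        else:             # completions at long_ and long_ + short
            reward = -(long_ + long_ + short) / 2.0
        # REINFORCE update for a two-action softmax policy.
        probs = (p0, 1.0 - p0)
        for i in range(2):
            theta[i] += lr * reward * ((1.0 if i == action else 0.0) - probs[i])
    return theta
```

The high-level objective (minimize average completion time) is the only supervision, mirroring the abstract's point that no hand-crafted heuristic is provided; what Decima adds on top is the machinery to make this work at the scale and structure of real cluster workloads.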