Shared-object System Equilibria: Delay and Throughput Analysis
We consider shared-object systems that require their threads to fulfill
system jobs by first sequentially acquiring the objects needed for a job and
then holding on to them until the job completes. Such systems are at the core
of a variety of shared-resource allocation and synchronization systems. This
work opens a new perspective for studying the expected job delay and
throughput analytically, given the possible set of jobs that may join the
system dynamically.
We identify the system dependencies that cause contention among the threads
as they try to acquire the job objects. We use these observations to define the
shared-object system equilibria. We note that the system is in equilibrium
whenever the rate at which jobs arrive at the system matches the job
completion rate. These equilibria consider not only the job delay but also
the job throughput, as well as the time for which each thread blocks other
threads in order to complete its job. We then study the thread work cycles in
detail and, using a graph representation of the problem, propose procedures
for finding and estimating equilibria, i.e., for discovering the job delay
and throughput as well as the blocking time.
To the best of our knowledge, this is a new perspective that can provide
better analytical tools for the problem, making it possible to estimate
performance measures, such as job delay and throughput in (distributed)
shared-object systems, that would otherwise be obtained through
experimentation on working systems and through simulation.
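To make the equilibrium notion concrete, the following sketch uses a Little's-law style fixed-point iteration to find a throughput at which the job arrival rate matches the completion rate. It is not the paper's procedure: the linear blocking model, parameter names, and values are illustrative assumptions only.

```python
# Illustrative sketch (not the paper's procedure): find the throughput at which
# the job arrival rate equals the job completion rate, assuming a simple model
# where per-job delay grows with the blocking induced by concurrently running jobs.

def job_delay(throughput, base_service_time, blocking_per_job):
    # Assumed contention model: each concurrently running job adds
    # 'blocking_per_job' seconds of waiting for the shared objects.
    concurrency = throughput * base_service_time          # Little's law estimate
    return base_service_time + concurrency * blocking_per_job

def find_equilibrium(arrival_rate, base_service_time, blocking_per_job,
                     threads=8, iters=100):
    """Fixed-point iteration: completion rate = threads / job_delay."""
    throughput = arrival_rate
    for _ in range(iters):
        delay = job_delay(throughput, base_service_time, blocking_per_job)
        completion_rate = threads / delay
        # In equilibrium the system cannot complete jobs faster than they arrive.
        throughput = min(arrival_rate, completion_rate)
    return throughput, job_delay(throughput, base_service_time, blocking_per_job)

if __name__ == "__main__":
    tput, delay = find_equilibrium(arrival_rate=50.0,      # jobs / second
                                   base_service_time=0.1,  # seconds per job
                                   blocking_per_job=0.02)  # blocking added per job
    print(f"equilibrium throughput ~ {tput:.1f} jobs/s, delay ~ {delay*1000:.0f} ms")
```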
MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is very useful in optimization problems
with high-dimensional non-convex target functions, and hence constitutes an
important component of several Machine Learning and Data Analytics methods.
Recently there has been significant work on understanding the parallelism
inherent in SGD and its convergence properties. Asynchronous, parallel SGD
(AsyncPSGD) has received particular attention, due to observed performance
benefits. On the other hand, asynchrony implies inherent challenges in
understanding the execution of the algorithm and its convergence, stemming from
the fact that the contribution of a thread might be based on an old (stale)
view of the state. In this work we aim to deepen the understanding of AsyncPSGD
in order to increase the statistical efficiency in the presence of stale
gradients. We propose new models for capturing the nature of the staleness
distribution in a practical setting. Using the proposed models, we derive a
staleness-adaptive SGD framework, MindTheStep-AsyncPSGD, for adapting the step
size in an online fashion, which provably reduces the negative impact of
asynchrony. Moreover, we provide general convergence time bounds for a wide
class of staleness-adaptive step size strategies for convex target functions.
We also provide a detailed empirical study, showing how our approach implies
faster convergence for deep learning applications.
Comment: 12 pages, 3 figures, accepted in IEEE BigData 201
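As a minimal illustration of the idea of a staleness-adaptive step size, consider the sketch below. The rule eta / (1 + staleness) is only one simple member of the general class of adaptive strategies; it is not the exact MindTheStep-AsyncPSGD adaptation, and all names are illustrative.

```python
# Illustrative sketch of a staleness-adaptive step size for asynchronous SGD.
# The rule eta / (1 + staleness) is only an example of a staleness-adaptive
# strategy; it is NOT the exact MindTheStep adaptation function.
import numpy as np

def adaptive_step(base_lr, staleness):
    """Shrink the step size as the applied gradient gets staler."""
    return base_lr / (1.0 + staleness)

def async_sgd_update(params, grad, grad_version, current_version, base_lr=0.1):
    # staleness = number of updates applied since this gradient was computed
    staleness = current_version - grad_version
    return params - adaptive_step(base_lr, staleness) * grad

if __name__ == "__main__":
    w = np.zeros(3)
    # A fresh gradient (staleness 0) and a stale one (staleness 4)
    w = async_sgd_update(w, np.ones(3), grad_version=0, current_version=0)
    w = async_sgd_update(w, np.ones(3), grad_version=1, current_version=5)
    print(w)   # the stale gradient moves the parameters less
```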
Evaluating passive neighborhood discovery for Low Power Listening MAC protocols
Low Power Listening (LPL) MAC protocols are widely used for duty cycling in today's sensor networks. Their simplicity and power efficiency ensure a long network lifetime when nodes are battery driven, and their easy deployment and low maintenance cost make them suitable for hard-to-access places and harsh conditions. We argue that to fully utilize the energy efficiency provided by LPL, other protocols in the protocol stack should be aware of its mechanisms. In this paper, we focus on neighborhood discovery protocols and discuss their energy-efficient integration with LPL. We then study the possibility of using a completely passive approach for neighborhood discovery in such networks and provide an analytical model for its performance characteristics. We verify our performance model both by simulation and by an implementation in TinyOS. Our evaluation results confirm the efficiency of the proposed method in duty-cycled sensor networks.
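A toy version of such an analytical model, under strong simplifying assumptions (Poisson traffic per neighbor, and LPL preambles long enough that every in-range transmission is overheard during a periodic wake-up), could look as follows. This is not the model of the paper; rates and names are hypothetical.

```python
# Toy sketch of a passive-discovery model (simplifying assumptions, not the
# paper's analysis): each neighbour transmits as a Poisson process, and every
# transmission in range is overheard, so discovery of one neighbour is
# exponentially distributed in time.
import math

def p_discovered(lam, t):
    """Probability that one neighbour, transmitting at Poisson rate lam,
    has been overheard at least once within t seconds."""
    return 1.0 - math.exp(-lam * t)

def expected_time_to_hear_all(lams, dt=0.1, horizon=10_000.0):
    """E[time until every neighbour has been overheard], obtained by
    numerically integrating P(at least one neighbour still unheard at t)."""
    t, expected = 0.0, 0.0
    while t < horizon:
        p_not_done = 1.0 - math.prod(p_discovered(l, t) for l in lams)
        if p_not_done < 1e-6:
            break
        expected += p_not_done * dt
        t += dt
    return expected

if __name__ == "__main__":
    neighbours = [1 / 60.0] * 10            # each neighbour sends about once a minute
    print(f"P(a given neighbour heard within 5 min) = {p_discovered(1/60, 300):.3f}")
    print(f"E[time until all 10 are heard] ~ {expected_time_to_hear_all(neighbours):.0f} s")
```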
Multiple pattern matching for network security applications: Acceleration through vectorization (pre-print version)
As both new network attacks emerge and network traffic increases in volume, the need to perform network traffic inspection at high rates is ever increasing. The core of many security applications that inspect network traffic (such as Network Intrusion Detection) is pattern matching. At the same time, pattern matching is a major performance bottleneck for those applications: indeed, it has been shown to account for more than 70% of the total running time of Intrusion Detection Systems. Although numerous efficient approaches to this problem have been proposed on custom hardware, it is challenging for pattern matching algorithms to benefit from the advances in commodity hardware. This becomes even more relevant with the adoption of Network Function Virtualization, which moves network services, such as Network Intrusion Detection, to the cloud, where scaling on commodity hardware is key for performance. In this paper, we tackle the problem of pattern matching and show how to leverage the architectural features found in commodity platforms. We present efficient algorithmic designs that achieve good cache locality and make use of modern vectorization techniques to utilize data parallelism within each core. We first identify properties of pattern matching that make it fit for vectorization and show how to use them in the algorithmic design. Second, we build on an earlier, cache-aware algorithmic design and show how we apply cache locality combined with SIMD gather instructions to pattern matching. Third, we complement our algorithms with an analytical model that predicts their performance and that can be used to easily evaluate alternative designs. We evaluate our algorithmic design with open data sets of real-world network traffic: our results on two different platforms, Haswell and Xeon Phi, show a speedup of 1.8x and 3.6x, respectively, over Direct Filter Classification (DFC), a recently proposed algorithm by Choi et al. for pattern matching exploiting cache locality, and a speedup of more than 2.3x over Aho-Corasick, a widely used algorithm in today's Intrusion Detection Systems. Finally, we utilize highly parallel hardware platforms, evaluate the scalability of our algorithms, and compare it to parallel implementations of DFC and Aho-Corasick, achieving a processing throughput of up to 45 Gbps and close to 2 times higher throughput than Aho-Corasick.
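To illustrate the filter-then-verify structure that makes this kind of pattern matching amenable to vectorization, here is a simplified sketch. It is not DFC or the paper's algorithm; numpy fancy indexing merely stands in for the role SIMD gather instructions play on real hardware, and the patterns and inputs are made up.

```python
# Simplified sketch of the direct-filtering idea behind DFC-style matching
# (not the paper's algorithm): a small filter indexed by 2-byte windows rules
# out most positions cheaply; surviving positions are verified exactly.
import numpy as np

def build_filter(patterns):
    table = np.zeros(65536, dtype=bool)
    for p in patterns:
        table[p[0] | (p[1] << 8)] = True          # first two bytes of each pattern
    return table

def match(data: bytes, patterns, table):
    buf = np.frombuffer(data, dtype=np.uint8)
    keys = buf[:-1].astype(np.uint16) | (buf[1:].astype(np.uint16) << 8)
    candidates = np.nonzero(table[keys])[0]        # vectorized "gather" + filter
    hits = []
    for pos in candidates:                         # exact verification step
        for p in patterns:
            if data[pos:pos + len(p)] == p:
                hits.append((int(pos), p))
    return hits

if __name__ == "__main__":
    patterns = [b"attack", b"exploit"]
    table = build_filter(patterns)
    print(match(b"benign traffic with an attack payload", patterns, table))
```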
Lock-free Concurrent Data Structures
Concurrent data structures are the data sharing side of parallel programming.
Data structures give the means to the program to store data, but also provide
operations to the program to access and manipulate these data. These operations
are implemented through algorithms that have to be efficient. In the sequential
setting, data structures are crucially important for the performance of the
respective computation. In the parallel programming setting, their importance
becomes even greater because of the increased use of data and resource
sharing for utilizing parallelism.
The first and main goal of this chapter is to provide sufficient background
and intuition to help the interested reader navigate the complex research
area of lock-free data structures. The second goal is to offer the programmer
enough familiarity with the subject to allow her to use truly concurrent
methods.
Comment: To appear in "Programming Multi-core and Many-core Computing
Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and
Distributed Computing
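As a flavor of what "lock-free" means in practice, the sketch below shows the optimistic retry loop of a Treiber-style stack. Python has no hardware compare-and-swap on object references, so the CAS is emulated here purely for readability; real lock-free implementations rely on primitives such as C++ std::atomic::compare_exchange or Java's AtomicReference.

```python
# Illustration of the retry-loop structure of a lock-free (Treiber) stack.
# The CAS below is emulated with a lock only so the example runs in plain
# Python; it stands in for a hardware compare-and-swap instruction.
import threading

class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, nxt):
        self.value, self.next = value, nxt

class TreiberStack:
    def __init__(self):
        self._top = None
        self._cas_lock = threading.Lock()     # stands in for hardware CAS only

    def _cas_top(self, expected, new):
        with self._cas_lock:                  # atomically: if top == expected, swap
            if self._top is expected:
                self._top = new
                return True
            return False

    def push(self, value):
        while True:                           # optimistic retry loop
            old = self._top
            if self._cas_top(old, Node(value, old)):
                return

    def pop(self):
        while True:
            old = self._top
            if old is None:
                return None                   # empty stack
            if self._cas_top(old, old.next):
                return old.value

if __name__ == "__main__":
    s = TreiberStack()
    s.push(1); s.push(2)
    print(s.pop(), s.pop(), s.pop())          # 2 1 None
```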
Geographical Peer Matching for P2P Energy Sharing
Significant cost reductions attract ever more households to invest in
small-scale renewable electricity generation and storage. Such distributed
resources are not used in the most effective way when only used individually,
as sharing them provides even greater cost savings. Energy Peer-to-Peer (P2P)
systems have thus been shown to be beneficial for prosumers and consumers
through reductions in energy cost while also being attractive to grid or
service providers. However, many practical challenges have to be overcome
before all players can benefit from efficient and automated local energy
communities; such challenges include the inherent complexity of matching
geographically distributed peers and the significant computation required to
calculate the local matching preferences. Hence, dedicated algorithms are
needed to perform a cost-efficient matching of thousands of peers in a
computationally efficient fashion. In this work, we define and analyze a
precise mathematical model of the geographical peer matching problem and
several heuristics for solving it. Our experimental study, based on
real-world energy data, demonstrates that our solutions are efficient both in
terms of cost savings achieved by the peers and in terms of communication and
computing requirements. Our scalable algorithms thus provide one core building
block for practical and data-efficient peer-to-peer energy sharing communities
within large-scale optimization systems.
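For intuition only, a naive greedy pairing heuristic for geographical peer matching might look as follows. This is not one of the paper's heuristics; squared Euclidean distance and the prosumer/consumer pairing stand in for the real cost-based matching preferences.

```python
# Naive greedy sketch of geographical peer matching (illustrative only):
# pair each unmatched prosumer with the nearest unmatched consumer, using
# squared Euclidean distance as a stand-in for the real matching cost.

def greedy_pair(prosumers, consumers):
    """prosumers, consumers: lists of (id, x, y). Returns (prosumer_id, consumer_id) pairs."""
    free = list(consumers)
    pairs = []
    for pid, px, py in prosumers:
        if not free:
            break
        best = min(free, key=lambda c: (c[1] - px) ** 2 + (c[2] - py) ** 2)
        free.remove(best)
        pairs.append((pid, best[0]))
    return pairs

if __name__ == "__main__":
    prosumers = [("p1", 0.0, 0.0), ("p2", 5.0, 5.0)]
    consumers = [("c1", 4.0, 4.0), ("c2", 1.0, 0.0), ("c3", 9.0, 9.0)]
    print(greedy_pair(prosumers, consumers))   # [('p1', 'c2'), ('p2', 'c1')]
```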
TinTiN: Travelling in time (if necessary) to deal with out-of-order data in streaming aggregation
Cyber-Physical Systems (CPS) rely on data stream processing for high-throughput, low-latency analysis with correctness and accuracy guarantees (building on deterministic execution) for monitoring, safety or security applications. The trade-offs in processing performance and results' accuracy are nonetheless application-dependent. While some applications need strict deterministic execution, others can value fast (but possibly approximated) answers. Despite the existing literature on how to relax and trade strict determinism for efficiency or deadlines, we lack a formal characterization of levels of determinism, needed by industries to assess whether or not such trade-offs are acceptable. To bridge the gap, we introduce the notion of D-bounded eventual determinism, where D is the maximum out-of-order delay of the input data. We design and implement TinTiN, a streaming middleware that can be used in combination with user-defined streaming applications, to provably enforce D-bounded eventual determinism. We evaluate TinTiN with a real-world streaming application for Advanced Metering Infrastructure (AMI) monitoring, showing it provides an order of magnitude improvement in processing performance, while minimizing delays in output generation, compared to a state-of-the-art strictly deterministic solution that waits for time proportional to D, for each input tuple, before generating output that depends on it.
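To make the D-bounded out-of-order setting concrete, the sketch below shows the baseline buffer-and-reorder strategy that waits on the order of D per tuple, i.e., the kind of strictly deterministic solution TinTiN is compared against; it is not TinTiN itself, and the class name and data are illustrative.

```python
# Sketch of a baseline buffer-and-reorder operator: if the input is at most
# D time units out of order, holding each tuple until a tuple with timestamp
# greater than its own plus D has been seen lets us emit in timestamp order.
# This illustrates "D-bounded" input; it is NOT TinTiN.
import heapq

class Reorderer:
    def __init__(self, D):
        self.D = D
        self.buffer = []                 # min-heap ordered by timestamp

    def insert(self, ts, value):
        """Feed one possibly out-of-order tuple; return tuples now safe to emit in order."""
        heapq.heappush(self.buffer, (ts, value))
        ready = []
        while self.buffer and self.buffer[0][0] <= ts - self.D:
            ready.append(heapq.heappop(self.buffer))
        return ready

if __name__ == "__main__":
    r = Reorderer(D=5)
    out = []
    for ts, v in [(1, "a"), (4, "b"), (2, "c"), (9, "d"), (8, "e"), (15, "f")]:
        out.extend(r.insert(ts, v))
    print(out)    # tuples released in timestamp order once it is safe
```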
Managing your Trees: Insights from a Metropolitan-Scale Low-Power Wireless Network
Low-power wireless, such as IEEE 802.15.4, is envisioned as one key technology for wireless control and communication. In the context of the Advanced Metering Infrastructure (AMI), it serves as an energy-efficient communication technology for both building-scale and city-scale networks. Understanding real-world challenges and key properties of 802.15.4-based networks is an essential requirement for both the research community and practitioners: when deploying and operating low-power wireless networks at metropolitan scale, deep knowledge is essential to ensure network availability and performance at production-level quality. Similarly, researchers require realistic network models when developing new algorithms and protocols.
In this paper, we present new, real-world insights from a deployed metropolitan-scale low-power wireless network: it includes 300,000 individual wirelessly connected meters and covers a city with roughly 600,000 inhabitants. Our findings, for example, help to estimate real-world parameters such as the typical size of routing trees, their balance, and their dynamics over time. Moreover, these insights facilitate the understanding and realistic calibration of simulation models with respect to key properties such as reliability and throughput.
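As an illustration of the kind of routing-tree statistics discussed (tree size, depth, balance), the sketch below computes them from an assumed child-to-parent table; the data layout, names, and toy tree are hypothetical and not the paper's tooling.

```python
# Illustrative sketch (assumed data layout, not the paper's tooling): given a
# routing tree as a child -> parent table with the gateway as root, compute
# per-node depth and subtree size, the kind of statistics reported for
# metropolitan-scale routing trees.
from collections import defaultdict

def tree_stats(parent):
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    depth, size = {root: 0}, {}
    order = [root]
    for node in order:                     # breadth-first: parents before children
        for c in children[node]:
            depth[c] = depth[node] + 1
            order.append(c)
    for node in reversed(order):           # children before parents
        size[node] = 1 + sum(size[c] for c in children[node])
    return depth, size

if __name__ == "__main__":
    # Toy tree: gateway G with two subtrees of different size (imbalance).
    parent = {"G": None, "A": "G", "B": "G", "C": "A", "D": "A", "E": "C"}
    depth, size = tree_stats(parent)
    print("max depth:", max(depth.values()))                            # 3
    print("subtree sizes under G:", {c: size[c] for c in ("A", "B")})   # {'A': 4, 'B': 1}
```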