Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning
Fine-tuning distributed systems is considered a craft, relying
on intuition and experience. This becomes even more challenging when the
systems need to react in near real time, as streaming engines have to do to
maintain pre-agreed service quality metrics. In this article, we present an
automated approach that builds on a combination of supervised and reinforcement
learning methods to recommend the most appropriate lever configurations based
on previous load. With this, streaming engines can be automatically tuned
without requiring a human to determine the right way and proper time to deploy
them. This opens the door to new configurations that are not being applied
today since the complexity of managing these systems has surpassed the
abilities of human experts. We show how reinforcement learning systems can find
substantially better configurations in less time than their human counterparts
and adapt to changing workloads.
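The recommendation loop described above can be sketched with tabular Q-learning. Everything below (the load levels, lever names, and reward) is an illustrative stand-in, not the paper's actual setup, which combines supervised and reinforcement learning over a real streaming engine.

```python
import random
from collections import defaultdict

random.seed(0)  # deterministic toy run

STATES = ["low", "medium", "high"]        # discretized incoming load
ACTIONS = ["small", "balanced", "large"]  # candidate lever configurations
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

def reward(state, action):
    # Toy reward: a configuration sized to the load keeps latency low.
    return 1.0 if STATES.index(state) == ACTIONS.index(action) else -1.0

q = defaultdict(float)

def choose(state):
    # Epsilon-greedy exploration over lever configurations.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

for _ in range(2000):
    s = random.choice(STATES)        # observe the current load level
    a = choose(s)
    r = reward(s, a)
    s_next = random.choice(STATES)   # load drifts independently in this toy
    best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
    q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])

def recommend(state):
    # Greedy recommendation once training has converged.
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

After training, `recommend` maps each observed load level to the learned best configuration, which is the shape of the "recommend the most appropriate lever configurations based on previous load" idea in the abstract.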
Group-In: Group Inference from Wireless Traces of Mobile Devices
This paper proposes Group-In, a wireless scanning system to detect static or
mobile people groups in indoor or outdoor environments. Group-In collects only
wireless traces from the Bluetooth-enabled mobile devices for group inference.
The key problem addressed in this work is to detect not only static groups but
also moving groups, with a multi-phased approach based only on the noisy
Received Signal Strength Indicators (RSSIs) observed by multiple wireless
scanners without localization support. We propose new centralized and
decentralized schemes to process the sparse and noisy wireless data, and
leverage graph-based clustering techniques for group detection from short-term
and long-term aspects. Group-In provides two outcomes: 1) group detection in
short time intervals such as two minutes and 2) long-term linkages such as a
month. To verify the performance, we conduct two experimental studies. One
consists of 27 controlled scenarios in the lab environments. The other is a
real-world scenario where we place Bluetooth scanners in an office environment,
and employees carry beacons for more than one month. Both the controlled and
real-world experiments result in high accuracy group detection in short time
intervals under varied sampling conditions, in terms of the Jaccard index and
pairwise similarity coefficient.

Comment: This work has been funded by the EU Horizon 2020 Programme under
Grant Agreements No. 731993 AUTOPILOT and No. 871249 LOCUS projects. The
content of this paper does not reflect the official opinion of the EU.
Responsibility for the information and views expressed therein lies entirely
with the authors. Proc. of ACM/IEEE IPSN'20, 202
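The graph-based clustering idea can be sketched as follows: each device is a vector of RSSI values seen by the scanners in one time window, similar vectors are linked, and connected components become candidate groups. The distance measure, threshold, and readings below are illustrative assumptions, not Group-In's actual schemes.

```python
import math
from itertools import combinations

def rssi_distance(a, b):
    # Euclidean distance between two devices' per-scanner RSSI vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def detect_groups(readings, threshold=6.0):
    """readings: {device_id: [rssi_at_scanner_1, rssi_at_scanner_2, ...]}"""
    devices = list(readings)
    # Build adjacency: an edge when two devices "look alike" to the scanners.
    adj = {d: set() for d in devices}
    for d1, d2 in combinations(devices, 2):
        if rssi_distance(readings[d1], readings[d2]) <= threshold:
            adj[d1].add(d2)
            adj[d2].add(d1)
    # Connected components (depth-first search) become candidate groups.
    seen, groups = set(), []
    for d in devices:
        if d in seen:
            continue
        stack, comp = [d], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups

# One hypothetical two-minute window, three scanners (RSSI in dBm):
window = {
    "phone_a": [-48, -71, -80],
    "phone_b": [-50, -70, -78],   # close to phone_a at every scanner
    "phone_c": [-82, -55, -60],   # seen from elsewhere
}
```

Running `detect_groups(window)` links `phone_a` and `phone_b` and leaves `phone_c` alone. Long-term linkages would then aggregate such short-term components over many windows.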
Managing Uncertainty: A Case for Probabilistic Grid Scheduling
Grid technology is evolving into a global, service-oriented
architecture, a universal platform for delivering future high-demand
computational services. Strong adoption of the Grid and the utility computing
concept is leading to an increasing number of Grid installations running a wide
range of applications of different size and complexity. In this paper we
address the problem of delivering deadline/economy-based scheduling in a
heterogeneous application environment using statistical properties of jobs'
historical executions and their associated meta-data. This approach is motivated
by a study of six-month computational load generated by Grid applications in a
multi-purpose Grid cluster serving a community of twenty e-Science projects.
The observed job statistics, resource utilisation and user behaviour are
discussed in the context of management approaches and models most suitable for
supporting a probabilistic and autonomous scheduling architecture.
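One minimal way to use historical executions probabilistically, assuming nothing about the paper's actual models: estimate P(runtime <= deadline) for each candidate resource from an empirical distribution of past runtimes, and dispatch to the resource with the highest estimate. The resource names and runtime histories below are illustrative.

```python
def p_meets_deadline(history, deadline):
    # Empirical probability that a job finishes within the deadline,
    # estimated from past runtimes of similar jobs on this resource.
    if not history:
        return 0.0
    return sum(1 for t in history if t <= deadline) / len(history)

def choose_resource(histories, deadline):
    """histories: {resource_name: [past runtimes in minutes]}"""
    return max(histories, key=lambda r: p_meets_deadline(histories[r], deadline))

# Hypothetical histories for two resources (runtimes in minutes):
histories = {
    "cluster_fast": [12, 14, 15, 30, 13],   # usually fast, one outlier
    "cluster_slow": [22, 25, 24, 23, 26],   # steady but slower
}
```

With a 20-minute deadline the scheduler prefers `cluster_fast` (4/5 past runs met it versus 0/5); with a 28-minute deadline the steadier `cluster_slow` wins (5/5 versus 4/5), which is the kind of uncertainty-aware trade-off a probabilistic scheduler can make and a deterministic one cannot.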
Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection
In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology
and framework for efficient and effective real-time malware detection,
leveraging the best of conventional machine learning (ML) and deep learning
(DL) algorithms. In PROPEDEUTICA, all software processes in the system start
execution subjected to a conventional ML detector for fast classification. If a
piece of software receives a borderline classification, it is subjected to
further analysis via more performance expensive and more accurate DL methods,
via our newly proposed DL algorithm DEEPMALWARE. Further, we introduce delays
to the execution of software subjected to deep learning analysis as a way to
"buy time" for DL analysis and to rate-limit the impact of possible malware in
the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and
877 commonly used benign software samples from various categories for the
Windows OS. Our results show that the false positive rate for conventional ML
methods can reach 20%, and for modern DL methods it is usually below 6%.
However, the classification time for DL can be 100X longer than conventional ML
methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional
ML method) to 90.25%, and reduced the detection time by 54.86%. Further, the
percentage of software subjected to DL analysis was approximately 40% on
average. Moreover, the application of delays to software subjected to ML
analysis reduced the detection time by approximately 10%. Finally, we found and discussed a
discrepancy between the detection accuracy offline (analysis after all traces
are collected) and on-the-fly (analysis in tandem with trace collection). Our
insights show that conventional ML and modern DL-based malware detectors in
isolation cannot meet the needs of efficient and effective malware detection:
high accuracy, low false positive rate, and short classification time.

Comment: 17 pages, 7 figures
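The two-stage routing logic can be sketched as a cascade: a fast classifier scores every process, and only borderline scores pay the latency cost of the slower, more accurate detector. The thresholds and scoring callables below are illustrative stand-ins; the paper's actual models (including DEEPMALWARE) are not reproduced here.

```python
# Borderline band for the fast stage: scores inside it are escalated.
FAST_LOW, FAST_HIGH = 0.3, 0.7

def cascade_classify(score_fast, score_slow):
    """score_fast / score_slow: callables returning P(malware) in [0, 1].

    score_slow is only invoked when the fast stage is uncertain, so most
    processes are classified at the fast stage's cost.
    """
    p = score_fast()
    if p < FAST_LOW:
        return "benign", "fast"
    if p > FAST_HIGH:
        return "malware", "fast"
    # Borderline: escalate to the expensive, more accurate detector.
    return ("malware" if score_slow() > 0.5 else "benign"), "slow"
```

For example, `cascade_classify(lambda: 0.1, lambda: 0.9)` resolves at the fast stage as benign, while `cascade_classify(lambda: 0.5, lambda: 0.9)` escalates and returns malware. Tuning the band width trades the fraction of software escalated (the abstract reports roughly 40%) against overall accuracy.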