2,448 research outputs found

    Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning

    Get PDF
    Fine tuning distributed systems is considered to be a craftsmanship, relying on intuition and experience. This becomes even more challenging when the systems need to react in near real time, as streaming engines have to do to maintain pre-agreed service quality metrics. In this article, we present an automated approach that builds on a combination of supervised and reinforcement learning methods to recommend the most appropriate lever configurations based on previous load. With this, streaming engines can be automatically tuned without requiring a human to determine the right way and proper time to deploy them. This opens the door to new configurations that are not being applied today since the complexity of managing these systems has surpassed the abilities of human experts. We show how reinforcement learning systems can find substantially better configurations in less time than their human counterparts and adapt to changing workloads

    MLPerf Inference Benchmark

    Full text link
    Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability.Comment: ISCA 202

    Optimization towards Efficiency and Stateful of dispel4py

    Full text link
    Scientific workflows bridge scientific challenges with computational resources. While dispel4py, a stream-based workflow system, offers mappings to parallel enactment engines like MPI or Multiprocessing, its optimization primarily focuses on dynamic process-to-task allocation for improved performance. An efficiency gap persists, particularly with the growing emphasis on conserving computing resources. Moreover, the existing dynamic optimization lacks support for stateful applications and grouping operations. To address these issues, our work introduces a novel hybrid approach for handling stateful operations and groupings within workflows, leveraging a new Redis mapping. We also propose an auto-scaling mechanism integrated into dispel4py's dynamic optimization. Our experiments showcase the effectiveness of auto-scaling optimization, achieving efficiency while upholding performance. In the best case, auto-scaling reduces dispel4py's runtime to 87% compared to the baseline, using only 76% of process resources. Importantly, our optimized stateful dispel4py demonstrates a remarkable speedup, utilizing just 32% of the runtime compared to the contender.Comment: 13 pages, 13 figure

    Resource optimization of edge servers dealing with priority-based workloads by utilizing service level objective-aware virtual rebalancing

    Get PDF
    IoT enables profitable communication between sensor/actuator devices and the cloud. Slow network causing Edge data to lack Cloud analytics hinders real-time analytics adoption. VRebalance solves priority-based workload performance for stream processing at the Edge. BO is used in VRebalance to prioritize workloads and find optimal resource configurations for efficient resource management. Apache Storm platform was used with RIoTBench IoT benchmark tool for real-time stream processing. Tools were used to evaluate VRebalance. Study shows VRebalance is more effective than traditional methods, meeting SLO targets despite system changes. VRebalance decreased SLO violation rates by almost 30% for static priority-based workloads and 52.2% for dynamic priority-based workloads compared to hill climbing algorithm. Using VRebalance decreased SLO violations by 66.1% compared to Apache Storm\u27s default allocation

    Resource provisioning and scheduling algorithms for hybrid workflows in edge cloud computing

    Get PDF
    In recent years, Internet of Things (IoT) technology has been involved in a wide range of application domains to provide real-time monitoring, tracking and analysis services. The worldwide number of IoT-connected devices is projected to increase to 43 billion by 2023, and IoT technologies are expected to engaged in 25% of business sector. Latency-sensitive applications in scope of intelligent video surveillance, smart home, autonomous vehicle, augmented reality, are all emergent research directions in industry and academia. These applications are required connecting large number of sensing devices to attain the desired level of service quality for decision accuracy in a sensitive timely manner. Moreover, continuous data stream imposes processing large amounts of data, which adds a huge overhead on computing and network resources. Thus, latency-sensitive and resource-intensive applications introduce new challenges for current computing models, i.e, batch and stream. In this thesis, we refer to the integrated application model of stream and batch applications as a hybrid work ow model. The main challenge of the hybrid model is achieving the quality of service (QoS) requirements of the two computation systems. This thesis provides a systemic and detailed modeling for hybrid workflows which describes the internal structure of each application type for purposes of resource estimation, model systems tuning, and cost modeling. For optimizing the execution of hybrid workflows, this thesis proposes algorithms, techniques and frameworks to serve resource provisioning and task scheduling on various computing systems including cloud, edge cloud and cooperative edge cloud. Overall, experimental results provided in this thesis demonstrated strong evidences on the responsibility of proposing different understanding and vision on the applications of integrating stream and batch applications, and how edge computing and other emergent technologies like 5G networks and IoT will contribute on more sophisticated and intelligent solutions in many life disciplines for more safe, secure, healthy, smart and sustainable society

    BDEv 3.0: energy efficiency and microarchitectural characterization of Big Data processing frameworks

    Get PDF
    This is a post-peer-review, pre-copyedit version of an article published in Future Generation Computer Systems. The final authenticated version is available online at: https://doi.org/10.1016/j.future.2018.04.030[Abstract] As the size of Big Data workloads keeps increasing, the evaluation of distributed frameworks becomes a crucial task in order to identify potential performance bottlenecks that may delay the processing of large datasets. While most of the existing works generally focus only on execution time and resource utilization, analyzing other important metrics is key to fully understanding the behavior of these frameworks. For example, microarchitecture-level events can bring meaningful insights to characterize the interaction between frameworks and hardware. Moreover, energy consumption is also gaining increasing attention as systems scale to thousands of cores. This work discusses the current state of the art in evaluating distributed processing frameworks, while extending our Big Data Evaluator tool (BDEv) to extract energy efficiency and microarchitecture-level metrics from the execution of representative Big Data workloads. An experimental evaluation using BDEv demonstrates its usefulness to bring meaningful information from popular frameworks such as Hadoop, Spark and Flink.Ministerio de EconomĂ­a, Industria y Competitividad; TIN2016-75845-PMinisterio de EducaciĂłn; FPU14/02805Ministerio de EducaciĂłn; FPU15/0338

    Energy Aware Runtime Systems for Elastic Stream Processing Platforms

    Get PDF
    Following an invariant growth in the required computational performance of processors, the multicore revolution started around 20 years ago. This revolution was mainly an answer to power dissipation constraints restricting the increase of clock frequency in single-core processors. The multicore revolution not only brought in the challenge of parallel programming, i.e. being able to develop software exploiting the entire capabilities of manycore architectures, but also the challenge of programming heterogeneous platforms. The question of “on which processing element to map a specific computational unit?”, is well known in the embedded community. With the introduction of general-purpose graphics processing units (GPGPUs), digital signal processors (DSPs) along with many-core processors on different system-on-chip platforms, heterogeneous parallel platforms are nowadays widespread over several domains, from consumer devices to media processing platforms for telecom operators. Finding mapping together with a suitable hardware architecture is a process called design-space exploration. This process is very challenging in heterogeneous many-core architectures, which promise to offer benefits in terms of energy efficiency. The main problem is the exponential explosion of space exploration. With the recent trend of increasing levels of heterogeneity in the chip, selecting the parameters to take into account when mapping software to hardware is still an open research topic in the embedded area. For example, the current Linux scheduler has poor performance when mapping tasks to computing elements available in hardware. The only metric considered is CPU workload, which as was shown in recent work does not match true performance demands from the applications. Doing so may produce an incorrect allocation of resources, resulting in a waste of energy. The origin of this research work comes from the observation that these approaches do not provide full support for the dynamic behavior of stream processing applications, especially if these behaviors are established only at runtime. This research will contribute to the general goal of developing energy-efficient solutions to design streaming applications on heterogeneous and parallel hardware platforms. Streaming applications are nowadays widely spread in the software domain. Their distinctive characiteristic is the retrieving of multiple streams of data and the need to process them in real time. The proposed work will develop new approaches to address the challenging problem of efficient runtime coordination of dynamic applications, focusing on energy and performance management.Efter en oförĂ€nderlig tillvĂ€xt i prestandakrav hos processorer, började den flerkĂ€rniga processor-revolutionen för ungefĂ€r 20 Ă„r sedan. Denna revolution skedde till största del som en lösning till begrĂ€nsningar i energieffekten allt eftersom klockfrekvensen kontinuerligt höjdes i en-kĂ€rniga processorer. Den flerkĂ€rniga processor-revolutionen medförde inte enbart utmaningen gĂ€llande parallellprogrammering, m.a.o. förmĂ„gan att utveckla mjukvara som anvĂ€nder sig av alla delelement i de flerkĂ€rniga processorerna, men ocksĂ„ utmaningen med programmering av heterogena plattformar. FrĂ„gestĂ€llningen ”pĂ„ vilken processorelement skall en viss berĂ€kning utföras?” Ă€r vĂ€l kĂ€nt inom ramen för inbyggda datorsystem. Efter introduktionen av grafikprocessorer för allmĂ€nna berĂ€kningar (GPGPU), signalprocesserings-processorer (DSP) samt flerkĂ€rniga processorer pĂ„ olika system-on-chip plattformar, Ă€r heterogena parallella plattformar idag omfattande inom mĂ„nga domĂ€ner, frĂ„n konsumtionsartiklar till mediaprocesseringsplattformar för telekommunikationsoperatörer. Processen att placera berĂ€kningarna pĂ„ en passande hĂ„rdvaruplattform kallas för utforskning av en designrymd (design-space exploration). Denna process Ă€r mycket utmanande för heterogena flerkĂ€rniga arkitekturer, och kan medföra fördelar nĂ€r det gĂ€ller energieffektivitet. Det största problemet Ă€r att de olika valmöjligheterna i designrymden kan vĂ€xa exponentiellt. Enligt den nuvarande trenden som förespĂ„r ökad heterogeniska aspekter i processorerna Ă€r utmaningen att hitta den mest passande placeringen av berĂ€kningarna pĂ„ hĂ„rdvaran Ă€nnu en forskningsfrĂ„ga inom ramen för inbyggda datorsystem. Till exempel, den nuvarande schemalĂ€ggaren i Linux operativsystemet Ă€r inkapabel att hitta en effektiv placering av berĂ€kningarna pĂ„ den underliggande hĂ„rdvaran. Det enda mĂ€tsĂ€ttet som anvĂ€nds Ă€r processorns belastning vilket, som visats i tidigare forskning, inte motsvarar den verkliga prestandan i applikationen. AnvĂ€ndning av detta mĂ€tsĂ€tt vid resursallokering resulterar i slöseri med energi. Denna forskning hĂ€rstammar frĂ„n observationerna att dessa tillvĂ€gagĂ„ngssĂ€tt inte stöder det dynamiska beteendet hos ström-processeringsapplikationer (stream processing applications), speciellt om beteendena bara etableras vid körtid. Denna forskning kontribuerar till det allmĂ€nna mĂ„let att utveckla energieffektiva lösningar för ström-applikationer (streaming applications) pĂ„ heterogena flerkĂ€rniga hĂ„rdvaruplattformar. Ström-applikationer Ă€r numera mycket vanliga i mjukvarudomĂ€n. Deras distinkta karaktĂ€r Ă€r inlĂ€sning av flertalet dataströmmar, och behov av att processera dem i realtid. Arbetet i denna forskning understöder utvecklingen av nya sĂ€tt för att lösa det utmanade problemet att effektivt koordinera dynamiska applikationer i realtid och fokus pĂ„ energi- och prestandahantering

    A Survey on Automatic Parameter Tuning for Big Data Processing Systems

    Get PDF
    Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators grapple with understanding and tuning them to achieve good performance. We investigate existing approaches on parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.Peer reviewe

    Dynamic data placement and discovery in wide-area networks

    Get PDF
    The workloads of online services and applications such as social networks, sensor data platforms and web search engines have become increasingly global and dynamic, setting new challenges to providing users with low latency access to data. To achieve this, these services typically leverage a multi-site wide-area networked infrastructure. Data access latency in such an infrastructure depends on the network paths between users and data, which is determined by the data placement and discovery strategies. Current strategies are static, which offer low latencies upon deployment but worse performance under a dynamic workload. We propose dynamic data placement and discovery strategies for wide-area networked infrastructures, which adapt to the data access workload. We achieve this with data activity correlation (DAC), an application-agnostic approach for determining the correlations between data items based on access pattern similarities. By dynamically clustering data according to DAC, network traffic in clusters is kept local. We utilise DAC as a key component in reducing access latencies for two application scenarios, emphasising different aspects of the problem: The first scenario assumes the fixed placement of data at sites, and thus focusses on data discovery. This is the case for a global sensor discovery platform, which aims to provide low latency discovery of sensor metadata. We present a self-organising hierarchical infrastructure consisting of multiple DAC clusters, maintained with an online and distributed split-and-merge algorithm. This reduces the number of sites visited, and thus latency, during discovery for a variety of workloads. The second scenario focusses on data placement. This is the case for global online services that leverage a multi-data centre deployment to provide users with low latency access to data. We present a geo-dynamic partitioning middleware, which maintains DAC clusters with an online elastic partition algorithm. It supports the geo-aware placement of partitions across data centres according to the workload. This provides globally distributed users with low latency access to data for static and dynamic workloads.Open Acces
    • 

    corecore