Search CORE

63 research outputs found

Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management

Author: Fernandez RC
Kalyvianaki E
Migliavacca M
Pietzuch P
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2013
Field of study

As users of big data applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the pay-as-you-go model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs - systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results. Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the check-pointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures. Copyright © 2013 ACM

CiteSeerX

City Research Online

Crossref

Spiral - Imperial College Digital Repository

Kent Academic Repository

Recommended from our members

The Design and Implementation of Low-Latency Prediction Serving Systems

Author: Crankshaw Daniel
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Machine learning is being deployed in a growing number of applications which demand real- time, accurate, and cost-efficient predictions under heavy query load. These applications employ a variety of machine learning frameworks and models, often composing several models within the same application. However, most machine learning frameworks and systems are optimized for model training and not deployment.In this thesis, I discuss three prediction serving systems designed to meet the needs of modern interactive machine learning applications. The key idea in this work is to utilize a decoupled, layered design that interposes systems on top of training frameworks to build low-latency, scalable serving systems. Velox introduced this decoupled architecture to enable fast online learning and model personalization in response to feedback. Clipper generalized this system architecture to be framework-agnostic and introduced a set of optimizations to reduce and bound prediction latency and improve prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. And InferLine provisions and manages the individual stages of prediction pipelines to minimize cost while meeting end-to-end tail latency constraints

eScholarship - University of California

Architecting Energy Efficient Servers.

Author: Kgil Tae Ho
Publication venue
Publication date: 01/01/2007
Field of study

This dissertation investigates how energy efficient servers can be architected using current and future technology. We leverage recent trends in packaging and device technology to deliver low power and high throughput. Specifically at the package level, this dissertation looks at 3D stacking technology that has emerged as a promising solution in achieving energy efficiency by delivering high throughput at a low cost. It shows how one would leverage this new technology into a datacenter. 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing. Our proposed architecture leveraging this technology, PicoServer, employs 3D technology to bond one die containing several simple slow processing cores to multiple memory dies sufficient for a primary memory. The multiple memory dies are composed of DRAM. 3D stacking technology also enables wide low-latency buses between processors and memory. These remove the need for an L2 cache allowing its area to be re-allocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. Lower clock frequency along with the integration of non-volatile memory in turn reduces power and means that thermal constraints, a concern with 3D stacking, are easily satisfied. The PicoServer architecture targets server applications,which exhibit a high degree of thread level parallelism. An architecture targeted to efficient throughput is ideal for this application domain. At the memory device level, this dissertation investigates how the system memory could be re-architected to reduce the rising power consumption of system memory and disk drives. Flash memory has emerged as a strong candidate to reduce system memory power while remaining cost effective than conventional system memory. This dissertation discusses how Flash could be integrated at the system level and provides insights on the architectural support for Flash in servers. Our architecture uses a two level disk cache composed of a relatively small DRAM, which includes a primary disk cache, and a Flash based secondary disk cache. Further, based on our observations, we found that the Flash based disk caches should be split into a read optimized disk cache and write optimized disk cache.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/57602/2/tkgil_1.pd

CiteSeerX

Deep Blue Documents at the University of Michigan

Satellite Networks: Architectures, Applications, and Technologies

Author: Bhasin Kul
Publication venue
Publication date
Field of study

Since global satellite networks are moving to the forefront in enhancing the national and global information infrastructures due to communication satellites' unique networking characteristics, a workshop was organized to assess the progress made to date and chart the future. This workshop provided the forum to assess the current state-of-the-art, identify key issues, and highlight the emerging trends in the next-generation architectures, data protocol development, communication interoperability, and applications. Presentations on overview, state-of-the-art in research, development, deployment and applications and future trends on satellite networks are assembled

NASA Technical Reports Server

Cloud-computing strategies for sustainable ICT utilization : a decision-making framework for non-expert Smart Building managers

Author: Mualla Karmin Jamil
Publication venue: Energy, Geoscience, Infrastructure and Society
Publication date: 01/08/2016
Field of study

Virtualization of processing power, storage, and networking applications via cloud-computing allows Smart Buildings to operate heavy demand computing resources off-premises. While this approach reduces in-house costs and energy use, recent case-studies have highlighted complexities in decision-making processes associated with implementing the concept of cloud-computing. This complexity is due to the rapid evolution of these technologies without standardization of approach by those organizations offering cloud-computing provision as a commercial concern. This study defines the term Smart Building as an ICT environment where a degree of system integration is accomplished. Non-expert managers are highlighted as key users of the outcomes from this project given the diverse nature of Smart Buildings’ operational objectives. This research evaluates different ICT management methods to effectively support decisions made by non-expert clients to deploy different models of cloud-computing services in their Smart Buildings ICT environments. The objective of this study is to reduce the need for costly 3rd party ICT consultancy providers, so non-experts can focus more on their Smart Buildings’ core competencies rather than the complex, expensive, and energy consuming processes of ICT management. The gap identified by this research represents vulnerability for non-expert managers to make effective decisions regarding cloud-computing cost estimation, deployment assessment, associated power consumption, and management flexibility in their Smart Buildings ICT environments. The project analyses cloud-computing decision-making concepts with reference to different Smart Building ICT attributes. In particular, it focuses on a structured programme of data collection which is achieved through semi-structured interviews, cost simulations and risk-analysis surveys. The main output is a theoretical management framework for non-expert decision-makers across variously-operated Smart Buildings. Furthermore, a decision-support tool is designed to enable non-expert managers to identify the extent of virtualization potential by evaluating different implementation options. This is presented to correlate with contract limitations, security challenges, system integration levels, sustainability, and long-term costs. These requirements are explored in contrast to cloud demand changes observed across specified periods. Dependencies were identified to greatly vary depending on numerous organizational aspects such as performance, size, and workload. The study argues that constructing long-term, sustainable, and cost-efficient strategies for any cloud deployment, depends on the thorough identification of required services off and on-premises. It points out that most of today’s heavy-burdened Smart Buildings are outsourcing these services to costly independent suppliers, which causes unnecessary management complexities, additional cost, and system incompatibility. The main conclusions argue that cloud-computing cost can differ depending on the Smart Building attributes and ICT requirements, and although in most cases cloud services are more convenient and cost effective at the early stages of the deployment and migration process, it can become costly in the future if not planned carefully using cost estimation service patterns. The results of the study can be exploited to enhance core competencies within Smart Buildings in order to maximize growth and attract new business opportunities

ROS: The Research Output Service. Heriot-Watt University Edinburgh

Analyzing Granger causality in climate data with time series classification methods

Author: Decubber Stijn
Demuzere Matthias
Miralles Diego
Papagiannopoulou Christina
Verhoest Niko
Waegeman Willem
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Attribution studies in climate science aim for scientifically ascertaining the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested

Ghent University Academic Bibliography

Technologies and Applications for Big Data Value

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/05/2022
Field of study

This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications for the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part “Technologies and Methods” contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part “Processes and Applications” details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences, first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI. Second, practitioners and industry experts engaged in data-driven systems, software design and deployment projects who are interested in employing these advanced methods to address real-world problems

Directory of Open Access Books (DOAB)