Asynchronous Snapshots of Actor Systems for Latency-Sensitive Applications
The actor model is popular for many types of server applications. Efficient snapshotting of applications is crucial for deploying pre-initialized applications or moving running applications to different machines, e.g., for debugging purposes. A key issue is that snapshotting blocks all other operations. In modern latency-sensitive applications, stopping the application to persist its state needs to be avoided, because users may not tolerate the increased request latency. To minimize the impact of snapshotting on request latency, our approach persists the application's state asynchronously by capturing partial heaps, completing snapshots step by step. Additionally, our solution is transparent and supports arbitrary object graphs. We prototyped our snapshotting approach on top of the Truffle/Graal platform and evaluated it with the Savina benchmarks and the Acme Air microservice application. When performing a snapshot every thousand Acme Air requests, the number of slow requests (0.007% of all requests) with latency above 100 ms increases by 5.43%. Our Savina microbenchmark results detail how different utilization patterns impact snapshotting cost. To the best of our knowledge, this is the first system that enables asynchronous snapshotting of actor applications, i.e., without stop-the-world synchronization, and thereby minimizes the impact on latency. We thus believe it enables new deployment and debugging options for actor systems.
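The abstract describes completing a snapshot step by step instead of stopping the world. The Python sketch below illustrates only that general idea: an object graph, which may contain cycles, is captured a bounded number of objects per step, so the application can keep serving requests between steps. It is not the Truffle/Graal implementation; the class `IncrementalSnapshot`, the `budget` parameter, and the dict-based object model are hypothetical, and the sketch assumes the graph is not mutated while a snapshot is in progress, which a real system would have to handle.

```python
from collections import deque


class IncrementalSnapshot:
    """Capture a graph of dict-based objects a few objects per step.

    Hypothetical sketch: objects are plain dicts whose values are either
    primitives or references to other dicts. Cycles are handled by giving
    each object a record index the first time it is discovered.
    """

    def __init__(self, root, budget=2):
        self.budget = budget      # max objects captured per step
        self.index = {}           # id(obj) -> record index
        self.records = []         # one record per captured object
        self.pending = deque()    # discovered but not yet captured
        self._discover(root)

    def _discover(self, obj):
        key = id(obj)
        if key not in self.index:
            self.index[key] = len(self.records)
            self.records.append(None)  # filled in when the object is captured
            self.pending.append(obj)
        return self.index[key]

    def step(self):
        """Capture up to `budget` objects; return True once the snapshot is complete."""
        captured = 0
        while self.pending and captured < self.budget:
            obj = self.pending.popleft()
            record = {}
            for field, value in obj.items():
                if isinstance(value, dict):
                    record[field] = ("ref", self._discover(value))
                else:
                    record[field] = ("val", value)
            self.records[self.index[id(obj)]] = record
            captured += 1
        return not self.pending


# A small cyclic graph: a -> b -> a, plus a -> c.
a = {"name": "a"}
b = {"name": "b", "back": a}
c = {"name": "c"}
a["next"] = b
a["other"] = c

snap = IncrementalSnapshot(a, budget=1)
steps = 0
while not snap.step():
    steps += 1  # the application could serve requests here between steps
# snap.records holds 3 records; the cycle is stored as index references
```

A smaller `budget` shortens each pause at the cost of more steps per snapshot, which is the latency trade-off the abstract is concerned with.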
A Deep Reinforcement Learning based Algorithm for Time and Cost Optimized Scaling of Serverless Applications
Serverless computing has gained strong traction in the cloud computing
community in recent years. Among the many benefits of this novel computing
model, the rapid auto-scaling capability of user applications takes prominence.
However, offering ad hoc scaling of user deployments at the function level
introduces many complications to serverless systems. One very prevalent
shortcoming is the cold-start delay: the added delay and failures in function
request executions caused by the time consumed in dynamically creating new
resources to suit function workloads. Maintaining idle resource pools to
alleviate this issue often results in wasted resources from the cloud provider
perspective. Existing solutions to address this limitation mostly focus on
predicting and understanding function load levels in order to proactively
create required resources. Although these solutions improve function
performance, their lack of understanding of the overall system characteristics
when making scaling decisions often leads to sub-optimal usage of system
resources. Further, the multi-tenant nature of serverless systems requires a
scalable solution adaptable to multiple co-existing applications, which most
current solutions do not provide. In this paper, we introduce a novel
multi-agent Deep Reinforcement Learning based intelligent solution for both
horizontal and vertical scaling of function resources, based on a comprehensive
understanding of both function and system requirements. Our solution improves
function performance by reducing cold starts, while also offering service
providers the flexibility to optimize resource maintenance cost. Experiments
under varying workload scenarios show improvements of up to 23% in application
latency and 34% in request failures, while also saving up to 45% in
infrastructure cost for the service providers. Comment: 15 pages, 22 figures
Distributed virtual environment scalability and security
Distributed virtual environments (DVEs) have been an active area of research and engineering for more than 20 years. The most widely deployed DVEs are network games such as Quake, Halo, and World of Warcraft (WoW), with millions of users and billions of dollars in annual revenue. Deployed DVEs remain expensive centralized implementations despite significant research outlining ways to distribute DVE workloads.
This dissertation shows that previous DVE research evaluations are inconsistent with deployed DVE needs. Assumptions about avatar movement and proximity, which are fundamental scale factors, do not match WoW's workload, and likely do not match the workload of other deployed DVEs. Alternate workload models are explored and preliminary conclusions presented. Using realistic workloads, it is shown that a fully decentralized DVE cannot be deployed to today's consumers, regardless of its overhead.
Residential broadband speeds are improving, and this limitation will eventually disappear. When it does, appropriate security mechanisms will be a fundamental requirement for technology adoption.
A trusted auditing system ("Carbon") is presented which has good security, scalability, and resource characteristics for decentralized DVEs. When performing exhaustive auditing, Carbon adds 27% network overhead to a decentralized DVE with a WoW-like workload. This resource consumption can be reduced significantly, depending upon the DVE's risk tolerance.
Finally, the Pairwise Random Protocol (PRP) is described. PRP enables adversaries to fairly resolve probabilistic activities, an ability missing from most decentralized DVE security proposals.
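The abstract does not specify how PRP works. As a hedged illustration of the underlying problem, the sketch below uses a generic two-party commit-reveal scheme, a standard way for mutually distrusting players to agree on a random outcome such as a die roll: each player commits to a secret contribution before seeing the other's, so neither can bias the combined result. This is not necessarily how PRP works, and the function names `commit`, `verify`, and `roll` are hypothetical.

```python
import hashlib
import secrets


def commit(value: int, nonce: bytes) -> str:
    """Hash commitment: binds a player to `value` while hiding it until reveal."""
    return hashlib.sha256(nonce + value.to_bytes(4, "big")).hexdigest()


def verify(commitment: str, value: int, nonce: bytes) -> bool:
    """Check that a revealed (value, nonce) pair matches the earlier commitment."""
    return commit(value, nonce) == commitment


def roll(value_a: int, value_b: int, sides: int) -> int:
    """Combine both contributions; uniform if either one is uniform and independent."""
    return (value_a + value_b) % sides + 1  # die face in 1..sides


sides = 20

# Phase 1: each player picks a secret contribution and publishes only a commitment.
a_val, a_nonce = secrets.randbelow(sides), secrets.token_bytes(16)
b_val, b_nonce = secrets.randbelow(sides), secrets.token_bytes(16)
a_commit, b_commit = commit(a_val, a_nonce), commit(b_val, b_nonce)

# Phase 2: once both commitments are exchanged, each player reveals (value, nonce).
if not (verify(a_commit, a_val, a_nonce) and verify(b_commit, b_val, b_nonce)):
    raise ValueError("a revealed value does not match its commitment")
result = roll(a_val, b_val, sides)
```

The random nonce prevents the other player from brute-forcing a committed value out of only `sides` possibilities before the reveal.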
Thus, this dissertation's contribution is to address two of the obstacles to deploying research on decentralized DVE architectures: first, the lack of evidence that research results apply to existing DVEs; second, the lack of security systems combining appropriate security guarantees with acceptable overhead.
The AXIOM software layers
The AXIOM project aims at developing a heterogeneous computing board (SMP-FPGA). The software layers developed in the AXIOM project are explained. OmpSs provides an easy way to execute heterogeneous code on multiple cores. People and objects will soon share the same digital network for information exchange in what has been called the age of cyber-physical systems. The general expectation is that people and systems will interact in real time. This puts pressure on system design to support increasing demands for computational power while keeping a low power envelope. Additionally, modular scaling and easy programmability are important for these systems to become widespread. Together, these expectations impose scientific and technological challenges that need to be properly addressed. The AXIOM project (Agile, eXtensible, fast I/O Module) will research new hardware/software architectures for cyber-physical systems to meet these expectations. The technical approach aims at solving fundamental problems to enable easy programmability of heterogeneous multi-core multi-board systems. AXIOM proposes the use of the task-based OmpSs programming model, leveraging low-level communication interfaces provided by the hardware. Modular scalability will be possible thanks to a fast interconnect embedded into each module. To this end, an innovative ARM- and FPGA-based board will be designed, with enhanced capabilities for interfacing with the physical world. Its effectiveness will be demonstrated with key scenarios such as Smart Video-Surveillance and Smart Living/Home (domotics). Peer Reviewed. Postprint (author's final draft)
A Survey on the Evolution of Stream Processing Systems
Stream processing has been an active research field for more than 20 years,
but it is now witnessing its prime time due to recent successful efforts by the
research community and numerous worldwide open-source communities. This survey
provides a comprehensive overview of fundamental aspects of stream processing
systems and their evolution in the functional areas of out-of-order data
management, state management, fault tolerance, high availability, load
management, elasticity, and reconfiguration. We review noteworthy past research
findings, outline the similarities and differences between early ('00-'10) and
modern ('11-'18) streaming systems, and discuss recent trends and open
problems. Comment: 34 pages, 15 figures, 5 tables
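Out-of-order data management is one of the functional areas the survey covers. A common technique in this area is the watermark, an estimate of how far event time has progressed. The Python sketch below is a minimal, hypothetical example of that idea, not code from any system the survey reviews: it counts events in tumbling event-time windows, uses a fixed bounded-delay heuristic (maximum event time seen minus `max_delay`) as the watermark, emits a window once the watermark reaches its end, and records later events for already-emitted windows as late.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Count events per event-time window using a bounded-delay watermark.

    Hypothetical sketch: the watermark is the maximum event time seen so far
    minus `max_delay`. A window [start, start + size) is emitted once the
    watermark reaches its end; events for emitted windows are recorded as late.
    """

    def __init__(self, size: int, max_delay: int):
        self.size = size
        self.max_delay = max_delay
        self.max_seen = None          # largest event time observed
        self.watermark = None         # max_seen - max_delay
        self.open = defaultdict(int)  # window start -> event count
        self.late = []                # events that arrived after their window closed

    def on_event(self, t: int):
        """Ingest one event time; return windows that became complete."""
        start = t - t % self.size
        if self.watermark is not None and start + self.size <= self.watermark:
            self.late.append(t)
            return []
        self.open[start] += 1
        self.max_seen = t if self.max_seen is None else max(self.max_seen, t)
        self.watermark = self.max_seen - self.max_delay
        ready = sorted(s for s in self.open if s + self.size <= self.watermark)
        return [(s, self.open.pop(s)) for s in ready]


counter = TumblingWindowCounter(size=10, max_delay=5)
emitted = []
for t in [1, 12, 4, 17, 3, 25]:  # event times, arriving out of order
    emitted.extend(counter.on_event(t))
# emitted == [(0, 2), (10, 2)]; event 3 arrives after window [0, 10) closed
```

A larger `max_delay` admits more out-of-order events at the cost of emitting results later, which is the basic latency/completeness trade-off behind watermark design.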
Peer-to-peer network architecture for massive online gaming
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. Johannesburg, 2014. Virtual worlds and massive multiplayer online games are amongst the most popular applications on the
Internet. In order to host these applications a reliable architecture is required. It is essential for the
architecture to handle high user loads, maintain a complex game state, promptly respond to game interactions,
and prevent cheating, amongst other properties. Many of today's Massive Multiplayer Online
Games (MMOG) use client-server architectures to provide multiplayer service. Clients (players) send
their actions to a server. The latter calculates the game state and publishes the information to the clients.
Although the client-server architecture has been widely adopted in the past for MMOG, it suffers from
many limitations. First, applications based on a client-server architecture are difficult to support and
maintain given the dynamic user base of online games. Such architectures do not easily scale (or handle
heavy loads). Also, the server constitutes a single point of failure. We argue that peer-to-peer architectures
can provide better support for MMOG. Peer-to-peer architectures can enable the user base to scale
to a large number. They also limit disruptions experienced by players due to other nodes failing.
This research designs and implements a peer-to-peer architecture for MMOG. The peer-to-peer architecture
aims at reducing message latency over the network and on the application layer. We refine the
communication between nodes in the architecture to reduce network latency by using SPDY, a protocol
designed to reduce web page load time. For the application layer, an event-driven paradigm was used to
process messages. Through user load simulation, we show that our peer-to-peer design is able to process
and reliably deliver messages in a timely manner. Furthermore, by distributing the work conducted by a
game server, our research shows that a peer-to-peer architecture responds more quickly to requests
than client-server models.
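The dissertation processes messages at the application layer with an event-driven paradigm. The sketch below shows that general pattern with Python's `asyncio`: incoming messages are queued and dispatched to a handler registered for their type, so a peer only does work when an event arrives. It is an illustrative assumption, not the dissertation's implementation; the class `EventLoopPeer`, the message format, and the `move` handler are hypothetical, and the SPDY transport layer is not modeled.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Handler = Callable[[dict], Awaitable[None]]


class EventLoopPeer:
    """Hypothetical peer that processes game messages as events from a queue."""

    def __init__(self):
        self.inbox: asyncio.Queue = asyncio.Queue()
        self.handlers: Dict[str, Handler] = {}
        self.state = {"positions": {}}

    def on(self, kind: str, handler: Handler) -> None:
        """Register the handler for one message type."""
        self.handlers[kind] = handler

    async def run(self) -> None:
        """Dispatch queued messages until a None sentinel arrives."""
        while True:
            msg = await self.inbox.get()
            if msg is None:
                break
            handler = self.handlers.get(msg["type"])
            if handler is not None:  # unknown message types are ignored
                await handler(msg)


async def main():
    peer = EventLoopPeer()

    async def on_move(msg):
        peer.state["positions"][msg["player"]] = msg["pos"]

    peer.on("move", on_move)
    runner = asyncio.create_task(peer.run())
    for msg in [
        {"type": "move", "player": "p1", "pos": (1, 2)},
        {"type": "chat", "player": "p2", "text": "hi"},  # no handler registered
        {"type": "move", "player": "p1", "pos": (3, 4)},
    ]:
        await peer.inbox.put(msg)
    await peer.inbox.put(None)  # stop the loop after the queued messages
    await runner
    return peer.state


state = asyncio.run(main())
```

Because handlers run one at a time from a single queue, game state updates in this sketch need no locks; a real peer would additionally need networking and handling of slow or failing handlers.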
Towards Implicit Parallel Programming for Systems
Multi-core processors require a program to be decomposable into independent parts that can execute in parallel in order to scale performance with the number of cores. But parallel programming is hard, especially when the program requires state, which many system programs use for optimization, for example a cache to reduce disk I/O. Most prevalent parallel programming models do not support a notion of state and require the programmer to synchronize state access manually, i.e., outside the realm of an associated optimizing compiler. This prevents the compiler from introducing parallelism automatically and requires the programmer to optimize the program manually.
In this dissertation, we propose a programming language/compiler co-design to provide a new programming model for implicit parallel programming with state and a compiler that can optimize the program for a parallel execution.
We define the notion of a stateful function along with its composition and control structures. An example implementation of a highly scalable server shows that stateful functions integrate smoothly into existing programming language concepts, such as object-oriented programming and programming with structs. Our programming model is also highly practical and allows existing code bases to be adapted gradually. As a case study, we implemented a new data processing core for the Hadoop Map/Reduce system to overcome existing performance bottlenecks. Our lambda-calculus-based compiler automatically extracts parallelism without changing the program's semantics. We added further domain-specific, semantics-preserving transformations that reduce I/O calls for microservice programs. The runtime format of a program is a dataflow graph that can be executed in parallel, performs concurrent I/O, and allows for non-blocking live updates.
Control and systems software for the Cosmology Large Angular Scale Surveyor (CLASS)
The Cosmology Large Angular Scale Surveyor (CLASS) is an array of
polarization-sensitive millimeter wave telescopes that observes ~70% of the sky
at frequency bands centered near 40 GHz, 90 GHz, 150 GHz, and 220 GHz from the
Atacama desert of northern Chile. Here, we describe the architecture of the
software used to control the telescopes, acquire data from the various
instruments, schedule observations, monitor the status of the instruments and
observations, create archival data packages, and transfer data packages to
North America for analysis. The computer and network architecture of the CLASS
observing site is also briefly discussed. This software and architecture have
been in use since 2016, operating the telescopes day and night throughout the
year, and have proven successful in fulfilling their design goals. Comment: 19 pages, 8 figures, to appear in Proc. SPIE