
    Profiling Distributed Virtual Environments by Tracing Causality

    Real-time interactive systems such as virtual environments have high performance requirements, and profiling is a key part of the optimisation process to meet them. Traditional techniques based on metadata and static analysis have difficulty following causality in asynchronous systems. In this paper we explore a new technique for such systems. Timestamped samples of the system state are recorded at instrumentation points at runtime. These are assembled into a graph, and edges between dependent samples are recovered. This approach minimises the invasiveness of the instrumentation while retaining high accuracy. We describe how our instrumentation can be implemented natively in common environments, how its output can be processed into a graph describing causality, and how heterogeneous data sources can be incorporated to maximise the scope of the profiling. Across three case studies, we demonstrate the efficacy of this approach and how it supports a variety of metrics for comprehensively benchmarking distributed virtual environments.
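    As a rough illustration of the technique this abstract describes, the sketch below records timestamped samples tagged with a propagated work-item token and recovers happened-before edges between dependent samples. It is a minimal sketch, not the paper's implementation; all names (Sample, build_graph, the instrumentation points) are hypothetical.

    ```python
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Sample:
        point: str        # instrumentation point that emitted the sample
        timestamp: float  # wall-clock time at which the sample was recorded
        token: int        # identifier propagated along with the work item

    def build_graph(samples):
        """Recover edges between dependent samples: consecutive samples
        carrying the same token are linked in timestamp order."""
        edges = []
        latest_by_token = {}
        for s in sorted(samples, key=lambda s: s.timestamp):
            prev = latest_by_token.get(s.token)
            if prev is not None:
                edges.append((prev, s))  # prev happened-before s for this token
            latest_by_token[s.token] = s
        return edges

    samples = [Sample("render", 0.003, 7),
               Sample("network_send", 0.001, 7),
               Sample("network_recv", 0.002, 7)]
    for a, b in build_graph(samples):
        print(f"{a.point} -> {b.point}")
    ```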

    Replication and fault-tolerance in real-time systems

    PhD Thesis. The increased availability of sophisticated computer hardware and the corresponding decrease in its cost has led to widespread growth in the use of computer systems for real-time plant and process control applications. Such applications typically place very high demands upon computer control systems, and the development of appropriate control software for these application areas can present a number of problems not normally encountered in other applications. First of all, real-time applications must be correct in the time domain as well as the value domain: returning results which are not only correct but also delivered on time. Further, since the potential for catastrophic failures can be high in a process or plant control environment, many real-time applications also have to meet high reliability requirements. These requirements will typically be met by means of a combination of fault avoidance and fault tolerance techniques. This thesis is intended to address some of the problems encountered in the provision of fault tolerance in real-time applications programs. Specifically, it considers the use of replication to ensure the availability of services in real-time systems. In a real-time environment, providing support for replicated services can introduce a number of problems. In particular, the scope for non-deterministic behaviour in real-time applications can be quite large, and this can lead to difficulties in maintaining consistent internal states across the members of a replica group. To tackle this problem, a model is proposed for fault-tolerant real-time objects which not only allows such objects to perform application-specific recovery operations and real-time processing activities such as event handling, but which also allows objects to be replicated. The architectural support required for such replicated objects is also discussed and, to conclude, the run-time overheads associated with the use of such replicated services are considered. (Funded by the Science and Engineering Research Council.)
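    The consistency problem the thesis highlights can be made concrete with a small sketch: if each replica samples its own clock (a non-deterministic input), replica states diverge, whereas resolving the value once for the whole group keeps them consistent. This is a generic illustration of the issue, not the thesis's object model; all class and method names are invented.

    ```python
    import time

    class ReplicatedTimerObject:
        """One member of a replica group; its persistent state must match
        the other members' state for the group to stay consistent."""
        def __init__(self):
            self.deadline = None

        def handle_set_timeout(self, resolved_now, interval):
            # resolved_now is decided once for the whole group (e.g. by a
            # leader) rather than read from each replica's own clock,
            # removing one source of non-determinism.
            self.deadline = resolved_now + interval

    group = [ReplicatedTimerObject() for _ in range(3)]
    resolved_now = time.time()   # the non-deterministic input, resolved once
    for replica in group:
        replica.handle_set_timeout(resolved_now, 5.0)
    assert len({r.deadline for r in group}) == 1  # all replicas agree
    ```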

    Submicron Systems Architecture Project: Semiannual Technical Report

    The Mosaic C is an experimental fine-grain multicomputer based on single-chip nodes. The Mosaic C chip includes 64KB of fast dynamic RAM, a processor, a packet interface, ROM for bootstrap and self-test, and a two-dimensional self-timed router. The chip architecture provides low-overhead and low-latency handling of message packets, and high memory and network bandwidth. Sixty-four Mosaic chips are packaged by tape-automated bonding (TAB) in an 8 x 8 array on circuit boards that can, in turn, be arrayed in two dimensions to build arbitrarily large machines. These 8 x 8 boards are now in prototype production under a subcontract with Hewlett-Packard. We are planning to construct a 16K-node Mosaic C system from 256 of these boards. The suite of Mosaic C hardware also includes host-interface boards and high-speed communication cables. The hardware developments and activities of the past eight months are described in section 2.1. The programming system that we are developing for the Mosaic C is based on the same message-passing, reactive-process, computational model that we have used with earlier multicomputers, but the model is implemented for the Mosaic in a way that supports fine-grain concurrency. A process executes only in response to receiving a message, and may in execution send messages, create new processes, and modify its persistent variables before it either exits or becomes dormant in preparation for receiving another message. These computations are expressed in an object-oriented programming notation, a derivative of C++ called C+-. The computational model and the C+- programming notation are described in section 2.2. The Mosaic C runtime system, which is written in C+-, provides automatic process placement and highly distributed management of system resources. The Mosaic C runtime system is described in section 2.3.
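    The reactive-process model is simple enough to sketch. The fragment below, in Python rather than the C+- notation the report describes, shows a process that runs only upon message receipt, updates a persistent variable, and then becomes dormant until the next message; the scheduler and names are illustrative, not the Mosaic runtime.

    ```python
    from collections import deque

    mailbox = deque()   # (process, message) pairs in flight

    def make_counter(name):
        count = 0       # persistent variable, retained across messages
        def handle(msg):
            nonlocal count
            count += msg
            print(f"{name}: count={count}")
            # the process now becomes dormant until its next message
        return handle

    proc = make_counter("proc0")
    mailbox.extend([(proc, 2), (proc, 3)])
    while mailbox:                  # scheduler: deliver one message at a time
        target, msg = mailbox.popleft()
        target(msg)                 # a process executes only upon receipt
    ```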

    Transaction Processing over Geo-Partitioned Data

    Databases are a fundamental component of any web service, storing and managing all the service data. In large-scale web services, it is essential that the data storage systems used consider techniques such as partial replication, geo-replication, and weaker consistency models so that the expectations placed on these systems regarding availability and latency can be met as well as possible. In this dissertation, we address the problem of executing transactions on data that is partially replicated. In this sense, we adopt transactional causal consistency semantics, the consistency model where a transaction accesses a causally consistent snapshot of the database. However, implementing this consistency model in a partially replicated setting raises several challenges in handling transactions that access data items replicated on different nodes. Our work aims to design and implement a novel algorithm for executing transactions over geo-partitioned data with transactional causal consistency semantics. We discuss the problems and design choices for executing transactions over partially replicated data and present a design to implement the proposed algorithm by extending a weakly consistent geo-replicated key-value store with partial replication, adding support for executing transactions involving geo-partitioned data items. In this context, we also address the problem of deciding the best strategy for searching data in replicas that hold only a part of the total data of a service and where the state of each replica might diverge. We evaluate our solution using microbenchmarks based on the TPC-H database. Our results show that the overhead of the system is low for the expected scenario of a low ratio of remote transactions.
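    The core check behind a causally consistent snapshot can be sketched with vector clocks: a version is visible to a transaction only if every update it depends on is already included in the transaction's snapshot. This is an illustrative reading of transactional causal consistency, not the dissertation's algorithm; replica names and clock values are made up.

    ```python
    def causally_visible(version_deps, snapshot):
        """Both arguments are vector clocks, {replica_id: sequence_number}.
        A version is visible iff every dependency is within the snapshot."""
        return all(snapshot.get(r, 0) >= seq
                   for r, seq in version_deps.items())

    snapshot = {"eu-west": 4, "us-east": 7}   # the transaction's snapshot
    v1 = {"deps": {"eu-west": 3}, "value": "visible"}     # 3 <= 4: readable
    v2 = {"deps": {"us-east": 9}, "value": "too recent"}  # 9 >  7: not yet
    for v in (v1, v2):
        print(v["value"], "->", causally_visible(v["deps"], snapshot))
    ```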

    Estimating data divergence in cloud computing storage systems

    Dissertation for the degree of Master in Computer Engineering. Many internet services are provided through cloud computing infrastructures that are composed of multiple data centers. To provide high availability and low latency, data is replicated in machines in different data centers, which introduces the complexity of guaranteeing that clients view data consistently. Data stores often opt for a relaxed approach to replication, guaranteeing only eventual consistency, since it improves the latency of operations. However, this may lead to replicas having different values for the same data. One solution to control the divergence of data in eventually consistent systems is the usage of metrics that measure how stale data is for a replica. In the past, several algorithms have been proposed to estimate the value of these metrics in a deterministic way. An alternative solution is to rely on probabilistic metrics that estimate divergence with a certain degree of certainty. This relaxes the need to contact all replicas while still providing a relatively accurate measurement. In this work we designed and implemented a solution to estimate the divergence of data in eventually consistent data stores that scales to many replicas by allowing client-side caching. Measuring the divergence when there is a large number of clients calls for the development of new algorithms that provide probabilistic guarantees. Additionally, unlike previous works, we intend to focus on measuring the divergence relative to a state that can lead to the violation of application invariants. Partially funded by project PTDC/EIA-EIA/108963/2008 and by an ERC Starting Grant, Agreement Number 30773.
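    One way to picture the probabilistic metrics this abstract mentions: rather than contacting every replica to measure staleness exactly, sample a few replicas and report an estimate with a confidence interval. The sketch below is a generic illustration under that assumption, not the dissertation's algorithm, and all numbers are simulated.

    ```python
    import math
    import random
    import statistics

    def estimate_staleness(replica_lags, sample_size=5, z=1.96):
        """Estimate mean staleness (here: missing updates per replica) from
        a random sample of replicas, with a ~95% normal-approximation
        confidence interval."""
        sample = random.sample(replica_lags, sample_size)
        mean = statistics.mean(sample)
        half_width = z * statistics.stdev(sample) / math.sqrt(sample_size)
        return mean, half_width

    # simulated lag (number of missing updates) for each of 100 replicas
    lags = [random.randint(0, 20) for _ in range(100)]
    mean, half_width = estimate_staleness(lags)
    print(f"estimated staleness: {mean:.1f} +/- {half_width:.1f} updates")
    ```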

    Performance evaluation and model checking of probabilistic real-time actors

    This dissertation is composed of two parts. In the first part, performance evaluation and verification of safety properties are provided for real-time actors. Recently, the actor-based language Timed Rebeca was introduced to model distributed and asynchronous systems with timing constraints and message-passing communication. A toolset was developed for automated translation of Timed Rebeca models to Erlang. The translated code can be executed using a timed extension of McErlang for model checking and simulation. In the first part of this dissertation, we introduce a new toolset that provides statistical model checking of Timed Rebeca models. Using statistical model checking, we are now able to verify larger models against safety properties compared to McErlang model checking. We examine the typical case studies of elevators and ticket service to show the efficiency of statistical model checking and the applicability of our toolset. In the second part of this dissertation, we enhance our modeling ability and cover more properties through performance evaluation and model checking of probabilistic real-time actors. Distributed systems exhibit probabilistic and nondeterministic behaviors and may have time constraints. Probabilistic Timed Rebeca (PTRebeca) is introduced as a timed and probabilistic actor-based language for modeling distributed real-time systems with asynchronous message passing. The semantics of PTRebeca is a Timed Markov Decision Process (TMDP). We provide SOS rules for PTRebeca and develop two toolsets for analyzing PTRebeca models. The first toolset automatically generates a TMDP model from a PTRebeca model in the form of the input language of the PRISM model checker. We use PRISM for performance analysis of PTRebeca models against expected reachability and probabilistic reachability properties. Additionally, we develop another toolset to automatically generate a Markov Automaton from a PTRebeca model in the form of the input language of the Interactive Markov Chain Analyzer (IMCA). The IMCA can be used as the back-end model checker for performance analysis of PTRebeca models against expected reachability and probabilistic reachability properties. We present the time needed for the analysis of different case studies using the PRISM-based and IMCA-based approaches. The IMCA-based approach needs considerably less time, and so can analyze significantly larger models. We show the applicability of both approaches and the efficiency of our tools by analyzing several case studies and experimental results. The work on this dissertation was supported by the project "Timed Asynchronous Reactive Objects in Distributed Systems: TARO" (nr. 110020021) of the Icelandic Research Fund.
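    Statistical model checking, as used by the toolset described above, estimates the probability that a property holds by sampling many random executions instead of exhaustively exploring the state space. The sketch below illustrates the idea on a stand-in elevator-style model; it is not Timed Rebeca, and the distributions and bounds are invented.

    ```python
    import random

    def run_is_safe(deadline=10.0, passengers=20, mean_wait=4.0):
        """One random execution of a toy elevator model: safe iff no
        passenger's exponentially distributed wait exceeds the deadline."""
        return all(random.expovariate(1.0 / mean_wait) <= deadline
                   for _ in range(passengers))

    runs = 10_000
    safe = sum(run_is_safe() for _ in range(runs))
    print(f"P(safety holds) ~ {safe / runs:.3f} over {runs} sampled runs")
    ```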

    Performance Analysis of Live-Virtual-Constructive and Distributed Virtual Simulations: Defining Requirements in Terms of Temporal Consistency

    This research extends the knowledge of live-virtual-constructive (LVC) and distributed virtual simulations (DVS) through a detailed analysis and characterization of their underlying computing architecture. LVCs are characterized as a set of asynchronous simulation applications, each serving as both a producer and a consumer of shared state data. In terms of data aging characteristics, LVCs are found to be first-order linear systems. System performance is quantified via two opposing factors: the consistency of the distributed state space, and the response time or interaction quality of the autonomous simulation applications. A framework is developed that defines temporal data consistency requirements such that the objectives of the simulation are satisfied. Additionally, to develop simulations that reliably execute in real-time and accurately model hierarchical systems, two real-time design patterns are developed: a tailored version of the model-view-controller architecture pattern along with a companion Component pattern. Together they provide a basis for hierarchical simulation models, graphical displays, and network I/O in a real-time environment. For both LVCs and DVSs, the relationship between consistency and interactivity is established by mapping threads created by a simulation application to factors that control both interactivity and shared state consistency throughout a distributed environment.
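    A first-order linear data-aging model, as attributed to LVCs above, implies that a consumer's copy of shared state drifts from the producer's truth at a rate bounded by the state's rate of change times the age of the data. The sketch below works that arithmetic through; the function and the example numbers are illustrative, not the dissertation's framework.

    ```python
    def max_state_error(rate_of_change, update_period, network_latency):
        """Worst-case divergence between the producer's true state and a
        consumer's copy under a first-order (linear) aging model:
        error = rate of change of the state * maximum age of the data."""
        max_age = update_period + network_latency
        return rate_of_change * max_age

    # e.g. an entity moving at 5 m/s, updates every 100 ms, 40 ms latency
    print(f"max positional error: "
          f"{max_state_error(5.0, 0.100, 0.040):.2f} m")
    ```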

    Effective memory management for mobile environments

    Smartphones, tablets, and other mobile devices exhibit vastly different constraints compared to regular or classic computing environments like desktops, laptops, or servers. Mobile devices run dozens of so-called “apps” hosted by independent virtual machines (VMs). All these VMs run concurrently, and each VM deploys purely local heuristics to organize resources like memory, performance, and power. Such a design causes conflicts across all layers of the software stack, calling for the evaluation of VMs and of optimization techniques specific to mobile frameworks. In this dissertation, we study the design of managed runtime systems for mobile platforms. More specifically, we deepen the understanding of interactions between garbage collection (GC) and system layers. We develop tools to monitor the memory behavior of Android-based apps and to characterize GC performance, leading to the development of new techniques for memory management that address energy constraints, time performance, and responsiveness. We implement a GC-aware frequency scaling governor for Android devices. We also explore the tradeoffs of power and performance in vivo for a range of realistic GC variants, with established benchmarks and real applications running on Android virtual machines. We control for variation due to dynamic voltage and frequency scaling (DVFS), just-in-time (JIT) compilation, and across established dimensions of heap memory size and concurrency. Finally, we provision GC as a global service that collects statistics from all running VMs and then makes an informed decision that optimizes across all of them (and not just locally), and across all layers of the stack. Our evaluation illustrates the power of such a central coordination service and garbage collection mechanism in improving memory utilization, throughput, and adaptability to user activities. In fact, our techniques aim at a sweet spot where total on-chip energy is reduced (20–30%) with minimal impact on throughput and responsiveness (5–10%). The simplicity and efficacy of our approach reach well beyond the usual optimization techniques.
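    The global-service idea can be sketched as a coordinator that gathers heap statistics from every running VM and decides which app should collect next, instead of each VM deciding purely locally. The policy below (prefer the most memory-pressured background VM and avoid pausing the foreground app) is a plausible stand-in, not the dissertation's actual decision logic; all names and numbers are invented.

    ```python
    def pick_vm_to_collect(vm_stats, foreground_pid=None):
        """vm_stats: {pid: {'heap_used': MB, 'heap_limit': MB}}.
        Choose the most memory-pressured VM, but never the foreground app,
        so a GC pause is not felt by the user."""
        pressure = {pid: s["heap_used"] / s["heap_limit"]
                    for pid, s in vm_stats.items() if pid != foreground_pid}
        return max(pressure, key=pressure.get) if pressure else None

    stats = {101: {"heap_used": 48, "heap_limit": 64},
             102: {"heap_used": 30, "heap_limit": 64},
             103: {"heap_used": 60, "heap_limit": 64}}
    print("collect in VM:", pick_vm_to_collect(stats, foreground_pid=103))
    ```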

    CSP for Executable Scientific Workflows
