
    Implementation of the Metal Privileged Architecture

    The privileged architecture of modern processors is expanded through new architectural features that are implemented in hardware or through instruction set extensions. These extensions are tied to a particular architecture, and operating system developers are not able to customize the privileged mechanisms. As a result, they have to work around the fixed abstractions provided by processor vendors to implement the functionality they need. Programmable approaches such as PALcode also remain heavily tied to the hardware, and modifying the privileged architecture has to be done by the processor manufacturer. To accelerate operating system development and enable rapid prototyping of new operating system designs and features, we need to rethink the privileged architecture design. We present a new abstraction called Metal that enables extensions to the architecture by the operating system. It provides system developers with a general-purpose and easy-to-use interface to build a variety of facilities that range from performance measurement to novel privilege models. We implement a simplified version of the Alpha architecture, which we call μAlpha, and build a prototype of Metal on this architecture. μAlpha is a five-stage pipelined processor with a multi-level cache hierarchy. Lastly, we implement several facilities in Metal, including system calls and transactional memory, to show the practicality of Metal.
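
    To make the idea concrete, the following is a purely illustrative Python sketch of an OS-programmable dispatch table for privileged events such as system calls. The class and event names are hypothetical; Metal's actual interface is a hardware-level abstraction defined on the μAlpha prototype, not a software API like this.

```python
# Toy model of an OS-programmable privileged-event dispatcher.
# Purely illustrative: the names and structure are hypothetical and do not
# reflect Metal's real interface on the μAlpha prototype.

from typing import Callable, Dict


class PrivilegedDispatcher:
    """Routes privileged events (e.g. system calls, traps) to OS-defined handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., object]] = {}

    def register(self, event: str, handler: Callable[..., object]) -> None:
        # The operating system customizes the privileged mechanism
        # by installing its own handler for an event.
        self._handlers[event] = handler

    def raise_event(self, event: str, *args: object) -> object:
        # "Hardware" side: look up and invoke the OS-supplied handler.
        if event not in self._handlers:
            raise RuntimeError(f"no handler installed for {event!r}")
        return self._handlers[event](*args)


# Example: an OS registering a custom system-call facility.
dispatcher = PrivilegedDispatcher()
dispatcher.register("syscall.write", lambda fd, data: print(f"write({fd}): {data}"))
dispatcher.raise_event("syscall.write", 1, "hello from a custom syscall handler")
```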

    Enhancing virtualized application performance through dynamic adaptive paging mode selection


    Exploring Views on Data Centre Power Consumption and Server Virtualization

    The primary purpose of this thesis is to explore views on Green IT/Computing and how it relates to Server Virtualization, in particular for Data Centre IT environments. Our secondary purpose is to explore other important aspects of Server Virtualization in the same context. The primary research question was to determine whether Data Centre (DC) power consumption reduction is related to, or perceived as, a success factor for implementing and deploying server virtualization for consolidation purposes, and if not, what other decision areas affect Server Virtualization and power consumption reduction, respectively. The conclusions from our research are that there is a difference of opinion, among both promoters and deployers, regarding how to factor in power consumption reduction from server equipment. However, it was a common view that power consumption reduction was usually achieved, but not necessarily considered, and thus not evaluated, as a success factor, nor was actual power consumption measured or monitored after server virtualization deployment. We found that other factors seemed more important, such as lower cost through higher physical machine utilization and simplified high availability and disaster recovery capabilities.

    DevOps for Digital Leaders

    DevOps; continuous delivery; software lifecycle; concurrent parallel testing; service management; ITIL; GRC; PaaS; containerization; API management; lean principles; technical debt; end-to-end automation; automation

    Towards lightweight and high-performance hardware transactional memory

    Conventional lock-based synchronization serializes accesses to critical sections guarded by the same lock. Using multiple locks brings the possibility of a deadlock or a livelock in the program, making parallel programming a difficult task. Transactional Memory (TM) is a promising paradigm for parallel programming, offering an alternative to lock-based synchronization. TM eliminates the risk of deadlocks and livelocks, while providing the desirable semantics of atomicity, consistency, and isolation of critical sections. TM speculatively executes a series of memory accesses as a single, atomic transaction. The speculative changes of a transaction are kept private until the transaction commits. If a transaction can break atomicity or cause a deadlock or livelock, the TM system aborts the transaction and rolls back the speculative changes. To be effective, a TM implementation should provide high performance and scalability. While implementations of TM in pure software (STM) do not provide desirable performance, hardware TM (HTM) implementations introduce much smaller overhead and have relatively good scalability, due to their better control of hardware resources. However, many HTM systems support only transactions that fit limited hardware resources (for example, private caches), and fall back to software mechanisms if hardware limits are reached. These HTM systems, called best-effort HTMs, are not desirable since they force a programmer to think in terms of hardware limits, to use both HTM and STM, and to manage concurrent transactions in HTM and STM. In contrast with best-effort HTMs, unbounded HTM systems support overflowed transactions that do not fit into private caches. Unbounded HTM systems often require complex protocols or expensive hardware mechanisms for conflict detection between overflowed transactions. In addition, an execution with overflowed transactions is often much slower than an execution that has only regular transactions, typically due to the restrictive or approximate conflict management mechanisms used for overflowed transactions. In this thesis, we study hardware implementations of transactional memory and make three main contributions. First, we improve the general performance of HTM systems by proposing a scalable protocol for conflict management. The protocol has precise conflict detection, in contrast with the often-employed inexact Bloom-filter-based conflict detection, which can falsely report conflicts between transactions. Second, we propose a best-effort HTM that utilizes the new scalable conflict detection protocol, termed EazyHTM. EazyHTM allows parallel commits for all non-conflicting transactions and generally simplifies transaction commits. Finally, we propose an unbounded HTM that extends and improves the initial protocol for conflict management, which we name EcoTM. EcoTM features precise conflict detection, and it efficiently supports large as well as small and short transactions. The key idea of EcoTM is to leverage the observation that very few locations are actually conflicting, even if applications have high contention. In EcoTM, each core locally detects whether a cache line is non-conflicting, and the conflict detection mechanism is invoked only for the few potentially conflicting cache lines.
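
    The contrast between precise and signature-based conflict detection can be illustrated with a small, self-contained Python sketch. It is not the EazyHTM or EcoTM protocol; the tiny signature size and the address-to-bit mapping are assumptions chosen to make a false positive easy to see: two transactions that write disjoint addresses still collide in the signature.

```python
# Minimal sketch contrasting precise read/write-set conflict detection with a
# small Bloom-filter-style signature that can report false conflicts.
# Illustrative only; this does not model the EazyHTM or EcoTM protocols.

SIG_BITS = 16  # deliberately tiny so address aliasing is easy to trigger


def signature(addresses):
    """Set one bit per address; the bit index stands in for a real hash."""
    sig = 0
    for addr in addresses:
        sig |= 1 << (addr % SIG_BITS)
    return sig


class Transaction:
    def __init__(self, name):
        self.name = name
        self.read_set = set()
        self.write_set = set()


def precise_conflict(t1, t2):
    # Exact detection: a conflict requires a write to a location that the
    # other transaction has read or written.
    return bool(t1.write_set & (t2.read_set | t2.write_set) or
                t2.write_set & (t1.read_set | t1.write_set))


def signature_conflict(t1, t2):
    # Signature detection: any overlap of bits is treated as a conflict,
    # so aliased addresses produce false positives.
    s1 = signature(t1.read_set | t1.write_set)
    s2 = signature(t2.read_set | t2.write_set)
    return (s1 & s2) != 0


t1, t2 = Transaction("T1"), Transaction("T2")
t1.write_set.add(5)             # T1 writes cache line 5
t2.write_set.add(5 + SIG_BITS)  # T2 writes line 21: disjoint, but same bit

print("precise  :", precise_conflict(t1, t2))    # False: no real conflict
print("signature:", signature_conflict(t1, t2))  # True: false positive
```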

    The design and implementation of a prototype exokernel operating system

    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996. Includes bibliographical references (p. 99-106). By Dawson R. Engler. M.S.

    Mobility models, mobile code offloading, and p2p networks of smartphones on the cloud

    It was just a few years ago that I bought my first smartphone, and now (almost) all of my friends possess at least one of these powerful devices. International Data Corporation (IDC) reports that smartphone sales showed strong growth worldwide in 2011, with 491.4 million units sold, up 61.3 percent from 2010. Furthermore, IDC predicts that 686 million smartphones will be sold in 2012, 38.4 percent of all handsets shipped. Silently, we are becoming part of a big mobile smartphone network, and it is amazing how the perception of the world is changing thanks to these small devices. If, many years ago, the birth of the Internet made it possible to be online, smartphones nowadays allow us to be online all the time. Today we use smartphones to do many of the tasks we used to do on desktops, and many new ones. We browse the Internet, watch videos, upload data to social networks, use online banking, find our way by using GPS and online maps, and communicate in revolutionary ways. Along with these benefits, these fancy and exciting devices brought many challenges to the research area of mobile and distributed systems. One of the first problems that captured our attention was the study of the network that could potentially be created by interconnecting all the smartphones together. Typically, these devices are able to communicate with each other over short distances by using communication technologies such as Bluetooth or WiFi. The network paradigm that arises from this intermittent communication, also known as a Pocket Switched Network (PSN) or Opportunistic Network ([10, 11]), is seen as a key technology to provide innovative services to users without the need for any fixed infrastructure. In PSNs, nodes are short-range communicating devices carried by humans. Wireless communication links are created and dropped over time, depending on the physical distance of the device holders. On one side, social relations among humans yield recurrent movement patterns that help researchers design and build protocols that efficiently deliver messages to their destinations ([12, 13, 14], among others). On the other side, the complexity of these social relations makes it difficult to build simple mobility models that efficiently generate large synthetic mobility traces that look real. Such traces would be very valuable in protocol validation, would replace the limited experimentally gathered data available so far, and would serve as a common benchmark on which researchers worldwide could validate existing and yet-to-be-designed protocols. With this in mind, we start our study by re-designing SWIM [15], an existing mobility model shown to generate traces with properties similar to those of existing real ones. We make SWIM able to easily generate large (small)-scale scenarios, starting from known small (large)-scale ones. To the best of our knowledge, this is the first such study. In addition, we study the social aspects of SWIM-generated traces. We show how to SWIM-generate a scenario in which a specific community structure of nodes is required. Finally, exploiting the scaling properties of SWIM, we present the first analysis of the scaling capabilities of several forwarding protocols, such as Epidemic [16], Delegation [13], Spray&Wait [14], and BUBBLE [12]. The first results of these works appeared in [1], and, at the time of writing, [2] has been accepted with minor revisions.
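
    As a toy illustration of the kind of opportunistic forwarding these traces are used to evaluate, the following Python sketch floods a single message along a synthetic, time-ordered contact trace in the spirit of Epidemic forwarding. The node names, contacts, and timings are made-up assumptions; this is not SWIM or the evaluated protocol implementations.

```python
# Toy Epidemic-style forwarding over a synthetic contact trace.
# Illustrative only: the trace and node names are invented; this is not the
# SWIM model or the protocol code evaluated in the thesis.

def epidemic_delivery(contacts, source, destination):
    """Flood one message along time-ordered pairwise contacts.

    contacts: iterable of (time, node_a, node_b), sorted by time.
    Returns the delivery time, or None if the destination is never reached.
    """
    carriers = {source}                # nodes currently holding a copy
    for t, a, b in contacts:
        if a in carriers or b in carriers:
            carriers.update((a, b))    # every contact copies the message
        if destination in carriers:
            return t
    return None


# Synthetic trace of opportunistic encounters: (time, node_a, node_b).
trace = [
    (1, "alice", "bob"),
    (3, "bob", "carol"),
    (6, "carol", "dave"),
    (7, "dave", "erin"),
    (9, "erin", "frank"),
]

print(epidemic_delivery(trace, source="alice", destination="frank"))  # -> 9
```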
    Next, we take into account the fact that full cooperation and fairness among nodes cannot be assumed in PSNs. Selfish behavior of individuals has to be considered, since it is an inherent aspect of humans, the device holders (see [17], [18]). We design a market-based mathematical framework that enables heterogeneous mobile users in an opportunistic mobile network to compromise optimally and efficiently on their QoS demands. The goal of the framework is to satisfy each user with its achieved (lesser) QoS, and at the same time maximize the social welfare of users in the network. We base our study on the consideration that, in practice, users are generally tolerant of accepting lesser QoS guarantees than they demand, with the degree of tolerance varying from user to user. This study is described in detail in Chapter 2 of this dissertation, and is included in [3]. In general, QoS can refer to parameters such as response time, number of computations per unit time, allocated bandwidth, etc. Along the way toward our study of the smartphone world came the big advent of mobile cloud computing: smartphones getting help from cloud-enabled services. Many researchers started believing that the cloud could help solve a crucial problem regarding smartphones: improving battery life. New-generation apps (gaming, navigation, video editing, augmented reality, speech recognition, etc.) are becoming very complex and require a considerable amount of power and energy; as a result, smartphones suffer short battery lifetimes. Unfortunately, as a consequence, mobile users have to continually upgrade their hardware to keep pace with increasing performance requirements but still experience battery problems. Many recent works have focused on building frameworks that enable mobile computation offloading to software clones of smartphones on the cloud (see [19, 20] among others), as well as on backup systems for data and applications stored in our devices [21, 22, 23]. However, none of these addresses the dynamic and scalability aspects of execution on the cloud. These are very important problems, since users may request different computational power or backup space based on their workload and deadlines for tasks. Considering this, and advancing on previous works, we design, build, and implement the ThinkAir framework, which focuses on the elasticity and scalability of the server side and enhances the power of mobile cloud computing by parallelizing method execution using multiple Virtual Machine (VM) images. We evaluate the system using a range of benchmarks, from simple micro-benchmarks to more complex applications. First, we show that execution time and energy consumption decrease by two orders of magnitude for the N-queens puzzle and by one order of magnitude for a face detection and a virus scanning application when using cloud offloading. We then show that a parallelizable application can invoke multiple VMs to execute in the cloud in a seamless and on-demand manner so as to achieve greater reductions in execution time and energy consumption. Finally, we use a memory-hungry image combiner tool to demonstrate that applications can dynamically request VMs with more computational power in order to meet their computational requirements. The details of the ThinkAir framework and its evaluation are described in Chapter 4, and are included in [6, 5].
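
    To make the offloading trade-off concrete, here is a minimal, hypothetical decision rule of the kind an offloading framework can apply per method invocation: offload when the estimated remote execution time plus transfer time, and the corresponding energy, beat local execution. The profile fields, numbers, and cost model are illustrative assumptions, not ThinkAir's actual profiler or API.

```python
# Hypothetical offload decision rule. The cost model, field names, and numbers
# are illustrative assumptions, not ThinkAir's actual profiler or interfaces.

from dataclasses import dataclass


@dataclass
class MethodProfile:
    local_time_s: float     # estimated time to run the method on the phone
    remote_time_s: float    # estimated time on a cloud VM (faster CPU)
    payload_bytes: int      # state that must be shipped to the clone
    local_power_w: float    # phone CPU power while computing locally
    radio_power_w: float    # phone radio power while transmitting


def should_offload(p: MethodProfile, bandwidth_bps: float) -> bool:
    transfer_time = p.payload_bytes * 8 / bandwidth_bps
    remote_total = transfer_time + p.remote_time_s

    local_energy = p.local_power_w * p.local_time_s
    # While offloaded, the phone mainly spends energy on the radio transfer
    # (idle waiting cost is ignored in this toy model).
    remote_energy = p.radio_power_w * transfer_time

    return remote_total < p.local_time_s and remote_energy < local_energy


# Example: a heavy method (e.g. face detection) with a modest input payload.
profile = MethodProfile(local_time_s=12.0, remote_time_s=1.5,
                        payload_bytes=2_000_000, local_power_w=2.0,
                        radio_power_w=1.2)
print(should_offload(profile, bandwidth_bps=8_000_000))  # True on this link
```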
    Later on, we push the smartphone-cloud paradigm to a further level: we develop Clone2Clone (C2C), a distributed platform for cloud clones of smartphones. Along the way toward C2C, we study the performance of device clones hosted in various virtualization environments in both private (local servers) and public (Amazon EC2) clouds. We build the first customized Amazon Machine Image (AMI) for Android-OS, a key tool for obtaining reliable performance measures of mobile cloud systems, and show how it boosts the performance of Android images on the Amazon cloud service. We then design, build, and implement Clone2Clone, which associates a software clone on the cloud with every smartphone and interconnects the clones in a p2p fashion, exploiting the networking service within the cloud. On top of C2C we build CloneDoc, a secure real-time collaboration system for smartphone users. We measure the performance of CloneDoc on a testbed of 16 Android smartphones and clones hosted on both private and public cloud services, and show that C2C makes it possible to implement distributed execution of advanced p2p services in a network of mobile smartphones. The design and implementation of the Clone2Clone platform are included in [7], recently submitted to an international conference. We believe that Clone2Clone not only enables the execution of p2p applications in a network of smartphones, but can also serve as a tool to solve critical security problems. In particular, we consider the problem of computing an efficient patching strategy to stop a worm spreading between smartphones. We assume that the worm infects devices and spreads by using Bluetooth connections, emails, or any other form of communication used by the smartphones. The C2C network is used to compute the best strategy to patch the smartphones in such a way that the number of devices to patch is low (to reduce the load on the cellular infrastructure) and the worm is stopped quickly. We consider two well-defined worms, one spreading between the devices and one attacking the cloud before moving to the real smartphones. We describe CloudShield [8], a suite of protocols running on the peer-to-peer network of clones, and show by experiments with two different datasets (Facebook and LiveJournal) that CloudShield outperforms state-of-the-art worm-containment mechanisms for mobile wireless networks. This work was done in collaboration with Marco Valerio Barbera, a PhD colleague in the same department, who contributed mainly to the implementation and testing of the malware spreading and patching strategies on the different datasets.
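
    The containment problem can be illustrated with a simple, hypothetical heuristic on the clones' contact graph: patch a small set of highly connected devices first so that the residual unpatched graph fragments and the worm's reach shrinks. The greedy degree-based sketch below, with an invented toy graph, only illustrates the problem CloudShield addresses; it is not CloudShield's actual protocol.

```python
# Hypothetical greedy containment heuristic on a device contact graph:
# patch the highest-degree nodes first so the residual graph fragments.
# Illustrates the containment problem only; not CloudShield's protocol.

from collections import defaultdict


def greedy_patch_set(edges, budget):
    """Pick up to `budget` nodes to patch, repeatedly taking the node with
    the highest remaining degree and removing it from the graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    patched = []
    for _ in range(budget):
        if not adj:
            break
        node = max(adj, key=lambda n: len(adj[n]))  # most-connected device
        patched.append(node)
        for neighbor in adj.pop(node):
            adj[neighbor].discard(node)
            if not adj[neighbor]:
                del adj[neighbor]
    return patched


# Toy contact graph between devices (edges invented for illustration).
contacts = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
            ("d", "e"), ("e", "f"), ("e", "g")]
print(greedy_patch_set(contacts, budget=2))  # -> ['a', 'e']
```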
    The communication between the real devices and the cloud, necessary for mobile computation offloading and smartphone data backup, certainly does not come for free. To the best of our knowledge, none of the works related to mobile cloud computing explicitly studies the actual overhead, in terms of bandwidth and energy, of achieving a full backup of a smartphone's data and applications, as well as of keeping up-to-date clones of smartphones on the cloud for mobile computation offloading purposes. In the last work of my PhD, again in collaboration with Marco Valerio Barbera, we studied the feasibility of both mobile computation offloading and mobile software/data backup in real-life scenarios. This joint work resulted in a recent publication [9] but is not included in this thesis. As in C2C, we assume an architecture where each real device is associated with a software clone on the cloud. We define two types of clones: the off-clone, whose purpose is to support computation offloading, and the back-clone, which comes into use when a restore of the user's data and apps is needed. We measure the bandwidth and energy consumption incurred by the real device as a result of the synchronization with the off-clone or the back-clone. The evaluation is performed through an experiment with 11 Android smartphones and an equal number of clones running on Amazon EC2. We study the data communication overhead that is necessary to achieve different levels of synchronization (once every 5 min, 30 min, 1 h, etc.) between devices and clones in both the off-clone and back-clone cases, and report the energy costs incurred by each of these synchronization frequencies as well as by the respective communication overhead. My contribution to this work focused mainly on the experimental setup, deployment, and data collection.

    Integrated System Architectures for High-Performance Internet Servers

    Ph.D. Computer Science and Engineering, University of Michigan. http://deepblue.lib.umich.edu/bitstream/2027.42/90845/1/binkert-thesis.pd