69 research outputs found
NVSwap Latency-Aware Paging Using Non-Volatile Main Memory
Page relocation (paging) from DRAM to swap devices is an important task of the virtual memory system in an operating system. Existing Linux paging mechanisms have two main deficiencies: (1) they may incur high I/O latency due to write interference on solid-state disks and an aggressive page reclaiming rate under high memory pressure, and (2) they do not provide predictable latency bounds for latency-sensitive applications because they cannot control the allocation of system resources among concurrent processes sharing swap devices. In this thesis, we present the design and implementation of a latency-aware paging mechanism called NVSwap. It supports a hybrid swap space using both regular secondary storage devices (e.g., solid-state disks) and non-volatile main memory (NVMM). This design is more cost-effective than using only NVMM as swap space. Furthermore, NVSwap uses NVMM as a persistent paging buffer to serve page-out requests and hide the latency of paging between the regular swap device and DRAM. It supports in-situ paging for pages in the persistent paging buffer, avoiding the slow I/O path. Finally, NVSwap allows users to specify latency bounds for individual processes or a group of related processes and enforces the bounds by dynamically controlling the allocation of NVMM and the in-memory page reclaiming rate among scheduling units. We have implemented a prototype of NVSwap in Linux kernel 3.16.74. Our results demonstrate that NVSwap reduces paging latency by up to 99% and provides performance guarantees and isolation among concurrent applications sharing swap devices.
Enhancing the Programmability of Cloud Object Storage
In a world that is increasingly dependent on technology, digital data is generated at an unprecedented scale. This leads companies that require large amounts of storage space, such as Netflix or Dropbox, to use cloud object storage solutions, mainly because of their built-in simplicity, scalability and high availability. However, cloud object stores face three main challenges: 1) Flexible management of multi-tenant workloads. Cloud object stores are commonly multi-tenant systems, meaning that all tenants share the same system resources, which can lead to interference problems. Furthermore, it is complex to manage heterogeneous storage policies at massive scale. 2) Data self-management. Cloud object stores do not offer much flexibility for data self-management by tenants. Typically, they are rigid, which prevents tenants from handling the specific requirements of their objects. 3) Elastic computation close to the data. Placing computations close to the data can be useful to reduce data transfers, but the challenge is how to achieve elasticity in those computations without provoking resource contention and interference in the storage layer. In this thesis, we present three novel research contributions that address these challenges. First, we introduce the first Software-Defined Storage (SDS) architecture for cloud object stores that separates the control plane from the data plane, making it possible to manage multi-tenant workloads in a flexible and dynamic way, for example by applying different bandwidth service levels to different tenants. Second, we design a novel policy abstraction called a microcontroller that transforms common objects into smart objects, enabling tenants to programmatically manage their behavior. For example, a content-level access control microcontroller can be attached to a specific object to filter its content depending on who is accessing it. Finally, we present the first elastic, data-driven serverless computing platform, which mitigates the resource contention problem of placing computation close to the data.
High performance Monte Carlo computation for finance risk data analysis
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Finance risk management has been playing an increasingly important role in the finance sector, both to analyse finance data and to prevent potential crises. It has been widely recognised that Value at Risk (VaR) is an effective method for finance risk management and evaluation. This thesis conducts a comprehensive review of a number of VaR methods and discusses their strengths and limitations in depth. Among these VaR methods, Monte Carlo simulation and analysis has proven to be the most accurate in finance risk evaluation due to its strong modelling capabilities. However, one major challenge in Monte Carlo analysis is its high computing complexity of O(n²). To speed up the computation, this thesis parallelises Monte Carlo using the MapReduce model, which has become a major programming model in support of data-intensive applications. MapReduce consists of two functions, Map and Reduce. The Map function segments a large data set into small data chunks and distributes these chunks among a number of computers for parallel processing, with each Mapper processing one data chunk on a computing node. The Reduce function collects the results generated by the Mappers and generates an output. The parallel Monte Carlo is evaluated initially in a small-scale MapReduce experimental environment and subsequently in a large-scale simulation environment. Both experimental and simulation results show that the MapReduce-based parallel Monte Carlo is much faster than the sequential Monte Carlo, while the accuracy level is maintained. In data-intensive applications, moving huge volumes of data among the computing nodes can incur high communication overhead. To address this issue, this thesis further considers data locality in the MapReduce-based parallel Monte Carlo and evaluates the impact of data locality on computation performance.
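The Map and Reduce roles described in this abstract can be illustrated with a minimal single-machine sketch. It is not the thesis's implementation. The return model (normally distributed one-day returns with the mean and volatility shown), the function names, and the chunk layout are all assumptions made for illustration, and Python's built-in `map` stands in for a real cluster of Mappers.

```python
import random
from functools import reduce

# Assumed model parameters (hypothetical): daily return ~ Normal(MU, SIGMA).
MU = 0.0005
SIGMA = 0.02


def map_simulate(chunk):
    """Mapper: simulate one chunk of returns. chunk is (seed, n_paths)."""
    seed, n_paths = chunk
    rng = random.Random(seed)  # independent, reproducible stream per chunk
    return [rng.gauss(MU, SIGMA) for _ in range(n_paths)]


def reduce_merge(accumulated, partial):
    """Reducer: collect the Mappers' partial results into one list."""
    return accumulated + partial


def value_at_risk(returns, confidence=0.95):
    """Empirical VaR: the loss at the (1 - confidence) quantile of returns."""
    ordered = sorted(returns)
    index = int((1.0 - confidence) * len(ordered))
    return -ordered[index]


def parallel_var(n_chunks=8, paths_per_chunk=5000, confidence=0.95):
    """Split the simulation into chunks, map, reduce, then compute VaR."""
    chunks = [(seed, paths_per_chunk) for seed in range(n_chunks)]
    # On a cluster each chunk would go to a different node; here the
    # built-in map runs the Mappers one after another.
    mapped = map(map_simulate, chunks)
    all_returns = reduce(reduce_merge, mapped, [])
    return value_at_risk(all_returns, confidence)
```

Calling `parallel_var()` returns the simulated 95% one-day VaR as a fraction of portfolio value. Because each chunk carries its own seed, the result matches a sequential run over the same chunks, which mirrors the abstract's point that parallelisation should not change accuracy.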
Sistemas interativos e distribuĂdos para telemedicina
Doctorate in Computer Science. During the last decades, healthcare organizations have been increasingly relying on information technologies to improve their services. At the same time, the need to optimize resources, both professionals and equipment, has promoted the emergence of telemedicine solutions. Technologies including cloud computing, mobile computing, web systems and distributed computing can be used to facilitate the creation of medical communities and to promote telemedicine services and real-time collaboration. In addition, many features that have become commonplace in social networks, such as data sharing, message exchange, discussion forums and videoconferencing, also have the potential to foster collaboration in the health sector.
The main objective of this research work was to investigate computational solutions that promote the sharing of clinical data and facilitate the creation of collaborative workflows in radiology. By exploring current Web and mobile computing technologies, we designed a ubiquitous solution for medical image visualization and developed a collaborative system for radiology based on cloud computing technology. To extract more information from data, we investigated several methodologies, including text mining, semantic representation and information retrieval based on image content. Finally, to ensure patient privacy and to streamline data sharing in collaborative environments, we propose a machine learning methodology to anonymize medical images.
Improving Data Management and Data Movement Efficiency in Hybrid Storage Systems
University of Minnesota Ph.D. dissertation. July 2017. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); ix, 116 pages. In the big data era, large volumes of continuously generated data drive the emergence of high-performance, large-capacity storage systems. To reduce the total cost of ownership, storage systems are built in a more composite way from many different types of emerging storage technologies and devices, including Storage Class Memory (SCM), Solid State Drives (SSD), Shingled Magnetic Recording (SMR) drives, Hard Disk Drives (HDD), and even off-premise cloud storage. To make better use of each type of storage, industry has provided multi-tier storage that dynamically places hot data in the faster tiers and cold data in the slower tiers. Data movement happens within a single device as well as between devices connected via various networks. Toward improving data management and data movement efficiency in such hybrid storage systems, this work makes the following contributions.
To bridge the large semantic gap between applications and modern storage systems, passing small pieces of useful information (I/O access hints) from upper layers to the block storage layer may greatly improve application performance or ease data management in heterogeneous storage systems. We present and develop a generic and flexible framework, called HintStor, to execute and evaluate various I/O access hints on heterogeneous storage systems with minor modifications to the kernel and applications. The design of HintStor contains a new application/user-level interface, a file system plugin and a block storage data manager. With HintStor, storage systems composed of various storage devices can perform pre-devised data placement, space reallocation and data migration policies assisted by the added access hints.
Each storage device or technology has its own price-performance tradeoffs and idiosyncrasies with respect to the workload characteristics it prefers to support. To explore the internal access patterns and thus efficiently place data on storage systems with fully connected differential pools (data can move from one device to any other device instead of moving tier by tier, and each pool consists of storage devices of a particular type), we propose a chunk-level storage-aware workload analyzer framework, abbreviated as ChewAnalyzer. With ChewAnalyzer, the storage manager can adequately distribute and move data chunks across different storage pools.
To reduce the duplicate content transferred between local storage devices and devices in remote data centers, an inline Network Redundancy Elimination (NRE) process with a Content-Defined Chunking (CDC) policy can obtain a higher Redundancy Elimination (RE) ratio than fixed-size chunking but may suffer from a considerably higher computational requirement. We build an inline NRE appliance that incorporates an improved FPGA-based scheme to speed up CDC processing. To utilize the hardware resources efficiently, the whole NRE process is handled by a Virtualized NRE (VNRE) controller. The uniqueness of the VNRE that we developed lies in its ability to exploit the redundancy patterns of different TCP flows and customize the chunking process to achieve a higher RE ratio.
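The trade-off between Content-Defined Chunking and fixed-size chunking mentioned in this abstract can be sketched in a few lines. This is a software illustration only, not the dissertation's FPGA-based scheme. The gear-style rolling hash, the lookup table `_GEAR`, the function names, and all size parameters are assumptions chosen for the example.

```python
import hashlib
import random

_MASK64 = (1 << 64) - 1
_table_rng = random.Random(42)
# Hypothetical lookup table: one fixed pseudo-random 64-bit value per byte.
_GEAR = [_table_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data, avg_bits=8, min_size=32, max_size=1024):
    """Content-defined chunking: cut where the rolling hash's low bits are 0."""
    chunks, start, h = [], 0, 0
    boundary_mask = (1 << avg_bits) - 1
    for i, byte in enumerate(data):
        # Gear-style rolling hash: older bytes shift out of the low bits,
        # so the cut decision depends only on the most recent bytes.
        h = ((h << 1) + _GEAR[byte]) & _MASK64
        length = i - start + 1
        at_boundary = (h & boundary_mask) == 0
        if (length >= min_size and at_boundary) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fixed_chunks(data, size=256):
    """Fixed-size chunking: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(old_chunks, new_chunks):
    """Fraction of new chunks whose SHA-256 digest already exists in old."""
    known = {hashlib.sha256(c).digest() for c in old_chunks}
    hits = sum(1 for c in new_chunks if hashlib.sha256(c).digest() in known)
    return hits / len(new_chunks)
```

Comparing `shared_fraction(cdc_chunks(a), cdc_chunks(b))` with the fixed-size equivalent for a buffer `a` and a copy `b` with one byte inserted at the front shows the effect: fixed-size boundaries all shift, so almost no chunk is recognised as a duplicate, whereas content-defined boundaries realign after the edit. The per-byte hashing loop is also where the extra computational cost of CDC comes from.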
Millimeter-wave Wireless LAN and its Extension toward 5G Heterogeneous Networks
Millimeter-wave (mmw) frequency bands, especially the 60 GHz unlicensed band, are
considered as a promising solution for gigabit short range wireless
communication systems. IEEE standard 802.11ad, also known as WiGig, is
standardized for the usage of the 60 GHz unlicensed band for wireless local
area networks (WLANs). By using this mmw WLAN, multi-Gbps rate can be achieved
to support bandwidth-intensive multimedia applications. Exhaustive search along
with beamforming (BF) is usually used to overcome 60 GHz channel propagation
loss and accomplish data transmissions in such mmw WLANs. Because of its short
range transmission with a high susceptibility to path blocking, multiple
mmw access points (APs) should be used to fully cover a typical target
environment for future high capacity multi-Gbps WLANs. Therefore, coordination
among mmw APs is highly needed to overcome packet collisions resulting from
un-coordinated exhaustive search BF and to increase the total capacity of mmw
WLANs. In this paper, we first present the current status of mmw WLANs with our
developed WiGig AP prototype. Then, we highlight the great need for coordinated
transmissions among mmw APs as a key enabler for future high capacity mmw
WLANs. Two different types of coordinated mmw WLAN architecture are introduced.
One is the distributed antenna type architecture to realize centralized
coordination, while the other is an autonomous coordination with the assistance
of legacy Wi-Fi signaling. Moreover, two heterogeneous network (HetNet)
architectures are also introduced to efficiently extend the coordinated mmw
WLANs to be used for future 5th Generation (5G) cellular networks.
Comment: 18 pages, 24 figures, accepted, invited paper
Effective Use of SSDs in Database Systems
With the advent of solid state drives (SSDs), the storage industry has experienced a revolutionary improvement in I/O performance. Compared to traditional hard disk drives (HDDs), SSDs benefit from shorter I/O latency, better power efficiency, and cheaper random I/Os. Because of these superior properties, SSDs are gradually replacing HDDs. For decades, database management systems have been designed, architected, and optimized based on the performance characteristics of HDDs. In order to utilize the superior performance of SSDs, new methods should be developed, some database components should be redesigned, and architectural decisions should be revisited.
In this thesis, three novel methods are proposed to exploit the new capabilities of modern SSDs to improve the performance of database systems. The first is a new method for using SSDs as a fully persistent second-level memory buffer pool. This method uses SSDs as a supplementary storage device to improve transactional throughput and to reduce checkpoint and recovery times. A prototype of the proposed method is compared with its closest existing competitor. The second considers the impact of the parallel I/O capability of modern SSDs on the database query optimizer. It is shown that a query optimizer that is unaware of the parallel I/O capability of SSDs can make significantly sub-optimal decisions. In addition, a practical method for making the query optimizer parallel-I/O-aware is introduced and evaluated empirically. The third is an SSD-friendly external merge sort. This sorting technique performs better than other common external sorting techniques. It also extends the SSD's lifespan by reducing the number of write operations required during sorting.