130 research outputs found

    On the classification and evaluation of prefetching schemes

    Get PDF
    Abstract available: p. [2

    PRISMA: a prefetching storage middleware for accelerating deep learning frameworks

    Get PDF
    Dissertação mestrado integrado em Informatics EngineeringDeep Learning (DL) is a widely used technique often applied to many domains, from computer vision to natural language processing. To avoid overfitting, DL applications have to access large amounts of data, which affects the training performance. Although significant hardware advances have already been made, current storage systems cannot keep up with the needs required by DL techniques. Considering this, multiple storage solutions have already been developed to improve the Input/Output (I/O) performance of DL training. Nevertheless, they are either specific to certain DL frameworks or present drawbacks, such as loss of accuracy. Most DL frameworks also contain internal I/O optimizations, however they cannot be easily decoupled and applied to other frameworks. Furthermore, most of these optimizations have to be manually configured or comprise greedy provisioning algorithms that waste computational resources. To address these issues, we propose PRISMA, a novel storage middleware that employs data prefetching and parallel I/O to improve DL training performance. PRISMA provides an autotuning mechanism to automatically select the optimal configuration. This mechanism was designed to achieve a good trade-off between performance and resource usage. PRISMA is framework-agnostic, meaning that it can be applied to any DL framework, and does not impact the accuracy of the training model. In addition to PRISMA, we provide a thorough study and evaluation of the TensorFlow Dataset Application Programming Interface (API), demonstrating that local DL can benefit from I/O optimization. PRISMA was integrated and evaluated with two popular DL frameworks, namely Tensor Flow and PyTorch, proving that it is successful under different I/O workloads. Experimental results demonstrate that PRISMA is the most efficient solution for the majority of the scenar ios that were studied, while for the other scenarios exhibits similar performance to built-in optimizations of TensorFlow and PyTorch.Aprendizagem Profunda (AP) é uma área bastante abrangente que é atualmente utilizada em diversos domínios, como é o caso da visão por computador e do processamento de linguagem natural. A aplicação de técnicas de AP implica o acesso a grandes quantidades de dados, o que afeta o desempenho de treino. Embora já tenham sido alcançados avanços significativos em termos de hardware, os sistemas de armazenamento atuais não conseguem acompanhar os requisitos de desempenho que os mecanismos de AP impõem. Considerando isto, foram desenvolvidas várias soluções de armazenamento com o objetivo de melhorar o desempenho de Entrada/Saída (E/S) do treino de AP. No entanto, as soluções existentes possuem certas desvantagens, nomeadamente perda de precisão do modelo de treino e o facto de serem específicas a determinadas plataformas de AP. A maioria das plataformas de AP também possuem otimizações de E/S, contudo essas otimizações não podem ser facilmente desacopladas e aplicadas a outras plataformas. Para além disto, a maioria destas otimizações tem que ser configurada manualmente ou contém algoritmos de provisionamento gananciosos, que desperdiçam recursos computacionais. Para resolver os problemas anteriormente mencionados, esta dissertação propõe o PRISMA, um middleware de armazenamento que executa pré-busca de dados e paralelismo de E/S, de forma a melhorar o desempenho de treino de AP. O PRISMA providencia um mecanismo de configuração automática para determinar uma combinação de parâmetros ótima. Este mecanismo foi desenvolvido com o objetivo de obter um bom equilíbrio entre desempenho e utilização de recursos. O PRISMA é independente da plataforma de AP e não afeta a precisão do modelo de treino. Além do PRISMA, esta dissertação providencia um estudo e uma avaliação detalhados da Interface de Programação de Aplicações (API) Dataset do TensorFlow, provando que AP local pode beneficiar de otimizações de E/S. O PRISMA foi integrado e avaliado com duas plataformas de AP amplamente utilizadas, o TensorFlow e o PyTorch, demonstrando que este middleware tem sucesso sob diferentes cargas de trabalho de E/S. Os resultados experimentais demonstram que o PRISMA é a solução mais eficiente na maioria dos cenários estudados, e possui um desempenho semelhante às otimizações internas do TensorFlow e do PyTorch.Fundação para a Ciência e a Tecnologia (FCT) - project UIDB/50014/202

    Enhancing the Programmability of Cloud Object Storage

    Get PDF
    En un món que depèn cada vegada més de la tecnologia, les dades digitals es generen a una escala sense precedents. Això fa que empreses que requereixen d'un gran espai d'emmagatzematge, com Netflix o Dropbox, utilitzin solucions d'emmagatzematge al núvol. Mes concretament, l'emmagatzematge d'objectes, donada la seva simplicitat, escalabilitat i alta disponibilitat. No obstant això, aquests magatzems s'enfronten a tres desafiaments principals: 1) Gestió flexible de càrregues de treball de múltiples usuaris. Normalment, els magatzems d'objectes són sistemes multi-usuari, la qual cosa significa que tots ells comparteixen els mateixos recursos, el que podria ocasionar problemes d'interferència. A més, és complex administrar polítiques d'emmagatzematge heterogènies a gran escala en ells. 2) Autogestió de dades. Els magatzems d'objectes no ofereixen molta flexibilitat pel que fa a l'autogestió de dades per part dels usuaris. Típicament, són sistemes rígids, la qual cosa impedeix gestionar els requisits específics dels objectes. 3) Còmput elàstic prop de les dades. Situar els càlculs prop de les dades pot ser útil per reduir la transferència de dades. Però, el desafiament aquí és com aconseguir la seva elasticitat sense provocar contenció de recursos i interferències en la capa d'emmagatzematge. En aquesta tesi presentem tres contribucions innovadores que resolen aquests desafiaments. En primer lloc, presentem la primera arquitectura d'emmagatzematge definida per programari (SDS) per a magatzems d'objectes que separa les capes de control i de dades. Això permet gestionar les càrregues de treball de múltiples usuaris d'una manera flexible i dinàmica. En segon lloc, hem dissenyat una nova abstracció de polítiques anomenada "microcontrolador" que transforma els objectes comuns en objectes intel·ligents, permetent als usuaris programar el seu comportament. Finalment, presentem la primera plataforma informàtica "serverless" guiada per dades i elàstica, que mitiga els problemes de col·locar el càlcul prop de les dades.En un mundo que depende cada vez más de la tecnología, los datos digitales se generan a una escala sin precedentes. Esto hace que empresas que requieren de un gran espacio de almacenamiento, como Netflix o Dropbox, usen soluciones de almacenamiento en la nube. Mas concretamente, el almacenamiento de objectos, dada su escalabilidad y alta disponibilidad. Sin embargo, estos almacenes se enfrentan a tres desafíos principales: 1) Gestión flexible de cargas de trabajo de múltiples usuarios. Normalmente, los almacenes de objetos son sistemas multi-usuario, lo que significa que todos ellos comparten los mismos recursos, lo que podría ocasionar problemas de interferencia. Además, es complejo administrar políticas de almacenamiento heterogéneas a gran escala en ellos. 2) Autogestión de datos. Los almacenes de objetos no ofrecen mucha flexibilidad con respecto a la autogestión de datos por parte de los usuarios. Típicamente, son sistemas rígidos, lo que impide gestionar los requisitos específicos de los objetos. 3) Cómputo elástico cerca de los datos. Situar los cálculos cerca de los datos puede ser útil para reducir la transferencia de datos. Pero, el desafío aquí es cómo lograr su elasticidad sin provocar contención de recursos e interferencias en la capa de almacenamiento. En esta tesis presentamos tres contribuciones que resuelven estos desafíos. En primer lugar, presentamos la primera arquitectura de almacenamiento definida por software (SDS) para almacenes de objetos que separa las capas de control y de datos. Esto permite gestionar las cargas de trabajo de múltiples usuarios de una manera flexible y dinámica. En segundo lugar, hemos diseñado una nueva abstracción de políticas llamada "microcontrolador" que transforma los objetos comunes en objetos inteligentes, permitiendo a los usuarios programar su comportamiento. Finalmente, presentamos la primera plataforma informática "serverless" guiada por datos y elástica, que mitiga los problemas de colocar el cálculo cerca de los datos.In a world that is increasingly dependent on technology, digital data is generated in an unprecedented way. This makes companies that require large storage space, such as Netflix or Dropbox, use cloud object storage solutions. This is mainly thanks to their built-in characteristics, such as simplicity, scalability and high-availability. However, cloud object stores face three main challenges: 1) Flexible management of multi-tenant workloads. Commonly, cloud object stores are multi-tenant systems, meaning that all tenants share the same system resources, which could lead to interference problems. Furthermore, it is now complex to manage heterogeneous storage policies in a massive scale. 2) Data self-management. Cloud object stores themselves do not offer much flexibility regarding data self-management by tenants. Typically, they are rigid, which prevent tenants to handle the specific requirements of their objects. 3) Elastic computation close to the data. Placing computations close to the data can be useful to reduce data transfers. But, the challenge here is how to achieve elasticity in those computations without provoking resource contention and interferences in the storage layer. In this thesis, we present three novel research contributions that solve the aforementioned challenges. Firstly, we introduce the first Software-defined Storage (SDS) architecture for cloud object stores that separates the control plane from the data plane, allowing to manage multi-tenant workloads in a flexible and dynamic way. For example, by applying different service levels of bandwidth to different tenants. Secondly, we designed a novel policy abstraction called microcontroller that transforms common objects into smart objects, enabling tenants to programmatically manage their behavior. For example, a content-level access control microcontroller attached to an specific object to filter its content depending on who is accessing it. Finally, we present the first elastic data-driven serverless computing platform that mitigates the resource contention problem of placing computation close to the data

    Web-based applications for open display networks : developers’ perspective

    Get PDF
    Open Display Networks represent a new paradigm for large scale networks of public displays that are open to applications and content from third parties. Web technologies may be particularly interesting as a technological framework for third-party application development in Open Display Networks because of their portability and widespread use. However, there are also significant challenges involved that result from the specificities of this particular usage domain and the lack of specific development insights for this context. In this work, we address the concept of public display application (display app) from a development perspective. The overall goal of this paper is to identify and characterize some of the key specificities of display applications and the appropriate Web solutions that can serve in the development of this type of application. The contribution of this paper builds on our extensive experience with the application development for a real world public display infrastructure and also on a short-term experiment with third party developers. Overall, the results show that Web technologies are valuable building blocks for public displays applications and their adoption is not only a subject for adaptation procedures but also for redesigning their use according to the characteristics and user experience offered by public displays. This research will inform the design of new Web-based models of display applications and shed light on the challenges that may impede third party development and the evolution of an application ecosystem in this area.Fundação para a Ciência e a Tecnologia (FCT

    SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads

    Get PDF
    Online transaction processing (OLTP) is at the core of many data center applications. OLTP workloads are known to have large instruction footprints that foil existing L1 instruction caches resulting in poor overall performance. Prefetching can reduce the impact of such instruction cache miss stalls; however, state-of-the-art solutions require large dedicated hardware tables on the order of 40KB in size. SLICC is a programmer transparent, low cost technique to minimize instruction cache misses when executing OLTP workloads. SLICC migrates threads, spreading their instruction footprint over several L1 caches. It exploits repetition within and across transactions, where a transaction’s first iteration prefetches the instructions for subsequent iterations or similar subsequent transactions. SLICC reduces instruction misses by 56% on average for TPC-C and TPC-E, thereby improving performance by 68%. When compared to a state-of-the-art prefetcher, and notwithstanding the increased storage overheads (42× as compared to SLICC), performance using SLICC is 21% higher for TPC-E and within 2% for TPC-C

    Prefetch throttling and data pinning for improving performance of shared caches

    Get PDF
    In this paper, we (i) quantify the impact of compilerdirected I/O prefetching on shared caches at I/O nodes. The experimental data collected shows that while I/O prefetching brings some benefits, its effectiveness reduces significantly as the number of clients (compute nodes) is increased; (ii) identify interclient misses due to harmful I/O prefetches as one of the main sources for this reduction in performance with increased number of clients; and (iii) propose and experimentally evaluate prefetch throttling and data pinning schemes to improve performance of I/O prefetching. Prefetch throttling prevents one or more clients from issuing further prefetches if such prefetches are predicted to be harmful, i.e., replace from the memory cache the useful data accessed by other clients. Data pinning on the other hand makes selected data blocks immune to harmful prefetches by pinning them in the memory cache. We show that these two schemes can be applied in isolation or combined together, and they can be applied at a coarse or fine granularity. Our experiments with these two optimizations using four disk-intensive applications reveal that they can improve performance by 9.7% and 15.1% on average, over standard compiler-directed I/O prefetching and no-prefetch case, respectively, when 8 clients are used. © 2008 IEEE
    corecore