    File system metadata virtualization

    The advance of computing systems has brought new ways to use and access stored data that push the architecture of traditional file systems to its limits, making them inadequate to handle the new needs. Current challenges affect both the performance of high-end computing systems and their usability from the applications' perspective. On one side, high-performance computing equipment is rapidly developing into large-scale aggregations of computing elements in the form of clusters, grids, or clouds. On the other side, there is a widening range of scientific and commercial applications that seek to exploit these new computing facilities. The requirements of such applications are also heterogeneous, leading to dissimilar patterns of use of the underlying file systems. Data centres have tried to compensate for this situation by providing several file systems to fulfil distinct requirements. Typically, the different file systems are mounted on different branches of a directory tree, and the preferred use of each branch is publicised to users. A similar approach is used in personal computing devices. Typically, in a personal computer, there is a visible and clear distinction between the portion of the file system name space dedicated to local storage, the part corresponding to remote file systems and, recently, the areas linked to cloud services, for example, directories to keep data synchronized across devices, to be shared with other users, or to be remotely backed up. In practice, this approach compromises the usability of the file systems and the possibility of exploiting all their potential benefits. We consider that this burden can be alleviated by determining applicable features on a per-file basis, instead of associating them with a location in a static, rigid name space. Moreover, usability would be further increased by providing multiple dynamic name spaces that could be adapted to specific application needs. This thesis contributes to this goal by proposing a mechanism to decouple the user view of the storage from its underlying structure. The mechanism consists of the virtualization of file system metadata (including both the name space and the object attributes) and the interposition of a sensible layer that decides where and how files should be stored in order to benefit from the underlying file system features, without incurring usability or performance penalties due to inadequate usage. This technique makes it possible to present multiple, simultaneous virtual views of the name space and the file system object attributes, adapted to specific application needs without altering the underlying storage configuration. The first contribution of the thesis introduces the design of a metadata virtualization framework that makes the above-mentioned decoupling possible; the second contribution is a method to improve file system performance in large-scale systems by using this metadata virtualization framework; the third contribution is a technique to improve the usability of cloud-based storage systems in personal computing devices.
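
    As a rough illustration of the decoupling idea only (all names and backing stores below are assumptions for the sake of a runnable sketch, not the thesis implementation), a minimal metadata layer can map each virtual path to a per-file placement record, so the user-visible name space stays independent of where the bytes physically live:

        import os, tempfile

        # Stand-ins for two backing file systems with different properties
        # (e.g. a parallel FS for shared data, a local FS for scratch).
        SHARED_FS = tempfile.mkdtemp(prefix="pfs_")
        LOCAL_FS = tempfile.mkdtemp(prefix="local_")

        class MetadataVirtualizationLayer:
            """Maps virtual paths to per-file placement records, so the
            user-visible name space is independent of physical location."""

            def __init__(self):
                self.table = {}  # virtual path -> {"physical": ..., "hints": ...}

            def create(self, vpath, data, hints=None):
                # Placement is decided per file from hints, not from the
                # file's position in a rigid directory tree.
                mount = SHARED_FS if hints == "shared" else LOCAL_FS
                ppath = os.path.join(mount, vpath.strip("/").replace("/", "_"))
                with open(ppath, "wb") as f:
                    f.write(data)
                self.table[vpath] = {"physical": ppath, "hints": hints}

            def read(self, vpath):
                # Callers use only the virtual name space; the layer
                # resolves it to wherever the bytes actually live.
                with open(self.table[vpath]["physical"], "rb") as f:
                    return f.read()

        fs = MetadataVirtualizationLayer()
        fs.create("/projects/run1/output.dat", b"results", hints="shared")
        fs.create("/tmp/cache.bin", b"scratch")
        assert fs.read("/projects/run1/output.dat") == b"results"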

    swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture

    The flourishing of deep learning frameworks and hardware platforms demands an efficient compiler that can shield the diversity in both software and hardware in order to provide application portability. Among the existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor renders itself a competitive candidate thanks to its attractive computational power for both scientific and deep learning applications. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures that require cross-compilation, such as Sunway. In addition, we leverage architecture features during compilation, such as the core group for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning applications on Sunway. The experimental results show the ability of swTVM to automatically generate code for various deep neural network models on Sunway. The code automatically generated by swTVM for AlexNet and VGG-19 achieves speedups of 6.71x and 2.45x on average over hand-optimized OpenACC implementations of the convolution and fully connected layers, respectively. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and high-performance architecture, with productivity and efficiency particularly in mind. We would like to open-source the implementation so that more people can embrace the power of the deep learning compiler and the Sunway many-core processor.
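
    For context, the sketch below shows the kind of standard TVM compilation flow that swTVM extends, using the stock TVM Python API. Since no Sunway backend exists in upstream LLVM/TVM, an AArch64 cross-compilation target (and its cross toolchain) stands in here as an assumption; swTVM's ahead-of-time path plays the analogous role for Sunway.

        import tvm
        from tvm import relay
        from tvm.relay import testing

        # Sample network expressed in Relay IR.
        mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)

        # Cross-compilation target; an AArch64 triple stands in for the
        # (unavailable upstream) Sunway backend.
        target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu")

        with tvm.transform.PassContext(opt_level=3):
            lib = relay.build(mod, target=target, params=params)

        # Emit a deployable shared library for the target machine; a cross
        # compiler matching the target triple is assumed to be installed.
        lib.export_library("resnet18_cross.so", cc="aarch64-linux-gnu-gcc")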

    YOLO: Accelerating Virtual Machine Boot Time by Reducing I/O Operations

    Several works have shown that the time to boot one virtual machine (VM) can last up to a few minutes in highly consolidated cloud scenarios. This time is critical, as VM boot duration defines how an application can react to demand fluctuations (horizontal elasticity). To limit the time to boot a VM as much as possible, we design the YOLO mechanism (You Only Load Once). YOLO optimizes the number of I/O operations generated during a VM boot process by relying on the boot image abstraction, a subset of the VM image (VMI) that contains the data blocks necessary to complete the boot operation. Whenever a VM is booted, YOLO intercepts all read accesses and serves them directly from the boot image, which has been stored locally on fast-access storage devices (e.g., memory, SSD, etc.). Creating boot images for 900+ VMIs from Google Cloud shows that only 40 GB is needed to store all the mandatory data, an amount small enough to keep on each compute node. Experiments show that YOLO can speed up VM boot duration by 2 to 13 times under different levels of resource contention, with negligible overhead on the I/O path. Finally, we underline that although YOLO has been validated in a KVM environment, it does not require any modification to the hypervisor, the guest kernel, or the VM image (VMI) structure, and can be used for several kinds of VMIs (in this study, Linux and Windows VMIs have been tested).
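
    As a loose illustration of the interception idea (not the authors' code; all names are assumptions), the sketch below serves boot-time reads from a preloaded boot-image cache and falls back to the full VMI on a miss:

        import io

        class BootImageInterceptor:
            """Serves boot-time reads from a preloaded boot image (a subset
            of the VMI kept on fast storage) and falls back to the VMI."""

            def __init__(self, boot_image, vmi):
                self.boot_image = boot_image  # block offset -> bytes
                self.vmi = vmi                # file-like full VM image

            def read(self, offset, length):
                block = self.boot_image.get(offset)
                if block is not None and len(block) >= length:
                    return block[:length]      # hit: no I/O to the VMI
                self.vmi.seek(offset)          # miss: regular VMI access
                return self.vmi.read(length)

        # Toy usage: block 0 is "hot" at boot and cached ahead of time.
        vmi = io.BytesIO(b"A" * 4096 + b"B" * 4096)
        yolo = BootImageInterceptor({0: b"A" * 4096}, vmi)
        assert yolo.read(0, 4096) == b"A" * 4096     # from the boot image
        assert yolo.read(4096, 4096) == b"B" * 4096  # from the VMI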

    Gunrock: A High-Performance Graph Processing Library on the GPU

    For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has, on average, at least an order-of-magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other high-level GPU graph library.
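
    As a loose illustration of the frontier-centric programming model (plain sequential Python, not Gunrock's GPU operators; the "advance" and "filter" names follow the paper's operator terminology), a BFS can be expressed as repeated advance and filter passes over a frontier:

        # Toy graph as an adjacency list.
        adj = {0: [1, 2], 1: [3], 2: [3], 3: []}

        def bfs_frontier(adj, source):
            depth = {source: 0}
            frontier = [source]
            level = 0
            while frontier:
                level += 1
                # "advance": expand each frontier vertex to its neighbors
                expanded = [v for u in frontier for v in adj[u]]
                # "filter": keep unvisited vertices as the next frontier
                frontier = []
                for v in expanded:
                    if v not in depth:
                        depth[v] = level
                        frontier.append(v)
            return depth

        assert bfs_frontier(adj, 0) == {0: 0, 1: 1, 2: 1, 3: 2}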