    TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

    Deep neural networks (DNNs) have become core computation components within low-latency Function as a Service (FaaS) prediction pipelines, including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines. Cloud computing, as the de facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to be able to handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal, suffering from "cold start" latency. A major cause of this inefficiency is the need to move large amounts of model data within and across servers. We propose TrIMS as a novel solution to address these issues. Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy; an efficient resource management layer that provides isolation; and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework and show up to 24x speedup in latency for image classification models and up to 210x speedup for large models. We achieve up to 8x system throughput improvement. Comment: In Proceedings CLOUD 201
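
    The idea of a model store spanning a GPU, CPU, local storage, and cloud storage hierarchy can be illustrated with a small lookup sketch. This is not the TrIMS API: the class `TieredModelStore`, its methods, and the tier names are hypothetical, and plain dictionaries stand in for real device memory. The sketch only shows the general pattern of checking the fastest tier first and copying a model into the faster tiers after a miss, which is one plausible way repeated "cold start" loads could be avoided.

```python
from typing import Callable, Dict, List, Tuple


class TieredModelStore:
    """Toy model store (hypothetical, not the TrIMS API).

    Tiers are ordered fastest first, e.g. ["gpu", "cpu", "disk"].
    Cloud storage is modelled as the origin that is consulted last.
    """

    def __init__(self, tier_names: List[str],
                 fetch_from_origin: Callable[[str], bytes]) -> None:
        self.tier_names = tier_names
        # One dictionary per tier stands in for that tier's memory.
        self.tiers: Dict[str, Dict[str, bytes]] = {n: {} for n in tier_names}
        self.fetch_from_origin = fetch_from_origin

    def load(self, model_id: str) -> Tuple[bytes, str]:
        """Return (weights, name of the tier that served the request)."""
        for depth, name in enumerate(self.tier_names):
            if model_id in self.tiers[name]:
                weights = self.tiers[name][model_id]
                self._promote(model_id, weights, depth)
                return weights, name
        # Miss in every local tier: fetch from the origin (cloud storage).
        weights = self.fetch_from_origin(model_id)
        self._promote(model_id, weights, len(self.tier_names))
        return weights, "origin"

    def _promote(self, model_id: str, weights: bytes, depth: int) -> None:
        """Copy the weights into every tier faster than the one at `depth`."""
        for name in self.tier_names[:depth]:
            self.tiers[name][model_id] = weights


# Usage: the first request goes to the origin, the second is served locally.
store = TieredModelStore(["gpu", "cpu", "disk"], lambda model_id: b"weights")
first = store.load("resnet")
second = store.load("resnet")
print(first[1])   # origin
print(second[1])  # gpu
```

    A real system would also need eviction, per-tier capacity limits, and the isolation guarantees the abstract mentions; none of these are modelled here.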

    JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization

    The rapid development of computing technology has paved the way for directive-based programming models to take a principal role in maintaining software portability of performance-critical applications. Such models incur minimal engineering cost for enabling computational acceleration on multiple architectures, since programmers are only required to add meta-information to sequential code. Optimizations for obtaining the best possible efficiency, however, are often challenging. The insertion of directives by the programmer can lead to side effects that limit the compiler optimizations available, which can result in performance degradation. This is exacerbated when targeting multi-GPU systems, as pragmas do not automatically adapt to such systems and require expensive and time-consuming code adjustment by programmers. This paper introduces JACC, an OpenACC runtime framework which enables the dynamic extension of OpenACC programs by serving as a transparent layer between the program and the compiler. We add a versatile code-translation method for multi-device utilization by which manually optimized applications can be distributed automatically while keeping the original code structure and parallelism. We show nearly linear scaling of kernel execution in some cases on NVIDIA V100 GPUs. When adaptively using multiple GPUs, the resulting performance improvements amortize the latency of GPU-to-GPU communications. Comment: Extended version of a paper to appear in: Proceedings of the 28th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), December 17-18, 202

    ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code

    Automatic code optimization is a complex process that typically involves the application of multiple discrete algorithms that modify the program structure irreversibly. However, the design of these algorithms is often monolithic, and they require repetitive implementation to perform similar analyses due to the lack of cooperation. To address this issue, modern optimization techniques, such as equality saturation, allow for exhaustive term rewriting at various levels of inputs, thereby simplifying compiler design. In this paper, we propose equality saturation to optimize sequential codes utilized in directive-based programming for GPUs. Our approach simultaneously realizes less computation, less memory access, and high memory throughput. Our fully-automated framework constructs single-assignment forms from inputs to be entirely rewritten while keeping dependencies and extracts optimal cases. Through practical benchmarks, we demonstrate a significant performance improvement on several compilers. Furthermore, we highlight the advantages of computational reordering and emphasize the significance of memory-access order for modern GPUs

    Predicting temporary wetland plant community responses to changes in the hydroperiod

    The expected changes in rainfall over the next decades may cause significant changes in the hydroperiod of temporary wetlands and, consequently, shifts in plant community distributions. Predicting plant community responses to changes in the hydroperiod is a key issue for conservation and management of temporary wetlands. We present a predictive distribution model for Arthrocnemum macrostachyum communities in the Doñana wetland (Southern Spain). Logistic regression was used to fit the model using the number of days of inundation and the mean water height as predictors. The internal validation of the model yielded good performance measures. The model was applied to a set of expected scenarios of changes in the hydroperiod to anticipate the most likely shifts in the distribution of Arthrocnemum macrostachyum communities
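
    A logistic regression with two predictors, as described in the abstract, can be sketched in a few lines. Everything specific here is an assumption made for illustration: the function names, the feature scaling (inundation days divided by 365, water height in metres), and above all the toy data, which is invented and not taken from the study. The direction of the fitted effect therefore says nothing about Arthrocnemum macrostachyum; the sketch only shows how presence probability is modelled as a sigmoid of a linear combination of the two predictors.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # derivative of the log-loss w.r.t. the linear term
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_presence(weights, bias, x):
    """Predicted probability that the community is present at a site."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Invented toy data: (inundation days / 365, mean water height in metres).
# Label 1 = community present, 0 = absent. NOT data from the study.
sites = [(0.05, 0.02), (0.10, 0.05), (0.15, 0.05), (0.20, 0.10),
         (0.50, 0.30), (0.60, 0.40), (0.70, 0.45), (0.80, 0.60)]
present = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic(sites, present)

# Apply the fitted model to a hypothetical scenario with a longer hydroperiod.
scenario = (0.65, 0.40)
print(round(predict_presence(weights, bias, scenario), 3))
```

    Applying the fitted model to scenario values, as in the last lines, mirrors the abstract's step of evaluating expected hydroperiod changes. A real analysis would use field observations and a proper validation procedure.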

    Studies on the mechanism of action of the antimicrobial S-linked glycopeptide sublancin

    Infectious diseases are a continuing threat to human health. In particular, the rapid development of bacterial antibiotic resistance not only decreases the effectiveness of known antibiotics, but also increases the need for the ongoing discovery of novel drugs. Since the discovery of penicillin, natural products have become a great source of templates for the development of new antibiotics. Derivatization of known drugs is one approach commonly used to combat rapidly evolving bacterial strains. However, the mechanisms of action of derivatized drugs are more often than not very similar to that of the parent compound, making it difficult to develop drugs with new and unique modes of action by generating analogs. Ribosomally synthesized and post-translationally modified peptide (RiPP) natural products are a rapidly expanding class of compounds with antimicrobial activity. Sublancin is one of five members of the glycocin family of RiPPs, and contains an unusual S-linked glycosylation. This unprecedented post-translational modification, as well as its increased stability when compared to known RiPP antimicrobials, suggests a unique antibacterial mode of action. In an effort to understand the remarkable stability of sublancin, the three-dimensional NMR structure was solved, as described in chapter 2, revealing that hydrophobic interactions as well as hydrogen bonding are responsible for the stable and well-structured peptide. Unlike better-understood natural products, the molecular target of sublancin is currently unknown. In order to further understand how sublancin exerts its activity against bacteria, a number of sublancin analogues were made. These analogues were prepared either by heterologous expression followed by in vitro modification or by solid phase peptide synthesis. The antimicrobial activity of all analogues was then assessed against sensitive bacteria and sublancin-resistant mutant strains.
While sublancin exhibits sub-micromolar activity against Gram-positive bacteria, its molecular target is currently unknown. Chapter 3 describes studies focused on understanding sublancin’s mode of action. In chapter 4 we performed super resolution microscopy and determined that sublancin localizes to the cell membrane. Furthermore, the mechanism of action of the S-glycosyltransferase SunS, the enzyme responsible for installing an S-linked sugar onto sublancin, was studied in chapter 5, which provided insights into its enzymatic mechanism. The understanding of the biosynthesis of these unique peptides can aid in the bioengineering of other, more potent complex molecules

    Less can be more: loss of MHC functional diversity can reflect adaptation to novel conditions during fish invasions

    The ability of invasive species to adapt to novel conditions depends on population size and environmental mismatch, but also on genetic variation. Away from their native range, invasive species confronted with novel selective pressures may display different levels of neutral versus functional genetic variation. However, the majority of invasion studies have only examined genetic variation at neutral markers, which may reveal little about how invaders adapt to novel environments. Salmonids are good model systems to examine adaptation to novel pressures because they have been translocated all over the world and represent major threats to freshwater biodiversity in the Southern Hemisphere, where they have become invasive. We examined patterns of genetic differentiation at seven putatively neutral (microsatellites) loci and one immune-related major histocompatibility complex (MHC class II-β) locus among introduced rainbow trout living in captivity (farmed) or under natural conditions (naturalized) in Chilean Patagonia. A significant positive association was found between differentiation at neutral and functional markers, highlighting the role of neutral evolutionary forces in shaping genetic variation at immune-related genes in salmonids. However, functional (MHC) genetic diversity (but not microsatellite diversity) decreased with time spent in the wild since introduction, suggesting that there was selection against alleles associated with captive rearing of donor populations that do not provide an advantage in the wild. Thus, although high genetic diversity may initially enhance fitness in translocated populations, it does not necessarily reflect invasion success, as adaptation to novel conditions may result in rapid loss of functional MHC diversity