17 research outputs found

    Applications and Techniques for Fast Machine Learning in Science

    In this community review report, we discuss applications and techniques for fast machine learning (ML) in science: the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions, followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.

    Nonlinear Dynamics

    This volume covers a diverse collection of topics dealing with some of the fundamental concepts and applications embodied in the study of nonlinear dynamics. Each of the 15 chapters in this compendium generally fits into one of five topical areas: physics applications, nonlinear oscillators, electrical and mechanical systems, biological and behavioral applications, or random processes. The authors of these chapters have contributed a stimulating cross section of new results, providing a fertile spectrum of ideas that will inspire both seasoned researchers and students.

    Big Data Application and System Co-optimization in Cloud and HPC Environment

    The emergence of big data requires powerful computational resources and memory subsystems that can be scaled efficiently to meet its demands. Cloud computing is a well-established paradigm that can offer customized compute and memory resources for the scalable demands of big data applications, and its flexible pay-as-you-go pricing model makes large-scale resources available at low cost with no infrastructure maintenance burden. High performance computing (HPC), on the other hand, also has powerful infrastructure with the potential to support big data applications. In this dissertation, we explore application and system co-optimization opportunities to support big data in both cloud and HPC environments. Specifically, we exploit the unique features of both the application and the system to find overlooked optimization opportunities and to tackle challenges that are difficult to address by looking at either one individually. Based on the characteristics of the workloads and their underlying systems, we derive optimized deployment and runtime schemes and divide the workloads into four categories: 1) memory intensive applications; 2) compute intensive applications; 3) both memory and compute intensive applications; and 4) I/O intensive applications.

    When deploying memory intensive big data applications to public clouds, one important yet challenging problem is selecting an instance type whose memory capacity is large enough to prevent out-of-memory errors while the cost is minimized without violating performance requirements. We propose two techniques for efficient deployment of big data applications with dynamic and intensive memory footprints in the cloud. The first builds a performance-cost model that can accurately predict how, and by how much, virtual memory size would slow down the application and, consequently, impact the overall monetary cost.
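A minimal sketch of the kind of performance-cost reasoning the first approach describes: model how paging (when the working set exceeds instance memory) slows a run down, convert that into dollars under per-hour pricing, and pick the cheapest instance that still meets a performance requirement. The instance names, prices, and the linear slowdown factor are made-up illustrations, not the dissertation's actual model.

```python
# Hypothetical performance-cost model: the fraction of the working set that
# must be paged to virtual memory slows the application down linearly.

def run_time_hours(base_hours, working_set_gb, mem_gb, paging_penalty=4.0):
    """Estimate run time; the paged-out fraction of memory slows the app."""
    paged_fraction = max(0.0, working_set_gb - mem_gb) / working_set_gb
    return base_hours * (1.0 + paging_penalty * paged_fraction)

def cheapest_instance(instances, base_hours, working_set_gb, deadline_hours):
    """Pick the min-cost instance whose predicted run time meets the deadline."""
    best = None
    for name, mem_gb, price_per_hour in instances:
        t = run_time_hours(base_hours, working_set_gb, mem_gb)
        if t > deadline_hours:
            continue  # out of memory budget -> too slow to meet the deadline
        cost = t * price_per_hour
        if best is None or cost < best[1]:
            best = (name, cost)
    return best

instances = [
    ("small",  16, 0.20),   # (name, memory in GB, $/hour) -- illustrative
    ("medium", 32, 0.40),
    ("large",  64, 0.80),
]
print(cheapest_instance(instances, base_hours=2.0, working_set_gb=30.0,
                        deadline_hours=4.0))
```

Under these toy numbers, the smallest instance pages heavily and misses the deadline, so the model selects the mid-sized instance as the cheapest feasible choice.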
    The second approach employs a lightweight memory usage prediction methodology based on dynamic meta-models adjusted to the application's own traits. The key idea is to eliminate periodic checkpointing and migrate the application only when the predicted memory usage exceeds the physical allocation.

    When deploying compute intensive applications to the cloud, it is critical to make them scalable so that they can benefit from the massive cloud resources. We first use Kirchhoff's law, one of the most widely used physical laws in engineering, as an example workload for our study. The key challenge of applying Kirchhoff's law to real-world applications at scale lies in the high, if not prohibitive, computational cost of solving a large number of nonlinear equations. We propose a high-performance deep-learning-based approach for Kirchhoff analysis, namely HDK. HDK employs two techniques to improve performance: (i) early pruning of unqualified input candidates, which simplifies the equations and selects a meaningful input data range; and (ii) parallelization of forward labelling, which executes steps of the problem in parallel.

    For applications that are both memory and compute intensive in clouds, we use a blockchain system as a benchmark. Existing blockchain frameworks present a technical barrier for many users who want to modify or test new research ideas in blockchains. Worse, many advantages of blockchain systems can be demonstrated only at large scales, which are not always available to researchers. We develop an accurate and efficient emulation system that replays the execution of large-scale blockchain systems on tens of thousands of nodes in the cloud.

    For I/O intensive applications, we observe one important yet often neglected side effect of lossy scientific data compression.
    Lossy compression techniques have demonstrated promising results in significantly reducing scientific data size while guaranteeing compression error bounds, but the compressed data sizes are often highly skewed, which degrades the performance of parallel I/O. We therefore believe it is critical to pay more attention to the unbalanced parallel I/O caused by lossy scientific data compression.
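A toy illustration of the imbalance described above: lossy compression shrinks each rank's block by a data-dependent factor, so compressed sizes are skewed and a collective write is gated by the rank holding the largest block. The sizes below are made up for illustration.

```python
# Measure how far a parallel write is from balanced: the slowest rank's
# write volume relative to the average volume across ranks.

def io_imbalance(sizes):
    """Ratio of the largest per-rank write to the average write (1.0 = balanced)."""
    return max(sizes) / (sum(sizes) / len(sizes))

uncompressed = [100, 100, 100, 100]  # MB per rank, perfectly balanced
compressed = [5, 12, 48, 7]          # skewed sizes after lossy compression

print(io_imbalance(uncompressed))           # balanced parallel write
print(round(io_imbalance(compressed), 2))   # one rank bottlenecks the I/O
```

Even though compression cuts total volume from 400 MB to 72 MB here, the write time is dominated by the 48 MB straggler, so the realized I/O speedup is far below the compression ratio.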

    Runtime scheduling and updating for deep learning applications

    Recent decades have witnessed the breakthrough of deep learning algorithms, which have been widely used in many areas. Typically, deployment of deep learning applications consists of compute-intensive training and latency-sensitive inference. To support deep learning applications, enterprises build large-scale computing clusters composed of expensive accelerators, such as GPUs, FPGAs, or other domain-specific ASICs. However, it is challenging for deep learning applications to achieve high resource utilization and maintain high accuracy in the face of dynamic workloads. On the one hand, the workload of deep learning tasks always changes over time, which leads to a gap between the required resources and statically allocated resources. On the other hand, the distribution of input data may also change over time, and the accuracy of inference could decrease before updating the model. In this thesis, we present a new deep learning system architecture that can schedule and update deep learning applications at runtime to efficiently handle dynamic workloads. We identify and study three key components. (i) PipeSwitch: A deep learning system that allows multiple deep learning applications to time-share the same GPU with the entire GPU memory and millisecond-scale switching overhead. PipeSwitch enables unused cycles of inference applications to be dynamically filled by training or other inference applications. With PipeSwitch, GPU utilization can be significantly improved without sacrificing service level objectives. (ii) DistMind: A disaggregated deep learning system that enables efficient multiplexing of deep learning applications with near-optimal resource utilization. DistMind decouples compute from host memory, and exposes the abstractions of a GPU pool and a memory pool, each of which can be independently provisioned and dynamically allocated to deep learning tasks.
(iii) RegexNet: A payload-based, automated, reactive recovery system for web services under regular expression denial of service attacks. RegexNet adopts a deep learning model, which is constantly updated in a feedback loop at runtime, to classify the payloads of incoming HTTP requests. We have built system prototypes for these components and integrated them with existing software. Our evaluation across a variety of environments and configurations shows the benefits of our solution.
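The core scheduling policy behind filling idle GPU cycles can be sketched as a two-class priority queue: latency-sensitive inference requests always run ahead of best-effort training steps, so training work soaks up the gaps between inference bursts. This is an illustrative sketch under that assumption, not the PipeSwitch implementation; the class and task names are hypothetical.

```python
# Two-class GPU time-sharing: inference outranks training, FIFO within a class.
import heapq

INFERENCE, TRAINING = 0, 1  # lower value = higher scheduling priority

class GpuTimeShareScheduler:
    def __init__(self):
        self._queue = []
        self._counter = 0  # submission order breaks ties within a class

    def submit(self, priority, name):
        heapq.heappush(self._queue, (priority, self._counter, name))
        self._counter += 1

    def next_task(self):
        """Next task for the GPU: pending inference first, else training."""
        if not self._queue:
            return None
        return heapq.heappop(self._queue)[2]

sched = GpuTimeShareScheduler()
sched.submit(TRAINING, "train-step-17")
sched.submit(INFERENCE, "infer-req-42")
print(sched.next_task())  # inference runs ahead of the queued training step
print(sched.next_task())  # the idle cycle is then filled by training
```

In a real system the switch between tasks is where the cost lies; the millisecond-scale switching overhead cited above is what makes this fine-grained interleaving worthwhile.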