
    Technology readiness levels for machine learning systems

    The development and deployment of machine learning (ML) systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end. The lack of diligence can lead to technical debt, scope creep and misaligned objectives, model misuse and failures, and expensive consequences. Engineering systems, on the other hand, follow well-defined processes and testing standards to streamline development for high-quality, reliable results. The extreme is spacecraft systems, where mission-critical measures and robustness are ingrained in the development process. Drawing on experience in both spacecraft engineering and ML (from research through product across domain areas), we have developed a proven systems engineering approach for machine learning development and deployment. Our Machine Learning Technology Readiness Levels (MLTRL) framework defines a principled process to ensure robust, reliable, and responsible systems while being streamlined for ML workflows, including key distinctions from traditional software engineering. Moreover, MLTRL defines a lingua franca for people across teams and organizations to work collaboratively on artificial intelligence and machine learning technologies. Here we describe the framework and elucidate it with several real-world use cases of developing ML methods from basic research through productization and deployment, in areas such as medical diagnostics, consumer computer vision, satellite imagery, and particle physics.
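A readiness-level framework like the one the abstract describes amounts to gated progression: a project advances only when the review criteria for the next level are satisfied. The sketch below is a minimal illustration of that gating pattern; the level names and criteria are invented for demonstration and are not the actual MLTRL specification.

```python
# Illustrative sketch of readiness-level gating, loosely inspired by the
# MLTRL idea above. Levels and criteria are assumptions, not the paper's spec.
REQUIREMENTS = {
    1: ["problem statement reviewed"],
    2: ["proof-of-concept results reproduced"],
    3: ["held-out evaluation passes threshold"],
    4: ["deployment monitoring in place"],
}

def advance(level, completed):
    """Advance one readiness level only if every gate requirement is met."""
    if level + 1 not in REQUIREMENTS:
        return level  # already at the highest defined level
    if all(req in completed for req in REQUIREMENTS[level + 1]):
        return level + 1
    return level

# A project with its proof of concept reproduced may move from level 1 to 2,
# but cannot skip the held-out evaluation to reach level 3:
assert advance(1, {"proof-of-concept results reproduced"}) == 2
assert advance(2, {"proof-of-concept results reproduced"}) == 2
```

The point of such a gate is that each transition is an explicit, reviewable event rather than an informal judgment call, which is what distinguishes a systems-engineering process from ad hoc ML development.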

    Water Data Science: Data Driven Techniques, Training, and Tools for Improved Management of High Frequency Water Resources Data

    Electronic sensors can measure water and climate conditions at high frequency and generate large quantities of observed data. This work addresses data management challenges associated with the volume and complexity of high frequency water data. We developed techniques for automatically reviewing data, created materials for training water data managers, and explored existing and emerging technologies for sensor data management. Data collected by sensors often include errors due to sensor failure or environmental conditions that need to be removed, labeled, or corrected before the data can be used for analysis. Manual review and correction of these data can be tedious and time consuming. To help automate these tasks, we developed a computer program that automatically checks the data for mistakes and attempts to fix them. This tool has the potential to save time and effort and is available to scientists and practitioners who use sensors to monitor water. Scientists may lack skillsets for working with sensor data because traditional engineering or science courses do not address how to work with complex data using modern technology. We surveyed and interviewed instructors who teach courses related to “hydroinformatics” or “water data science” to understand challenges in incorporating data science techniques and tools into water resources teaching. Based on their feedback, we created educational materials that demonstrate how the articulated challenges can be effectively addressed to provide high-quality instruction. These materials are available online for students and teachers. In addition to skills for working with sensor data, scientists and engineers need tools for storing, managing, and sharing these data. Hydrologic information systems (HIS) help manage the data collected using sensors. 
HIS ensure that data can be used effectively by providing the computer infrastructure to move data from sensors in the field into secure storage and then into the hands of scientists and others who use them. This work describes the evolution of software and standards that comprise HIS. We present the main components of HIS, describe currently available systems and gaps in technology or functionality, and then discuss opportunities for improved infrastructure that would make sensor data easier to collect, manage, and use. In short, we are trying to make sure that sensor data are good and useful; we’re helping instructors teach prospective data collectors and users about water and data; and we are making sure that the systems that enable collection, storage, management, and use of the data work smoothly.
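One representative automated check such a review tool might run is spike detection: flagging observations that deviate sharply from their local neighborhood so they can be removed, labeled, or corrected. The sketch below uses a rolling-median residual test; the window size and threshold are assumptions for demonstration, and this is not the actual program the abstract describes.

```python
# Hedged sketch of automated QC for high-frequency sensor data: flag values
# that deviate from the local median by more than a fixed residual.
from statistics import median

def flag_spikes(values, window=5, max_residual=3.0):
    """Return indices whose value deviates from the rolling median by more
    than max_residual; these are candidates for removal or correction."""
    flagged = []
    half = window // 2
    for i, v in enumerate(values):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        if abs(v - median(values[lo:hi])) > max_residual:
            flagged.append(i)
    return flagged

series = [10.1, 10.2, 10.0, 55.0, 10.3, 10.1]  # one obvious sensor spike
assert flag_spikes(series) == [3]
```

A median (rather than mean) baseline is the usual choice here because a single large spike would drag a rolling mean toward itself and mask its own detection.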

    Intelligent Resource Prediction for HPC and Scientific Workflows

    Scientific workflows and high-performance computing (HPC) platforms are critically important to modern scientific research. In order to perform scientific experiments at scale, domain scientists must have knowledge and expertise in software and hardware systems that are highly complex and rapidly evolving. While computational expertise will be essential for domain scientists going forward, any tools or practices that reduce this burden for domain scientists will greatly increase the rate of scientific discoveries. One challenge that exists for domain scientists today is knowing the resource usage patterns of an application for the purpose of resource provisioning. A tool that accurately estimates these resource requirements would benefit HPC users in many ways, by reducing job failures and queue times on traditional HPC platforms and reducing costs on cloud computing platforms. To that end, we present Tesseract, a semi-automated tool that predicts resource usage for any application on any computing platform, from historical data, with minimal input from the user. We employ Tesseract to predict runtime, memory usage, and disk usage for a diverse set of scientific workflows, and in particular we show how these resource estimates can prevent under-provisioning. Finally, we leverage this core prediction capability to develop solutions for the related challenges of anomaly detection, cross-platform runtime prediction, and cost prediction.
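The core idea of predicting resource usage from historical data can be illustrated with the simplest possible model: fit runtime against input size on past jobs, then add a safety margin to the point estimate so the request guards against under-provisioning. The single-feature least-squares model, the data, and the margin below are illustrative assumptions, not Tesseract's actual method.

```python
# Minimal sketch of resource prediction from historical job records.
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Hypothetical historical jobs: (input size in GB, observed runtime in minutes)
sizes, runtimes = [1, 2, 4, 8], [12, 22, 42, 82]
slope, intercept = fit_line(sizes, runtimes)

def predict_runtime(size_gb, margin=1.2):
    """Point estimate plus a margin so the request avoids under-provisioning."""
    return (slope * size_gb + intercept) * margin

assert round(slope) == 10 and round(intercept) == 2
assert round(predict_runtime(16)) == 194  # (10 * 16 + 2) * 1.2
```

The margin encodes the asymmetry the abstract highlights: an over-provisioned job wastes some allocation, but an under-provisioned one fails outright and must re-queue.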

    Integrating Data Science and Earth Science

    This open access book presents the results of three years of collaboration between earth scientists and data scientists in developing and applying data science methods for scientific discovery. The book will be highly beneficial for other researchers at senior and graduate levels interested in applying visual data exploration, computational approaches, and scientific workflows.

    Optimization and Prediction Techniques for Self-Healing and Self-Learning Applications in a Trustworthy Cloud Continuum

    The current IT market is more and more dominated by the “cloud continuum”. In the “traditional” cloud, computing resources are typically homogeneous in order to facilitate economies of scale. In contrast, in edge computing, computational resources are widely diverse, commonly with scarce capacities, and must be managed very efficiently due to battery constraints or other limitations. A combination of resources and services at the edge (edge computing), in the core (cloud computing), and along the data path (fog computing) is needed through a trusted cloud continuum. This requires novel solutions for the creation, optimization, management, and automatic operation of such infrastructure through new approaches such as infrastructure as code (IaC). In this paper, we analyze how artificial intelligence (AI)-based techniques and tools can enhance the operation of complex applications to support the broad and multi-stage heterogeneity of the infrastructural layer in the “computing continuum” through the enhancement of IaC optimization, IaC self-learning, and IaC self-healing. To this end, the presented work proposes a set of tools, methods, and techniques for applications’ operators to seamlessly select, combine, configure, and adapt computation resources all along the data path and support the complete service lifecycle covering: (1) optimized distributed application deployment over heterogeneous computing resources; (2) monitoring of execution platforms in real time including continuous control and trust of the infrastructural services; (3) application deployment and adaptation while optimizing the execution; and (4) application self-recovery to avoid compromising situations that may lead to an unexpected failure. This research was funded by the European project PIACERE (Horizon 2020 research and innovation Program, under grant agreement no. 101000162)
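The self-recovery pattern in item (4) is essentially a control loop: monitor a deployed service and trigger a corrective action before a degraded state turns into an outright failure. The sketch below shows that loop in its simplest form; the threshold, window, and action names are invented for illustration and do not reproduce the PIACERE tooling.

```python
# Hedged sketch of a self-healing control loop: after `window` consecutive
# degraded health samples, trigger a redeployment. All parameters are
# illustrative assumptions.
def self_heal(health_samples, degraded_threshold=0.8, window=3):
    """Return the action taken for each health score in [0, 1]:
    'ok' normally, 'redeploy' after `window` consecutive degraded samples."""
    actions, streak = [], 0
    for score in health_samples:
        streak = streak + 1 if score < degraded_threshold else 0
        if streak >= window:
            actions.append("redeploy")
            streak = 0  # assume redeployment restores the service
        else:
            actions.append("ok")
    return actions

assert self_heal([0.9, 0.7, 0.6, 0.5, 0.9]) == ["ok", "ok", "ok", "redeploy", "ok"]
```

Requiring several consecutive degraded samples before acting is a common debouncing choice: it keeps a single noisy reading from triggering an expensive redeployment.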

    Towards Advanced Monitoring for Scientific Workflows

    Scientific workflows consist of thousands of highly parallelized tasks executed in a distributed environment involving many components. Automatic tracing and investigation of the components' and tasks' performance metrics, traces, and behavior are necessary to provide the end user with a useful level of abstraction, since the large amount of data cannot be analyzed manually. The execution and monitoring of scientific workflows involves many components: the cluster infrastructure, its resource manager, the workflow, and the workflow tasks. All components in such an execution environment access different monitoring metrics and provide metrics on different abstraction levels. The combination and analysis of observed metrics from different components and their interdependencies have so far received little attention. We specify four different monitoring layers that can serve as an architectural blueprint for the monitoring responsibilities and the interactions of components in the scientific workflow execution context. We describe the different monitoring metrics subject to the four layers and how the layers interact. Finally, we examine five state-of-the-art scientific workflow management systems (SWMS) in order to assess which steps are needed to enable our four-layer-based approach. Comment: Paper accepted in the 2022 IEEE International Conference on Big Data Workshop SCDM 202
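A precondition for the cross-layer analysis the abstract calls for is that every observed metric carries a tag identifying which layer it came from. The sketch below shows that bookkeeping in miniature; the four layer names follow the components listed in the abstract, while the metric records themselves are invented examples.

```python
# Sketch of layer-tagged metric collection for cross-layer analysis.
# Layer names follow the components named in the abstract; records are examples.
LAYERS = ["infrastructure", "resource_manager", "workflow", "task"]

def group_by_layer(metrics):
    """Bucket (layer, name, value) records by layer, rejecting unknown layers."""
    buckets = {layer: [] for layer in LAYERS}
    for layer, name, value in metrics:
        if layer not in buckets:
            raise ValueError(f"unknown monitoring layer: {layer}")
        buckets[layer].append((name, value))
    return buckets

observed = [
    ("infrastructure", "node_cpu_util", 0.62),
    ("task", "peak_rss_mb", 4096),
    ("workflow", "tasks_pending", 17),
]
grouped = group_by_layer(observed)
assert grouped["task"] == [("peak_rss_mb", 4096)]
assert grouped["resource_manager"] == []
```

Once metrics are bucketed this way, interdependencies become queryable, for example correlating a task's peak memory with the utilization of the node the infrastructure layer reported at the same time.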