4 research outputs found

    An Empirical Study of Artifacts and Security Risks in the Pre-trained Model Supply Chain

    Deep neural networks achieve state-of-the-art performance on many tasks, but require increasingly complex architectures and costly training procedures. Engineers can reduce costs by reusing a pre-trained model (PTM) and fine-tuning it for their own tasks. To facilitate software reuse, engineers collaborate around model hubs, collections of PTMs and datasets organized by problem domain. Although model hubs are now comparable in popularity and size to other software ecosystems, the associated PTM supply chain has not yet been examined from a software engineering perspective. We present an empirical study of artifacts and security features in 8 model hubs. We identify potential threat models and show that existing defenses are insufficient to ensure the security of PTMs. We compare PTM and traditional supply chains, and propose directions for further measurements and tools to increase the reliability of the PTM supply chain.

    Robust Source Attribution of Synthetically Generated Western Blot Images

    Retracted papers commonly include manipulations of images and figures that are unfit for publication. While some manipulations are benign, like increasing contrast or zooming in, others are designed to fool the intended audience. Recent improvements in generative computer vision pose a security risk to scientific review, since generated images are often indistinguishable from authentic bioscience evidence, even to field experts. In this work, we improve upon previous attempts to detect synthetic images by attending to their differences in the frequency domain. Additionally, we solve the multi-class classification of synthetic Western blots, attributing Western blot images to their respective generative model architectures. We demonstrate that our method outperforms previous methods for synthetic Western blot detection, including efforts to classify JPEG-compressed images. Please note that this is a technical report based on preliminary research results.

    An Empirical Study of Pre-Trained Model Reuse in the Hugging Face Deep Learning Model Registry

    Deep Neural Networks (DNNs) are being adopted as components in software systems. Creating and specializing DNNs from scratch has grown increasingly difficult as state-of-the-art architectures grow more complex. Following the path of traditional software engineering, machine learning engineers have begun to reuse large-scale pre-trained models (PTMs) and fine-tune these models for downstream tasks. Prior works have studied reuse practices for traditional software packages to guide software engineers towards better package maintenance and dependency management. We lack a similar foundation of knowledge to guide behaviors in pre-trained model ecosystems. In this work, we present the first empirical investigation of PTM reuse. We interviewed 12 practitioners from the most popular PTM ecosystem, Hugging Face, to learn the practices and challenges of PTM reuse. From this data, we model the decision-making process for PTM reuse. Based on the identified practices, we describe useful attributes for model reuse, including provenance, reproducibility, and portability. We identify three challenges for PTM reuse: missing attributes, discrepancies between claimed and actual performance, and model risks. We substantiate these challenges with systematic measurements in the Hugging Face ecosystem. Our work informs future directions for optimizing deep learning ecosystems by automatically measuring useful attributes and potential attacks, and envisions future research on infrastructure and standardization for model registries.

    Snapshot Metrics Are Not Enough: Analyzing Software Repositories with Longitudinal Metrics

    Software metrics capture information about software development processes and products. These metrics support decision-making, e.g., in team management or dependency selection. However, existing metrics tools measure only a snapshot of a software project. Little attention has been given to enabling engineers to reason about metric trends over time: longitudinal metrics that give insight into process, not just product. In this work, we present PRiME (PRocess MEtrics), a tool for computing and visualizing process metrics. The currently supported metrics include productivity, issue density, issue spoilage, and bus factor. We illustrate the value of longitudinal data and conclude with a research agenda. The tool's demo video can be watched at this https URL. The source code can be found at this https URL.