3 research outputs found
A First Look at the Deprecation of RESTful APIs: An Empirical Study
REpresentational State Transfer (REST) is considered a standard software
architectural style to build web APIs that can integrate software systems over
the internet. However, while connecting systems, RESTful APIs might also break
the dependent applications that rely on their services when they introduce
breaking changes, e.g., an older version of the API is no longer supported. To
warn developers promptly and thus prevent critical impact on downstream
applications, a deprecated-removed model should be followed, and
deprecation-related information such as alternative approaches should also be
listed. While API deprecation analysis as a theme is not new, most existing
work focuses on non-web APIs, such as the ones provided by Java and Android. To
investigate RESTful API deprecation, we propose a framework called RADA
(RESTful API Deprecation Analyzer). RADA is capable of automatically
identifying deprecated API elements and analyzing impacted operations from an
OpenAPI specification, a machine-readable profile for describing RESTful web
services. We apply RADA to 2,224 OpenAPI specifications of 1,368 RESTful APIs
collected from APIs.guru, the largest directory of OpenAPI specifications.
Based on the data mined by RADA, we perform an empirical study to investigate
how the deprecated-removed protocol is followed in RESTful APIs and
characterize practices in RESTful API deprecation. The results of our study
reveal several severe deprecation-related problems in existing RESTful APIs.
Our implementation of RADA and detailed empirical results are publicly
available for future intelligent tools that could automatically identify and
migrate usage of deprecated RESTful API operations in client code.
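To make the idea of identifying deprecated elements in an OpenAPI specification concrete, here is a minimal sketch. It is not RADA's implementation; it only assumes that the specification has already been parsed into a Python dict and relies on the standard `deprecated` flag that OpenAPI defines on operations and parameters. The function name `find_deprecated` and the sample specification are hypothetical.

```python
from typing import Any, Dict, List

# HTTP methods that can appear as operation keys in an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def find_deprecated(spec: Dict[str, Any]) -> List[str]:
    """Return labels for operations and parameters flagged `deprecated: true`.

    Hypothetical helper for illustration only; not the RADA algorithm.
    """
    found: List[str] = []
    for path, item in spec.get("paths", {}).items():
        if not isinstance(item, dict):
            continue
        for method, op in item.items():
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() not in HTTP_METHODS or not isinstance(op, dict):
                continue
            label = f"{method.upper()} {path}"
            if op.get("deprecated") is True:
                found.append(label)
            for param in op.get("parameters", []):
                if isinstance(param, dict) and param.get("deprecated") is True:
                    found.append(f"{label} parameter:{param.get('name')}")
    return found


# Made-up specification fragment, used only to exercise the sketch.
example_spec = {
    "openapi": "3.0.0",
    "paths": {
        "/v1/users": {"get": {"deprecated": True}},
        "/v2/users": {
            "get": {"parameters": [{"name": "legacy_id", "in": "query", "deprecated": True}]},
            "post": {},
        },
    },
}

deprecated_elements = find_deprecated(example_spec)
```

A sketch like this only reads explicit flags; deprecation that is announced solely in free-text descriptions would need separate handling.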
PeaTMOSS: A Dataset and Initial Analysis of Pre-Trained Models in Open-Source Software
The development and training of deep learning models have become increasingly costly and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for their downstream applications. The dynamics of the PTM supply chain remain largely unexplored, signaling a clear need for structured datasets that document not only the metadata but also the subsequent applications of these models. Without such data, the MSR community cannot comprehensively understand the impact of PTM adoption and reuse. This paper presents the PeaTMOSS dataset, which comprises metadata for 281,638 PTMs and detailed snapshots for all PTMs with over 50 monthly downloads (14,296 PTMs), along with 28,575 open-source software repositories from GitHub that utilize these models. Additionally, the dataset includes 44,337 mappings from 15,129 downstream GitHub repositories to the 2,530 PTMs they use. To enhance the dataset’s comprehensiveness, we developed prompts for a large language model to automatically extract model metadata, including the model’s training datasets, parameters, and evaluation metrics. Our analysis of this dataset provides the first summary statistics for the PTM supply chain, showing the trend of PTM development and common shortcomings of PTM package documentation. Our example application reveals inconsistencies in software licenses across PTMs and their dependent projects. PeaTMOSS lays the foundation for future research, offering rich opportunities to investigate the PTM supply chain. We outline mining opportunities on PTMs, their downstream usage, and crosscutting questions.
Our artifact is available at https://github.com/PurdueDualityLab/PeaTMOSS-Artifact
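As a rough illustration of how repository-to-PTM mappings could be used to look for license inconsistencies, the sketch below compares license identifiers on each side of a mapping. It does not use the PeaTMOSS schema or the authors' method: the data layout, the function name `license_mismatches`, and all repository, model, and license values are assumptions made for this example, and treating "different identifier" as "inconsistent" is a deliberate simplification of real license compatibility.

```python
from typing import Dict, List, Tuple

# (repository, PTM) pairs; a hypothetical stand-in for the dataset's mappings.
Mapping = Tuple[str, str]


def license_mismatches(
    mappings: List[Mapping],
    repo_license: Dict[str, str],
    ptm_license: Dict[str, str],
) -> List[Tuple[str, str, str, str]]:
    """Return (repo, ptm, repo_license, ptm_license) where the identifiers differ.

    Pairs with a missing license on either side are skipped rather than guessed.
    """
    mismatches = []
    for repo, ptm in mappings:
        r = repo_license.get(repo)
        p = ptm_license.get(ptm)
        if r is None or p is None:
            continue
        if r != p:
            mismatches.append((repo, ptm, r, p))
    return mismatches


# Made-up records, used only to exercise the sketch.
mappings = [
    ("example-org/app-a", "example-user/model-x"),
    ("example-org/app-b", "example-user/model-x"),
    ("example-org/app-c", "example-user/model-y"),
]
repo_license = {"example-org/app-a": "MIT", "example-org/app-b": "Apache-2.0"}
ptm_license = {"example-user/model-x": "Apache-2.0", "example-user/model-y": "MIT"}

result = license_mismatches(mappings, repo_license, ptm_license)
```

Here only the first pair is reported, because the second pair has matching identifiers and the third repository has no recorded license.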
PeaTMOSS: Mining Pre-Trained Models in Open-Source Software
Developing and training deep learning models is expensive, so software engineers have begun to reuse pre-trained deep learning models (PTMs) and fine-tune them for downstream tasks. Despite the widespread use of PTMs, we know little about the corresponding software engineering behaviors and challenges. To enable the study of software engineering with PTMs, we present the PeaTMOSS dataset: Pre-Trained Models in Open-Source Software. PeaTMOSS has three parts: a snapshot of (1) 281,638 PTMs, (2) 27,270 open-source software repositories that use PTMs, and (3) a mapping between PTMs and the projects that use them. We challenge PeaTMOSS miners to discover software engineering practices around PTMs. A demo and link to the full dataset are available at: https://github.com/PurdueDualityLab/PeaTMOSS-Demos