    Promises and Perils of Mining Software Package Ecosystem Data

    The use of third-party packages is becoming increasingly popular and has led to the emergence of large software package ecosystems with a maze of inter-dependencies. Since the reliance on these ecosystems enables developers to reduce development effort and increase productivity, it has attracted the interest of researchers: understanding the infrastructure and dynamics of package ecosystems has given rise to approaches for better code reuse, automated updates, and the avoidance of vulnerabilities, to name a few examples. But the reality of these ecosystems also poses challenges to software engineering researchers, such as: How do we obtain the complete network of dependencies along with the corresponding versioning information? What are the boundaries of these package ecosystems? How do we consistently detect dependencies that are declared but not used? How do we consistently identify developers within a package ecosystem? How much of the ecosystem do we need to understand to analyse a single component? How well do our approaches generalise across different programming languages and package ecosystems? In this chapter, we review promises and perils of mining the rich data related to software package ecosystems available to software engineering researchers. Comment: Submitted as a Book Chapter

    Demystifying Compiler Unstable Feature Usage and Impacts in the Rust Ecosystem

    The Rust programming language is rapidly gaining popularity for building reliable and secure systems due to its security guarantees and outstanding performance. To provide extra functionality, the Rust compiler introduces Rust unstable features (RUF) that extend compiler functionality, syntax, and standard library support. However, these features are unstable and may be removed, introducing compilation failures in dependent packages. Even worse, their impacts propagate through transitive dependencies, causing large-scale failures across the whole ecosystem. Although RUF is widely used in Rust, previous research has primarily concentrated on Rust code safety, leaving the usage and impacts of RUF from the Rust compiler unexplored. Therefore, we aim to bridge this gap by systematically analyzing RUF usage and impacts in the Rust ecosystem. We propose novel techniques for extracting RUF precisely, and to quantitatively assess its impact on the entire ecosystem, we accurately resolve package dependencies. We have analyzed the whole Rust ecosystem with 590K package versions and 140M transitive dependencies. Our study shows that the Rust ecosystem uses 1000 different RUF, and at most 44% of package versions are affected by RUF, causing compilation failures for at most 12%. To mitigate wide RUF impacts, we further design and implement a RUF-compilation-failure recovery tool that can recover up to 90% of the failures. We believe our techniques, findings, and tools can help to stabilize the Rust compiler, ultimately enhancing the security and reliability of the Rust ecosystem. Comment: Published in ICSE'2024 Conference: https://conf.researchr.org/details/icse-2024/icse-2024-research-track/6/Demystifying-Compiler-Unstable-Feature-Usage-and-Impacts-in-the-Rust-Ecosystem. Project website: https://sites.google.com/view/ruf-study/home. Released Source Code Zenodo: https://zenodo.org/records/828937
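
    The propagation of impact through transitive dependencies described in this abstract can be illustrated with a small reverse-dependency traversal. This is only a sketch under stated assumptions, not the authors' technique: the function name `affected_packages`, the dictionary layout, and the toy package names are all hypothetical, and real dependency resolution (versions, optional features) is ignored.

```python
from collections import defaultdict, deque


def affected_packages(dependencies, direct_users):
    """Return packages that use an unstable feature directly or transitively.

    dependencies: dict mapping a package to the packages it depends on
                  (hypothetical toy data, no version resolution).
    direct_users: set of packages that enable an unstable feature themselves.
    """
    # Invert the edges so we can walk from a package to its dependents.
    dependents = defaultdict(set)
    for pkg, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(pkg)

    # Breadth-first search over dependents, starting from the direct users.
    affected = set(direct_users)
    queue = deque(direct_users)
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Toy graph: app -> web -> parser, and cli -> logger.
deps = {"app": ["web"], "web": ["parser"], "parser": [], "cli": ["logger"], "logger": []}
impacted = affected_packages(deps, {"parser"})
print(sorted(impacted))
print(f"{len(impacted) / len(deps):.0%} of packages affected")
```

    In this toy graph, only `parser` enables an unstable feature, yet `web` and `app` are also counted as affected because they reach it through their dependency chains.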

    Analysis of Issue Reports and Technical Dependency Management in Open-Source Software Projects (Avatud lähtekoodiga tarkvaraprojektide vearaportite ja tehniliste sõltuvuste haldamise analüüsimine)

    Modern software development relies on open-source software to facilitate reuse and reduce redundant work. Software developers use open-source packages in their projects without having insight into how these components are developed and maintained. The aim of this thesis is to develop approaches for analyzing issue and dependency management in software projects. Software projects organize their work with issue trackers, tools for tracking issues such as development tasks, bug reports, and feature requests. By analyzing issue handling in more than 4,000 open-source projects, we found that many issues are left open for long periods of time, which can result in bugs and vulnerabilities not being fixed in a timely manner. This thesis proposes a method for predicting the amount of time it takes to resolve an issue by using the historical data available in issue trackers. Methods for predicting issue lifetime can help software project managers to prioritize issues and allocate resources accordingly. Another problem studied in this thesis is how software dependencies are used. Software developers often include third-party open-source software packages in their project code as dependencies. The included dependencies can also have their own dependencies, so a complex network of dependency relationships exists among open-source software packages. This thesis analyzes the structure and the evolution of dependency networks of three popular programming languages. We propose an approach to measure the growth and the evolution of dependency networks. This thesis demonstrates that dependency network analysis can quantify the likelihood of acquiring vulnerabilities through software packages and how that likelihood changes over time. The approaches and findings developed here could help to bring transparency into open-source projects with respect to how issues are handled and dependencies are updated.

    Untriviality of Trivial Packages

    Nowadays, developing software would be unthinkable without the use of third-party packages. Although such code reuse helps to achieve rapid continuous delivery of software to end-users, blindly reusing code has its pitfalls. Prior work investigated the rationale for using packages of small size, known as trivial packages, that implement simple functionality. This prior work showed that, although these trivial packages are simple, they are popular and prevalent in the npm ecosystem. This popularity and prevalence of trivial packages led us to question, first, the 'triviality' of these packages and, second, the impact of using these packages on the quality of the client software applications. To better understand the 'triviality' of trivial packages and their impact, in this thesis we report on two large-scale empirical studies. In both studies, we mine a large set of JavaScript applications that use trivial npm packages. In the first study, we evaluate the 'triviality' of these packages from two complementary points of view: application usage and ecosystem usage. Our results show that trivial packages are used in important JavaScript files, as measured by their 'centrality', in software applications. Additionally, by analyzing all external package API calls in these JavaScript files, we find that a high percentage of these API calls are attributed to trivial packages. Therefore, these packages play a significant role in JavaScript files. Furthermore, in the package dependency network, we observe that 16.8% of packages are trivial and that in some cases removing a trivial package can break approximately 30% of the packages in the ecosystem. In the second study, we start by examining the circumstances in which trivial packages are incorporated into software applications. We analyze and classify commits that introduce trivial packages into software applications. We notice that developers resort to trivial packages while performing a wide range of development tasks that are mostly related to 'Building' and 'Refactoring'. We empirically evaluate the bugginess of the files and applications that use trivial packages. Our results show that JavaScript files and applications that use trivial packages tend to have a higher percentage of bug-fixing commits than files and applications that do not depend on trivial packages. Overall, the findings of our thesis indicate that, although smaller in size and complexity, trivial packages are highly depended-upon packages. Although these packages may be trivial in terms of size, their utility in software applications suggests that their role is not so trivial.

    Software tools for conducting real-time information processing and visualization in industry: an up-to-date review

    The processing of information in real time (through the processing of complex events) has become an essential task for the optimal functioning of manufacturing plants. Only in this way can artificial intelligence, data extraction, and even business intelligence techniques be applied, and the data produced daily be used in a beneficial way, enhancing automation processes and improving service delivery. Therefore, professionals and researchers need a wide range of tools to extract, transform, and load data in real time efficiently. Additionally, the same tools should support, or at least facilitate, the visualization of this data intuitively and interactively. This document aims to provide an up-to-date review of the various tools available to perform these tasks. For each of the selected tools, a brief description of how it works, as well as the advantages and disadvantages of its use, will be presented. Furthermore, a critical analysis of overall operation and performance will be presented. Finally, a hybrid architecture that aims to synergize all tools and technologies is presented and discussed. This work is funded by "FCT - Fundação para a Ciência e Tecnologia" within the R&D Units Project Scope: UIDB/00319/2020. The grants of R.S., R.M., A.M., and N.L. are supported by the European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internalization Programme (COMPETE 2020). [Project n. 039479. Funding Reference: POCI-01-0247-FEDER-039479]

    Characterizing Deep Learning Package Supply Chains in PyPI: Domains, Clusters, and Disengagement

    Deep learning (DL) package supply chains (SCs) are critical for DL frameworks to remain competitive. However, vital knowledge on the nature of DL package SCs is still lacking. In this paper, we explore the domains, clusters, and disengagement of packages in two representative PyPI DL package SCs to bridge this knowledge gap. We analyze the metadata of nearly six million PyPI package distributions and construct version-sensitive SCs for two popular DL frameworks: TensorFlow and PyTorch. We find that popular packages (measured by the number of monthly downloads) in the two SCs cover 34 domains belonging to eight categories. The Applications, Infrastructure, and Sciences categories account for over 85% of popular packages in either SC, and the TensorFlow and PyTorch SCs have developed specializations in Infrastructure and Applications packages, respectively. We employ the Leiden community detection algorithm and detect 131 and 100 clusters in the two SCs. The clusters mainly exhibit four shapes: Arrow, Star, Tree, and Forest, with increasing dependency complexity. Most clusters are Arrow or Star, but Tree and Forest clusters account for most packages (TensorFlow SC: 70%, PyTorch SC: 90%). We identify three groups of reasons why packages disengage from the SC (i.e., remove the DL framework and its dependents from their installation dependencies): dependency issues, functional improvements, and ease of installation. The most common disengagement reasons in the two SCs are different. Our study provides rich implications for the maintenance and dependency management practices of PyPI DL SCs. Comment: Manuscript submitted to ACM Transactions on Software Engineering and Methodology
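
    The definition of disengagement given in this abstract (a package removes the DL framework and its dependents from its installation dependencies) can be sketched as a check over a package's release history. This is a minimal illustration, not the authors' pipeline: the function `disengaged_versions`, the input layout, and the example data are hypothetical, and real PyPI metadata would first need parsing of version specifiers and extras.

```python
def disengaged_versions(releases, supply_chain):
    """Return the versions at which a package leaves the supply chain.

    releases: list of (version, dependency names) in release order
              (hypothetical, already-parsed metadata).
    supply_chain: set holding the DL framework and its dependents.
    """
    results = []
    previous_in_sc = False
    for version, deps in releases:
        # A release is in the SC if any install dependency belongs to it.
        in_sc = bool(set(deps) & supply_chain)
        # Disengagement: the previous release was in the SC, this one is not.
        if previous_in_sc and not in_sc:
            results.append(version)
        previous_in_sc = in_sc
    return results


sc = {"torch", "torchvision"}
history = [
    ("1.0", ["torch", "numpy"]),
    ("1.1", ["torchvision"]),
    ("2.0", ["numpy"]),
    ("2.1", ["numpy"]),
]
print(disengaged_versions(history, sc))
```

    Here version 2.0 is reported because it is the first release whose dependencies no longer include any member of the hypothetical supply-chain set.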