1,336 research outputs found

    Untriviality of Trivial Packages

    Nowadays, developing software would be unthinkable without the use of third-party packages. Although such code reuse helps to achieve rapid continuous delivery of software to end-users, blindly reusing code has its pitfalls. Prior work investigated the rationale for using packages of small size, known as trivial packages, that implement simple functionality. This prior work showed that, although these trivial packages are simple, they are popular and prevalent in the npm ecosystem. This popularity and prevalence piqued our interest in questioning, first, the 'triviality' of these packages and, second, the impact of using them on the quality of client software applications. To better understand the 'triviality' of trivial packages and their impact, in this thesis we report on two large-scale empirical studies. In both studies, we mine a large set of JavaScript applications that use trivial npm packages. In the first study, we evaluate the 'triviality' of these packages from two complementary points of view: application usage and ecosystem usage. Our results show that trivial packages are used in important JavaScript files, as measured by their 'centrality', in software applications. Additionally, by analyzing all external package API calls in these JavaScript files, we find that a high percentage of these API calls are attributed to trivial packages. Therefore, these packages play a significant role in JavaScript files. Furthermore, in the package dependency network, we observe that 16.8% of packages are trivial and that, in some cases, removing a trivial package can break approximately 30% of the packages in the ecosystem. In the second study, we start by examining the circumstances under which trivial packages are incorporated into software applications. We analyze and classify commits that introduce trivial packages into software applications.
We notice that developers resort to trivial packages while performing a wide range of development tasks that are mostly related to 'Building' and 'Refactoring'. We empirically evaluate the bugginess of the files and applications that use trivial packages. Our results show that JavaScript files and applications that use trivial packages tend to have a higher percentage of bug-fixing commits than files and applications that do not depend on trivial packages. Overall, the findings of our thesis indicate that, although smaller in size and complexity, trivial packages are highly depended-upon packages. Although these packages may be trivial in size, their utility in software applications suggests that their role is not so trivial.

    Modeling User-Affected Software Properties for Open Source Software Supply Chains

    Background: The Open Source Software development community relies heavily on users of the software and on contributors outside the core developers to produce top-quality software and provide long-term support. However, the relationship between a software project and its contributors, in terms of exactly how they are related through dependencies and how the users of the software affect many of its properties, is not very well understood. Aim: My research covers a number of aspects related to the overarching question of modeling the software properties affected by users and the supply chain structure of software ecosystems, viz. 1) Understanding how software usage affects its perceived quality; 2) Estimating the effects of indirect usage (e.g. dependent packages) on software popularity; 3) Investigating the patch submission and issue creation patterns of external contributors; 4) Examining how the patch acceptance probability is related to the contributors' characteristics; 5) A related topic, the identification of bots that commit code, aimed at improving the accuracy of these and other similar studies, was also investigated. Methodology: Most of the Research Questions are addressed by studying the NPM ecosystem, with data from various sources such as the World of Code, GHTorrent, and the GitHub API. Different supervised and unsupervised machine learning models, including Regression, Random Forest, Bayesian Networks, and clustering, were used to answer the appropriate questions. Results: 1) Software usage affects its perceived quality even after accounting for code complexity measures. 2) The number of dependents and dependencies of a software package was observed to predict the change in its popularity with good accuracy. 3) Users interact (contribute issues or patches) primarily with their direct dependencies, and rarely with transitive dependencies.
4) A user's earlier interaction with the repository to which they are contributing a patch, and their familiarity with related topics, were important predictors of the chance of a pull request being accepted. 5) We developed BIMAN, a systematic methodology for identifying bots. Conclusion: Different aspects of how users and their characteristics affect different software properties were analyzed, which should lead to a better understanding of the complex interaction between software developers and users/contributors.

    A Closer Look at the Security Risks in the Rust Ecosystem

    Rust is an emerging programming language designed for the development of systems software. To facilitate the reuse of Rust code, crates.io, the central package registry of the Rust ecosystem, hosts thousands of third-party Rust packages. The openness of crates.io enables the growth of the Rust ecosystem but comes with security risks, as evidenced by severe security advisories. Although Rust guarantees a software program to be safe via programming language features and strict compile-time checking, the unsafe keyword in Rust allows developers to bypass compiler safety checks for certain regions of code. Prior studies empirically investigate the memory safety and concurrency bugs in the Rust ecosystem, as well as the usage of unsafe keywords in practice. Nonetheless, the literature lacks a systematic investigation of the security risks in the Rust ecosystem. In this paper, we perform a comprehensive investigation into the security risks present in the Rust ecosystem, asking "what are the characteristics of the vulnerabilities, what are the characteristics of the vulnerable packages, and how are the vulnerabilities fixed in practice?". To facilitate the study, we first compile a dataset of 433 vulnerabilities, 300 vulnerable code repositories, and 218 vulnerability fix commits in the Rust ecosystem, spanning over 7 years. With the dataset, we characterize the types, life spans, and evolution of the disclosed vulnerabilities. We then characterize the popularity, categorization, and vulnerability density of the vulnerable Rust packages, as well as their versions and code regions affected by the disclosed vulnerabilities. Finally, we characterize the complexity of vulnerability fixes and localities of corresponding code changes, and inspect how practitioners fix vulnerabilities in Rust packages with various localities. Comment: preprint of accepted TOSEM paper

    Avatud lähtekoodiga tarkvaraprojektide vearaportite ja tehniliste sõltuvuste haldamise analüüsimine (Analysis of Issue Report and Technical Dependency Management in Open-Source Software Projects)

    Modern software development relies on open-source software to facilitate reuse and reduce redundant work. Software developers use open-source packages in their projects without having insight into how these components are being developed and maintained. The aim of this thesis is to develop approaches for analyzing issue and dependency management in software projects. Software projects organize their work with issue trackers, tools for tracking issues such as development tasks, bug reports, and feature requests. By analyzing issue handling in more than 4,000 open-source projects, we found that many issues are left open for long periods of time, which can result in bugs and vulnerabilities not being fixed in a timely manner. This thesis proposes a method for predicting the amount of time it takes to resolve an issue by using the historical data available in issue trackers. Methods for predicting issue lifetime can help software project managers to prioritize issues and allocate resources accordingly. Another problem studied in this thesis is how software dependencies are used. Software developers often include third-party open-source software packages in their project code as dependencies. The included dependencies can also have their own dependencies, so a complex network of dependency relationships exists among open-source software packages. This thesis analyzes the structure and the evolution of the dependency networks of three popular programming languages. We propose an approach to measure the growth and the evolution of dependency networks. This thesis demonstrates that dependency network analysis can quantify the likelihood of acquiring vulnerabilities through software packages and how that likelihood changes over time.
The approaches and findings developed here could help to bring transparency to open-source projects with respect to how issues are handled and how dependencies are updated.

    Sustainability of Open Source Software Projects: On the Influence of Technical Interdependencies in Software Ecosystems on Developer Participation

    In the community-based model of open source software (OSS) development, OSS projects are built and maintained by developers who voluntarily contribute their skills, knowledge, and time, which makes the projects dependent on those developers' continued participation. Therefore, the question of how projects can attract and retain developers is of major concern for their sustainability. OSS projects are embedded in a complex network of technically interdependent projects that emerges from building upon and reusing existing software components. In these so-called software ecosystems, the issue of sustained participation is a concern not only of a single project but also of other dependent projects. However, the role and influence of these interdependencies between projects have so far been neglected by Information Systems researchers. This dissertation thus asks: _How do technical interdependencies in software ecosystems influence the sustainability of open source software projects?_ To answer this question, this dissertation consists of three independent empirical studies that focus on three aspects of how technical interdependencies influence developer participation and thus contribute to the sustainability of open source projects: (1) the ability to attract developers, (2) the influences on developers' participation decisions, and (3) the retention of developers in a project. This dissertation finds that OSS projects attract more developers when they depend on other projects, and that their ability to retain developers increases with the number of developers they share with other technically interrelated projects. Furthermore, the participation decisions of developers are also positively influenced by these technical relations. Together, these studies contribute to the body of knowledge on developer participation by highlighting the role of technical interdependencies in the overall sustainability of open source projects.