
6 research outputs found

The Role of Data Filtering in Open Source Software Ranking and Selection

Author: Malviya-Thakur Addi
Mockus Audris
Publication date: 18/01/2024

Faced with over 100M open source projects, most empirical investigations select a subset. Most research papers in leading venues filter projects by some measure of popularity, with explicit or implicit arguments that unpopular projects are not of interest, may not even represent "real" software projects, or that less popular projects are not worthy of study. However, such filtering may have enormous effects on the results of the studies if, and precisely because, the sought-out response or prediction is in any way related to the filtering criteria. We exemplify the impact of this practice on research outcomes: how filtering of projects listed on GitHub affects the assessment of their popularity. We randomly sample over 100,000 repositories and use multiple regression to model the number of stars (a proxy for popularity) based on the number of commits, the duration of the project, the number of authors, and the number of core developers. Comparing a control model fitted on the entire dataset with a filtered model fitted on projects having ten or more authors, we find that while certain characteristics of the repository consistently predict popularity, the filtering process significantly alters the relationships between these characteristics and the response. The number of commits exhibited a positive correlation with popularity in the control sample but showed a negative correlation in the filtered sample. These findings highlight the potential biases introduced by data filtering and emphasize the need for careful sample selection in empirical research of mining software repositories. We recommend that empirical work should either analyze complete datasets such as World of Code, or employ stratified random sampling from a complete dataset to ensure that filtering is not biasing the results.

Comment: International Workshop on Methodological Issues with Empirical Studies in Software Engineering (WSESE 2024)

arXiv.org e-Print Archive
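
The control-versus-filtered comparison described in the abstract above can be sketched roughly as follows. This is only a minimal illustration, not the authors' analysis: the function names, the `authors` field name, and the toy rows are all hypothetical, the abstract does not say whether variables were transformed before fitting, and ordinary least squares via the normal equations is assumed here for simplicity.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]  # one repository, e.g. {"stars": ..., "commits": ..., "authors": ...}


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows: Sequence[Row], predictors: Sequence[str], response: str) -> Dict[str, float]:
    """Fit response ~ intercept + predictors by solving (X^T X) beta = X^T y."""
    xs = [[1.0] + [row[p] for p in predictors] for row in rows]
    ys = [row[response] for row in rows]
    k = len(predictors) + 1
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return dict(zip(["intercept", *predictors], beta))


def compare_control_and_filtered(
    rows: Sequence[Row],
    predictors: Sequence[str],
    response: str = "stars",
    min_authors: int = 10,  # threshold named in the abstract
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Fit the same model on all rows (control) and on rows passing the author filter."""
    control = fit_ols(rows, predictors, response)
    kept = [row for row in rows if row["authors"] >= min_authors]
    filtered = fit_ols(kept, predictors, response)
    return control, filtered


if __name__ == "__main__":
    # Made-up toy rows for illustration only; they are not data from the paper.
    pairs = [(1, 1), (2, 5), (3, 2), (4, 12), (6, 10), (5, 15), (9, 11), (7, 3)]
    toy = [{"commits": float(c), "authors": float(a), "stars": 2 + 3 * c - a} for c, a in pairs]
    control, filtered = compare_control_and_filtered(toy, ["commits", "authors"])
    print("control :", control)
    print("filtered:", filtered)
```

Comparing the sign and size of each coefficient between the two returned dictionaries is the kind of check the abstract describes. The toy rows follow an exact linear rule, so they will not show a sign reversal; that effect depends on real data in which the filter is related to the response.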

An Open Community-Driven Model For Sustainable Research Software: Sustainable Research Software Institute

Author: Hoffman Bill
Katz Daniel S.
Kellerman John
Malviya-Thakur Addi
Raybourn Elaine M.
Robinson Dana
Roundy Clark
Watson Gregory R.
Publication date: 30/08/2023

Research software plays a crucial role in advancing scientific knowledge, but ensuring its sustainability, maintainability, and long-term viability is an ongoing challenge. To address these concerns, the Sustainable Research Software Institute (SRSI) Model presents a comprehensive framework designed to promote sustainable practices in the research software community. This white paper provides an in-depth overview of the SRSI Model, outlining its objectives, services, funding mechanisms, collaborations, and the significant potential impact it could have on the research software community. It explores the wide range of services offered, diverse funding sources, extensive collaboration opportunities, and the transformative influence of the SRSI Model on the research software landscape.

Comment: 13 pages, 1 figure

arXiv.org e-Print Archive

Transitioning ECP Software Technology into a Foundation for Sustainable Research Software

Author: Hoffman Bill
Katz Daniel S.
Kellerman John
Malviya-Thakur Addi
Raybourn Elaine M.
Robinson Dana
Roundy Clark
Watson Gregory R.
Publication date: 30/08/2023

Research software plays a crucial role in advancing scientific knowledge, but ensuring its sustainability, maintainability, and long-term viability is an ongoing challenge. The Sustainable Research Software Institute (SRSI) Model has been designed to address these concerns, and presents a comprehensive framework to promote sustainable practices in the research software community. However, the SRSI Model does not specifically address the transitional requirements for the Exascale Computing Project (ECP) Software Technology (ECP-ST) focus area. This white paper provides an overview and detailed description of how ECP-ST will transition into the SRSI in a compressed time frame that a) meets the needs of the ECP end-of-technical-activities deadline; and b) ensures the continuity of the sustainability efforts that are already underway.

Comment: 7 pages, 1 figure

arXiv.org e-Print Archive

Giving RSEs a Larger Stage through the Better Scientific Software Fellowship

Author: Arora Ritu
Beattie Keith
Bernholdt David E.
Bratt Sarah E.
Godoy William F.
Katz Daniel S.
Laguna Ignacio
Maji Amiya K.
Mudafort Rafael M.
Rouson Damian
Rubio-González Cindy
Sukhija Nitin
Thakur Addi Malviya
Vahi Karan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/10/2022

The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. BSSwF's vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software production and sustainability. Over the last five years, many fellowship recipients and honorable mentions have identified as research software engineers (RSEs). This paper provides case studies from several of the program's participants to illustrate some of the diverse ways BSSwF has benefited both the RSE and scientific communities. In an environment where the contributions of RSEs are too often undervalued, we believe that programs such as BSSwF can be a valuable means to recognize and encourage community members to step outside of their regular commitments and expand on their work, collaborations, and ideas for a larger audience.

Comment: submitted to Computing in Science & Engineering (CiSE), Special Issue on the Future of Research Software Engineers in the US

arXiv.org e-Print Archive

eScholarship - University of California


Giving Research Software Engineers a Larger Stage Through the Better Scientific Software Fellowship

Author: Arora Ritu
Beattie Keith
Bernholdt David
Bratt Sarah
Godoy William
Katz Daniel
Laguna Ignacio
Maji Amiya
Malviya-Thakur Addi
Mudafort Rafael
Rouson Damian
Rubio-Gonzalez Cindy
Sukhija Nitin
Vahi Karan
Publication venue: eScholarship, University of California
Publication date: 01/10/2022

The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. The BSSwF’s vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software. Over the last five years, many fellowship recipients and honorable mentions have identified as research software engineers (RSEs). Case studies from several of the program’s participants illustrate the diverse ways the BSSwF has benefited both the RSE and scientific communities. In an environment where the contributions of RSEs are too often undervalued, we believe that programs such as the BSSwF can help recognize and encourage community members to step outside of their regular commitments and expand on their work, collaborations, and ideas for a larger audience.

eScholarship - University of California


A Cast of Thousands: How the IDEAS Productivity Project Has Advanced Software Productivity and Sustainability

Author: Almgren Ann
Bartlett Roscoe A
Bernholdt David E
Cranfill Kita
Dubey Anshu
Fickas Stephen
Frederick Don
Godoy William F
Gonsiorowski Elsa
Grubel Patricia A
Gupta Rinku
Hartman-Baker Rebecca
Heroux Michael A
Huebl Axel
Lynch Rose
Malviya-Thakur Addi
Marques Osni
McInnes Lois Curfman
Milewicz Reed
Miller Mark C
Moulton J David
Mundt Miranda R
Nam Hai Ah
Norris Boyana
Palmer Erik
Parete-Koon Suzanne
Phinney Megan
Raybourn Elaine M
Riley Katherine
Rogers David M
Sims Benjamin
Stevens Deborah
Watson Gregory R
Willenbring Jim
Publication venue: eScholarship, University of California
Publication date: 01/01/2024

Computational and data-enabled science and engineering are revolutionizing advances throughout science and society, at all scales of computing. For example, teams in the U.S. Department of Energy's Exascale Computing Project have been tackling new frontiers in modeling, simulation, and analysis by exploiting unprecedented exascale computing capabilities, building an advanced software ecosystem that supports next-generation applications and addresses disruptive changes in computer architectures. However, concerns are growing about the productivity of the developers of scientific software. Members of the Interoperable Design of Extreme-scale Application Software (IDEAS) project serve as catalysts to address these challenges through fostering software communities, incubating and curating methodologies and resources, and disseminating knowledge to advance developer productivity and software sustainability. This article discusses how these synergistic activities are advancing scientific discovery, mitigating technical risks by building a firmer foundation for reproducible, sustainable science at all scales of computing, from laptops to clusters to exascale and beyond.

eScholarship - University of California