6 research outputs found
The Role of Data Filtering in Open Source Software Ranking and Selection
Faced with over 100M open source projects, most empirical investigations
select a subset. Most research papers in leading venues investigated filtering
projects by some measure of popularity with explicit or implicit arguments that
unpopular projects are not of interest, may not even represent "real" software
projects, or that less popular projects are not worthy of study. However, such
filtering may have enormous effects on the results of the studies if and
precisely because the sought-out response or prediction is in any way related
to the filtering criteria.
We exemplify the impact of this practice on research outcomes: how filtering
of projects listed on GitHub affects the assessment of their popularity. We
randomly sample over 100,000 repositories and use multiple regression to model
the number of stars (a proxy for popularity) based on the number of commits,
the duration of the project, the number of authors, and the number of core
developers. Comparing a control model fitted on the entire dataset with a model
fitted on a filtered sample (projects having ten or more authors), we find that
while certain characteristics of the repository consistently predict
popularity, the filtering process significantly alters the relationships
between these characteristics and the response. The number of commits
exhibited a positive correlation with
popularity in the control sample but showed a negative correlation in the
filtered sample. These findings highlight the potential biases introduced by
data filtering and emphasize the need for careful sample selection in empirical
research of mining software repositories. We recommend that empirical work
should either analyze complete datasets such as World of Code, or employ
stratified random sampling from a complete dataset to ensure that filtering is
not biasing the results.
Comment: International Workshop on Methodological Issues with Empirical
Studies in Software Engineering (WSESE 2024)
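The control-versus-filtered comparison described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic (not drawn from GitHub or World of Code), the `log1p` transform is a common choice for count data rather than the authors' stated method, and the function names `fit_ols` and `compare_filtered` are hypothetical. It only shows the mechanics of fitting the same regression on a full sample and on a sample filtered to ten or more authors, so that the two sets of coefficients can be compared.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares for y ~ X; returns coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def compare_filtered(commits, duration, authors, core, stars, min_authors=10):
    """Fit the same model on all projects and on projects with >= min_authors."""
    # log1p transform is an illustrative assumption for skewed count data.
    X = np.column_stack([
        np.log1p(commits),
        np.log1p(duration),
        np.log1p(authors),
        np.log1p(core),
    ])
    y = np.log1p(stars)
    mask = authors >= min_authors  # the popularity-style filter under study
    control = fit_ols(X, y)
    filtered = fit_ols(X[mask], y[mask])
    return control, filtered, int(mask.sum())


# Synthetic stand-in data (hypothetical distributions, not real repositories).
rng = np.random.default_rng(0)
n = 5000
authors = rng.geometric(0.2, size=n)
core = np.minimum(authors, rng.geometric(0.5, size=n))
commits = authors * rng.geometric(0.05, size=n)
duration = rng.integers(1, 3650, size=n)  # project duration in days
stars = rng.poisson(0.5 * authors + 0.01 * commits)

control, filtered, n_filtered = compare_filtered(
    commits, duration, authors, core, stars
)
names = ["intercept", "commits", "duration", "authors", "core"]
for name, c, f in zip(names, control, filtered):
    print(f"{name:>9}: control={c:+.3f}  filtered={f:+.3f}")
print(f"filtered sample size: {n_filtered} of {n}")
```

Comparing the two coefficient columns side by side is the point of the exercise: a coefficient that changes magnitude or sign between the columns indicates that the filter, not the underlying population, is shaping the estimated relationship.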
An Open Community-Driven Model For Sustainable Research Software: Sustainable Research Software Institute
Research software plays a crucial role in advancing scientific knowledge, but
ensuring its sustainability, maintainability, and long-term viability is an
ongoing challenge. To address these concerns, the Sustainable Research Software
Institute (SRSI) Model presents a comprehensive framework designed to promote
sustainable practices in the research software community. This white paper
provides an in-depth overview of the SRSI Model, outlining its objectives,
services, funding mechanisms, collaborations, and the significant potential
impact it could have on the research software community. It explores the wide
range of services offered, diverse funding sources, extensive collaboration
opportunities, and the transformative influence of the SRSI Model on the
research software landscape.
Comment: 13 pages, 1 figure
Transitioning ECP Software Technology into a Foundation for Sustainable Research Software
Research software plays a crucial role in advancing scientific knowledge, but
ensuring its sustainability, maintainability, and long-term viability is an
ongoing challenge. The Sustainable Research Software Institute (SRSI) Model was
designed to address these concerns and presents a comprehensive framework
to promote sustainable practices in the research software community.
However, the SRSI Model does not address the transitional requirements for the
Exascale Computing Project (ECP) Software Technology (ECP-ST) focus area
specifically. This white paper provides an overview and detailed description of
how ECP-ST will transition into the SRSI in a compressed time frame that a)
meets the needs of the ECP end-of-technical-activities deadline; and b) ensures
the continuity of the sustainability efforts that are already underway.
Comment: 7 pages, 1 figure
Giving RSEs a Larger Stage through the Better Scientific Software Fellowship
The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to
foster and promote practices, processes, and tools to improve developer
productivity and software sustainability of scientific codes. BSSwF's vision is
to grow the community with practitioners, leaders, mentors, and consultants to
increase the visibility of scientific software production and sustainability.
Over the last five years, many fellowship recipients and honorable mentions
have identified as research software engineers (RSEs). This paper provides case
studies from several of the program's participants to illustrate some of the
diverse ways BSSwF has benefited both the RSE and scientific communities. In an
environment where the contributions of RSEs are too often undervalued, we
believe that programs such as BSSwF can be a valuable means to recognize and
encourage community members to step outside of their regular commitments and
expand on their work, collaborations and ideas for a larger audience.
Comment: submitted to Computing in Science & Engineering (CiSE), Special Issue
on the Future of Research Software Engineers in the U
Giving Research Software Engineers a Larger Stage Through the Better Scientific Software Fellowship
The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. The BSSwF’s vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software. Over the last five years, many fellowship recipients and honorable mentions have identified as research software engineers (RSEs). Case studies from several of the program’s participants illustrate the diverse ways the BSSwF has benefited both the RSE and scientific communities. In an environment where the contributions of RSEs are too often undervalued, we believe that programs such as the BSSwF can help recognize and encourage community members to step outside of their regular commitments and expand on their work, collaborations, and ideas for a larger audience.
A Cast of Thousands: How the IDEAS Productivity Project Has Advanced Software Productivity and Sustainability
Computational and data-enabled science and engineering are revolutionizing advances throughout science and society, at all scales of computing. For example, teams in the U.S. Department of Energy's Exascale Computing Project have been tackling new frontiers in modeling, simulation, and analysis by exploiting unprecedented exascale computing capabilities, building an advanced software ecosystem that supports next-generation applications and addresses disruptive changes in computer architectures. However, concerns are growing about the productivity of the developers of scientific software. Members of the Interoperable Design of Extreme-scale Application Software project serve as catalysts to address these challenges through fostering software communities, incubating and curating methodologies and resources, and disseminating knowledge to advance developer productivity and software sustainability. This article discusses how these synergistic activities are advancing scientific discovery, mitigating technical risks by building a firmer foundation for reproducible, sustainable science at all scales of computing, from laptops to clusters to exascale and beyond.