Addressing Research Software Sustainability via Institutes
Research software is essential to modern research, but it requires ongoing
human effort to sustain: to continually adapt to changes in dependencies, to
fix bugs, and to add new features. Software sustainability institutes, amongst
others, develop, maintain, and disseminate best practices for research software
sustainability, and build community around them. These practices can both
reduce the amount of effort that is needed and create an environment where the
effort is appreciated and rewarded. The UK SSI is such an institute, and the US
URSSI and the Australian AuSSI are planning to become institutes. This extended
abstract discusses them and the strengths and weaknesses of this approach.
Comment: accepted by ICSE 2021 BokSS Workshop
(https://bokss.github.io/bokss2021/)
Community Organizations: Changing the Culture in Which Research Software Is Developed and Sustained
Software is the key crosscutting technology that enables advances in
mathematics, computer science, and domain-specific science and engineering to
achieve robust simulations and analysis for science, engineering, and other
research fields. However, software itself has not traditionally received
focused attention from research communities; rather, software has evolved
organically and inconsistently, with its development largely a by-product of
other initiatives. Moreover, challenges in scientific software are expanding
due to disruptive changes in computer hardware, increasing scale and complexity
of data, and demands for more complex simulations involving multiphysics,
multiscale modeling and outer-loop analysis. In recent years, community members
have established a range of grass-roots organizations and projects to address
these growing technical and social challenges in software productivity,
quality, reproducibility, and sustainability. This article provides an overview
of such groups and discusses opportunities to leverage their synergistic
activities while nurturing work toward emerging software ecosystems.
Research Software Engineering in 2030
This position paper for an invited talk on the "Future of eScience" discusses
the Research Software Engineering Movement and where it might be in 2030.
Because of the authors' experiences, it is aimed globally but with examples
that focus on the United States and United Kingdom.
Comment: Invited paper for 2023 IEEE Conference on eScience
The Global Impact of Science Gateways, Virtual Research Environments and Virtual Laboratories
Science gateways, virtual laboratories and virtual research environments are all terms used to refer to community-developed digital environments that are designed to meet a set of needs for a research community. Specifically, they refer to integrated access to research community resources including software, data, collaboration tools, workflows, instrumentation and high-performance computing, usually via Web and mobile applications. Science gateways, virtual laboratories and virtual research environments are enabling significant contributions to many research domains, facilitating more efficient, open, reproducible research in bold new ways. This paper explores the global impact achieved by the sum effects of these programs in increasing research impact, demonstrates their value in the broader digital landscape and discusses future opportunities. This is evidenced through examination of national and international programs in this field
Towards computational reproducibility: researcher perspectives on the use and sharing of software
Research software, which includes both source code and executables used as part of the research process, presents a significant challenge for efforts aimed at ensuring reproducibility. In order to inform such efforts, we conducted a survey to better understand the characteristics of research software as well as how it is created, used, and shared by researchers. Based on the responses of 215 participants, representing a range of research disciplines, we found that researchers create, use, and share software in a wide variety of forms for a wide variety of purposes, including data collection, data analysis, data visualization, data cleaning and organization, and automation. More participants indicated that they use open source software than commercial software. While a relatively small number of programming languages (e.g., Python, R, JavaScript, C++, MATLAB) are used by a large number, there is a long tail of languages used by relatively few. Between-group comparisons revealed that significantly more participants from computer science write source code and create executables than participants from other disciplines. Differences between researchers from computer science and other disciplines related to the knowledge of best practices of software creation and sharing were not statistically significant. While many participants indicated that they draw a distinction between the sharing and preservation of software, related practices and perceptions were often not aligned with those of the broader scholarly communications community
The role of model implementation in neuroscientific applications of machine learning
In modern neuroscience, large scale machine learning models are becoming increasingly critical components of data analysis. Despite the accelerating adoption of these large scale machine learning tools, there are fundamental challenges to their use in scientific applications that remain largely unaddressed. In this thesis, I focus on one such challenge: variability in the predictions of large scale machine learning models relative to seemingly trivial differences in their implementation.
Existing research has shown that the performance of large scale machine learning models (more so than traditional models such as linear regression) is meaningfully entangled with design choices such as the hardware components, operating system, software dependencies, and random seed that the corresponding model depends upon. Within the bounds of current practice, there are few ways of controlling this kind of implementation variability across the broad community of neuroscience researchers (making data analysis less reproducible), and little understanding of how data analyses might be designed to mitigate these issues (making data analysis unreliable). This dissertation will present two broad research directions that address these shortcomings.
First, I will describe a novel, cloud-based platform for sharing data analysis tools reproducibly and at scale. This platform, called NeuroCAAS, enables developers of novel data analyses to precisely specify an implementation of their entire data analysis, which can then be used automatically by any other user on custom built cloud resources. I show that this approach is able to efficiently support a wide variety of existing data analysis tools, as well as novel tools which would not be feasible to build and share outside of a platform like NeuroCAAS.
Second, I conduct two large-scale studies on the behavior of deep ensembles. Deep ensembles are a class of machine learning models that use implementation variability to improve the quality of model predictions; in particular, by aggregating the predictions of deep networks over stochastic initialization and training. Deep ensembles simultaneously provide a way to control the impact of implementation variability (by aggregating predictions across random seeds) and also to understand what kind of predictive diversity is generated by this particular form of implementation variability. I present a number of surprising results that contradict widely held intuitions about the performance of deep ensembles as well as the mechanisms behind their success, and show that in many respects, the behavior of deep ensembles is similar to that of an appropriately chosen single neural network. As a whole, this dissertation presents novel methods and insights focused on the role of implementation variability in large scale machine learning models, and more generally on the challenges of working with such large models in neuroscience data analysis. I conclude by discussing other ongoing efforts to improve the reproducibility and accessibility of large scale machine learning in neuroscience, as well as long term goals to speed the adoption and reliability of such methods in a scientific context.
Conceptualizing a US Research Software Sustainability Institute
Modern research is inescapably digital, with data and publications most often created, analyzed, and stored electronically, using tools and methods expressed in software. This "research software" is essential to progress in science, engineering, and all other fields, but it is not developed in an efficient or sustainable way. The researchers who develop this software, while well-versed in their discipline, generally do not have sufficient training and understanding of best practices that ease development and maintainability and that encourage sustainability and reproducibility. In response, this project is conceptualizing a US Research Software Sustainability Institute that will validate and address at least three classes of concerns (functioning of the individual and team, the research software, and the research field itself), impacting all software development and maintenance projects across all of NSF. URSSI conceptualization includes workshops and a widely distributed survey that engages important stakeholder communities to learn about the software they produce and use, and the ways they contemplate sustaining it, following the paths blazed by other successful software institutes.