"We do not appreciate being experimented on": Developer and Researcher Views on the Ethics of Experiments on Open-Source Projects
A tenet of open source software development is to accept contributions from
user-developers (typically after appropriate vetting). But should this also
include interventions done as part of research on open source development?
Following an incident in which buggy code was submitted to the Linux kernel to
see whether it would be caught, we conduct a survey among open source
developers and empirical software engineering researchers to see what behaviors
they think are acceptable. This covers two main issues: the use of publicly
accessible information, and conducting active experimentation. The survey had
224 respondents. The results indicate that open-source developers are largely
open to research, provided it is done transparently. In other words, many would
agree to experiments on open-source projects if the subjects were notified and
provided informed consent, and in special cases also if only the project
leaders agree. While researchers generally hold similar opinions, they
sometimes fail to appreciate certain nuances that are important to developers.
Examples include observing license restrictions on publishing open-source code
and safeguarding the code. Conversely, researchers seem to be more concerned
than developers about privacy issues. Based on these results, it is recommended
that open source repositories and projects address use for research in their
access guidelines, and that researchers take care to ask permission even when
not formally required to do so. We note too that the open source community
wants to be heard, so professional societies and IRBs should consult with them
when formulating ethics codes.
Comment: 15 pages with 42 charts and 3 tables; accepted version
When Are Names Similar Or the Same? Introducing the Code Names Matcher Library
Program code contains functions, variables, and data structures that are
represented by names. To promote human understanding, these names should
describe the role and use of the code elements they represent. But the names
given by developers show high variability, reflecting the tastes of each
developer, with different words used for the same meaning or the same words
used for different meanings. This makes comparing names hard. A precise
comparison should be based on matching identical words, but also take into
account possible variations on the words (including spelling and typing
errors), reordering of the words, matching between synonyms, and so on. To
facilitate this we developed a library of comparison functions specifically
targeted to comparing names in code. The different functions calculate the
similarity between names in different ways, so a researcher can choose the one
appropriate for their specific needs. All of them share an attempt to reflect
human perceptions of similarity, at the possible expense of lexical matching.
Comment: 20 pages. Download from https://pypi.org/project/namecompare
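The library itself is available at the PyPI link above; since the abstract does not spell out its API, the following is a minimal independent sketch in Python of the kind of comparison it describes: split an identifier into its constituent words, then compute an order-insensitive similarity that tolerates small spelling and typing variations. All names here are hypothetical illustrations, not the namecompare interface, and synonym matching (which the abstract also mentions) is omitted for brevity.

```python
import difflib
import re

def split_name(name: str) -> list[str]:
    """Split an identifier into lowercase words (handles camelCase,
    snake_case, acronyms, and digits)."""
    parts = re.findall(r'[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+', name.replace('_', ' '))
    return [p.lower() for p in parts]

def words_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two words as equivalent if they are near-identical,
    tolerating small spelling and typing variations."""
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold

def name_similarity(name1: str, name2: str) -> float:
    """Order-insensitive similarity: the fraction of words that find a
    (possibly fuzzy) counterpart in the other name."""
    w1, w2 = split_name(name1), split_name(name2)
    if not w1 or not w2:
        return 0.0
    unmatched = list(w2)
    hits = 0
    for w in w1:
        for u in unmatched:
            if words_match(w, u):
                unmatched.remove(u)
                hits += 1
                break
    return hits / max(len(w1), len(w2))

print(name_similarity("maxValueIndex", "index_of_max_value"))  # 0.75: same words, reordered
print(name_similarity("totalCount", "overallTally"))           # 0.0: no shared words
```

The word-level fuzzy matching is what lets reordered or slightly misspelled names still score high, in the spirit of the human-perception goal stated in the abstract.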
Reproducing, Extending, and Analyzing Naming Experiments
Naming is very important in software development, as names are often the only
vehicle of meaning about what the code is intended to do. A recent study on how
developers choose names collected the names given by different developers for
the same objects. This enabled a study of these names' diversity and structure,
and the construction of a model of how names are created. We reproduce
different parts of this study in three independent experiments. Importantly, we
employ methodological variations rather than striving for an exact
replication. When the same results are nevertheless obtained, this boosts our
confidence in their validity by demonstrating that they do not depend on the
methodology.
Our results indeed corroborate those of the original study in terms of the
diversity of names, the low probability of two developers choosing the same
name, and the finding that experienced developers tend to use slightly longer
names than inexperienced students. We explain name diversity by performing a
new analysis of the names, classifying the concepts represented in them as
universal (agreed upon), alternative (reflecting divergent views on a topic),
or optional (reflecting divergent opinions on whether to include this concept
at all). This classification enables new research directions concerning the
considerations involved in naming decisions. We also show that explicitly using
the model proposed in the original study to guide naming leads to the creation
of better names, whereas the simpler approach of just asking participants to
use longer and more detailed names does not.
Comment: 35 pages with 10 figures and 6 tables
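As a rough illustration of the "probability of two developers choosing the same name" statistic, here is a minimal Python sketch that estimates pairwise agreement among names collected for one object. The names and the exact formula are assumptions for illustration only, not the study's data or methodology.

```python
from collections import Counter

def same_name_probability(names: list[str]) -> float:
    """Probability that two developers, picked at random, gave exactly
    the same name to the same object (pairwise agreement)."""
    counts = Counter(names)
    n = len(names)
    agreeing_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    all_pairs = n * (n - 1) // 2
    return agreeing_pairs / all_pairs if all_pairs else 0.0

# Hypothetical names given by ten developers to the same function:
collected = ["maxIndex", "findMax", "maxIndex", "indexOfMax", "getMaxIndex",
             "max_index", "findMaxIndex", "maxPos", "largestIndex", "maxIndex"]
print(f"P(same name) = {same_name_probability(collected):.2f}")  # 0.07 here
```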
System noise, OS clock ticks, and fine-grained parallel applications
As parallel jobs get bigger in size and finer in granularity, “system noise” is increasingly becoming a problem. In fact, fine-grained jobs on clusters with thousands of SMP nodes run faster if a processor is intentionally left idle (per node), thus enabling a separation of “system noise” from the computation. Paying a cost in average processing speed at a node for the sake of eliminating occasional process delays is (unfortunately) beneficial, as such delays are enormously magnified when one late process holds up thousands of peers with which it synchronizes. We provide a probabilistic argument showing that, under certain conditions, the effect of such noise is linearly proportional to the size of the cluster (as is often empirically observed). We then identify a major source of noise to be the indirect overhead of periodic OS clock interrupts (“ticks”), which are used by all general-purpose OSs as a means of maintaining control. This is shown for various grain sizes, platforms, tick frequencies, and OSs. To eliminate such noise, we suggest replacing ticks with an alternative mechanism we call “smart timers”. This turns out to also be in line with the needs of desktop and mobile computing, increasing the chances that the suggested change will be accepted.
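The probabilistic argument can be illustrated with a small Monte Carlo sketch, assuming a bulk-synchronous model in which every barrier waits for the slowest of n processes and each process independently suffers a rare noise delay per compute phase. All parameter values are illustrative, not taken from the paper.

```python
import random

def mean_barrier_time(n_procs: int, grain: float = 1.0, p_noise: float = 0.001,
                      delay: float = 10.0, barriers: int = 100_000) -> float:
    """Average time per barrier when n_procs synchronized processes each
    suffer, independently, a rare noise delay with probability p_noise
    per compute phase; the barrier waits for the slowest process."""
    p_any = 1.0 - (1.0 - p_noise) ** n_procs  # P(at least one process is delayed)
    total = 0.0
    for _ in range(barriers):
        total += grain + (delay if random.random() < p_any else 0.0)
    return total / barriers

random.seed(0)
for n in (1, 10, 100, 1000):
    print(f"{n:5d} processes: {mean_barrier_time(n):.3f} time units per barrier")
```

For small n · p_noise the overhead is approximately delay · n · p_noise per barrier, i.e., it grows linearly with cluster size, which is the regime the abstract describes; for very large clusters it saturates once a delay is nearly certain at every barrier.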
Benchmarks and Standards for the Evaluation of Parallel Job Schedulers
The evaluation of parallel job schedulers hinges on the workloads used. It is suggested that this be standardized, in terms of both format and content, so as to ease the evaluation and comparison of different systems. The question remains whether this can encompass both traditional parallel systems and metacomputing systems. This paper is based on a panel on this subject that was held at the workshop, and the ensuing discussion; its authors are both the panel members and participants from the audience. Naturally, not all of us agree with all the opinions expressed here.
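Standardization of this kind is embodied in the Standard Workload Format (SWF) used by the Parallel Workloads Archive. As a hedged sketch, the following Python reader assumes the conventional SWF layout (';'-prefixed header comments, one whitespace-separated record per job) and keeps only the first five of its fields; the file path in the usage comment is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    job_id: int
    submit_time: int   # seconds from the start of the log
    wait_time: int     # seconds spent in the queue
    run_time: int      # seconds of execution
    processors: int    # processors actually allocated

def read_swf(path: str) -> list[Job]:
    """Read a workload trace in the Standard Workload Format (SWF):
    ';'-prefixed header comments, then one whitespace-separated record
    per job. Only the first five of the SWF fields are kept here."""
    jobs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(';'):
                continue  # skip header comments and blank lines
            fields = line.split()
            jobs.append(Job(int(fields[0]), int(fields[1]),
                            int(fields[2]), int(fields[3]), int(fields[4])))
    return jobs

# Example use (hypothetical trace file):
# jobs = read_swf("some-trace.swf")
# print(sum(j.wait_time for j in jobs) / len(jobs))  # mean wait time
```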