Relating Developers’ Concepts and Artefact Vocabulary in a Financial Software Module

Author: Dilshener Tezcan
Wermelinger Michel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2011

Developers working on unfamiliar systems are challenged to accurately identify where and how high-level concepts are implemented in the source code. Without additional help, concept location can become a tedious, time-consuming and error-prone task. In this paper we study an industrial financial application for which we had access to the user guide, the source code, and some change requests. We compared the relative importance of the domain concepts, as understood by developers, in the user manual and in the source code. We also searched the code for the concepts occurring in change requests, to see if they could point developers to code to be modified. We varied the searches (using exact and stem matching, discarding stop-words, etc.) and present the precision and recall. We discuss the implication of our results for maintenance
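The search variations mentioned in this abstract (exact versus stem matching, discarding stop-words, reporting precision and recall) can be illustrated with a minimal sketch. It is not the authors' tooling: the stop-word list, the crude `naive_stem` suffix stripper, the `search` helper, and the sample file vocabulary are all hypothetical, chosen only to show how stemming can raise recall.

```python
# Illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "for", "and"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search(query, files, stem=False):
    """Return the names of files that contain any non-stop-word query term."""
    norm = naive_stem if stem else (lambda w: w)
    terms = {norm(w) for w in query.lower().split() if w not in STOP_WORDS}
    return {name for name, words in files.items()
            if terms & {norm(w) for w in words}}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical vocabulary extracted from three source files.
files = {
    "Ledger.java": ["ledger", "accounts", "posting"],
    "Rate.java": ["rate", "interest"],
    "Report.java": ["report", "account"],
}
relevant = {"Ledger.java", "Report.java"}  # files a change request should reach
query = "the account balance"

exact = precision_recall(search(query, files), relevant)
stemmed = precision_recall(search(query, files, stem=True), relevant)
print("exact:", exact)
print("stemmed:", stemmed)
```

In this toy data, exact matching misses `Ledger.java` because the file uses the plural "accounts", while stem matching finds it.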

Crossref

Open Research Online (The Open University)

Investigating naming convention adherence in Java references

Author: Butler Simon
Wermelinger Michel
Yu Yijun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2015

Naming conventions can help the readability and comprehension of code, and thus the onboarding of new developers. Conventions also provide cues that help developers and tools extract information from identifier names to support software maintenance. Tools exist to automatically check naming conventions but they are often limited to simple checks, e.g. regarding typography. The adherence to more elaborate conventions, such as the use of noun and verbal phrases in names, is not checked. We present NOMINAL, a naming convention checking library for Java that allows the declarative specification of conventions regarding typography and the use of abbreviations and phrases. To test NOMINAL, and to investigate the extent to which developers follow conventions, we extract 3.5 million reference (field, formal argument and local variable) name declarations from 60 FLOSS projects and determine their adherence to two well-known Java naming convention guidelines that give developers scope to choose a variety of forms of name, and sometimes offer conflicting advice.

CiteSeerX

Crossref

Open Research Online (The Open University)

Analysing Java Identifier Names

Author: Butler Simon Jonathan
Publication date: 13/06/2016

Identifier names are the principal means of recording and communicating ideas in source code and are a significant source of information for software developers and maintainers, and the tools that support their work. This research aims to increase understanding of identifier name content types - words, abbreviations, etc. - and phrasal structures - noun phrases, verb phrases, etc. - by improving techniques for the analysis of identifier names. The techniques and knowledge acquired can be applied to improve program comprehension tools that support internal code quality, concept location, traceability and model extraction. Previous detailed investigations of identifier names have focused on method names, and the content and structure of Java class and reference (field, parameter, and variable) names are less well understood. I developed improved algorithms to tokenise names, and trained part-of-speech tagger models on identifier names to support the analysis of class and reference names in a corpus of 60 open source Java projects. I confirm that developers structure the majority of names according to identifier naming conventions, and use phrasal structures reported in the literature. I also show that developers use a wider variety of content types and phrasal structures than previously understood. Unusually structured class names are largely project-specific naming conventions, but could indicate design issues. Analysis of phrasal reference names showed that developers most often use the phrasal structures described in the literature and used to support the extraction of information from names, but also choose unexpected phrasal structures, and complex, multi-phrasal, names. 
Using Nominal - software I created to evaluate adherence to naming conventions - I found developers tend to follow naming conventions, but that adherence to published conventions varies between projects because developers also establish new conventions for the use of typography, content types and phrasal structure to support their work: particularly to distinguish the roles of Java field names

Open Research Online (The Open University)

Mining Java Class Naming Conventions

Author: Butler Simon
Sharp Helen
Wermelinger Michel
Yu Yijun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2011

Class names represent the concepts implemented in object-oriented source code and are key elements in program comprehension and, thus, software maintenance. Programming conventions often state that class names should be noun-phrases, but there is little further guidance for developers on the composition of class names. Other researchers have observed that the majority of Java class identifier names are composed of one or more nouns preceded, optionally, by one or more adjectives. However, no detailed analysis of class identifier name structure has been undertaken that could be leveraged to support program comprehension activities. We investigate the lexical and syntactic composition of Java class identifier names in two ways. Firstly, as others have done for C function and Java method names, we identify conventional patterns found in the use of parts of speech. Secondly, we identify the origin of words used in class names within the name of any super class and implemented interfaces to identify patterns of class name construction related to inheritance. Through the analysis of 120,000 unique class names found in 60 open source projects we identify both common and project specific class naming conventions. We apply this knowledge in a case study of the mind-mapping tool Freemind to investigate whether class names that follow unconventional naming schemes are candidates for refactoring – either a name refactoring that conforms to established naming conventions within the code base, or refactoring of the class that results in conventionally named classes

Crossref

Open Research Online (The Open University)

Exploring the Influence of Identifier Names on Code Quality: An empirical study

Author: Butler Simon
Sharp Helen
Wermelinger Michel
Yu Yijun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2010

Given the importance of identifier names and the value of naming conventions to program comprehension, we speculated in previous work whether a connection exists between the quality of identifier names and software quality. We found that flawed identifiers in Java classes were associated with source code found to be of low quality by static analysis. This paper extends that work in three directions. First, we show that the association also holds at the finer granularity level of Java methods. This in turn makes it possible to, secondly, apply existing method-level quality and readability metrics, and see that flawed identifiers still impact on this richer notion of code quality and comprehension. Third, we check whether the association can be used in a practical way. We adopt techniques used to evaluate medical diagnostic tests in order to identify which particular identifier naming flaws could be used as a light-weight diagnostic of potentially problematic Java source code for maintenance

Crossref

Open Research Online (The Open University)

Improving the tokenisation of identifier names

Author: A. Kuhn
A. Marcus
A. Vermeulen
B. Caprile
D. Lawrie
D. Raţiu
E. Enslen
E.W. Høst
G. Antoniol
J. Singer
N. Madani
S. Abebe
S. Butler
V.I. Levenshtein
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Identifier names are the main vehicle for semantic information during program comprehension. For tool-supported program comprehension tasks, including concept location and requirements traceability, identifier names need to be tokenised into their semantic constituents. In this paper we present an approach to the automated tokenisation of identifier names that improves on existing techniques in two ways. First, it improves the tokenisation accuracy for single-case identifier names and for identifier names containing digits, which existing techniques largely ignore. Second, performance gains over existing techniques are achieved using smaller oracles, making the approach easier to deploy. Accuracy was evaluated by comparing our algorithm to manual tokenisations of 28,000 identifier names drawn from 60 well-known open source Java projects totalling 16.5 MSLOC. Moreover, the projects were used to perform a study of identifier tokenisation features (single case, camel case, use of digits, etc.) per object-oriented construct (class names, method names, local variable names, etc.), thus providing an insight into naming conventions in industrial-scale object-oriented code. Our tokenisation tool and datasets are publicly available.
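For readers unfamiliar with identifier tokenisation, the baseline that such approaches improve on can be sketched as a conservative split on separators, case boundaries and digit runs. This is not the algorithm presented in the paper: the `tokenise` function and its regular expression are assumptions for illustration, and the sketch deliberately leaves single-case names such as `notype` unsplit, which is one of the cases the abstract says needs an oracle-based approach.

```python
import re

# Three alternatives, tried in order at each position:
#   1. a run of capitals not followed by a lowercase letter (acronyms: "HTML")
#   2. an optional capital followed by lowercase letters ("Editor", "get")
#   3. a run of digits ("2")
# Separators such as "_" and "$" match nothing and are skipped by findall.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def tokenise(identifier):
    """Split an identifier into lower-cased tokens using typography only."""
    return [token.lower() for token in _TOKEN.findall(identifier)]


for name in ("HTMLEditorKit", "getX2Position", "MAX_VALUE", "notype"):
    print(name, "->", tokenise(name))
```

Digit handling here is the simplest possible choice (every digit run becomes its own token); whether `2` in a name like `getX2Position` belongs with `X` is exactly the kind of ambiguity a purely typographic split cannot resolve.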

Crossref

Open Research Online (The Open University)

A survey of the forms of Java reference names

Author: Butler Simon
Wermelinger Michel
Yu Yijun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2015

The readability of identifiers is a major factor of program comprehension and an aim of naming convention guidelines. Due to their semantic content, identifiers are also used in feature and bug location, among other software maintenance tasks. Looking at how names are used in practice may lead to insights on potential problems for comprehension and for programming support tools that process identifiers. Class and method names are already well represented in the literature. This paper presents an investigation of Java field, formal argument and local variable names, which we collectively call reference names. These names cannot be ignored because they constitute over half the unique names and almost 70% of the name declarations in the corpus investigated. We analysed the forms of 3.5 million reference name declarations in 60 well-known Java projects, examining the phrasal structure of names composed of known words and acronyms. The structures found in practice were evaluated against those given in the literature. The use of unknown abbreviations and words, which may pose a problem for program comprehension, was also identified. Based on our observations of the rich diversity of reference names, we suggest issues to be taken into account for future academic research and for improving tools that rely on names as sources of information.

CiteSeerX

Crossref

Open Research Online (The Open University)