3 research outputs found
Recommended from our members
Analysing Java Identifier Names
Identifier names are the principal means of recording and communicating ideas in source code and are a significant source of information for software developers and maintainers, and the tools that support their work. This research aims to increase understanding of identifier name content types - words, abbreviations, etc. - and phrasal structures - noun phrases, verb phrases, etc. - by improving techniques for the analysis of identifier names. The techniques and knowledge acquired can be applied to improve program comprehension tools that support internal code quality, concept location, traceability and model extraction. Previous detailed investigations of identifier names have focused on method names, and the content and structure of Java class and reference (field, parameter, and variable) names are less well understood.
I developed improved algorithms to tokenise names, and trained part-of-speech tagger models on identifier names to support the analysis of class and reference names in a corpus of 60 open source Java projects. I confirm that developers structure the majority of names according to identifier naming conventions, and use phrasal structures reported in the literature. I also show that developers use a wider variety of content types and phrasal structures than previously understood. Unusually structured class names are largely project-specific naming conventions, but could indicate design issues. Analysis of phrasal reference names showed that developers most often use the phrasal structures described in the literature and used to support the extraction of information from names, but also choose unexpected phrasal structures, and complex, multi-phrasal, names.
Using Nominal - software I created to evaluate adherence to naming conventions - I found developers tend to follow naming conventions, but that adherence to published conventions varies between projects because developers also establish new conventions for the use of typography, content types and phrasal structure to support their work: particularly to distinguish the roles of Java field names
Development of a System for Analysing Method Names in Java Source Code
Naming code can seem like a simple task, however finding a good name can be rather challenging. Entity names should be consistent and brief yet comprehensive when representing the information each entity hold. What is considered a good name can be highly debatable, although it usually involves descriptive names that can contribute to readability and comprehensibility of source code. Bad code names can cause uncertainty, potential future bugs and be misleading. For this reason, the task of naming code is vital, hence there is a need of a system to improve and maintain it. To develop such a system, there are requirements required to be specified to define the expected implementation for certain entity names. These requirements are encoded into software in a domain-specific language, granting executable code to be generated from the expressed requirements. As a result, this name analysis tool provides programmers to perform code analysis on Java source code checking if the entities act in accordance with the requirements of their names. Additionally, the result shows insights of how contributions from linguistics can be valuable for software development and can be used to analyse software languages, such as entity names.Masteroppgave i informatikkINF399MAMN-PROGMAMN-IN
Recommended from our members
Improving Information Retrieval Bug Localisation Using Contextual Heuristics
Software developers working on unfamiliar systems are challenged to identify where and how high-level concepts are implemented in the source code prior to performing maintenance tasks. Bug localisation is a core program comprehension activity in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code?
Information retrieval (IR) approaches see the bug report as the query, and the source files as the documents to be retrieved, ranked by relevance. Current approaches rely on project history, in particular previously fixed bugs and versions of the source code. Existing IR techniques fall short of providing adequate solutions in finding all the source code files relevant for a bug. Without additional help, bug localisation can become a tedious, time- consuming and error-prone task.
My research contributes a novel algorithm that, given a bug report and the application’s source files, uses a combination of lexical and structural information to suggest, in a ranked order, files that may have to be changed to resolve the reported bug without requiring past code and similar reports.
I study eight applications for which I had access to the user guide, the source code, and some bug reports. I compare the relative importance and the occurrence of the domain concepts in the project artefacts and measure the effectiveness of using only concept key words to locate files relevant for a bug compared to using all the words of a bug report.
Measuring my approach against six others, using their five metrics and eight projects, I position an effected file in the top-1, top-5 and top-10 ranks on average for 44%, 69% and 76% of the bug reports respectively. This is an improvement of 23%, 16% and 11% respectively over the best performing current state-of-the-art tool.
Finally, I evaluate my algorithm with a range of industrial applications in user studies, and found that it is superior to simple string search, as often performed by developers. These results show the applicability of my approach to software projects without history and offers a simpler light-weight solution