62 research outputs found
Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods and C functions
Measuring the internal quality of source code is one of the traditional goals of making software development into an engineering discipline. Cyclomatic Complexity (CC) is an often used source code quality metric, next to Source Lines of Code (SLOC). However, the use of the CC metric is challenged by the repeated claim that CC is redundant with respect to SLOC due to strong linear correlation.
We conducted an extensive literature study of the CC/SLOC correlation results. Next, we tested correlation on large Java (17.6 M methods) and C (6.3 M functions) corpora. Our results show that linear correlation between SLOC and CC is only moderate as caused by increasingly high variance. We further observe that aggregating CC and SLOC as well as performing a power transform improves the correlation.
Our conclusion is that the observed linear correlation between CC and SLOC of Java methods or C functions is not strong enough to conclude that CC is redundant with SLOC. This conclusion contradicts earlier claims from literature, but concurs with the widely accepted practice of measuring of CC next to SLOC
How Scale Affects Structure in Java Programs
Many internal software metrics and external quality attributes of Java
programs correlate strongly with program size. This knowledge has been used
pervasively in quantitative studies of software through practices such as
normalization on size metrics. This paper reports size-related super- and
sublinear effects that have not been known before. Findings obtained on a very
large collection of Java programs -- 30,911 projects hosted at Google Code as
of Summer 2011 -- unveils how certain characteristics of programs vary
disproportionately with program size, sometimes even non-monotonically. Many of
the specific parameters of nonlinear relations are reported. This result gives
further insights for the differences of "programming in the small" vs.
"programming in the large." The reported findings carry important consequences
for OO software metrics, and software research in general: metrics that have
been known to correlate with size can now be properly normalized so that all
the information that is left in them is size-independent.Comment: ACM Conference on Object-Oriented Programming, Systems, Languages and
Applications (OOPSLA), October 2015. (Preprint
Comparing the performance of string operations across programming languages
Abstract. In this thesis, the performance of string operations are compared across programming languages. Handling strings effectively is important especially when performance is a crucial factor and large string sizes may emerge. Common examples where large string sizes emerge are during digitalization of a product, reading string data from a database, reading and handling large CSV-files and Excel-files, converting file format to another file format (e.g. CSV to Excel and vice versa), and reading and handling a DOM-tree of a website.
There has been a lot of corresponding research where programming languages are benchmarked, but none of them focus directly on string operations. The main goal of this thesis is to fill this gap in literature and try to find out which programming languages have the best results on string operations in terms of execution time and memory (maximum RSS) usage.
The test environment was formed by creating randomly generated string files with sizes varying from ten thousand characters to 100 million characters. The generated characters were ‘a’, ‘b’, and ‘ ‘ (whitespace character). The programming languages selected for this thesis were Python, C, C++, Java, Perl, Ruby, Go, and Swift.
Go seemed to be the most effective language in execution times, although it was not the fastest in many operations. C used very little memory, but only five operations were implemented in it. Every operation was implemented in Python, and it used additional memory to loading the string file in only one operation, which was sorting a string. Swift had quite bad results, and this could be caused by the Linux version of Swift that was used. In regular expressions, Perl and C++ were overwhelmingly effective. Java used the most memory in every operation
Understanding the impact of introducing Lambda expressions in Java Programs
Background: The Java programming language version eight introduced several features that encourage the func
tional style of programming, including the support for lambda expressions and the Stream API. Currently, there is
a common wisdom that refactoring legacy code to introduce lambda expressions, besides other potential benefits,
simplifies the code and improves program comprehension. Aims: The purpose of this work is to investigate this
belief, conducting an indepth study to evaluate the effect of introducing lambda expressions on program comprehension. Method: We conducted this research using a mixedmethod approach. For the quantitative method, we
quantitatively analyzed 158 pairs of code snippets extracted directly either from GitHub or from recommendations
from three tools (RJTL, NetBeans, and IntelliJ). We also surveyed practitioners to collect their perceptions about the
benefits on program comprehension when introducing lambda expressions. We asked practitioners to evaluate and
rate sets of pairs of code snippets. Results: We found contradictory results in our research. Based on the quantitative
assessment, we could not find evidence that the introduction of lambda expressions improves software readability—
one of the components of program comprehension. Our results suggest that the transformations recommended by
the aforementioned tools decrease program comprehension when assessed by two stateoftheart models to estimate readability. Differently, our findings of the qualitative assessment suggest that the introduction of lambda
expression improves program comprehension in three scenarios when: we convert anonymous inner classes to a
lambda expression, use structural loops with inner conditional to an anyMatch operator, and apply structural loops
to filter operator combined with a collect method. Implications: We argue in this paper that one can improve
program comprehension when he/she applies particular transformations to introduce lambda expressions (e.g., replacing anonymous inner classes with lambda expressions). Also, the opinion of the participants highlights which
kind of transformation for introducing lambda might be advantageous. This might support the implementation of
effective tools for automatic program transformations
Recommended from our members
Beyond Similar Code: Leveraging Social Coding Websites
Programmers often write code with similarity to existing code written somewhere. Code search tools can help developers find similar solutions and identify possible improvements. For code search tools, good search results rely on valid data collection. Social coding websites, such as Question & Answer forum Stack Overflow (SO) and project repository GitHub, are popular destinations when programmers look for how to achieve certain programming tasks. Over the years, SO and GitHub have accumulated an enormous knowledge base of, and around, code. Since these software artifacts are publicly available, it is possible to leverage them in code search tools. This dissertation explores the opportunities of leveraging software artifacts from the social coding websites in searching for not just similar, but related, code. Programmers query SO and GitHub extensively to search for suitable code for reuse, however, not much is known about the usability or quality of the available code from each website. This dissertation first investigates under what circumstances the software artifacts found in social coding websites can be leveraged for purposes other than their immediate use by developers. It points out a number of problems that need to be addressed before those artifacts can be leveraged for code search and development tools. Specifically, triviality, fragility, and duplication, dominate these artifacts. However, when these problems are addressed, there is still a considerable amount of good quality artifacts that can be leveraged.SO and GitHub are not only two separate data resources, moreover, they together, belong to a larger system of software development process: the same users that rely on facilities of GitHub often seeks support on SO for their problems, and return to GitHub to apply the knowledge acquired. This dissertation further studies the crossover of software artifacts between SO and GitHub, and categorizes the adaptations from a SO code snippet to its GitHub counterparts. Existing search tools only recommend other code locations that are syntactically or semantically similar to the given code but do not reason about other kinds of relevant code that a developer should also pay attention to, e.g., auxiliary code to accomplish a complete task. With the good quality software artifacts and crossover between the two systems available, this dissertation presents two approaches that leverage these artifacts in searching for related code. Aroma indexes GitHub projects, takes a partial code snippet as input, searches the corpus for methods containing the partial code snippet, and clusters and intersects the results of the search to recommend. Aroma is evaluated on randomly selected queries created from the GitHub corpus, as well as queries derived from SO code snippets. It recommends related code for error checking and handling, objects configuring, etc. Furthermore, a user study is conducted where industrial developers are asked to complete programming tasks using Aroma and provide feedback. The results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently. CodeAid reuses the crossover between SO and GitHub and recommends related code outside of a method body. For each SO snippet as a query, CodeAid retrieves the co-occurring code fragments for its GitHub counterparts and clusters them to recommend common ones. 74% of the common co-occurring code fragments represent related functionality that should be included in code search results. Three major types of relevancy--complementary, supplementary, and alternative methods, are identified
Recommended from our members
A Large-Scale Study of Modern Code Review and Security in Open Source Projects.
- …