Many internal software metrics and external quality attributes of Java
programs correlate strongly with program size. This knowledge has been used
pervasively in quantitative studies of software through practices such as
normalization on size metrics. This paper reports size-related super- and
sublinear effects that have not been known before. Findings obtained on a very
large collection of Java programs -- 30,911 projects hosted at Google Code as
of Summer 2011 -- unveils how certain characteristics of programs vary
disproportionately with program size, sometimes even non-monotonically. Many of
the specific parameters of nonlinear relations are reported. This result gives
further insights for the differences of "programming in the small" vs.
"programming in the large." The reported findings carry important consequences
for OO software metrics, and software research in general: metrics that have
been known to correlate with size can now be properly normalized so that all
the information that is left in them is size-independent.Comment: ACM Conference on Object-Oriented Programming, Systems, Languages and
Applications (OOPSLA), October 2015. (Preprint