14,183 research outputs found
Should I Bug You? Identifying Domain Experts in Software Projects Using Code Complexity Metrics
In any sufficiently complex software system there are experts, having a
deeper understanding of parts of the system than others. However, it is not
always clear who these experts are and which particular parts of the system
they can provide help with. We propose a framework to elicit the expertise of
developers and recommend experts by analyzing complexity measures over time.
Furthermore, teams can detect those parts of the software for which currently
no, or only few experts exist and take preventive actions to keep the
collective code knowledge and ownership high. We employed the developed
approach at a medium-sized company. The results were evaluated with a survey,
comparing the perceived and the computed expertise of developers. We show that
aggregated code metrics can be used to identify experts for different software
components. The identified experts were rated as acceptable candidates by
developers in over 90% of all cases
How Scale Affects Structure in Java Programs
Many internal software metrics and external quality attributes of Java
programs correlate strongly with program size. This knowledge has been used
pervasively in quantitative studies of software through practices such as
normalization on size metrics. This paper reports size-related super- and
sublinear effects that have not been known before. Findings obtained on a very
large collection of Java programs -- 30,911 projects hosted at Google Code as
of Summer 2011 -- unveils how certain characteristics of programs vary
disproportionately with program size, sometimes even non-monotonically. Many of
the specific parameters of nonlinear relations are reported. This result gives
further insights for the differences of "programming in the small" vs.
"programming in the large." The reported findings carry important consequences
for OO software metrics, and software research in general: metrics that have
been known to correlate with size can now be properly normalized so that all
the information that is left in them is size-independent.Comment: ACM Conference on Object-Oriented Programming, Systems, Languages and
Applications (OOPSLA), October 2015. (Preprint
Evaluation of Software Product Quality Metrics
Computing devices and associated software govern everyday life, and form the
backbone of safety critical systems in banking, healthcare, automotive and
other fields. Increasing system complexity, quickly evolving technologies and
paradigm shifts have kept software quality research at the forefront. Standards
such as ISO's 25010 express it in terms of sub-characteristics such as
maintainability, reliability and security. A significant body of literature
attempts to link these subcharacteristics with software metric values, with the
end goal of creating a metric-based model of software product quality. However,
research also identifies the most important existing barriers. Among them we
mention the diversity of software application types, development platforms and
languages. Additionally, unified definitions to make software metrics truly
language-agnostic do not exist, and would be difficult to implement given
programming language levels of variety. This is compounded by the fact that
many existing studies do not detail their methodology and tooling, which
precludes researchers from creating surveys to enable data analysis on a larger
scale. In our paper, we propose a comprehensive study of metric values in the
context of three complex, open-source applications. We align our methodology
and tooling with that of existing research, and present it in detail in order
to facilitate comparative evaluation. We study metric values during the entire
18-year development history of our target applications, in order to capture the
longitudinal view that we found lacking in existing literature. We identify
metric dependencies and check their consistency across applications and their
versions. At each step, we carry out comparative evaluation with existing
research and present our results.Comment: Published in: Molnar AJ., Neam\c{t}u A., Motogna S. (2020) Evaluation
of Software Product Quality Metrics. In: Damiani E., Spanoudakis G.,
Maciaszek L. (eds) Evaluation of Novel Approaches to Software Engineering.
ENASE 2019. Communications in Computer and Information Science, vol 1172.
Springer, Cham. https://doi.org/10.1007/978-3-030-40223-5_
Bayesian Hierarchical Modelling for Tailoring Metric Thresholds
Software is highly contextual. While there are cross-cutting `global'
lessons, individual software projects exhibit many `local' properties. This
data heterogeneity makes drawing local conclusions from global data dangerous.
A key research challenge is to construct locally accurate prediction models
that are informed by global characteristics and data volumes. Previous work has
tackled this problem using clustering and transfer learning approaches, which
identify locally similar characteristics. This paper applies a simpler approach
known as Bayesian hierarchical modeling. We show that hierarchical modeling
supports cross-project comparisons, while preserving local context. To
demonstrate the approach, we conduct a conceptual replication of an existing
study on setting software metrics thresholds. Our emerging results show our
hierarchical model reduces model prediction error compared to a global approach
by up to 50%.Comment: Short paper, published at MSR '18: 15th International Conference on
Mining Software Repositories May 28--29, 2018, Gothenburg, Swede
An Extended Stable Marriage Problem Algorithm for Clone Detection
Code cloning negatively affects industrial software and threatens
intellectual property. This paper presents a novel approach to detecting cloned
software by using a bijective matching technique. The proposed approach focuses
on increasing the range of similarity measures and thus enhancing the precision
of the detection. This is achieved by extending a well-known stable-marriage
problem (SMP) and demonstrating how matches between code fragments of different
files can be expressed. A prototype of the proposed approach is provided using
a proper scenario, which shows a noticeable improvement in several features of
clone detection such as scalability and accuracy.Comment: 20 pages, 10 figures, 6 table
On the State of the Art of Coupling and Cohesion Measures for Service-Oriented System Design
Service-oriented computing has encountered an increasing importance for enterprises over the last years. With Web services, the major underlying technical basis is already in an advanced state. The service design area, on the other hand, still provides several research gaps such as the field of service identification and in particular the determination of an optimal granularity level for services. Granularity, assessed through coupling and cohesion considerations, is yet a rather unexplored domain when it comes to service-orientation, although several results from earlier design principles are available. In this paper we summarize the current state of the art in granularity measures and identify the implications emerging for practice and re-search. As we reveal, several existing measures for other paradigms, which might be adapted for service-orientation, are left unconsidered. Further research gaps, as the mainly missing empirical evaluation or a tighter inclusion in the development process, are also detected
Principles and Concepts of Agent-Based Modelling for Developing Geospatial Simulations
The aim of this paper is to outline fundamental concepts and principles of the Agent-Based Modelling (ABM) paradigm, with particular reference to the development of geospatial simulations. The paper begins with a brief definition of modelling, followed by a classification of model types, and a comment regarding a shift (in certain circumstances) towards modelling systems at the individual-level. In particular, automata approaches (e.g. Cellular Automata, CA, and ABM) have been particularly popular, with ABM moving to the fore. A definition of agents and agent-based models is given; identifying their advantages and disadvantages, especially in relation to geospatial modelling. The potential use of agent-based models is discussed, and how-to instructions for developing an agent-based model are provided. Types of simulation / modelling systems available for ABM are defined, supplemented with criteria to consider before choosing a particular system for a modelling endeavour. Information pertaining to a selection of simulation / modelling systems (Swarm, MASON, Repast, StarLogo, NetLogo, OBEUS, AgentSheets and AnyLogic) is provided, categorised by their licensing policy (open source, shareware / freeware and proprietary systems). The evaluation (i.e. verification, calibration, validation and analysis) of agent-based models and their output is examined, and noteworthy applications are discussed.Geographical Information Systems (GIS) are a particularly useful medium for representing model input and output of a geospatial nature. However, GIS are not well suited to dynamic modelling (e.g. ABM). In particular, problems of representing time and change within GIS are highlighted. Consequently, this paper explores the opportunity of linking (through coupling or integration / embedding) a GIS with a simulation / modelling system purposely built, and therefore better suited to supporting the requirements of ABM. This paper concludes with a synthesis of the discussion that has proceeded. The aim of this paper is to outline fundamental concepts and principles of the Agent-Based Modelling (ABM) paradigm, with particular reference to the development of geospatial simulations. The paper begins with a brief definition of modelling, followed by a classification of model types, and a comment regarding a shift (in certain circumstances) towards modelling systems at the individual-level. In particular, automata approaches (e.g. Cellular Automata, CA, and ABM) have been particularly popular, with ABM moving to the fore. A definition of agents and agent-based models is given; identifying their advantages and disadvantages, especially in relation to geospatial modelling. The potential use of agent-based models is discussed, and how-to instructions for developing an agent-based model are provided. Types of simulation / modelling systems available for ABM are defined, supplemented with criteria to consider before choosing a particular system for a modelling endeavour. Information pertaining to a selection of simulation / modelling systems (Swarm, MASON, Repast, StarLogo, NetLogo, OBEUS, AgentSheets and AnyLogic) is provided, categorised by their licensing policy (open source, shareware / freeware and proprietary systems). The evaluation (i.e. verification, calibration, validation and analysis) of agent-based models and their output is examined, and noteworthy applications are discussed.Geographical Information Systems (GIS) are a particularly useful medium for representing model input and output of a geospatial nature. However, GIS are not well suited to dynamic modelling (e.g. ABM). In particular, problems of representing time and change within GIS are highlighted. Consequently, this paper explores the opportunity of linking (through coupling or integration / embedding) a GIS with a simulation / modelling system purposely built, and therefore better suited to supporting the requirements of ABM. This paper concludes with a synthesis of the discussion that has proceeded
- …