1,310 research outputs found
Management Aspects of Software Clone Detection and Analysis
Copying a code fragment and reusing it by pasting with or without minor modifications is a common practice in software development for improved productivity. As a result, software systems often have similar segments of code, called software clones or code clones. Due to many reasons, unintentional clones may also appear in the source code without awareness of the developer. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes in clone management with techniques realized into tools and large-scale in-depth analyses of clones to inform clone management in devising effective techniques and strategies.
To support proactive clone management, we have developed a clone detector as a plug-in to the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strength of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones from the entire code base, our tool aids clone-aware development by allowing focused search for clones of any code fragment of the developer's interest.
A good understanding on the code cloning phenomenon is a prerequisite to devise efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecast on the proportion of code clones in the future versions of software projects. The outcome of these studies expose useful insights into the characteristics of evolving clones and their management implications.
Upon identification of the code clones, their management often necessitates careful refactoring, which is dealt with at the third phase of the thesis. Given a large number of clones, it is difficult to optimally decide what to refactor and what not, especially when there are dependencies among clones and the objective remains the minimization of refactoring efforts and risks while maximizing benefits. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for the estimation of efforts needed to refactor clones in source code.
We evaluated our clone detector, scheduler and effort model through comparative empirical studies and user studies. Finally, based on our experience and in-depth analysis of the present state of the art, we expose avenues for further research and development towards a versatile clone management system that we envision
Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm
International audienceIn this paper, we address the design and implementation of GPU-accelerated Branch-and-Bound algorithms (B&B) for solving Flow-shop scheduling optimization problems (FSP). Such applications are CPU-time consuming and highly irregular. On the other hand, GPUs are massively multi-threaded accelerators using the SIMD model at execution. A major issue which arises when executing on GPU a B&B applied to FSP is thread or branch divergence. Such divergence is caused by the lower bound function of FSP which contains many irregular loops and conditional instructions. Our challenge is therefore to revisit the design and implementation of B&B applied to FSP dealing with thread divergence. Extensive experiments of the proposed approach have been carried out on well-known FSP benchmarks using an Nvidia Tesla C2050 GPU card. Compared to a CPU-based execution, accelerations up to Ă77.46 are achieved for large problem instances
Pattern-based refactoring in model-driven engineering
LâingĂ©nierie dirigĂ©e par les modĂšles (IDM) est un paradigme du gĂ©nie logiciel qui utilise les
modĂšles comme concepts de premier ordre Ă partir desquels la validation, le code, les tests
et la documentation sont dérivés. Ce paradigme met en jeu divers artefacts tels que les
modÚles, les méta-modÚles ou les programmes de transformation des modÚles. Dans un
contexte industriel, ces artefacts sont de plus en plus complexes. En particulier, leur
maintenance demande beaucoup de temps et de ressources. Afin de réduire la complexité
des artefacts et le coût de leur maintenance, de nombreux chercheurs se sont intéressés au
refactoring de ces artefacts pour améliorer leur qualité.
Dans cette thĂšse, nous proposons dâĂ©tudier le refactoring dans lâIDM dans sa
globalité, par son application à ces différents artefacts. Dans un premier temps, nous
utilisons des patrons de conception spécifiques, comme une connaissance a priori, appliqués
aux transformations de modÚles comme un véhicule pour le refactoring. Nous procédons
dâabord par une phase de dĂ©tection des patrons de conception avec diffĂ©rentes formes et
différents niveaux de complétude. Les occurrences détectées forment ainsi des opportunités
de refactoring qui seront exploitées pour aboutir à des formes plus souhaitables et/ou plus
complĂštes de ces patrons de conceptions.
Dans le cas dâabsence de connaissance a priori, comme les patrons de conception,
nous proposons une approche basée sur la programmation génétique, pour apprendre des
rÚgles de transformations, capables de détecter des opportunités de refactoring et de les
corriger. Comme alternative Ă la connaissance disponible a priori, lâapproche utilise des
exemples de paires dâartefacts dâavant et dâaprĂšs le refactoring, pour ainsi apprendre les
rĂšgles de refactoring. Nous illustrons cette approche sur le refactoring de modĂšles.Model-Driven Engineering (MDE) is a software engineering paradigm that uses models as
first-class concepts from which validation, code, testing, and documentation are derived.
This paradigm involves various artifacts such as models, meta-models, or model
transformation programs. In an industrial context, these artifacts are increasingly complex.
In particular, their maintenance is time and resources consuming. In order to reduce the
complexity of artifacts and the cost of their maintenance, many researchers have been
interested in refactoring these artifacts to improve their quality.
In this thesis, we propose to study refactoring in MDE holistically, by its application
to these different artifacts. First, we use specific design patterns, as an example of prior
knowledge, applied to model transformations to enable refactoring. We first proceed with a
detecting phase of design patterns, with different forms and levels of completeness. The
detected occurrences thus form refactoring opportunities that will be exploited to implement
more desirable and/or more complete forms of these design patterns.
In the absence of prior knowledge, such as design patterns, we propose an approach
based on genetic programming, to learn transformation rules, capable of detecting
refactoring opportunities and correcting them. As an alternative to prior knowledge, our
approach uses examples of pairs of artifacts before and after refactoring, in order to learn
refactoring rules. We illustrate this approach on model refactoring
Programming Heterogeneous Parallel Machines Using Refactoring and Monte-Carlo Tree Search
Funding: This work was supported by the EU Horizon 2020 project, TeamPlay, Grant Number 779882, and UK EPSRC Discovery, Grant Number EP/P020631/1.This paper presents a new technique for introducing and tuning parallelism for heterogeneous shared-memory systems (comprising a mixture of CPUs and GPUs), using a combination of algorithmic skeletons (such as farms and pipelines), MonteâCarlo tree search for deriving mappings of tasks to available hardware resources, and refactoring tool support for applying the patterns and mappings in an easy and effective way. Using our approach, we demonstrate easily obtainable, significant and scalable speedups on a number of case studies showing speedups of up to 41 over the sequential code on a 24-core machine with one GPU. We also demonstrate that the speedups obtained by mappings derived by the MCTS algorithm are within 5â15% of the best-obtained manual parallelisation.Publisher PDFPeer reviewe
RePOR: Mimicking humans on refactoring tasks. Are we there yet?
Refactoring is a maintenance activity that aims to improve design quality
while preserving the behavior of a system. Several (semi)automated approaches
have been proposed to support developers in this maintenance activity, based on
the correction of anti-patterns, which are `poor' solutions to recurring design
problems. However, little quantitative evidence exists about the impact of
automatically refactored code on program comprehension, and in which context
automated refactoring can be as effective as manual refactoring. Leveraging
RePOR, an automated refactoring approach based on partial order reduction
techniques, we performed an empirical study to investigate whether automated
refactoring code structure affects the understandability of systems during
comprehension tasks. (1) We surveyed 80 developers, asking them to identify
from a set of 20 refactoring changes if they were generated by developers or by
a tool, and to rate the refactoring changes according to their design quality;
(2) we asked 30 developers to complete code comprehension tasks on 10 systems
that were refactored by either a freelancer or an automated refactoring tool.
To make comparison fair, for a subset of refactoring actions that introduce new
code entities, only synthetic identifiers were presented to practitioners. We
measured developers' performance using the NASA task load index for their
effort, the time that they spent performing the tasks, and their percentages of
correct answers. Our findings, despite current technology limitations, show
that it is reasonable to expect a refactoring tools to match developer code
Searchâbased model transformations
Model transformations are an important cornerstone of modelâdriven engineering, a discipline which facilitates the abstraction of relevant information of a system as models. The success of the final system mainly depends on the optimization of these models through model transformations. Currently, the application of transformations is realized either by following the applyâasâlongâasâpossible strategy or by the provision of explicit rule orchestrations. This implies two main limitations. First, the optimization objectives are implicitly hidden in the transformation rules and their orchestration. Second, manually finding the best orchestration for a particular scenario is a major challenge due to the high number of possible combinations.
To overcome these limitations, we present a novel framework that builds on the nonâintrusive integration of optimization and model transformation technologies. In particular, we formulate the transformation orchestration task as an optimization problem, which allows for the efficient exploration of the transformation space and explication of the transformation objectives. Our generic framework provides several search algorithms and guides the user in providing a proper search configuration. We present different instantiations of our framework to demonstrate its feasibility, applicability, and benefits using several case studiesEuropean Commission ICT Policy Support Programme 317859Ministerio de Economia y Competitividad TIN2015-70560-RJunta de AndalucĂa P10-TIC-5960Junta de AndalucĂa P12-TIC-186
Premise Selection for Mathematics by Corpus Analysis and Kernel Methods
Smart premise selection is essential when using automated reasoning as a tool
for large-theory formal proof development. A good method for premise selection
in complex mathematical libraries is the application of machine learning to
large corpora of proofs. This work develops learning-based premise selection in
two ways. First, a newly available minimal dependency analysis of existing
high-level formal mathematical proofs is used to build a large knowledge base
of proof dependencies, providing precise data for ATP-based re-verification and
for training premise selection algorithms. Second, a new machine learning
algorithm for premise selection based on kernel methods is proposed and
implemented. To evaluate the impact of both techniques, a benchmark consisting
of 2078 large-theory mathematical problems is constructed,extending the older
MPTP Challenge benchmark. The combined effect of the techniques results in a
50% improvement on the benchmark over the Vampire/SInE state-of-the-art system
for automated reasoning in large theories.Comment: 26 page
- âŠ