5 research outputs found

    Light-Weight Ontology Alignment using Best-Match Clone Detection

    Abstract: Ontologies are a key component of the Semantic Web, providing a common basis for representing and exchanging domain meaning in web documents and resources. Ontology alignment is the problem of relating the elements of two formal ontologies for a semantic domain in order to identify common concepts and relationships represented using different terminology or language, and thus to allow meaningful communication and exchange of documents and resources represented using different ontologies for the same domain. Many algorithms have been proposed for ontology alignment, each with its own strengths and weaknesses. The problem is in many ways similar to near-miss clone detection: while much of the description of concepts in two ontologies may be similar, there can be differences in structure or vocabulary that make similarity detection challenging. Building on our previous work extending clone detection to modelling languages such as WSDL using contextualization, in this work we apply near-miss clone detection to the problem of ontology alignment, and use the new notion of "best-match" clone detection to achieve results similar to those of many existing ontology alignment algorithms when applied to standard benchmarks.

    Inferring Repository File Structure Modifications Using Nearest-Neighbor Clone Detection

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Code clones (identical or similar code fragments in a code-base) have dual but contradictory impacts (i.e., both positive and negative impacts) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad-smell in a code-base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code-base may contain a huge number of code clones, and it is impractical to consider all these clones for refactoring or tracking. In these circumstances, it is essential to identify code clones that can be considered particularly important for refactoring and tracking. However, no existing study has investigated this matter. We conduct our research emphasizing this matter, and perform five studies on identifying important clones by analyzing clone evolution history. In our first study we detect evolutionary coupling of code clones by automatically investigating clone evolution history from thousands of commits of software systems downloaded from on-line SVN repositories. By analyzing evolutionary coupling of code clones we identify a particular clone change pattern, Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones the SPCP clones. We rank SPCP clones considering their strength of evolutionary coupling. In our second study we further analyze evolutionary coupling of code clones with an aim to assist clone tracking. The purpose of clone tracking is to identify the co-change (i.e. changing together) candidates of code clones to ensure consistency of changes in the code-base. Our research in the second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling. 
In our third study we perform a deeper analysis of the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of such couplings we separate the SPCP clones into two disjoint subsets. One subset contains the non-cross-boundary SPCP clones, which can be considered important for refactoring; the other contains the cross-boundary SPCP clones, which should be considered important for tracking. In our fourth study we analyze the bug-proneness of different types of SPCP clones in order to identify which type(s) of code clones have a high tendency to experience bug-fixes. Such clone-types can be given high priority for management (refactoring or tracking). In our last study we analyze and compare the late propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. Findings from our last study can help us prioritize clone-types for management on the basis of their tendency to experience late propagation. We also find that late propagation can be considerably reduced by managing the SPCP clones. On the basis of our studies we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks these clones considering their evolutionary coupling, bug-proneness, and late propagation tendencies. We believe that our research findings have the potential to assist clone management by pin-pointing the important clones to be managed, and thus considerably reducing clone management effort.
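The abstract above ranks co-change candidates by the strength of their evolutionary coupling but does not define the measure. As a rough illustration only, the sketch below assumes the simplest possible measure: the number of commits in which two clone fragments changed together. The function name `rank_co_change_candidates` and the fragment identifiers are hypothetical and are not taken from AMIC.

```python
from collections import Counter
from itertools import combinations


def rank_co_change_candidates(commits):
    """Rank pairs of clone fragments by how often they changed together.

    `commits` is an iterable of collections, each holding the identifiers
    of the clone fragments modified in one commit. Coupling strength is
    assumed (for illustration) to be the plain co-change count.
    """
    pair_counts = Counter()
    for changed in commits:
        # Sort so that each unordered pair always gets the same key.
        for pair in combinations(sorted(set(changed)), 2):
            pair_counts[pair] += 1
    # Strongest coupling first; ties broken by fragment identifiers.
    return sorted(pair_counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    history = [{"A", "B"}, {"A", "B", "C"}, {"C"}]
    for pair, count in rank_co_change_candidates(history):
        print(pair, count)
```

A real analysis would likely normalize this count (for example by how often each fragment changes on its own), which this sketch deliberately omits.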

    Leveraging Software Clones for Software Comprehension: Techniques and Practice

    SUMMARY This thesis centers on two aspects of software clone analysis: detection and application. In detection, the main contribution is a new clone detector built with the mtreelib library, which was itself developed expressly for this work. The library implements a general metric tree, a data structure specialized in partitioning metric spaces in order to speed up certain common queries, such as range queries or nearest-neighbor queries. This structure is used to build a clone detector that approximates the Levenshtein distance with high accuracy. A brief evaluation is presented to support this accuracy. Other relevant results on metrics and incremental clone detection are also presented. Several applications of the new clone detector are presented. First, an original algorithm for reconstructing information lost in version control systems is proposed and tested on several large systems. Next, a qualitative and quantitative evaluation of Firefox is carried out on the basis of a nearest-neighbor analysis; the resulting curves are used to highlight the difficulties of moving from a slow to a rapid development cycle. Then, two industrial experiences of using and deploying a clone detection technology are presented. These two experiences concern the languages C/C++, Java and TTCN-3. The large difference in clone population between C/C++ and Java on the one hand and TTCN-3 on the other is presented. Finally, a result obtained by combining a clone analysis with a security flow analysis highlights the usefulness of clones in identifying security vulnerabilities. The work ends with a conclusion and some future perspectives.
    ABSTRACT This thesis explores two topics in clone analysis: detection and application. The main contribution in clone detection is a new clone detector based on a library called mtreelib. This library is a package developed for clone detection that implements the metric tree data structure. This structure is used to build a clone detector that approximates the Levenshtein distance with high accuracy. A small benchmark is produced to assess the accuracy. Other results regarding metrics and incremental clone detection are also presented. Many applications of the clone detector are introduced. An original algorithm to reconstruct missing information in the structure of software repositories is described and tested with data sourced from large existing software. An analysis of Firefox is presented, showing the quantity of change between versions and the link between different release cycle types and the number of bugs. Also, an analysis crossing the results from pattern traversal, flow analysis and clone detection is presented. Two industrial experiments using a different clone detector, CLAN, are also presented with some developers' perspectives. One of the experiments is done on a language never before explored in clone detection, TTCN-3, and the results show that the clone population in that language differs greatly from that of other well-known languages, like C/C++ and Java. The thesis concludes with a summary of the findings and some perspectives for future research.
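To make the idea of nearest-neighbor clone queries under the Levenshtein distance concrete, here is a minimal sketch. It is not mtreelib and makes two simplifying assumptions: it computes the exact distance rather than an approximation, and it answers the query by brute-force linear scan, where the thesis uses a metric tree to speed the search up. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Exact edit distance, using the classic two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def nearest_neighbor(query: str, fragments: list[str]) -> tuple[str, int]:
    """Return the fragment closest to `query` and its distance (linear scan)."""
    return min(((f, levenshtein(query, f)) for f in fragments),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    corpus = ["int y = 0;", "return x;"]
    print(nearest_neighbor("int x = 0;", corpus))
```

Because the Levenshtein distance satisfies the triangle inequality, an index such as a metric tree can prune candidates that the linear scan above has to visit one by one.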