7 research outputs found

    Identifying Similar Pages in Web Applications using a Competitive Clustering Algorithm

    Get PDF
    We present an approach based on Winner Takes All (WTA), a competitive clustering algorithm, to support the comprehension of static and dynamic Web applications during Web application reengineering. This approach adopts a process that first computes the distance between Web pages and then identifies and groups similar pages using the considered clustering algorithm. We present an instance of application of the clustering process to identify similar pages at the structural level. The page structure is encoded into a string of HTML tags and then the distance between Web pages at the structural level is computed using the Levenshtein string edit distance algorithm. A prototype to automate the clustering process has been implemented that can be extended to other instances of the process, such as the identification of groups of similar pages at content level. The approach and the tool have been evaluated in two case studies. The results have shown that the WTA clustering algorithm suggests heuristics to easily identify the best partition of Web pages into clusters among the possible partitions

    Identifying Cloned Navigational Patterns in Web Applications

    Get PDF
    Web Applications are subject to continuous and rapid evolution. Often programmers indiscriminately duplicate Web pages without considering systematic development and maintenance methods. This practice creates code clones that make Web Applications hard to maintain and reuse. We present an approach to identify duplicated functionalities in Web Applications through cloned navigational pattern analysis. Cloned patterns can be generalized in a reengineering process, thus to simplify the structure and future maintenance of the Web Applications. The proposed method first identifies pairs of cloned pages by analyzing similarity at structure, content, and scripting code. Two pages are considered clones if their similarity is greater than a given threshold. Cloned pages are then grouped into clusters and the links connecting pages of two clusters are grouped too. An interconnection metric has been defined on the links between two clusters to express the effort required to reengineer them as well as to select the patterns of interest. To further reduce the comprehension effort, we filter out links and nodes of the clustered navigational schema that do not contribute to the identification of cloned navigational patterns. A tool supporting the proposed approach has been developed and validated in a case study

    Coherent Dependence Cluster

    Get PDF
    This thesis introduces coherent dependence clusters and shows their relevance in areas of software engineering such as program comprehension and mainte- nance. All statements in a coherent dependence cluster depend upon the same set of statements and affect the same set of statements; a coherent cluster’s statements have ‘coherent’ shared backward and forward dependence. We introduce an approximation to efficiently locate coherent clusters and show that its precision significantly improves over previous approximations. Our empirical study also finds that, despite their tight coherence constraints, coherent dependence clusters are to be found in abundance in production code. Studying patterns of clustering in several open-source and industrial programs reveal that most contain multiple significant coherent clusters. A series of case studies reveal that large clusters map to logical functionality and pro- gram structure. Cluster visualisation also reveals subtle deficiencies of program structure and identify potential candidates for refactoring efforts. Supplemen- tary studies of inter-cluster dependence is presented where identification of coherent clusters can help in deriving hierarchical system decomposition for reverse engineering purposes. Furthermore, studies of program faults find no link between existence of coherent clusters and software bugs. Rather, a longi- tudinal study of several systems find that coherent clusters represent the core architecture of programs during system evolution. Due to the inherent conservativeness of static analysis, it is possible for unreachable code and code implementing cross-cutting concerns such as error- handling and debugging to link clusters together. This thesis studies their effect on dependence clusters by using coverage information to remove unexecuted and rarely executed code. Empirical evaluation reveals that code reduction yields smaller slices and clusters

    An Empirical Study on Keyword-based Web Site Clustering

    No full text
    Web site evolution is characterized by a limited support to the understanding activities offered to the developers. In fact, design diagrams are often missing or outdated. A potentially interesting option is to reverse engineer high level views of Web sites from the content of the Web pages. Clustering is a valuable technique that can be used in this respect. Web pages can be clustered together based on the similarity of summary information about their content, represented as a list of automatically extracted keywords. This paper presents an empirical study that was conducted to determine the meaningfulness for Web developers of clusters automatically produced from the analysis of the Web page content. Natural Language Processing (NLP) plays a central role in content analysis and keyword extraction. Thus, a second objective of the study was to assess the contribution of some shallow NLP techniques to the clustering tas

    An empirical study on keyword-based web site clustering

    No full text
    corecore