Identifying Cloned Navigational Patterns in Web Applications

Abstract

Web Applications are subject to continuous and rapid evolution. Often programmers indiscriminately duplicate Web pages without considering systematic development and maintenance methods. This practice creates code clones that make Web Applications hard to maintain and reuse. We present an approach to identify duplicated functionalities in Web Applications through cloned navigational pattern analysis. Cloned patterns can be generalized in a reengineering process, thus to simplify the structure and future maintenance of the Web Applications. The proposed method first identifies pairs of cloned pages by analyzing similarity at structure, content, and scripting code. Two pages are considered clones if their similarity is greater than a given threshold. Cloned pages are then grouped into clusters and the links connecting pages of two clusters are grouped too. An interconnection metric has been defined on the links between two clusters to express the effort required to reengineer them as well as to select the patterns of interest. To further reduce the comprehension effort, we filter out links and nodes of the clustered navigational schema that do not contribute to the identification of cloned navigational patterns. A tool supporting the proposed approach has been developed and validated in a case study

    Similar works