Web Applications are subject to continuous and rapid evolution. Often programmers indiscriminately
duplicate Web pages without considering systematic development and maintenance methods. This practice
creates code clones that make Web Applications hard to maintain and reuse. We present an approach to
identify duplicated functionalities in Web Applications through cloned navigational pattern analysis.
Cloned patterns can be generalized in a reengineering process, thereby simplifying the structure and future
maintenance of the Web Applications. The proposed method first identifies pairs of cloned pages by
analyzing similarity at the structure, content, and scripting-code levels. Two pages are considered clones if their
similarity is greater than a given threshold. Cloned pages are then grouped into clusters and the links
connecting pages of two clusters are grouped too. An interconnection metric has been defined on the links
between two clusters to express the effort required to reengineer them as well as to select the patterns of
interest. To further reduce the comprehension effort, we filter out links and nodes of the clustered
navigational schema that do not contribute to the identification of cloned navigational patterns. A tool
supporting the proposed approach has been developed and validated in a case study.
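
The pipeline summarized above (thresholded clone pairs, clusters of cloned pages, grouped inter-cluster links) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `similarity` function, the threshold value, and the toy pages are hypothetical; clusters are taken to be connected components of the clone-pair graph, which may differ from the clustering actually used; and a plain link count stands in for the interconnection metric, whose definition is not given in this abstract.

```python
from collections import defaultdict
from itertools import combinations


def cluster_clones(pages, similarity, threshold):
    """Map each page to a cluster representative.

    Assumption: two pages share a cluster when they are connected through
    a chain of clone pairs (union-find over the clone-pair graph).
    """
    parent = {p: p for p in pages}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(pages, 2):
        # Pages are clones if their similarity exceeds the threshold.
        if similarity(a, b) > threshold:
            parent[find(a)] = find(b)

    return {p: find(p) for p in pages}


def group_links(links, cluster_of):
    """Count page-level links between each ordered pair of distinct clusters.

    Assumption: links inside a single cluster are ignored, and the raw
    count is only a stand-in for the interconnection metric.
    """
    grouped = defaultdict(int)
    for src, dst in links:
        cs, cd = cluster_of[src], cluster_of[dst]
        if cs != cd:
            grouped[(cs, cd)] += 1
    return dict(grouped)


# Toy data: pages whose names start with the same letter are "similar".
pages = ["a1", "a2", "b1", "b2", "c"]
links = [("a1", "b1"), ("a2", "b2"), ("a1", "c")]


def toy_similarity(p, q):
    return 0.9 if p[0] == q[0] else 0.1


cluster_of = cluster_clones(pages, toy_similarity, threshold=0.8)
grouped = group_links(links, cluster_of)
```

In this toy example the two links a1 -> b1 and a2 -> b2 collapse into a single grouped link between the "a" and "b" clusters, which is the kind of repeated link that would suggest a cloned navigational pattern.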