    A Dynamic Approach To The Website Boundary Detection Problem Using Random Walks

    Abstract: This paper investigates the Website Boundary Detection (WBD) problem in the dynamic context. In the dynamic context (as opposed to the static context), the web data to be considered is not fully available before the website boundary detection process starts. The dynamic approaches presented in this paper are all probabilistic and based on the concept of random walks; three variations are considered: (i) the standard Random Walk (RW), (ii) a Self-Avoiding RW, and (iii) the Metropolis-Hastings RW. The reported evaluation demonstrates that the proposed technique produces good WBD solutions while reducing the number of "noise" pages visited. The best-performing variation was found to be the Metropolis-Hastings RW.
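
    The three walk variations named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so everything below is an assumption: the link graph is a plain undirected adjacency dictionary, the Self-Avoiding RW falls back to an ordinary step when every neighbour has already been visited, and the Metropolis-Hastings RW uses the textbook degree-ratio acceptance rule that targets a uniform distribution over pages. The names `simple_step`, `self_avoiding_step`, `metropolis_hastings_step`, `walk`, and `PATH_GRAPH` are hypothetical and chosen for illustration only.

```python
import random


def simple_step(graph, current, visited, rng):
    """Standard RW: move to a uniformly chosen neighbour."""
    return rng.choice(graph[current])


def self_avoiding_step(graph, current, visited, rng):
    """Self-Avoiding RW: prefer neighbours not yet visited.

    Assumption: when all neighbours are visited, take an ordinary step.
    """
    fresh = [n for n in graph[current] if n not in visited]
    return rng.choice(fresh if fresh else graph[current])


def metropolis_hastings_step(graph, current, visited, rng):
    """Metropolis-Hastings RW with a uniform target distribution.

    Propose a uniform neighbour and accept it with probability
    min(1, deg(current) / deg(candidate)); otherwise stay put.
    """
    candidate = rng.choice(graph[current])
    accept = min(1.0, len(graph[current]) / len(graph[candidate]))
    return candidate if rng.random() < accept else current


def walk(graph, start, steps, step_fn, seed=0):
    """Run `steps` moves and return pages in order of first visit."""
    rng = random.Random(seed)
    current = start
    visited = {start}
    order = [start]
    for _ in range(steps):
        current = step_fn(graph, current, visited, rng)
        if current not in visited:
            visited.add(current)
            order.append(current)
    return order


# Hypothetical three-page site whose links form a path: a - b - c.
PATH_GRAPH = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

if __name__ == "__main__":
    for fn in (simple_step, self_avoiding_step, metropolis_hastings_step):
        print(fn.__name__, walk(PATH_GRAPH, "a", 10, fn))
```

    In a dynamic setting the adjacency dictionary would not exist up front; a crawler would fetch each page's links on demand when the walk reaches it. The sketch keeps the graph in memory only to make the three step rules easy to compare.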