2 research outputs found

    A Method for Filtering Pages by Similarity Degree based on Dynamic Programming

    To extract target webpages from a large collection, we propose a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method relies on one of three "same relationships" between two nodes, each of which we define. Its main innovation is that it does not need to know the structure of the webpages in advance. First, we describe a design based on a queue and two threads. Then we present a dynamic programming algorithm for calculating the length of the longest common subsequence and a formula for calculating similarity. To extract detailed-information webpages from 200,000 webpages downloaded from the website "www.jd.com", we choose the Completely Same Relationship (CSR) and set the similarity threshold to 0.2. Among the four filtering methods compared, the Recall Ratio (RR) of MFPSDDP ranks in the middle. When the number of webpages filtered approaches 200,000, the Precision Ratio (PR) of MFPSDDP is the highest of the four, reaching 85.1%, which is 13.3 percentage points higher than the PR of a Method for Filtering Pages by Containing Strings (MFPCS).
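The abstract names a dynamic programming computation of the longest common subsequence (LCS) length and a similarity formula, but gives neither. The following is a minimal sketch of how such a filter might look, not the authors' implementation. It assumes each page is reduced to a sequence of node labels (here, tag names), that plain equality stands in for the paper's "same relationship", and that similarity is the LCS length divided by the longer sequence length. The function names `lcs_length`, `similarity`, and `passes_filter` are hypothetical.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using a rolling-row DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop to save memory
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # ASSUMPTION: equality stands in for the paper's "same relationship".
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """ASSUMPTION: LCS length normalised by the longer sequence; the paper's formula may differ."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # nothing to compare
    return lcs_length(a, b) / longest


def passes_filter(page: Sequence[str], reference: Sequence[str], threshold: float = 0.2) -> bool:
    """Keep a page whose similarity to a reference page reaches the threshold (0.2 in the abstract)."""
    return similarity(page, reference) >= threshold


if __name__ == "__main__":
    reference = ["html", "body", "div", "ul", "li"]
    candidate = ["html", "body", "div", "table", "tr"]
    print(similarity(candidate, reference), passes_filter(candidate, reference))
```

The rolling row keeps memory proportional to the shorter sequence, which matters when the same comparison is repeated across a couple of hundred thousand pages.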

    FilteredWebHtmlDataBase

    This dataset was used in the study reported in my manuscript "A Method for Filtering Pages by Similarity Degree based on Dynamic Programming". The data file is FilteredWebHtmlDataBase.mdf and the log file is FilteredWebHtmlDataBase_log.ldf. The database was created with SQL Server 2017. The download URL for the data file is https://pan.baidu.com/s/1N56EzVhW0M-qPWZQzORGtw. The download URL for the log file is https://pan.baidu.com/s/1qwqD7CNCqjwRs1tgvpuYUQ