8 research outputs found

    A Review on Extraction and Recommendation of Educational Resources from WWW

    Get PDF
    Keyphrases provide a simple way of describing a document, giving the reader some clues about its content. Wrapper adaptation aims at automatically adapting a previously learned wrapper from a source Web site to a new, unseen site for information extraction. It is based on a generative model for the generation of text fragments related to attribute items and formatting data in a Web page. To solve the wrapper adaptation problem, we consider two kinds of information from the source Web site. The first kind is the extraction knowledge contained in the wrapper previously learned from the source site. The second kind is the items previously extracted or collected. A Bayesian learning approach is used to automatically select a set of training examples for adapting a wrapper to the new, unseen site. To solve the new attribute discovery problem, we develop a model that analyzes the text fragments surrounding the attributes in the new, unseen site. A Bayesian learning method is developed to discover the new attributes and their headers. We conducted extensive experiments on a number of real-world Web sites to demonstrate the effectiveness of our framework. Keyphrases can be helpful in various applications such as retrieval engines, browsing interfaces, thesaurus construction, and text mining, and there are other tasks for which keyphrases are useful as well.
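The abstract's idea of reusing previously extracted items to select training examples on a new site can be sketched roughly as follows. This is only an illustrative simplification, not the paper's actual model: it scores candidate text fragments on the unseen site with a naive smoothed token likelihood built from items extracted on the source site, and all names and data here are invented for the example.

```python
from collections import Counter
import math

def build_token_model(extracted_items):
    """Token counts from items previously extracted on the source site."""
    counts = Counter()
    for item in extracted_items:
        counts.update(item.lower().split())
    return counts

def score_fragment(fragment, model, vocab_size=10000):
    """Log-likelihood of a fragment under the token model (add-one smoothing).

    vocab_size is an assumed constant standing in for a real vocabulary estimate.
    """
    total = sum(model.values())
    score = 0.0
    for tok in fragment.lower().split():
        score += math.log((model[tok] + 1) / (total + vocab_size))
    return score

# Hypothetical data: items extracted from the source site, and candidate
# fragments scraped from the new, unseen site.
source_items = ["Canon EOS 5D digital camera", "Nikon D70 digital camera"]
model = build_token_model(source_items)

candidates = ["Sony Alpha digital camera", "Contact us for shipping info"]
best = max(candidates, key=lambda f: score_fragment(f, model))
```

The highest-scoring fragment would then serve as a training example for adapting the wrapper; the paper's approach is Bayesian over richer extraction knowledge, not just bag-of-words likelihoods.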

    Automatic Keyphrase Extraction

    Get PDF
    The increasing number of documents on the Web has created a growing need for tools that support automatic search and classification of texts. Keywords are one of the characteristic features of documents that may be used as criteria in automatic document management. In this paper we describe a technique for automatic keyphrase extraction based on the KEA algorithm [1]. The main modifications consist of changes to the stemming method and a simplification of the discretization technique. In addition, in the presented algorithm the keyphrase list may contain proper names, and the candidate phrase list may contain number sequences. We describe experiments, performed on a set of English-language documents available on the Internet, that allow for optimization of the extraction parameters. A comparison of the algorithm's efficiency with the KEA technique is presented.
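KEA's core idea is to generate candidate phrases and score them with features such as TF×IDF and the position of first occurrence. The sketch below is a loose simplification under stated assumptions: real KEA trains a Naive Bayes classifier over discretized features, whereas this version just multiplies TF×IDF by a first-occurrence weight, and the example documents are invented.

```python
import math
import re
from collections import Counter

def candidates(text, max_len=3):
    """All word n-grams up to max_len words, lowercased (crude candidate set)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrases.append(" ".join(words[i:i + n]))
    return phrases

def rank_keyphrases(doc, corpus, top=3):
    """Rank candidate phrases by TF*IDF weighted by first-occurrence position."""
    phrases = candidates(doc)
    doc_freq = Counter()
    for other in corpus:
        doc_freq.update(set(candidates(other)))
    tf = Counter(phrases)
    n_docs = len(corpus) + 1
    scores = {}
    for p in set(phrases):
        idf = math.log(n_docs / (1 + doc_freq[p]))
        # Earlier first occurrence -> higher weight (a KEA-inspired heuristic).
        first_pos = doc.lower().find(p.split()[0]) / max(len(doc), 1)
        scores[p] = tf[p] * idf * (1 - first_pos)
    return [p for p, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top]]

top_phrases = rank_keyphrases(
    "keyphrase extraction helps search. keyphrase extraction uses features.",
    ["search engines index documents", "features of documents"])
```

Discretization, stemming, and the Naive Bayes combination of features, which the paper modifies, are exactly the parts omitted here.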

    A Comparative Study of the Effect of Word Segmentation On Chinese Terminology Extraction

    Get PDF
    PACLIC 20 / Wuhan, China / 1-3 November, 2006

    Efficient Methods for Multigram Compound Discovery

    Get PDF

    The Methodology of the History of Ideas and China Studies (觀念史方法與中國研究)

    Get PDF

    A generic Chinese PAT tree data structure for Chinese documents clustering.

    Get PDF
    Kwok Chi Leong. Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. Includes bibliographical references (leaves 122-127). Abstracts in English and Chinese.
    Table of Contents:
    Chapter 1: Introduction
        1.1 Contributions
        1.2 Thesis Overview
    Chapter 2: Background Information
        2.1 Documents Clustering
            2.1.1 Review of Clustering Techniques
            2.1.2 Suffix Tree Clustering
        2.2 Chinese Information Processing
            2.2.1 Sentence Segmentation
            2.2.2 Keyword Extraction
    Chapter 3: The Generic Chinese PAT Tree
        3.1 PAT Tree
            3.1.1 Patricia Tree
            3.1.2 Semi-Infinite String
            3.1.3 Structure of Tree Nodes
            3.1.4 Some Examples of PAT Tree
            3.1.5 Storage Complexity
        3.2 The Chinese PAT Tree
            3.2.1 The Chinese PAT Tree Structure
            3.2.2 Some Examples of Chinese PAT Tree
            3.2.3 Storage Complexity
        3.3 The Generic Chinese PAT Tree
            3.3.1 Structure Overview
            3.3.2 Structure of Tree Nodes
            3.3.3 Essential Node
            3.3.4 Some Examples of the Generic Chinese PAT Tree
            3.3.5 Storage Complexity
        3.4 Problems of Embedded Nodes
            3.4.1 The Reduced Structure
            3.4.2 Disadvantages of Reduced Structure
            3.4.3 A Case Study of Reduced Design
            3.4.4 Experiments on Frequency Mismatch
        3.5 Strengths of the Generic Chinese PAT Tree
    Chapter 4: Performance Analysis on the Generic Chinese PAT Tree
        4.1 The Construction of the Generic Chinese PAT Tree
        4.2 Counting the Essential Nodes
        4.3 Performance of Various PAT Trees
        4.4 The Implementation Analysis
            4.4.1 Pure Dynamic Memory Allocation
            4.4.2 Node Production Factory Approach
            4.4.3 Experiment Result of the Factory Approach
    Chapter 5: The Chinese Documents Clustering
        5.1 The Clustering Framework
            5.1.1 Documents Cleaning
            5.1.2 PAT Tree Construction
            5.1.3 Essential Node Extraction
            5.1.4 Base Clusters Detection
            5.1.5 Base Clusters Filtering
            5.1.6 Base Clusters Combining
            5.1.7 Documents Assigning
            5.1.8 Result Presentation
        5.2 Discussion
            5.2.1 Flexibility of Our Framework
            5.2.2 Our Clustering Model
            5.2.3 More About Clusters Detection
            5.2.4 Analysis and Complexity
    Chapter 6: Evaluations on the Chinese Documents Clustering
        6.1 Details of Experiment
            6.1.1 Parameter of Weighted Frequency
            6.1.2 Effect of CLP Analysis
            6.1.3 Result of Clustering
        6.2 Clustering on Larger Collection
            6.2.1 Comparing the Base Clusters
            6.2.2 Result of Clustering
            6.2.3 Discussion
        6.3 Clustering with Part of Documents
            6.3.1 Clustering with News Headlines
            6.3.2 Clustering with News Abstract
    Chapter 7: Conclusion
    Bibliography
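The PAT tree indexes every "semi-infinite string" (suffix) of a text in a Patricia trie, so that the frequency of any character substring can be looked up, which is what makes it useful for finding frequent character sequences in unsegmented Chinese text. A naive dictionary-based sketch of the same query, without the tree's space and lookup efficiency, might look like this (the example string is invented):

```python
from collections import Counter

def substring_counts(text, max_len=4):
    """Frequency of every substring up to max_len characters.

    Illustrates the counts a Chinese PAT tree would answer; a real PAT tree
    stores suffixes compactly in a Patricia trie instead of enumerating
    every substring as done here.
    """
    counts = Counter()
    for i in range(len(text)):
        for n in range(1, max_len + 1):
            if i + n <= len(text):
                counts[text[i:i + n]] += 1
    return counts

# Hypothetical unsegmented text: "香港" occurs twice, so it surfaces as a
# frequent character sequence without any word segmentation step.
counts = substring_counts("香港中文大學香港")
```

Frequent substrings recovered this way can then serve as base-cluster labels in the clustering framework the thesis describes; the naive enumeration above is O(n·max_len) in space, which the PAT tree avoids.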