Helmholtz Principle-Based Keyword Extraction

Author: Pradhan Anima
Publication venue
Publication date: 01/01/2013

In today’s world of evolving technology, everybody wishes to accomplish tasks in the least time. As the information available online grows every day, it becomes very difficult to summarize more than 100 documents in acceptable time. Thus, “text summarization” is a challenging problem in the area of Natural Language Processing (NLP), especially in the context of global languages. In this thesis, we survey the taxonomy of text summarization from different aspects and briefly explain the different approaches to summarization and the evaluation parameters. Also presented are thorough details and facts about more than fifty automatic text summarization systems, to ease the job of researchers and to serve as a short encyclopedia of the investigated systems. Keyword extraction methods play a vital role in text mining and document processing. Keywords represent the essential content of a document, and text mining applications take advantage of keywords when processing documents. A quality keyword is a word that represents the exact content of the text succinctly. It is very difficult to process a large number of documents to get high-quality keywords in acceptable time. This thesis compares the most popular keyword extraction method, tf-idf, with the proposed method, which is based on the Helmholtz Principle. The Helmholtz Principle is based on ideas from image processing and is derived from the Gestalt theory of human perception. We also investigate the run time both methods need to extract the keywords. Experimental results show that the keyword extraction method based on the Helmholtz Principle outperforms tf-idf.
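To make the comparison in the abstract concrete, here is a minimal Python sketch of the two kinds of scoring. It is an illustration under stated assumptions, not the thesis's implementation: the function names `tf_idf_scores` and `helmholtz_meaning` are hypothetical, documents are assumed to be pre-tokenized lists of words of roughly similar length, and the Helmholtz score assumes the "number of false alarms" formulation commonly used for Helmholtz-principle keyword extraction, NFA = C(K, m) / N^(m-1), which the abstract itself does not spell out.

```python
import math
from collections import Counter


def tf_idf_scores(docs):
    """Return one {word: tf-idf score} dict per document.

    docs is a list of token lists. tf is the relative frequency of the
    word in the document; idf is log(N / df), with df the number of
    documents that contain the word.
    """
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            w: (count / len(doc)) * math.log(n_docs / df[w])
            for w, count in tf.items()
        })
    return scores


def helmholtz_meaning(docs):
    """Return one {word: meaning score} dict per document.

    Assumed formulation (not taken from the abstract):
        NFA = C(K, m) / N**(m - 1)
        meaning = -(1 / m) * log(NFA)
    where m is the word count in the document, K its count in the
    whole corpus, and N the number of documents. A positive score
    means NFA < 1, i.e. the word is unexpectedly concentrated here.
    """
    n_docs = len(docs)
    corpus_counts = Counter(word for doc in docs for word in doc)
    scores = []
    for doc in docs:
        doc_scores = {}
        for w, m in Counter(doc).items():
            log_nfa = (math.log(math.comb(corpus_counts[w], m))
                       - (m - 1) * math.log(n_docs))
            doc_scores[w] = -log_nfa / m
        scores.append(doc_scores)
    return scores


# Toy corpus, invented purely for illustration.
docs = [
    ["helmholtz", "helmholtz", "helmholtz", "the"],
    ["the", "image"],
    ["the", "text"],
]
tfidf = tf_idf_scores(docs)
meaning = helmholtz_meaning(docs)
```

On this toy corpus both scores rank "helmholtz" above "the" in the first document: "the" occurs in every document, so its idf is zero, and its occurrences are spread evenly, so its assumed NFA is not below 1.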

ethesis@nitr