5 research outputs found
Methods of Text Document Summarization
This thesis deals with single-document summarization of text data. Part of it is devoted to data preparation, mainly normalization: it presents several stemming algorithms and also describes lemmatization. The main part is devoted to Luhn's summarization method and its extension using the WordNet dictionary. The Oswald summarization method is described and implemented as well. The designed and implemented application automatically generates abstracts using these methods. A set of experiments was carried out, which verified the correct functionality of the application and of the extension of Luhn's summarization method.
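For readers unfamiliar with Luhn's method, the following is a minimal sketch of its basic idea and not the thesis's implementation. It assumes a tiny hand-written stopword list, a frequency threshold of two occurrences for "significant" words, and a maximum gap of four non-significant words inside a cluster; all function names and parameters are hypothetical. Stemming, lemmatization and the WordNet extension mentioned in the abstract are left out.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one used in the thesis.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on", "for", "are"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", sentence.lower())


def significant_words(sentences, min_count=2):
    """Words that are not stopwords and occur at least min_count times in the text."""
    counts = Counter(w for s in sentences for w in tokenize(s) if w not in STOPWORDS)
    return {w for w, c in counts.items() if c >= min_count}


def luhn_score(tokens, significant, max_gap=4):
    """Score one sentence as the best cluster value.

    A cluster is a run of tokens bracketed by significant words in which
    consecutive significant words are at most max_gap tokens apart.
    Cluster value = (significant words in cluster)^2 / cluster length.
    """
    positions = [i for i, w in enumerate(tokens) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1  # still inside the current cluster
        else:
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1  # open a new cluster
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    significant = significant_words(sentences)
    scored = [(luhn_score(tokenize(s), significant), i) for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:n_sentences]
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]


if __name__ == "__main__":
    sample = ("Cats chase mice. The weather was pleasant today. "
              "Cats and mice rarely become friends.")
    print(summarize(sample, n_sentences=2))
```

Squaring the number of significant words rewards dense clusters, while dividing by the cluster length penalizes clusters padded with non-significant words; ties are broken here in favour of the earlier sentence, which is a choice made for this sketch only.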
A text mining approach for Arabic question answering systems
As most of the electronic information available nowadays on the web is stored as text, developing Question Answering systems (QAS) has been the focus of many individual researchers and organizations. Relatively few studies have been produced for extracting answers to “why” and “how to” questions. One reason for this neglect is that, when going beyond sentence boundaries, deriving text structure is a very time-consuming and complex process. This thesis explores a new strategy for dealing with the exponentially large space issue associated with the text derivation task. To our knowledge, to date there are no systems that have attempted to address this type of question for the Arabic language. We have proposed two analytical models. The first is the Pattern Recognizer, which employs a set of approximately 900 linguistic patterns targeting relationships that hold within sentences. This model is enhanced with three independent algorithms to discover the causal/explanatory role indicated by the justification particles. The second model is the Text Parser, which approaches text from a discourse perspective in the framework of Rhetorical Structure Theory (RST). This model is meant to break away from the sentence limit. The Text Parser model is built on top of the output produced by the Pattern Recognizer and incorporates a set of heuristic scores to produce the most suitable structure representing the whole text. The two models are combined to allow for the development of an Arabic QAS that deals with “why” and “how to” questions. The Pattern Recognizer model achieved an overall recall of 81% and a precision of 78%. The question answering system, in turn, was able to find the correct answer for 68% of the test questions. Our results reveal that the justification particles play a key role in indicating intrasentential relations