
    Automatic summarization of Malayalam documents using clause identification method

    Text summarization is an active research area in natural language processing. The huge amount of information on the internet necessitates the development of automatic summarization systems. There are two types of summarization techniques: extractive and abstractive. Extractive summarization selects important sentences from the text and reproduces them in the summary exactly as they appear in the original document. Abstractive summarization systems instead produce a summary of the input text similar to one written by a human, which requires semantic analysis of the text. Limited work has been carried out on abstractive summarization for Indian languages, especially Malayalam, for which only extractive methods have been proposed so far. In this paper, an abstractive summarization system for Malayalam documents based on a clause identification method is proposed. As part of this research work, a POS tagger and a morphological analyzer for Malayalam words in the cricket domain are also developed. Clauses in the input sentences are identified using a modified clause identification algorithm and then semantically analyzed to extract semantic triples (subject, object and predicate). The score of each clause is calculated using feature extraction, and the important clauses to be included in the summary are selected based on this score. Finally, an algorithm generates sentences from the semantic triples of the selected clauses, yielding the abstractive summary of the input documents.
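
    As a rough illustration of the pipeline described above, the hypothetical sketch below reduces a clause to a (subject, predicate, object) triple and scores it with simple surface features; the class, feature names and weights are assumptions for illustration, not the paper's actual algorithm.

        # Hypothetical sketch only: clause triples and a simple feature-based score.
        from dataclasses import dataclass

        @dataclass
        class ClauseTriple:
            subject: str
            predicate: str
            obj: str

        def score_clause(triple: ClauseTriple, keywords: set[str],
                         position: int, total_clauses: int) -> float:
            """Combine two illustrative features: keyword overlap and clause position."""
            tokens = set(f"{triple.subject} {triple.predicate} {triple.obj}".lower().split())
            keyword_score = len(tokens & keywords) / max(len(tokens), 1)
            position_score = 1.0 - position / max(total_clauses, 1)  # earlier clauses score higher
            return 0.7 * keyword_score + 0.3 * position_score        # assumed weights

        triple = ClauseTriple("Kohli", "scored", "a century")
        print(round(score_clause(triple, {"century", "kohli"}, 0, 5), 2))  # 0.65

    Clauses whose scores exceed a threshold would then be passed on to the sentence-generation step.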

    Survey of Spell Checking Techniques for Malayalam: NLP

    Spell checking is a well-known task in natural language processing. Nowadays, spell checkers are an important component of many software applications such as web browsers and word processors. Spelling error detection and correction is the process of checking the spelling of the words in a document and, when an error is found, listing the correct spellings as suggestions. This survey paper covers different spelling error detection and correction techniques across various languages.
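
    As one concrete example of the correction techniques such surveys typically cover, the sketch below ranks dictionary words by Levenshtein (edit) distance from a misspelt word; the toy dictionary and distance threshold are illustrative assumptions, not taken from the paper.

        # Minimal sketch of edit-distance-based spelling suggestion.
        def edit_distance(a: str, b: str) -> int:
            """Classic dynamic-programming Levenshtein distance."""
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                cur = [i]
                for j, cb in enumerate(b, 1):
                    cur.append(min(prev[j] + 1,                 # deletion
                                   cur[j - 1] + 1,              # insertion
                                   prev[j - 1] + (ca != cb)))   # substitution
                prev = cur
            return prev[-1]

        def suggest(word: str, dictionary: list[str], max_distance: int = 2) -> list[str]:
            """Return dictionary words within max_distance edits, nearest first."""
            candidates = [(edit_distance(word, w), w) for w in dictionary]
            return [w for d, w in sorted(candidates) if d <= max_distance]

        print(suggest("langauge", ["language", "luggage", "languish"]))  # ['language']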

    Exploration of Corpus Augmentation Approach for English-Hindi Bidirectional Statistical Machine Translation System

    Even though a lot of Statistical Machine Translation (SMT) research has been carried out for the English-Hindi language pair, no effort has been made to standardize the dataset. Each piece of research uses a different dataset, different parameters and a different number of sentences during the various phases of translation, resulting in varied translation output. This makes it tedious to compare these models, understand their results, gain insight into corpus behavior, and reproduce the reported results, and it highlights the need to standardize the dataset and identify common parameters for model development. The main contribution of this paper is an approach to standardize the dataset and to identify the combination of parameters that gives the best performance. It also investigates a novel corpus augmentation approach to improve the translation quality of an English-Hindi bidirectional statistical machine translation system. This approach works well in scarce-resource settings without incorporating any external parallel corpus for the underlying languages. The experiments are carried out using the open-source phrase-based toolkit Moses, with the Indian Languages Corpora Initiative (ILCI) Hindi-English tourism corpus. Even with a limited dataset, considerable improvement is achieved using the corpus augmentation approach for the English-Hindi bidirectional SMT system.
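
    The abstract does not detail the augmentation method itself; purely as an illustration of growing a parallel corpus without external data, the hypothetical sketch below adds segment-level pairs whenever an aligned sentence pair splits into the same number of comma-separated segments. It should not be read as the paper's actual technique.

        # Hypothetical corpus augmentation sketch: add segment-level pairs when segmentation agrees.
        def augment_parallel_corpus(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
            """Return the original pairs plus aligned sub-segment pairs."""
            augmented = list(pairs)
            for src, tgt in pairs:
                src_parts = [p.strip() for p in src.split(",")]
                tgt_parts = [p.strip() for p in tgt.split(",")]
                if len(src_parts) == len(tgt_parts) > 1:
                    augmented.extend(zip(src_parts, tgt_parts))
            return augmented

        corpus = [("The fort is old, and the view is beautiful.",
                   "किला पुराना है, और दृश्य सुंदर है।")]
        print(len(augment_parallel_corpus(corpus)))  # 3: the original pair plus two segment pairs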

    Development and Design of Deep Learning-based Parts-of-Speech Tagging System for Azerbaijani language

    Parts-of-Speech (POS) tagging, also referred to as word-class disambiguation, is a prerequisite technique used in the pre-processing stage of most natural language processing (NLP) pipelines. As a preliminary step, NLP software such as chatbots, translation engines and voice recognition systems assigns a part of speech to each word in the input in order to identify its grammatical category and thus more easily resolve its meaning. This thesis presents a novel approach to disambiguating word context for the Azerbaijani language by training a deep learning-based automatic POS tagger on a clean, manually annotated dataset. Azerbaijani is a member of the Turkic family and is an agglutinative language. Unlike for many other languages, recent studies of POS taggers for Azerbaijani have not delivered state-of-the-art accuracy. This thesis therefore investigates how deep learning architectures such as simple recurrent neural networks (RNN), long short-term memory (LSTM), bi-directional long short-term memory (Bi-LSTM), and gated recurrent units (GRU) can be used to enhance POS tagging for Azerbaijani.
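
    As a rough sketch of one of the architectures named above, the following minimal PyTorch Bi-LSTM tagger produces a tag score for every token; the vocabulary and tag-set sizes are placeholder assumptions, not the thesis's actual configuration.

        # Minimal Bi-LSTM sequence-tagging sketch (hypothetical sizes, not the thesis's model).
        import torch
        import torch.nn as nn

        class BiLSTMTagger(nn.Module):
            def __init__(self, vocab_size: int, num_tags: int, emb_dim: int = 128, hidden: int = 64):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
                self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
                self.out = nn.Linear(2 * hidden, num_tags)

            def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
                emb = self.embed(token_ids)          # (batch, seq_len, emb_dim)
                hidden_states, _ = self.lstm(emb)    # (batch, seq_len, 2 * hidden)
                return self.out(hidden_states)       # per-token tag scores

        model = BiLSTMTagger(vocab_size=20000, num_tags=17)      # assumed vocabulary and tag set
        scores = model(torch.randint(1, 20000, (2, 12)))          # two dummy sentences of 12 tokens
        print(scores.shape)                                       # torch.Size([2, 12, 17])

    Swapping nn.LSTM for nn.RNN or nn.GRU yields the other recurrent variants mentioned in the abstract.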

    Proceedings of the Conference on Natural Language Processing 2010

    This book contains state-of-the-art contributions to the 10th Conference on Natural Language Processing, KONVENS 2010 (Konferenz zur Verarbeitung natürlicher Sprache), with a focus on semantic processing. KONVENS in general aims to offer a broad perspective on current research and developments within the interdisciplinary field of natural language processing. The central theme draws specific attention to linguistic aspects of meaning, covering deep as well as shallow approaches to semantic processing. The contributions address both knowledge-based and data-driven methods for modelling and acquiring semantic information, and discuss the role of semantic information in applications of language technology. The articles demonstrate the importance of semantic processing and present novel and creative approaches to natural language processing in general. Some contributions focus on developing and improving NLP systems for tasks such as Named Entity Recognition or Word Sense Disambiguation, on semantic knowledge acquisition and exploitation with respect to collaboratively built resources, or on harvesting semantic information in virtual games. Others are set within the context of real-world applications, such as Authoring Aids, Text Summarisation and Information Retrieval. The collection highlights the importance of semantic processing for different areas and applications in Natural Language Processing, and provides the reader with an overview of current research in this field.