    Feature Subset Selection Using Genetic Algorithm for Named Entity Recognition

    Named Entity Recognizer for Telugu language using Hybrid approach

    The main goal of Named Entity Recognition (NER) is to classify all Named Entities (NEs) in a document into predefined classes such as person name, location name, organization name, and miscellaneous. This paper outlines a Named Entity Recognizer that uses a hybrid approach, i.e., a combination of a rule-based approach and a machine learning technique, Conditional Random Fields (CRF). For the rule-based approach, we prepared gazetteer lists of person, location, and organization names, some suffix and prefix features, and a dictionary of 350,266 words to recognize the category of named entities. If ambiguity arises under the rule-based approach, we use the machine learning technique, CRF, to improve accuracy.
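
The hybrid control flow described in this abstract (gazetteer lookup first, a statistical model only when the rules are ambiguous) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy gazetteer entries, the function names `rule_candidates` and `hybrid_tag`, the `"O"` tag for unmatched tokens, and the `first_candidate_stub` function standing in for a trained CRF are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy gazetteers (romanized); the paper's actual lists are not reproduced here.
GAZETTEERS: Dict[str, Set[str]] = {
    "PERSON": {"krishna", "sita"},
    "LOCATION": {"krishna", "hyderabad"},  # "krishna" is deliberately ambiguous
    "ORGANIZATION": {"infosys"},
}

# A disambiguator receives the sentence, the token index, and the candidate labels.
Disambiguator = Callable[[List[str], int, List[str]], str]


def rule_candidates(token: str) -> List[str]:
    """Return, in sorted order, every class whose gazetteer contains the token."""
    key = token.lower()
    return sorted(label for label, names in GAZETTEERS.items() if key in names)


def hybrid_tag(tokens: List[str], disambiguate: Disambiguator) -> List[str]:
    """Tag tokens with the rules; defer to `disambiguate` only on ambiguity."""
    tags: List[str] = []
    for i, token in enumerate(tokens):
        candidates = rule_candidates(token)
        if not candidates:
            tags.append("O")  # assumption: unmatched tokens are not entities
        elif len(candidates) == 1:
            tags.append(candidates[0])  # the rules give a unique answer
        else:
            tags.append(disambiguate(tokens, i, candidates))  # ambiguous case
    return tags


def first_candidate_stub(tokens: List[str], index: int, candidates: List[str]) -> str:
    """Placeholder for a trained CRF: simply picks the first candidate label."""
    return candidates[0]


if __name__ == "__main__":
    print(hybrid_tag(["Krishna", "visited", "Hyderabad"], first_candidate_stub))
```

In a real system the stub would be replaced by a sequence model that scores the candidate labels using context, and the suffix and prefix features mentioned in the abstract would feed both the rules and that model.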

    Myanmar named entity corpus and its use in syllable-based neural named entity recognition

    Myanmar is a low-resource language, which is one of the main reasons why Myanmar Natural Language Processing has lagged behind that of other languages. Currently, there is no publicly available named entity corpus for the Myanmar language. As part of this work, the first manually annotated named-entity-tagged corpus for the Myanmar language was developed and proposed to support the evaluation of named entity extraction. At present, our named entity corpus contains approximately 170,000 named entities and 60,000 sentences. This work also contributes the first evaluation of various deep neural network architectures on Myanmar Named Entity Recognition. Experimental results from 10-fold cross-validation revealed that syllable-based neural sequence models without additional feature engineering can give better results than the baseline CRF model. This work also aims to assess the effectiveness of neural network approaches to text processing for the Myanmar language and to promote future research on this understudied language.