3 research outputs found

    Surfing the modeling of PoS taggers in low-resource scenarios

    The recent trend toward the application of deep structured techniques has revealed the limits of huge models in natural language processing. This has reawakened interest in traditional machine learning algorithms, which have proved to be still competitive in certain contexts, particularly in low-resource settings. In parallel, model selection has become an essential task to boost performance at reasonable cost, even more so in domains where training and/or computational resources are scarce. Against this backdrop, we evaluate the early estimation of learning curves as a practical mechanism for selecting the most appropriate model in scenarios characterized by the use of non-deep learners in resource-lean settings. On the basis of a formal approximation model previously evaluated under conditions of wide availability of training and validation resources, we study the reliability of such an approach in a different and much more demanding operational environment. Using as a case study the generation of PoS taggers for Galician, a language belonging to the Western Ibero-Romance group, the experimental results are consistent with our expectations.
    Funding: Ministerio de Ciencia e Innovación | Ref. PID2020-113230RB-C21; Ministerio de Ciencia e Innovación | Ref. PID2020-113230RB-C22; Xunta de Galicia | Ref. ED431C 2020/1
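The idea of selecting a model from an early estimate of its learning curve can be illustrated with a minimal sketch. This is not the formal approximation model the abstract refers to; it assumes, purely for illustration, that the error rate follows a simple power law `error = a * size**(-b)`, fitted by least squares in log-log space. The function names (`fit_power_law`, `predict_error`, `select_model`) are hypothetical, and any curves passed in are expected to be early measurements of error against training-set size.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of error = a * size**(-b) in log-log space.

    Assumption: a plain power law stands in for the formal
    approximation model mentioned in the abstract.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(error) = log(a) - b * log(size)
    return math.exp(intercept), -slope


def predict_error(a, b, size):
    """Extrapolate the fitted curve to a larger training-set size."""
    return a * size ** (-b)


def select_model(early_curves, target_size):
    """Pick the model with the lowest extrapolated error at target_size.

    early_curves maps a model name to a (sizes, errors) pair measured
    on small training sets only.
    """
    predictions = {}
    for name, (sizes, errors) in early_curves.items():
        a, b = fit_power_law(sizes, errors)
        predictions[name] = predict_error(a, b, target_size)
    best = min(predictions, key=predictions.get)
    return best, predictions
```

Calling `select_model` with a few early measurements per tagger returns the candidate whose extrapolated error is lowest at the target corpus size, so the remaining candidates need not be trained on the full data.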

    Development of Machine Learning Applications: Named Entity Recognizer

    Machine Learning is described in today's Information Technology world as one of the most promising research fields, with great potential for providing a huge paradigm shift in modern systems. With the growth and abundant availability of data, the need to structure, analyze and exploit these data has become a necessity for modern systems and a must for the major players within the field. Systems need to discover and structure data with minimal human involvement, while being able to adapt to the nature of the data, handle unseen patterns and still structure the data properly. One of the best-known applications of Machine Learning, and one whose output is considered the building block upon which more advanced systems rely, is Named Entity Recognition. Named Entity Recognition (NER) is a classification task, better known as one of the major applications of Natural Language Processing, which consists of classifying and assigning descriptive labels to sequences of text based on predefined classification categories. The presented work aims at the conceptualization, design, implementation and evaluation of a system able to perform Named Entity Recognition on different datasets, with the maximum attainable performance, by using the best result-yielding techniques and following the conventions of the field. The developed system implements a well-known statistical prediction framework proven to be best suited for classification tasks similar to NER: Conditional Random Fields (CRF) models were used to perform the initial recognition. The system combines the CRF models with different postprocessing methods to implement a hybrid NER system oriented towards achieving performance levels comparable to the state-of-the-art literature in the field. The research achieved language-independent NER using the core of the developed system, and satisfactory performance levels that were evaluated by conducting different experiments with different datasets and on different types of data.

    Named entity recognition using hybrid machine learning approach

    This paper presents a hybrid method using a machine learning approach for named entity recognition (NER). A system built on this method is able to achieve reasonable performance with minimal training data and gazetteers. The hybrid machine learning approach differs from previous machine learning-based systems in that it uses a maximum entropy model (MEM) and a hidden Markov model (HMM) successively. We report on the performance of our proposed NER system using the British National Corpus (BNC). In the recognition process, we first use the MEM to identify the named entities in the corpus by imposing some temporary tagging as references. The MEM walkthrough can be regarded as a training process for the HMM, as we then use the HMM for the final tagging. We show that with enough training data and an appropriate error correction mechanism, this approach can achieve higher precision and recall than using a single statistical model. We conclude with our experimental results, which indicate the flexibility of our system in different domains.