4 research outputs found

    An Evaluation Methodology of Named Entities Recognition in Spanish Language: ECU 911 Case Study

    Get PDF
    The importance of the gathered information in Integrated Security Services as ECU911 in Ecuador is evidenced in terms of its quality and availability in order to perform decision-making tasks. It is a priority to avoid the loss of relevant information such as event address, places references, names, etc. In this context it is present Named Entity Recognition (NER) analysis for discovering information into informal texts. Unlike structured corpus and labeled for NER analysis like CONLL2002 or ANCORA, informal texts generated from emergency call dialogues have a very wide linguistic variety; in addition, there is a strong tending to lose important information in their processing. A relevant aspect to considerate is the identification of texts that denotes entities such as the physical address where emergency events occurred. This study aims to extract the locations in which an emergency event has been issued. A set of experiments was performed with NER models based on Convolutional Neural Network (CNN). The performance of models was evaluated according to parameters such as training dataset size, dropout rate, location dictionary, and denoting location. An experimentation methodology was proposed, with it follows the next steps: i) Data preprocessing, ii) Dataset labeling, iii) Model structuring, and iv) Model evaluating. Results revealed that the performance of a model improves when having more training data, an adequate dropout rate to control overfitting problems, and a combination of a dictionary of locations and replacing words denoting entities

    LANGUAGE MODELS FOR RARE DISEASE INFORMATION EXTRACTION: EMPIRICAL INSIGHTS AND MODEL COMPARISONS

    Get PDF
    End-to-end relation extraction (E2ERE) is a crucial task in natural language processing (NLP) that involves identifying and classifying semantic relationships between entities in text. This thesis compares three paradigms for end-to-end relation extraction (E2ERE) in biomedicine, focusing on rare diseases with discontinuous and nested entities. We evaluate Named Entity Recognition (NER) to Relation Extraction (RE) pipelines, sequence-to-sequence models, and generative pre-trained transformer (GPT) models using the RareDis information extraction dataset. Our findings indicate that pipeline models are the most effective, followed closely by sequence-to-sequence models. GPT models, despite having eight times as many parameters, perform worse than sequence-to-sequence models and significantly lag pipeline models. Our results also hold for a second E2ERE dataset for chemical-protein interactions
    corecore