
    Improving Low-Resource Named-Entity Recognition and Neural Machine Translation

    University of Technology Sydney. Faculty of Engineering and Information Technology.

    Named-entity recognition (NER) and machine translation (MT) are two very popular and widespread tasks in natural language processing (NLP). The former aims to identify mentions of predefined classes (e.g., person names, locations, times) in text. The latter is more complex, as it involves translating text from one language into another. In recent years, both tasks have been dominated by deep neural networks, which have achieved higher accuracy than traditional machine learning models. However, this is not invariably true. Neural networks often require large human-annotated training datasets to learn the tasks and perform optimally. Such datasets are not always available, as annotating data is often time-consuming and expensive. When human-annotated data are scarce (e.g., for low-resource languages or very specific domains), deep neural models tend to overfit and perform poorly on new, unseen data. In these cases, traditional machine learning models may still outperform neural models. The focus of this research has been to develop deep learning models that suffer less from overfitting and generalize better in NER and MT tasks, particularly when they are trained with small labelled datasets. The main findings and contributions of this thesis are the following. First, health-domain word embeddings have been used for health-domain NER tasks such as drug name recognition and clinical concept extraction. The word embeddings were pretrained on medical-domain texts and used to initialize the input features of a recurrent neural network. Our neural models trained with these embeddings have outperformed previously proposed traditional machine learning models on small, dedicated datasets. Second, the first systematic comparison of statistical MT and neural MT models on English-Basque, a low-resource language pair, has been conducted. It has shown that statistical models can perform slightly better than neural models on the available datasets. Third, we have proposed a novel regularization technique for MT based on regressing word and sentence embeddings. The regularizer has helped to considerably improve the translation quality of strong neural machine translation baselines. Fourth, we have proposed using reinforcement-style training with discourse rewards to improve the performance of document-level neural machine translation models. The proposed training has helped to improve discourse properties of the translated documents, such as lexical cohesion and coherence, across various low- and high-resource language pairs. Finally, a shared attention mechanism has helped to improve both the translation accuracy and the interpretability of the models.
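
    The third contribution in the abstract above adds an embedding-regression term to the usual NMT training objective. The abstract does not give the exact formulation, so what follows is only a minimal sketch under stated assumptions: the decoder is assumed to emit, next to each token distribution, a continuous vector that is regressed onto a pretrained embedding of the reference word, plus one sentence-level vector regressed onto a pretrained sentence embedding. The squared Euclidean distance and the weights `lambda_word` and `lambda_sent` are hypothetical choices (a cosine distance would be an equally plausible option), and all function names are illustrative rather than taken from the thesis.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def negative_log_likelihood(ref_token_probs: Sequence[float]) -> float:
    """Standard NMT loss: -sum of log p(y_t | y_<t, x) over the reference tokens."""
    return -sum(math.log(p) for p in ref_token_probs)


def squared_distance(pred: Vector, target: Vector) -> float:
    """Squared Euclidean distance (an assumed choice) between two embeddings."""
    if len(pred) != len(target):
        raise ValueError("embedding dimensions differ")
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def regularized_loss(
    ref_token_probs: Sequence[float],
    pred_word_embs: Sequence[Vector],
    target_word_embs: Sequence[Vector],
    pred_sent_emb: Vector,
    target_sent_emb: Vector,
    lambda_word: float = 0.1,  # hypothetical weight of the word-level term
    lambda_sent: float = 0.1,  # hypothetical weight of the sentence-level term
) -> float:
    """NLL plus word- and sentence-level embedding-regression penalties."""
    if len(pred_word_embs) != len(target_word_embs):
        raise ValueError("one predicted embedding is needed per reference token")
    # Word-level term: mean squared distance over the target positions.
    if target_word_embs:
        word_term = sum(
            squared_distance(p, t) for p, t in zip(pred_word_embs, target_word_embs)
        ) / len(target_word_embs)
    else:
        word_term = 0.0
    # Sentence-level term: one distance per sentence pair.
    sent_term = squared_distance(pred_sent_emb, target_sent_emb)
    return (
        negative_log_likelihood(ref_token_probs)
        + lambda_word * word_term
        + lambda_sent * sent_term
    )


# Toy usage: one reference token predicted with probability 0.5.
loss = regularized_loss(
    ref_token_probs=[0.5],
    pred_word_embs=[[1.0, 0.0]],
    target_word_embs=[[0.0, 0.0]],
    pred_sent_emb=[0.0, 0.0],
    target_sent_emb=[0.0, 2.0],
)
print(loss)  # ln 2 + 0.1 * 1 + 0.1 * 4, about 1.193
```

    Because the penalty is simply added to the likelihood term, it leaves the decoder's softmax output unchanged and only pulls the auxiliary vectors toward the pretrained embedding space, which is one way such a regularizer could discourage overfitting on small training sets.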

    Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages

    Machine translation of scientific abstracts and terminologies has the potential to support health professionals and biomedical researchers in some of their activities. In the fifth edition of the WMT Biomedical Task, we addressed a total of eight language pairs. Five language pairs had been addressed in previous editions of the shared task, namely English/German, English/French, English/Spanish, English/Portuguese, and English/Chinese. Three additional language pairs were introduced this year: English/Russian, English/Italian, and English/Basque. The task covered the evaluation of both scientific abstracts (all language pairs) and terminologies (English/Basque only). We received submissions from a total of 20 teams. For recurring language pairs, we observed an improvement in the translations compared to previous years, in terms of both automatic scores and qualitative evaluations.

    UHF RFID temperature sensor assisted with body-heat dissipation energy harvesting

    The number of wireless medical wearables has increased in recent years, and they are revolutionizing the current healthcare system. However, state-of-the-art systems still need improvement, as they are bulky and battery-powered and therefore require maintenance. In contrast, battery-free wearables have unlimited lifetimes and are smaller and cheaper. This paper describes the design of a battery-free wearable system that measures the skin temperature of the human body while simultaneously harvesting energy from body heat. The system consists of a UHF RFID temperature sensor tag placed on the patient's arm. The tag is assisted by an extra power supply from an energy-harvesting module that stores energy harvested from the heat dissipated at the patient's neck. This paper presents experimental results on the stored energy and characterizes the module under different conditions, e.g., stationary, walking indoors, and walking outdoors. Finally, the tag is tested both in a fully passive mode and when power-assisted. Our experimental results show that, when the sensor is assisted by the energy-harvesting module, the communication range of the RFID sensor improves by 100% when measurements are taken every 750 ms and by 75% when measurements are taken every 1000 ms.