22 research outputs found

    The effect of word sense disambiguation accuracy on literature based discovery

    Background The volume of research published in the biomedical domain has increasingly led to researchers focussing on specific areas of interest, with the result that connections between findings are missed. Literature based discovery (LBD) attempts to address this problem by searching for previously unnoticed connections between published information (also known as "hidden knowledge"). A common approach is to identify hidden knowledge via shared linking terms. However, biomedical documents are highly ambiguous, which can lead LBD systems to over-generate hidden knowledge by hypothesising connections through different meanings of linking terms. Word Sense Disambiguation (WSD) aims to resolve ambiguities in text by identifying the meaning of ambiguous terms. This study explores the effect of WSD accuracy on LBD performance. Methods An existing LBD system is employed and four approaches to WSD of biomedical documents are integrated with it. The accuracy of each WSD approach is determined by comparing its output against a standard benchmark. Evaluation of the LBD output is carried out using a time-slicing approach, in which hidden knowledge is generated from articles published prior to a certain cutoff date and a gold standard is extracted from publications after the cutoff date. Results WSD accuracy varies depending on the approach used. The connection between the performance of the LBD and WSD systems is analysed, revealing a correlation between WSD accuracy and LBD performance. Conclusion This study reveals that LBD performance is sensitive to WSD accuracy. It is therefore concluded that WSD has the potential to improve the output of LBD systems by reducing the amount of spurious hidden knowledge that is generated. It is also suggested that further improvements in WSD accuracy have the potential to improve LBD accuracy.
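    The time-slicing evaluation described above can be sketched as follows. This is a minimal illustration only: the toy corpus, the term pairs, and the simple ABC (shared-linking-term) co-occurrence model are assumptions for demonstration, not the paper's actual LBD system.

    ```python
    from itertools import combinations

    def cooccurring_pairs(documents):
        """Collect unordered term pairs that co-occur in some document."""
        pairs = set()
        for terms in documents:
            pairs.update(frozenset(p) for p in combinations(set(terms), 2))
        return pairs

    def hidden_knowledge(documents):
        """ABC model: propose A-C pairs linked via a shared term B,
        where A and C never co-occur directly in the corpus."""
        direct = cooccurring_pairs(documents)
        vocab = {t for doc in documents for t in doc}
        proposed = set()
        for a, c in combinations(sorted(vocab), 2):
            if frozenset((a, c)) in direct:
                continue
            for b in vocab - {a, c}:
                if frozenset((a, b)) in direct and frozenset((b, c)) in direct:
                    proposed.add(frozenset((a, c)))
                    break
        return proposed

    # Time slicing: generate hidden knowledge from pre-cutoff articles,
    # then score it against connections first published after the cutoff.
    pre_cutoff = [["fish oil", "blood viscosity"], ["blood viscosity", "raynaud"]]
    post_cutoff = [["fish oil", "raynaud"]]  # gold standard from later publications
    predicted = hidden_knowledge(pre_cutoff)
    gold = cooccurring_pairs(post_cutoff)
    precision = len(predicted & gold) / len(predicted)
    ```

    Ambiguous linking terms would inflate `direct` with spurious co-occurrences, which is why WSD accuracy matters for this kind of system.
    
    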

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built through a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

    Constraint Grammar-Based Swedish-Danish Machine Translation


    Effet hall plan dans les couches minces ferromagnétiques déposées sous vide [Planar Hall effect in vacuum-deposited ferromagnetic thin films]

    An experimental study of ferromagnetic thin films has been undertaken as a function of the conditions of evaporation, film thickness, composition, and shape of the electrodes. Anisotropic Ni-Fe films, with various additions of Pd, V, Co, and Mo, showed a maximum planar Hall effect for the composition Ni-Fe 86/14. The optimization of the geometrical parameters of the electrodes and the magnetic film elements is described, allowing one to design for maximum output voltage or maximum output current in a short-circuited loop.
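    For context, the planar Hall signal in such anisotropic films arises from anisotropic magnetoresistance. With the magnetization in the film plane at angle $\theta$ to the current, the standard textbook relation (not quoted from the abstract itself) for the transverse resistivity and the resulting Hall voltage in a film of thickness $t$ carrying current $I$ is:

    ```latex
    \rho_{xy} = \left(\rho_{\parallel} - \rho_{\perp}\right)\sin\theta\cos\theta,
    \qquad
    V_{\mathrm{PH}} = \frac{I\,\left(\rho_{\parallel} - \rho_{\perp}\right)}{t}\,\sin\theta\cos\theta
    ```

    The signal is maximal at $\theta = 45^{\circ}$ and scales inversely with film thickness, which is consistent with the abstract's emphasis on optimizing film and electrode geometry for maximum output voltage.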

    Demystifying the Semantics of Relevant Objects in Scholarly Collections


    Joining Forces To Resolve Lexical Ambiguity: East Meets West In Barcelona

    This paper describes the component models and combination model built as a joint effort between Swarthmore College, Hong Kong PolyU, and HKUST. Though other models described elsewhere contributed to the final combination model, this paper focuses solely on the joint contributions to the "Swat-HK" effort.

    UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish

    The present study describes our submission to SemEval 2018 Task 1: Affect in Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to automatically generate additional training data by (i) translating training data from other languages and (ii) applying a semi-supervised learning method. We find strong support for both approaches, with those models outperforming our regular models in all subtasks. However, creating a stepwise ensemble of different models, as opposed to simply averaging, did not result in an increase in performance. We placed second (EI-Reg), second (EI-Oc), fourth (V-Reg), and fifth (V-Oc) in the four Spanish subtasks we participated in.
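    The two combination strategies contrasted above can be sketched as follows. This is an illustrative sketch only: the model scores, gold values, and the greedy stepwise selection criterion are assumptions, not the authors' actual system (which, per the abstract, found that the stepwise ensemble did not beat plain averaging).

    ```python
    import statistics

    def average_ensemble(predictions):
        """Combine per-model intensity predictions by simple averaging."""
        return [statistics.mean(scores) for scores in zip(*predictions)]

    def stepwise_ensemble(predictions, gold):
        """Greedy stepwise ensemble: add models one at a time, keeping each
        only if it lowers mean absolute error on development data."""
        def mae(pred):
            return statistics.mean(abs(p - g) for p, g in zip(pred, gold))
        chosen = [predictions[0]]
        for model in predictions[1:]:
            candidate = chosen + [model]
            if mae(average_ensemble(candidate)) < mae(average_ensemble(chosen)):
                chosen = candidate
        return average_ensemble(chosen)

    # Illustrative emotion-intensity scores in [0, 1] from three models.
    model_outputs = [[0.2, 0.8, 0.5], [0.3, 0.7, 0.6], [0.9, 0.1, 0.4]]
    dev_gold = [0.25, 0.75, 0.55]
    avg = average_ensemble(model_outputs)
    step = stepwise_ensemble(model_outputs, dev_gold)
    ```

    Here the stepwise ensemble discards the third (poorly correlated) model, whereas plain averaging always includes every model.
    
    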