The effect of word sense disambiguation accuracy on literature based discovery
Background
The volume of research published in the biomedical domain has increasingly led researchers to focus on specific areas of interest, with the result that connections between findings are missed. Literature based discovery (LBD) attempts to address this problem by searching for previously unnoticed connections between published information (also known as "hidden knowledge"). A common approach is to identify hidden knowledge via shared linking terms. However, biomedical documents are highly ambiguous, which can lead LBD systems to over-generate hidden knowledge by hypothesising connections through different meanings of linking terms. Word Sense Disambiguation (WSD) aims to resolve ambiguities in text by identifying the meaning of ambiguous terms. This study explores the effect of WSD accuracy on LBD performance.
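The shared-linking-term idea can be sketched minimally as a set intersection; the function name and all terms below are illustrative, not the system's actual implementation.

```python
# Hypothetical sketch of the "linking term" model of literature based
# discovery: a hidden A-C connection is hypothesised when some term B
# co-occurs with A in one literature and with C in another.

def linking_terms(cooccurs_with_a, cooccurs_with_c):
    """Return candidate linking (B) terms shared by the A- and C-literatures."""
    return sorted(set(cooccurs_with_a) & set(cooccurs_with_c))

# Toy example of how ambiguity over-generates hidden knowledge: "cold"
# (the illness vs. the temperature) creates a bridge that WSD would remove.
a_terms = {"fish oil", "blood viscosity", "cold"}
c_terms = {"Raynaud's disease", "vasoconstriction", "cold"}
print(linking_terms(a_terms, c_terms))  # -> ['cold']
```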
Methods
An existing LBD system is employed, and four approaches to WSD of biomedical documents are integrated with it. The accuracy of each WSD approach is determined by comparing its output against a standard benchmark. The LBD output is evaluated using a time-slicing approach, in which hidden knowledge is generated from articles published before a cutoff date and a gold standard is extracted from publications after the cutoff date.
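The time-slicing protocol can be illustrated end-to-end on a toy corpus; the dates, terms, and corpus below are invented for illustration and are not the paper's data.

```python
from datetime import date
from itertools import combinations

# Minimal sketch of time-sliced evaluation: predict hidden-knowledge pairs
# from pre-cutoff articles, score them against pairs that first co-occur
# after the cutoff. All data here is illustrative.
cutoff = date(2000, 1, 1)
articles = [
    (date(1995, 3, 1), {"A", "B"}),  # pre-cutoff: A co-occurs with B
    (date(1998, 7, 1), {"B", "C"}),  # pre-cutoff: B co-occurs with C
    (date(2003, 5, 1), {"A", "C"}),  # post-cutoff: forms the gold standard
]
pre = [terms for d, terms in articles if d < cutoff]
post = [terms for d, terms in articles if d >= cutoff]

# Co-occurrence pairs observed before the cutoff
cooccur = {pair for terms in pre for pair in combinations(sorted(terms), 2)}
vocab = {t for terms in pre for t in terms}

# Hidden knowledge: pairs not yet co-occurring but bridged by a linking term
predicted = {
    (a, c)
    for a, c in combinations(sorted(vocab), 2)
    if (a, c) not in cooccur
    and any(tuple(sorted((a, b))) in cooccur and tuple(sorted((b, c))) in cooccur
            for b in vocab - {a, c})
}

gold = {pair for terms in post for pair in combinations(sorted(terms), 2)}
precision = len(predicted & gold) / len(predicted)
print(predicted, precision)  # -> {('A', 'C')} 1.0
```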
Results
WSD accuracy varies depending on the approach used. The connection between the performance of the LBD and WSD systems is analysed, revealing a correlation between WSD accuracy and LBD performance.
Conclusion
This study reveals that LBD performance is sensitive to WSD accuracy. It is therefore concluded that WSD has the potential to improve the output of LBD systems by reducing the amount of spurious hidden knowledge that is generated. It is also suggested that further improvements in WSD accuracy have the potential to improve LBD accuracy.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Planar Hall effect in vacuum-deposited ferromagnetic thin films
An experimental study of ferromagnetic thin films has been undertaken as a function of evaporation conditions, film thickness, composition, and electrode shape. Anisotropic Ni-Fe films with various additions of Pd, V, Co, and Mo showed a maximum planar Hall effect for the composition Ni-Fe 86/14. The optimization of the geometrical parameters of the electrodes and of the magnetic film elements is described, allowing one to design for maximum output voltage or for maximum output current in a short-circuited loop.
Joining Forces To Resolve Lexical Ambiguity: East Meets West In Barcelona
This paper describes the component models and the combination model built as a joint effort between Swarthmore College, Hong Kong PolyU, and HKUST. Though other models described elsewhere contributed to the final combination model, this paper focuses solely on the joint contributions to the "Swat-HK" effort.
UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish
The present study describes our submission to SemEval 2018 Task 1: Affect in Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to automatically generate additional training data by (i) translating training data from other languages and (ii) applying a semi-supervised learning method. We find strong support for both approaches, with these models outperforming our regular models in all subtasks. However, creating a stepwise ensemble of different models, as opposed to simply averaging them, did not increase performance. We placed second (EI-Reg), second (EI-Oc), fourth (V-Reg) and fifth (V-Oc) in the four Spanish subtasks in which we participated.
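The "simply averaging" baseline the abstract compares against can be sketched as a mean over per-model predictions; the function name and the per-model scores below are invented for illustration, not the submission's actual outputs.

```python
# Hedged sketch of an averaging ensemble for emotion-intensity regression:
# the final score for each tweet is the mean of each model's prediction.

def average_ensemble(predictions):
    """Average per-tweet intensity scores across models.

    predictions: list of per-model score lists, all the same length.
    """
    n_models = len(predictions)
    return [sum(scores) / n_models for scores in zip(*predictions)]

model_a = [0.2, 0.8, 0.5]  # e.g. a model using translated training data
model_b = [0.4, 0.6, 0.7]  # e.g. a semi-supervised model
print(average_ensemble([model_a, model_b]))
```

A stepwise ensemble would instead add models one at a time, keeping each only if it improves a held-out score; the abstract reports that this extra machinery did not beat the plain average.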