DS4DH at #SMM4H 2023: Zero-Shot Adverse Drug Events Normalization using Sentence Transformers and Reciprocal-Rank Fusion
This paper outlines the performance evaluation of a system for adverse drug
event normalization, developed by the Data Science for Digital Health group for
the Social Media Mining for Health Applications 2023 shared task 5. Shared task
5 targeted the normalization of adverse drug event mentions in Twitter to
standard concepts from the Medical Dictionary for Regulatory Activities
terminology. Our system hinges on a two-stage approach: BERT fine-tuning for
entity recognition, followed by zero-shot normalization using sentence
transformers and reciprocal-rank fusion. The approach yielded a precision of
44.9%, recall of 40.5%, and an F1-score of 42.6%. It outperformed the median
performance in shared task 5 by 10% and demonstrated the highest performance
among all participants. These results substantiate the effectiveness of our
approach and its potential application for adverse drug event normalization in
the realm of social media text mining.
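The second stage's reciprocal-rank fusion can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and example MedDRA-style concept labels are hypothetical, and k=60 is the smoothing constant commonly used for RRF rather than a value reported in the paper.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked candidate lists into one fused ranking.

    rankings: iterable of lists of candidate IDs, best first.
    k: smoothing constant; 60 is the value commonly used for RRF.
    """
    scores = {}
    for ranking in rankings:
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] = scores.get(candidate, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical example: two sentence-transformer retrievers rank
# candidate concepts for one adverse-event mention.
fused = reciprocal_rank_fusion([
    ["nausea", "vomiting", "headache"],
    ["nausea", "dizziness", "vomiting"],
])
print(fused[0])  # "nausea" — the concept both retrievers rank first wins
```

A concept ranked moderately well by every retriever can outscore one ranked first by only a single retriever, which is what makes RRF robust for zero-shot ensembling.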
What's going on in my city? Recommender systems and electronic participatory budgeting
In this paper, we present electronic participatory budgeting (ePB) as a novel application domain for recommender systems. On public data from the ePB platforms of three major US cities (Cambridge, Miami, and New York City), we evaluate various methods that exploit heterogeneous sources and models of user preferences to provide personalized recommendations of citizen proposals. We show that, depending on characteristics of the cities and their participatory processes, particular methods are more effective than others for each city. This result, together with the open issues identified in the paper, calls for further research in the area.
BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion
We present the award-winning submission to the WikiKG90Mv2 track of
OGB-LSC@NeurIPS 2022. The task is link-prediction on the large-scale knowledge
graph WikiKG90Mv2, consisting of 90M+ nodes and 600M+ edges. Our solution uses
a diverse ensemble of Knowledge Graph Embedding models combining five
different scoring functions (TransE, TransH, RotatE, DistMult, ComplEx) and two
different loss functions (log-sigmoid, sampled softmax cross-entropy). Each
individual model is trained in parallel on a Graphcore Bow Pod using
BESS (Balanced Entity Sampling and Sharing), a new distribution framework for
KGE training and inference based on balanced collective communications between
workers. Our final model achieves a validation MRR of 0.2922 and a
test-challenge MRR of 0.2562, winning first place in the competition. The
code is publicly available at
https://github.com/graphcore/distributed-kge-poplar/tree/2022-ogb-submission.
Comment: First place in the WikiKG90Mv2 track of the Open Graph Benchmark Large-Scale Challenge @ NeurIPS 2022.
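Two of the five scoring functions in the ensemble can be sketched directly from their definitions. This is a toy illustration of how TransE and DistMult score a triple, assuming plain Python lists as embedding vectors; it is not the BESS distributed implementation, and the example vectors are made up.

```python
import math

def transe_score(h, r, t):
    # TransE: a triple (h, r, t) is plausible when h + r ≈ t,
    # scored as the negative L2 distance ||h + r - t||.
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    # DistMult: trilinear product <h, r, t> = sum_i h_i * r_i * t_i.
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

# Toy embeddings; t is constructed as exactly h + r, so TransE
# considers the triple maximally plausible (distance 0).
h = [0.1, -0.2, 0.3]
r = [0.4, 0.0, -0.1]
t = [hi + ri for hi, ri in zip(h, r)]
print(transe_score(h, r, t))   # 0.0
print(distmult_score(h, r, t))
```

Because scores from different functions live on different scales, an ensemble like the one described above would normally combine model *rankings* (or calibrated scores) rather than raw score values.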
PARADE: Passage Representation Aggregation for Document Reranking
We present PARADE, an end-to-end Transformer-based model that considers document-level context for document reranking. PARADE leverages passage-level relevance representations to predict a document relevance score, overcoming the limitations of previous approaches that perform inference on passages independently. Experiments on two ad-hoc retrieval benchmarks demonstrate PARADE's effectiveness over such methods. We conduct extensive analyses on PARADE's efficiency, highlighting several strategies for improving it. When combined with knowledge distillation, a PARADE model with 72% fewer parameters achieves effectiveness competitive with previous approaches using BERT-Base. Our code is available at https://github.com/canjiali/PARADE.
Known by the Company it Keeps: Proximity-Based Indexing for Physical Content in Archival Repositories
Despite the plethora of born-digital content, vast troves of important
content remain accessible only on physical media such as paper or microfilm.
The traditional approach to indexing undigitized content is using manually
created metadata that describes content at some level of aggregation (e.g.,
folder, box, or collection). Searchers guided in this way to some subset of
the content must then often manually examine substantial quantities of
physical media to find what they are looking for. This paper proposes a
complementary approach in which selective digitization of a small portion of
the content serves as a basis for proximity-based indexing, bringing the user
closer to the specific content they are looking for. Experiments with 35
boxes of partially digitized US State Department records indicate that
box-level indexes built in this way can provide a useful basis for search.
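The core idea of a box-level proximity index can be sketched as crediting the terms of each digitized item to its entire box. This is a minimal sketch under that assumption, not the paper's system; the function names and the sample records are hypothetical.

```python
from collections import defaultdict

def build_box_index(digitized_items):
    """Build a box-level inverted index from a small digitized sample.

    digitized_items: iterable of (box_id, text) pairs. Terms from each
    digitized item are credited to its whole box, so undigitized content
    stored nearby becomes findable ("known by the company it keeps").
    """
    index = defaultdict(set)
    for box_id, text in digitized_items:
        for term in text.lower().split():
            index[term].add(box_id)
    return index

def rank_boxes(index, query):
    # Rank boxes by how many query terms their digitized sample matched.
    hits = defaultdict(int)
    for term in query.lower().split():
        for box in index.get(term, ()):
            hits[box] += 1
    return sorted(hits, key=hits.get, reverse=True)

idx = build_box_index([
    ("box12", "cable on trade negotiations with Japan"),
    ("box07", "memo on consular staffing in Vienna"),
])
print(rank_boxes(idx, "trade with Japan"))  # ['box12']
```

The searcher is then directed to the highest-ranked boxes, whose undigitized contents are likely on the same topic as the digitized items they sit beside.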