DS4DH at #SMM4H 2023: Zero-Shot Adverse Drug Events Normalization using Sentence Transformers and Reciprocal-Rank Fusion
This paper outlines the performance evaluation of a system for adverse drug
event normalization, developed by the Data Science for Digital Health group for
the Social Media Mining for Health Applications 2023 shared task 5. Shared task
5 targeted the normalization of adverse drug event mentions in Twitter to
standard concepts from the Medical Dictionary for Regulatory Activities
terminology. Our system hinges on a two-stage approach: BERT fine-tuning for
entity recognition, followed by zero-shot normalization using sentence
transformers and reciprocal-rank fusion. The approach yielded a precision of
44.9%, recall of 40.5%, and an F1-score of 42.6%. It outperformed the median
performance in shared task 5 by 10% and demonstrated the highest performance
among all participants. These results substantiate the effectiveness of our
approach and its potential application for adverse drug event normalization in
the realm of social media text mining.
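The second stage's reciprocal-rank fusion can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and example MedDRA-style concept labels are hypothetical, and k=60 is the smoothing constant commonly used for RRF rather than a value reported in the paper.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked candidate lists into one fused ranking.

    rankings: iterable of lists of candidate IDs, best first.
    k: smoothing constant; 60 is the value commonly used for RRF.
    """
    scores = {}
    for ranking in rankings:
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] = scores.get(candidate, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical example: two sentence-transformer retrievers rank
# candidate concepts for one adverse-event mention.
fused = reciprocal_rank_fusion([
    ["nausea", "vomiting", "headache"],
    ["nausea", "dizziness", "vomiting"],
])
print(fused[0])  # "nausea" — the concept both retrievers rank first wins
```

A concept ranked moderately well by every retriever can outscore one ranked first by only a single retriever, which is what makes RRF robust for zero-shot ensembling.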
What's going on in my city? Recommender systems and electronic participatory budgeting
In this paper, we present electronic participatory budgeting (ePB) as a novel application domain for recommender systems. On public data from the ePB platforms of three major US cities (Cambridge, Miami, and New York City), we evaluate various methods that exploit heterogeneous sources and models of user preferences to provide personalized recommendations of citizen proposals. We show that, depending on characteristics of the cities and their participatory processes, particular methods are more effective than others for each city. This result, together with the open issues identified in the paper, calls for further research in the area.
BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion
We present the award-winning submission to the WikiKG90Mv2 track of
OGB-LSC@NeurIPS 2022. The task is link-prediction on the large-scale knowledge
graph WikiKG90Mv2, consisting of 90M+ nodes and 600M+ edges. Our solution uses
a diverse ensemble of Knowledge Graph Embedding models combining five
different scoring functions (TransE, TransH, RotatE, DistMult, ComplEx) and two
different loss functions (log-sigmoid, sampled softmax cross-entropy). Each
individual model is trained in parallel on a Graphcore Bow Pod using
BESS (Balanced Entity Sampling and Sharing), a new distribution framework for
KGE training and inference based on balanced collective communications between
workers. Our final model achieves a validation MRR of 0.2922 and a
test-challenge MRR of 0.2562, winning first place in the competition. The
code is publicly available at
https://github.com/graphcore/distributed-kge-poplar/tree/2022-ogb-submission.
Comment: First place in the WikiKG90Mv2 track of the Open Graph Benchmark Large-Scale Challenge @ NeurIPS 2022.
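Two of the five scoring functions in the ensemble can be sketched directly from their definitions. This is a toy illustration of how TransE and DistMult score a triple, assuming plain Python lists as embedding vectors; it is not the BESS distributed implementation, and the example vectors are made up.

```python
import math

def transe_score(h, r, t):
    # TransE: a triple (h, r, t) is plausible when h + r ≈ t,
    # scored as the negative L2 distance ||h + r - t||.
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    # DistMult: trilinear product <h, r, t> = sum_i h_i * r_i * t_i.
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

# Toy embeddings; t is constructed as exactly h + r, so TransE
# considers the triple maximally plausible (distance 0).
h = [0.1, -0.2, 0.3]
r = [0.4, 0.0, -0.1]
t = [hi + ri for hi, ri in zip(h, r)]
print(transe_score(h, r, t))   # 0.0
print(distmult_score(h, r, t))
```

Because scores from different functions live on different scales, an ensemble like the one described above would normally combine model *rankings* (or calibrated scores) rather than raw score values.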
PARADE: Passage Representation Aggregation for Document Reranking
We present PARADE, an end-to-end Transformer-based model that considers document-level context for document reranking. PARADE leverages passage-level relevance representations to predict a document relevance score, overcoming the limitations of previous approaches that perform inference on passages independently. Experiments on two ad-hoc retrieval benchmarks demonstrate PARADE's effectiveness over such methods. We conduct extensive analyses on PARADE's efficiency, highlighting several strategies for improving it. When combined with knowledge distillation, a PARADE model with 72% fewer parameters achieves effectiveness competitive with previous approaches using BERT-Base. Our code is available at https://github.com/canjiali/PARADE.
Known by the Company it Keeps: Proximity-Based Indexing for Physical Content in Archival Repositories
Despite the plethora of born-digital content, vast troves of important
content remain accessible only on physical media such as paper or microfilm.
The traditional approach to indexing undigitized content is using manually
created metadata that describes content at some level of aggregation (e.g.,
folder, box, or collection). Searchers guided in this way to some subset of
the content must then often manually examine substantial quantities of
physical media to find what they are looking for. This paper proposes a
complementary approach in which selective digitization of a small portion of
the content serves as a basis for proximity-based indexing, bringing the user
closer to the specific content they are looking for. Experiments with 35
boxes of partially digitized US State Department records indicate that
box-level indexes built in this way can provide a useful basis for search.
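The core idea of a box-level proximity index can be sketched as crediting the terms of each digitized item to its entire box. This is a minimal sketch under that assumption, not the paper's system; the function names and the sample records are hypothetical.

```python
from collections import defaultdict

def build_box_index(digitized_items):
    """Build a box-level inverted index from a small digitized sample.

    digitized_items: iterable of (box_id, text) pairs. Terms from each
    digitized item are credited to its whole box, so undigitized content
    stored nearby becomes findable ("known by the company it keeps").
    """
    index = defaultdict(set)
    for box_id, text in digitized_items:
        for term in text.lower().split():
            index[term].add(box_id)
    return index

def rank_boxes(index, query):
    # Rank boxes by how many query terms their digitized sample matched.
    hits = defaultdict(int)
    for term in query.lower().split():
        for box in index.get(term, ()):
            hits[box] += 1
    return sorted(hits, key=hits.get, reverse=True)

idx = build_box_index([
    ("box12", "cable on trade negotiations with Japan"),
    ("box07", "memo on consular staffing in Vienna"),
])
print(rank_boxes(idx, "trade with Japan"))  # ['box12']
```

The searcher is then directed to the highest-ranked boxes, whose undigitized contents are likely on the same topic as the digitized items they sit beside.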