PublishInCovid19 at WNUT 2020 Shared Task-1: Entity Recognition in Wet Lab Protocols using Structured Learning Ensemble and Contextualised Embeddings
In this paper, we describe the approach that we employed to address the task
of Entity Recognition over Wet Lab Protocols -- a shared task in EMNLP
WNUT-2020 Workshop. Our approach is composed of two phases. In the first phase,
we experiment with various contextualised word embeddings (such as Flair and
BERT-based embeddings) combined with a BiLSTM-CRF model to arrive at the
best-performing
architecture. In the second phase, we create an ensemble composed of eleven
BiLSTM-CRF models. The individual models are trained on random train-validation
splits of the complete dataset. Here, we also experiment with different output
merging schemes, including Majority Voting and Structured Learning Ensembling
(SLE). Our final submission achieved a micro F1-score of 0.8175 and 0.7757 for
the partial and exact match of the entity spans, respectively. We were ranked
first and second in terms of partial and exact match, respectively.
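The ensembling step lends itself to a compact illustration. Below is a minimal sketch, in Python, of token-level majority voting over the BIO tag sequences produced by several independently trained BiLSTM-CRF models. The tag labels, the tie-breaking rule, and the three-model example are assumptions made for illustration; they are not the authors' merging code, and the final submission described above used Structured Learning Ensembling rather than plain voting.

```python
from collections import Counter

def majority_vote(tag_sequences):
    """Merge per-token BIO predictions from several models by majority vote.

    tag_sequences: list of equal-length lists, one BIO tag sequence per model,
    e.g. [["B-Reagent", "I-Reagent", "O"], ["B-Reagent", "O", "O"], ...].
    Ties are broken in favour of the earliest model in the list (an assumption;
    the paper's SLE scheme resolves disagreements differently).
    """
    merged = []
    for token_tags in zip(*tag_sequences):
        counts = Counter(token_tags)
        top = max(counts.values())
        # keep model order as the tie-breaker
        winner = next(t for t in token_tags if counts[t] == top)
        merged.append(winner)
    return merged

# Toy example: three models disagree on the second token.
preds = [
    ["B-Reagent", "I-Reagent", "O"],
    ["B-Reagent", "O", "O"],
    ["B-Reagent", "I-Reagent", "O"],
]
print(majority_vote(preds))  # ['B-Reagent', 'I-Reagent', 'O']
```

Note that token-level voting can produce inconsistent BIO sequences (an I- tag without a preceding B-), which is one motivation for structured merging schemes such as SLE.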
Enhancing Textbooks with Visuals from the Web for Improved Learning
Textbooks are the primary vehicle for delivering quality education to
students. It has been shown that explanatory or illustrative visuals play a key
role in retention, comprehension, and the general transfer of knowledge.
However, many textbooks, especially in the developing world, are of low quality
and lack interesting visuals to support student learning. In this paper, we
investigate the effectiveness of vision-language models to automatically
enhance textbooks with images from the web. Specifically, we collect a dataset
of e-textbooks from one of the largest free online publishers in the world. We
rigorously analyse the dataset, and use the resulting analysis to motivate a
task that involves retrieving and appropriately assigning web images to
textbooks, which we frame as a novel optimization problem. Through a
crowd-sourced evaluation, we verify that (1) while the original textbook images
are rated higher, automatically assigned ones are not far behind, and (2) the
choice of the optimization problem matters. We release the dataset of textbooks
with an associated image bank to spur further research in this area.
Comment: 17 pages, 27 figures
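As one way to make the retrieval-and-assignment task concrete, here is a minimal sketch of assigning a pool of candidate web images to textbook sections by maximising embedding similarity under a one-image-per-section constraint, solved with the Hungarian algorithm. The embedding functions below are hypothetical placeholders standing in for a vision-language model's text and image encoders, and this simple formulation is an illustrative baseline rather than the novel optimization problem framed in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def embed_sections(section_texts):
    # Hypothetical placeholder: in practice, encode each section with the
    # text encoder of a vision-language model (e.g. a CLIP-style model).
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(section_texts), 512))

def embed_images(image_paths):
    # Hypothetical placeholder: encode each candidate web image with the
    # matching image encoder of the same vision-language model.
    rng = np.random.default_rng(1)
    return rng.normal(size=(len(image_paths), 512))

def assign_images(section_texts, image_paths):
    """Assign at most one image per section, maximising total similarity."""
    s = embed_sections(section_texts)
    v = embed_images(image_paths)
    # cosine similarity between every (section, image) pair
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    sim = s @ v.T
    # the Hungarian algorithm minimises cost, so negate the similarities
    rows, cols = linear_sum_assignment(-sim)
    return {section_texts[r]: image_paths[c] for r, c in zip(rows, cols)}
```

A global assignment of this kind prevents the same image from being reused across many sections, one small illustration of why the choice of optimization problem can matter in the evaluation.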
Forgotten Knowledge: Examining the Citational Amnesia in NLP
Citing papers is the primary method through which modern scientific writing
discusses and builds on past work. Collectively, citing a diverse set of papers
(in time and area of study) is an indicator of how widely the community is
reading. Yet, there is little work looking at broad temporal patterns of
citation. This work systematically and empirically examines: How far back in
time do we tend to go to cite papers? How has that changed over time, and what
factors correlate with this citational attention/amnesia? We chose NLP as our
domain of interest and analyzed approximately 71.5K papers to show and quantify
several key trends in citation. Notably, around 62% of cited papers are from
the immediate five years prior to publication, whereas only about 17% are more
than ten years old. Furthermore, we show that the median age and age diversity
of cited papers were steadily increasing from 1990 to 2014, but since then, the
trend has reversed, and current NLP papers have an all-time low temporal
citation diversity. Finally, we show that, unlike in the 1990s, the highly cited
papers in the last decade were also those with the least citation diversity,
likely contributing to the intense (and arguably harmful) recency focus. Code,
data, and a demo are available on the project homepage.
Comment: ACL 2023 Main Conference
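The headline statistics can be reproduced in outline from (citing year, cited year) pairs. The sketch below computes the share of citations falling within the previous five years, the share older than ten years, the median citation age, and a simple diversity measure; the input format and the use of standard deviation as the diversity proxy are assumptions for illustration, not the paper's exact definitions.

```python
import statistics

def citation_age_stats(pairs):
    """pairs: iterable of (citing_year, cited_year) tuples."""
    ages = [citing - cited for citing, cited in pairs if citing >= cited]
    n = len(ages)
    return {
        "share_within_5_years": sum(a <= 5 for a in ages) / n,
        "share_older_than_10_years": sum(a > 10 for a in ages) / n,
        "median_age": statistics.median(ages),
        # standard deviation of ages as a crude stand-in for age diversity
        "age_diversity": statistics.pstdev(ages),
    }

# Toy example: one 2023 paper citing work from 2021, 2019, and 2008.
print(citation_age_stats([(2023, 2021), (2023, 2019), (2023, 2008)]))
```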