9 research outputs found

    Wikification of Concept Mentions within Spoken Dialogues Using Domain Constraints from Wikipedia

    Abstract: While most previous work on Wikification has focused on written text, this paper presents a Wikification approach for spoken dialogues. A set of analyzers is proposed to learn dialogue-specific properties along with domain knowledge of conversations from Wikipedia. The analyzed properties are then used as constraints for generating candidates, and the candidates are ranked to find the appropriate links. Experimental results show that the proposed approach significantly improves performance on the task in human-human dialogues.
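    The two-stage approach the abstract describes can be made concrete with a minimal sketch: domain constraints first filter the candidate Wikipedia titles for a mention, and a simple score then ranks the survivors. The toy index, domain labels, and overlap-based scoring below are illustrative assumptions, not the paper's actual analyzers or ranking model.

```python
# Hypothetical toy data: surface form -> candidate Wikipedia titles,
# and title -> domain labels. A real system would derive these from
# Wikipedia itself.
TITLE_INDEX = {
    "jaguar": ["Jaguar", "Jaguar Cars", "Jacksonville Jaguars"],
}
TITLE_DOMAINS = {
    "Jaguar": {"animals"},
    "Jaguar Cars": {"automobiles"},
    "Jacksonville Jaguars": {"sports"},
}

def generate_candidates(mention, domain):
    """Keep only candidate titles consistent with the dialogue's domain."""
    return [t for t in TITLE_INDEX.get(mention.lower(), [])
            if domain in TITLE_DOMAINS.get(t, set())]

def rank(candidates, context_words):
    """Rank candidates by word overlap between the title and the context."""
    def score(title):
        return len(set(title.lower().split()) & context_words)
    return sorted(candidates, key=score, reverse=True)

candidates = generate_candidates("jaguar", "automobiles")
print(rank(candidates, {"car", "engine", "jaguar"}))  # ['Jaguar Cars']
```

    The domain constraint does the heavy lifting here: it prunes ambiguous candidates before ranking, which is the role the abstract assigns to the analyzed dialogue properties.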

    Entity Linking in Low-Annotation Data Settings

    Recent advances in natural language processing have focused on applying and adapting large pretrained language models to specific tasks. These models, such as BERT (Devlin et al., 2019) and BART (Lewis et al., 2020a), are pretrained on massive amounts of unlabeled text across a variety of domains. The impact of these pretrained models is visible in the task of entity linking, where a mention of an entity in unstructured text is matched to the relevant entry in a knowledge base. State-of-the-art linkers, such as those of Wu et al. (2020) and De Cao et al. (2021), leverage pretrained models as a foundation for their systems. However, these models are also trained on large amounts of annotated data, which is crucial to their performance. These large datasets often consist of domains that are easily annotated, such as Wikipedia or newswire text. Tailoring NLP tools to such a narrow range of textual domains, however, severely restricts their use in the real world. Many other domains, such as medicine or law, do not have large amounts of entity linking annotations available. Entity linking, which bridges the gap between massive amounts of unstructured text and structured repositories of knowledge, is equally crucial in these domains. Yet tools trained on newswire or Wikipedia annotations are unlikely to be well suited to identifying medical conditions mentioned in clinical notes. As most annotation efforts focus on English, similar challenges arise in building systems for non-English text. These domains often have relatively little annotated data, so other types of domain-specific data, such as unannotated text or highly curated structured knowledge bases, must be brought to bear. In these settings, it is crucial to translate lessons from tools tailored for high-annotation domains into algorithms suited for low-annotation domains.
This requires both leveraging broader types of data and understanding the unique challenges present in each domain.
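    The core task the abstract describes, matching a mention to the closest knowledge-base entry, can be sketched in a few lines. State-of-the-art linkers use learned dense encoders over pretrained models; the toy knowledge base and string-similarity baseline below are purely illustrative assumptions that make the task concrete, not any system from the abstract.

```python
from difflib import SequenceMatcher

# Hypothetical miniature knowledge base: id -> canonical entity name.
KNOWLEDGE_BASE = {
    "Q1": "myocardial infarction",
    "Q2": "migraine",
    "Q3": "malaria",
}

def link(mention):
    """Return the KB id whose canonical name is most similar to the mention."""
    def sim(name):
        return SequenceMatcher(None, mention.lower(), name).ratio()
    return max(KNOWLEDGE_BASE, key=lambda k: sim(KNOWLEDGE_BASE[k]))

print(link("myocardial infarct"))  # Q1
```

    A surface-similarity baseline like this fails exactly where the abstract says low-annotation domains are hard: clinical text may call Q1 a "heart attack", which shares no surface form with the KB entry and requires learned semantic matching.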

    Advanced Location-Based Technologies and Services

    Since the publication of the first edition in 2004, advances in mobile devices, positioning sensors, WiFi fingerprinting, and wireless communications, among others, have paved the way for new and advanced location-based services (LBSs). This second edition provides up-to-date information on LBSs, including WiFi fingerprinting, mobile computing, geospatial clouds, geospatial data mining, location privacy, and location-based social networking. It also includes new chapters on application areas such as LBSs for public health, indoor navigation, and advertising. In addition, the chapter on remote sensing has been revised to address recent advancements.