Search CORE

6 research outputs found

Expansion de requêtes à base de motifs et de Word Embeddings pour améliorer la recherche de microblogs

Author: Bendella Meryem
Quafafou Mohamed
Publication venue: HAL CCSD
Publication date: 27/03/2019
Field of study

International audienceSocial microblogging services have an especially significant role in our society. Twitter is one of the most popular microblogging sites used by people to find relevant information (e.g., breaking news, popular trends, information about people of interest, etc). In this context, retrieving information from such data has recently gained growing attention and opening new challenges. However, the size of such data and queries is usually short and may impact the search result. Query Expansion (QE) has the main task in this issue. In fact, words can have different meanings where only one is used for a given context. In this paper, we propose a QE method by considering the meaning of the context. Thus, we use patterns and Word Embeddings to expand users' queries. We experiment and evaluate the proposed method on the TREC dataset. Results show the effectiveness of the proposed approach and signify the combination of patterns and word embedding for enhanced microblog retrieval.Les services sociaux de microblogging jouent un rôle important dans notre société. Twitter est l'une des plateformes de microblogging les plus populaires, utilisées par les internautes pour trouver des informations pertinentes (sujets d'actualité, tendances populaires, informations sur certains internautes, etc.). Dans ce contexte, la recherche d'information provenant de telles données a récemment gagné un intérêt majeur et ouvert de nouveaux défis. Cependant, la taille de ces données ainsi que des requêtes est généralement courte et peut avoir un impact sur le résultat de la recherche. Cette dernière peut être améliorée à l'aide de l'expansion de requêtes. En effet, les mots peuvent avoir plusieurs sens dont un seul est utilisé pour un contexte donné. Dans cet article, nous proposons une méthode d'expansion de requêtes prenant en compte le sens du contexte. Nous utilisons les motifs et les plongements de mots pour étendre les requêtes des utilisateurs. L'évaluation expérimentale de la méthode proposée est menée sur la collection TREC. Les résultats montrent l'efficacité de l'approche en combinant des motifs avec des plongements de mots pour améliorer significativement la recherche de microblog

How good are stories told by Twitter? An NLP Approach

Author: Mafalda Rodrigues Castro
Publication venue
Publication date: 11/11/2022
Field of study

Repositório Aberto da Universidade do Porto

Geographic information extraction from texts

Author: Hu Xuke
Hu Yingjie
Kersten Jens
Resch Bernd
Publication venue
Publication date: 05/12/2023
Field of study

A large volume of unstructured texts, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although large progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data, to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss the recent advances, new ideas, and concepts but also identify research gaps in geographic information extraction

Institute of Transport Research:Publications

Ranking and Retrieval under Semantic Relevance

Author: Chen Tongfei
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 16/02/2021
Field of study

This thesis presents a series of conceptual and empirical developments on the ranking and retrieval of candidates under semantic relevance. Part I of the thesis introduces the concept of uncertainty in various semantic tasks (such as recognizing textual entailment) in natural language processing, and the machine learning techniques commonly employed to model these semantic phenomena. A unified view of ranking and retrieval will be presented, and the trade-off between model expressiveness, performance, and scalability in model design will be discussed. Part II of the thesis focuses on applying these ranking and retrieval techniques to text: Chapter 3 examines the feasibility of ranking hypotheses given a premise with respect to a human's subjective probability of the hypothesis happening, effectively extending the traditional categorical task of natural language inference. Chapter 4 focuses on detecting situation frames for documents using ranking methods. Then we extend the ranking notion to retrieval, and develop both sparse (Chapter 5) and dense (Chapter 6) vector-based methods to facilitate scalable retrieval for potential answer paragraphs in question answering. Part III turns the focus to mentions and entities in text, while continuing the theme on ranking and retrieval: Chapter 7 discusses the ranking of fine-grained types that an entity mention could belong to, leading to state-of-the-art performance on hierarchical multi-label fine-grained entity typing. Chapter 8 extends the semantic relation of coreference to a cross-document setting, enabling models to retrieve from a large corpus, instead of in a single document, when resolving coreferent entity mentions

Johns Hopkins University

JScholarship

Coping with low data availability for social media crisis message categorisation

Author: Wang Congcong
Publication venue
Publication date: 26/05/2023
Field of study

During crisis situations, social media allows people to quickly share information, including messages requesting help. This can be valuable to emergency responders, who need to categorise and prioritise these messages based on the type of assistance being requested. However, the high volume of messages makes it difficult to filter and prioritise them without the use of computational techniques. Fully supervised filtering techniques for crisis message categorisation typically require a large amount of annotated training data, but this can be difficult to obtain during an ongoing crisis and is expensive in terms of time and labour to create. This thesis focuses on addressing the challenge of low data availability when categorising crisis messages for emergency response. It first presents domain adaptation as a solution for this problem, which involves learning a categorisation model from annotated data from past crisis events (source domain) and adapting it to categorise messages from an ongoing crisis event (target domain). In many-to-many adaptation, where the model is trained on multiple past events and adapted to multiple ongoing events, a multi-task learning approach is proposed using pre-trained language models. This approach outperforms baselines and an ensemble approach further improves performance..

arXiv.org e-Print Archive

ECIR 2017 workshop on exploitation of social media for emergency relief and preparedness (SMERP 2017)

Author: Chakraborty Tanmoy
Ganguly Debasis
Ghosh Kripabandhu
Ghosh Saptarshi
Jones Gareth JF
Moens Marie-Francine
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

The first international workshop on Exploitation of Social Media for Emergency Relief and Preparedness (SMERP) was held in conjunction with the 2017 European Conference on Information Retrieval (ECIR) in Aberdeen, Scotland, UK. The aim of the workshop was to explore various technologies for extracting useful information from social media content in disaster situations. The workshop included a peer-reviewed research paper track, a data challenge, two keynote talks, and discussion sessions on the relevant open research challenges. This report presents an overview of the workshop, including the motivations behind organizing the workshop, and summaries of the research papers and keynote talks at the workshop. We also reflect on the future directions as inferred from discussion sessions during the workshop.status: publishe

Lirias