What you say and how you say it : joint modeling of topics and discourse in microblog conversations
This paper presents an unsupervised framework for jointly modeling topic content and discourse behavior in microblog conversations. Concretely, we propose a neural model to discover word clusters indicating what a conversation concerns (i.e., topics) and those reflecting how participants voice their opinions (i.e., discourse). Extensive experiments show that our model can yield both coherent topics and meaningful discourse behavior. A further study shows that our topic and discourse representations can benefit the classification of microblog messages, especially when they are jointly trained with the classifier.
Temporal Information Models for Real-Time Microblog Search
Real-time search in Twitter and other social media services is often biased
towards the most recent results due to the “in the moment” nature of topic
trends and their ephemeral relevance to users and media in general. However,
“in the moment”, it is often difficult to look at all emerging topics and single out
the important ones from the rest of the social media chatter. This thesis proposes
to leverage external sources to estimate the duration and burstiness of live
Twitter topics. It extends preliminary research where it was shown that temporal
re-ranking using external sources could indeed improve the accuracy of results.
To further explore this topic we pursued three significant novel approaches: (1)
multi-source information analysis that explores behavioral dynamics of users,
such as Wikipedia live edits and page view streams, to detect topic trends
and estimate the topic interest over time; (2) efficient methods for federated
query expansion to better capture query meaning; and (3) exploiting
multiple sources towards the detection of temporal query intent. It differs from
past approaches in the sense that it works over real-time queries, leveraging
live user-generated content. This approach contrasts with previous methods
that require an offline preprocessing step.
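The kind of temporal re-ranking described above can be sketched as follows. This is an illustrative toy, not the thesis's actual method: the exponential half-life decay, the page-view-based burstiness signal, and the blending weight `alpha` are all assumptions.

```python
import math

def temporal_rerank(results, pageviews, half_life_hours=6.0, alpha=0.7):
    """Re-rank (doc_id, relevance, age_hours) tuples by blending the
    retrieval score with an exponential recency decay and an external
    burstiness signal (e.g., normalized Wikipedia page-view spikes).

    alpha controls the trade-off between topical relevance and
    temporal evidence; half_life_hours sets how fast recency decays.
    """
    max_views = max(pageviews.values()) if pageviews else 1
    scored = []
    for doc_id, relevance, age_hours in results:
        # Score halves every half_life_hours (exponential decay).
        recency = math.exp(-math.log(2) * age_hours / half_life_hours)
        # External burstiness, normalized to [0, 1].
        burst = pageviews.get(doc_id, 0) / max_views
        scored.append((doc_id, alpha * relevance + (1 - alpha) * recency * burst))
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

With this blend, a slightly less relevant but fresh, bursty result can outrank a stale one, which matches the "in the moment" bias the abstract describes.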
iAggregator: Multidimensional Relevance Aggregation Based on a Fuzzy Operator
Recently, an increasing number of information retrieval studies have triggered a resurgence of interest in redefining the algorithmic estimation of relevance, which implies a shift from topical to multidimensional relevance assessment. A key underlying aspect that emerged when addressing this concept is the aggregation of the relevance assessments related to each of the considered dimensions. The most commonly adopted forms of aggregation are based on classical weighted means and linear combination schemes. Although some initiatives were recently proposed, none was concerned with the inherent dependencies and interactions existing among the relevance criteria, as is the case in many real-life applications. In this article, we present a new fuzzy-based operator, called iAggregator, for multidimensional relevance aggregation. Its main originality, beyond its ability to model interactions between different relevance criteria, lies in its generalization of many classical aggregation functions. To validate our proposal, we apply our operator within a tweet search task. Experiments using a standard benchmark, namely the Text REtrieval Conference (TREC) Microblog track, emphasize the relevance of our contribution when compared with traditional aggregation schemes. In addition, it outperforms state-of-the-art aggregation operators such as the Scoring and the And prioritized operators as well as some representative learning-to-rank algorithms.
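The article's operator itself is not reproduced here, but fuzzy aggregation that captures interactions between criteria is commonly formulated as a discrete Choquet integral with respect to a fuzzy measure (capacity). The sketch below is a generic illustration of that idea, with hypothetical criteria names and capacity values, not the iAggregator definition.

```python
def choquet(scores, mu):
    """Discrete Choquet integral of criterion scores with respect to a
    fuzzy measure mu, which maps frozensets of criteria to [0, 1].

    Unlike a weighted mean, mu assigns weight to *coalitions* of
    criteria, so it can encode synergy (mu(A u B) > mu(A) + mu(B))
    or redundancy between relevance dimensions.
    """
    # Process criteria in ascending order of their scores.
    items = sorted(scores.items(), key=lambda kv: kv[1])
    total, prev = 0.0, 0.0
    remaining = set(scores)  # criteria whose score is >= current level
    for criterion, value in items:
        total += (value - prev) * mu[frozenset(remaining)]
        prev = value
        remaining.remove(criterion)
    return total
```

When mu is additive (each coalition's weight is the sum of its members' weights), this reduces to a classical weighted mean, which mirrors the abstract's point that such operators generalize many standard aggregation functions.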
Social Media Text Processing and Semantic Analysis for Smart Cities
With the rise of Social Media, people obtain and share information almost
instantly on a 24/7 basis. Many research areas have tried to gain valuable
insights from these large volumes of freely available user generated content.
With the goal of extracting knowledge from social media streams that might be
useful in the context of intelligent transportation systems and smart cities,
we designed and developed a framework that provides functionalities for
parallel collection of geo-located tweets from multiple pre-defined bounding
boxes (cities or regions), including filtering of non-complying tweets, text
pre-processing for Portuguese and English language, topic modeling, and
transportation-specific text classifiers, as well as aggregation and data
visualization.
We performed an exploratory data analysis of geo-located tweets in 5
different cities: Rio de Janeiro, São Paulo, New York City, London and
Melbourne, comprising a total of more than 43 million tweets in a period of 3
months. Furthermore, we performed a large scale topic modelling comparison
between Rio de Janeiro and São Paulo. Interestingly, most of the topics are
shared between the two cities, which, despite being in the same country, are
considered very different in terms of population, economy and lifestyle.
We take advantage of recent developments in word embeddings and train such
representations from the collections of geo-located tweets. We then use a
combination of bag-of-embeddings and traditional bag-of-words to train
travel-related classifiers in both Portuguese and English to separate
travel-related content from unrelated content. We created specific gold-standard
data to perform an empirical evaluation of the resulting classifiers. The
results are in line with research in other application areas, showing the
robustness of word embeddings for learning word similarities that bag-of-words
is not able to capture.
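The feature combination described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the vocabulary, the embedding table, and the concatenated feature layout are all assumptions.

```python
import numpy as np

def features(tokens, vocab, embeddings):
    """Build a feature vector for a tokenized tweet by concatenating a
    bag-of-words count vector with the mean of the tokens' word
    embeddings (a "bag of embeddings")."""
    dim = next(iter(embeddings.values())).shape[0]
    bow = np.zeros(len(vocab))
    vecs = []
    for tok in tokens:
        if tok in vocab:
            bow[vocab[tok]] += 1  # sparse lexical evidence
        if tok in embeddings:
            vecs.append(embeddings[tok])  # dense semantic evidence
    boe = np.mean(vecs, axis=0) if vecs else np.zeros(dim)
    return np.concatenate([bow, boe])
```

The concatenated vector can then be fed to any standard linear classifier; the bag-of-words half captures exact lexical matches while the embedding half generalizes to unseen but similar words, which is the complementarity the abstract points to.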
A Fair and Comprehensive Comparison of Multimodal Tweet Sentiment Analysis Methods
Opinion and sentiment analysis is a vital task to characterize subjective
information in social media posts. In this paper, we present a comprehensive
experimental evaluation and comparison of six state-of-the-art methods, one of
which we re-implemented. In addition, we investigate different
textual and visual feature embeddings that cover different aspects of the
content, as well as the recently introduced multimodal CLIP embeddings.
Experimental results are presented for two different publicly available
benchmark datasets of tweets and corresponding images. In contrast to the
evaluation methodology of previous work, we introduce a reproducible and fair
evaluation scheme to make results comparable. Finally, we conduct an error
analysis to outline the limitations of the methods and possibilities for
future work.
Comment: Accepted at the Workshop on Multi-Modal Pre-Training for Multimedia
Understanding (MMPT 2021), co-located with ICMR 2021.
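One simple way to combine textual and visual embeddings of a tweet, in the spirit of the multimodal fusion the paper evaluates, is late fusion with a linear head per modality. This is a hedged sketch under assumed shapes and weights, not the paper's architecture; the L2 normalization mirrors how CLIP-style embeddings are typically used.

```python
import numpy as np

def late_fusion_score(text_emb, image_emb, w_text, w_image, bias=0.0):
    """Fuse modality-specific embeddings into a single sentiment logit:
    each embedding is L2-normalized, projected by its own learned
    weight vector, and the per-modality logits are summed."""
    t = text_emb / np.linalg.norm(text_emb)
    v = image_emb / np.linalg.norm(image_emb)
    return float(t @ w_text + v @ w_image + bias)
```

Because each modality keeps its own projection, a missing or uninformative image simply contributes a near-zero logit instead of corrupting the text signal, one practical argument for late over early fusion.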