140 research outputs found
Time-based Microblog Distillation
This paper presents a simple approach for identifying relevant and reliable news in the Twitter stream as soon as it emerges. The approach is based on a near-real-time system for sentiment analysis on Twitter, implemented by Fondazione Ugo Bordoni and modified to detect the most representative tweets in a specified time slot.
This work represents a first step towards a prototype that supports journalists in discovering news on Twitter.
Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search
Despite substantial interest in applications of neural networks to
information retrieval, neural ranking models have only been applied to standard
ad hoc retrieval tasks over web pages and newswire documents. This paper
proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network),
a novel neural ranking model specifically designed for ranking short social
media posts. We identify document length, informal language, and heterogeneous
relevance signals as features that distinguish documents in our domain, and
present a model specifically designed with these characteristics in mind. Our
model uses hierarchical convolutional layers to learn latent semantic
soft-match relevance signals at the character, word, and phrase levels. A
pooling-based similarity measurement layer integrates evidence from multiple
types of matches between the query, the social media post, and the URLs
contained in the post. Extensive experiments using Twitter data from the TREC
Microblog Tracks 2011--2014 show that our model significantly outperforms prior
feature-based as well as existing neural ranking models. To the best of our
knowledge, this paper presents the first substantial work tackling search over
social media posts using neural ranking models. Comment: AAAI 2019, 10 pages
#Precision: An Exploration of the Utility of User-Generated Metadata for the Creation of Precise Microblog Query-Expansion Systems
Twitter research provides a unique opportunity to answer fundamental questions regarding the best methods for the large-scale retrieval of extremely sparse documents. This study examines the utility of user-generated metadata as a source of candidate expansion terms for building precise microblog search engines. To explore this issue, several search engines were created using different genres of candidate expansion terms, confidence thresholds, and document parameters. The study demonstrates that user-generated metadata is useful for the precise retrieval of terse queries with high levels of associated conversation, such as movie awards or current events, but performs poorly on textually rich queries with lower levels of perceived conversation. Master of Science in Information Science
Deep Learning based Recommender System: A Survey and New Perspectives
With the ever-growing volume of online information, recommender systems have
been an effective strategy to overcome such information overload. The utility
of recommender systems cannot be overstated, given their widespread adoption in
many web applications, along with their potential to ameliorate many
problems related to over-choice. In recent years, deep learning has garnered
considerable interest in many research fields such as computer vision and
natural language processing, owing not only to stellar performance but also to the
attractive property of learning feature representations from scratch. The
influence of deep learning is also pervasive, recently demonstrating its
effectiveness when applied to information retrieval and recommender systems
research. Evidently, the field of deep learning in recommender systems is
flourishing. This article aims to provide a comprehensive review of recent
research efforts on deep learning based recommender systems. More concretely,
we devise a taxonomy of deep learning based recommendation models,
along with a comprehensive summary of the state of the art. Finally,
we expand on current trends and provide new perspectives pertaining to this
exciting new development of the field. Comment: The paper has been accepted by ACM Computing Surveys.
https://doi.acm.org/10.1145/328502
PREDICTION IN SOCIAL MEDIA FOR MONITORING AND RECOMMENDATION
Social media, including blogs and microblogs, provide a rich window into users' online activity. Monitoring social media datasets can be expensive due to the scale and inherent noise of such data streams. Monitoring and prediction can provide significant benefit for many applications, including brand monitoring and making recommendations. Consider a focal topic and posts on multiple blog channels on this topic. Being able to target a few potentially influential blog channels that will contain relevant posts is valuable. Once these channels have been identified, a user can proactively join the conversation to encourage positive word-of-mouth and to mitigate negative word-of-mouth.
Links between different blog channels, and retweets and mentions between different microblog users, are a proxy of information flow and influence. When trying to monitor where information will flow and who will be influenced by a focal user, it is valuable to predict future links, retweets and mentions. Predictions of users who will post on a focal topic or who will be influenced by a focal user can yield valuable recommendations.
In this thesis we address the problem of prediction in social media to select social media channels for monitoring and recommendation. Our analysis focuses on individual authors and linkers. We address a series of prediction problems, including future author prediction and future link prediction in the blogosphere, as well as prediction in microblogs such as Twitter.
For future author prediction in the blogosphere, where both network properties and content properties are available, we develop prediction methods inspired by information retrieval approaches that use historical posts in the blog channel for prediction. We also train a ranking support vector machine (SVM) to solve the problem, considering both network properties and content properties. We identify a number of features that have an impact on prediction accuracy. For future link prediction in the blogosphere, we compare multiple link prediction methods and show that our proposed solution, which combines the network properties of the blog with content properties, does better than methods that examine network properties or content properties in isolation. Most previous work has looked at only one or the other. For prediction in microblogs, where follower, retweet, and mention networks coexist, we propose a prediction model that utilizes the hybrid network. In this model, we define a potential function that reflects the likelihood of a candidate user having a specific type of link to a focal user in the future, and we formulate an optimization problem based on the principle of maximum likelihood to determine the model parameters. We propose different approximate approaches based on the prediction model. Our approaches are demonstrated to outperform baseline methods that consider only one network or utilize hybrid networks in a naive way. The prediction model can be applied to other similar problems where hybrid networks exist.
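The idea of combining network properties with content properties for future link prediction can be illustrated with a minimal sketch. This is not the thesis's method: the common-neighbour count, the bag-of-words cosine similarity, the mixing weight alpha, and all function names below are assumptions chosen only to show how the two kinds of signal might be blended into one ranking score.

```python
import math
from collections import Counter


def common_neighbors(graph, a, b):
    """Network signal (assumed): number of channels linked to both a and b."""
    return len(graph.get(a, set()) & graph.get(b, set()))


def cosine(text_a, text_b):
    """Content signal (assumed): cosine similarity of bag-of-words counts."""
    ca = Counter(text_a.lower().split())
    cb = Counter(text_b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_score(graph, posts, a, b, alpha=0.5):
    """Blend both signals; alpha is a hypothetical mixing weight."""
    n = common_neighbors(graph, a, b)
    network = n / (1 + n)  # squash the count into [0, 1)
    return alpha * network + (1 - alpha) * cosine(posts[a], posts[b])


def rank_candidates(graph, posts, focal, alpha=0.5):
    """Rank channels not yet linked to `focal` by the blended score."""
    linked = graph.get(focal, set())
    candidates = [c for c in posts if c != focal and c not in linked]
    return sorted(
        candidates,
        key=lambda c: link_score(graph, posts, focal, c, alpha),
        reverse=True,
    )
```

Setting alpha to 1.0 or 0.0 reduces the score to the network-only or content-only signal, which gives a simple way to compare a combined score against each signal in isolation.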
A Survey of Graph Neural Networks for Social Recommender Systems
Social recommender systems (SocialRS) simultaneously leverage user-to-item
interactions as well as user-to-user social relations for the task of
generating item recommendations to users. Additionally exploiting social
relations is clearly effective in understanding users' tastes due to the
effects of homophily and social influence. For this reason, SocialRS has
increasingly attracted attention. In particular, with the advance of Graph
Neural Networks (GNN), many GNN-based SocialRS methods have been developed
recently. Therefore, we conduct a comprehensive and systematic review of the
literature on GNN-based SocialRS. In this survey, we first identify 80 papers
on GNN-based SocialRS after annotating 2151 papers by following the PRISMA
framework (Preferred Reporting Items for Systematic Reviews and Meta-Analyses).
Then, we comprehensively review them in terms of their inputs and architectures
to propose a novel taxonomy: (1) input taxonomy includes 5 groups of input type
notations and 7 groups of input representation notations; (2) architecture
taxonomy includes 8 groups of GNN encoder, 2 groups of decoder, and 12 groups
of loss function notations. We classify the GNN-based SocialRS methods into
several categories as per the taxonomy and describe their details. Furthermore,
we summarize the benchmark datasets and metrics widely used to evaluate the
GNN-based SocialRS methods. Finally, we conclude this survey by presenting some
future research directions. Comment: GitHub repository with the curated list of papers:
https://github.com/claws-lab/awesome-GNN-social-recsy
Automatic extraction of mobility activities in microblogs
Integrated Master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201
Sentiment analysis and real-time microblog search
This thesis sets out to examine the role played by sentiment in real-time microblog search. The recent prominence of the real-time web is proving both challenging and disruptive for a number of areas of research, notably information retrieval and web data mining. User-generated content on the real-time web is perhaps best epitomised by content on microblogging platforms, such as Twitter. Given the substantial quantity of microblog posts that may be relevant to a user query at a given point in time, automated methods are required to enable users to sift through this information. As an area of research reaching maturity, sentiment analysis offers a promising direction for modelling the text content in microblog streams.
In this thesis we review the real-time web as a new area of focus for sentiment analysis, with a specific focus on microblogging. We propose a system and method for evaluating the effect of sentiment on perceived search quality in real-time microblog search scenarios. First, we evaluate sentiment analysis using supervised learning for classifying the short, informal content in microblog posts. We then evaluate our sentiment-based filtering system for microblog search in a user study with simulated real-time scenarios. Lastly, we conduct real-time user studies for the live broadcast of the popular television programme The X Factor and for the Leaders' Debate during the Irish General Election. We find that we are able to satisfactorily classify positive, negative and neutral sentiment in microblog posts. We also find that sentiment plays a significant role in many microblog search scenarios, and we observe some detrimental effects of filtering out certain sentiment types. We make a series of observations regarding associations between document-level sentiment and user feedback, including associations with user profile attributes and users’ prior topic sentiment.
Order-Disorder: Imitation Adversarial Attacks for Black-box Neural Ranking Models
Neural text ranking models have witnessed significant advancement and are
increasingly being deployed in practice. Unfortunately, they also inherit
adversarial vulnerabilities of general neural models, which have been detected
but remain underexplored by prior studies. Moreover, the inherited adversarial
vulnerabilities might be leveraged by blackhat SEO to defeat better-protected
search engines. In this study, we propose an imitation adversarial attack on
black-box neural passage ranking models. We first show that the target passage
ranking model can be transparentized and imitated by enumerating critical
queries/candidates and then training a ranking imitation model. Leveraging the
ranking imitation model, we can elaborately manipulate the ranking results and
transfer the manipulation attack to the target ranking model. For this purpose,
we propose an innovative gradient-based attack method, empowered by the
pairwise objective function, to generate adversarial triggers, which cause
premeditated disorderliness with very few tokens. To camouflage the triggers,
we add the next sentence prediction loss and the language model
fluency constraint to the objective function. Experimental results on passage
ranking demonstrate the effectiveness of the ranking imitation attack model and
adversarial triggers against various SOTA neural ranking models. Furthermore,
various mitigation analyses and human evaluation show the effectiveness of
camouflages when facing potential mitigation approaches. To motivate other
scholars to further investigate this novel and important problem, we make the
experiment data and code publicly available. Comment: 15 pages, 4 figures, accepted by ACM CCS 2022, Best Paper Nomination
PARADE: Passage Representation Aggregation for Document Reranking
We present PARADE, an end-to-end Transformer-based model that considers document-level context for document reranking. PARADE leverages passage-level relevance representations to predict a document relevance score, overcoming the limitations of previous approaches that perform inference on passages independently. Experiments on two ad-hoc retrieval benchmarks demonstrate PARADE's effectiveness over such methods. We conduct extensive analyses on PARADE's efficiency, highlighting several strategies for improving it. When combined with knowledge distillation, a PARADE model with 72% fewer parameters achieves effectiveness competitive with previous approaches using BERT-Base. Our code is available at https://github.com/canjiali/PARADE.
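The general idea of aggregating passage-level representations into a single document score can be sketched as follows. This is only an illustration under stated assumptions, not the PARADE implementation: the passage vectors are plain lists of floats standing in for encoder outputs, the max and mean pooling aggregators are simple stand-ins, and the linear scoring weights and all function names are hypothetical.

```python
def max_pool(passage_vectors):
    """Element-wise maximum across passage vectors (assumed aggregator)."""
    return [max(dim) for dim in zip(*passage_vectors)]


def mean_pool(passage_vectors):
    """Element-wise mean across passage vectors (assumed aggregator)."""
    n = len(passage_vectors)
    return [sum(dim) / n for dim in zip(*passage_vectors)]


def document_score(passage_vectors, weights, bias=0.0, aggregate=max_pool):
    """Aggregate passage representations first, then score the document once.

    Scoring happens on the aggregated document vector, not on each passage
    independently. `weights` and `bias` are hypothetical linear-layer
    parameters that a real model would learn.
    """
    doc_vector = aggregate(passage_vectors)
    return sum(w * x for w, x in zip(weights, doc_vector)) + bias
```

For example, with passage vectors [[1.0, 4.0], [3.0, 2.0]] and weights [0.5, 0.25], max pooling gives the document vector [3.0, 4.0], which the linear layer then maps to a single relevance score.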