The Influence of Frequency, Recency and Semantic Context on the Reuse of Tags in Social Tagging Systems
In this paper, we study factors that influence tag reuse behavior in social
tagging systems. Our work is guided by the activation equation of the cognitive
model ACT-R, which states that the usefulness of information in human memory
depends on the three factors usage frequency, recency and semantic context. It
is our aim to shed light on the influence of these factors on tag reuse. In our
experiments, we utilize six datasets from the social tagging systems Flickr,
CiteULike, BibSonomy, Delicious, LastFM and MovieLens, covering a range of
tagging settings. Our results confirm that frequency, recency and
semantic context positively influence the reuse probability of tags. However,
the extent to which each factor individually influences tag reuse strongly
depends on the type of folksonomy present in a social tagging system. Our work
can serve as a guideline for researchers and developers of tag-based recommender
systems when designing algorithms for social tagging environments.
Comment: Accepted by Hypertext 2016 conference as short paper
Studying Confirmation Bias in Hashtag Usage on Twitter
The micro-blogging platform Twitter allows its nearly 320 million monthly
active users to build a network of follower connections to other Twitter users
(i.e., followees) in order to subscribe to content posted by these users. With
this feature, Twitter has become one of the most popular social networks on the
Web and was also the first platform that offered the concept of hashtags.
Hashtags are freely-chosen keywords, which start with the hash character, to
annotate, categorize and contextualize Twitter posts (i.e., tweets).
Although hashtags are widely accepted and used by the Twitter community, the
heavy reuse of hashtags that are popular in the personal Twitter networks
(i.e., own hashtags and hashtags used by followees) can lead to filter bubble
effects and thus to situations in which only content associated with these
hashtags is presented to the user. These filter bubble effects are also highly
associated with the concept of confirmation bias, which is the tendency to
favor and reuse information that confirms personal preferences. One example
would be a Twitter user who is interested in political tweets of US president
Donald Trump. Depending on the hashtags used, the user could either be stuck in
a pro-Trump (e.g., #MAGA) or contra-Trump (e.g., #fakepresident) filter bubble.
Therefore, the goal of this paper is to study confirmation bias and filter
bubble effects in hashtag usage on Twitter by treating the reuse of hashtags as
a phenomenon that fosters confirmation bias.
Comment: Will be presented at European Computational Social Sciences Symposium
in Cologne, Germany
High Enough? Explaining and Predicting Traveler Satisfaction Using Airline Reviews
Air travel is one of the most frequently used means of transportation in our
everyday life. Thus, it is not surprising that an increasing number of
travelers share their experiences with airlines and airports in the form of online
reviews on the Web. In this work, we strive to explain and uncover the features
of airline reviews that contribute most to traveler satisfaction. To that end,
we examine reviews crawled from the Skytrax air travel review portal. Skytrax
provides four review categories to review airports, lounges, airlines and
seats. Each review category consists of several five-star ratings as well as
free-text review content. In this paper, we conduct a comprehensive feature
study and find that not only five-star rating information such as airport
queuing time and lounge comfort highly correlate with traveler satisfaction but
also textual features in the form of the inferred review text sentiment. Based
on our findings, we created classifiers to predict traveler satisfaction using
the best performing rating features. Our results reveal that given our
methodology, traveler satisfaction can be predicted with high accuracy.
Additionally, we find that training a model on the sentiment of the review text
provides a competitive alternative when no five-star rating information is
available. We believe that our work is of interest for researchers in the area
of modeling and predicting user satisfaction based on available review data on
the Web.
Comment: 5 pages + references, 2 tables, 7 figures
Assessing the Quality of Web Content
This paper describes our approach towards the ECML/PKDD Discovery Challenge
2010. The challenge consists of three tasks: (1) a Web genre and facet
classification task for English hosts, (2) an English quality task, and (3) a
multilingual quality task (German and French). In our approach, we create an
ensemble of three classifiers to predict unseen Web hosts, where each
classifier is trained on a different feature set. Our final NDCG on the whole
test set is 0.575 for Task 1, 0.852 for Task 2, and 0.81 (French) and 0.77
(German) for Task 3, which ranks second place in the ECML/PKDD Discovery
Challenge 2010.
Comment: 4 pages, ECML/PKDD 2010 Discovery Challenge Workshop
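The scores reported in this abstract are NDCG values. As a rough illustration of how such a score is computed, here is a minimal Python sketch. It assumes the common formulation with linear gain and a log2 rank discount; the exact variant used by the challenge organizers is not stated in the abstract, and the function names and relevance labels are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance labels.

    Assumption: linear gain and log2 discount, i.e. rel / log2(rank + 1)
    with ranks starting at 1.
    """
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels of hosts, in the order a classifier ranked them.
predicted_order = [2, 3, 0, 1]
score = ndcg(predicted_order)
```

A perfectly ordered list yields 1.0, and any misordering yields a value below 1.0, which is why the metric suits tasks where hosts are ranked by predicted quality.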
Trust-Based Collaborative Filtering: Tackling the Cold Start Problem Using Regular Equivalence
User-based Collaborative Filtering (CF) is one of the most popular approaches
to create recommender systems. This approach is based on finding the most
relevant k users from whose rating history we can extract items to recommend.
CF, however, suffers from data sparsity and the cold-start problem since users
often rate only a small fraction of available items. One solution is to
incorporate additional information into the recommendation process such as
explicit trust scores that are assigned by users to others or implicit trust
relationships that result from social connections between users. Such
relationships typically form a very sparse trust network, which can be utilized
to generate recommendations for users based on people they trust. In our work,
we explore the use of a measure from network science, i.e. regular equivalence,
applied to a trust network to generate a similarity matrix that is used to
select the k-nearest neighbors for recommending items. We evaluate our approach
on Epinions and we find that we can outperform related methods for tackling
cold-start users in terms of recommendation accuracy.
The Impact of Time on Hashtag Reuse in Twitter: A Cognitive-Inspired Hashtag Recommendation Approach
In our work [KPL17], we study temporal usage patterns of Twitter hashtags,
and we use the Base-Level Learning (BLL) equation from the cognitive
architecture ACT-R [An04] to model how a person reuses her own, individual
hashtags as well as hashtags from her social network. The BLL equation accounts
for the time-dependent decay of item exposure in human memory. According to
BLL, the usefulness of a piece of information (e.g., a hashtag) is defined by
how frequently and how recently it was used in the past, following a
time-dependent decay that is best modeled with a power-law distribution. We
used the BLL equation in our previous work to recommend tags in social
bookmarking systems [KL16]. Here [KPL17], we adopt the BLL equation to model
temporal reuse patterns of individual (i.e., reusing own hashtags) and social
hashtags (i.e., reusing hashtags that have previously been used by a followee)
and to build a cognitive-inspired hashtag recommendation algorithm. We
demonstrate the efficacy of our approach in two empirical social networks
crawled from Twitter, i.e., CompSci and Random (for details about the datasets,
see [KPL17]). Our results show that our approach can outperform current
state-of-the-art hashtag recommendation approaches.
Comment: 49. GI-Jahrestagung INFORMATIK 2019, Best of Data Science Track
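The frequency-and-recency idea behind the BLL equation in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm from [KPL17]: it assumes the standard ACT-R form B = ln(sum_j (t_ref - t_j)^(-d)) with the commonly used decay d = 0.5, and the function names, hashtags and timestamps are hypothetical.

```python
import math


def bll_score(usage_times, now, d=0.5):
    """Base-level activation of one hashtag.

    Each past use at time t contributes (now - t) ** (-d), so frequent and
    recent uses raise the score. The decay d = 0.5 is an assumed default.
    """
    total = sum((now - t) ** (-d) for t in usage_times if t < now)
    return math.log(total) if total > 0 else float("-inf")


def rank_hashtags(history, now, d=0.5):
    """Order hashtags by descending BLL score."""
    scores = {tag: bll_score(times, now, d) for tag, times in history.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical usage history: hashtag -> timestamps of past uses (in hours).
history = {
    "#often": [1, 2, 3],   # used three times, but long ago
    "#recent": [9],        # used once, very recently
    "#stale": [1],         # used once, long ago
}
ranking = rank_hashtags(history, now=10)
```

In this toy example, a hashtag used once long ago ranks last, while both a frequently used and a recently used hashtag score higher, reflecting the power-law decay described in the abstract.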
Research Data Explored: Citations versus Altmetrics
The study explores the citedness of research data, its distribution over time
and how it is related to the availability of a DOI (Digital Object Identifier)
in Thomson Reuters' DCI (Data Citation Index). We investigate if cited research
data "impact" the (social) web, reflected by altmetrics scores, and if there is
any relationship between the number of citations and the sum of altmetrics
scores from various social media platforms. Three tools are used to collect and
compare altmetrics scores, i.e. PlumX, ImpactStory, and Altmetric.com. In terms
of coverage, PlumX is the most helpful altmetrics tool. While research data
remain mostly uncited (about 85%), there has been a growing trend in citing
data sets published since 2007. Surprisingly, the percentage of the number of
cited research data with a DOI in DCI has decreased in recent years. Only
nine repositories account for research data with DOIs and two or more
citations. The number of cited research data with altmetrics scores is even
lower (4 to 9%) but shows a higher coverage of research data from the last
decade. However, no correlation between the number of citations and the total
number of altmetrics scores is observable. Certain data types (i.e. survey,
aggregate data, and sequence data) are more often cited and receive higher
altmetrics scores.
Comment: Accepted for publication at the 15th International Conference on
Scientometrics and Informetrics (ISSI 2015)
Beyond Accuracy Optimization: On the Value of Item Embeddings for Student Job Recommendations
In this work, we address the problem of recommending jobs to university
students. For this, we explore the utilization of neural item embeddings for
the task of content-based recommendation, and we propose to integrate the
factors of frequency and recency of interactions with job postings to combine
these item embeddings. We evaluate our job recommendation system on a dataset
of the Austrian student job portal Studo using prediction accuracy, diversity
and an adapted novelty metric. This paper demonstrates that utilizing frequency
and recency of interactions with job postings for combining item embeddings
results in a robust model with respect to accuracy and diversity, which also
provides the best adapted novelty results.
Comment: 4 pages, 2 figures, 1 table
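One way to picture combining item embeddings by frequency and recency of interactions is a weighted average in which each interaction is weighted by a time-decay term. The Python sketch below is only an illustration under assumptions: the power-law weight (now - t)^(-d) with d = 0.5, the function name, and the toy embeddings are hypothetical and are not taken from the paper.

```python
def user_profile(interactions, embeddings, now, d=0.5):
    """Combine job-posting embeddings into a single user vector.

    interactions: list of (item_id, timestamp) pairs; repeated items
    capture frequency, and the weight (now - t) ** (-d) captures recency.
    Both the weighting scheme and d = 0.5 are assumptions of this sketch.
    """
    dim = len(next(iter(embeddings.values())))
    profile = [0.0] * dim
    total_weight = 0.0
    for item_id, t in interactions:
        if t >= now:
            continue  # ignore interactions that are not in the past
        weight = (now - t) ** (-d)
        total_weight += weight
        for k, value in enumerate(embeddings[item_id]):
            profile[k] += weight * value
    if total_weight == 0:
        return profile
    return [p / total_weight for p in profile]


# Hypothetical two-dimensional embeddings of two job postings.
embeddings = {"job_a": [1.0, 0.0], "job_b": [0.0, 1.0]}
# job_a was viewed recently (t = 9), job_b earlier (t = 6).
profile = user_profile([("job_a", 9), ("job_b", 6)], embeddings, now=10)
```

The resulting vector leans towards the recently viewed posting and could then be compared against candidate job embeddings, for example with cosine similarity.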
Exploring Coverage and Distribution of Identifiers on the Scholarly Web
In a scientific publishing environment that is increasingly moving online,
identifiers of scholarly work are gaining in importance. In this paper, we
analysed identifier distribution and coverage of articles from the discipline
of quantitative biology using arXiv, Mendeley and CrossRef as data sources. The
results show that when retrieving arXiv articles from Mendeley, we were able to
find more papers using the DOI than the arXiv ID. This indicates that DOI may
be a better identifier with respect to findability. We also find that coverage
of articles on Mendeley decreases in the most recent years, whereas the
coverage of DOIs does not decrease in the same order of magnitude. This hints
at the fact that there is a certain time lag involved, before articles are
covered in crowd-sourced services on the scholarly web.
Comment: Accepted for publication at the 14th International Symposium of
Information Science (ISI 2015)
Modeling Artist Preferences of Users with Different Music Consumption Patterns for Fair Music Recommendations
Music recommender systems have become central parts of popular streaming
platforms such as Last.fm, Pandora, or Spotify to help users find music that
fits their preferences. These systems learn from the past listening events of
users to recommend music a user will likely listen to in the future. Here,
current algorithms typically employ collaborative filtering (CF) utilizing
similarities between users' listening behaviors. Some approaches also combine
CF with content features into hybrid recommender systems. While music
recommender systems can provide quality recommendations to listeners of
mainstream music artists, recent research has shown that they tend to
discriminate against listeners of unorthodox, low-mainstream artists. This is foremost
due to the scarcity of usage data of low-mainstream music as music consumption
patterns are biased towards popular artists. Thus, the objective of our work is
to provide a novel approach for modeling artist preferences of users with
different music consumption patterns and listening habits.
Comment: EuroCSS'2019 Symposium, Zurich, Switzerland