661 research outputs found

    The Influence of Frequency, Recency and Semantic Context on the Reuse of Tags in Social Tagging Systems

    In this paper, we study factors that influence tag reuse behavior in social tagging systems. Our work is guided by the activation equation of the cognitive model ACT-R, which states that the usefulness of information in human memory depends on three factors: usage frequency, recency and semantic context. It is our aim to shed light on the influence of these factors on tag reuse. In our experiments, we utilize six datasets from the social tagging systems Flickr, CiteULike, BibSonomy, Delicious, LastFM and MovieLens, covering a range of tagging settings. Our results confirm that frequency, recency and semantic context positively influence the reuse probability of tags. However, the extent to which each factor individually influences tag reuse strongly depends on the type of folksonomy present in a social tagging system. Our work can serve as a guideline for researchers and developers of tag-based recommender systems when designing algorithms for social tagging environments. Comment: Accepted by Hypertext 2016 conference as short paper

    Studying Confirmation Bias in Hashtag Usage on Twitter

    The micro-blogging platform Twitter allows its nearly 320 million monthly active users to build a network of follower connections to other Twitter users (i.e., followees) in order to subscribe to content posted by these users. With this feature, Twitter has become one of the most popular social networks on the Web and was also the first platform that offered the concept of hashtags. Hashtags are freely chosen keywords, which start with the hash character, to annotate, categorize and contextualize Twitter posts (i.e., tweets). Although hashtags are widely accepted and used by the Twitter community, the heavy reuse of hashtags that are popular in the personal Twitter networks (i.e., own hashtags and hashtags used by followees) can lead to filter bubble effects and thus to situations in which only content associated with these hashtags is presented to the user. These filter bubble effects are also highly associated with the concept of confirmation bias, which is the tendency to favor and reuse information that confirms personal preferences. One example would be a Twitter user who is interested in political tweets of US president Donald Trump. Depending on the hashtags used, the user could either be stuck in a pro-Trump (e.g., #MAGA) or contra-Trump (e.g., #fakepresident) filter bubble. Therefore, the goal of this paper is to study confirmation bias and filter bubble effects in hashtag usage on Twitter by treating the reuse of hashtags as a phenomenon that fosters confirmation bias. Comment: Will be presented at European Computational Social Sciences Symposium in Cologne, Germany

    High Enough? Explaining and Predicting Traveler Satisfaction Using Airline Reviews

    Air travel is one of the most frequently used means of transportation in our everyday life. Thus, it is not surprising that an increasing number of travelers share their experiences with airlines and airports in the form of online reviews on the Web. In this work, we strive to explain and uncover the features of airline reviews that contribute most to traveler satisfaction. To that end, we examine reviews crawled from the Skytrax air travel review portal. Skytrax provides four review categories to review airports, lounges, airlines and seats. Each review category consists of several five-star ratings as well as free-text review content. In this paper, we conduct a comprehensive feature study and find that not only five-star rating information such as airport queuing time and lounge comfort highly correlates with traveler satisfaction but also textual features in the form of the inferred review text sentiment. Based on our findings, we create classifiers to predict traveler satisfaction using the best performing rating features. Our results reveal that, given our methodology, traveler satisfaction can be predicted with high accuracy. Additionally, we find that training a model on the sentiment of the review text provides a competitive alternative when no five-star rating information is available. We believe that our work is of interest for researchers in the area of modeling and predicting user satisfaction based on available review data on the Web. Comment: 5 pages + references, 2 tables, 7 figures

    Assessing the Quality of Web Content

    This paper describes our approach towards the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual quality task (German and French). In our approach, we create an ensemble of three classifiers to predict unseen Web hosts, where each classifier is trained on a different feature set. Our final NDCG on the whole test set is 0.575 for Task 1, 0.852 for Task 2, and 0.81 (French) and 0.77 (German) for Task 3, which ranks second place in the ECML/PKDD Discovery Challenge 2010. Comment: 4 pages, ECML/PKDD 2010 Discovery Challenge Workshop
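
    The scores above are reported as NDCG (normalized discounted cumulative gain). As a rough illustration of how such a score is computed, here is a minimal sketch using the common log2-discount formulation. The challenge's exact variant (gain function, cutoff) is not stated in the abstract, so the formula and the function names `dcg` and `ndcg` are assumptions for illustration only.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumes the common formulation rel / log2(rank + 1) with ranks starting at 1;
    the challenge may have used a different gain or discount.
    """
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades, listed in the order a classifier ranked the hosts.
predicted_order = [1, 3, 2]
score = ndcg(predicted_order)
```

    A ranking that already lists the items in descending relevance yields a score of 1, and any misordering lowers it.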

    Trust-Based Collaborative Filtering: Tackling the Cold Start Problem Using Regular Equivalence

    User-based Collaborative Filtering (CF) is one of the most popular approaches to create recommender systems. This approach is based on finding the most relevant k users from whose rating history we can extract items to recommend. CF, however, suffers from data sparsity and the cold-start problem since users often rate only a small fraction of available items. One solution is to incorporate additional information into the recommendation process such as explicit trust scores that are assigned by users to others or implicit trust relationships that result from social connections between users. Such relationships typically form a very sparse trust network, which can be utilized to generate recommendations for users based on people they trust. In our work, we explore the use of a measure from network science, namely regular equivalence, applied to a trust network to generate a similarity matrix that is used to select the k-nearest neighbors for recommending items. We evaluate our approach on Epinions and we find that we can outperform related methods for tackling cold-start users in terms of recommendation accuracy.
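
    To make the idea of deriving a similarity matrix from a trust network more concrete, the sketch below uses the Katz-style definition of regular equivalence found in network science textbooks (sigma = alpha * A * sigma + I). The abstract does not state which formulation the authors use, so this definition, the damping value `alpha`, the toy network and the function names are all assumptions.

```python
def regular_equivalence(adj, alpha=0.2, iterations=50):
    """Iterate sigma <- alpha * A * sigma + I on an adjacency matrix (list of lists).

    Assumed textbook formulation; converges when alpha is below
    1 / (largest eigenvalue of A).
    """
    n = len(adj)
    sigma = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        sigma = [
            [
                alpha * sum(adj[i][m] * sigma[m][j] for m in range(n))
                + (1.0 if i == j else 0.0)
                for j in range(n)
            ]
            for i in range(n)
        ]
    return sigma


def top_k_neighbors(sigma, user, k):
    """Return the k users most similar to `user`, excluding the user itself."""
    others = [v for v in range(len(sigma)) if v != user]
    return sorted(others, key=lambda v: sigma[user][v], reverse=True)[:k]


# Hypothetical undirected trust network: a chain 0 - 1 - 2 - 3.
trust = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
similarity = regular_equivalence(trust)
neighbors_of_0 = top_k_neighbors(similarity, user=0, k=2)
```

    Because the similarity propagates along paths, user 0 receives a nonzero score even for users it does not trust directly, which is what makes such a measure attractive for cold-start users with few ratings.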

    The Impact of Time on Hashtag Reuse in Twitter: A Cognitive-Inspired Hashtag Recommendation Approach

    In our work [KPL17], we study temporal usage patterns of Twitter hashtags, and we use the Base-Level Learning (BLL) equation from the cognitive architecture ACT-R [An04] to model how a person reuses her own, individual hashtags as well as hashtags from her social network. The BLL equation accounts for the time-dependent decay of item exposure in human memory. According to BLL, the usefulness of a piece of information (e.g., a hashtag) is defined by how frequently and how recently it was used in the past, following a time-dependent decay that is best modeled with a power-law distribution. We used the BLL equation in our previous work to recommend tags in social bookmarking systems [KL16]. Here [KPL17], we adopt the BLL equation to model temporal reuse patterns of individual hashtags (i.e., reusing own hashtags) and social hashtags (i.e., reusing hashtags that have been previously used by a followee) and to build a cognitive-inspired hashtag recommendation algorithm. We demonstrate the efficacy of our approach in two empirical social networks crawled from Twitter, i.e., CompSci and Random (for details about the datasets, see [KPL17]). Our results show that our approach can outperform current state-of-the-art hashtag recommendation approaches. Comment: 49. GI-Jahrestagung INFORMATIK 2019, Best of Data Science Track
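
    The BLL equation described above can be sketched in a few lines. This is a minimal illustration of the standard ACT-R form, B = ln(sum over past uses of (now - t)^(-d)), and not the authors' implementation: the decay value d = 0.5 (a commonly used ACT-R default), the time unit, the sample history and the function names are assumptions.

```python
import math


def bll_activation(use_times, now, decay=0.5):
    """Base-level activation: ln of the summed power-law decayed past usages.

    `use_times` are timestamps of earlier uses; only uses strictly before
    `now` contribute. `decay` is the power-law exponent d (assumed 0.5).
    """
    total = sum((now - t) ** (-decay) for t in use_times if t < now)
    return math.log(total) if total > 0 else float("-inf")


def rank_hashtags(history, now, decay=0.5, top_n=3):
    """Rank hashtags by activation; `history` maps hashtag -> list of use times."""
    scores = {tag: bll_activation(times, now, decay) for tag, times in history.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical usage history (timestamps in arbitrary units, e.g. hours).
history = {
    "#ml": [1, 50, 99],    # used often, and very recently
    "#old": [1, 2, 3],     # used often, but long ago
    "#once": [98],         # used once, but recently
}
ranking = rank_hashtags(history, now=100)
```

    In this toy history, the hashtag used both often and recently ranks first, and a single recent use outranks three uses from long ago, which reflects the combined frequency and recency effect that BLL is meant to capture.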

    Research Data Explored: Citations versus Altmetrics

    The study explores the citedness of research data, its distribution over time and how it is related to the availability of a DOI (Digital Object Identifier) in Thomson Reuters' DCI (Data Citation Index). We investigate if cited research data "impact" the (social) web, reflected by altmetrics scores, and if there is any relationship between the number of citations and the sum of altmetrics scores from various social media platforms. Three tools are used to collect and compare altmetrics scores, i.e. PlumX, ImpactStory, and Altmetric.com. In terms of coverage, PlumX is the most helpful altmetrics tool. While research data remain mostly uncited (about 85%), there has been a growing trend in citing data sets published since 2007. Surprisingly, the percentage of cited research data with a DOI in DCI has decreased in recent years. Only nine repositories account for research data with DOIs and two or more citations. The number of cited research data with altmetrics scores is even lower (4 to 9%) but shows a higher coverage of research data from the last decade. However, no correlation between the number of citations and the total number of altmetrics scores is observable. Certain data types (i.e. survey, aggregate data, and sequence data) are more often cited and receive higher altmetrics scores. Comment: Accepted for publication at the 15th International Conference on Scientometrics and Informetrics (ISSI 2015)

    Beyond Accuracy Optimization: On the Value of Item Embeddings for Student Job Recommendations

    In this work, we address the problem of recommending jobs to university students. For this, we explore the utilization of neural item embeddings for the task of content-based recommendation, and we propose to integrate the factors of frequency and recency of interactions with job postings to combine these item embeddings. We evaluate our job recommendation system on a dataset of the Austrian student job portal Studo using prediction accuracy, diversity and an adapted novelty metric. This paper demonstrates that utilizing frequency and recency of interactions with job postings for combining item embeddings results in a robust model with respect to accuracy and diversity, which also provides the best adapted novelty results. Comment: 4 pages, 2 figures, 1 table

    Exploring Coverage and Distribution of Identifiers on the Scholarly Web

    In a scientific publishing environment that is increasingly moving online, identifiers of scholarly work are gaining in importance. In this paper, we analysed identifier distribution and coverage of articles from the discipline of quantitative biology using arXiv, Mendeley and CrossRef as data sources. The results show that when retrieving arXiv articles from Mendeley, we were able to find more papers using the DOI than the arXiv ID. This indicates that the DOI may be a better identifier with respect to findability. We also find that coverage of articles on Mendeley decreases in the most recent years, whereas the coverage of DOIs does not decrease to the same extent. This suggests that a certain time lag is involved before articles are covered in crowd-sourced services on the scholarly web. Comment: Accepted for publication at the 14th International Symposium of Information Science (ISI 2015)

    Modeling Artist Preferences of Users with Different Music Consumption Patterns for Fair Music Recommendations

    Music recommender systems have become central parts of popular streaming platforms such as Last.fm, Pandora, or Spotify to help users find music that fits their preferences. These systems learn from the past listening events of users to recommend music a user will likely listen to in the future. Here, current algorithms typically employ collaborative filtering (CF) utilizing similarities between users' listening behaviors. Some approaches also combine CF with content features into hybrid recommender systems. While music recommender systems can provide quality recommendations to listeners of mainstream music artists, recent research has shown that they tend to discriminate against listeners of unorthodox, low-mainstream artists. This is primarily due to the scarcity of usage data for low-mainstream music, as music consumption patterns are biased towards popular artists. Thus, the objective of our work is to provide a novel approach for modeling artist preferences of users with different music consumption patterns and listening habits. Comment: EuroCSS'2019 Symposium, Zurich, Switzerland