The Role of Cores in Recommender Benchmarking for Social Bookmarking Systems
Social bookmarking systems have established themselves as an important part of today’s Web. In such systems, tag recommender systems support users during the posting of a resource by suggesting suitable tags. Tag recommender algorithms have often been evaluated in offline benchmarking experiments. Yet, the particular setup of such experiments has rarely been analyzed. In particular, since the recommendation quality usually suffers from difficulties such as the sparsity of the data or the cold-start problem for new resources or users, datasets have often been pruned to so-called cores (specific subsets of the original datasets), without much consideration of the implications for the benchmarking results. In this article, we generalize the notion of a core by introducing the new notion of a set-core, which is independent of any graph structure, to overcome a structural drawback in the previous constructions of cores on tagging data. We show that problems caused by some types of cores can be eliminated using set-cores. Further, we present a thorough analysis of tag recommender benchmarking setups using cores. To that end, we conduct a large-scale experiment on four real-world datasets, in which we analyze the influence of different cores on the evaluation of recommendation algorithms. We show that the results of the comparison of different recommendation approaches depend on the selection of core type and level. For the benchmarking of tag recommender algorithms, our results suggest that the evaluation must be set up more carefully and should not be based on one arbitrarily chosen core type and level.
Posted, Visited, Exported: Altmetrics in the Social Tagging System BibSonomy
In social tagging systems, like Mendeley, CiteULike, and BibSonomy, users can post, tag, visit, or export scholarly publications. In this paper, we compare citations with metrics derived from users’ activities (altmetrics) in the popular social bookmarking system BibSonomy. Our analysis, using a corpus of more than 250,000 publications published before 2010, reveals that overall, citations and altmetrics in BibSonomy are mildly correlated. Furthermore, grouping publications by user-generated tags results in topic-homogeneous subsets that exhibit higher correlations with citations than the full corpus. We find that posts, exports, and visits of publications are correlated with citations and even bear predictive power over future impact. Machine learning classifiers predict whether the number of citations that a publication receives in a year exceeds the median number of citations in that year, based on the usage counts of the preceding year. In that setup, a Random Forest predictor outperforms the baseline on average by seven percentage points.
It’s all about information? The Following Behaviour of Professors and PhD Students on Twitter
In this paper we investigate the role of academic status in the following behaviour of computer scientists on Twitter. Based on a uses and gratifications perspective, we focus on the activity of a Twitter account and the reciprocity of following relationships. We propose that the account activity addresses the users' information motive only, whereas the user's academic status relates to both the information motive and community development (as in peer networking or career planning). Variables were extracted from Twitter user data. We applied a biographical approach to correctly identify the academic status (professor versus PhD student). We calculated a MANOVA on the influence of the activity of the account and the academic status (on different groups of followers) to differentiate the influence of the information motive versus the motive for community development. Results suggest that for computer scientists Twitter is mainly an information network. However, we found significant effects in the sense of career planning: even in the case of low activity, the accounts of professors had a relatively high number of researcher followers, both PhD followers and professor followers. Additionally, there was some weak evidence for community development gratifications in the sense of peer networking among professors. Overall, we conclude that the academic use of Twitter is not only about information, but also about career planning and networking.
On the Complexity of Shared Conceptualizations
In the Social Web, folksonomies and other similar knowledge organization techniques may suffer limitations due to both different users’ tagging behaviours and semantic heterogeneity. In order to estimate how a social tagging network organizes its resources, focusing on sharing (implicit) conceptual schemes, we apply an agent-based reconciliation knowledge system based on Formal Concept Analysis. This article describes various experiments that focus on conceptual structures of the reconciliation process as applied to the Delicious bookmarking service. Results show the prevalence of sharing tagged resources in order to be used by other users as recommendations.
Ministerio de Ciencia e Innovación TIN2009-09492; Junta de Andalucía TIC-606
Towards a Soft Evaluation and Refinement of Tagging in Digital Humanities
In this paper we estimate the soundness of tagging in digital repositories within the field of Digital Humanities by studying the (semantic) conceptual structure behind the folksonomy. The use of association rules associated with this conceptual structure (Stem and Luxenburger basis) allows us to faithfully (from a semantic point of view) complete the tagging (or suggest such a completion).
Ministerio de Economía y Competitividad TIN2013-41086-P; Junta de Andalucía TIC-606
Tag Recommendation for Large-Scale Ontology-Based Information Systems
We tackle the problem of improving the relevance of automatically selected tags in large-scale ontology-based information systems. Contrary to traditional settings where tags can be chosen arbitrarily, we focus on the problem of recommending tags (e.g., concepts) directly from a collaborative, user-driven ontology. We compare the effectiveness of a series of approaches to select the best tags, ranging from traditional IR techniques such as TF/IDF weighting to novel techniques based on ontological distances and latent Dirichlet allocation. All our experiments are run against a real corpus of tags and documents extracted from the ScienceWise portal, which is connected to ArXiv.org and is currently used by a growing number of researchers. The datasets for the experiments are made available online for reproducibility purposes.