37 research outputs found

    Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia

    Hyperlinks are an essential feature of the World Wide Web. They are especially important for online encyclopedias such as Wikipedia: an article can often only be understood in the context of related articles, and hyperlinks make it easy to explore this context. But important links are often missing, and several methods have been proposed to alleviate this problem by learning a linking model based on the structure of the existing links. Here we propose a novel approach to identifying missing links in Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia's navigability. We leverage data sets of navigation paths collected through a Wikipedia-based human-computation game in which users must find a short path from a start to a target article by only clicking links encountered along the way. We harness human navigational traces to identify a set of candidates for missing links and then rank these candidates. Experiments show that our procedure identifies missing links of high quality.

    Effective and Efficient Similarity Index for Link Prediction of Complex Networks

    Predictions of missing links in incomplete networks, such as protein-protein interaction networks, or of very likely but not yet existing links in evolving networks, such as friendship networks in web society, can serve as a guideline for further experiments or as valuable information for web users. In this paper, we introduce a local path index to estimate the likelihood of the existence of a link between two nodes. We propose a network model with controllable density and noise strength in generating links, and we collect data on six real networks. Extensive numerical simulations on both modeled and real networks demonstrate the high effectiveness and efficiency of the local path index compared with two well-known and widely used indices, the common neighbors index and the Katz index. Indeed, the local path index provides predictions competitive in accuracy with the Katz index while requiring much less CPU time and memory, which makes it a strong candidate for practical applications in data mining of huge networks. Comment: 8 pages, 5 figures, 3 tables
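
The abstract above does not state the formula for the local path index. A minimal sketch, assuming the form commonly cited in the link-prediction literature, S = A^2 + εA^3, where A is the adjacency matrix and ε is a small free parameter; the function names and the default value of `epsilon` are illustrative assumptions, not taken from the paper.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def local_path_index(adj, epsilon=0.01):
    """Score node pairs as A^2 + epsilon * A^3 (assumed form of the index).

    A^2 counts paths of length 2 (common neighbors); A^3 counts walks of
    length 3, down-weighted by epsilon.
    """
    a2 = matmul(adj, adj)
    a3 = matmul(a2, adj)
    n = len(adj)
    return [[a2[i][j] + epsilon * a3[i][j] for j in range(n)]
            for i in range(n)]


# Toy example: the path graph 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = local_path_index(adj, epsilon=0.01)
```

In this toy graph, nodes 0 and 3 share no neighbor, so common neighbors alone would score the pair zero; the length-3 term gives it a small positive score instead.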

    Empirical analysis of web-based user-object bipartite networks

    Understanding the structure and evolution of web-based user-object networks is a significant task, since they play a crucial role in e-commerce nowadays. This Letter reports an empirical analysis of two large-scale web sites, audioscrobbler.com and del.icio.us, where users are connected with music groups and bookmarks, respectively. The degree distributions and degree-degree correlations for both users and objects are reported. We propose a new index, named the collaborative clustering coefficient, to quantify the clustering behavior based on collaborative selection. Accordingly, the clustering properties and clustering-degree correlations are investigated. We report some novel phenomena that characterize the selection mechanism of web users and outline the relevance of these phenomena to the information recommendation problem. Comment: 6 pages, 7 figures and 1 table

    Predicting Missing Links via Local Information

    Missing link prediction in networks is of both theoretical interest and practical significance in modern science. In this paper, we empirically investigate a simple framework of link prediction on the basis of node similarity. We compare nine well-known local similarity measures on six real networks. The results indicate that the simplest measure, namely common neighbors, has the best overall performance, and the Adamic-Adar index performs second best. A new similarity measure, motivated by the resource allocation process taking place on networks, is proposed and shown to have higher prediction accuracy than common neighbors. It is found that many links are assigned the same scores if only the information of the nearest neighbors is used. We therefore design another new measure that exploits information on the next nearest neighbors, which can remarkably enhance the prediction accuracy. Comment: For International Workshop: "The Physics Approach To Risk: Agent-Based Models and Networks", http://intern.sg.ethz.ch/cost-p10
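
The abstract names three of the local measures without giving formulas. A minimal sketch, assuming their standard definitions: common neighbors counts shared neighbors, Adamic-Adar sums 1/log(k_z) over shared neighbors z of degree k_z, and the resource allocation index sums 1/k_z. The helper names and the toy edge list are hypothetical.

```python
import math


def neighbors(edges):
    """Build a node -> set-of-neighbors map from an undirected edge list."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def common_neighbors(nbrs, x, y):
    """Number of neighbors shared by x and y."""
    return len(nbrs[x] & nbrs[y])


def adamic_adar(nbrs, x, y):
    """Sum of 1/log(degree) over shared neighbors.

    A shared neighbor of two distinct nodes has degree >= 2, so the
    logarithm is strictly positive.
    """
    return sum(1.0 / math.log(len(nbrs[z])) for z in nbrs[x] & nbrs[y])


def resource_allocation(nbrs, x, y):
    """Sum of 1/degree over shared neighbors."""
    return sum(1.0 / len(nbrs[z]) for z in nbrs[x] & nbrs[y])


# Toy graph: a and b share neighbors c (degree 2) and d (degree 3).
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
nbrs = neighbors(edges)
cn = common_neighbors(nbrs, "a", "b")
aa = adamic_adar(nbrs, "a", "b")
ra = resource_allocation(nbrs, "a", "b")
```

The three measures differ only in how much weight a shared neighbor receives: a constant weight, a logarithmic penalty on degree, or a linear penalty on degree.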

    An Effective Algorithm for Dimensional Reduction in Collaborative Filtering

    Improving Content-based and Hybrid Music Recommendation using Deep Learning

    Combining content and sentiment analysis on lyrics for a lightweight emotion-aware Chinese song recommendation system

    Traditional music recommendation systems (RS) rely on collaborative filtering (CF) techniques to recommend songs or artists, making recommendations based on neighborhood analysis of items or users. CF is computationally efficient and performs well when the data is ideally complete, but when user inputs are limited or few user/item inputs are available, it quickly loses its competitive advantage. Additionally, traditional RS techniques, including content-based ones, rely heavily on explicit user feedback (e.g. user ratings) to generate recommendations. In music and song recommendation, however, implicit feedback such as play frequency and playlists prevails. Making recommendations from such implicit feedback requires efficient and accurate latent factor learning techniques to construct the user or item feature space, which is inherently computationally costly. This paper presents a new, lightweight classification model for Chinese song RS based on computational analysis of the lingual part of song lyrics. By extracting and combining the term frequency and inverse document frequency (tf∗idf) from song lyrics, we construct a composite emotion point matrix for each song, which can then be used to classify songs by their inherent emotion and make recommendations accordingly.
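
The abstract does not specify which tf∗idf weighting is used or how the emotion point matrix is built from it. A minimal sketch of one common tf∗idf variant only, assuming tf is the term count divided by document length and idf is log(N / df); the tokenized lyrics below are hypothetical placeholders.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    This is one common variant, not necessarily the paper's weighting.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({term: (count / total) * math.log(n / df[term])
                        for term, count in counts.items()})
    return weights


# Hypothetical tokenized lyrics for three songs.
lyrics = [
    ["love", "love", "rain"],
    ["rain", "tears"],
    ["love", "sun"],
]
weights = tf_idf(lyrics)
```

Terms that appear in few songs, such as "tears" here, receive a higher idf than terms spread across many songs, which is what makes the weights useful as per-song features.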