140 research outputs found

    Scale-free network growth by ranking

    Network growth is currently explained through mechanisms that rely on node prestige measures, such as degree or fitness. In many real networks those who create and connect nodes do not know the prestige values of existing nodes, but only their ranking by prestige. We propose a criterion of network growth that explicitly relies on the ranking of the nodes according to any prestige measure, be it topological or not. The resulting network has a scale-free degree distribution when the probability of linking to a target node is any power-law function of its rank, even when one has only partial information about node ranks. Our criterion may explain the frequency and robustness of scale-free degree distributions in real networks, as illustrated by the special case of the Web graph. Comment: 4 pages, 2 figures. We extended the model to account for ranking by arbitrarily distributed fitness. Final version to appear in Physical Review Letters.
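The growth rule described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes nodes are ranked by age (the oldest node has rank 1), that each new node adds up to `m` links, and that a target of rank R is chosen with probability proportional to R^(-alpha). The function name `grow_by_rank` and its parameters are hypothetical.

```python
import random


def grow_by_rank(n_nodes, alpha=1.0, m=1, seed=None):
    """Grow a network by rank-based attachment (illustrative sketch).

    Assumption: rank equals arrival order, so node 0 has rank 1,
    node 1 has rank 2, and so on. Any other prestige ranking could be
    substituted by changing how `weights` is built.
    """
    rng = random.Random(seed)
    degree = [0] * n_nodes
    edges = []
    for new in range(1, n_nodes):
        existing = list(range(new))
        # Linking probability is a power law of the target's rank.
        weights = [(node + 1) ** (-alpha) for node in existing]
        n_links = min(m, new)
        targets = set()
        # Redraw until we have n_links distinct targets.
        while len(targets) < n_links:
            targets.add(rng.choices(existing, weights=weights, k=1)[0])
        for target in targets:
            edges.append((new, target))
            degree[new] += 1
            degree[target] += 1
    return edges, degree


if __name__ == "__main__":
    edges, degree = grow_by_rank(2000, alpha=1.0, m=2, seed=42)
    print("edges:", len(edges), "max degree:", max(degree))
```

Recomputing the weights at every step costs quadratic time, which is acceptable for a small demonstration but not for large networks.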

    Agents, Bookmarks and Clicks: A topical model of Web traffic

    Analysis of aggregate and individual Web traffic has shown that PageRank is a poor model of how people navigate the Web. Using the empirical traffic patterns generated by a thousand users, we characterize several properties of Web traffic that cannot be reproduced by Markovian models. We examine both aggregate statistics capturing collective behavior, such as page and link traffic, and individual statistics, such as entropy and session size. No model currently explains all of these empirical observations simultaneously. We show that all of these traffic patterns can be explained by an agent-based model that takes into account several realistic browsing behaviors. First, agents maintain individual lists of bookmarks (a non-Markovian memory mechanism) that are used as teleportation targets. Second, agents can retreat along visited links, a branching mechanism that also allows us to reproduce behaviors such as the use of a back button and tabbed browsing. Finally, agents are sustained by visiting novel pages of topical interest, with adjacent pages being more topically related to each other than distant ones. This modulates the probability that an agent continues to browse or starts a new session, allowing us to recreate heterogeneous session lengths. The resulting model is capable of reproducing the collective and individual behaviors we observe in the empirical data, reconciling the narrowly focused browsing patterns of individual users with the extreme heterogeneity of aggregate traffic measurements. This result allows us to identify a few salient features that are necessary and sufficient to interpret the browsing patterns observed in our data. In addition to the descriptive and explanatory power of such a model, our results may lead the way to more sophisticated, realistic, and effective ranking and crawling algorithms. Comment: 10 pages, 16 figures, 1 table. Long version of paper to appear in Proceedings of the 21st ACM Conference on Hypertext and Hypermedia.

    Scholarometer: A Social Framework for Analyzing Impact across Disciplines

    The use of quantitative metrics to gauge the impact of scholarly publications, authors, and disciplines is predicated on the availability of reliable usage and annotation data. Citation and download counts are widely available from digital libraries. However, current annotation systems rely on proprietary labels, refer to journals but not articles or authors, and are manually curated. To address these limitations, we propose a social framework based on crowdsourced annotations of scholars, designed to keep up with the rapidly evolving disciplinary and interdisciplinary landscape. We describe a system called Scholarometer, which provides a service to scholars by computing citation-based impact measures. This creates an incentive for users to provide disciplinary annotations of authors, which in turn can be used to compute disciplinary metrics. First, we present the system architecture and several heuristics to deal with noisy bibliographic and annotation data. We report on data sharing and interactive visualization services enabled by Scholarometer. Usage statistics, illustrating the data collected and shared through the framework, suggest that the proposed crowdsourcing approach can be successful. Second, we illustrate how the disciplinary bibliometric indicators elicited by Scholarometer allow us to implement for the first time a universal impact measure proposed in the literature. Our evaluation suggests that this metric provides an effective means for comparing scholarly impact across disciplinary boundaries. © 2012 Kaur et al.

    Human dynamics revealed through Web analytics

    When the World Wide Web was first conceived as a way to facilitate the sharing of scientific information at CERN (European Center for Nuclear Research), few could have imagined the role it would come to play in the following decades. Since then, the increasing ubiquity of Internet access and the frequency with which people interact with it have raised the possibility of using the Web to better observe, understand, and monitor several aspects of human social behavior. Web sites with large numbers of frequently returning users are ideal for this task. If these sites belong to companies or universities, their usage patterns can furnish information about the working habits of entire populations. In this work, we analyze the properly anonymized logs detailing the access history of Emory University's Web site. Emory is a medium-sized university located in Atlanta, Georgia. We find interesting structure in the activity patterns of the domain and systematically study the main forces behind the dynamics of the traffic. In particular, we show that both linear preferential linking and priority-based queuing are essential ingredients for understanding the way users navigate the Web. Comment: 7 pages, 8 figures.

    Clustering and the hyperbolic geometry of complex networks

    Clustering is a fundamental property of complex networks and is the mathematical expression of a ubiquitous phenomenon that arises in various types of self-organized networks, such as biological networks, computer networks, or social networks. In this paper, we consider the global clustering coefficient of random graphs on the hyperbolic plane. This model of random graphs was proposed recently by Krioukov et al. as a mathematical model of complex networks, under the fundamental assumption that hyperbolic geometry underlies the structure of these networks. We give a rigorous analysis of clustering and characterize the global clustering coefficient in terms of the parameters of the model. We show how the global clustering coefficient can be tuned by these parameters and we give an explicit formula for this function. Comment: 51 pages, 1 figure.
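For readers unfamiliar with the quantity studied in this abstract, the sketch below computes a global clustering coefficient for an arbitrary undirected graph. It assumes the usual definition (three times the number of triangles divided by the number of connected triples) and says nothing about the hyperbolic random graph model itself. The function name `global_clustering` is hypothetical.

```python
from itertools import combinations


def global_clustering(edges):
    """Return (closed triples) / (all connected triples) for an undirected graph.

    `edges` is an iterable of (u, v) pairs. Self-loops are ignored and
    repeated edges are counted once.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    triples = 0
    closed = 0
    for neighbours in adj.values():
        # Every pair of neighbours forms a connected triple centred here.
        for a, b in combinations(neighbours, 2):
            triples += 1
            if b in adj[a]:
                # Each triangle is counted once per centre, i.e. three times.
                closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    # A triangle with one pendant node attached.
    print(global_clustering([(0, 1), (1, 2), (0, 2), (2, 3)]))
```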

    Towards the characterization of individual users through Web analytics

    We analyze the way individual users navigate the Web. We focus primarily on the temporal patterns of their returns to a given page. The return probability as a function of time and the distribution of time intervals between consecutive visits are measured and found to be independent of the level of activity of single users. The results indicate a rich variety of individual behaviors and seem to preclude the possibility of defining a characteristic frequency for each user in his/her visits to a single site. Comment: 8 pages, 4 figures. To appear in Proceedings of Complex'0
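The time intervals between consecutive visits mentioned in this abstract could be extracted from an access log roughly as follows. This is a sketch under stated assumptions, not the authors' procedure: the log is taken to be an iterable of `(user, page, timestamp)` tuples with numeric timestamps, and the function name `inter_visit_intervals` is hypothetical.

```python
from collections import defaultdict


def inter_visit_intervals(log):
    """Map each user to the gaps between consecutive visits to the same page.

    `log` is an iterable of (user, page, timestamp) tuples in any order.
    """
    visits = defaultdict(list)
    for user, page, timestamp in log:
        visits[(user, page)].append(timestamp)

    intervals = defaultdict(list)
    for (user, _page), times in visits.items():
        times.sort()
        # Difference between each visit and the previous visit to that page.
        intervals[user].extend(b - a for a, b in zip(times, times[1:]))
    return dict(intervals)


if __name__ == "__main__":
    sample = [
        ("u1", "/a", 0),
        ("u1", "/a", 10),
        ("u1", "/b", 3),
        ("u1", "/a", 25),
        ("u2", "/a", 5),
    ]
    print(inter_visit_intervals(sample))
```

Grouping the resulting intervals by user activity level would then allow the distributions to be compared across users.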

    Large-scale structural organization of social networks

    The characterization of the large-scale structural organization of social networks is an important interdisciplinary problem. We show, by using scaling analysis and numerical computation, that the following factors are relevant for models of social networks: the correlation between friendship ties among people and the position of their social groups, as well as the correlation between the positions of different social groups to which a person belongs. Comment: 5 pages, 3 figures, RevTeX.

    Bridging the demand and the offer in data science

    During the last several years, we have observed an exponential increase in the demand for Data Scientists in the job market. As a result, a number of trainings, courses, books, and university educational programs (at undergraduate, graduate, and postgraduate levels) have been labeled as “Big data” or “Data Science”; the common thread among them is the aim of training people with the right competencies and skills to satisfy the needs of the business sector. In this paper, we report on some of the exercises done in analyzing the current Data Science education offer and matching it with the needs of the job market, in order to propose a scalable matching service, i.e., COmpetencies ClassificatiOn (E‐CO‐2), based on Data Science techniques. The E‐CO‐2 service can help to extract relevant information from Data Science–related documents (course descriptions, job ads, blogs, or papers), which enables the comparison of demand and offer in the field of Data Science education and HR management, ultimately helping to establish the profession of Data Scientist.