Scale-free network growth by ranking
Network growth is currently explained through mechanisms that rely on node
prestige measures, such as degree or fitness. In many real networks those who
create and connect nodes do not know the prestige values of existing nodes, but
only their ranking by prestige. We propose a criterion of network growth that
explicitly relies on the ranking of the nodes according to any prestige
measure, be it topological or not. The resulting network has a scale-free
degree distribution when the probability to link a target node is any power law
function of its rank, even when one has only partial information of node ranks.
Our criterion may explain the frequency and robustness of scale-free degree
distributions in real networks, as illustrated by the special case of the Web
graph.
Comment: 4 pages, 2 figures. We extended the model to account for ranking by
arbitrarily distributed fitness. Final version to appear on Physical Review
Letter
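The growth rule summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `grow_by_rank`, the starting clique, the choice of degree as the prestige measure, the number of links per new node `m`, and the tie-breaking rule are all assumptions made for illustration. Each new node links to existing nodes with probability proportional to rank^(-alpha), where rank 1 is the node with the highest degree.

```python
import random

def grow_by_rank(n_nodes, alpha=1.0, m=1, seed=0):
    """Grow a network in which each new node links to m existing nodes,
    chosen with probability proportional to rank**(-alpha).
    Prestige is taken to be degree here (an assumption for this sketch)."""
    rng = random.Random(seed)
    degree = [0] * (m + 1)
    edges = []
    # Assumed initial condition: a clique of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            edges.append((i, j))
            degree[i] += 1
            degree[j] += 1
    for new in range(m + 1, n_nodes):
        # Rank existing nodes by degree, highest first; ties broken by node id.
        ranked = sorted(range(new), key=lambda v: (-degree[v], v))
        # Power-law weight of each rank (rank 1 gets weight 1).
        weights = [(r + 1) ** (-alpha) for r in range(new)]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(ranked, weights=weights, k=1)[0])
        degree.append(0)
        for t in sorted(targets):
            edges.append((new, t))
            degree[new] += 1
            degree[t] += 1
    return edges, degree

edges, degree = grow_by_rank(200, alpha=1.0, m=2)
print(len(edges), max(degree))
```

Only the ordering of the nodes enters the linking probability, never the degree values themselves, which is the point the abstract makes about ranking-based growth.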
Agents, Bookmarks and Clicks: A topical model of Web traffic
Analysis of aggregate and individual Web traffic has shown that PageRank is a
poor model of how people navigate the Web. Using the empirical traffic patterns
generated by a thousand users, we characterize several properties of Web
traffic that cannot be reproduced by Markovian models. We examine both
aggregate statistics capturing collective behavior, such as page and link
traffic, and individual statistics, such as entropy and session size. No model
currently explains all of these empirical observations simultaneously. We show
that all of these traffic patterns can be explained by an agent-based model
that takes into account several realistic browsing behaviors. First, agents
maintain individual lists of bookmarks (a non-Markovian memory mechanism) that
are used as teleportation targets. Second, agents can retreat along visited
links, a branching mechanism that also allows us to reproduce behaviors such as
the use of a back button and tabbed browsing. Finally, agents are sustained by
visiting novel pages of topical interest, with adjacent pages being more
topically related to each other than distant ones. This modulates the
probability that an agent continues to browse or starts a new session, allowing
us to recreate heterogeneous session lengths. The resulting model is capable of
reproducing the collective and individual behaviors we observe in the empirical
data, reconciling the narrowly focused browsing patterns of individual users
with the extreme heterogeneity of aggregate traffic measurements. This result
allows us to identify a few salient features that are necessary and sufficient
to interpret the browsing patterns observed in our data. In addition to the
descriptive and explanatory power of such a model, our results may lead the way
to more sophisticated, realistic, and effective ranking and crawling
algorithms.
Comment: 10 pages, 16 figures, 1 table - Long version of paper to appear in
Proceedings of the 21st ACM conference on Hypertext and Hypermedi
Scholarometer: A Social Framework for Analyzing Impact across Disciplines
The use of quantitative metrics to gauge the impact of scholarly publications, authors, and disciplines is predicated on the availability of reliable usage and annotation data. Citation and download counts are widely available from digital libraries. However, current annotation systems rely on proprietary labels, refer to journals but not articles or authors, and are manually curated. To address these limitations, we propose a social framework based on crowdsourced annotations of scholars, designed to keep up with the rapidly evolving disciplinary and interdisciplinary landscape. We describe a system called Scholarometer, which provides a service to scholars by computing citation-based impact measures. This creates an incentive for users to provide disciplinary annotations of authors, which in turn can be used to compute disciplinary metrics. We first present the system architecture and several heuristics to deal with noisy bibliographic and annotation data. We report on data sharing and interactive visualization services enabled by Scholarometer. Usage statistics, illustrating the data collected and shared through the framework, suggest that the proposed crowdsourcing approach can be successful. Secondly, we illustrate how the disciplinary bibliometric indicators elicited by Scholarometer allow us to implement for the first time a universal impact measure proposed in the literature. Our evaluation suggests that this metric provides an effective means for comparing scholarly impact across disciplinary boundaries. © 2012 Kaur et al
Human dynamics revealed through Web analytics
When the World Wide Web was first conceived as a way to facilitate the
sharing of scientific information at CERN (European Center for Nuclear
Research) few could have imagined the role it would come to play in the
following decades. Since then, the increasing ubiquity of Internet access and
the frequency with which people interact with it raise the possibility of using
the Web to better observe, understand, and monitor several aspects of human
social behavior. Web sites with large numbers of frequently returning users are
ideal for this task. If these sites belong to companies or universities, their
usage patterns can furnish information about the working habits of entire
populations. In this work, we analyze the properly anonymized logs detailing
the access history to Emory University's Web site. Emory is a medium-sized
university located in Atlanta, Georgia. We find interesting structure in the
activity patterns of the domain and study in a systematic way the main forces
behind the dynamics of the traffic. In particular, we show that both linear
preferential linking and priority based queuing are essential ingredients to
understand the way users navigate the Web.
Comment: 7 pages, 8 figure
Clustering and the hyperbolic geometry of complex networks
Clustering is a fundamental property of complex networks and it is the
mathematical expression of a ubiquitous phenomenon that arises in various types
of self-organized networks such as biological networks, computer networks or
social networks. In this paper, we consider what is called the global
clustering coefficient of random graphs on the hyperbolic plane. This model of
random graphs was proposed recently by Krioukov et al. as a mathematical model
of complex networks, under the fundamental assumption that hyperbolic geometry
underlies the structure of these networks. We give a rigorous analysis of
clustering and characterize the global clustering coefficient in terms of the
parameters of the model. We show how the global clustering coefficient can be
tuned by these parameters and we give an explicit formula for this function.
Comment: 51 pages, 1 figur
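For readers unfamiliar with the quantity studied in this abstract, the sketch below computes a global clustering coefficient on a small graph. It assumes the common definition (three times the number of triangles divided by the number of paths of length two); the paper's exact definition is not given in the abstract. The function name `global_clustering` and the adjacency-dictionary input are illustrative choices, and the sketch does not involve the hyperbolic random graph model itself.

```python
from itertools import combinations

def global_clustering(adj):
    """adj maps each node to the set of its neighbors
    (undirected simple graph, no self-loops)."""
    closed = 0   # closed triples: each triangle is counted once per corner
    triples = 0  # all paths of length two, counted at their center node
    for v, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0

# A triangle (0, 1, 2) with one pendant node attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(global_clustering(adj))  # 3 closed triples out of 5, i.e. 0.6
```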
Towards the characterization of individual users through Web analytics
We analyze the way individual users navigate the Web. We
focus primarily on the temporal patterns of their returns to a given page. The
return probability as a function of time as well as the distribution of time
intervals between consecutive visits are measured and found to be independent
of the level of activity of single users. The results indicate a rich variety
of individual behaviors and seem to preclude the possibility of defining a
characteristic frequency for each user in his/her visits to a single site.
Comment: 8 pages, 4 figures. To appear in Proceeding of Complex'0
Large-scale structural organization of social networks
The characterization of large-scale structural organization of social
networks is an important interdisciplinary problem. We show, by using scaling
analysis and numerical computation, that the following factors are relevant for
models of social networks: the correlation between friendship ties among people
and the position of their social groups, as well as the correlation between the
positions of different social groups to which a person belongs.
Comment: 5 pages, 3 figures, Revte
Bridging the demand and the offer in data science
During the last several years, we have observed an exponential increase in the demand for Data Scientists in the job market. As a result, a number of trainings, courses, books, and university educational programs (at undergraduate, graduate, and postgraduate levels) have been labeled as “Big data” or “Data Science”; the fil-rouge of each of them is the aim of training people with the right competencies and skills to satisfy the needs of the business sector. In this paper, we report on some of the exercises done in analyzing the current Data Science education offer and matching it with the needs of the job market, in order to propose a scalable matching service, i.e., COmpetencies ClassificatiOn (E-CO-2), based on Data Science techniques. The E-CO-2 service can help to extract relevant information from Data Science-related documents (course descriptions, job ads, blogs, or papers), which enables the comparison of demand and offer in the field of Data Science education and HR management, ultimately helping to establish the profession of Data Scientist.