Measuring social dynamics in a massive multiplayer online game
Quantification of human group behavior has so far defied an empirical,
falsifiable approach, owing to the tremendous difficulty of acquiring data on
social systems. Massive multiplayer online games (MMOGs) provide a fascinating
new way of observing hundreds of thousands of simultaneously socially
interacting individuals engaged in virtual economic activities. We have
compiled a data set consisting of practically all actions of all players over a
period of three years from an MMOG played by 300,000 people. This large-scale
data set of a socio-economic unit contains all social and economic data from a
single, coherent source. Players have to generate a virtual income through
economic activities to 'survive' and are typically engaged in a multitude of
social activities offered within the game. Our analysis of high-frequency log
files focuses on three types of social networks and tests a series of
social-dynamics hypotheses. In particular, we study the structure and dynamics
of friend, enemy, and communication networks. We find striking
differences in topological structure between positive (friend) and negative
(enemy) tie networks. All networks confirm the recently observed phenomenon of
network densification. We propose two approximate social laws in communication
networks, the first expressing betweenness centrality as the inverse square of
the overlap, the second relating communication strength to the cube of the
overlap. These empirical laws provide strong quantitative evidence for
Granovetter's weak ties hypothesis. Further, the analysis of triad significance
profiles validates well-established assertions from social balance theory. We
find overrepresentation (underrepresentation) of complete (incomplete) triads
in networks of positive ties, and vice versa for networks of negative ties...
Comment: 23 pages, 19 figures
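The abstract does not define "overlap". A common definition in this literature, assumed here, is the neighborhood overlap of Onnela et al.: for an edge (i, j), O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij), where n_ij is the number of common neighbors and k_i is the degree of node i. The minimal sketch below computes it for every edge of an undirected graph stored as a dict of neighbor sets; the function name and input format are illustrative, not taken from the paper. With per-edge overlaps in hand, the two proposed laws could be examined by plotting betweenness and communication strength against overlap on log-log axes.

```python
def edge_overlap(adj):
    """Neighborhood overlap O_ij for every edge (i, j) of an undirected graph.

    adj maps each node to the set of its neighbors (hypothetical input format).
    O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij), with n_ij the number of
    common neighbors; edges whose denominator is zero get overlap 0.0.
    """
    overlap = {}
    for i, neighbors in adj.items():
        for j in neighbors:
            if i >= j:  # visit each undirected edge once (assumes orderable labels)
                continue
            n_ij = len(adj[i] & adj[j])
            denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij
            overlap[(i, j)] = n_ij / denom if denom > 0 else 0.0
    return overlap


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(edge_overlap(adj))  # {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 0.5, (2, 3): 0.0}
```

The pendant edge (2, 3) has zero overlap because its endpoints share no neighbors, which is the kind of bridging tie the weak ties hypothesis is about.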
Quantifying the impact of weak, strong, and super ties in scientific careers
Scientists are frequently faced with the important decision to start or
terminate a creative partnership. This process can be influenced by strategic
motivations, as early-career researchers are pursuers, whereas senior
researchers are typically attractors, of new collaborative opportunities.
Focusing on the longitudinal aspects of scientific collaboration, we analyzed
473 collaboration profiles using an ego-centric perspective which accounts for
researcher-specific characteristics and provides insight into a range of
topics, from career achievement and sustainability to team dynamics and
efficiency. From more than 166,000 collaboration records, we quantify the
frequency distributions of collaboration duration and tie-strength, showing
that collaboration networks are dominated by weak ties characterized by high
turnover rates. We use analytic extreme-value thresholds to identify a new
class of indispensable 'super ties', the strongest of which commonly exhibit
more than 50% publication overlap with the central scientist. The prevalence of super
ties suggests that they arise from career strategies based upon cost, risk, and
reward sharing and complementary skill matching. We then use a combination of
descriptive and panel regression methods to compare the subset of publications
coauthored with a super tie to the subset without one, controlling for
pertinent features such as career age, prestige, team size, and prior group
experience. We find that super ties contribute to above-average productivity
and a 17% citation increase per publication, thus identifying these
partnerships (the analog of life partners) as a major factor in science
career development.
Comment: 13 pages, 5 figures, 1 table
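Tie strength here is expressed as publication overlap: the share of the central scientist's papers coauthored with a given collaborator. The sketch below computes that share, assuming a hypothetical input of one coauthor list per paper of the central scientist, and flags collaborators above a caller-supplied cutoff. The cutoff is only a placeholder; it does not reproduce the paper's analytic extreme-value threshold.

```python
from collections import Counter


def publication_overlap(ego_papers):
    """Fraction of the central scientist's papers coauthored with each collaborator.

    ego_papers: list of coauthor lists, one per paper of the central scientist,
    with the central scientist excluded (hypothetical input format).
    """
    counts = Counter()
    for coauthors in ego_papers:
        counts.update(set(coauthors))  # count each collaborator once per paper
    total = len(ego_papers)
    return {name: k / total for name, k in counts.items()}


def strong_ties(ego_papers, threshold):
    """Collaborators whose publication overlap exceeds a caller-chosen cutoff.

    The cutoff is a placeholder, not the paper's extreme-value threshold.
    """
    return {name: f for name, f in publication_overlap(ego_papers).items()
            if f > threshold}


papers = [["B", "C"], ["B"], ["B", "D"], ["E"]]
print(strong_ties(papers, 0.5))  # {'B': 0.75}
```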
The impact of author name disambiguation on knowledge discovery from large-scale scholarly data
In this study, I demonstrate that the choice of disambiguation methods for resolving author name ambiguity can adversely affect our understanding of scholarly collaboration patterns and coauthorship network structures extracted from large-scale scholarly data. By utilizing large-scale bibliometric data, scholars in many fields have gleaned knowledge for use in scholarly evaluation, collaborator recommendations, research policy evaluation, and network-evolution modeling. A common challenge has been that author names in bibliometric data are not properly disambiguated: different authors may share the same name and be misrepresented as a single author, leading to a “merging of identities”. In addition, one author may use name variations and be represented as two or more different authors, leading to a “splitting of identities”. When faced with these challenges, most scholars have pre-processed bibliometric data using simple heuristics (e.g., if two author names share the same surname and given-name initials, they are presumed to represent the same author identity) and assumed that their findings are robust to errors due to author name ambiguity. I test this long-held assumption in bibliometrics by measuring the impact of author name ambiguity on network properties under varying conditions, including network size and cumulative time window (from 1991 to 2009), using four large-scale bibliometric datasets that cover biomedicine, computer science, psychology and neuroscience, and one nation’s entire domestic publication output. For this task, I compare the statistical properties of coauthorship networks constructed from algorithmically disambiguated data (i.e., nearly clean data) against those of the same networks compromised by authors misidentified via first-initial and all-initials disambiguation methods.
In addition, I simulate incrementally increasing levels of merging and splitting using those empirical datasets. My findings show that initial-based name disambiguation methods can severely distort our understanding of given networks and that such distortion worsens over time. Moreover, the distortion sometimes leads to biased or false knowledge of coauthorship network formation and evolution mechanisms, such as preferential attachment generating a power-law distribution of vertex degree, and to false validation of theories about the choice of collaborators in scientific research. This may result in ill-informed decisions about research policy and resource allocation. Besides measuring the impact of name ambiguity on network properties, I also test how name ambiguity can be estimated using simple heuristics such as dataset size, and how merged author identities can be detected via an author’s ego-network properties, to provide practical guidance for corrective measures. My research calls for further study of the effects of author name ambiguity on coauthorship network properties and is expected to help scholars establish better practices for knowledge discovery from large-scale scholarly data.
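The two initial-based heuristics named in the abstract can be illustrated with a short sketch. Assuming names arrive as (surname, given names) pairs (a hypothetical input format), a first-initial key keeps the surname plus the first given-name initial, while an all-initials key keeps every given-name initial. Any name instances sharing a key are treated as one author, which is exactly how "merging of identities" arises. The key formats and function names are illustrative, not the study's code.

```python
from collections import defaultdict


def first_initial_key(surname, given):
    """'Smith', 'John Paul' -> 'smith j' (surname + first given-name initial)."""
    initials = [part[0] for part in given.split() if part]
    return f"{surname.lower()} {initials[0].lower()}" if initials else surname.lower()


def all_initials_key(surname, given):
    """'Smith', 'John Paul' -> 'smith jp' (surname + all given-name initials)."""
    initials = "".join(part[0] for part in given.split() if part)
    return f"{surname.lower()} {initials.lower()}".rstrip()


def group_by_key(names, key_fn):
    """Map each heuristic key to the distinct full names it would merge."""
    groups = defaultdict(set)
    for surname, given in names:
        groups[key_fn(surname, given)].add(f"{given} {surname}")
    return dict(groups)


names = [("Smith", "John Paul"), ("Smith", "Jane"), ("Smith", "J. P."), ("Lee", "Min")]
print(group_by_key(names, first_initial_key)["smith j"])  # all three Smiths merged
print(group_by_key(names, all_initials_key))              # Jane Smith separated
```

Under the first-initial key, John Paul Smith, Jane Smith, and J. P. Smith collapse into one identity; the all-initials key separates Jane but still merges "J. P." with "John Paul", which may or may not be correct.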
Author-Based Analysis of Conference versus Journal Publication in Computer Science
Conference publications in computer science (CS) have attracted scholarly
attention due to their unique status as a main research outlet, unlike other
science fields where journals are predominantly used for communicating research
findings. One frequent research question has been how conference and journal
publications differ, considering a paper as the unit of analysis. This study
takes an author-based approach to analyze publishing patterns of 517,763
scholars who have published in both CS conferences and journals over the
last 57 years, as recorded in DBLP. The analysis shows that the majority of CS
scholars tend to make their scholarly debut, publish more papers, and
collaborate with more coauthors in conferences than in journals. Importantly,
conference papers seem to serve as a distinct channel of scholarly
communication, not a mere preceding step to journal publications: coauthors and
title words of authors across conferences and journals tend not to overlap
much. This study corroborates findings of previous studies on this topic from a
distinctive perspective and suggests that conference authorship in CS calls for
closer attention from scholars and administrators outside CS who have
focused on journal publications to mine authorship data and evaluate scholarly
performance.
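One simple way to quantify how little an author's coauthors overlap across venues is the Jaccard index of the two coauthor sets. The abstract does not say which measure the study used, so the sketch below assumes Jaccard similarity and a hypothetical record format of (venue_type, author list) tuples.

```python
def coauthor_overlap(papers, author):
    """Jaccard overlap between an author's conference and journal coauthors.

    papers: iterable of (venue_type, authors) with venue_type in
    {"conference", "journal"} (hypothetical record format).
    Returns None if the author has no coauthors in either venue type.
    """
    coauthors = {"conference": set(), "journal": set()}
    for venue_type, authors in papers:
        if author in authors:
            coauthors[venue_type].update(a for a in authors if a != author)
    conf, jour = coauthors["conference"], coauthors["journal"]
    union = conf | jour
    if not union:
        return None
    return len(conf & jour) / len(union)


papers = [
    ("conference", ["A", "B", "C"]),
    ("conference", ["A", "D"]),
    ("journal", ["A", "B", "E"]),
]
print(coauthor_overlap(papers, "A"))  # 0.25: only B appears in both venue types
```

A value near 0 for most authors would be consistent with the abstract's observation that conference and journal coauthors tend not to overlap much.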
Theoretical Tools for Network Analysis: Game Theory, Graph Centrality, and Statistical Inference.
A computer-driven data explosion has made the difficulty of interpreting large data sets of interconnected entities ever more salient. My work focuses on theoretical tools for summarizing, analyzing, and understanding network data sets, or data sets of things and their pairwise connections. I address four network science issues, improving our ability to analyze networks from a variety of domains.
I first show that the sophistication of game-theoretic agent decision making can crucially affect network cascades: differing decision-making assumptions can lead to dramatically different cascade outcomes. This highlights the importance of diligence when making assumptions about agent behavior, on networks and in general. I next analytically demonstrate a significant irregularity in the popular eigenvector centrality and propose a new spectral centrality measure, nonbacktracking centrality, showing that it avoids this irregularity. This tool contributes a more robust way of ranking nodes, as well as additional mathematical understanding of the effects of network localization. I then give a new model for uncertain networks, networks for which one has no access to true network data but instead observes only probabilistic information about edge existence. I give a fast maximum-likelihood algorithm for recovering edges and communities in this model, and show that it outperforms the typical approach of thresholding to an unweighted network. This model gives a better tool for understanding and analyzing real-world uncertain networks such as those arising in the experimental sciences. Lastly, I give a new lens for understanding scientific literature, specifically as a hybrid coauthorship and citation network. I use this for exploratory analysis of the Physical Review journals over a hundred-year period, and I make new observations about the interplay between these two networks and how this relationship has changed over time.
PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/133463/1/travisbm_1.pd
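For readers who want to experiment with nonbacktracking centrality, one way to compute it for an undirected graph avoids building the full nonbacktracking (Hashimoto) matrix over directed edges. Via the Ihara-Bass reduction, the centralities are the first n entries of the leading eigenvector of the 2n x 2n matrix M = [[A, I - D], [I, 0]], where A is the adjacency matrix and D is the diagonal degree matrix. The NumPy sketch below assumes a symmetric 0/1 adjacency matrix of a connected graph; it is an illustration, not code from the dissertation.

```python
import numpy as np


def nonbacktracking_centrality(A):
    """Nonbacktracking centrality of each node of an undirected graph.

    A: symmetric (n x n) 0/1 adjacency matrix (assumed connected).
    Takes the first n entries of the leading eigenvector of
    M = [[A, I - D], [I, 0]], fixes the arbitrary sign, and scales to unit length.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    I = np.eye(n)
    D = np.diag(A.sum(axis=1))
    M = np.block([[A, I - D], [I, np.zeros((n, n))]])
    eigvals, eigvecs = np.linalg.eig(M)
    lead = np.argmax(eigvals.real)  # Perron root: real eigenvalue of largest real part
    x = eigvecs[:n, lead].real      # real eigenvalue -> real eigenvector
    x = x if x.sum() >= 0 else -x   # eigenvectors are defined only up to sign
    return x / np.linalg.norm(x)


# "Bowtie": two triangles sharing node 0.
bowtie = np.zeros((5, 5))
for u, v in [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (3, 4)]:
    bowtie[u, v] = bowtie[v, u] = 1
print(nonbacktracking_centrality(bowtie))  # node 0 ranks highest
```

Working with the 2n x 2n matrix rather than the 2m x 2m edge matrix keeps the eigenproblem proportional to the number of nodes, which matters for sparse graphs with many edges.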
The Interdependence of Scientists in the Era of Team Science: An Exploratory Study Using Temporal Network Analysis
How is the rise in team science, and the emergence of the research group as the fundamental unit of organization of science, affecting scientists’ opportunities to collaborate? Are the majority of scientists becoming dependent on a select subset of their peers to organize the intergroup collaborations that are becoming the norm in science? This dissertation set out to explore the evolving nature of scientists’ interdependence in team-based research environments. The research was motivated by the desire to reconcile emerging views on the organization of scientific collaboration with the theoretical and methodological tendency to think about and study scientists as autonomous actors who negotiate collaboration in a dyadic manner. Complex Adaptive Social Systems served as the framework for understanding the dynamics involved in the formation of collaborative relationships. Temporal network analysis at the mesoscopic level was used to study the collaboration dynamics of a specific research community, in this case the genomic research community emerging around GenBank, the international nucleotide sequence databank. The investigation into the dynamics of the mesoscopic layer of a scientific collaboration network revealed the following: (1) there is a prominent half-life to collaborative relationships; (2) the half-life can be used to construct weighted decay networks for extracting the group structure influencing collaboration; (3) scientists across all levels of status are becoming increasingly interdependent, with the qualification that interdependence is highly asymmetrical; and (4) the group structure is increasingly influential on the collaborative interactions of scientists. The results from this study advance theoretical and empirical understanding of scientific collaboration in team-based research environments, as well as methodological approaches to studying temporal networks at the mesoscopic level.
The findings also have implications for policy researchers interested in the career cycles of scientists and the maintenance and building of scientific capacity in research areas of national interest.
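The abstract does not give the decay formula behind its weighted decay networks. A natural reading, assumed here, is exponential decay with half-life h: a collaboration in year t_k contributes weight 2^(-(t - t_k)/h) to the tie at observation year t, so a tie that is not renewed loses half its weight every h years. The sketch below uses a hypothetical input of (author_a, author_b, year) records; the function name and the half-life value in the example are illustrative.

```python
from collections import defaultdict


def decay_network(collaborations, t_obs, half_life):
    """Weighted decay network observed at year t_obs (assumed exponential decay).

    collaborations: iterable of (author_a, author_b, year) records
    (hypothetical format). Each past collaboration adds
    2 ** (-(t_obs - year) / half_life) to the weight of the undirected tie;
    collaborations after t_obs and self-pairs are ignored.
    """
    weights = defaultdict(float)
    for a, b, year in collaborations:
        if year > t_obs or a == b:
            continue
        edge = tuple(sorted((a, b)))  # undirected tie in canonical order
        weights[edge] += 2.0 ** (-(t_obs - year) / half_life)
    return dict(weights)


records = [("X", "Y", 2000), ("Y", "X", 2004), ("X", "Z", 2004), ("Y", "Z", 2010)]
print(decay_network(records, t_obs=2004, half_life=4))
# {('X', 'Y'): 1.5, ('X', 'Z'): 1.0}
```

The 2000 collaboration between X and Y is one half-life old in 2004 and so counts 0.5; thresholding or clustering such weights is one way group structure could then be extracted.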
Blockmodeling Techniques for Complex Networks.
The class of network models known as stochastic blockmodels has recently been gaining popularity. In this dissertation, we present new work that uses blockmodels to answer questions about networks. We create a blockmodel based on the idea of link communities, which naturally gives rise to overlapping vertex communities. We derive a fast and accurate algorithm to fit the model to networks. This model can be related to another blockmodel, which allows the method to efficiently find nonoverlapping communities as well. We then create a heuristic, based on the link community model, for finding the correct number of communities in a network. The heuristic is based on intuitive corrections to likelihood ratio tests, and it performs well at finding the correct number of communities in both real networks and synthetic networks generated from the link community model. Two commonly studied types of networks are citation networks, where research papers cite other papers, and coauthorship networks, where authors are connected if they have written a paper together. We study a multimodal network from a large dataset of physics publications that combines the two, with directed links between papers as citations and an undirected edge between a scientist and a paper if they helped to write it. This allows for new insights on the relation between social interaction and scientific production. We also have the publication dates of papers, which lets us track our measures over time. Finally, we create a stochastic model for ranking vertices in a semi-directed network, in which the probability of connection between two vertices depends on the difference of their ranks. When this model is fit to high school friendship networks, the ranks appear to correspond with a measure of social status.
Students have reciprocated, and some unreciprocated, edges with other students of closely similar rank that correspond to true friendship, and a fraction of the time they claim an aspirational friendship with a much higher-ranked individual. In general, students with more friends have higher ranks than those with fewer friends, and older students have higher ranks than younger students.
PhD, Physics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/108855/1/briball_1.pd
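The link community model itself is not specified in the abstract. As background on what fitting a blockmodel involves, the sketch below computes the profile log-likelihood of the standard Bernoulli stochastic blockmodel for a given partition; this is the plain SBM, not the dissertation's link-community variant. Each pair of groups (r, s) gets its maximum-likelihood edge probability p_rs = e_rs / n_rs, and a partition that better explains the edges scores higher. Likelihood ratio tests compare differences in exactly this kind of quantity. The input format is a hypothetical choice.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return x * math.log(y) if x > 0 else 0.0


def sbm_log_likelihood(edges, groups):
    """Profile log-likelihood of a Bernoulli SBM for an undirected simple graph.

    edges: iterable of (u, v) pairs; groups: dict node -> orderable group label.
    n_rs is the number of possible node pairs between groups r and s
    (n_r * (n_r - 1) / 2 within a group) and e_rs the observed edge count.
    """
    sizes = Counter(groups.values())
    e = Counter()
    for u, v in edges:
        r, s = sorted((groups[u], groups[v]))
        e[(r, s)] += 1
    ll = 0.0
    for r, s in combinations_with_replacement(sorted(sizes), 2):
        n_rs = sizes[r] * (sizes[r] - 1) // 2 if r == s else sizes[r] * sizes[s]
        if n_rs == 0:
            continue  # a single-node group has no internal pairs
        p = e[(r, s)] / n_rs
        ll += xlogy(e[(r, s)], p) + xlogy(n_rs - e[(r, s)], 1 - p)
    return ll


# Two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
bad = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
print(sbm_log_likelihood(edges, good) > sbm_log_likelihood(edges, bad))  # True
```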