32 research outputs found

    On the Stability of Community Detection Algorithms on Longitudinal Citation Data

    There are fundamental differences between citation networks and other classes of graphs. In particular, given that citation networks are directed and acyclic, methods developed primarily for use with undirected social network data may face obstacles. This is particularly true for the dynamic development of community structure in citation networks. Namely, it is clear neither when it is appropriate to employ existing community detection approaches nor how to choose among them. Using simulated data, we attempt to clarify the conditions under which one should use existing methods and which of these algorithms is appropriate in a given context. We hope this paper will serve as both a useful guidepost and an encouragement to those interested in the development of more targeted approaches for use with longitudinal citation data. Comment: 17 pages, 7 figures, presenting at Applications of Social Network Analysis 2009, ETH Zurich. Edit, August 17, 2009: updated abstract, figures, text clarification.

    Accounting for Uncertainty During a Pandemic

    We discuss several issues of statistical design, data collection, analysis, communication, and decision making that have arisen in recent and ongoing coronavirus studies, focusing on tools for assessment and propagation of uncertainty. This paper does not purport to be a comprehensive survey of the research literature; rather, we use examples to illustrate statistical points that we think are important. Comment: 16 pages

    Distance Measures for Dynamic Citation Networks

    Acyclic digraphs arise in many natural and artificial processes. Among the broader set, dynamic citation networks represent a substantively important form of acyclic digraphs. For example, the study of such networks includes the spread of ideas through academic citations, the spread of innovation through patent citations, and the development of precedent in common law systems. The specific dynamics that produce such acyclic digraphs not only differentiate them from other classes of graphs, but also provide guidance for the development of meaningful distance measures. In this article, we develop and apply our sink distance measure together with the single-linkage hierarchical clustering algorithm to both a two-dimensional directed preferential attachment model and empirical data drawn from the first quarter century of decisions of the United States Supreme Court. Despite applying the simplest combination of distance measures and clustering algorithms, analysis reveals that more accurate and more interpretable clusterings are produced by this scheme. Comment: 7 pages, 5 figures. Revision: Added application to the network of the first quarter-century of Supreme Court citations. Revision 2: Significantly expanded, includes application on random model as well

    Reproduction of Hierarchy? A Social Network Analysis of the American Law Professoriate

    Article published in the Journal of Legal Education

    Measuring the impact of spatial perturbations on the relationship between data privacy and validity of descriptive statistics

    Background: Like many scientific fields, epidemiology is addressing issues of research reproducibility. Spatial epidemiology, which often uses the inherently identifiable variable of participant address, must balance reproducibility with participant privacy. In this study, we assess the impact of several different data perturbation methods on key spatial statistics and patient privacy. Methods: We analyzed the impact of perturbation on spatial patterns in the full set of address-level mortality data from Lawrence, MA during the period from 1911 to 1913. The original death locations were perturbed using seven different published approaches to stochastic and deterministic spatial data anonymization. Key spatial descriptive statistics were calculated for each perturbation, including changes in spatial pattern center, Global Moran’s I, Local Moran’s I, distance to the k-th nearest neighbors, and the L-function (a normalized form of Ripley’s K). A spatially adapted form of k-anonymity was used to measure the privacy protection conferred by each method, and its compliance with HIPAA and GDPR privacy standards. Results: Random perturbation at 50 m, donut masking between 5 and 50 m, and Voronoi masking maintain the validity of descriptive spatial statistics better than other perturbations. Grid center masking with both 100 × 100 and 250 × 250 m cells led to large changes in descriptive spatial statistics. None of the perturbation methods adhered to the HIPAA standard that all points have a k-anonymity > 10. All other perturbation methods employed had at least 265 points, or over 6%, not adhering to the HIPAA standard. Conclusions: Using the set of published perturbation methods applied in this analysis, HIPAA and GDPR compliant de-identification was not compatible with maintaining key spatial patterns as measured by our chosen summary statistics. Further research should investigate alternative methods for balancing tradeoffs between spatial data privacy and preservation of key patterns in public health data that are of scientific and medical importance. http://deepblue.lib.umich.edu/bitstream/2027.42/173714/1/12942_2020_Article_256.pd

    Has the relationship between wealth and HIV risk in Sub-Saharan Africa changed over time? A temporal, gendered and hierarchical analysis

    This study examines the relationship between wealth and HIV infection in Sub-Saharan Africa to determine whether and how this relationship has varied over time, within and across countries, by gender, and by urban environment. The analysis draws on DHS and AIS data from 27 Sub-Saharan African countries, spanning the 14 years between 2003 and 2016. We first use logistic regression analyses to assess the relationship between individual wealth, HIV infection and gender by country and year, stratified on urban environment. We then use meta-regression analyses to assess the relationship between country level measures of wealth and the odds of HIV infection by gender and individual level wealth, stratified on urban environment. We find that there is a persistent and positive relationship between wealth and the odds of HIV infection across countries, but that the strength of this association has weakened over time. The rate of attenuation does not appear to differ between urban/rural strata. Likewise, we find that these associations were most pronounced for women and that this relationship was persistent over the study period and across urban and rural strata. Overall, our findings suggest that the relationship between wealth and HIV infection is beginning to reverse and that in the coming years, the relationship between wealth and HIV infection in Sub-Saharan Africa may more clearly mirror the predominant global picture.

    Sociodemographic correlates of greenness within public parks in three U.S. cities

    Equitable access to urban vegetation (also known as greenness), an environmental determinant of health, is an important environmental justice issue. Researchers often consider parks a source of greenness, but there is a lack of research explicitly investigating the sociodemographic correlates of greenness within city parks. Using a high-resolution landcover dataset, publicly available park boundaries, and American Community Survey data, we investigated the relationship between greenness within urban public parks and the sociodemographic characteristics of surrounding neighborhoods. We found that parks were substantially greener than surrounding neighborhoods and that Black race, Hispanic/Latino ethnicity, and socioeconomic deprivation were associated with less tree canopy and more grass and impervious surface or soil within parks. Public parks are indeed an important source of greenness for urban populations, but the type of vegetation in those spaces depends on the city and sociodemographic characteristics of neighborhoods. Future research and interventions should consider both the type and amount of vegetation within parks.