On the Stability of Community Detection Algorithms on Longitudinal Citation Data
There are fundamental differences between citation networks and other classes
of graphs. In particular, given that citation networks are directed and
acyclic, methods developed primarily for use with undirected social network
data may face obstacles. This is particularly true for the dynamic development
of community structure in citation networks. Namely, it is neither clear when
it is appropriate to employ existing community detection approaches nor is it
clear how to choose among existing approaches. Using simulated data, we attempt
to clarify the conditions under which one should use existing methods and which
of these algorithms is appropriate in a given context. We hope this paper will
serve as both a useful guidepost and an encouragement to those interested in
the development of more targeted approaches for use with longitudinal citation
data.
Comment: 17 pages, 7 figures; presenting at Applications of Social Network
Analysis 2009, ETH Zurich. Edit, August 17, 2009: updated abstract, figures,
and text clarifications.
Accounting for Uncertainty During a Pandemic
We discuss several issues of statistical design, data collection, analysis,
communication, and decision making that have arisen in recent and ongoing
coronavirus studies, focusing on tools for assessment and propagation of
uncertainty. This paper does not purport to be a comprehensive survey of the
research literature; rather, we use examples to illustrate statistical points
that we think are important.
Comment: 16 pages
Distance Measures for Dynamic Citation Networks
Acyclic digraphs arise in many natural and artificial processes. Among them,
dynamic citation networks represent a substantively important class of
acyclic digraphs. For example, such networks capture the spread of ideas
through academic citations, the spread of innovation through patent citations,
and the development of precedent in common law systems. The specific dynamics
that produce such acyclic digraphs not only differentiate them from other
classes of graphs but also provide guidance for the development of meaningful
distance measures. In this article, we develop our sink distance measure and
apply it, together with the single-linkage hierarchical clustering algorithm,
to both a two-dimensional directed preferential attachment model and
empirical data drawn from the first quarter century of
decisions of the United States Supreme Court. Despite applying the simplest
combination of distance measure and clustering algorithm, the analysis reveals
that this scheme produces more accurate and more interpretable clusterings.
Comment: 7 pages, 5 figures. Revision: added application to the network of the
first quarter century of Supreme Court citations. Revision 2: significantly
expanded; includes application to the random model as well.
Reproduction of Hierarchy? A Social Network Analysis of the American Law Professoriate
Article published in the Journal of Legal Education
Recommended from our members
A mechanistic spatio-temporal framework for modelling individual-to-individual transmission—With an application to the 2014-2015 West Africa Ebola outbreak
In recent years there has been growing availability of individual-level spatio-temporal disease data, particularly due to the use of modern communication devices with GPS tracking functionality. These detailed data have proven useful for inferring disease transmission at a finer level than was previously possible. However, statistically sound frameworks for modelling the underlying transmission dynamics mechanistically are still lacking. Such a development is particularly crucial for enabling a general epidemic predictive framework at the individual level. In this paper we propose a new statistical framework for mechanistically modelling individual-to-individual disease transmission in a landscape with heterogeneous population density. We first test our methodology on simulated datasets, validating our inferential machinery. We then apply it to data describing a regional Ebola outbreak in Western Africa (2014-2015). Our results show that the methods obtain estimates of key epidemiological parameters that are broadly consistent with the literature, while revealing a significantly shorter transmission distance. More importantly, in contrast to existing approaches, we are able to perform a more general model prediction that takes the susceptible population into account. Finally, our results show that, under reasonable scenarios, the framework can be an effective surrogate for susceptible-explicit individual models, which are often computationally challenging.
Measuring the impact of spatial perturbations on the relationship between data privacy and validity of descriptive statistics
Abstract
Background
Like many scientific fields, epidemiology is addressing issues of research reproducibility. Spatial epidemiology, which often uses the inherently identifiable variable of participant address, must balance reproducibility with participant privacy. In this study, we assess the impact of several different data perturbation methods on key spatial statistics and patient privacy.
Methods
We analyzed the impact of perturbation on spatial patterns in the full set of address-level mortality data from Lawrence, MA, during the period from 1911 to 1913. The original death locations were perturbed using seven different published approaches to stochastic and deterministic spatial data anonymization. Key spatial descriptive statistics were calculated for each perturbation, including changes in the spatial pattern center, Global Moran’s I, Local Moran’s I, the distance to the k-th nearest neighbor, and the L-function (a normalized form of Ripley’s K). A spatially adapted form of k-anonymity was used to measure the privacy protection conferred by each method and to assess each method’s compliance with HIPAA and GDPR privacy standards.
Results
Random perturbation at 50 m, donut masking between 5 and 50 m, and Voronoi masking maintain the validity of descriptive spatial statistics better than other perturbations. Grid center masking with both 100 × 100 and 250 × 250 m cells led to large changes in descriptive spatial statistics. None of the perturbation methods adhered to the HIPAA standard that all points have a k-anonymity > 10. All other perturbation methods employed had at least 265 points, or over 6%, not adhering to the HIPAA standard.
Conclusions
Using the set of published perturbation methods applied in this analysis, HIPAA- and GDPR-compliant de-identification was not compatible with maintaining key spatial patterns as measured by our chosen summary statistics. Further research should investigate alternative methods for balancing the trade-offs between spatial data privacy and the preservation of key patterns in public health data that are of scientific and medical importance.
http://deepblue.lib.umich.edu/bitstream/2027.42/173714/1/12942_2020_Article_256.pd
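Donut masking, one of the perturbations compared above, moves each address in a random direction by a distance that lies between an inner and an outer radius (5 to 50 m in the Results), so no point stays at or very near its true location. The sketch below illustrates that idea together with one of the descriptive statistics listed in the Methods, the distance to the k-th nearest neighbor. It is a minimal illustration, not the study's code. Our assumptions are that coordinates are projected and measured in metres and that the displacement radius is drawn so that masked points are uniform over the annulus area. The function names `donut_mask` and `kth_nearest_neighbor_distances` and the toy grid of points are hypothetical.

```python
import math
import random


def donut_mask(points, r_min=5.0, r_max=50.0, seed=None):
    """Displace each (x, y) point (assumed projected, in metres) by a random
    bearing and a distance between r_min and r_max, i.e. into an annulus
    ("donut") around its original location."""
    if not 0 <= r_min < r_max:
        raise ValueError("require 0 <= r_min < r_max")
    rng = random.Random(seed)
    masked = []
    for x, y in points:
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # Assumption: sample the squared radius uniformly so displaced points
        # are spread evenly over the annulus area, not bunched at the inner ring.
        r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
        masked.append((x + r * math.cos(theta), y + r * math.sin(theta)))
    return masked


def kth_nearest_neighbor_distances(points, k=1):
    """Distance from each point to its k-th nearest other point (brute force, O(n^2))."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    result = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        result.append(dists[k - 1])
    return result


if __name__ == "__main__":
    # Hypothetical toy data: a 10 x 10 grid of addresses spaced 20 m apart.
    original = [(20.0 * i, 20.0 * j) for i in range(10) for j in range(10)]
    masked = donut_mask(original, r_min=5.0, r_max=50.0, seed=42)

    displacement = [math.dist(p, q) for p, q in zip(original, masked)]
    print(f"displacement range: {min(displacement):.1f}-{max(displacement):.1f} m")

    # Compare a descriptive statistic before and after masking.
    for label, pts in (("original", original), ("masked", masked)):
        d = kth_nearest_neighbor_distances(pts, k=1)
        print(f"{label}: mean 1st-NN distance = {sum(d) / len(d):.1f} m")
```

On a regular grid the original 1st-nearest-neighbor distance is exactly 20 m everywhere, so any change in the masked mean reflects the distortion that masking introduces. This is the kind of shift the study measured across methods. The brute-force neighbor search is chosen for clarity; a spatial index would be needed for large datasets.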
Has the relationship between wealth and HIV risk in Sub-Saharan Africa changed over time? A temporal, gendered and hierarchical analysis
This study examines the relationship between wealth and HIV infection in Sub-Saharan Africa to determine whether and how this relationship has varied over time, within and across countries, by gender, and by urban environment. The analysis draws on DHS and AIS data from 27 Sub-Saharan African countries, spanning the 14 years from 2003 to 2016. We first use logistic regression analyses to assess the relationship between individual wealth, HIV infection, and gender by country and year, stratified by urban environment. We then use meta-regression analyses to assess the relationship between country-level measures of wealth and the odds of HIV infection by gender and individual-level wealth, stratified by urban environment. We find a persistent and positive relationship between wealth and the odds of HIV infection across countries, but the strength of this association has weakened over time. The rate of attenuation does not appear to differ between urban and rural strata. We also find that these associations were most pronounced for women and that this pattern persisted over the study period and across urban and rural strata. Overall, our findings suggest that the relationship between wealth and HIV infection is beginning to reverse and that in the coming years, the relationship between wealth and HIV infection in Sub-Saharan Africa may more clearly mirror the predominant global picture.
Sociodemographic correlates of greenness within public parks in three U.S. cities
Equitable access to urban vegetation (also known as greenness), an environmental determinant of health, is an important environmental justice issue. Researchers often consider parks a source of greenness, but research explicitly investigating the sociodemographic correlates of greenness within city parks is lacking. Using a high-resolution landcover dataset, publicly available park boundaries, and American Community Survey data, we investigated the relationship between greenness within urban public parks and the sociodemographic characteristics of surrounding neighborhoods. We found that parks were substantially greener than surrounding neighborhoods and that Black race, Hispanic/Latino ethnicity, and socioeconomic deprivation were associated with less tree canopy and more grass and impervious surface or soil within parks. Public parks are indeed an important source of greenness for urban populations, but the type of vegetation in those spaces depends on the city and on the sociodemographic characteristics of neighborhoods. Future research and interventions should consider both the type and the amount of vegetation within parks.