28 research outputs found
Recommended from our members
Estimating Latent Processes on a Network From Indirect Measurements
In a communication network, point-to-point traffic volumes over time are critical for designing protocols that route information efficiently and for maintaining security, whether at the scale of an Internet service provider or within a corporation. While technically feasible, the direct measurement of point-to-point traffic imposes a heavy burden on network performance and is typically not implemented. Instead, indirect aggregate traffic volumes are routinely collected. We consider the problem of estimating point-to-point traffic volumes, x_t, from aggregate traffic volumes, y_t, given information about the network routing protocol encoded in a matrix A. This estimation task can be reformulated as finding the solutions to a sequence of ill-posed linear inverse problems, y_t = A x_t, since the number of origin-destination routes of interest is higher than the number of aggregate measurements available.
Here, we introduce a novel multilevel state-space model (SSM) of aggregate traffic volumes with realistic features. We implement a naïve strategy for estimating unobserved point-to-point traffic volumes from indirect measurements of aggregate traffic, based on particle filtering. We then develop a more efficient two-stage inference strategy that relies on model-based regularization: a simple model is used to calibrate regularization parameters that lead to efficient and scalable inference in the multilevel SSM. We apply our methods to corporate and academic networks, where we show that the proposed inference strategy outperforms existing approaches and scales to larger networks. We also design a simulation study to explore the factors that influence the performance. Our results suggest that model-based regularization may be an efficient strategy for inference in other complex multilevel models. Supplementary materials for this article are available online.
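To illustrate why the inverse problem described in this abstract is ill-posed, the following minimal sketch builds a toy routing matrix with more routes than links and shows two different sets of route volumes that yield identical aggregate measurements. The network, the matrix entries, the traffic values, and the names `A`, `aggregate`, `x_first`, and `x_second` are all assumptions made for illustration; none of them come from the article.

```python
# Hypothetical network with 2 measured links and 3 origin-destination routes.
# Row i of A marks which routes pass through link i (assumed for illustration).
A = [
    [1, 1, 0],  # link 1 carries routes 1 and 2
    [0, 1, 1],  # link 2 carries routes 2 and 3
]


def aggregate(routing, volumes):
    """Return the link totals y = A x implied by route-level volumes x."""
    return [sum(a * v for a, v in zip(row, volumes)) for row in routing]


# Two different (made-up) sets of point-to-point volumes.
x_first = [3, 2, 5]
x_second = [4, 1, 6]

# Both produce the same aggregate measurements, so the aggregates alone
# cannot distinguish between them.
y_first = aggregate(A, x_first)
y_second = aggregate(A, x_second)
print(y_first, y_second)
```

Because the aggregates alone cannot separate the two candidates, some additional structure is needed to pick a solution, which is the role the abstract assigns to model-based regularization.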
Multi-way blockmodels for analyzing coordinated high-dimensional responses
We consider the problem of quantifying temporal coordination between multiple high-dimensional responses. We introduce a family of multi-way stochastic blockmodels suited for this problem, which avoids preprocessing steps, such as binning and thresholding, that are commonly adopted for this type of data in biology. We develop two inference procedures based on collapsed Gibbs sampling and variational methods. We provide a thorough evaluation of the proposed methods on simulated data, in terms of membership and blockmodel estimation, out-of-sample predictions, and run-time. We also quantify the effects of censoring procedures such as binning and thresholding on the estimation tasks. We use these models to carry out an empirical analysis of the functional mechanisms driving the coordination between gene expression and metabolite concentrations during carbon and nitrogen starvation in S. cerevisiae.
A natural experiment of social network formation and dynamics
DOI: 10.1073/pnas.1404770112. Proceedings of the National Academy of Sciences of the United States of America, 112(21), 6595-660
A Survey of Statistical Network Models
Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active “network community” and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online “networking communities” such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics.
Asymptotic and finite-sample properties of estimators based on stochastic gradients
Stochastic gradient descent procedures have gained popularity for parameter estimation from large data sets. However, their statistical properties are not well understood in theory, and in practice, avoiding numerical instability requires careful tuning of key parameters. Here, we introduce implicit stochastic gradient descent procedures, which involve parameter updates that are implicitly defined. Intuitively, implicit updates shrink standard stochastic gradient descent updates. The amount of shrinkage depends on the observed Fisher information matrix, which does not need to be explicitly computed; thus, implicit procedures increase stability without increasing the computational burden. Our theoretical analysis provides the first full characterization of the asymptotic behavior of both standard and implicit stochastic gradient descent-based estimators, including finite-sample error bounds. Importantly, analytical expressions for the variances of these stochastic gradient-based estimators reveal their exact loss of efficiency. We also develop new algorithms to compute implicit stochastic gradient descent-based estimators in practice for generalized linear models, Cox proportional hazards models, and M-estimators, and perform extensive experiments. Our results suggest that implicit stochastic gradient descent procedures are poised to become a workhorse for approximate inference from large data sets.
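The shrinkage idea in this abstract can be sketched for one special case. Assuming a squared-error loss on a single observation (an assumption made here for illustration; the abstract covers broader model classes), the implicit update theta_new = theta_old + gamma * (y - x'theta_new) * x can be solved in closed form, giving the standard update scaled by 1 / (1 + gamma * ||x||^2). The function names and the numbers below are hypothetical and are not the authors' algorithms.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def standard_sgd_step(theta, x, y, gamma):
    """Explicit update: the residual is evaluated at the current estimate."""
    residual = y - dot(x, theta)
    return [t + gamma * residual * xi for t, xi in zip(theta, x)]


def implicit_sgd_step(theta, x, y, gamma):
    """Implicit update for squared-error loss (assumed case).

    Solving theta_new = theta + gamma * (y - x'theta_new) * x for theta_new
    gives the explicit step multiplied by 1 / (1 + gamma * ||x||^2).
    """
    residual = y - dot(x, theta)
    shrink = 1.0 / (1.0 + gamma * dot(x, x))
    return [t + gamma * shrink * residual * xi for t, xi in zip(theta, x)]


# Made-up single observation consistent with a true coefficient of 2.0,
# and a deliberately large learning rate.
theta0 = [0.0]
x_obs, y_obs, gamma = [2.0], 4.0, 1.0

standard = standard_sgd_step(theta0, x_obs, y_obs, gamma)
implicit = implicit_sgd_step(theta0, x_obs, y_obs, gamma)
print(standard, implicit)
```

With this learning rate the explicit step overshoots the value 2.0 that fits the observation exactly, while the implicit step stays between the starting point and that value, which is the stabilizing effect the abstract describes.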
Whose Ideas? Whose Words? Authorship of Ronald Reagan's Radio Addresses
Who wrote Ronald Reagan's radio addresses?
In his campaign for the U.S. presidency from 1975 to 1979, Ronald Reagan delivered over 1000 radio broadcasts. For over 600 of these we have direct evidence of Reagan's authorship. The aim of this study was to determine the authorship of 312 of the broadcasts for which no direct evidence is available. We addressed the prediction problem for speeches delivered in different epochs and we explored a wide range of off-the-shelf classification methods and fully Bayesian generative models. Eventually we produced separate sets of predictions using the most accurate classifiers, based on non-contextual words as well as on semantic features, for the 312 speeches of uncertain authorship. All the predictions agree on 135 of the "unknown" speeches, whereas the fully Bayesian models agree on an additional 154 of them.
The magnitude of the posterior odds of authorship led us to conclude that Ronald Reagan drafted 167 speeches and was aided in the preparation of the remaining 145. Our inferences were not sensitive to "reasonable" variations in the sets of constants underlying the prior distributions, and the cross-validated accuracy of our best fully Bayesian model was above 90 percent in all cases. The agreement of multiple methods for predicting the authorship for the "unknown" speeches reinforced our confidence in the accuracy of our classifications.
Predicting traffic volumes and estimating the effects of shocks in massive transportation systems
Public transportation systems are an essential component of major cities. The widespread use of smart cards for automated fare collection in these systems offers a unique opportunity to understand passenger behavior at a massive scale. In this study, we use network-wide data obtained from smart cards in the London transport system to predict future traffic volumes, and to estimate the effects of disruptions due to unplanned closures of stations or lines. Disruptions, or shocks, force passengers to make different decisions concerning which stations to enter or exit. We describe how these changes in passenger behavior lead to possible overcrowding and model how stations will be affected by given disruptions. This information can then be used to mitigate the effects of these shocks because transport authorities may prepare in advance alternative solutions such as additional buses near the most affected stations. We describe statistical methods that leverage the large amount of smart-card data collected under the natural state of the system, where no shocks take place, as variables that are indicative of behavior under disruptions. We find that features extracted from the natural regime data can be successfully exploited to describe different disruption regimes, and that our framework can be used as a general tool for any similar complex transportation system.
Reliability assessment of ultrasound muscle echogenicity in patients with rheumatic diseases: Results of a multicenter international web-based study
Objectives: To investigate the inter- and intra-reliability of ultrasound (US) muscle echogenicity in patients with rheumatic diseases.
Methods: Forty-two rheumatologists and 2 radiologists from 13 countries were asked to assess US muscle echogenicity of the quadriceps muscle in 80 static images and 20 clips from 64 patients with different rheumatic diseases and 8 healthy subjects. Two visual scales were evaluated: a visual semi-quantitative scale (0–3) and a continuous quantitative measurement (“VAS echogenicity,” 0–100). The same assessment was repeated to calculate intra-observer reliability. US muscle echogenicity was also calculated by an independent research assistant using software for the analysis of scientific images (ImageJ). Inter- and intra-reliabilities were assessed by means of prevalence-adjusted bias-adjusted Kappa (PABAK), intraclass correlation coefficient (ICC), and correlations through Kendall’s Tau and Pearson’s Rho coefficients.
Results: The semi-quantitative scale showed a moderate inter-reliability [PABAK = 0.58 (0.57–0.59)] and a substantial intra-reliability [PABAK = 0.71 (0.68–0.73)]. The lowest inter- and intra-reliability results were obtained for the intermediate grades (i.e., grades 1 and 2) of the semi-quantitative scale. “VAS echogenicity” showed a high reliability in both the inter-observer [ICC = 0.80 (0.75–0.85)] and intra-observer [ICC = 0.88 (0.88–0.89)] evaluations. A substantial association was found between the participants' assessment of the semi-quantitative scale and “VAS echogenicity” [ICC = 0.52 (0.50–0.54)]. The correlation between these two visual scales and ImageJ analysis was high (tau = 0.76 and rho = 0.89, respectively).
Conclusion: The results of this large, multicenter study highlighted the overall good inter- and intra-reliability of the US assessment of muscle echogenicity in patients with different rheumatic diseases.
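As a rough illustration of the PABAK statistic named in the Methods, the sketch below computes it for two raters using the common definition that replaces chance agreement with 1/k for a k-category scale, giving PABAK = (k * p_obs - 1) / (k - 1). The ratings are invented, the function name `pabak` is hypothetical, and the two-rater setup is an assumption; the study involved many raters, and how it pooled them is not stated here.

```python
def pabak(ratings_a, ratings_b, n_categories):
    """Prevalence-adjusted bias-adjusted kappa for two raters.

    Uses the common definition in which chance agreement is fixed at
    1 / n_categories, so PABAK = (k * p_obs - 1) / (k - 1).
    """
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    agreements = sum(a == b for a, b in zip(ratings_a, ratings_b))
    p_obs = agreements / len(ratings_a)
    return (n_categories * p_obs - 1) / (n_categories - 1)


# Invented grades on a 0-3 semi-quantitative scale (4 categories)
# for eight images scored by two hypothetical raters.
rater_a = [0, 1, 2, 3, 1, 2, 0, 3]
rater_b = [0, 1, 1, 3, 2, 2, 0, 3]

score = pabak(rater_a, rater_b, n_categories=4)
print(round(score, 3))
```

In this made-up example the two raters disagree only on intermediate grades, which loosely mirrors the pattern reported in the Results.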