
    Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models

    Motivated by a real-life problem of sharing social network data that contain sensitive personal information, we propose a novel approach to releasing and analyzing synthetic graphs in order to protect the privacy of individual relationships captured by the social network while maintaining the validity of statistical results. A case study using a version of the Enron e-mail corpus demonstrates the application and usefulness of the proposed techniques in solving the challenging problem of maintaining privacy and supporting open access to network data, which ensures the reproducibility of existing studies and allows new scientific insights to be obtained by analyzing such data. We use a simple yet effective randomized response mechanism to generate synthetic networks under ε-edge differential privacy, and then use likelihood-based inference for missing data and Markov chain Monte Carlo techniques to fit exponential-family random graph models to the generated synthetic networks. Comment: Updated, 39 pages.
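    As a concrete illustration of the randomized-response idea for ε-edge differential privacy (a generic sketch, not necessarily the exact mechanism of the paper), the following Python snippet flips each potential edge independently with probability 1/(1 + e^ε); the function name and the NumPy adjacency-matrix representation are assumptions made for this example.

```python
import numpy as np

def randomized_response_graph(adj, epsilon, rng=None):
    """Release a synthetic undirected graph under epsilon-edge differential privacy
    by flipping each potential edge independently (randomized response).

    adj     : (n, n) symmetric 0/1 adjacency matrix without self-loops
    epsilon : privacy budget; each dyad is flipped with probability 1 / (1 + exp(epsilon)),
              so the probability ratio for neighbouring graphs is bounded by exp(epsilon)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = adj.shape[0]
    p_flip = 1.0 / (1.0 + np.exp(epsilon))
    rows, cols = np.triu_indices(n, k=1)          # each dyad appears once (upper triangle)
    dyads = adj[rows, cols].copy()
    flips = rng.random(dyads.size) < p_flip
    dyads[flips] = 1 - dyads[flips]               # flip the selected dyads
    out = np.zeros_like(adj)
    out[rows, cols] = dyads
    return out + out.T                            # symmetrize back to an adjacency matrix

# Example: privatize a small random graph with epsilon = 1
rng = np.random.default_rng(0)
true_adj = np.triu((rng.random((20, 20)) < 0.1).astype(int), k=1)
true_adj = true_adj + true_adj.T
synthetic = randomized_response_graph(true_adj, epsilon=1.0, rng=rng)
```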

    Differentially Private Model Selection with Penalized and Constrained Likelihood

    In statistical disclosure control, the goal of data analysis is twofold: the released information must provide accurate and useful statistics about the underlying population of interest, while minimizing the potential for an individual record to be identified. In recent years, the notion of differential privacy has received much attention in theoretical computer science, machine learning, and statistics. It provides a rigorous and strong notion of protection for individuals' sensitive information. A fundamental question is how to incorporate differential privacy into traditional statistical inference procedures. In this paper we study model selection in multivariate linear regression under the constraint of differential privacy. We show that model selection procedures based on penalized least squares or likelihood can be made differentially private by a combination of regularization and randomization, and we propose two algorithms to do so. We show that our private procedures are consistent under essentially the same conditions as the corresponding non-private procedures. We also find that under differential privacy, the procedure becomes more sensitive to the tuning parameters. We illustrate and evaluate our method using simulation studies and two real data examples.
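    For intuition about how regularization and randomization can be combined, the sketch below (a generic illustration, not the authors' algorithm) scores candidate variable subsets by BIC from least squares and selects one with the exponential mechanism; the `sensitivity` argument is a placeholder assumption that would have to be justified (e.g. via clipping or regularization) before any privacy guarantee holds.

```python
import itertools
import numpy as np

def dp_select_model(X, y, epsilon, sensitivity=1.0, rng=None):
    """Pick a variable subset via the exponential mechanism with utility = -BIC.

    `sensitivity` must bound how much one record can change a BIC score; the
    default of 1.0 is purely illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    subsets = [s for r in range(1, d + 1) for s in itertools.combinations(range(d), r)]
    utilities = []
    for s in subsets:
        Xs = X[:, list(s)]
        beta, res, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = res[0] if res.size else float(np.sum((y - Xs @ beta) ** 2))
        bic = n * np.log(max(rss, 1e-12) / n) + len(s) * np.log(n)
        utilities.append(-bic)                    # higher utility = better-fitting, smaller model
    utilities = np.array(utilities)
    weights = np.exp(epsilon * (utilities - utilities.max()) / (2 * sensitivity))
    probs = weights / weights.sum()
    return subsets[rng.choice(len(subsets), p=probs)]
```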

    R. A. Fisher, design theory, and the Indian connection

    Design Theory, a branch of mathematics, was born out of the experimental statistics research of the population geneticist R. A. Fisher and of Indian mathematical statisticians in the 1930s. The field combines elements of combinatorics, finite projective geometries, Latin squares, and a variety of further mathematical structures, brought together in surprising ways. This essay presents these structures and ideas, as well as how the field came together, which is itself an interesting story. Comment: 11 pages, 3 figures.

    Test of candidate light distributors for the muon (g-2) laser calibration system

    The new muon (g-2) experiment E989 at Fermilab will be equipped with a laser calibration system for all 1296 channels of the calorimeters. An integrating sphere and an alternative system based on an engineered diffuser have been considered as possible light distributors for the experiment. We present here a detailed comparison of the two, based on temporal response, spatial uniformity, transmittance, and time stability. Comment: accepted to Nucl. Instrum. Meth.

    Dopamine D1 vs D5 receptor-dependent induction of seizures in relation to DARPP-32, ERK1/2 and GluR1-AMPA signalling.

    Recent reports have shown that the selective dopamine D1-like agonist SKF 83822 [which stimulates adenylate cyclase, but not phospholipase C] induces prominent behavioral seizures in mice, whereas its benzazepine congener SKF 83959 [which stimulates phospholipase C, but not adenylate cyclase] does not. To investigate the relative involvement of D1 vs D5 receptors in mediating seizures, ethological behavioral topography and cortical EEGs were recorded in D1, D5 and DARPP-32 knockout mice in response to a convulsant dose of SKF 83822. SKF 83822-induced behavioral and EEG seizures were gene dose-dependently abolished in D1 knockouts. In both heterozygous and homozygous D5 knockouts, the latency to first seizure was significantly increased and total EEG seizures were reduced relative to wild-types. The majority (60%) of homozygous DARPP-32 knockouts did not have seizures; of those having seizures (40%), the latency to first seizure was significantly increased and the number of high-amplitude, high-frequency polyspike EEG events was reduced. In addition, immunoblotting was performed to investigate downstream intracellular signalling mechanisms at D1-like receptors following challenge with SKF 83822 and SKF 83959. In wild-types administered SKF 83822, levels of ERK1/2 and GluR1 AMPA receptor phosphorylation increased two-fold in both the striatum and hippocampus; in striatal slices DARPP-32 phosphorylation at Thr34 increased five-fold relative to vehicle-treated controls. These findings indicate that D1, and to a lesser extent D5, receptor coupling to DARPP-32, ERK1/2 and glutamatergic signalling is involved in mediating the convulsant effects of SKF 83822.

    A Heuristic Solution of the Identifiability Problem of the Age-Period-Cohort Analysis of Cancer Occurrence: Lung Cancer Example

    Background: Age-Period-Cohort (APC) analysis aims to estimate the following effects on disease incidence: (i) the age of the subject at the time of disease diagnosis; (ii) the time period when the disease occurred; and (iii) the date of birth of the subject. These effects can help in evaluating the biological events leading to the disease, in estimating the influence of distinct risk factors on disease occurrence, and in developing new strategies for disease prevention and treatment. Methodology/Principal Findings: We developed a novel approach for estimating the APC effects on disease incidence rates within the framework of the Log-Linear Age-Period-Cohort (LLAPC) model. Because the APC effects are linearly interdependent and cannot be uniquely estimated, solving this identifiability problem requires fixing four redundant parameters within the set of unknown parameters. By setting three of these parameters (one of the time-period effects, one of the birth-cohort effects, and the corresponding age effect) to zero, we reduced the problem to determining a single redundant identification parameter, for which we used the effect of the time period adjacent to the anchored one. By varying this identification parameter, a family of estimates of the APC effects can be obtained. Using the heuristic assumption that the differences between adjacent birth-cohort effects are small, we developed a numerical method for determining the optimal value of the identification parameter, which yields a unique set of all APC effects and thereby solves the identifiability problem.
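    A minimal statement of the model and of the interdependence behind the identifiability problem, written here in the standard form of log-linear APC models rather than copied from the paper, is:

```latex
% Log-linear APC model for the incidence rate \lambda_{ap} in age group a and period p,
% with birth cohort c = p - a:
\log \lambda_{ap} = \mu + \alpha_a + \pi_p + \gamma_c , \qquad c = p - a .
% Identifiability problem: for any constant t the shifted effects
%   \alpha_a + t\,a , \quad \pi_p - t\,p , \quad \gamma_c + t\,c
% give exactly the same fitted rates, so a linear drift cannot be attributed to age,
% period or cohort without an extra constraint (e.g. anchoring some effects at zero
% and fixing one identification parameter, as done in the paper).
```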

    The harvest plot: A method for synthesising evidence about the differential effects of interventions

    Background: One attraction of meta-analysis is the forest plot, a compact overview of the essential data included in a systematic review and the overall 'result'. However, meta-analysis is not always suitable for synthesising evidence about the effects of interventions which may influence the wider determinants of health. As part of a systematic review of the effects of population-level tobacco control interventions on social inequalities in smoking, we designed a novel approach to synthesis intended to bring aspects of the graphical directness of a forest plot to bear on the problem of synthesising evidence from a complex and diverse group of studies. Methods: We coded the included studies (n = 85) on two methodological dimensions (suitability of study design and quality of execution) and extracted data on effects stratified by up to six different dimensions of inequality (income, occupation, education, gender, race or ethnicity, and age), distinguishing between 'hard' (behavioural) and 'intermediate' (process or attitudinal) outcomes. Adopting a hypothesis-testing approach, we then assessed which of three competing hypotheses (positive social gradient, negative social gradient, or no gradient) was best supported by each study for each dimension of inequality. Results: We plotted the results on a matrix ('harvest plot') for each category of intervention, weighting studies by the methodological criteria and distributing them between the competing hypotheses. These matrices formed part of the analytical process and helped to encapsulate the output, for example by drawing attention to the finding that increasing the price of tobacco products may be more effective in discouraging smoking among people with lower incomes and in lower occupational groups. Conclusion: The harvest plot is a novel and useful method for synthesising evidence about the differential effects of population-level interventions. It contributes to the challenge of making best use of all available evidence by incorporating all relevant data. The visual display assists both the process of synthesis and the assimilation of the findings. The method is suitable for adaptation to a variety of questions in evidence synthesis and may be particularly useful for systematic reviews addressing the broader type of research question which may be most relevant to policymakers.
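    To make the construction concrete, here is a toy Python/matplotlib sketch of a harvest-plot matrix; the study entries, the data structure, and the 1-3 weighting are illustrative assumptions rather than the review's actual coding.

```python
import matplotlib.pyplot as plt

# Hypothetical coded studies: (study_id, best-supported hypothesis, inequality dimension, weight);
# the weight (1-3) stands in for the two methodological dimensions used in the review.
studies = [
    ("S1", "negative gradient", "income",     3),
    ("S2", "negative gradient", "occupation", 2),
    ("S3", "no gradient",       "education",  1),
    ("S4", "positive gradient", "gender",     2),
]
hypotheses = ["positive gradient", "no gradient", "negative gradient"]
dimensions = ["income", "occupation", "education", "gender", "race/ethnicity", "age"]

fig, axes = plt.subplots(len(dimensions), len(hypotheses),
                         figsize=(9, 10), sharex=True, sharey=True)
for i, dim in enumerate(dimensions):
    for j, hyp in enumerate(hypotheses):
        ax = axes[i, j]
        hits = [s for s in studies if s[1] == hyp and s[2] == dim]
        # one bar per supporting study, height = methodological weight
        ax.bar(range(len(hits)), [s[3] for s in hits], width=0.6)
        ax.set_ylim(0, 3.5)
        ax.set_xticks([])
        if j == 0:
            ax.set_ylabel(dim, rotation=0, ha="right", va="center")
        if i == 0:
            ax.set_title(hyp, fontsize=9)
fig.suptitle("Toy harvest plot: studies distributed across competing hypotheses")
fig.tight_layout()
plt.show()
```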

    Studies of an array of PbF2 Cherenkov crystals with large-area SiPM readout

    The electromagnetic calorimeter for the new muon (g-2) experiment at Fermilab will consist of arrays of PbF2 Cherenkov crystals read out by large-area silicon photo-multiplier (SiPM) sensors. We report here on measurements and simulations using 2.0 to 4.5 GeV electrons with a 28-element prototype array. All data were obtained using fast waveform digitizers to accurately capture signal pulse shapes versus energy, impact position, angle, and crystal wrapping. The SiPMs were gain-matched using a laser-based calibration system, which also provided a stabilization procedure that allowed gain correction to a level of 10^-4 per hour. After accounting for longitudinal fluctuation losses, crystals wrapped in a white, diffusive wrapping exhibited an energy resolution σ/E of (3.4 ± 0.1)% per √(E/GeV), while those wrapped in a black, absorptive wrapping had (4.6 ± 0.3)% per √(E/GeV). The white-wrapped crystals, having nearly twice the total light collection, display a generally wider and impact-position-dependent pulse shape owing to the dynamics of the light propagation, in comparison to the black-wrapped crystals, which have a narrower pulse shape that is insensitive to impact position. Comment: 14 pages, 19 figures, accepted to Nucl. Instrum. Meth. A. In v2, edited Figures 14, 15, and 17 for clarity, improved explanation of energy resolution systematics, added reference to SiPM.
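    For reference, the quoted resolutions correspond to a stochastic-term parameterization of the form below (constants from the abstract; treating other resolution terms as negligible over 2.0 to 4.5 GeV is our simplifying assumption):

```latex
\frac{\sigma_E}{E} = \frac{a}{\sqrt{E/\mathrm{GeV}}},\qquad
a_{\mathrm{white}} = (3.4 \pm 0.1)\,\%,\qquad
a_{\mathrm{black}} = (4.6 \pm 0.3)\,\% .
```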

    Multi-source statistics: Basic situations and methods

    Many National Statistical Institutes (NSIs), especially in Europe, are moving from single-source statistics to multi-source statistics. By combining data sources, NSIs can produce more detailed and more timely statistics and respond more quickly to events in society. By combining survey data with already available administrative data and Big Data, NSIs can save data collection and processing costs and reduce the burden on respondents. However, multi-source statistics come with new problems that need to be overcome before the resulting output quality is sufficiently high and before such statistics can be produced efficiently. What complicates the production of multi-source statistics is that they come in many different varieties, as data sets can be combined in many different ways. Given the rapidly increasing importance of multi-source statistics in Official Statistics, there has been considerable research activity in this area over the last few years, and some frameworks have been developed for multi-source statistics. Useful as these frameworks are, they generally do not give guidance on which method could be applied in a given situation arising in practice. In this paper we aim to fill that gap, structure the field of multi-source statistics and its problems, and provide some guidance on suitable methods for these problems.