
    Shortest Paths and Distances with Differential Privacy

    We introduce a model for differentially private analysis of weighted graphs in which the graph topology $(V,E)$ is assumed to be public and the private information consists only of the edge weights $w:E\to\mathbb{R}^+$. This can express hiding congestion patterns in a known system of roads. Differential privacy requires that the output of an algorithm provides little advantage, measured by privacy parameters $\epsilon$ and $\delta$, for distinguishing between neighboring inputs, which are thought of as inputs that differ on the contribution of one individual. In our model, two weight functions $w,w'$ are considered to be neighboring if they have $\ell_1$ distance at most one. We study the problems of privately releasing a short path between a pair of vertices and of privately releasing approximate distances between all pairs of vertices. We are concerned with the approximation error, the difference between the length of the released path or released distance and the length of the shortest path or actual distance. For privately releasing a short path between a pair of vertices, we prove a lower bound of $\Omega(|V|)$ on the additive approximation error for fixed $\epsilon,\delta$. We provide a differentially private algorithm that matches this error bound up to a logarithmic factor and releases paths between all pairs of vertices. The approximation error of our algorithm can be bounded by the number of edges on the shortest path, so we achieve better accuracy than the worst-case bound for vertex pairs that are connected by a low-weight path with $o(|V|)$ vertices. For privately releasing all-pairs distances, we show that for trees we can release all distances with approximation error $O(\log^{2.5}|V|)$ for fixed privacy parameters. For arbitrary bounded-weight graphs with edge weights in $[0,M]$ we can release all distances with approximation error $\tilde{O}(\sqrt{|V|M})$.
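
    The privacy model in the abstract above can be illustrated with a minimal sketch of input perturbation. It assumes pure $\epsilon$-differential privacy via the Laplace mechanism: because neighboring weight functions differ by at most one in $\ell_1$ distance, adding independent Laplace noise of scale $1/\epsilon$ to every edge weight is $\epsilon$-differentially private, and any path computed from the noisy weights is post-processing. The function names and the clamping of noisy weights at zero are illustrative choices, not the authors' algorithm, which may differ in its details.

```python
import heapq
import random


def laplace(scale, rng):
    # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1); rescale it.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_shortest_path(edges, source, target, epsilon, rng=None):
    """Release a short path from noisy weights (illustrative sketch only).

    edges maps an undirected edge (u, v) to its private non-negative weight.
    The topology is treated as public, as in the abstract's model.
    """
    rng = rng or random.Random()
    # Laplace mechanism: the weight vector has L1 sensitivity 1.
    # Clamping at zero is post-processing and keeps Dijkstra valid.
    noisy = {e: max(0.0, w + laplace(1.0 / epsilon, rng))
             for e, w in edges.items()}

    adjacency = {}
    for (u, v), w in noisy.items():
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    # Plain Dijkstra on the noisy weights.
    dist = {source: 0.0}
    prev = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in adjacency.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None  # target unreachable in the public topology
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

    With a very large `epsilon` the noise becomes negligible and the sketch returns the true shortest path; with a small `epsilon` the released path can be longer than optimal, with the extra length growing with the number of edges on the path.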

    Slicing cluster mass functions with a Bayesian razor

    We apply a Bayesian "razor" to forecast Bayes factors between different parameterizations of the galaxy cluster mass function. To demonstrate this approach, we calculate the minimum size N-body simulation needed for strong evidence favoring a two-parameter mass function over one-parameter mass functions and vice versa, as a function of the minimum cluster mass. Comment: 5 pages, 2 figures, accepted to Astronomische Nachrichten

    Predicting enhancer regions and transcription factor binding sites in D. melanogaster

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 71-75). Identifying regions in the genome that have regulatory function is important to the fundamental biological problem of understanding the mechanisms through which a regulatory sequence drives specific spatial and temporal patterns of gene expression in early development. The modENCODE project aims to comprehensively identify functional elements in the C. elegans and D. melanogaster genomes. The genome-wide binding locations of all known transcription factors as well as of other DNA-binding proteins are currently being mapped within the context of this project [8]. The large quantity of new data that is becoming available through the modENCODE project and other experimental efforts offers the potential for gaining insight into the mechanisms of gene regulation. Developing improved approaches to identify functional regions and understand their architecture based on available experimental data represents a critical part of the modENCODE effort. Towards this goal, I use a machine learning approach to study the predictive power of experimental and sequence-based combinations of features for predicting enhancers and transcription factor binding sites. by Rachel Sealfon. S.M.

    Elucidation of molecular kinetic schemes from macroscopic traces using system identification

    Overall cellular responses to biologically-relevant stimuli are mediated by networks of simpler lower-level processes. Although information about some of these processes can now be obtained by visualizing and recording events at the molecular level, this is still possible only in especially favorable cases. Therefore the development of methods to extract the dynamics and relationships between the different lower-level (microscopic) processes from the overall (macroscopic) response remains a crucial challenge in the understanding of many aspects of physiology. Here we have devised a hybrid computational-analytical method to accomplish this task, the SYStems-based MOLecular kinetic scheme Extractor (SYSMOLE). SYSMOLE utilizes system-identification input-output analysis to obtain a transfer function between the stimulus and the overall cellular response in the Laplace-transformed domain. It then derives a Markov-chain state molecular kinetic scheme uniquely associated with the transfer function by means of a classification procedure and an analytical step that imposes general biological constraints. We first tested SYSMOLE with synthetic data and evaluated its performance in terms of its rate of convergence to the correct molecular kinetic scheme and its robustness to noise. We then examined its performance on real experimental traces by analyzing macroscopic calcium-current traces elicited by membrane depolarization. SYSMOLE derived the correct, previously known molecular kinetic scheme describing the activation and inactivation of the underlying calcium channels and correctly identified the accepted mechanism of action of nifedipine, a calcium-channel blocker clinically used in patients with cardiovascular disease. Finally, we applied SYSMOLE to study the pharmacology of a new class of glutamate antipsychotic drugs and their crosstalk mechanism through a heteromeric complex of G protein-coupled receptors. Our results indicate that our methodology can be successfully applied to accurately derive molecular kinetic schemes from experimental macroscopic traces, and we anticipate that it may be useful in the study of a wide variety of biological systems.

    ProbCD: enrichment analysis accounting for categorization uncertainty

    As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high throughput-based datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. We developed an open-source R package to deal with probabilistic categorical data analysis, ProbCD, that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: http://xerad.systemsbiology.net/ProbCD/. We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation.
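
    The idea of building a contingency table from categorization probabilities can be sketched as follows. This is a hypothetical illustration in Python, not ProbCD's R interface: it assumes each gene has a probability of carrying the annotation and a probability of belonging to the experimental gene list, and that the two memberships are independent for each gene. Under those assumptions the expected cell counts are sums of products of probabilities. The function name and argument names are invented for this example.

```python
def expected_contingency_table(p_annotated, p_selected):
    """Expected 2x2 table under per-gene independent Bernoulli memberships.

    p_annotated[i]: probability that gene i carries the annotation.
    p_selected[i]:  probability that gene i is in the experimental list.
    Returns [[annotated & selected, annotated & not selected],
             [not annotated & selected, not annotated & not selected]].
    """
    if len(p_annotated) != len(p_selected):
        raise ValueError("both probability lists must cover the same genes")

    n11 = n10 = n01 = n00 = 0.0
    for p, q in zip(p_annotated, p_selected):
        n11 += p * q
        n10 += p * (1.0 - q)
        n01 += (1.0 - p) * q
        n00 += (1.0 - p) * (1.0 - q)
    return [[n11, n10], [n01, n00]]
```

    When every probability is 0 or 1 the result reduces to the ordinary count table used by a Fisher-style test, and in general the four cells sum to the number of genes.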

    3D-Matched-Filter Galaxy Cluster Finder I: Selection Functions and CFHTLS Deep Clusters

    We present an optimised galaxy cluster finder, 3D-Matched-Filter (3D-MF), which utilises galaxy cluster radial profiles, luminosity functions and redshift information to detect galaxy clusters in optical surveys. This method is an improvement over other matched-filter methods, most notably through implementing redshift slicing of the data to significantly reduce line-of-sight projections and related false positives. We apply our method to the Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) Deep fields, finding ~170 galaxy clusters per square degree in the 0.2 <= z <= 1.0 redshift range. Future surveys such as LSST and JDEM can exploit 3D-MF's automated methodology to produce complete and reliable galaxy cluster catalogues. We determine the reliability and accuracy of the statistical approach of our method through a thorough analysis of mock data from the Millennium Simulation. We detect clusters with 100% completeness for M_200 >= 3.0x10^(14)M_sun, 88% completeness for M_200 >= 1.0x10^(14)M_sun, and 72% completeness well into the 10^(13)M_sun cluster mass range. We show a 36% multiple detection rate for cluster masses >= 1.5x10^(13)M_sun and a 16% false detection rate for galaxy clusters >~ 5x10^(13)M_sun, reporting that for clusters with masses <~ 5x10^(13)M_sun false detections may increase up to ~24%. Utilising these selection functions we conclude that our galaxy cluster catalogue is the most complete CFHTLS Deep cluster catalogue to date. Comment: 18 pages, 17 figures, 5 tables; v2: added Fig 5, minor edits to match version published in MNRAS