737 research outputs found

Shortest Paths and Distances with Differential Privacy

Author: Sealfon Adam
Publication date: 20/04/2016

We introduce a model for differentially private analysis of weighted graphs in which the graph topology $(V,E)$ is assumed to be public and the private information consists only of the edge weights $w:E\to\mathbb{R}^+$. This can express hiding congestion patterns in a known system of roads. Differential privacy requires that the output of an algorithm provides little advantage, measured by privacy parameters $\epsilon$ and $\delta$, for distinguishing between neighboring inputs, which are thought of as inputs that differ on the contribution of one individual. In our model, two weight functions $w,w'$ are considered to be neighboring if they have $\ell_1$ distance at most one. We study the problems of privately releasing a short path between a pair of vertices and of privately releasing approximate distances between all pairs of vertices. We are concerned with the approximation error, the difference between the length of the released path or released distance and the length of the shortest path or actual distance. For privately releasing a short path between a pair of vertices, we prove a lower bound of $\Omega(|V|)$ on the additive approximation error for fixed $\epsilon,\delta$. We provide a differentially private algorithm that matches this error bound up to a logarithmic factor and releases paths between all pairs of vertices. The approximation error of our algorithm can be bounded by the number of edges on the shortest path, so we achieve better accuracy than the worst-case bound for vertex pairs that are connected by a low-weight path with $o(|V|)$ vertices. For privately releasing all-pairs distances, we show that for trees we can release all distances with approximation error $O(\log^{2.5}|V|)$ for fixed privacy parameters. For arbitrary bounded-weight graphs with edge weights in $[0,M]$ we can release all distances with approximation error $\tilde{O}(\sqrt{|V|M})$.
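To make the privacy model concrete, here is a minimal Python sketch of one natural approach: add independent Laplace noise to every edge weight, then run Dijkstra on the noisy graph. Because neighboring weight functions differ by at most one in $\ell_1$ distance, per-edge Laplace noise of scale $1/\epsilon$ makes the noisy weight vector $\epsilon$-differentially private (here $\delta = 0$), and any path computed from it is post-processing. The sketch assumes an undirected graph, and the names `laplace` and `noisy_shortest_path` are hypothetical; it illustrates the setting and is not a reproduction of the paper's algorithm or its error analysis.

```python
import heapq
import random
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_shortest_path(
    vertices: List[str],
    weights: Dict[Edge, float],
    source: str,
    target: str,
    epsilon: float,
    rng: random.Random,
) -> Optional[List[str]]:
    """Release a source-target path computed on Laplace-perturbed edge weights."""
    # Perturb every weight once; clamp at 0 so Dijkstra stays valid
    # (clamping is post-processing and does not weaken privacy).
    noisy = {e: max(0.0, w + laplace(1.0 / epsilon, rng)) for e, w in weights.items()}

    adj: Dict[str, List[Tuple[str, float]]] = {x: [] for x in vertices}
    for (u, v), w in noisy.items():
        adj[u].append((v, w))
        adj[v].append((u, w))  # undirected graph (assumption)

    # Standard Dijkstra on the noisy weights, recording predecessors.
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in done:
        return None  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

For example, `noisy_shortest_path(["a", "b", "c", "d"], {("a", "b"): 1.0, ("b", "d"): 1.0, ("a", "c"): 5.0, ("c", "d"): 5.0}, "a", "d", 1.0, random.Random(0))` returns some path from `a` to `d`. Intuitively, the noise a path accumulates grows with how many edges it has, which is why short-hop paths are the easy case.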

arXiv.org e-Print Archive

Crossref

Slicing cluster mass functions with a Bayesian razor

Author: Sealfon Carolyn D.
Publication venue: 'Wiley'
Publication date: 09/09/2010

We apply a Bayesian "razor" to forecast Bayes factors between different parameterizations of the galaxy cluster mass function. To demonstrate this approach, we calculate the minimum N-body simulation size needed for strong evidence favoring a two-parameter mass function over one-parameter mass functions and vice versa, as a function of the minimum cluster mass. Comment: 5 pages, 2 figures, accepted to Astronomische Nachrichten.
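The idea of forecasting a Bayes factor as a function of data-set size can be illustrated with a toy problem far simpler than cluster mass functions. The sketch below assumes $n$ draws from $\mathcal{N}(\mu, 1)$ and compares $M_0$ ($\mu = 0$, no free parameter) against $M_1$ ($\mu \sim \mathrm{Uniform}[-a, a]$). Since $L(\mu)/L(0) = \exp(-n(\mu-\bar{x})^2/2 + n\bar{x}^2/2)$, the evidence ratio reduces to a Gaussian integral with a closed form in `math.erf`. The forecast plugs the expected sample mean $\bar{x} = \mu_{\text{true}}$ into $\ln B_{10}$ and scans $n$. The function names, the default "strong evidence" cutoff $\ln B_{10} \ge 5$, and the model setup are all assumptions for illustration.

```python
import math
from typing import Optional


def log_bayes_factor_10(n: int, xbar: float, a: float) -> float:
    """ln B_10 for M1 (mu ~ Uniform[-a, a]) versus M0 (mu = 0).

    Data: n i.i.d. Normal(mu, 1) draws with sample mean xbar; assumes |xbar| < a.
    """
    # Integral of exp(-n (mu - xbar)^2 / 2) over mu in [-a, a].
    c = math.sqrt(n / 2.0)
    gauss_integral = math.sqrt(math.pi / (2.0 * n)) * (
        math.erf(c * (a - xbar)) - math.erf(c * (-a - xbar))
    )
    # Prior density 1/(2a) times the integral, times exp(n xbar^2 / 2).
    return math.log(gauss_integral / (2.0 * a)) + n * xbar ** 2 / 2.0


def min_sample_size(
    mu_true: float, a: float, threshold: float = 5.0, max_n: int = 10 ** 6
) -> Optional[int]:
    """Smallest n whose forecast ln B_10 (xbar = mu_true) reaches `threshold`."""
    for n in range(1, max_n + 1):
        if log_bayes_factor_10(n, mu_true, a) >= threshold:
            return n
    return None  # threshold never reached within max_n
```

With `mu_true = 0.0` the forecast $\ln B_{10}$ only drifts downward like $-\tfrac{1}{2}\ln n$, so in this toy, evidence for the simpler model accumulates far more slowly than evidence against it. This asymmetry is the reason the "vice versa" direction needs separate treatment.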

arXiv.org e-Print Archive

Crossref

Predicting enhancer regions and transcription factor binding sites in D. melanogaster

Author: Sealfon Rachel (Rachel Sima)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2010

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 71-75).

Identifying regions in the genome that have regulatory function is important to the fundamental biological problem of understanding the mechanisms through which a regulatory sequence drives specific spatial and temporal patterns of gene expression in early development. The modENCODE project aims to comprehensively identify functional elements in the C. elegans and D. melanogaster genomes. The genome-wide binding locations of all known transcription factors as well as of other DNA-binding proteins are currently being mapped within the context of this project [8]. The large quantity of new data that is becoming available through the modENCODE project and other experimental efforts offers the potential for gaining insight into the mechanisms of gene regulation. Developing improved approaches to identify functional regions and understand their architecture based on available experimental data represents a critical part of the modENCODE effort. Towards this goal, I use a machine learning approach to study the predictive power of experimental and sequence-based combinations of features for predicting enhancers and transcription factor binding sites.

by Rachel Sealfon. S.M.

DSpace@MIT

Elucidation of molecular kinetic schemes from macroscopic traces using system identification

Author: Brezina Vladimir
Fribourg Miguel
Galocha-Iragüen Belén
González-Maeso Javier
Las-Heras Andrés Fernando
Logothetis Diomedes E.
Sealfon Stuart C.
Publication venue: VCU Scholars Compass
Publication date: 01/01/2017

Overall cellular responses to biologically-relevant stimuli are mediated by networks of simpler lower-level processes. Although information about some of these processes can now be obtained by visualizing and recording events at the molecular level, this is still possible only in especially favorable cases. Therefore the development of methods to extract the dynamics and relationships between the different lower-level (microscopic) processes from the overall (macroscopic) response remains a crucial challenge in the understanding of many aspects of physiology. Here we have devised a hybrid computational-analytical method to accomplish this task, the SYStems-based MOLecular kinetic scheme Extractor (SYSMOLE). SYSMOLE utilizes system-identification input-output analysis to obtain a transfer function between the stimulus and the overall cellular response in the Laplace-transformed domain. It then derives a Markov-chain state molecular kinetic scheme uniquely associated with the transfer function by means of a classification procedure and an analytical step that imposes general biological constraints. We first tested SYSMOLE with synthetic data and evaluated its performance in terms of its rate of convergence to the correct molecular kinetic scheme and its robustness to noise. We then examined its performance on real experimental traces by analyzing macroscopic calcium-current traces elicited by membrane depolarization. SYSMOLE derived the correct, previously known molecular kinetic scheme describing the activation and inactivation of the underlying calcium channels and correctly identified the accepted mechanism of action of nifedipine, a calcium-channel blocker clinically used in patients with cardiovascular disease. Finally, we applied SYSMOLE to study the pharmacology of a new class of glutamate antipsychotic drugs and their crosstalk mechanism through a heteromeric complex of G protein-coupled receptors. 
Our results indicate that our methodology can be successfully applied to accurately derive molecular kinetic schemes from experimental macroscopic traces, and we anticipate that it may be useful in the study of a wide variety of biological systems.
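The link between a transfer function and a Markov-chain kinetic scheme can be worked out by hand in the simplest case. Take a two-state channel C ⇌ O with opening rate α and closing rate β, and treat the stimulus as a unit step at t = 0 that switches these rates on. The open probability then obeys dp_O/dt = α(1 - p_O) - β p_O with p_O(0) = 0. Laplace-transforming gives P(s) = α / (s(s + α + β)), so relative to the step input 1/s the transfer function is H(s) = α / (s + α + β). Its single pole at s = -(α + β) and its steady-state gain H(0) = α / (α + β) together determine both rates. The Python sketch below checks the closed form against forward-Euler integration and inverts pole and gain back to rates. The two-state scheme, the step-stimulus interpretation, and the function names are assumptions for illustration; SYSMOLE's classification and constraint steps are not reproduced.

```python
import math
from typing import Tuple


def open_probability_exact(t: float, alpha: float, beta: float) -> float:
    """Closed-form p_O(t) for C <-> O starting from p_O(0) = 0."""
    k = alpha + beta
    return alpha / k * (1.0 - math.exp(-k * t))


def open_probability_euler(t_end: float, alpha: float, beta: float, dt: float = 1e-4) -> float:
    """Forward-Euler integration of dp/dt = alpha (1 - p) - beta p."""
    p = 0.0
    for _ in range(int(round(t_end / dt))):
        p += dt * (alpha * (1.0 - p) - beta * p)
    return p


def rates_from_transfer_function(pole: float, dc_gain: float) -> Tuple[float, float]:
    """Recover (alpha, beta) from H(s) = alpha / (s + alpha + beta).

    The pole is -(alpha + beta) and the DC gain is alpha / (alpha + beta).
    """
    k = -pole
    alpha = dc_gain * k
    return alpha, k - alpha
```

For instance, a fitted pole at -4 with gain 0.75 corresponds to α = 3 and β = 1. Schemes with more states give more poles, and the same pole-to-rate bookkeeping is no longer unique, which is where additional constraints are needed.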

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Repositorio Institucional de la Universidad de Oviedo

Directory of Open Access Journals

PubMed Central

VCU Scholars Compass

Secretaría de Estado de Cultura

The Francis Crick Institute

Archivo Digital UPM (Univ. Politécnica de Madrid)

ProbCD: enrichment analysis accounting for categorization uncertainty

Author: A Lewin
A Vinayagam
B Engelhardt
C Andersson
C Jones
D Martin
E Levy
I Rivals
Ilya Shmulevich
J Goeman
L Goodman
M Aubry
P Shannon
R Fisher
R Sealfon
R Vencio
Ricardo ZN Vêncio
S Carroll
S Maere
T Joshi
W Zhang
Z Jiang
Publication date: 01/01/2007

As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high-throughput datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. We developed an open-source R package, ProbCD, for probabilistic categorical data analysis; it does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An online interface was created to allow usage by non-programmers and is available at: http://xerad.systemsbiology.net/ProbCD/. We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation.
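The key step, building the contingency table from expectations rather than hard counts, can be sketched in a few lines. The example assumes that each gene i has a probability `p_category[i]` of belonging to the annotation category and `p_selected[i]` of being in the selected gene list, and that the two events are independent Bernoulli variables. The function name and inputs are hypothetical, and this is not ProbCD's R interface. Each cell is then a sum of per-gene products of probabilities.

```python
from typing import List, Sequence


def expected_contingency_table(
    p_category: Sequence[float], p_selected: Sequence[float]
) -> List[List[float]]:
    """Expected 2x2 table [[in & selected, in & not], [out & selected, out & not]].

    Assumes each gene's category membership and selection are independent
    Bernoulli events with the given probabilities.
    """
    if len(p_category) != len(p_selected):
        raise ValueError("probability vectors must have equal length")
    table = [[0.0, 0.0], [0.0, 0.0]]
    for pc, ps in zip(p_category, p_selected):
        table[0][0] += pc * ps
        table[0][1] += pc * (1 - ps)
        table[1][0] += (1 - pc) * ps
        table[1][1] += (1 - pc) * (1 - ps)
    return table
```

When every probability is exactly 0 or 1, the table reduces to the ordinary counts a Fisher Exact Test would use. For example, `expected_contingency_table([1, 1, 0, 0], [1, 0, 1, 0])` gives one gene in each cell. With fractional probabilities the cells become non-integer expectations, which is why a static-table test no longer applies directly.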

arXiv.org e-Print Archive

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Nature Precedings

3D-Matched-Filter Galaxy Cluster Finder I: Selection Functions and CFHTLS Deep Clusters

Author: Adami
Battye
Benoist
Benítez
C. Heymans
Cohn
Davis
De Lucia
Dietrich
Dressler
Erben
Gilbank
Gladders
Grove
H. Hildebrandt
Hansen
Hilbert
Hildebrandt
Hoekstra
J. P. Dietrich
Kepner
Kitzbichler
Kochanek
Kodama
Koester
L. Van Waerbeke
Le Fèvre
Li
Lilly
Lopes
Lu
M. Milkeraitis
Menanteau
Motl
Olsen
Popesso
Postman
Ramella
Sealfon
Smith
Soneira
Springel
Stark
T. Erben
Thanjavur
Van Breukelen
Van Waerbeke
White
Publication venue: 'Wiley'
Publication date: 01/01/2009

We present an optimised galaxy cluster finder, 3D-Matched-Filter (3D-MF), which utilises galaxy cluster radial profiles, luminosity functions and redshift information to detect galaxy clusters in optical surveys. This method is an improvement over other matched-filter methods, most notably through implementing redshift slicing of the data to significantly reduce line-of-sight projections and related false positives. We apply our method to the Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) Deep fields, finding $\sim$170 galaxy clusters per square degree in the $0.2 \le z \le 1.0$ redshift range. Future surveys such as LSST and JDEM can exploit 3D-MF's automated methodology to produce complete and reliable galaxy cluster catalogues. We determine the reliability and accuracy of the statistical approach of our method through a thorough analysis of mock data from the Millennium Simulation. We detect clusters with 100% completeness for $M_{200} \ge 3.0\times10^{14}\,M_\odot$, 88% completeness for $M_{200} \ge 1.0\times10^{14}\,M_\odot$, and 72% completeness well into the $10^{13}\,M_\odot$ cluster mass range. We show a 36% multiple detection rate for cluster masses $\ge 1.5\times10^{13}\,M_\odot$ and a 16% false detection rate for galaxy clusters $\gtrsim 5\times10^{13}\,M_\odot$, reporting that for clusters with masses $\lesssim 5\times10^{13}\,M_\odot$ false detections may increase up to $\sim$24%. Utilising these selection functions we conclude that our galaxy cluster catalogue is the most complete CFHTLS Deep cluster catalogue to date. Comment: 18 pages, 17 figures, 5 tables; v2: added Fig 5, minor edits to match version published in MNRAS.
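The selection-function numbers above (completeness, multiple detection rate, false detection rate) come from matching detections against mock clusters. The Python sketch below shows one plausible way to compute such rates once that matching is done. It assumes each detection has already been matched to at most one mock halo id (or `None` if unmatched). Completeness and multiple detection are measured over mock halos above a mass cut, and the false rate is measured over all detections without mass binning. These definitions, the input format, and the function name are assumptions and may differ from the paper's exact conventions.

```python
from typing import Dict, Optional, Sequence


def detection_rates(
    true_masses: Dict[str, float],
    matches: Sequence[Optional[str]],
    mass_cut: float,
) -> Dict[str, float]:
    """Completeness, multiple-detection and false-detection rates.

    true_masses: mock halo id -> mass (e.g. M_200 in solar masses).
    matches: one entry per detection; the matched halo id, or None if unmatched.
    """
    # Count how many detections were matched to each mock halo.
    hits: Dict[str, int] = {}
    for halo in matches:
        if halo is not None:
            hits[halo] = hits.get(halo, 0) + 1

    massive = [h for h, m in true_masses.items() if m >= mass_cut]
    detected = [h for h in massive if hits.get(h, 0) >= 1]
    multiple = [h for h in massive if hits.get(h, 0) >= 2]
    n_false = sum(1 for halo in matches if halo is None)

    nan = float("nan")
    return {
        "completeness": len(detected) / len(massive) if massive else nan,
        "multiple_rate": len(multiple) / len(massive) if massive else nan,
        "false_rate": n_false / len(matches) if matches else nan,
    }
```

Evaluating these rates over a sequence of mass cuts is one way to trace out how completeness falls and false detections rise toward the low-mass end.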

arXiv.org e-Print Archive

CiteSeerX

Crossref

Deep Blue Documents