91 research outputs found

    Nearly Optimal Private Convolution

    We study computing the convolution of a private input $x$ with a public input $h$, while satisfying the guarantees of $(\epsilon, \delta)$-differential privacy. Convolution is a fundamental operation, intimately related to Fourier transforms. In our setting, the private input may represent a time series of sensitive events or a histogram of a database of confidential personal information. Convolution then captures important primitives including linear filtering, which is an essential tool in time series analysis, and aggregation queries on projections of the data. We give a nearly optimal algorithm for computing convolutions while satisfying $(\epsilon, \delta)$-differential privacy. Surprisingly, we follow the simple strategy of adding independent Laplacian noise to each Fourier coefficient and bounding the privacy loss using the composition theorem of Dwork, Rothblum, and Vadhan. We derive a closed-form expression for the optimal noise to add to each Fourier coefficient using convex programming duality. Our algorithm is very efficient: it is essentially no more computationally expensive than a Fast Fourier Transform. To prove near optimality, we use the recent discrepancy lower bounds of Muthukrishnan and Nikolov and derive a spectral lower bound using a characterization of discrepancy in terms of determinants.
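The strategy named in this abstract, adding independent Laplace noise to each Fourier coefficient of the private input before multiplying by the transform of the public input, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a single uniform `scale` is used instead of the per-coefficient optimal noise the abstract derives, the noise is not calibrated to any $(\epsilon, \delta)$ guarantee, and a naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import random


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; an FFT would be used in practice."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_circular_convolution(x, h, scale, seed=None):
    """Circular convolution of private x with public h, with Laplace noise
    added to every Fourier coefficient of x.

    Assumption: one uniform noise scale for all coefficients, applied to the
    real and imaginary parts separately. This is NOT the optimized noise
    allocation from the paper and carries no privacy guarantee as written.
    """
    if len(x) != len(h):
        raise ValueError("x and h must have the same length")
    rng = random.Random(seed)
    x_hat = dft(x)
    h_hat = dft(h)
    noisy = [c + complex(laplace(scale, rng), laplace(scale, rng)) for c in x_hat]
    y = dft([a * b for a, b in zip(noisy, h_hat)], inverse=True)
    # The noisy spectrum is no longer conjugate-symmetric, so keep the real part.
    return [v.real for v in y]
```

With `scale=0` the function reduces to an exact circular convolution, which is a convenient sanity check before experimenting with nonzero noise.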

    Urban Gravity: a Model for Intercity Telecommunication Flows

    We analyze the anonymous communication patterns of 2.5 million customers of a Belgian mobile phone operator. Grouping customers by billing address, we build a social network of cities that consists of communications between 571 cities in Belgium. We show that inter-city communication intensity is characterized by a gravity model: the communication intensity between two cities is proportional to the product of their sizes divided by the square of their distance.
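The gravity relation stated in this abstract can be written as a one-line function. The sketch below is illustrative only: the function name and the proportionality constant `k` are assumptions, since the abstract gives only the proportionality, not a fitted constant or the units of "size".

```python
def gravity_intensity(size_a, size_b, distance, k=1.0):
    """Predicted communication intensity between two cities.

    Implements intensity = k * size_a * size_b / distance**2.
    `k` is a hypothetical proportionality constant, not a value from the paper.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * size_a * size_b / distance ** 2
```

Under this form, doubling the distance between two cities cuts the predicted intensity to a quarter, while doubling either city's size doubles it.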

    Identifiability of flow distributions from link measurements with applications to computer networks

    We study the problem of identifiability of distributions of flows on a graph from aggregate measurements collected on its edges. This is a canonical example of a statistical inverse problem motivated by recent developments in computer networks. In this paper, (i) we introduce a number of models for multi-modal data that capture their spatio-temporal correlation, and (ii) we provide sufficient conditions for the identifiability of nth-order cumulants and also for a special class of heavy-tailed distributions. Further, we investigate conditions on network routing for the flows that prove sufficient for identifiability of their distributions (up to mean). Finally, we extend our results to directed acyclic graphs and discuss some open problems. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/58107/2/ip7_5_004.pd

    Structure and expression analysis of rice paleo duplications

    Having a well-known history of genome duplication, rice is a good model for studying structural and functional evolution of paleo duplications. Improved sequence alignment criteria were used to characterize 10 major chromosome-to-chromosome duplication relationships associated with 1440 paralogous pairs, covering 47.8% of the rice genome, with 12.6% of genes that are conserved within sister blocks. Using a micro-array experiment, a genome-wide expression map has been produced, in which 2382 genes show significant differences of expression in root, leaf and grain. By integrating both structural (1440 paralogous pairs) and functional information (2382 differentially expressed genes), we identified 115 paralogous gene pairs for which at least one copy is differentially expressed in one of the three tissues. A vast majority of the 115 paralogous gene pairs have been neofunctionalized or subfunctionalized, as 88%, 89% and 96% of duplicates, respectively, expressed in grain, leaf and root show distinct expression patterns. On the basis of a Gene Ontology analysis, we have identified and characterized the gene families that have been structurally and functionally preferentially retained in the duplication, showing that the vast majority (>85%) of duplicated genes have been either lost or have been subfunctionalized or neofunctionalized during 50–70 million years of evolution.

    Specific patterns of gene space organisation revealed in wheat by using the combination of barley and wheat genomic resources

    Background: Because of its size, allohexaploid nature and high repeat content, the wheat genome has always been perceived as too complex for efficient molecular studies. We recently constructed the first physical map of a wheat chromosome (3B). However gene mapping is still laborious in wheat because of high redundancy between the three homoeologous genomes. In contrast, in the closely related diploid species, barley, numerous gene-based markers have been developed. This study aims at combining the unique genomic resources developed in wheat and barley to decipher the organisation of gene space on wheat chromosome 3B. Results: Three dimensional pools of the minimal tiling path of wheat chromosome 3B physical map were hybridised to a barley Agilent 15K expression microarray. This led to the fine mapping of 738 barley orthologous genes on wheat chromosome 3B. In addition, comparative analyses revealed that 68% of the genes identified were syntenic between the wheat chromosome 3B and barley chromosome 3H and 59% between wheat chromosome 3B and rice chromosome 1, together with some wheat-specific rearrangements. Finally, it indicated an increasing gradient of gene density from the centromere to the telomeres positively correlated with the number of genes clustered in islands on wheat chromosome 3B. Conclusion: Our study shows that novel structural genomics resources now available in wheat and barley can be combined efficiently to overcome specific problems of genetic anchoring of physical contigs in wheat and to perform high-resolution comparative analyses with rice for deciphering the organisation of the wheat gene space.

    Children living with HIV in Europe: do migrants have worse treatment outcomes?


    Brachypodium distachyon as a model for defining the allergen potential of non-prolamin proteins

    Epitope databases and the protein sequences of published plant genomes are suitable to identify some of the proteins causing food allergies and sensitivities. Brachypodium distachyon, a diploid wild grass with a sequenced genome and low prolamin content, is the closest relative of the allergenic cereals, such as wheat or barley. Using the Brachypodium genome sequence, a workflow has been developed to identify potentially harmful proteins which may cause either celiac disease or wheat allergy-related symptoms. Seed tissue-specific expression of the potential allergens has been determined, and intact epitopes following an in silico digestion with several endopeptidases have been identified. The molecular function of allergenic proteins has been evaluated using Gene Ontology terms. Biologically overrepresented proteins and potentially allergenic protein families have been identified. Electronic supplementary material: The online version of this article (doi:10.1007/s10142-012-0294-z) contains supplementary material, which is available to authorized users.