222 research outputs found

    Consistency of Bayes estimators of a binary regression function

    When do nonparametric Bayesian procedures ``overfit''? To shed light on this question, we consider a binary regression problem in detail and establish frequentist consistency for a certain class of Bayes procedures based on hierarchical priors, called uniform mixture priors. These are defined as follows: let $\nu$ be any probability distribution on the nonnegative integers. To sample a function $f$ from the prior $\pi^{\nu}$, first sample $m$ from $\nu$ and then sample $f$ uniformly from the set of step functions from $[0,1]$ into $[0,1]$ that have exactly $m$ jumps (i.e., sample all $m$ jump locations and $m+1$ function values independently and uniformly). The main result states that if a data-stream is generated according to any fixed, measurable binary-regression function $f_0\not\equiv 1/2$, then frequentist consistency obtains: that is, for any $\nu$ with infinite support, the posterior of $\pi^{\nu}$ concentrates on any $L^1$ neighborhood of $f_0$. Solution of an associated large-deviations problem is central to the consistency proof. Comment: Published at http://dx.doi.org/10.1214/009053606000000236 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
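
The sampling recipe in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the names `sample_step_function` and `geometric_m` are hypothetical, and the geometric distribution is only one assumed choice of a distribution on the nonnegative integers with infinite support.

```python
import random
from bisect import bisect_right


def geometric_m(rng, p=0.5):
    """Illustrative nu (an assumption): geometric law on 0, 1, 2, ..."""
    m = 0
    while rng.random() > p:
        m += 1
    return m


def sample_step_function(sample_m, rng=None):
    """Draw one step function f: [0,1] -> [0,1] from a uniform mixture prior.

    sample_m is a callable that takes the random generator and returns
    the number of jumps m, drawn from nu.
    """
    if rng is None:
        rng = random.Random()
    m = sample_m(rng)
    # m jump locations, independent and uniform on [0, 1)
    jumps = sorted(rng.random() for _ in range(m))
    # m + 1 function values, independent and uniform on [0, 1)
    values = [rng.random() for _ in range(m + 1)]

    def f(x):
        # The number of jumps at or below x selects the active value.
        return values[bisect_right(jumps, x)]

    return f, jumps, values
```

Calling `sample_step_function(geometric_m, random.Random(0))` returns the function together with its jump locations and levels, which makes it easy to plot or inspect a single prior draw.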

    Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations

    This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it, an improved and generalized version of Bayesian Blocks (Scargle 1998), that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multi-variate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by Arias-Castro, Donoho and Huo (2003). In the spirit of Reproducible Research (Donoho et al. 2008), all of the code and data necessary to reproduce all of the figures in this paper are included as auxiliary material. Comment: Added some missing script files and updated other ancillary data (code and data files). To be submitted to the Astrophysical Journal.
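
The idea of an optimal segmentation found by dynamic programming can be illustrated with a short Python sketch for event data. Everything specific here is an assumption rather than something stated in the abstract: the block fitness `N * (log N - log T)` is a commonly quoted maximum-likelihood form for a block with `N` events and duration `T`, the per-block penalty `ncp_prior=4.0` is an arbitrary illustrative value, and the function name is hypothetical. Event times are assumed to be distinct.

```python
import math


def bayesian_blocks_events(times, ncp_prior=4.0):
    """Sketch: optimal piecewise-constant segmentation of event times.

    Returns the list of block edges. Assumes distinct event times; the
    fitness function and the ncp_prior penalty are illustrative choices.
    """
    t = sorted(times)
    n = len(t)
    if n == 0:
        return []
    # Candidate edges: first event, midpoints between events, last event.
    edges = [t[0]] + [0.5 * (t[i] + t[i + 1]) for i in range(n - 1)] + [t[-1]]
    best = [0.0] * n  # best[r]: best total fitness for cells 0..r
    last = [0] * n    # last[r]: first cell of the final block in that optimum
    for r in range(n):
        best_val, best_start = -math.inf, 0
        for s in range(r + 1):
            count = r - s + 1                # events in block s..r
            width = edges[r + 1] - edges[s]  # duration of block s..r
            if width <= 0:
                continue                     # degenerate block, skip it
            fit = count * (math.log(count) - math.log(width)) - ncp_prior
            total = fit + (best[s - 1] if s > 0 else 0.0)
            if total > best_val:
                best_val, best_start = total, s
        best[r], last[r] = best_val, best_start
    # Backtrack from the last cell to recover the block start cells.
    starts = []
    r = n
    while r > 0:
        s = last[r - 1]
        starts.append(s)
        r = s
    starts.reverse()
    return [edges[s] for s in starts] + [edges[n]]
```

This version runs in quadratic time in the number of events and works retrospectively on a complete data set. A real-time trigger mode would instead inspect the partial optimum after each new event.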

    Nonuniversal scaling behavior of Barkhausen noise

    We simulate Barkhausen avalanches on fractal clusters in a two-dimensional diluted Ising ferromagnet with an effective Gaussian random field. We vary the concentration of defect sites $c$ and find a scaling region for moderate disorder, where the distribution of avalanche sizes has the form $D(s,c,L) = s^{-(1+\tau(c))}\mathcal{D}(sL^{-D_s(c)})$. The exponents $\tau(c)$ for size and $\alpha(c)$ for length distribution, and the fractal dimension of avalanches $D_s(c)$, satisfy the scaling relation $D_s(c)\tau(c) = \alpha(c)$. For fixed disorder the exponents vary with driving rate in agreement with experiments on amorphous Si-Fe alloys. Comment: 5 pages, Latex, 4 PostScript figures included.
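
A scaling form of this kind is usually checked by a data collapse: plotting the distribution multiplied by $s^{1+\tau}$ against $sL^{-D_s}$ should put curves for different system sizes $L$ on a single curve. The helper below is a minimal sketch of that rescaling; the function name and the synthetic scaling function used in the usage note are assumptions for illustration, not values from the paper.

```python
import math


def collapse(sizes, dist, L, tau, Ds):
    """Rescale (s, D(s)) pairs according to D = s^-(1+tau) * F(s * L^-Ds).

    Returns x = s * L^-Ds and y = D * s^(1+tau), so that y = F(x) when
    the scaling form holds with the given exponents.
    """
    xs = [s * L ** (-Ds) for s in sizes]
    ys = [d * s ** (1.0 + tau) for s, d in zip(sizes, dist)]
    return xs, ys
```

For example, synthetic data generated with an assumed scaling function $e^{-x}$ collapses back onto $y = e^{-x}$ when the same `tau` and `Ds` are used for the rescaling. With wrong exponents, curves for different `L` separate.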

    Disorder-Induced Critical Phenomena in Hysteresis: Numerical Scaling in Three and Higher Dimensions

    We present numerical simulations of avalanches and critical phenomena associated with hysteresis loops, modeled using the zero-temperature random-field Ising model. We study the transition between smooth hysteresis loops and loops with a sharp jump in the magnetization, as the disorder in our model is decreased. In a large region near the critical point, we find scaling and critical phenomena, which are well described by the results of an epsilon expansion about six dimensions. We present the results of simulations in 3, 4, and 5 dimensions, with systems with up to a billion spins ($1000^3$). Comment: Condensed and updated version of cond-mat/9609072, ``Disorder-Induced Critical Phenomena in Hysteresis: A Numerical Scaling Analysis''.

    Hysteresis, Avalanches, and Disorder Induced Critical Scaling: A Renormalization Group Approach

    We study the zero temperature random field Ising model as a model for noise and avalanches in hysteretic systems. Tuning the amount of disorder in the system, we find an ordinary critical point with avalanches on all length scales. Using a mapping to the pure Ising model, we Borel sum the $6-\epsilon$ expansion to $O(\epsilon^5)$ for the correlation length exponent. We sketch a new method for directly calculating avalanche exponents, which we perform to $O(\epsilon)$. Numerical exponents in 3, 4, and 5 dimensions are in good agreement with the analytical predictions. Comment: 134 pages in REVTEX, plus 21 figures. The first two figures can be obtained from the references quoted in their respective figure captions; the remaining 19 figures are supplied separately in uuencoded format.

    Leveraging Multi-ethnic Evidence for Mapping Complex Traits in Minority Populations: An Empirical Bayes Approach

    Elucidating the genetic basis of complex traits and diseases in non-European populations is particularly challenging because US minority populations have been under-represented in genetic association studies. We developed an empirical Bayes approach named XPEB (cross-population empirical Bayes), designed to improve the power for mapping complex-trait-associated loci in a minority population by exploiting information from genome-wide association studies (GWASs) from another ethnic population. Taking as input summary statistics from two GWASs (a target GWAS from an ethnic minority population of primary interest and an auxiliary base GWAS, such as a larger GWAS in Europeans), our XPEB approach reprioritizes SNPs in the target population to compute local false-discovery rates. We demonstrated, through simulations, that whenever the base GWAS harbors relevant information, XPEB gains efficiency. Moreover, XPEB has the ability to discard irrelevant auxiliary information, providing a safeguard against inflated false-discovery rates due to genetic heterogeneity between populations. Applied to a blood-lipids study in African Americans, XPEB more than quadrupled the discoveries from the conventional approach, which used a target GWAS alone, bringing the number of significant loci from 14 to 65. Thus, XPEB offers a flexible framework for mapping complex traits in minority populations.

    Genome-wide Characterization of Shared and Distinct Genetic Components that Influence Blood Lipid Levels in Ethnically Diverse Human Populations

    Blood lipid concentrations are heritable risk factors associated with atherosclerosis and cardiovascular diseases. Lipid traits exhibit considerable variation among populations of distinct ancestral origin as well as between individuals within a population. We performed association analyses to identify genetic loci influencing lipid concentrations in African American and Hispanic American women in the Women’s Health Initiative SNP Health Association Resource. We validated one African-specific high-density lipoprotein cholesterol locus at CD36 as well as 14 known lipid loci that have been previously implicated in studies of European populations. Moreover, we demonstrate striking similarities in genetic architecture (loci influencing the trait, direction and magnitude of genetic effects, and proportions of phenotypic variation explained) of lipid traits across populations. In particular, we found that a disproportionate fraction of lipid variation in African Americans and Hispanic Americans can be attributed to genomic loci exhibiting statistical evidence of association in Europeans, even though the precise genes and variants remain unknown. At the same time, we found substantial allelic heterogeneity within shared loci, characterized both by population-specific rare variants and variants shared among multiple populations that occur at disparate frequencies. The allelic heterogeneity emphasizes the importance of including diverse populations in future genetic association studies of complex traits such as lipids; furthermore, the overlap in lipid loci across populations of diverse ancestral origin argues that additional knowledge can be gleaned from multiple populations.

    Population Genomic Analysis of Strain Variation in Leptospirillum Group II Bacteria Involved in Acid Mine Drainage Formation

    Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth ∼20×). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types (∼94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. 
    Thus, the most likely explanation for the observed patterns of polymorphism is divergence of ancestral strains due to geographic isolation, followed by mixing and subsequent recombination.

    Using Noun Phrases for Navigating Biomedical Literature on Pubmed: How Many Updates Are We Losing Track of?

    Author-supplied citations are a fraction of the related literature for a paper. The “related citations” list on PubMed is typically dozens or hundreds of results long, and offers no hints as to why these results are related. Using noun phrases derived from the sentences of the paper, we show it is possible to navigate to PubMed updates more transparently, through search terms that can associate a paper with its citations. The algorithm to generate these search terms automatically extracts noun phrases from the paper using natural language processing tools and ranks them by the number of occurrences in the paper compared to the number of occurrences on the web. We define search queries having at least one instance of overlap between the author-supplied citations of the paper and the top 20 search results as citation validated (CV). When the overlapping citations were written by the same authors as the paper itself, we define the query as CV-S; when they were written by different authors, we define it as CV-D. For a systematic sample of 883 papers on PubMed Central, at least one of the search terms for 86% of the papers is CV-D, versus 65% for the top 20 PubMed “related citations.” We hypothesize that these quantities, computed for the 20 million papers on PubMed, would differ from these percentages by no more than 5%. Averaged across all 883 papers, 5 search terms are CV-D, 10 search terms are CV-S, and 6 unique citations validate these searches. Potentially related literature uncovered by citation-validated searches (either CV-S or CV-D) is on the order of ten items per paper, and many more if the remaining searches that are not citation-validated are taken into account. The significance and relationship of each search result to the paper can only be vetted and explained by a researcher with knowledge of or interest in that paper.
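
The citation-validation rule described in this abstract can be written down as a small Python function. This is a hedged sketch, not the authors' implementation: the function name and data layout are hypothetical, and treating "same authors" as "shares at least one author with the paper" is an assumption made here because the abstract does not spell out the matching rule.

```python
def classify_query(search_results, citations, paper_authors, top_k=20):
    """Label one search query as CV-S and/or CV-D.

    search_results: ranked list of result identifiers for the query.
    citations: dict mapping each cited identifier to a set of author names.
    paper_authors: set of author names of the paper itself.
    Returns a set of labels; an empty set means not citation validated.
    """
    labels = set()
    for result_id in search_results[:top_k]:
        if result_id not in citations:
            continue  # this result is not one of the paper's citations
        # Assumption: sharing any author counts as "same authors".
        if citations[result_id] & paper_authors:
            labels.add("CV-S")
        else:
            labels.add("CV-D")
    return labels
```

Returning a set lets a single query carry both labels when its top results overlap with a self-citation and with a citation by other authors.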

    Identification of the Transcriptional Regulator NcrB in the Nickel Resistance Determinant of Leptospirillum ferriphilum UBK03

    The nickel resistance determinant ncrABCY was identified in Leptospirillum ferriphilum UBK03. Within this operon, ncrA and ncrC encode two membrane proteins that form an efflux system, and ncrB encodes NcrB, which belongs to an uncharacterized family (DUF156) of proteins. How this determinant is regulated remains unknown. Our data indicate that expression of the nickel resistance determinant is induced by nickel. The promoter of ncrA, designated pncrA, was cloned into the promoter probe vector pPR9TT, and co-transformed with either a wild-type or mutant nickel resistance determinant. The results revealed that ncrB encoded a transcriptional regulator that could regulate the expression of ncrA, ncrB, and ncrC. A GC-rich inverted repeat sequence was identified in the promoter pncrA. Electrophoretic mobility shift assays (EMSAs) and footprinting assays showed that purified NcrB could specifically bind to the inverted repeat sequence of pncrA in vitro; this was confirmed by bacterial one-hybrid analysis. Moreover, this binding was inhibited in the presence of nickel ions. Thus, we classified NcrB as a transcriptional regulator that recognizes the inverted repeat sequence binding motif to regulate the expression of the key nickel resistance gene, ncrA.