222 research outputs found
Consistency of Bayes estimators of a binary regression function
When do nonparametric Bayesian procedures "overfit"? To shed light on this
question, we consider a binary regression problem in detail and establish
frequentist consistency for a certain class of Bayes procedures based on
hierarchical priors, called uniform mixture priors. These are defined as
follows: let be any probability distribution on the nonnegative integers.
To sample a function from the prior , first sample from
and then sample uniformly from the set of step functions from
into that have exactly jumps (i.e., sample all jump locations
and function values independently and uniformly). The main result states
that if a data-stream is generated according to any fixed, measurable
binary-regression function , then frequentist consistency
obtains: that is, for any with infinite support, the posterior of
concentrates on any neighborhood of . Solution of an
associated large-deviations problem is central to the consistency proof.
Comment: Published at http://dx.doi.org/10.1214/009053606000000236 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
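The two-stage sampling scheme described in this abstract can be sketched in Python. This is a minimal illustration, not the authors' code: the symbol m for the number of jumps, the unit interval as the domain, the geometric mixing distribution, and all function names are assumptions introduced here.

```python
import random
from bisect import bisect_right


def sample_step_function(sample_m, rng):
    """Draw one step function with values in [0, 1].

    sample_m: callable taking an RNG and returning a nonnegative integer,
              the number of jumps (the mixing distribution).
    Assumption of this sketch: the function lives on the unit interval.
    """
    m = sample_m(rng)
    # Jump locations: independent and uniform, then sorted for lookup.
    jumps = sorted(rng.random() for _ in range(m))
    # A function with m jumps has m + 1 constant pieces.
    values = [rng.random() for _ in range(m + 1)]

    def f(x):
        # bisect_right counts the jumps at or before x, which indexes the piece.
        return values[bisect_right(jumps, x)]

    return f, jumps, values


def geometric(rng, p=0.5):
    """Hypothetical mixing distribution with infinite support."""
    m = 0
    while rng.random() > p:
        m += 1
    return m


rng = random.Random(0)
f, jumps, values = sample_step_function(geometric, rng)
```

Any distribution on the nonnegative integers can be passed in place of geometric; the consistency result quoted above concerns those with infinite support.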
Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations
This paper addresses the problem of detecting and characterizing local
variability in time series and other forms of sequential data. The goal is to
identify and characterize statistically significant variations, at the same
time suppressing the inevitable corrupting observational errors. We present a
simple nonparametric modeling technique and an algorithm implementing it - an
improved and generalized version of Bayesian Blocks (Scargle 1998) - that finds
the optimal segmentation of the data in the observation interval. The structure
of the algorithm allows it to be used in either a real-time trigger mode, or a
retrospective mode. Maximum likelihood or marginal posterior functions to
measure model fitness are presented for events, binned counts, and measurements
at arbitrary times with known error distributions. Problems addressed include
those connected with data gaps, variable exposure, extension to piecewise
linear and piecewise exponential representations, multi-variate time series
data, analysis of variance, data on the circle, other data modes, and dispersed
data. Simulations provide evidence that the detection efficiency for weak
signals is close to a theoretical asymptotic limit derived by Arias-Castro,
Donoho and Huo (2003). In the spirit of Reproducible Research (Donoho et al.
2008) all of the code and data necessary to reproduce all of the figures in
this paper are included as auxiliary material.
Comment: Added some missing script files and updated other ancillary data
(code and data files). To be submitted to the Astrophysical Journal
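The optimal-segmentation idea in this abstract can be illustrated with a short dynamic program for event data. This is a simplified sketch, not the authors' released code: the block fitness N(log N - log T), the constant per-block penalty ncp_prior, and the function name are assumptions chosen for illustration, and the event times are assumed to be distinct with at least two events.

```python
import math


def bayesian_blocks_events(times, ncp_prior=4.0):
    """Return block edges for event times via an O(n^2) dynamic program."""
    t = sorted(times)
    n = len(t)
    # Cell edges: first event, midpoints between neighbours, last event.
    edges = [t[0]] + [0.5 * (t[i] + t[i + 1]) for i in range(n - 1)] + [t[-1]]

    best = [0.0] * n  # best[r]: optimal total fitness for cells 0..r
    last = [0] * n    # last[r]: first cell of the final block in that optimum

    for r in range(n):
        best_val, best_start = -math.inf, 0
        for s in range(r + 1):
            width = edges[r + 1] - edges[s]   # duration of block s..r
            count = r - s + 1                 # events in block s..r
            fit = count * (math.log(count) - math.log(width)) - ncp_prior
            total = fit + (best[s - 1] if s > 0 else 0.0)
            if total > best_val:
                best_val, best_start = total, s
        best[r], last[r] = best_val, best_start

    # Walk back through the stored block starts to recover the change points.
    starts = []
    r = n
    while r > 0:
        s = last[r - 1]
        starts.append(s)
        r = s
    starts.reverse()
    return [edges[i] for i in starts] + [edges[-1]]


# Hypothetical data: a sparse stretch followed by a dense burst.
times = list(range(10)) + [10 + 0.01 * i for i in range(1, 101)]
edges = bayesian_blocks_events(times)
```

Because best[r] depends only on data up to cell r, the same loop can be run incrementally as events arrive, which is what makes a trigger mode possible alongside the retrospective one.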
Nonuniversal scaling behavior of Barkhausen noise
We simulate Barkhausen avalanches on fractal clusters in a two-dimensional
diluted Ising ferromagnet with an effective Gaussian random field. We vary the
concentration of defect sites and find a scaling region for moderate
disorder, where the distribution of avalanche sizes has the form . The exponents for size
and for length distribution, and the fractal dimension of
avalanches satisfy the scaling relation .
For fixed disorder the exponents vary with driving rate in agreement with
experiments on amorphous Si-Fe alloys.
Comment: 5 pages, LaTeX, 4 PostScript figures included
Disorder-Induced Critical Phenomena in Hysteresis: Numerical Scaling in Three and Higher Dimensions
We present numerical simulations of avalanches and critical phenomena
associated with hysteresis loops, modeled using the zero-temperature
random-field Ising model. We study the transition between smooth hysteresis
loops and loops with a sharp jump in the magnetization, as the disorder in our
model is decreased. In a large region near the critical point, we find scaling
and critical phenomena, which are well described by the results of an epsilon
expansion about six dimensions. We present the results of simulations in 3, 4,
and 5 dimensions, for systems of up to a billion spins (1000^3).
Comment: Condensed and updated version of cond-mat/9609072, "Disorder-Induced
Critical Phenomena in Hysteresis: A Numerical Scaling Analysis"
Hysteresis, Avalanches, and Disorder Induced Critical Scaling: A Renormalization Group Approach
We study the zero temperature random field Ising model as a model for noise
and avalanches in hysteretic systems. Tuning the amount of disorder in the
system, we find an ordinary critical point with avalanches on all length
scales. Using a mapping to the pure Ising model, we Borel sum the
expansion to for the correlation length exponent. We sketch a
new method for directly calculating avalanche exponents, which we perform to
. Numerical exponents in 3, 4, and 5 dimensions are in good
agreement with the analytical predictions.
Comment: 134 pages in REVTEX, plus 21 figures. The first two figures can be
obtained from the references quoted in their respective figure captions; the
remaining 19 figures are supplied separately in uuencoded format
Leveraging Multi-ethnic Evidence for Mapping Complex Traits in Minority Populations: An Empirical Bayes Approach
Elucidating the genetic basis of complex traits and diseases in non-European populations is particularly challenging because US minority populations have been under-represented in genetic association studies. We developed an empirical Bayes approach named XPEB (cross-population empirical Bayes), designed to improve the power for mapping complex-trait-associated loci in a minority population by exploiting information from genome-wide association studies (GWASs) from another ethnic population. Taking as input summary statistics from two GWASs (a target GWAS from an ethnic minority population of primary interest and an auxiliary base GWAS, such as a larger GWAS in Europeans), our XPEB approach reprioritizes SNPs in the target population to compute local false-discovery rates. We demonstrated, through simulations, that whenever the base GWAS harbors relevant information, XPEB gains efficiency. Moreover, XPEB has the ability to discard irrelevant auxiliary information, providing a safeguard against inflated false-discovery rates due to genetic heterogeneity between populations. Applied to a blood-lipids study in African Americans, XPEB more than quadrupled the discoveries from the conventional approach, which used a target GWAS alone, bringing the number of significant loci from 14 to 65. Thus, XPEB offers a flexible framework for mapping complex traits in minority populations.
Genome-wide Characterization of Shared and Distinct Genetic Components that Influence Blood Lipid Levels in Ethnically Diverse Human Populations
Blood lipid concentrations are heritable risk factors associated with atherosclerosis and cardiovascular diseases. Lipid traits exhibit considerable variation among populations of distinct ancestral origin as well as between individuals within a population. We performed association analyses to identify genetic loci influencing lipid concentrations in African American and Hispanic American women in the Women’s Health Initiative SNP Health Association Resource. We validated one African-specific high-density lipoprotein cholesterol locus at CD36 as well as 14 known lipid loci that have been previously implicated in studies of European populations. Moreover, we demonstrate striking similarities in genetic architecture (loci influencing the trait, direction and magnitude of genetic effects, and proportions of phenotypic variation explained) of lipid traits across populations. In particular, we found that a disproportionate fraction of lipid variation in African Americans and Hispanic Americans can be attributed to genomic loci exhibiting statistical evidence of association in Europeans, even though the precise genes and variants remain unknown. At the same time, we found substantial allelic heterogeneity within shared loci, characterized both by population-specific rare variants and variants shared among multiple populations that occur at disparate frequencies. The allelic heterogeneity emphasizes the importance of including diverse populations in future genetic association studies of complex traits such as lipids; furthermore, the overlap in lipid loci across populations of diverse ancestral origin argues that additional knowledge can be gleaned from multiple populations.
Population Genomic Analysis of Strain Variation in Leptospirillum Group II Bacteria Involved in Acid Mine Drainage Formation
Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth ∼20×). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types (∼94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the observed patterns of polymorphism is divergence of ancestral strains due to geographic isolation, followed by mixing and subsequent recombination.
Using Noun Phrases for Navigating Biomedical Literature on Pubmed: How Many Updates Are We Losing Track of?
Author-supplied citations are a fraction of the related literature for a paper. The “related citations” list on PubMed is typically dozens or hundreds of results long and does not offer hints as to why these results are related. Using noun phrases derived from the sentences of the paper, we show it is possible to navigate to PubMed updates more transparently through search terms that can associate a paper with its citations. The algorithm to generate these search terms involved automatically extracting noun phrases from the paper using natural language processing tools and ranking them by the number of occurrences in the paper compared to the number of occurrences on the web. We define search queries having at least one instance of overlap between the author-supplied citations of the paper and the top 20 search results as citation validated (CV). When the overlapping citations were written by the same authors as the paper itself, we define the query as CV-S; when they were written by different authors, we define it as CV-D. For a systematic sample of 883 papers on PubMed Central, at least one of the search terms for 86% of the papers is CV-D, versus 65% for the top 20 PubMed “related citations.” We hypothesize that these quantities, computed for the 20 million papers on PubMed, would differ from these percentages by no more than 5%. Averaged across all 883 papers, 5 search terms are CV-D, 10 search terms are CV-S, and 6 unique citations validate these searches. Potentially related literature uncovered by citation-validated searches (either CV-S or CV-D) is on the order of ten per paper, and many more if the remaining searches that are not citation-validated are taken into account. The significance and relationship of each search result to the paper can only be vetted and explained by a researcher with knowledge of or interest in that paper.
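The citation-validation rule defined in this abstract can be written as a small Python check. This is a sketch under stated assumptions, not the authors' implementation: the function name, the data layout, and the reading of "same authors" as "at least one shared author" are all introduced here for illustration.

```python
def classify_query(top_results, cited_ids, paper_authors, authors_of, k=20):
    """Label one search query as CV-S and/or CV-D, or return an empty set.

    top_results:   ranked list of result identifiers for the query
    cited_ids:     set of identifiers cited by the paper's authors
    paper_authors: set of author names of the paper itself
    authors_of:    dict mapping identifier -> set of author names
    """
    labels = set()
    for result in top_results[:k]:
        if result not in cited_ids:
            continue  # not an author-supplied citation, so no validation
        # Assumption: "same authors" means at least one shared author.
        shared = paper_authors & authors_of.get(result, set())
        labels.add("CV-S" if shared else "CV-D")
    return labels


# Hypothetical toy data, for illustration only.
authors_of = {"p1": {"Lee"}, "p2": {"Kim"}, "p3": {"Cho"}}
labels = classify_query(
    top_results=["p3", "p1", "p2"],
    cited_ids={"p1", "p2"},
    paper_authors={"Lee", "Park"},
    authors_of=authors_of,
)
```

A query is citation validated when the returned set is non-empty; in the toy data the query is validated both by a self-citation (p1) and by a citation to other authors (p2).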
Identification of the Transcriptional Regulator NcrB in the Nickel Resistance Determinant of Leptospirillum ferriphilum UBK03
The nickel resistance determinant ncrABCY was identified in Leptospirillum ferriphilum UBK03. Within this operon, ncrA and ncrC encode two membrane proteins that form an efflux system, and ncrB encodes NcrB, which belongs to an uncharacterized family (DUF156) of proteins. How this determinant is regulated remains unknown. Our data indicate that expression of the nickel resistance determinant is induced by nickel. The promoter of ncrA, designated pncrA, was cloned into the promoter probe vector pPR9TT, and co-transformed with either a wild-type or mutant nickel resistance determinant. The results revealed that ncrB encoded a transcriptional regulator that could regulate the expression of ncrA, ncrB, and ncrC. A GC-rich inverted repeat sequence was identified in the promoter pncrA. Electrophoretic mobility shift assays (EMSAs) and footprinting assays showed that purified NcrB could specifically bind to the inverted repeat sequence of pncrA in vitro; this was confirmed by bacterial one-hybrid analysis. Moreover, this binding was inhibited in the presence of nickel ions. Thus, we classified NcrB as a transcriptional regulator that recognizes the inverted repeat sequence binding motif to regulate the expression of the key nickel resistance gene, ncrA.