Estimation in hidden Markov models via efficient importance sampling
Given a sequence of observations from a discrete-time, finite-state hidden
Markov model, we would like to estimate the sampling distribution of a
statistic. The bootstrap method is employed to approximate the confidence
regions of a multi-dimensional parameter. We propose an importance sampling
formula for efficient simulation in this context. Our approach consists of
constructing a locally asymptotically normal (LAN) family of probability
distributions around the default resampling rule and then minimizing the
asymptotic variance within the LAN family. The solution of this minimization
problem characterizes the asymptotically optimal resampling scheme, which is
given by a tilting formula. The implementation of the tilting formula is
facilitated by solving a Poisson equation. A few numerical examples are given
to demonstrate the efficiency of the proposed importance sampling scheme.
Comment: Published at http://dx.doi.org/10.3150/07--BEJ5163 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
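The tilting formula in this abstract is specific to bootstrap resampling in hidden Markov models and is not reproduced here. As a generic illustration of the underlying idea only, the sketch below applies exponential tilting to a much simpler problem: estimating the tail probability P(X > c) for a standard normal X by sampling from the tilted law N(c, 1) and reweighting by the likelihood ratio. The function names `tilted_tail_estimate` and `exact_tail` are hypothetical and do not come from the paper.

```python
import math
import random


def tilted_tail_estimate(c, n, seed=0):
    """Estimate P(X > c) for X ~ N(0, 1) by importance sampling.

    Assumption for illustration: the proposal is the exponentially
    tilted law N(c, 1), not the HMM resampling rule of the paper.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = rng.gauss(c, 1.0)  # draw from the tilted (proposal) law
        if y > c:
            # Likelihood ratio of N(0, 1) to N(c, 1) evaluated at y.
            total += math.exp(-c * y + 0.5 * c * c)
    return total / n


def exact_tail(c):
    """Exact P(X > c) for X ~ N(0, 1), used only as a reference value."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))
```

With c = 4 the event has probability of order 1e-5, so plain Monte Carlo with a few thousand draws would usually record no hits at all, while roughly half of the tilted draws land in the event and contribute a weight.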
Multi-armed bandit problem with precedence relations
Consider a multi-phase project management problem where the decision maker
needs to deal with two issues: (a) how to allocate resources to projects within
each phase, and (b) when to enter the next phase, so that the total expected
reward is as large as possible. We formulate the problem as a multi-armed
bandit problem with precedence relations. In Chan, Fuh and Hu (2005), a class
of asymptotically optimal arm-pulling strategies is constructed to minimize the
shortfall from perfect information payoff. Here we further explore optimality
properties of the proposed strategies. First, we show that the efficiency
benchmark, which is given by the regret lower bound, reduces to those in Lai
and Robbins (1985), Hu and Wei (1989), and Fuh and Hu (2000). This implies that
the proposed strategy is also optimal under the settings of the aforementioned
papers. Secondly, we establish the super-efficiency of the proposed strategies when
the bad set is empty. Thirdly, we show that they remain optimal with a
constant switching cost between arms. In addition, we prove that Wald's
equation holds for Markov chains under the Harris recurrence condition, which is an
important tool in studying the efficiency of the proposed strategies.
Comment: Published at http://dx.doi.org/10.1214/074921706000001067 in the IMS
Lecture Notes Monograph Series
(http://www.imstat.org/publications/lecnotes.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org
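For orientation, the two classical results that this abstract refers to can be written as follows. These are the standard i.i.d. statements, given here as background under assumed notation (arm means $\mu_j$, best mean $\mu^*$, Kullback-Leibler information number $I(\theta_j, \theta^*)$, regret $R_n$, stopping time $N$); they are not the precedence-relation or Markov chain versions established in the paper.

```latex
% Lai and Robbins (1985) regret lower bound for uniformly good strategies
\liminf_{n \to \infty} \frac{E[R_n]}{\log n}
  \;\ge\; \sum_{j:\, \mu_j < \mu^*} \frac{\mu^* - \mu_j}{I(\theta_j, \theta^*)}

% Wald's equation for i.i.d. integrable X_1, X_2, \dots
% and a stopping time N with E[N] < \infty
E\!\left[ \sum_{i=1}^{N} X_i \right] = E[N] \, E[X_1]
```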
New insights into old methods for identifying causal rare variants
The advance of high-throughput next-generation sequencing technology makes possible the analysis of rare variants. However, the investigation of rare variants in unrelated-individuals data sets faces the challenge of low power, and most methods circumvent the difficulty by using various collapsing procedures based on genes, pathways, or gene clusters. We suggest a new way to identify causal rare variants using the F-statistic and sliced inverse regression. The procedure is tested on the data set provided by the Genetic Analysis Workshop 17 (GAW17). After preliminary data reduction, we ranked markers according to their F-statistic values. Top-ranked markers were then subjected to sliced inverse regression, and those with higher absolute coefficients in the most significant sliced inverse regression direction were selected. The procedure yields good false discovery rates for the GAW17 data and thus is a promising method for future study on rare variants
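The marker-ranking step described above can be sketched roughly as follows. This is a minimal illustration, assuming a quantitative phenotype and a one-way ANOVA F-statistic across genotype groups (coded 0/1/2); the abstract does not spell out the exact form of the F-statistic, and the sliced inverse regression step is not included. The names `f_statistic` and `rank_markers` are hypothetical.

```python
def f_statistic(genotypes, phenotype):
    """One-way ANOVA F-statistic of the phenotype across genotype groups."""
    groups = {}
    for g, y in zip(genotypes, phenotype):
        groups.setdefault(g, []).append(y)
    n, k = len(phenotype), len(groups)
    if k < 2 or n <= k:
        return 0.0  # monomorphic marker or too few samples to test
    grand = sum(phenotype) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
    if ss_within == 0.0:
        # Groups are internally constant: perfect separation or no signal.
        return float("inf") if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_markers(markers, phenotype):
    """Return marker names sorted by decreasing F-statistic.

    markers: dict mapping a marker name to its list of genotype codes.
    """
    scores = {name: f_statistic(g, phenotype) for name, g in markers.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

In the procedure described in the abstract, only the top-ranked markers from such a list would be passed on to sliced inverse regression.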
Inflated Type I Error Rates When Using Aggregation Methods to Analyze Rare Variants in the 1000 Genomes Project Exon Sequencing Data in Unrelated Individuals: Summary Results from Group 7 at Genetic Analysis Workshop 17
As part of Genetic Analysis Workshop 17 (GAW17), our group considered the application of novel and standard approaches to the analysis of genotype-phenotype association in next-generation sequencing data. Our group identified a major issue in the analysis of the GAW17 next-generation sequencing data: type I error and false-positive report probability rates higher than those expected based on empirical type I error levels (as high as 90%). Two main causes emerged: population stratification and long-range correlation (gametic phase disequilibrium) between rare variants. Population stratification was expected because of the diverse sample. Correlation between rare variants was attributable to both random causes (e.g., nearly 10,000 of 25,000 markers were private variants, and the sample size was small [n = 697]) and nonrandom causes (more correlation was observed than was expected by random chance). Principal components analysis was used to control for population structure and helped to minimize type I errors, but this was at the expense of identifying fewer causal variants. A novel multiple regression approach showed promise to handle correlation between markers. Further work is needed, first, to identify best practices for the control of type I errors in the analysis of sequencing data and then to explore and compare the many promising new aggregating approaches for identifying markers associated with disease phenotypes
Association screening for genes with multiple potentially rare variants: an inverse-probability weighted clustering approach
Both common variants and rare variants are involved in the etiology of most complex diseases in humans. Developments in sequencing technology have led to the identification of a high density of rare variant single-nucleotide polymorphisms (SNPs) on the genome, each of which affects at most 1% of the population. Genotypes derived from these SNPs allow one to study the involvement of rare variants in common human disorders. Here, we propose an association screening approach that treats genes as units of analysis. SNPs within a gene are used to create partitions of individuals, and inverse-probability weighting is used to overweight genotypic differences observed on rare variants. Association between a phenotype trait and the constructed partition is then evaluated. We consider three association tests (one-way ANOVA, chi-square test, and the partition retention method) and compare these strategies using the simulated data from the Genetic Analysis Workshop 17. Several genes that contain causal SNPs were identified by the proposed method as top genes
Identifying influential regions in extremely rare variants using a fixed-bin approach
In this study, we analyze the Genetic Analysis Workshop 17 data to identify regions of single-nucleotide polymorphisms (SNPs) that exhibit a significant influence on response rate (proportion of subjects with an affirmative affected status), called the affected ratio, among rare variants. Under the null hypothesis, the distribution of rare variants is assumed to be uniform over case (affected) and control (unaffected) subjects. We attempt to pinpoint regions where the composition is significantly different between case and control events, specifically where there are unusually high numbers of rare variants among affected subjects. We focus on private variants, which require a degree of “collapsing” to combine information over several SNPs, to obtain meaningful results. Instead of implementing a gene-based approach, where regions would vary in size and sometimes be too small to achieve a strong enough signal, we implement a fixed-bin approach, with a preset number of SNPs per region, relying on the assumption that proximity and similarity go hand in hand. Through application of 100-SNP and 30-SNP fixed bins, we identify several most influential regions, which later are seen to contain some of the causal SNPs. The 100- and 30-SNP approaches detected seven and three causal SNPs among the most significant regions, respectively, with two overlapping SNPs located in the ELAVL4 gene, reported by both procedures
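A fixed-bin scan of this kind might look like the sketch below. It is an illustration under stated assumptions, not the authors' procedure: SNPs are taken in genomic order and grouped into consecutive bins of `bin_size` SNPs, and the "uniform over cases and controls" null is modeled by treating the number of rare-allele carriers who are cases as Binomial(total carriers, overall case fraction), scored with a one-sided upper-tail probability. The abstract does not state which test was used, and the names `fixed_bin_scan` and `binomial_upper_tail` are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def fixed_bin_scan(carriers, is_case, bin_size):
    """Scan consecutive fixed-size SNP bins for an excess of case carriers.

    carriers: list over SNPs in genomic order; each entry lists the indices
              of subjects carrying the rare allele at that SNP.
    is_case:  list of booleans, one per subject.
    Returns a list of (bin_start, case_count, total_count, p_value) tuples.
    """
    p_case = sum(is_case) / len(is_case)  # overall affected ratio
    results = []
    for start in range(0, len(carriers), bin_size):
        block = carriers[start:start + bin_size]
        total = sum(len(c) for c in block)
        cases = sum(1 for c in block for s in c if is_case[s])
        p_value = binomial_upper_tail(cases, total, p_case)
        results.append((start, cases, total, p_value))
    return results
```

Bins would then be ranked by p-value, with `bin_size` set to 100 or 30 to mirror the two bin widths mentioned in the abstract.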
Strong Consistency of Bayes Estimates in Stochastic Regression Models
Under minimum assumptions on the stochastic regressors, strong consistency of Bayes estimates is established in stochastic regression models in two cases: (1) When the prior distribution is discrete, the p.d.f. $f$ of i.i.d. random errors is assumed to have finite Fisher information $I = \int_{-\infty}^{\infty} (f')^2 / f \, dx$
Keywords: Bayes estimates, stochastic regressor, martingale, system identification, adaptive control, dynamic model, strongly unimodal