Three Contributions to Latent Variable Modeling
The dissertation includes three papers that address theoretical and technical issues in latent variable models. The first paper extends the uniformly most powerful (UMP) test approach for testing the person parameter in item response theory (IRT) to the two-parameter logistic (2PL) model. In addition, an efficient branch-and-bound algorithm for computing the exact p-value is proposed. The second paper proposes a reparameterization of the log-linear cognitive diagnosis model (CDM), and a Gibbs sampler is developed for posterior computation. The third paper proposes an ordered latent class model with infinitely many classes using a stochastic process prior. An application to nonparametric IRT is also discussed.
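To make the exact p-value concrete: under the 2PL model with known discriminations a_j > 0, the weighted score T = sum_j a_j X_j is sufficient for the person parameter theta, and the UMP test of H0: theta <= theta0 against theta > theta0 rejects for large T. The sketch below computes P(T >= t_obs | theta0) with a simple depth-first branch-and-bound over response patterns. It is an illustration under stated assumptions (known item parameters, positive discriminations, one-sided alternative), not the algorithm from the paper, and the function names are hypothetical.

```python
import math
from typing import Sequence


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response: 1 / (1 + exp(-a (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def exact_p_value(x: Sequence[int], a: Sequence[float], b: Sequence[float],
                  theta0: float, eps: float = 1e-12) -> float:
    """Exact one-sided p-value P(T >= t_obs | theta0), with T = sum_j a_j X_j.

    Assumes known item parameters and a_j > 0 for every item.
    """
    n = len(a)
    t_obs = sum(aj * xj for aj, xj in zip(a, x))
    p = [p_correct(theta0, aj, bj) for aj, bj in zip(a, b)]

    # tail[k] = largest score still attainable from items k, ..., n-1
    tail = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        tail[k] = tail[k + 1] + a[k]

    def search(k: int, partial: float, prob: float) -> float:
        # Bound 1: threshold already reached, so every completion counts.
        if partial >= t_obs - eps:
            return prob
        # Bound 2: even answering all remaining items correctly cannot reach it
        # (this also fires at k == n, so the recursion always terminates).
        if partial + tail[k] < t_obs - eps:
            return 0.0
        # Branch on item k: correct (adds a_k) or incorrect.
        return (search(k + 1, partial + a[k], prob * p[k])
                + search(k + 1, partial, prob * (1.0 - p[k])))

    return search(0, 0.0, 1.0)
```

As a check, with four items having a_j = 1 and b_j = 0 and theta0 = 0, every item is a fair coin, so exact_p_value([1, 1, 1, 0], [1.0] * 4, [0.0] * 4, 0.0) equals P(at least 3 of 4 correct) = 5/16. The two bounds let whole subtrees be summed or discarded at once instead of enumerating all 2^n patterns.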
Bayesian Ideas in Survey Sampling: The Legacy of Basu
Survey sampling and, more generally, Official Statistics are going through an important period of renewal. On one hand, there is the need to exploit the huge informational potential of the data made available by the digital revolution. On the other hand, this process has coincided with a progressive deterioration in the quality of classical sample surveys, due to a decreasing willingness to participate and an increasing rate of missing responses. The switch from survey-based inference to a hybrid system involving register-based information has sharpened the debate over, and the possible resolution of, the controversy between design-based and model-based approaches. In this new framework, the use of statistical models seems unavoidable, and models are today a relevant part of the official statistician's toolkit. Models are important in several different contexts, from small area estimation to non-sampling error adjustment, but they are also crucial for correcting bias due to over- and undercoverage of administrative data, in order to prevent potential selection bias, and for dealing with differing definitions and/or errors in the measurement process of the administrative sources. The progressive shift from a design-based to a super-population model-based approach is an established fact in the practice of National Statistical Institutes. However, the introduction of Bayesian ideas in official statistics still encounters difficulties and resistance. In this work, we attempt a non-systematic review of Bayesian developments in this area and try to highlight the extra benefit that a Bayesian approach might provide. Our general conclusion is that, while the general picture is now clear and most of the basic topics of survey sampling can easily be rephrased and tackled from a Bayesian perspective, much work is still needed before a ready-to-use platform for Bayesian survey sampling is available in the presence of complex sampling designs, non-ignorable missing data patterns, and large datasets.
Statistical methods in detecting differential expressed genes, analyzing insertion tolerance for genes and group selection for survival data
The thesis is composed of three independent projects: (i) analyzing transposon sequencing data to infer the functions of genes in bacterial growth (chapter 2), (ii) developing a semi-parametric Bayesian method for differential gene expression analysis with RNA-sequencing data (chapter 3), and (iii) solving the group selection problem for survival data (chapter 4). All projects are motivated by statistical challenges arising in biological research.
The first project is motivated by the need for statistical models that accommodate transposon insertion sequencing (Tn-Seq) data. Tn-Seq data consist of sequence reads around each transposon insertion site.
The detection of a transposon insertion at a given site indicates that disrupting the genomic sequence at this site does not cause the loss of an essential function, so the bacteria can still grow.
Hence, such measurements have been used to infer the role of each gene in bacterial growth. We propose a zero-inflated Poisson regression method for analyzing the Tn-Seq count data and derive an Expectation-Maximization (EM) algorithm to obtain parameter estimates. We also propose a multiple testing procedure that classifies each gene into one of three states (hypo-tolerant, tolerant, and hyper-tolerant) while controlling the false discovery rate. Simulation studies show that our method provides good estimation of model parameters and inference on gene functions.
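The EM idea for zero-inflated Poisson counts can be shown in its simplest form. The sketch below assumes an intercept-only ZIP model (one zero-inflation probability pi and one Poisson mean lam shared by all sites), which is a simplification of the regression model described above. The E-step computes, for each zero count, the posterior probability that it is a structural zero; the M-step then updates pi and lam in closed form. The function name and convergence settings are hypothetical.

```python
import numpy as np


def zip_em(y, tol=1e-8, max_iter=1000):
    """EM estimates (pi, lam) for an intercept-only zero-inflated Poisson model.

    Model: y_i = 0 with probability pi (structural zero); otherwise
    y_i ~ Poisson(lam).
    """
    y = np.asarray(y, dtype=float)
    is_zero = y == 0
    # Crude start: half of the observed zeros are structural; lam from positives.
    pi = max(is_zero.mean() / 2.0, 1e-3)
    lam = y[~is_zero].mean() if (~is_zero).any() else 1.0
    for _ in range(max_iter):
        # E-step: P(structural zero | y_i) is nonzero only when y_i == 0.
        z = np.where(is_zero, pi / (pi + (1.0 - pi) * np.exp(-lam)), 0.0)
        # M-step: maximize the expected complete-data log-likelihood.
        new_pi = z.mean()
        new_lam = np.sum((1.0 - z) * y) / np.sum(1.0 - z)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam
```

On data simulated with 30% structural zeros and a Poisson mean of 4, the estimates should land close to those values. In a ZIP regression with covariates, the same E-step applies, and the closed-form M-step updates are replaced by a weighted logistic regression for the zero-inflation part and a weighted Poisson regression for the count part.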
In the second project, we model the count data from RNA-sequencing experiments for each gene using a Poisson-Gamma hierarchical model or, equivalently, a negative binomial (NB) model. We derive a fully Bayesian semi-parametric approach with a Dirichlet process prior on the fold changes between two treatment means. An inference strategy based on a Gibbs sampler is developed for differential expression analysis. We evaluate our method in several simulation studies, and the results show that it outperforms other methods, including widely used ones such as edgeR and DESeq.
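The equivalence between the Poisson-Gamma hierarchy and the NB model follows from integrating out the gene-level Poisson rate. The derivation below writes the Gamma distribution with shape r and rate beta = r/mu; this is one common parameterization, assumed here, and the dissertation's own may differ.

```latex
\begin{aligned}
Y \mid \lambda &\sim \mathrm{Poisson}(\lambda), \qquad
\lambda \sim \mathrm{Gamma}\!\left(r,\ \beta = r/\mu\right), \\
P(Y = y) &= \int_0^\infty \frac{e^{-\lambda}\lambda^{y}}{y!}\,
  \frac{\beta^{r}}{\Gamma(r)}\,\lambda^{r-1}e^{-\beta\lambda}\,d\lambda
  = \frac{\Gamma(y + r)}{y!\,\Gamma(r)}\,
    \frac{\beta^{r}}{(1 + \beta)^{y + r}} \\
  &= \frac{\Gamma(y + r)}{y!\,\Gamma(r)}
     \left(\frac{r}{r + \mu}\right)^{r}
     \left(\frac{\mu}{r + \mu}\right)^{y},
\qquad
\mathbb{E}[Y] = \mu, \quad \operatorname{Var}[Y] = \mu + \frac{\mu^{2}}{r}.
\end{aligned}
```

The last line is the NB probability mass function with mean mu; the extra variance term mu^2/r is the overdispersion relative to the Poisson that the Gamma layer introduces, and the Poisson model is recovered as r grows without bound.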
In the third project, we develop a new semi-parametric Bayesian method to address the group variable selection problem and study the dependence of survival outcomes on grouped predictors using the Cox proportional hazards model. We use group-level indicator variables to induce sparsity and obtain the posterior inclusion probability for each group. Bayes factors are used to decide whether each group should be selected. We compare our method with a frequentist method (HPCox) in several simulation studies and show that our method performs better than HPCox.
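One common way to turn sampled group indicators into the quantities mentioned above is as follows. The posterior inclusion probability of group g is estimated by the fraction of MCMC draws in which its indicator equals 1. The Bayes factor for including the group is the posterior odds of inclusion divided by the prior odds. The sketch below assumes a common prior inclusion probability for all groups and a 0/1 array of indicator samples (draws by groups); both are assumptions, and the function name is hypothetical.

```python
import numpy as np


def group_bayes_factors(gamma_draws, prior_incl=0.5):
    """Posterior inclusion probabilities and inclusion Bayes factors per group.

    gamma_draws: (n_draws, n_groups) array of 0/1 group indicators from MCMC.
    prior_incl:  prior probability that any single group is included
                 (assumed equal across groups).
    """
    gamma = np.asarray(gamma_draws, dtype=float)
    pip = gamma.mean(axis=0)                      # posterior inclusion probability
    prior_odds = prior_incl / (1.0 - prior_incl)
    with np.errstate(divide="ignore"):
        post_odds = pip / (1.0 - pip)             # inf when a group is always included
    return pip, post_odds / prior_odds
```

For the four draws [[1, 0], [1, 0], [1, 1], [0, 0]] with prior_incl = 0.5, the posterior inclusion probabilities are 0.75 and 0.25, giving Bayes factors of 3 and 1/3. A group would then be selected when its Bayes factor exceeds a chosen cutoff.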
In summary, this dissertation tackles several statistical problems arising in biological research, including high-dimensional genomic data analysis and survival analysis. All proposed methods are evaluated in simulation studies and show satisfactory performance. We also apply the proposed methods to real data.