
    Bayesian Variable Selection in High Dimensional Genomic Studies Using Nonlocal Priors

    The advent of new genomic technologies has resulted in the production of massive data sets. The outcomes in such experiments are often binary vectors or survival times, and the covariates are gene expressions obtained from thousands of genes under study. Analysis of these data, especially gene selection for a specific outcome, requires new statistical and computational methods. In this dissertation, I address this problem and propose one such method that is shown to be advantageous in selecting explanatory variables for prediction of binary responses and survival times. I adopt a Bayesian approach that utilizes a mixture of nonlocal prior densities and point masses on the regression coefficient vectors. The proposed method provides improved performance in identifying true models and reducing estimation and prediction error rates in a number of simulation studies for both binary and survival outcomes. I also describe a computational algorithm that can be used to implement the methodology in ultrahigh-dimensional settings (p ≫ n). In particular, for survival response datasets I show that MCMC is not feasible and instead provide a computational algorithm based on stochastic search that is scalable and p-invariant. As part of the variable selection methodology, I also propose a novel approach for setting prior hyperparameters by examining the total variation distance between the prior distributions on the regression parameters and the distribution of the maximum likelihood estimator under the null distribution. An R package, BVSNLP, is also introduced in this dissertation as a final product that contains all the methodology described here. It performs high-dimensional Bayesian variable selection for binary and survival outcome datasets and is expected to have a variety of applications, including cancer genomic studies.
This dissertation also addresses methodology for deriving Uniformly Most Powerful Bayesian tests (UMPBTs) and extending them from exponential family distributions to a larger class of testing contexts. UMPBTs are an objective class of Bayesian hypothesis tests that can be considered the Bayesian counterpart of classical uniformly most powerful tests. However, they have previously been developed only for one-parameter exponential family models. I introduce sufficient conditions for the existence of UMPBTs and propose a unified approach for their derivation. An important application of my methodology is the extension of UMPBTs to testing whether the noncentrality parameter of a χ² distribution is zero.

    Presence / Absence Marker Discovery in RAD Markers for Multiplexed Samples in the Context of Next-Generation Sequencing

    Recent improvements in sequencing technologies have given rise to various interesting problems. The availability of millions of read sequences at a lower cost than in the microarray era has encouraged scientists to improve on earlier methods in many areas of bioinformatics. One such area is genotyping and genetic map construction, used to study inherited genotypes and analyze specific traits in a population; it involves generating different genetic markers and identifying polymorphisms among the individuals of a population. Presence/absence markers are the main focus of this thesis. These are a type of Restriction site Associated DNA (RAD) marker that is present in some samples and absent in others, indicating variation in the cut site of a restriction enzyme. However, the counts of markers in an experiment are highly correlated, and calling true presence and absence is not straightforward: a marker with a zero count is not necessarily absent in the sample under study, and a marker with a non-zero count is not necessarily present. A model that fits the data well can make correct calls. We propose two different contexts for designing such models as a solution to this problem and investigate their performance. In addition, using next-generation sequencing technology more efficiently requires the ability to multiplex a large number of samples in a single experiment run. In that case, appropriate barcoding that is robust to various sources of noise in the machine becomes paramount. Designing such barcodes efficiently is a challenging task, which is addressed in detail as another problem of this thesis. We make two contributions. One, we propose an algorithm for barcoding multiplexed RADSeq samples.
Two, we propose an algorithm for the statistical selection of presence/absence markers on the basis of RADSeq data on two related individuals. Operating characteristics of our methods are explored using both simulated and real data.

    The dual glucose-dependent insulinotropic peptide and glucagon-like peptide-1 receptor agonist, tirzepatide, improves lipoprotein biomarkers associated with insulin resistance and cardiovascular risk in patients with type 2 diabetes

    Aim: To better understand the marked decrease in serum triglycerides observed with tirzepatide in patients with type 2 diabetes, additional lipoprotein-related biomarkers were measured post hoc in available samples from the same study. Materials and Methods: Patients were randomized to receive once-weekly subcutaneous tirzepatide (1, 5, 10 or 15 mg), dulaglutide (1.5 mg) or placebo. Serum lipoprotein profile, apolipoprotein (apo) A-I, B and C-III and preheparin lipoprotein lipase (LPL) were measured at baseline and at 4, 12 and 26 weeks. Lipoprotein particle profile by nuclear magnetic resonance was assessed at baseline and 26 weeks. The lipoprotein insulin resistance (LPIR) score was calculated. Results: At 26 weeks, tirzepatide dose-dependently decreased apoB and apoC-III levels, and increased serum preheparin LPL compared with placebo. Tirzepatide 10 and 15 mg decreased large triglyceride-rich lipoprotein particles (TRLP), small low-density lipoprotein particles (LDLP) and LPIR score compared with both placebo and dulaglutide. Treatment with dulaglutide also reduced apoB and apoC-III levels but had no effect on either serum LPL or large TRLP, small LDLP and LPIR score. The number of total LDLP was also decreased with tirzepatide 10 and 15 mg compared with placebo. A greater reduction in apoC-III with tirzepatide was observed in patients with high compared with normal baseline triglycerides. At 26 weeks, change in apoC-III, but not body weight, was the best predictor of changes in triglycerides with tirzepatide, explaining up to 22.9% of their variability. Conclusions: Tirzepatide treatment dose-dependently decreased levels of apoC-III and apoB and the number of large TRLP and small LDLP, suggesting a net improvement in atherogenic lipoprotein profile.

    Identification and Analysis of the First 2009 Pandemic H1N1 Influenza Virus from U.S. Feral Swine

    The first case of pandemic H1N1 influenza (pH1N1) virus in feral swine in the United States was identified in Texas through the United States Department of Agriculture (USDA) Wildlife Services’ surveillance program. Two samples were identified as pandemic influenza by reverse transcriptase quantitative PCR (RT-qPCR). Full-genome Sanger sequencing of all eight influenza segments was performed. In addition, Illumina deep sequencing of the original diagnostic samples and their respective virus isolation cultures was performed to assess the feasibility of using an unbiased whole-genome linear target amplification method and sequencing multiple samples in a single Illumina GAIIx lane. Identical sequences were obtained using both techniques. Phylogenetic analysis indicated that all gene segments belonged to the pH1N1 (2009) lineage. In conclusion, we have identified the first pH1N1 isolate in feral swine in the United States and have demonstrated the use of an easy, unbiased linear amplification method for deep sequencing of multiple samples.