
    A Bayesian Approach to Estimating the Long Memory Parameter

    DOI: 10.1214/09-BA406
    We develop a Bayesian procedure for analyzing stationary long-range dependent processes. Specifically, we consider the fractional exponential model (FEXP) to estimate the memory parameter of a stationary long-memory Gaussian time series. We propose a hierarchical Bayesian model and make it fully adaptive by imposing a prior distribution on the model order. Further, we describe a reversible jump Markov chain Monte Carlo algorithm for variable-dimension estimation and show that, in our context, the algorithm provides a reasonable method of model selection (within each repetition of the chain). Through an application of Bayesian model averaging, we incorporate all possible models from the FEXP class (up to a given finite order). As a result, we reduce the underestimation of uncertainty at the model-selection stage and achieve better estimates of the long memory parameter. Additionally, we establish Bayesian consistency of the memory parameter under mild conditions on the data process. Finally, through simulation and the analysis of two data sets, we demonstrate the effectiveness of our approach.
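
    The FEXP parameterization lends itself to a compact illustration: the memory parameter d enters the log spectral density as the coefficient of -2 log|2 sin(lambda/2)|, so it can be read off a log-periodogram regression. The Python sketch below is a frequentist simplification with the model order p held fixed, not the paper's hierarchical Bayesian procedure with reversible jump MCMC and model averaging; the function name and defaults are illustrative.

        import numpy as np

        def fexp_memory_estimate(x, p=3):
            """Log-periodogram regression under the FEXP form
            log f(lam) = -2*d*log|2 sin(lam/2)| + sum_{j=0}^{p} theta_j cos(j*lam).
            Frequentist simplification: the model order p is fixed here,
            whereas the paper samples it with reversible jump MCMC."""
            n = len(x)
            m = (n - 1) // 2
            lam = 2 * np.pi * np.arange(1, m + 1) / n   # Fourier frequencies
            fft = np.fft.fft(x - np.mean(x))
            periodogram = np.abs(fft[1:m + 1]) ** 2 / (2 * np.pi * n)
            # Design: long-memory regressor, then cos(0*lam)=1 (intercept), cos(lam), ...
            X = np.column_stack(
                [-2 * np.log(np.abs(2 * np.sin(lam / 2)))]
                + [np.cos(j * lam) for j in range(p + 1)]
            )
            beta, *_ = np.linalg.lstsq(X, np.log(periodogram), rcond=None)
            return beta[0]  # estimate of the memory parameter d

        # Sanity check on white noise, where the true d is 0
        rng = np.random.default_rng(0)
        print(fexp_memory_estimate(rng.standard_normal(2048)))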

    Impact of Stand Your Ground, Background Checks and Conceal and Carry Laws on Homicide Rates in the U.S

    In recent years, the number of gun-related killings appears to be on the rise. In fact, data show that gun-related murders rose 32% between 2014 and 2017 (Gramlich 2019). While the Second Amendment to the U.S. Constitution allows citizens to bear arms, many states have passed additional laws, both restrictive and prohibitive, regulating firearms. The goal of this paper is to assess the impact of changes in handgun legislation on firearm homicide rates in the United States for the period 1999-2015. More specifically, we focus on how stand-your-ground, right-to-carry, and background-check laws affect changes in homicide rates. Using a unique data set, we created a change point model and used regression models to show that changes to handgun laws do in fact affect homicide rates in many states.
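
    As a rough illustration of the change point idea, the sketch below grid-searches a single break in an annual homicide-rate series by fitting separate linear trends on either side of each candidate year. The data are synthetic and the function is only a stand-in; the abstract does not specify the paper's actual change point model.

        import numpy as np

        def best_change_point(years, rates):
            """Grid-search a single change point: fit separate linear trends
            before and after each candidate year and keep the split with the
            smallest total squared error."""
            best_year, best_sse = None, np.inf
            for k in range(2, len(years) - 2):   # at least 2 points per segment
                sse = 0.0
                for t, r in ((years[:k], rates[:k]), (years[k:], rates[k:])):
                    coef = np.polyfit(t, r, 1)
                    sse += np.sum((r - np.polyval(coef, t)) ** 2)
                if sse < best_sse:
                    best_year, best_sse = years[k], sse
            return best_year

        # Synthetic state-level series, 1999-2015, with a level shift in 2006
        rng = np.random.default_rng(1)
        years = np.arange(1999, 2016)
        rates = np.where(years < 2006, 5.0, 6.2) + 0.1 * rng.standard_normal(len(years))
        print(best_change_point(years, rates))   # recovers a break near 2006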

    Bayesian nonlinear regression for large p small n problems

    Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as the large p, small n problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a fully Bayesian support vector regression model with Vapnik's ϵ-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS), under the multivariate correlated response setup. This provides a full probabilistic description of the support vector machine (SVM) rather than merely an algorithm for fitting purposes. We also introduce a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM, which relies on type II maximum likelihood estimates of the hyper-parameters, we place a prior on the hyper-parameters and use Markov chain Monte Carlo techniques for computation. We also propose an empirical Bayes method for our RVM and SVM. Our methods are illustrated with a prediction problem in near-infrared (NIR) spectroscopy. A simulation study is also undertaken to check the prediction accuracy of our models.
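
    To give a concrete, heavily reduced flavor of a Bayesian treatment of the ϵ-insensitive loss, the sketch below runs a random-walk Metropolis sampler over the coefficients of an RKHS expansion, with exp(-C x total ϵ-insensitive loss) acting as a pseudo-likelihood and a Gaussian prior supplying shrinkage. All names, the RBF kernel, and the tuning constants are assumptions; the paper's multivariate hierarchical SVR and RVM models are far richer.

        import numpy as np

        def rbf_kernel(X, Y, gamma=1.0):
            d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d2)

        def eps_insensitive(residual, eps=0.1):
            # Vapnik's loss: zero inside the eps-tube, linear outside
            return np.maximum(np.abs(residual) - eps, 0.0)

        def mh_bayes_svr(X, y, n_iter=5000, eps=0.1, C=10.0, step=0.05, seed=0):
            """Random-walk Metropolis over the coefficients alpha of an RKHS
            expansion f = K @ alpha, with exp(-C * total eps-insensitive loss)
            as pseudo-likelihood and a standard Gaussian prior on alpha."""
            rng = np.random.default_rng(seed)
            K = rbf_kernel(X, X)

            def log_post(a):
                return -C * eps_insensitive(y - K @ a, eps).sum() - 0.5 * a @ a

            alpha = np.zeros(len(y))
            lp, draws = log_post(alpha), []
            for _ in range(n_iter):
                prop = alpha + step * rng.standard_normal(len(y))
                lp_prop = log_post(prop)
                if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
                    alpha, lp = prop, lp_prop
                draws.append(alpha.copy())
            return K, np.mean(draws[n_iter // 2:], axis=0)  # burn-in discarded

        # Toy one-dimensional regression problem
        rng = np.random.default_rng(2)
        X = rng.uniform(-2, 2, size=(40, 1))
        y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(40)
        K, alpha_hat = mh_bayes_svr(X, y)
        print(np.mean((K @ alpha_hat - y) ** 2))  # in-sample MSE of posterior-mean fit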

    ComPhy: Prokaryotic Composite Distance Phylogenies Inferred from Whole-Genome Gene Sets

    DOI: 10.1186/1471-2105-10-S1-S5
    With the increasing availability of whole genome sequences, it is becoming more and more important to use complete genome sequences for inferring species phylogenies. We developed a new tool, ComPhy ('Composite Distance Phylogeny'), which produces a prokaryotic phylogeny from a composite distance matrix calculated by comparing the complete gene sets of genome pairs. The composite distance between two genomes is defined by three components: Gene Dispersion Distance (GDD), Genome Breakpoint Distance (GBD) and Gene Content Distance (GCD). GDD quantifies the dispersion of orthologous genes along the genomic coordinates from one genome to another; GBD measures the shared breakpoints between two genomes; GCD measures the level of shared orthologs between two genomes. The phylogenetic tree is constructed from the composite distance matrix using a neighbor-joining method. We tested our method on 9 datasets from 398 completely sequenced prokaryotic genomes and achieved above 90% agreement in quartet topologies between the tree created by our method and the tree from Bergey's taxonomy. In comparison to several other phylogenetic analysis methods, our method showed consistently better performance. ComPhy is a fast and robust tool for genome-wide inference of evolutionary relationships among genomes. This work was supported in part by NSF/ITR-IIS-0407204.
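
    Since the abstract defines the composite distance through its three components, a minimal sketch of the combination step is shown below, assuming equal weights and per-component rescaling to [0, 1] (neither of which the abstract specifies for the actual tool).

        import numpy as np

        def composite_distance(gdd, gbd, gcd, w=(1/3, 1/3, 1/3)):
            """Combine the three component matrices into one composite
            distance matrix. Each component is rescaled to [0, 1] and the
            weights default to equal thirds; both choices are assumptions,
            since the abstract does not give ComPhy's actual scheme."""
            mats = [np.asarray(m, dtype=float) for m in (gdd, gbd, gcd)]
            mats = [m / m.max() if m.max() > 0 else m for m in mats]
            D = sum(wi * m for wi, m in zip(w, mats))
            np.fill_diagonal(D, 0.0)
            return D

        # Made-up component distances for three genomes
        gdd = [[0, 2, 4], [2, 0, 3], [4, 3, 0]]
        gbd = [[0, 1, 5], [1, 0, 2], [5, 2, 0]]
        gcd = [[0, 0.2, 0.8], [0.2, 0, 0.5], [0.8, 0.5, 0]]
        D = composite_distance(gdd, gbd, gcd)
        print(D)   # this matrix would then be fed to a neighbor-joining routine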

    Gene Expression-Based Glioma Classification Using Hierarchical Bayesian Vector Machines

    This paper considers several Bayesian classification methods for the analysis of glioma cancer with microarray data, based on reproducing kernel Hilbert spaces under the multiclass setup. We consider the multinomial logit likelihood as well as the likelihood related to the multiclass support vector machine (SVM) model. We show that our proposed Bayesian classification models with multiple shrinkage parameters can produce more accurate classification schemes for glioma cancer than several existing classical methods. We also propose a Bayesian variable selection scheme, integrated with our model, for selecting the differentially expressed genes. This integrated approach improves classifier design by yielding simultaneous gene selection.
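
    As a stripped-down analogue of the shrinkage idea, the sketch below computes a MAP fit of a multinomial logit model with a single Gaussian shrinkage penalty on the gene coefficients. The paper's models place priors over multiple shrinkage parameters and couple classification with Bayesian variable selection via MCMC, so this is only a point-estimate caricature on invented data.

        import numpy as np
        from scipy.optimize import minimize

        def fit_multinomial_ridge(X, y, n_classes, lam=1.0):
            """MAP fit of a multinomial logit model with a single Gaussian
            shrinkage penalty lam on the gene coefficients."""
            n, p = X.shape

            def neg_log_post(w_flat):
                W = w_flat.reshape(p, n_classes)
                Z = X @ W
                Z = Z - Z.max(axis=1, keepdims=True)          # numerical stability
                log_prob = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
                return -log_prob[np.arange(n), y].sum() + 0.5 * lam * (W ** 2).sum()

            res = minimize(neg_log_post, np.zeros(p * n_classes), method="L-BFGS-B")
            return res.x.reshape(p, n_classes)

        # Invented "expression" data: 30 samples, 50 genes, 3 tumor classes
        rng = np.random.default_rng(3)
        X = rng.standard_normal((30, 50))
        y = rng.integers(0, 3, size=30)
        W = fit_multinomial_ridge(X, y, n_classes=3)
        print(W.shape)   # one coefficient per gene and class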

    Predicting disease risks from highly imbalanced data using random forest

    Background: We present a method utilizing the Healthcare Cost and Utilization Project (HCUP) dataset for predicting disease risk of individuals based on their medical diagnosis history. The presented methodology may be incorporated in a variety of applications such as risk management, tailored health communication and decision support systems in healthcare.
    Methods: We employed the National Inpatient Sample (NIS) data, which is publicly available through the Healthcare Cost and Utilization Project (HCUP), to train random forest (RF) classifiers for disease prediction. Since the HCUP data is highly imbalanced, we employed an ensemble learning approach based on repeated random sub-sampling. This technique divides the training data into multiple sub-samples, while ensuring that each sub-sample is fully balanced. We compared the performance of support vector machine (SVM), bagging, boosting and RF in predicting the risk of eight chronic diseases.
    Results: We predicted eight disease categories. Overall, the RF ensemble learning method outperformed SVM, bagging and boosting in terms of the area under the receiver operating characteristic curve (AUC). In addition, RF has the advantage of computing the importance of each variable in the classification process.
    Conclusions: By combining repeated random sub-sampling with RF, we were able to overcome the class imbalance problem and achieve promising results. Using the national HCUP data set, we predicted eight disease categories with an average AUC of 88.79%.
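
    A minimal sketch of the repeated random sub-sampling scheme described in Methods, assuming scikit-learn's RandomForestClassifier as a stand-in for the paper's RF implementation; the member count, tree count, and toy data are illustrative, not the paper's settings.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def balanced_rf_ensemble(X, y, n_members=10, seed=0):
            """Each member is a random forest trained on all minority-class
            records plus an equal-sized random draw (without replacement)
            from the majority class, so every sub-sample is fully balanced."""
            rng = np.random.default_rng(seed)
            pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
            minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
            members = []
            for _ in range(n_members):
                sample = np.concatenate(
                    [minority, rng.choice(majority, size=len(minority), replace=False)]
                )
                rf = RandomForestClassifier(
                    n_estimators=100, random_state=int(rng.integers(1_000_000))
                )
                rf.fit(X[sample], y[sample])
                members.append(rf)
            return members

        def predict_risk(members, X):
            # Average the positive-class probability across ensemble members
            return np.mean([m.predict_proba(X)[:, 1] for m in members], axis=0)

        # Toy imbalanced data: roughly 5% positives
        rng = np.random.default_rng(4)
        X = rng.standard_normal((2000, 10))
        y = (X[:, 0] + 0.5 * rng.standard_normal(2000) > 1.6).astype(int)
        members = balanced_rf_ensemble(X, y)
        print(predict_risk(members, X[:5]))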