
    A Bayesian Approach to Estimating the Long Memory Parameter

    DOI: 10.1214/09-BA406
    We develop a Bayesian procedure for analyzing stationary long-range dependent processes. Specifically, we consider the fractional exponential model (FEXP) to estimate the memory parameter of a stationary long-memory Gaussian time series. We propose a hierarchical Bayesian model and make it fully adaptive by imposing a prior distribution on the model order. Further, we describe a reversible jump Markov chain Monte Carlo algorithm for variable-dimension estimation and show that, in our context, the algorithm provides a reasonable method of model selection (within each repetition of the chain). Through an application of Bayesian model averaging, we incorporate all possible models from the FEXP class (up to a given finite order). As a result, we reduce the underestimation of uncertainty at the model-selection stage and achieve better estimates of the long memory parameter. Additionally, we establish Bayesian consistency of the memory parameter under mild conditions on the data process. Finally, through simulation and the analysis of two data sets, we demonstrate the effectiveness of our approach.
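
    The FEXP parameterization lends itself to a compact illustration: the memory parameter d enters the log spectral density as the coefficient of -2 log|2 sin(lambda/2)|, so it can be read off a log-periodogram regression. The Python sketch below is a frequentist simplification with the model order p held fixed, not the paper's hierarchical Bayesian procedure with reversible jump MCMC and model averaging; the function name and defaults are illustrative.

        import numpy as np

        def fexp_memory_estimate(x, p=3):
            """Log-periodogram regression under the FEXP form
            log f(lam) = -2*d*log|2 sin(lam/2)| + sum_{j=0}^{p} theta_j cos(j*lam).
            Frequentist simplification: the model order p is fixed here,
            whereas the paper samples it with reversible jump MCMC."""
            n = len(x)
            m = (n - 1) // 2
            lam = 2 * np.pi * np.arange(1, m + 1) / n   # Fourier frequencies
            fft = np.fft.fft(x - np.mean(x))
            periodogram = np.abs(fft[1:m + 1]) ** 2 / (2 * np.pi * n)
            # Design: long-memory regressor, then cos(0*lam)=1 (intercept), cos(lam), ...
            X = np.column_stack(
                [-2 * np.log(np.abs(2 * np.sin(lam / 2)))]
                + [np.cos(j * lam) for j in range(p + 1)]
            )
            beta, *_ = np.linalg.lstsq(X, np.log(periodogram), rcond=None)
            return beta[0]  # estimate of the memory parameter d

        # Sanity check on white noise, where the true d is 0
        rng = np.random.default_rng(0)
        print(fexp_memory_estimate(rng.standard_normal(2048)))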

    Impact of Stand Your Ground, Background Checks and Conceal and Carry Laws on Homicide Rates in the U.S

    In recent years, the number of gun-related killings appears to be on the rise. In fact, data show that gun-related murders rose 32% between 2014 and 2017 (Gramlich 2019). While the Second Amendment to the U.S. Constitution allows citizens to bear arms, many states have passed additional laws, both restrictive and prohibitive, regulating firearms. The goal of this paper is to assess the impact of changes in handgun legislation on firearm homicide rates in the United States for the period 1999-2015. More specifically, we focus on how stand-your-ground, right-to-carry, and background-check laws affect changes in homicide rates. Using a unique data set, we created a change point model and used regression models to show that changes to handgun laws do in fact affect homicide rates in many states.
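
    As a rough illustration of the change point idea, the sketch below grid-searches a single break in an annual homicide-rate series by fitting separate linear trends on either side of each candidate year. The data are synthetic and the function is only a stand-in; the abstract does not specify the paper's actual change point model.

        import numpy as np

        def best_change_point(years, rates):
            """Grid-search a single change point: fit separate linear trends
            before and after each candidate year and keep the split with the
            smallest total squared error."""
            best_year, best_sse = None, np.inf
            for k in range(2, len(years) - 2):   # at least 2 points per segment
                sse = 0.0
                for t, r in ((years[:k], rates[:k]), (years[k:], rates[k:])):
                    coef = np.polyfit(t, r, 1)
                    sse += np.sum((r - np.polyval(coef, t)) ** 2)
                if sse < best_sse:
                    best_year, best_sse = years[k], sse
            return best_year

        # Synthetic state-level series, 1999-2015, with a level shift in 2006
        rng = np.random.default_rng(1)
        years = np.arange(1999, 2016)
        rates = np.where(years < 2006, 5.0, 6.2) + 0.1 * rng.standard_normal(len(years))
        print(best_change_point(years, rates))   # recovers a break near 2006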

    Bayesian nonlinear regression for large p small n problems

    Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as the large p, small n problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a fully Bayesian support vector regression model with Vapnik's ϵ-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS), under the multivariate correlated response setup. This provides a full probabilistic description of the support vector machine (SVM) rather than merely an algorithm for fitting purposes. We also introduce a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM, which relies on type II maximum likelihood estimates of the hyper-parameters, we place a prior on the hyper-parameters and use Markov chain Monte Carlo techniques for computation. We also propose an empirical Bayes method for our RVM and SVM. Our methods are illustrated with a prediction problem in near-infrared (NIR) spectroscopy. A simulation study is also undertaken to check the prediction accuracy of our models.
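
    To give a concrete, heavily reduced flavor of a Bayesian treatment of the ϵ-insensitive loss, the sketch below runs a random-walk Metropolis sampler over the coefficients of an RKHS expansion, with exp(-C x total ϵ-insensitive loss) acting as a pseudo-likelihood and a Gaussian prior supplying shrinkage. All names, the RBF kernel, and the tuning constants are assumptions; the paper's multivariate hierarchical SVR and RVM models are far richer.

        import numpy as np

        def rbf_kernel(X, Y, gamma=1.0):
            d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d2)

        def eps_insensitive(residual, eps=0.1):
            # Vapnik's loss: zero inside the eps-tube, linear outside
            return np.maximum(np.abs(residual) - eps, 0.0)

        def mh_bayes_svr(X, y, n_iter=5000, eps=0.1, C=10.0, step=0.05, seed=0):
            """Random-walk Metropolis over the coefficients alpha of an RKHS
            expansion f = K @ alpha, with exp(-C * total eps-insensitive loss)
            as pseudo-likelihood and a standard Gaussian prior on alpha."""
            rng = np.random.default_rng(seed)
            K = rbf_kernel(X, X)

            def log_post(a):
                return -C * eps_insensitive(y - K @ a, eps).sum() - 0.5 * a @ a

            alpha = np.zeros(len(y))
            lp, draws = log_post(alpha), []
            for _ in range(n_iter):
                prop = alpha + step * rng.standard_normal(len(y))
                lp_prop = log_post(prop)
                if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
                    alpha, lp = prop, lp_prop
                draws.append(alpha.copy())
            return K, np.mean(draws[n_iter // 2:], axis=0)  # burn-in discarded

        # Toy one-dimensional regression problem
        rng = np.random.default_rng(2)
        X = rng.uniform(-2, 2, size=(40, 1))
        y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(40)
        K, alpha_hat = mh_bayes_svr(X, y)
        print(np.mean((K @ alpha_hat - y) ** 2))  # in-sample MSE of posterior-mean fit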

    ComPhy: Prokaryotic Composite Distance Phylogenies Inferred from Whole-Genome Gene Sets

    DOI: 10.1186/1471-2105-10-S1-S5
    With the increasing availability of whole genome sequences, it is becoming more and more important to use complete genome sequences for inferring species phylogenies. We developed a new tool, ComPhy ('Composite Distance Phylogeny'), which produces a prokaryotic phylogeny from a composite distance matrix calculated by comparing the complete gene sets of genome pairs. The composite distance between two genomes is defined by three components: Gene Dispersion Distance (GDD), Genome Breakpoint Distance (GBD) and Gene Content Distance (GCD). GDD quantifies the dispersion of orthologous genes along the genomic coordinates from one genome to another; GBD measures the shared breakpoints between two genomes; GCD measures the level of shared orthologs between two genomes. The phylogenetic tree is constructed from the composite distance matrix using a neighbor-joining method. We tested our method on 9 datasets from 398 completely sequenced prokaryotic genomes and achieved above 90% agreement in quartet topologies between the tree created by our method and the tree from Bergey's taxonomy. In comparison to several other phylogenetic analysis methods, our method showed consistently better performance. ComPhy is a fast and robust tool for genome-wide inference of evolutionary relationships among genomes. This work was supported in part by NSF/ITR-IIS-0407204.
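
    Since the abstract defines the composite distance through its three components, a minimal sketch of the combination step is shown below, assuming equal weights and per-component rescaling to [0, 1] (neither of which the abstract specifies for the actual tool).

        import numpy as np

        def composite_distance(gdd, gbd, gcd, w=(1/3, 1/3, 1/3)):
            """Combine the three component matrices into one composite
            distance matrix. Each component is rescaled to [0, 1] and the
            weights default to equal thirds; both choices are assumptions,
            since the abstract does not give ComPhy's actual scheme."""
            mats = [np.asarray(m, dtype=float) for m in (gdd, gbd, gcd)]
            mats = [m / m.max() if m.max() > 0 else m for m in mats]
            D = sum(wi * m for wi, m in zip(w, mats))
            np.fill_diagonal(D, 0.0)
            return D

        # Made-up component distances for three genomes
        gdd = [[0, 2, 4], [2, 0, 3], [4, 3, 0]]
        gbd = [[0, 1, 5], [1, 0, 2], [5, 2, 0]]
        gcd = [[0, 0.2, 0.8], [0.2, 0, 0.5], [0.8, 0.5, 0]]
        D = composite_distance(gdd, gbd, gcd)
        print(D)   # this matrix would then be fed to a neighbor-joining routine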

    Gene Expression-Based Glioma Classification Using Hierarchical Bayesian Vector Machines

    This paper considers several Bayesian classification methods for the analysis of glioma cancer with microarray data, based on reproducing kernel Hilbert spaces under the multiclass setup. We consider the multinomial logit likelihood as well as the likelihood related to the multiclass support vector machine (SVM) model. We show that our proposed Bayesian classification models with multiple shrinkage parameters can produce more accurate classification schemes for glioma cancer than several existing classical methods. We also propose a Bayesian variable selection scheme, integrated with our model, for selecting the differentially expressed genes. This integrated approach improves classifier design by yielding simultaneous gene selection.
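
    As a stripped-down analogue of the shrinkage idea, the sketch below computes a MAP fit of a multinomial logit model with a single Gaussian shrinkage penalty on the gene coefficients. The paper's models place priors over multiple shrinkage parameters and couple classification with Bayesian variable selection via MCMC, so this is only a point-estimate caricature on invented data.

        import numpy as np
        from scipy.optimize import minimize

        def fit_multinomial_ridge(X, y, n_classes, lam=1.0):
            """MAP fit of a multinomial logit model with a single Gaussian
            shrinkage penalty lam on the gene coefficients."""
            n, p = X.shape

            def neg_log_post(w_flat):
                W = w_flat.reshape(p, n_classes)
                Z = X @ W
                Z = Z - Z.max(axis=1, keepdims=True)          # numerical stability
                log_prob = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
                return -log_prob[np.arange(n), y].sum() + 0.5 * lam * (W ** 2).sum()

            res = minimize(neg_log_post, np.zeros(p * n_classes), method="L-BFGS-B")
            return res.x.reshape(p, n_classes)

        # Invented "expression" data: 30 samples, 50 genes, 3 tumor classes
        rng = np.random.default_rng(3)
        X = rng.standard_normal((30, 50))
        y = rng.integers(0, 3, size=30)
        W = fit_multinomial_ridge(X, y, n_classes=3)
        print(W.shape)   # one coefficient per gene and class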

    Predicting disease risks from highly imbalanced data using random forest

    Background: We present a method utilizing the Healthcare Cost and Utilization Project (HCUP) dataset for predicting disease risk of individuals based on their medical diagnosis history. The presented methodology may be incorporated in a variety of applications such as risk management, tailored health communication and decision support systems in healthcare.
    Methods: We employed the National Inpatient Sample (NIS) data, which is publicly available through the Healthcare Cost and Utilization Project (HCUP), to train random forest (RF) classifiers for disease prediction. Since the HCUP data is highly imbalanced, we employed an ensemble learning approach based on repeated random sub-sampling. This technique divides the training data into multiple sub-samples, while ensuring that each sub-sample is fully balanced. We compared the performance of support vector machine (SVM), bagging, boosting and RF in predicting the risk of eight chronic diseases.
    Results: We predicted eight disease categories. Overall, the RF ensemble learning method outperformed SVM, bagging and boosting in terms of the area under the receiver operating characteristic curve (AUC). In addition, RF has the advantage of computing the importance of each variable in the classification process.
    Conclusions: By combining repeated random sub-sampling with RF, we were able to overcome the class imbalance problem and achieve promising results. Using the national HCUP data set, we predicted eight disease categories with an average AUC of 88.79%.
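
    A minimal sketch of the repeated random sub-sampling scheme described in Methods, assuming scikit-learn's RandomForestClassifier as a stand-in for the paper's RF implementation; the member count, tree count, and toy data are illustrative, not the paper's settings.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def balanced_rf_ensemble(X, y, n_members=10, seed=0):
            """Each member is a random forest trained on all minority-class
            records plus an equal-sized random draw (without replacement)
            from the majority class, so every sub-sample is fully balanced."""
            rng = np.random.default_rng(seed)
            pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
            minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
            members = []
            for _ in range(n_members):
                sample = np.concatenate(
                    [minority, rng.choice(majority, size=len(minority), replace=False)]
                )
                rf = RandomForestClassifier(
                    n_estimators=100, random_state=int(rng.integers(1_000_000))
                )
                rf.fit(X[sample], y[sample])
                members.append(rf)
            return members

        def predict_risk(members, X):
            # Average the positive-class probability across ensemble members
            return np.mean([m.predict_proba(X)[:, 1] for m in members], axis=0)

        # Toy imbalanced data: roughly 5% positives
        rng = np.random.default_rng(4)
        X = rng.standard_normal((2000, 10))
        y = (X[:, 0] + 0.5 * rng.standard_normal(2000) > 1.6).astype(int)
        members = balanced_rf_ensemble(X, y)
        print(predict_risk(members, X[:5]))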