3,700 research outputs found
Bayesian analysis of calving ease scores and birth weights
International audience
Identifying Examinees Who Possess Distinct and Reliable Subscores When Added Value is Lacking for the Total Sample
Research has demonstrated that although subdomain information may provide no added value beyond the total score, in some contexts such information is useful to particular demographic subgroups (Sinharay & Haberman, 2014). However, it is argued that the utility of reporting subscores for an individual should be based not on one’s manifest characteristics (e.g., gender or ethnicity), but on individual needs for diagnostic information, which is driven by multidimensionality in subdomain scores. To improve the validity of diagnostic information, this study proposed the use of Mahalanobis Distance and HT indices to assess whether an individual’s data significantly departs from unidimensionality. Examinees who were found to differ significantly were then assessed separately for subscore added value via Haberman’s (2008) procedure. To this end, simulation analyses were conducted to evaluate Type I error, power, and recovery of subscore added value classifications for various levels of subdomain test lengths, subdomain inter-correlations, and proportions of multidimensionality in the total sample. Results demonstrated that the HT index possessed around 100% power across all conditions while maintaining Type I error below 5%, which led to nearly perfect recovery of subscore added value classifications. In contrast, the power rates for Mahalanobis Distance were much lower, ranging from 13% to 61%, with Type I error maintained at the nominal level of 5%. Although the power rates were below the desired criterion of 80%, the cases identified as aberrant using this method were found to have greater variability between subdomain scores, increased reliability, and lower observed subdomain correlations when compared to the generated data. As a result, outlier cases were found to have subscore added value for nearly 100% of cases across conditions, even when the generated multidimensional data did not possess subscore added value.
These results were cross-validated using a large-scale high-stakes test, in which the Mahalanobis Distance measure identified 6.57% of 8,803 test-takers as possessing subscores with added value; these test-takers would otherwise have been masked by the unidimensionality of the total sample. Overall, this study suggests that the Mahalanobis Distance measure shows some promise in identifying examinees with multidimensional score profiles.
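As a rough illustration of the screening idea in the abstract above, the following sketch flags score profiles whose squared Mahalanobis distance from the sample mean exceeds a chi-square critical value. It is not the study's procedure: the function names, the synthetic data, the choice of three subdomains, and the use of the 95th-percentile chi-square cutoff are all assumptions made for the example.

```python
import numpy as np

def mahalanobis_sq(scores):
    """Squared Mahalanobis distance of each row from the sample mean."""
    scores = np.asarray(scores, dtype=float)
    centered = scores - scores.mean(axis=0)
    cov = np.cov(scores, rowvar=False)        # sample covariance (ddof=1)
    cov_inv = np.linalg.inv(cov)
    # Row-wise quadratic form: centered[i] @ cov_inv @ centered[i]
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

def flag_outliers(scores, critical_value):
    """Boolean mask of rows whose squared distance exceeds the cutoff."""
    return mahalanobis_sq(scores) > critical_value

# Assumed cutoff: 95th percentile of a chi-square with 3 degrees of freedom.
CHI2_95_DF3 = 7.815

# Hypothetical data: 500 examinees, 3 subdomain scores driven by one ability.
rng = np.random.default_rng(0)
ability = rng.normal(size=(500, 1))
scores = ability + 0.3 * rng.normal(size=(500, 3))  # nearly unidimensional
scores[0] = [2.5, -2.5, 0.0]                        # one uneven profile

flags = flag_outliers(scores, CHI2_95_DF3)
print("flagged proportion:", flags.mean())
print("uneven profile flagged:", bool(flags[0]))
```

In this toy setup the uneven profile is far from the mean along a direction with little variance, which is what makes it stand out. Whether flagged examinees actually have subscores with added value is a separate question that the abstract addresses with Haberman's (2008) procedure, which is not implemented here.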
Variational Bayesian multinomial probit regression with Gaussian process priors
It is well known in the statistics literature that augmenting binary and polychotomous response models with Gaussian latent variables enables exact Bayesian analysis via Gibbs sampling from the parameter posterior. By adopting such a data augmentation strategy, dispensing with priors over regression coefficients in favour of Gaussian Process (GP) priors over functions, and employing variational approximations to the full posterior, we obtain efficient computational methods for Gaussian Process classification in the multi-class setting. The model augmentation with additional latent variables ensures full a posteriori class coupling whilst retaining the simple a priori independent GP covariance structure from which sparse approximations, such as multi-class Informative Vector Machines (IVM), emerge in a very natural and straightforward manner. This is the first time that a fully Variational Bayesian treatment for multi-class GP classification has been developed without having to resort to additional explicit approximations to the non-Gaussian likelihood term. Empirical comparisons with exact analysis via MCMC and with Laplace approximations illustrate the utility of the variational approximation as a computationally economical alternative to full MCMC, and it is shown to be more accurate than the Laplace approximation.
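The data augmentation strategy that the abstract calls well known can be sketched for the simplest case: a binary probit model with linear regression coefficients, sampled by Gibbs. This is not the variational GP method the abstract proposes; it only illustrates the underlying augmentation with Gaussian latent variables. The function names, the Gaussian prior variance, and the synthetic data are assumptions for the example.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for cdf and inverse cdf

def sample_latent(means, y, rng):
    """Draw z_i ~ N(means_i, 1), truncated to z_i > 0 if y_i == 1, else z_i <= 0."""
    p_neg = np.array([_STD.cdf(-m) for m in means])   # P(z_i <= 0)
    u01 = rng.uniform(size=means.shape)
    # Uniform on (p_neg, 1) when y == 1, on (0, p_neg) when y == 0.
    u = np.where(y == 1, p_neg + u01 * (1.0 - p_neg), u01 * p_neg)
    u = np.clip(u, 1e-12, 1.0 - 1e-12)                # keep inv_cdf in its domain
    return means + np.array([_STD.inv_cdf(p) for p in u])

def probit_gibbs(X, y, n_iter=600, burn_in=100, prior_var=100.0, seed=0):
    """Gibbs sampler for probit regression with an assumed N(0, prior_var * I) prior."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    post_cov = np.linalg.inv(X.T @ X + np.eye(d) / prior_var)
    chol = np.linalg.cholesky(post_cov)
    beta = np.zeros(d)
    draws = []
    for it in range(n_iter):
        z = sample_latent(X @ beta, y, rng)                          # z | beta, y
        beta = post_cov @ (X.T @ z) + chol @ rng.standard_normal(d)  # beta | z
        if it >= burn_in:
            draws.append(beta)
    return np.array(draws)

# Hypothetical data generated from a known coefficient vector.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
beta_true = np.array([0.5, -1.0])
y = (X @ beta_true + rng.normal(size=300) > 0).astype(int)

draws = probit_gibbs(X, y)
print("posterior mean:", draws.mean(axis=0))
```

Conditioning on the latent variables turns the coefficient update into an ordinary Gaussian linear-regression draw, which is why the augmentation makes exact sampling tractable. The abstract's contribution replaces the coefficients with GP priors over functions and the sampler with a variational approximation.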
Discriminative Representations for Heterogeneous Images and Multimodal Data
Histology images of tumor tissue are an important diagnostic and prognostic tool for pathologists. Recently developed molecular methods group tumors into subtypes to further guide treatment decisions, but they are not routinely performed on all patients. A lower-cost, repeatable method to predict tumor subtypes from histology could bring benefits to more cancer patients. Further, combining imaging and genomic data types provides a more complete view of the tumor and may improve prognostication and treatment decisions. While molecular and genomic methods capture the state of a small sample of tumor, histological image analysis provides a spatial view and can identify multiple subtypes in a single tumor. This intra-tumor heterogeneity has yet to be fully understood, and its quantification may lead to future insights into tumor progression. In this work, I develop methods to learn appropriate features directly from images using dictionary learning or deep learning. I use multiple instance learning to account for intra-tumor variations in subtype during training, improving subtype predictions and providing insights into tumor heterogeneity. I also integrate image and genomic features to learn a projection to a shared space that is also discriminative. This method can be used for cross-modal classification or to improve predictions from images by also learning from genomic data during training, even if only image data is available at test time.
Doctor of Philosophy
Nonparametric Bayes Modeling of Populations of Networks
Replicated network data are increasingly available in many research fields. In connectomic applications, inter-connections among brain regions are collected for each patient under study, motivating statistical models which can flexibly characterize the probabilistic generative mechanism underlying these network-valued data. Available models for a single network are not designed specifically for inference on the entire probability mass function of a network-valued random variable and therefore lack flexibility in characterizing the distribution of relevant topological structures. We propose a flexible Bayesian nonparametric approach for modeling the population distribution of network-valued data. The joint distribution of the edges is defined via a mixture model which reduces dimensionality and efficiently incorporates network information within each mixture component by leveraging latent space representations. The formulation leads to an efficient Gibbs sampler and provides simple and coherent strategies for inference and goodness-of-fit assessments. We provide theoretical results on the flexibility of our model and illustrate improved performance, compared to state-of-the-art models, in simulations and an application to human brain networks.