89 research outputs found
Segmentation and Dimension Reduction: Exploratory and Model-Based Approaches
Representing the information in a data set in a concise way is an important part of data analysis. A variety of multivariate statistical techniques have been developed for this purpose, such as k-means clustering and principal components analysis. These techniques are often based on the principles of segmentation (partitioning the observations into distinct groups) and dimension reduction (constructing a low-dimensional representation of a data set). However, such techniques typically make no statistical assumptions about the process that generates the data; as a result, the statistical significance of the results is often unknown.
In this thesis, we incorporate the modeling principles of segmentation and dimension reduction into statistical models. We thus develop new models that can summarize and explain the information in a data set in a simple way. The focus is on dimension reduction using bilinear parameter structures and on techniques for clustering both modes of a two-mode data matrix. To illustrate the usefulness of the techniques, the thesis includes a variety of empirical applications in marketing, psychometrics, and political science. An important application is modeling the response behavior in surveys with rating scales, which provides novel insight into what kinds of response styles exist and how substantive opinions vary among respondents. We find that our modeling approaches yield new techniques for data analysis that can be useful in a variety of applied fields.
A Bayesian approach to two-mode clustering
We develop a new Bayesian approach to estimate the parameters of a latent-class model for the joint clustering of both modes of two-mode data matrices. Posterior results are obtained using a Gibbs sampler with data augmentation.
Our Bayesian approach has three advantages over existing methods. First, we can perform statistical inference on the model parameters, which would not be possible using frequentist estimation procedures. Second, the Bayesian approach allows us to provide statistical criteria for determining the optimal numbers of clusters. Finally, our Gibbs sampler has fewer problems with local optima in the likelihood function and with empty classes than the EM algorithm used in a frequentist approach. We apply the Bayesian estimation method of the latent-class two-mode clustering model to two empirical data sets. The first data set is the Supreme Court voting data set of Doreian, Batagelj, and Ferligoj (2004). The second data set comprises the roll call votes of the United States House
Identifying Unknown Response Styles: A Latent-Class Bilinear Multinomial Logit Model
Respondents can vary significantly in the way they use rating scales. Specifically, respondents can exhibit varying degrees of response style, which threatens the validity of the responses. The purpose of this article is to investigate to what extent rating scale responses reflect response style and the substantive content of the item. The authors develop a novel model that accounts for possibly unknown kinds of response styles, the content of the items, and background characteristics of respondents. By imposing a bilinear structure on the parameters of a multinomial logit model, the authors can visually distinguish the effects of respondent characteristics and of item content on response behavior. This approach is combined with finite mixture modeling, so that two separate segmentations of the respondents are obtained: one for response style and one for item content. This latent-class bilinear multinomial logit (LC-BML) model is applied to a cross-national data set. The results show that item content is highly influential in explaining response behavior and reveal the presence of several response styles, including the prominent acquiescence and extreme response styles.
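A bilinear parameter structure in a multinomial logit means that the matrix of coefficients linking covariates to answer categories is written as a product of two thin matrices, so that both covariates and categories get low-dimensional coordinates that can be plotted together. The sketch below computes category probabilities under such a rank-r restriction. It is a generic reduced-rank multinomial logit under stated assumptions, not the exact LC-BML specification: the finite-mixture (latent-class) part is omitted, and the function name and the arguments `B` (covariate loadings), `A` (category scores) and `c` (category intercepts) are hypothetical.

```python
import numpy as np

def bilinear_mnl_probs(X, B, A, c):
    """Category probabilities of a rank-r (bilinear) multinomial logit.

    X: (n, p) covariates, e.g. respondent background characteristics
    B: (p, r) covariate loadings
    A: (K, r) category scores, one row per rating-scale category
    c: (K,) category intercepts
    """
    eta = c + X @ B @ A.T                    # (n, K); coefficient matrix B @ A.T has rank <= r
    eta -= eta.max(axis=1, keepdims=True)    # stabilise the softmax
    expeta = np.exp(eta)
    return expeta / expeta.sum(axis=1, keepdims=True)
```

With a 5-point scale and r = 2, the five rows of `A` and the p rows of `B` can be drawn in one two-dimensional plot, which is the kind of visual interpretation the abstract refers to. Identification constraints (for example, centring the category scores) would be needed when estimating such a model.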
Fuzzy clustering with Minkowski distance
Distances in the well-known fuzzy c-means algorithm of Bezdek (1973) are measured by the squared Euclidean distance.
Other distances have been used in fuzzy clustering as well. For example, Jajuga (1991) proposed using the L1-distance, and Bobrowski and Bezdek (1991) also used the L∞-distance. For the more general case of the Minkowski distance, and for the case of using a root of the squared Minkowski distance, Groenen and Jajuga (2001) introduced a majorization algorithm to minimize the error. One of the advantages of iterative majorization is that it is a guaranteed descent algorithm, so that every iteration reduces the error until convergence is reached.
However, their algorithm was limited to Minkowski parameters between 1 and 2, that is, between the L1-distance and the Euclidean distance. Here, we extend their majorization algorithm to any Minkowski distance with Minkowski parameter greater than or equal to 1. This extension also includes the L∞-distance. We also investigate how well this algorithm performs and present an empirical application.
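The sketch below shows the two ingredients that do not depend on the majorization step: the Minkowski distance of order p (with p = ∞ giving the L∞-distance) and the standard fuzzy c-means membership update for fixed cluster centres, here applied to squared Minkowski distances. It is illustrative only; the centre update, which is where the majorization algorithm of Groenen and Jajuga is needed for non-Euclidean distances, is not shown. The function names and the fuzzifier `m` default are assumptions.

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p >= 1; p = np.inf gives the L-infinity distance."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        return diff.max()
    return (diff ** p).sum() ** (1.0 / p)

def fcm_memberships(X, centers, p, m=2.0):
    """Fuzzy c-means membership update for fixed centres, using squared Minkowski distances.

    u_ik = 1 / sum_j (D_ik / D_ij) ** (1 / (m - 1)), where D_ik is the squared distance
    from object i to centre k and m > 1 is the fuzzifier.
    """
    D = np.array([[minkowski(x, c, p) ** 2 for c in centers] for x in X])  # (n, c)
    D = np.maximum(D, 1e-12)  # avoid division by zero when an object coincides with a centre
    ratio = (D[:, :, None] / D[:, None, :]) ** (1.0 / (m - 1.0))           # (n, c, c)
    return 1.0 / ratio.sum(axis=2)
```

Each row of the returned membership matrix sums to one; alternating this update with a centre update that decreases the same error gives the usual fuzzy clustering iteration.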
Optimal Scaling of Interaction Effects in Generalized Linear Models
Multiplicative interaction models, such as Goodman's RC(M) association models, can be a useful tool for analyzing the content of interaction effects. However, most models for interaction effects are only suitable for data sets with two or three predictor variables. Here, we discuss an optimal scaling model for analyzing the content of interaction effects in generalized linear models with any number of categorical predictor variables. This model, which we call the optimal scaling of interactions (OSI) model, is a parsimonious, one-dimensional multiplicative interaction model. We discuss how the model can be used to visually interpret the interaction effects. Two empirical data sets are used to show how the results of the model can
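As a rough illustration of what a parsimonious, one-dimensional multiplicative interaction can look like with more than two categorical predictors, the sketch below gives every category of every predictor a single score and lets each pair of predictors interact through the product of their scores. This is a hypothetical construction for intuition, not necessarily the exact OSI specification; the function name and the `main_effects` and `quantifications` structures are assumptions.

```python
from itertools import combinations

def rank_one_interaction_predictor(x, intercept, main_effects, quantifications):
    """Linear predictor with main effects plus a one-dimensional multiplicative interaction.

    x: sequence of category indices, one per categorical predictor
    main_effects[j][c]: main effect of category c of predictor j
    quantifications[j][c]: one-dimensional score of category c of predictor j (hypothetical)
    """
    eta = intercept + sum(main_effects[j][c] for j, c in enumerate(x))
    # Every pair of predictors interacts through the product of its category scores,
    # so the number of interaction parameters grows with the number of categories,
    # not with the number of category combinations.
    scores = [quantifications[j][c] for j, c in enumerate(x)]
    eta += sum(s1 * s2 for s1, s2 in combinations(scores, 2))
    return eta
```

In a generalized linear model, this linear predictor would then be passed through the inverse link function, for example the logistic function for a binary response.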
Dealing with Time in Health Economic Evaluation: Methodological Issues and Recommendations for Practice
Time is an important aspect of health economic evaluation, as the timing and duration of clinical events, healthcare interventions and their consequences all affect estimated costs and effects. These issues should be reflected in the design of health economic models. This article considers three important aspects of time in modelling: (1) which cohorts to simulate and how far into the future to extend the analysis; (2) the simulation of time, including the difference between discrete-time and continuous-time models, cycle lengths, and converting rates and probabilities; and (3) discounting future costs and effects to their present values. We provide a methodological overview of these issues and make recommendations to help inform both the conduct of cost-effectiveness analyses and the interpretation of their results. For choosing which cohorts to simulate and how many, we suggest analysts carefully assess potential reasons for variation in cost effectiveness between cohorts and the feasibility of subgroup-specific recommendations. For the simulation of time, we recommend using short cycles or continuous-time models to avoid biases and the need for half-cycle corrections, and provide advic
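The conversions between rates and probabilities, the rescaling of probabilities to a different cycle length, and discounting mentioned in this abstract can be written in a few lines. The sketch below uses the standard formulas under the assumption of a constant event rate within the period and annual discounting; the function names are hypothetical and the code is not taken from the article.

```python
import math

def rate_to_prob(rate, t=1.0):
    """Probability of at least one event within time t, assuming a constant event rate."""
    return 1.0 - math.exp(-rate * t)

def prob_to_rate(prob, t=1.0):
    """Constant rate implied by a probability observed over time t."""
    return -math.log(1.0 - prob) / t

def rescale_prob(prob, t_from, t_to):
    """Convert a probability for a cycle of length t_from to one for length t_to (constant rate)."""
    return 1.0 - (1.0 - prob) ** (t_to / t_from)

def present_value(amount, rate, years):
    """Discount an amount occurring after `years` years to its present value at an annual rate."""
    return amount / (1.0 + rate) ** years
```

For example, `rescale_prob(0.2, 1.0, 1.0 / 12.0)` gives the monthly probability that, compounded over twelve monthly cycles, reproduces an annual probability of 0.2; simply dividing 0.2 by 12 does not.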
Global Optimization strategies for two-mode clustering
Two-mode clustering is a relatively new form of clustering that clusters both rows and columns of a data matrix. To do so, a criterion similar to k-means is optimized. However, it is still unclear which optimization method should be used to perform two-mode clustering, as va
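The k-means-like criterion for two-mode clustering sums the squared deviations of each matrix entry from the mean of the block formed by its row cluster and column cluster. The sketch below computes this criterion and minimises it with a simple alternating exchange heuristic that reassigns rows and then columns. This heuristic is an assumption chosen for illustration and only finds a local optimum; it is not one of the global optimization strategies compared in the paper, and the function names are hypothetical.

```python
import numpy as np

def block_means(X, rows, cols, K, L):
    """Mean of X within each (row cluster, column cluster) block; empty blocks get 0."""
    R = np.eye(K)[rows]                          # (n, K) row indicator matrix
    C = np.eye(L)[cols]                          # (m, L) column indicator matrix
    sums = R.T @ X @ C                           # (K, L) block sums
    counts = np.outer(R.sum(axis=0), C.sum(axis=0))
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

def two_mode_sse(X, rows, cols, V):
    """k-means-like two-mode criterion: squared deviations from the block means V."""
    return float(((X - V[rows][:, cols]) ** 2).sum())

def alternating_two_mode(X, K, L, n_iter=20, seed=0):
    """Local search: alternately reassign rows and columns to their best-fitting cluster."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    rows, cols = rng.integers(K, size=n), rng.integers(L, size=m)
    for _ in range(n_iter):
        V = block_means(X, rows, cols, K, L)
        P = V[:, cols]                                             # (K, m) row-cluster profiles
        rows = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        V = block_means(X, rows, cols, K, L)
        Q = V[rows, :]                                             # (n, L) column-cluster profiles
        cols = ((X[:, :, None] - Q[:, None, :]) ** 2).sum(axis=0).argmin(axis=1)
    V = block_means(X, rows, cols, K, L)
    return rows, cols, V, two_mode_sse(X, rows, cols, V)
```

Because each step either reassigns objects given fixed block means or recomputes the means, the criterion never increases; running the heuristic from many random starts is a simple baseline against which more global strategies can be compared.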
Expanded Access as a source of real-world data: An overview of FDA and EMA approvals
Aims: To identify, characterize and compare all Food and Drug Administration (FDA) and European Medicines Agency (EMA) approvals that included real-world data on efficacy from expanded access (EA) programmes. Methods: Cross-sectional study of FDA (1955–2018) and EMA (1995–2018) regulatory approval documentation. We automated searching for terms related to EA in 22,506 documents using machine learning techniques. We included all approvals where EA terms appeared in the regulatory documentation. Our main outcome was the inclusion of EA data as evidence of clinical efficacy. Characterization was based on approval date, disease area, orphan designation and whether the evidence was supportive or pivotal. Results: EA terms appeared in 693 out of 22,506 (3.1%) documents, which referenced 187 approvals. For 39 approvals, data from EA programmes were used to inform on clinical efficacy. The yearly number of approvals with EA data increased from 1.25 in 1993–2013 to 4.6 in 2014–2018. In 13 cases, these programmes formed the main evidence for approval. Of these, patients in EA programmes formed over half (median 71%, interquartile range: 34–100) of the total patient population available for efficacy evaluation. Almost all (12/13) approvals were granted orphan designation. In 8/13, there were differences between regulators in approval status and valuation of evidence. Strikingly, 4 treatments were granted approval based solely on efficacy from EA. Conclusion: Sponsors and regulators increasingly include real-world data from EA programmes in the efficacy profile of a treatment. The indications of the approved treatments are characterized by orphan designation and high unmet medical need.
Including historical data in the analysis of clinical trials
Data of previous trials with a similar setting are often available in the analysis of clinical trials. Several Bayesian methods have been proposed for including historical data as prior information in the analysis of the current trial, such as the (modified) power prior, the (robust) meta-analytic-predictive prior, the commensurate prior and methods proposed by Pocock and Murray et al. We compared these methods and illustrated their use in a practical setting, including an assessment of the comparability of the current and the historical data. The motivating data set consists of randomised controlled trials for acute myeloid leukaemia. A simulation study was used to compare the methods in terms of bias, precision, power and type I error rate. Methods that estimate parameters for the between-trial heterogeneity generally offer the best trade-off of power, precision and type I error, with the meta-analytic-predictive prior being the most promising method. The results show that it can be feasible to include historical data in the analysis of clinical trials, if an appropriate method is used to estimate the heterogeneity between trials, and the historical data satisfy criteria for comparability.
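The simplest of the methods listed, the power prior with a fixed weight, raises the likelihood of the historical data to a power a0 between 0 and 1 before combining it with the current trial. For a binary endpoint with a Beta initial prior, this gives a closed-form Beta posterior, sketched below. The sketch assumes a fixed, pre-specified a0; the modified power prior (which treats a0 as random), the meta-analytic-predictive prior and the commensurate prior instead let the data inform the amount of borrowing and are not shown. The function names are hypothetical.

```python
def power_prior_beta_binomial(y, n, y0, n0, a0, a=1.0, b=1.0):
    """Posterior Beta(alpha, beta) parameters for a response rate under a fixed power prior.

    y, n: responders and patients in the current trial
    y0, n0: responders and patients in the historical trial
    a0: weight in [0, 1]; a0 = 0 ignores the historical trial, a0 = 1 pools it fully
    a, b: parameters of the initial Beta prior
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    post_a = a + a0 * y0 + y
    post_b = b + a0 * (n0 - y0) + (n - y)
    return post_a, post_b

def beta_mean(alpha, beta):
    """Posterior mean of the response rate."""
    return alpha / (alpha + beta)
```

With a0 = 0.5, a historical trial of 60 patients contributes as much information as 30 current patients, which shows why the choice of a0 directly drives bias, power and type I error.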
Parturition mode recommendation and symptoms of pelvic floor disorders after obstetric anal sphincter injuries
Introduction and hypothesis: Our primary objective was to evaluate parturition mode (PM) recommendations following obstetric anal sphincter injuries (OASIs) and adherence to these recommendations, and to evaluate the recurrence of OASIs in women who had a subsequent vaginal delivery (VD). The hypothesis was that adherence to the PM recommendations leads to a reasonable OASI recurrence rate. Methods: This was a retrospective observational cohort study of patients with previous OASIs between 2010 and 2016. After an outpatient visit including 3D transperineal ultrasound to screen for pelvic floor and anal sphincter injuries, all patients received recommendations for a subsequent PM. Patients were invited to complete validated questionnaires 2 to 5 years post-OASIs. Results: The majority of invited patients (265/320) attended follow-up, with 264 receiving a recommendation for PM. Only 5.6% did not adhere to the received recommendation. One hundred sixty-one patients delivered again: 58% had a VD and 42% had a cesarean section (CS). Recurrence of OASIs was observed in 4.3% of the patients who had a VD. Fecal incontinence occurred in 4.9% of patients, whereas any form of anal incontinence occurred in 48%. While dyspareunia was common in patients with residual external anal sphincter (EAS) injuries and levator ani muscle (LAM) avulsions, anal pain occurred more frequently with EAS injuries and fecal incontinence with LAM avulsions. Conclusions: This study showed that the vast majority of patients followed PM recommendations, which resulted in a low recurrence rate of OASIs together with a high CS rate. Fecal incontinence after OASIs was correlated with the degree of OASIs.
- …