    R for Marketing Research and Analytics

    Discussion on Fifty Years of Classification and Regression Trees

    In this discussion paper, we argue that the literature on tree algorithms is very fragmented. We identify possible causes and discuss the good and bad sides of this situation. Among the latter is the lack of free open-source implementations for many algorithms. We argue that if the community adopts a standard of creating and sharing free open-source implementations of the algorithms it develops, and makes these programs easy to access, the bad sides of the fragmentation will be actively combated, to the benefit of the whole scientific community. (authors' abstract)

    COPS: Cluster optimized proximity scaling

    Proximity scaling (i.e., multidimensional scaling and related methods) is a versatile statistical method whose general idea is to reduce the multivariate complexity in a data set by employing suitable proximities between the data points and finding low-dimensional configurations where the fitted distances optimally approximate these proximities. The ultimate goal, however, is often not only to find the optimal configuration but also to infer statements about the similarity of objects in the high-dimensional space from the similarity in the configuration. Since these two goals are somewhat at odds, the resulting optimal configuration may make inferring similarities rather difficult. In that case, the solution lacks "clusteredness" in the configuration (which we call "c-clusteredness"). We present a version of proximity scaling, coined cluster optimized proximity scaling (COPS), which solves this conundrum by introducing a more clustered appearance into the configuration while adhering to the general idea of multidimensional scaling. In COPS, an arbitrary MDS loss function is parametrized by monotonic transformations and combined with an index that quantifies the c-clusteredness of the solution. This index, the OPTICS cordillera, has intuitively appealing properties with respect to measuring c-clusteredness. This combination of MDS loss and index is called "cluster optimized loss" (coploss) and is minimized to push any configuration towards a more clustered appearance. The effect of the method is illustrated with various examples: assessing similarities of countries based on the history of banking crises over the last 200 years, scaling Californian counties with respect to the projected effects of climate change and their social vulnerability, and preprocessing a data set of handwritten digits for subsequent classification by nonlinear dimension reduction. (authors' abstract)
    Series: Discussion Paper Series / Center for Empirical Research Method
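
    As a rough illustration of how an MDS loss and a clusteredness index can be traded off in one objective, the sketch below evaluates a weighted combination of stress and a clusteredness score for a given configuration. It is not the COPS implementation. The power transformation `delta**lam` is assumed here as one example of a monotonic transformation, the weights `v1` and `v2` are assumptions, and the hypothetical `clusteredness` function is a crude nearest-neighbour stand-in for the OPTICS cordillera (which is actually computed from an OPTICS reachability plot). COPS minimizes coploss over configurations; this sketch only evaluates it.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for an (n, p) configuration X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def stress(delta, X, lam=1.0):
    """Normalized raw stress between the power-transformed dissimilarities
    delta**lam (assumed example of a monotonic transformation) and the
    distances fitted by configuration X."""
    iu = np.triu_indices(delta.shape[0], k=1)
    target = delta[iu] ** lam
    fitted = pairwise_distances(X)[iu]
    return np.sum((target - fitted) ** 2) / np.sum(target ** 2)


def clusteredness(X, k=1):
    """Hypothetical stand-in for the OPTICS cordillera (NOT the real index):
    one minus the ratio of the mean k-th nearest-neighbour distance to the
    mean pairwise distance. Larger values mean tighter local groups
    relative to the overall spread of the configuration."""
    D = pairwise_distances(X)
    kth = np.sort(D, axis=1)[:, k]  # column 0 holds each point's zero self-distance
    mean_all = D[np.triu_indices(D.shape[0], k=1)].mean()
    return 1.0 - kth.mean() / mean_all


def coploss(delta, X, lam=1.0, v1=1.0, v2=1.0, k=1):
    """Weighted trade-off that rewards low stress and high clusteredness.
    A COPS-style method would minimize this over X (and lam); here it is
    only evaluated for a fixed configuration."""
    return v1 * stress(delta, X, lam) - v2 * clusteredness(X, k)


# Usage: dissimilarities from two well-separated pairs of points,
# compared against an evenly spaced configuration.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
delta = pairwise_distances(pts)
even = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
print(coploss(delta, pts), coploss(delta, even))
```

    Keeping the index as a separate function mirrors the abstract's description of an arbitrary MDS loss "combined with" a clusteredness index, so a proper OPTICS cordillera could replace the stand-in without touching the stress term.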

    Recursive Partitioning of Models of a Generalized Linear Model Type

    This thesis is concerned with recursive partitioning of models of a generalized linear model type (GLM-type), i.e., maximum likelihood models with a linear predictor for the linked mean, a topic that has received constant interest over the last twenty years. The resulting tree (a "model tree") can be seen as an extension of classic trees that allows for a GLM-type model in the partitions. In this work, the focus lies on applied and computational aspects of model trees with GLM-type node models, in order to work out different areas where the combination of parametric models and trees is beneficial and to build a computational scaffold for future applications of model trees. In the first part, model trees are defined, and some algorithms for fitting model trees with GLM-type node models are reviewed and compared in terms of their properties of tree induction and node model fitting. Additionally, the design of a particularly versatile algorithm, the MOB algorithm (Zeileis et al. 2008), in R is described, and an in-depth discussion of how the functionality offered can be extended to various GLM-type models is provided. This is illustrated by an example that uses partitioned negative binomial models to investigate the effect of health care incentives. Part 2 consists of three research articles in which model trees are applied to problems that frequently occur in the social sciences. The first uses trees with GLM-type node models and applies them to a data set of voters who show a non-monotone relationship between the frequency of attending past elections and turnout in 2004. Three different types of model tree algorithms are used to investigate this phenomenon, and for two of them the resulting trees can explain the counter-intuitive finding. Here, model trees are used to learn a nonlinear relationship between a target model and a large number of candidate variables to provide more insight into a data set. A second application area is also discussed, namely using model trees to detect ill-fitting subsets in the data. The second article uses model trees to model the number of fatalities in the Afghanistan war, based on the WikiLeaks Afghanistan war diary. Data pre-processing with a topic model generates predictors that are used as explanatory variables in a model tree for overdispersed count data. Here, the combination of model trees and topic models allows flexible analysis of database data, frequently encountered in data journalism, and provides a coherent description of fatalities in the Afghanistan war. The third paper uses a new framework built around model trees to approach the classic problem of segmentation, frequently encountered in marketing and management science. Here, the framework is used to segment a sample of the US electorate in order to identify likely and unlikely voters. It is shown that the framework's model trees enable accurate identification, which in turn allows efficient targeted mobilisation of eligible voters. (author's abstract)
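
    The core idea of a model tree, a separate parametric model in each partition found by recursive splitting, can be sketched as below. This is a simplified stand-in, not the MOB algorithm: node models are ordinary least squares fits (the Gaussian GLM with identity link), and splits are chosen by exhaustive search over thresholds of the partitioning variables to minimize the children's summed residual sum of squares. MOB instead selects the splitting variable with parameter instability tests, which are not reproduced here. The names `grow_tree`, `min_size` and `min_gain` are hypothetical.

```python
import numpy as np


def fit_node(X, y):
    """Least-squares fit of y ~ 1 + X, i.e. a Gaussian GLM with identity link.
    Returns the coefficients and the residual sum of squares (RSS)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, float(np.sum((y - A @ coef) ** 2))


def grow_tree(X, y, Z, depth=0, max_depth=2, min_size=10, min_gain=0.05):
    """Recursively split on the columns of Z (partitioning variables),
    fitting a separate linear model in every node. A split is kept only if
    it lowers the node's RSS by more than the fraction min_gain."""
    coef, rss = fit_node(X, y)
    node = {"coef": coef, "n": len(y)}
    if depth >= max_depth or len(y) < 2 * min_size:
        return node
    best = None  # (children's total RSS, column index, threshold)
    for j in range(Z.shape[1]):
        for t in np.unique(Z[:, j])[:-1]:
            left = Z[:, j] <= t
            if left.sum() < min_size or (~left).sum() < min_size:
                continue
            total = fit_node(X[left], y[left])[1] + fit_node(X[~left], y[~left])[1]
            if best is None or total < best[0]:
                best = (total, j, t)
    if best is None or rss - best[0] <= min_gain * rss:
        return node
    _, j, t = best
    left = Z[:, j] <= t
    args = dict(depth=depth + 1, max_depth=max_depth, min_size=min_size, min_gain=min_gain)
    node["split"] = (j, t)
    node["left"] = grow_tree(X[left], y[left], Z[left], **args)
    node["right"] = grow_tree(X[~left], y[~left], Z[~left], **args)
    return node


def predict(node, x, z):
    """Route one observation to its leaf and apply that leaf's linear model."""
    while "split" in node:
        j, t = node["split"]
        node = node["left"] if z[j] <= t else node["right"]
    return node["coef"][0] + np.dot(node["coef"][1:], x)


# Usage: the slope of y on x depends on a binary partitioning variable z.
x = np.linspace(0.0, 1.0, 40)
z = np.tile([0.0, 1.0], 20)
y = np.where(z == 0.0, 1.0 + 2.0 * x, 1.0 - 3.0 * x)
tree = grow_tree(x[:, None], y, z[:, None])
print(tree["split"], tree["left"]["coef"], tree["right"]["coef"])
```

    Separating the regressors `X` of the node model from the partitioning variables `Z` reflects the model-tree view in the abstract: the tree decides where the parametric model changes, while each leaf keeps an interpretable linear predictor.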

    Developing and Measuring IS Scales Using Item Response Theory

    Information Systems (IS) research frequently uses survey data to measure the interplay between technological systems and human beings. Researchers have developed sophisticated procedures to build and validate multi-item scales that measure latent constructs. Most studies use classical test theory (CTT), which suffers from several theoretical shortcomings. We discuss these problems and present item response theory (IRT) as a viable alternative. Subsequently, we use the CTT approach as well as Rasch models (a class of restrictive IRT models) to develop a scale for measuring the hedonic aspects of websites. The results illustrate not only that IRT can be successfully applied in IS research but also that it provides improved results over CTT approaches.

    Breaking Free from the Limitations of Classical Test Theory: Developing and Measuring Information Systems Scales Using Item Response Theory

    Information systems (IS) research frequently uses survey data to measure the interplay between technological systems and human beings. Researchers have developed sophisticated procedures to build and validate multi-item scales that measure latent constructs. The vast majority of IS studies use classical test theory (CTT), but this approach suffers from three major theoretical shortcomings: (1) it assumes a linear relationship between the latent variable and observed scores, which rarely represents the empirical reality of behavioral constructs; (2) the true score either cannot be estimated directly or can be estimated only under assumptions that are difficult to meet; and (3) parameters such as reliability, discrimination, location, or factor loadings depend on the sample being used. To address these issues, we present item response theory (IRT) as a collection of viable alternatives for measuring continuous latent variables by means of categorical indicators (i.e., measurement variables). IRT offers several advantages: (1) it assumes nonlinear relationships; (2) it allows more appropriate estimation of the true score; (3) it can estimate item parameters independently of the sample being used; (4) it allows the researcher to select items that are in accordance with a desired model; and (5) it applies and generalizes concepts such as reliability and internal consistency, and thus allows researchers to derive more information about the measurement process. We use a CTT approach as well as Rasch models (a special class of IRT models) to demonstrate how a scale for measuring hedonic aspects of websites is developed under both approaches. The results illustrate how IRT can be successfully applied in IS research and how it provides better scale results than CTT. We conclude by explaining the most appropriate circumstances for applying IRT, as well as the limitations of IRT.
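
    To make the nonlinear relationship concrete: in the dichotomous Rasch model, the probability that a person with location θ endorses an item with difficulty β is exp(θ − β) / (1 + exp(θ − β)). The sketch below codes that formula and a Newton-Raphson maximum-likelihood estimate of θ for items whose difficulties are treated as known. In practice, item difficulties are themselves estimated from the data (for Rasch models, for example, by conditional maximum likelihood), which is not shown. The function names are illustrative only.

```python
import numpy as np


def rasch_prob(theta, beta):
    """P(X = 1 | theta, beta) = exp(theta - beta) / (1 + exp(theta - beta))
    for the dichotomous Rasch model (theta: person, beta: item difficulty)."""
    d = np.asarray(theta, dtype=float) - np.asarray(beta, dtype=float)
    return 1.0 / (1.0 + np.exp(-d))


def estimate_theta(responses, betas, n_iter=50, tol=1e-10):
    """Maximum-likelihood person location for items with KNOWN difficulties,
    using Newton-Raphson on the score equation sum(x - P) = 0."""
    x = np.asarray(responses, dtype=float)
    b = np.asarray(betas, dtype=float)
    if x.sum() == 0 or x.sum() == len(x):
        raise ValueError("the ML estimate is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(n_iter):
        p = rasch_prob(theta, b)
        step = np.sum(x - p) / np.sum(p * (1.0 - p))  # score divided by information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Usage: equal steps in theta do not give equal steps in probability.
print(rasch_prob([-2.0, -1.0, 0.0, 1.0, 2.0], 0.0))
print(estimate_theta([1, 1, 0], [-1.0, 0.0, 1.0]))
```

    At θ = β the endorsement probability is exactly 0.5, and the curve flattens towards 0 and 1 at the extremes, which is the kind of nonlinear link between latent variable and observed responses that the abstract contrasts with CTT's linear assumption.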

    Linking cause assessment, corporate philanthropy, and corporate reputation

    This study analyzes the link between cause assessment, corporate philanthropy, and dimensions of corporate reputation from different stakeholders' perspectives, using balance theory as a conceptual framework and the telecommunications industry in Austria and Egypt as the empirical setting. Findings show that corporate philanthropy can improve perceptions of the corporate reputation dimensions, but the results vary between customers and non-customers and depend on the country setting.

    Community Structure and Functional Gene Profile of Bacteria on Healthy and Diseased Thalli of the Red Seaweed Delisea pulchra

    Disease is increasingly viewed as a major factor in the ecology of marine communities, and its impact appears to be increasing with environmental change such as global warming. The temperate macroalga Delisea pulchra bleaches in Southeast Australia during warm summer periods, a phenomenon that previous studies have indicated is caused by a temperature-induced bacterial disease. To better understand the ecology of this disease, the bacterial communities associated with three types of samples were investigated using 16S rRNA gene and environmental shotgun sequencing: 1) unbleached (healthy) D. pulchra, 2) bleached parts of D. pulchra, and 3) apparently healthy tissue adjacent to bleached regions. Phylogenetic differences between healthy and bleached communities mainly reflected relative changes in the taxa Colwelliaceae, Rhodobacteraceae, Thalassomonas and Parvularcula. Comparative metagenomics showed clear differences between the communities of healthy and diseased D. pulchra, as reflected by changes in functions associated with transcriptional regulation, cation/multidrug efflux and non-ribosomal peptide synthesis. Importantly, the phylogenetic and functional composition of apparently healthy tissue adjacent to bleached sections of the thalli indicated that changes in the microbial communities already occur in the absence of visible tissue damage. This shift in unbleached sections might be due to a decrease in furanones, algal metabolites that are antagonists of bacterial quorum sensing. This study reveals the complex shift in community composition associated with bleaching of Delisea pulchra and, together with previous studies, is consistent with a model in which elevated temperatures reduce levels of chemical defenses in stressed thalli, leading to colonization or proliferation by opportunistic pathogens or scavengers.