
    Item Response Modeling of Multivariate Count Data With Zero Inflation, Maximum Inflation, and Heaping

    Questionnaires that include items eliciting count responses are becoming increasingly common in psychology. This study proposes methodological techniques to overcome some of the challenges associated with analyzing multivariate item response data that exhibit zero inflation, maximum inflation, and heaping at preferred digits. The modeling framework combines approaches from three literatures: item response theory (IRT) models for multivariate count data, latent variable models for heaping and extreme responding, and mixture IRT models. Data from the Behavioral Risk Factor Surveillance System are used as a motivating example. Practical implications are discussed, and recommendations are provided for researchers who may wish to use count items on questionnaires.
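    To make "zero inflation" and "maximum inflation" concrete for a single count item, here is a minimal sketch, assuming a simple mixture that is not the authors' IRT model: a point mass at 0, a point mass at a hypothetical maximum response max_count, and a Poisson distribution truncated to {0, ..., max_count}. In an IRT version the Poisson rate would depend on a respondent's latent trait; here the rate lam and the weights pi_zero and pi_max are hypothetical fixed parameters, and heaping is not modeled.

```python
import math


def truncated_poisson_pmf(y: int, lam: float, max_count: int) -> float:
    """Poisson(lam) probability of y, renormalized to the support {0, ..., max_count}."""
    if y < 0 or y > max_count:
        return 0.0
    norm = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(max_count + 1))
    return math.exp(-lam) * lam**y / math.factorial(y) / norm


def zmi_poisson_pmf(y: int, lam: float, pi_zero: float, pi_max: float, max_count: int) -> float:
    """Zero- and maximum-inflated truncated Poisson (hypothetical parameterization).

    With probability pi_zero the response is a structural 0, with probability
    pi_max it is the maximum max_count; otherwise it comes from the truncated
    Poisson count process.
    """
    if not (0.0 <= pi_zero and 0.0 <= pi_max and pi_zero + pi_max <= 1.0):
        raise ValueError("inflation weights must be non-negative and sum to at most 1")
    p = (1.0 - pi_zero - pi_max) * truncated_poisson_pmf(y, lam, max_count)
    if y == 0:
        p += pi_zero  # extra mass at zero
    if y == max_count:
        p += pi_max  # extra mass at the scale maximum
    return p


if __name__ == "__main__":
    # Hypothetical item with support 0..30.
    probs = [zmi_poisson_pmf(y, lam=4.0, pi_zero=0.3, pi_max=0.1, max_count=30) for y in range(31)]
    print(round(sum(probs), 6))  # should print 1.0: the pmf sums to 1 over the support
    print(round(probs[0], 3), round(probs[30], 3))
```

    Compared with a plain Poisson with the same rate, this mixture puts visibly more probability on 0 and on the top of the scale, which is the pattern the abstract describes.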

    Learning Word Embeddings from the Portuguese Twitter Stream: A Study of some Practical Aspects

    This paper describes a preliminary study for producing and distributing a large-scale database of embeddings from the Portuguese Twitter stream. We start by experimenting with a relatively small sample and focusing on three challenges: volume of training data, vocabulary size, and intrinsic evaluation metrics. Using a single GPU, we were able to scale the vocabulary from 2048 embedded words trained on 500K examples to 32768 words trained on 10M examples, while keeping the validation loss stable and training time per epoch growing approximately linearly. We also observed that using less than 50% of the available training examples for each vocabulary size might result in overfitting. Results on intrinsic evaluation show promising performance for a vocabulary size of 32768 words. Nevertheless, intrinsic evaluation metrics are over-sensitive to their corresponding cosine similarity thresholds, indicating that a wider range of metrics needs to be developed to track progress.
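    The threshold sensitivity mentioned above can be illustrated with a small sketch. It assumes a hypothetical pair-classification metric (a word pair is predicted "related" when its cosine similarity reaches a threshold), which is not necessarily the metric used in the paper; build_vocab shows the usual top-K frequency cut for vocabulary sizes such as 2048 or 32768. The toy vectors in the usage note are made up.

```python
import math
from collections import Counter


def build_vocab(tokens: list[str], size: int) -> list[str]:
    """Keep the `size` most frequent tokens (e.g. 2048 or 32768)."""
    return [word for word, _ in Counter(tokens).most_common(size)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def threshold_accuracy(
    emb: dict[str, list[float]],
    pairs: list[tuple[str, str, bool]],
    threshold: float,
) -> float:
    """Hypothetical intrinsic metric: fraction of (w1, w2, is_related) pairs
    classified correctly when 'related' is predicted iff cosine >= threshold."""
    correct = 0
    for w1, w2, related in pairs:
        predicted = cosine(emb[w1], emb[w2]) >= threshold
        if predicted == related:
            correct += 1
    return correct / len(pairs)
```

    With toy vectors emb = {"rei": [1.0, 0.1], "rainha": [0.9, 0.3], "futebol": [0.1, 1.0]} and pairs [("rei", "rainha", True), ("rei", "futebol", False)], the score is 0.5 at threshold 0.1, 1.0 at 0.5, and 0.5 again at 0.99: the same embeddings look good or poor depending only on the threshold.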

    Reasoning about Independence in Probabilistic Models of Relational Data

    We extend the theory of d-separation to cases in which data instances are not independent and identically distributed. We show that applying the rules of d-separation directly to the structure of probabilistic models of relational data leads to inaccurate inferences of conditional independence. We introduce relational d-separation, a theory for deriving conditional independence facts from relational models. We provide a new representation, the abstract ground graph, that enables a sound, complete, and computationally efficient method for answering d-separation queries about relational models, and we present empirical results that demonstrate its effectiveness.
    Comment: 61 pages, substantial revisions to formalisms, theory, and related work.
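    For contrast with the relational case, the sketch below implements classical (propositional) d-separation on an ordinary DAG using the moralized ancestral graph criterion. It is not the paper's relational d-separation or its abstract ground graph; it is the standard procedure that, according to the abstract, gives inaccurate answers when applied directly to the structure of relational models. The graph is given as a hypothetical node-to-parents dictionary.

```python
from collections import deque


def ancestors(parents: dict[str, set[str]], nodes: set[str]) -> set[str]:
    """Return `nodes` together with all of their ancestors."""
    result = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for p in parents.get(node, set()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents: dict[str, set[str]], xs: set[str], ys: set[str], zs: set[str]) -> bool:
    """True if xs and ys are d-separated given zs in the DAG `parents`
    (node -> set of parent nodes), via the moralized ancestral graph."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Moralize the ancestral subgraph: undirected parent-child edges plus an
    # edge between every pair of parents that share a child.
    adj: dict[str, set[str]] = {n: set() for n in keep}
    for child in keep:
        ps = [p for p in parents.get(child, set()) if p in keep]
        for p in ps:
            adj[p].add(child)
            adj[child].add(p)
        for i, a in enumerate(ps):
            for b in ps[i + 1:]:
                adj[a].add(b)
                adj[b].add(a)
    # Delete the conditioning set, then check whether any x still reaches any y.
    frontier = deque(x for x in xs if x not in zs)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in ys:
            return False
        for nbr in adj[node]:
            if nbr not in zs and nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return True
```

    For a collider A -> C <- B ({"C": {"A", "B"}}), A and B are d-separated given the empty set but not given {"C"}; for a chain A -> B -> C, conditioning on B separates A from C.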

    Technologies and Localized Technical Change

    Keywords: Heterogeneous Technologies, Transformation Function, Localized Technical Change, Production Economics, Research Methods/Statistical Methods, Q12, O33, C35

    Distinguishing Different Industry Technologies and Localized Technical Change

    This contribution is based on the notion that different technologies are present in an industry. These different technologies result in differential “drivers” of economic performance depending on the kind of technology used by the individual firm. In a first step, different technologies are empirically distinguished. Subsequently, the associated production patterns are approximated and their change over time is estimated. A latent class modelling approach is used to distinguish different technologies for a representative sample of E.U. dairy producers, an industry that has exhibited significant structural changes and differences in production systems over the past decades. The production technology is modelled and evaluated using the flexible functional form of a transformation function and measures of first- and second-order elasticities. We find that overall (average) measures do not reflect individual firms’ production patterns well if the technology of an industry is heterogeneous. If more than one type of production frontier is embodied in the data, it should be recognized that different firms may exhibit very different output or input intensities and changes associated with different production systems. In particular, in the context of localized technical change, firms with different technologies can be expected to show different technical change patterns, both in overall magnitudes and in the associated relative output and input mix changes. Assuming a homogeneous technology would result in inefficient policy recommendations leading to suboptimal industry outcomes.
    Keywords: Heterogeneous Technologies, Transformation Function, Localized Technical Change, Production Economics, Q12, O33, C35

    Latent class analysis variable selection

    We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable's usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets, we found that the method selected the correct clustering variables and also led to improvements in classification performance and in the accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.
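    The headlong search step can be sketched generically. In this sketch, under stated assumptions, the two-model comparison described above is abstracted into a hypothetical callback bic_difference(var, others), taken to return the evidence (for example a BIC difference) that var carries clustering information beyond the variables in others; fitting the latent class models themselves is outside the sketch, and the zero thresholds are placeholders rather than the paper's settings. "Headlong" here means accepting the first variable that clears a threshold instead of searching for the best one.

```python
from typing import Callable, Sequence


def headlong_search(
    candidates: Sequence[str],
    initial: Sequence[str],
    bic_difference: Callable[[str, list[str]], float],
    add_threshold: float = 0.0,     # placeholder inclusion threshold
    remove_threshold: float = 0.0,  # placeholder exclusion threshold
    max_steps: int = 100,
) -> list[str]:
    """Greedy headlong selection of clustering variables (hypothetical interface)."""
    selected = list(initial)
    for _ in range(max_steps):
        changed = False
        # Inclusion step: take the first candidate that clears the threshold.
        for var in candidates:
            if var not in selected and bic_difference(var, selected) > add_threshold:
                selected.append(var)
                changed = True
                break
        # Exclusion step: drop the first selected variable that no longer
        # earns its place given the remaining ones.
        for var in list(selected):
            others = [v for v in selected if v != var]
            if bic_difference(var, others) < remove_threshold:
                selected.remove(var)
                changed = True
                break
        if not changed:
            break  # no additions or removals: the search has converged
    return selected
```

    With a toy score that ignores the already selected variables, for example {"x1": 5.0, "x2": 3.0, "x3": -2.0}, the search returns ["x1", "x2"], and x3 is removed even if it is included in the starting set. The max_steps cap guards against add/remove cycles that a real score could produce.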

    Revealing quantum chaos with machine learning

    Understanding properties of quantum matter is an outstanding challenge in science. In this paper, we demonstrate how machine-learning methods can be successfully applied to the classification of various regimes in single-particle and many-body systems. We implement neural network algorithms that classify regular and chaotic behavior in quantum billiard models with remarkably high accuracy. We use the variational autoencoder for autosupervised classification of regular/chaotic wave functions, and we demonstrate that variational autoencoders could be used as a tool for detecting anomalous quantum states, such as quantum scars. Taking this method further, we show that machine learning techniques allow us to pin down the transition from integrability to many-body quantum chaos in Heisenberg XXZ spin chains. For both cases, we confirm the existence of universal W shapes that characterize the transition. Our results pave the way for exploring the power of machine learning tools for revealing exotic phenomena in quantum many-body systems.
    Comment: 12 pages, 12 figures.

    Benchmarking and Firm Heterogeneity in Electricity Distribution: A Latent Class Analysis of Germany

    In January 2009, Germany introduced incentive regulation for the electricity distribution sector based on results obtained from econometric and nonparametric benchmarking analysis. One main problem for the regulator in assigning relative efficiency scores is unobserved firm-specific factors, such as network and technological differences. Comparing the efficiency of different firms usually assumes that they operate under the same production technology, so unobserved factors might be misinterpreted as inefficiency. To avoid this type of misspecification, regulatory practice carries out estimation in two stages: in a first stage, observations are classified into two categories according to the size of the network operators; then separate analyses are conducted for each sub-group. This paper shows how to disentangle heterogeneity from inefficiency in one step, using a latent class model for stochastic frontiers. Because the classification is not based on a priori sample separation criteria, it delivers more robust, statistically significant, and testable results. Against this background, we analyze the level of technical efficiency of a sample of 200 regional and local German electricity distribution companies using a balanced panel data set (2001-2005). By testing the hypothesis that larger distributors operate under a different technology than smaller ones, we assess whether a single-step latent class model provides new insights into the use of benchmarking approaches within incentive regulation schemes.
    Keywords: Stochastic frontiers, latent class model, electricity distribution, incentive regulation
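    The one-step separation of heterogeneity from inefficiency rests on posterior class probabilities. The sketch below is a minimal illustration, assuming a normal/half-normal production frontier (the Aigner-Lovell-Schmidt composed error eps = v - u) and class-specific residuals that have already been computed from each class's frontier; the abstract does not state the paper's exact distributional assumptions or estimator, and the function names are hypothetical. For panel data, the class likelihood of a firm is the product of its per-year densities, and Bayes' rule then weights each class by its prior share.

```python
import math


def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def composed_error_density(eps: float, sigma_u: float, sigma_v: float) -> float:
    """Density of eps = v - u with v ~ N(0, sigma_v^2) and u ~ |N(0, sigma_u^2)|:
    f(eps) = (2 / sigma) * phi(eps / sigma) * Phi(-eps * lam / sigma)."""
    sigma = math.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    return 2.0 / sigma * normal_pdf(eps / sigma) * normal_cdf(-eps * lam / sigma)


def class_posteriors(
    residuals_by_class: list[list[float]],
    params_by_class: list[tuple[float, float]],
    priors: list[float],
) -> list[float]:
    """Posterior probability that one firm belongs to each latent class.

    residuals_by_class[j] holds the firm's residuals (one per year) under
    class j's frontier; params_by_class[j] is (sigma_u, sigma_v) for class j.
    """
    weighted = []
    for resid, (s_u, s_v), prior in zip(residuals_by_class, params_by_class, priors):
        like = 1.0
        for eps in resid:  # panel likelihood: product over years
            like *= composed_error_density(eps, s_u, s_v)
        weighted.append(prior * like)
    total = sum(weighted)
    return [w / total for w in weighted]
```

    A firm whose residuals sit near zero under one class's frontier but far below the other class's frontier receives most of its posterior weight on the first class, so its distance to the second frontier is attributed to a different technology rather than counted as inefficiency.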