Item Response Modeling of Multivariate Count Data With Zero Inflation, Maximum Inflation, and Heaping
Questionnaires that include items eliciting count responses are becoming increasingly common in psychology. This study proposes methodological techniques to overcome some of the challenges associated with analyzing multivariate item response data that exhibit zero inflation, maximum inflation, and heaping at preferred digits. The modeling framework combines approaches from three literatures: item response theory (IRT) models for multivariate count data, latent variable models for heaping and extreme responding, and mixture IRT models. Data from the Behavioral Risk Factor Surveillance System are used as a motivating example. Practical implications are discussed, and recommendations are provided for researchers who may wish to use count items on questionnaires.
Learning Word Embeddings from the Portuguese Twitter Stream: A Study of some Practical Aspects
This paper describes a preliminary study for producing and distributing a
large-scale database of embeddings from the Portuguese Twitter stream. We start
by experimenting with a relatively small sample and focusing on three
challenges: volume of training data, vocabulary size and intrinsic evaluation
metrics. Using a single GPU, we were able to scale up from a vocabulary of 2048
words trained on 500K examples to a vocabulary of 32768 words trained on 10M
examples, while keeping a stable validation loss and an approximately linear
trend in training time per epoch. We also observed that using less than 50% of
the available training examples for each vocabulary size might result in
overfitting. Results on intrinsic evaluation show promising performance for a
vocabulary size of 32768 words. Nevertheless, intrinsic evaluation metrics
suffer from over-sensitivity to their corresponding cosine similarity
thresholds, indicating that a wider range of metrics needs to be developed to
track progress.
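The abstract does not define its intrinsic evaluation metrics, so the following is only a hypothetical illustration of why a metric built on a cosine similarity cutoff can be sensitive to that cutoff. It labels a word pair as "similar" when the cosine similarity of the two embeddings reaches a threshold and reports the fraction of pairs that agree with a gold label. The function names, toy vectors, and labels are invented for this sketch and are not taken from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def threshold_accuracy(pairs, labels, threshold):
    """Hypothetical metric: share of pairs where (cosine >= threshold) matches the gold label.

    pairs  -- list of (vector_a, vector_b) tuples
    labels -- list of bools, True meaning the pair is judged similar
    """
    correct = 0
    for (u, v), is_similar in zip(pairs, labels):
        predicted_similar = cosine(u, v) >= threshold
        correct += int(predicted_similar == is_similar)
    return correct / len(pairs)

# Toy 2-d "embeddings" (invented for illustration only).
pairs = [([1.0, 0.0], [0.9, 0.1]),   # cosine ~ 0.99
         ([1.0, 0.0], [0.6, 0.8]),   # cosine ~ 0.6
         ([1.0, 0.0], [0.0, 1.0])]   # cosine = 0.0
labels = [True, True, False]

for t in (0.0, 0.5, 0.7):
    print(t, round(threshold_accuracy(pairs, labels, t), 3))
```

On these three toy pairs the score is 0.667 at thresholds 0.0 and 0.7 but 1.0 at 0.5, so the same embeddings look better or worse depending only on where the cutoff is placed.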
Reasoning about Independence in Probabilistic Models of Relational Data
We extend the theory of d-separation to cases in which data instances are not
independent and identically distributed. We show that applying the rules of
d-separation directly to the structure of probabilistic models of relational
data leads to incorrect conclusions about conditional independence. We
introduce relational d-separation, a theory for deriving conditional
independence facts from relational models. We provide a new representation,
the abstract ground graph, that enables a sound, complete, and computationally
efficient method for answering d-separation queries about relational models,
and we present empirical results that demonstrate its effectiveness.
Comment: 61 pages, substantial revisions to formalisms, theory, and related
work
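For readers unfamiliar with the classical criterion that the abstract says breaks down on relational data, here is a minimal sketch of ordinary d-separation for a single directed acyclic graph. It uses the standard moralized ancestral graph test (restrict to ancestors of the query and conditioning sets, connect co-parents, drop edge directions, delete the conditioning set, then check connectivity), which is equivalent to path-based d-separation. It is not the relational d-separation or abstract ground graph described above; the graph encoding (a dict from each node to its parents) and the function names are assumptions for illustration.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(parents, xs, ys, zs):
    """Return True if node sets xs and ys are d-separated given zs.

    `parents` maps each node of a DAG to an iterable of its parents.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)

    # Moralize the ancestral subgraph: undirected parent-child edges,
    # plus an edge between every pair of co-parents.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):
            adj[a].add(b)
            adj[b].add(a)

    # Delete the conditioning set and look for any path from xs to ys.
    seen = xs - zs
    stack = list(seen)
    while stack:
        node = stack.pop()
        if node in ys:
            return False
        for nb in adj[node]:
            if nb not in zs and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return True

chain = {"B": ["A"], "C": ["B"]}      # A -> B -> C
collider = {"C": ["A", "B"]}          # A -> C <- B
print(d_separated(chain, {"A"}, {"C"}, set()))     # False
print(d_separated(chain, {"A"}, {"C"}, {"B"}))     # True
print(d_separated(collider, {"A"}, {"B"}, set()))  # True
print(d_separated(collider, {"A"}, {"B"}, {"C"}))  # False
```

The chain and collider cases show the familiar asymmetry: conditioning on the middle node blocks a chain but opens a collider.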
Technologies and Localized Technical Change
Heterogeneous Technologies, Transformation Function, Localized Technical Change, Production Economics, Research Methods/Statistical Methods, Q12, O33, C35
Distinguishing Different Industry Technologies and Localized Technical Change
This contribution is based on the notion that different technologies are present in an industry. These different technologies result in differential “drivers” of economic performance, depending on the kind of technology used by the individual firm. In a first step, the different technologies are empirically distinguished. Subsequently, the associated production patterns are approximated and their change over time is estimated. A latent class modelling approach is used to distinguish different technologies for a representative sample of E.U. dairy producers, an industry that has exhibited significant structural change and differences in production systems over the past decades. The production technology is modelled and evaluated using the flexible functional form of a transformation function and measures of first- and second-order elasticities. We find that overall (average) measures do not reflect individual firms’ production patterns well if the technology of an industry is heterogeneous. If more than one type of production frontier is embodied in the data, it should be recognized that different firms may exhibit very different output or input intensities and changes associated with different production systems. In particular, in the context of localized technical change, firms with different technologies can be expected to show different technical change patterns, both in terms of overall magnitudes and of the associated relative output and input mix changes. Assuming a homogeneous technology would result in inefficient policy recommendations, leading to suboptimal industry outcomes.
Heterogeneous Technologies, Transformation Function, Localized Technical Change, Production Economics, Q12, O33, C35
Latent class analysis variable selection
We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable's usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets, we found that the method selected the correct clustering variables and also led to improvements in classification performance and in the accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.
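The search strategy in the abstract can be sketched generically. In a headlong search, each step accepts the first candidate move that passes a threshold rather than scanning all candidates for the best one. The sketch below assumes a caller-supplied `evidence(var, selected)` function (hypothetical; in practice it would compare the two models described above, for example through a BIC difference) that returns a positive value when `var` adds clustering information beyond `selected`. The thresholds, the `max_iter` safeguard, and all names are illustrative assumptions, not the authors' implementation.

```python
def headlong_select(candidates, evidence, initial=(), add_threshold=0.0,
                    remove_threshold=0.0, max_iter=100):
    """Headlong stepwise selection of clustering variables (illustrative sketch).

    evidence(var, selected) -> float is a hypothetical, user-supplied score:
    larger values mean `var` carries more clustering information beyond `selected`.
    """
    selected = list(initial)
    for _ in range(max_iter):
        changed = False
        # Inclusion step: add the FIRST unselected variable whose evidence
        # exceeds add_threshold (headlong: no search for the best one).
        for var in candidates:
            if var not in selected and evidence(var, selected) > add_threshold:
                selected.append(var)
                changed = True
                break
        # Removal step: drop the FIRST selected variable whose evidence,
        # given the other selected variables, falls below remove_threshold.
        for var in list(selected):
            others = [v for v in selected if v != var]
            if evidence(var, others) < remove_threshold:
                selected.remove(var)
                changed = True
                break
        if not changed:  # neither step changed the set, so stop
            break
    return selected

# Toy evidence function (invented): only "x1" and "x2" carry cluster information.
informative = {"x1", "x2"}

def toy_evidence(var, selected):
    return 10.0 if var in informative else -10.0

print(headlong_select(["n1", "x1", "n2", "x2"], toy_evidence, initial=["n1"]))
# ['x1', 'x2']
```

Because the toy score ignores `selected`, this only exercises the control flow (the uninformative starting variable `n1` is dropped and `x1`, `x2` are added); a real score would depend on the variables already chosen.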
Revealing quantum chaos with machine learning
Understanding the properties of quantum matter is an outstanding challenge in
science. In this paper, we demonstrate how machine-learning methods can be
successfully applied to classify various regimes in single-particle and
many-body systems. We implement neural network algorithms that classify
regular and chaotic behavior in quantum billiard models with remarkably high
accuracy. We use a variational autoencoder for autosupervised classification of
regular/chaotic wave functions, and we demonstrate that variational
autoencoders can be used as a tool for detecting anomalous quantum states, such
as quantum scars. Extending this method, we show that machine learning
techniques allow us to pin down the transition from integrability to many-body
quantum chaos in Heisenberg XXZ spin chains. In both cases, we confirm the
existence of universal W shapes that characterize the transition. Our results
pave the way for exploring the power of machine learning tools for revealing
exotic phenomena in quantum many-body systems.
Comment: 12 pages, 12 figures
Benchmarking and Firm Heterogeneity in Electricity Distribution: A Latent Class Analysis of Germany
In January 2009, Germany introduced incentive regulation for the electricity distribution sector based on results obtained from econometric and nonparametric benchmarking analysis. One main problem for the regulator in assigning relative efficiency scores is unobserved firm-specific factors, such as network and technological differences. Comparing the efficiency of different firms usually assumes that they operate under the same production technology, so unobserved factors might be wrongly interpreted as inefficiency. To avoid this type of misspecification, regulatory practice carries out estimation in two stages: in the first stage, observations are classified into two categories according to the size of the network operators; then separate analyses are conducted for each subgroup. This paper shows how to disentangle heterogeneity from inefficiency in one step, using a latent class model for stochastic frontiers. Because the classification is not based on a priori sample separation criteria, it delivers more robust, statistically significant, and testable results. Against this background, we analyze the level of technical efficiency of a sample of 200 regional and local German electricity distribution companies using a balanced panel data set (2001–2005). By testing whether larger distributors operate under a different technology than smaller ones, we assess whether a single-step latent class model provides new insights into the use of benchmarking approaches within incentive regulation schemes.
Stochastic frontiers, latent class model, electricity distribution, incentive regulation