149 research outputs found

    Explaining individual response using aggregated data

    Empirical analysis of individual response behavior is sometimes limited due to the lack of explanatory variables at the individual level. In this paper we put forward a new approach to estimate the effects of covariates on individual response, where the covariates are unknown at the individual level but observed at some aggregated level. This situation may, for example, occur if the response variable is available at the household level but covariates only at the zip-code level. We describe the missing individual covariates by a latent variable model which matches the sample information at the aggregate level. Parameter estimates can be obtained using maximum likelihood or a Bayesian approach. We illustrate the approach by estimating the effects of household characteristics on donating behavior to a Dutch charity. Donating behavior is observed at the household level, while the covariates are only observed at the zip-code level.

    Modeling regional house prices

    We develop a parsimonious panel model for quarterly regional house prices, for which both the cross-section and the time series dimension are large. The model allows for stochastic trends, cointegration, cross-equation correlations and, most importantly, latent-class clustering of regions. Class membership is fully data-driven and based on (i) average growth rates of house prices, (ii) the propagation of shocks to house prices across regions, also known as the ripple effect, and (iii) the relationship of house prices with economic growth and other variables. Applying the model to quarterly data for the Netherlands, we find convincing evidence for the existence of two distinct clusters of regions, with pronounced differences in house price dynamics.

    Essays on Finite Mixture Models

    Finite mixture distributions are a weighted average of a finite number of distributions. The latter are usually called the mixture components. The weights are usually described by a multinomial distribution and are sometimes called mixing proportions. The mixture components may be the same type of distributions with different parameter values but they may also be completely different distributions (Everitt and Hand, 1981; Titterington et al., 1985). Therefore, finite mixture distributions are very flexible for modeling data. They are frequently used as a building block within many modern econometric models. The specification of the mixture distribution depends on the modeling problem at hand. In this thesis, we introduce new applications of finite mixtures to deal with several different modeling issues. Each chapter of the thesis focusses on a specific modeling issue. The parameters of some of the resulting models can be estimated using standard techniques but for some of the chapters we need to develop new estimation and inference methods. To illustrate how the methods can be applied, we analyze at least one empirical data set for each approach. These data sets cover a wide range of research fields, such as macroeconomics, marketing, and political science. We show the usefulness of the methods and, in some cases, the improvement over previous methods in the literature.
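
As a rough illustration of the definition in the abstract above (a density built as a weighted average of component densities), the sketch below evaluates a mixture of normal components. It is not taken from the thesis: the function names `normal_pdf` and `mixture_pdf`, the choice of normal components, and the example weights and parameters are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, components):
    """Density of a finite mixture of normal components at x.

    weights    -- mixing proportions; must be non-negative and sum to 1
    components -- list of (mean, sd) pairs, one per mixture component
    """
    if len(weights) != len(components):
        raise ValueError("need exactly one weight per component")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must be non-negative and sum to 1")
    # Weighted average of the component densities.
    return sum(w * normal_pdf(x, m, s) for w, (m, s) in zip(weights, components))


# Hypothetical two-component example: 30% weight on N(0, 1), 70% on N(3, 0.5).
example_weights = [0.3, 0.7]
example_components = [(0.0, 1.0), (3.0, 0.5)]
density_at_one = mixture_pdf(1.0, example_weights, example_components)
```

Components of different distributional families would work the same way: each term in the sum would call its own density function instead of `normal_pdf`.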

    A Bayesian approach to two-mode clustering

    We develop a new Bayesian approach to estimate the parameters of a latent-class model for the joint clustering of both modes of two-mode data matrices. Posterior results are obtained using a Gibbs sampler with data augmentation. Our Bayesian approach has three advantages over existing methods. First, we are able to do statistical inference on the model parameters, which would not be possible using frequentist estimation procedures. In addition, the Bayesian approach allows us to provide statistical criteria for determining the optimal numbers of clusters. Finally, our Gibbs sampler has fewer problems with local optima in the likelihood function and empty classes than the EM algorithm used in a frequentist approach. We apply the Bayesian estimation method of the latent-class two-mode clustering model to two empirical data sets. The first data set is the Supreme Court voting data set of Doreian, Batagelj, and Ferligoj (2004). The second data set comprises the roll call votes of the United States House

    Labour market transitions and job satisfaction

    The paper investigates the relationship between job satisfaction and labour market transitions. A multinomial logit model is estimated on individual data, in which transitions are explained by individual characteristics, job characteristics, dissatisfaction with the job and discrepancies between the actual and the desired number of hours worked. Transitions can be changes in the hours worked, changes to a different job and/or employer, or combinations of these. Furthermore, people may lose their job or leave employment of their own free will. The model has been estimated for three categories of workers according to the number of hours worked. The results show that both dissatisfaction with the job and discrepancies with respect to the hours worked have a significant impact on transition probabilities. Contrary to what is sometimes believed, there is no structural increase in transition probabilities. We are still far away from a ‘transitional labour market’. The paper also shows that transitions significantly increase job satisfaction. However, despite the strong improvement in the labour market situation in the 1990s, the percentage of workers experiencing a discrepancy between the actual and the desired number of hours has not diminished.

    ChiSCor: A Corpus of Freely Told Fantasy Stories by Dutch Children for Computational Linguistics and Cognitive Science

    In this resource paper we release ChiSCor, a new corpus containing 619 fantasy stories, told freely by 442 Dutch children aged 4-12. ChiSCor was compiled for studying how children render character perspectives, and unravelling language and cognition in development, with computational tools. Unlike existing resources, ChiSCor's stories were produced in natural contexts, in line with recent calls for more ecologically valid datasets. ChiSCor hosts text, audio, and annotations for character complexity and linguistic complexity. Additional metadata (e.g. education of caregivers) is available for one third of the Dutch children. ChiSCor also includes a small set of 62 English stories. This paper details how ChiSCor was compiled and shows its potential for future work with three brief case studies: i) we show that the syntactic complexity of stories is strikingly stable across children's ages; ii) we extend work on Zipfian distributions in free speech and show that ChiSCor obeys Zipf's law closely, reflecting its social context; iii) we show that even though ChiSCor is relatively small, the corpus is rich enough to train informative lemma vectors that allow us to analyse children's language use. We end with a reflection on the value of narrative datasets in computational linguistics. Comment: 12 pages, 5 figures, forthcoming in Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

    Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding

    Current Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text. LLMs are appearing rapidly, and debates on LLM capacities have taken off, but reflection is lagging behind. Thus, in this position paper, we first zoom in on the debate and critically assess three points recurring in critiques of LLM capacities: i) that LLMs only parrot statistical patterns in the training data; ii) that LLMs master formal but not functional language competence; and iii) that language learning in LLMs cannot inform human language learning. Drawing on empirical and theoretical arguments, we show that these points need more nuance. Second, we outline a pragmatic perspective on the issue of 'real' understanding and intentionality in LLMs. Understanding and intentionality pertain to unobservable mental states we attribute to other humans because they have pragmatic value: they allow us to abstract away from complex underlying mechanics and predict behaviour effectively. We reflect on the circumstances under which it would make sense for humans to similarly attribute mental states to LLMs, thereby outlining a pragmatic philosophical context for LLMs as an increasingly prominent technology in society. Comment: 15 pages, 0 figures, forthcoming in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing