1,948 research outputs found

    Efficient Bayesian Structural Equation Modeling in Stan

    Structural equation models comprise a large class of popular statistical models, including factor analysis models, certain mixed models, and extensions thereof. Model estimation is complicated by the fact that we typically have multiple interdependent response variables and multiple latent variables (which may also be called random effects or hidden variables), often leading to slow and inefficient MCMC sampling. In this paper, we describe and illustrate a general, efficient approach to Bayesian SEM estimation in Stan, contrasting it with previous implementations in the R package blavaan (Merkle & Rosseel, 2018). After describing the approaches in detail, we conduct a practical comparison under multiple scenarios. The comparisons show that the new approach is clearly better. We also discuss ways that the approach may be extended to other models that are of interest to psychometricians. Comment: 21 pages, 5 figures

    A Bayesian Approach to the Detection Problem in Gravitational Wave Astronomy

    The analysis of data from gravitational wave detectors can be divided into three phases: search, characterization, and evaluation. The evaluation of the detection - determining whether a candidate event is astrophysical in origin or some artifact created by instrument noise - is a crucial step in the analysis. The ongoing analyses of data from ground-based detectors employ a frequentist approach to the detection problem. A detection statistic is chosen, for which background levels and detection efficiencies are estimated from Monte Carlo studies. This approach frames the detection problem in terms of an infinite collection of trials, with the actual measurement corresponding to some realization of this hypothetical set. Here we explore an alternative, Bayesian approach to the detection problem that considers prior information and the actual data in hand. Our particular focus is on the computational techniques used to implement the Bayesian analysis. We find that the Parallel Tempered Markov Chain Monte Carlo (PTMCMC) algorithm is able to address all three phases of the analysis in a coherent framework. The signals are found by locating the posterior modes, the model parameters are characterized by mapping out the joint posterior distribution, and finally, the model evidence is computed by thermodynamic integration. As a demonstration, we consider the detection problem of selecting between models describing the data as instrument noise, or instrument noise plus the signal from a single compact galactic binary. The evidence ratios, or Bayes factors, computed by the PTMCMC algorithm are found to be in close agreement with those computed using a Reversible Jump Markov Chain Monte Carlo algorithm. Comment: 19 pages, 12 figures, revised to address referee's comments
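
The thermodynamic integration step mentioned in this abstract rests on the identity ln Z = ∫₀¹ E_β[ln L] dβ, where E_β is the expectation under the tempered posterior (prior × L^β). The sketch below illustrates that identity only; it is not the paper's pipeline. It assumes a hypothetical toy model (prior θ ~ N(0, 1), one observation y ~ N(θ, 1)) whose tempered posterior is Gaussian, so samples are drawn directly rather than from PTMCMC chains. All function names are illustrative.

```python
import math
import random


def mean_log_likelihood(y, beta, n_samples, rng):
    """Monte Carlo estimate of E_beta[ln L] under the tempered posterior.

    Toy model (an assumption for illustration): prior theta ~ N(0, 1) and
    likelihood y ~ N(theta, 1). The tempered posterior, prior * L**beta, is
    then N(beta*y/(1+beta), 1/(1+beta)), so it can be sampled directly
    instead of running a tempered MCMC chain.
    """
    mean = beta * y / (1.0 + beta)
    std = math.sqrt(1.0 / (1.0 + beta))
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(mean, std)
        total += -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - theta) ** 2
    return total / n_samples


def log_evidence_ti(y, n_temps=21, n_samples=5000, seed=0):
    """Thermodynamic integration: ln Z = integral_0^1 E_beta[ln L] d(beta)."""
    rng = random.Random(seed)
    betas = [i / (n_temps - 1) for i in range(n_temps)]
    means = [mean_log_likelihood(y, b, n_samples, rng) for b in betas]
    # Trapezoid rule over the inverse-temperature ladder.
    return sum(0.5 * (means[i] + means[i + 1]) * (betas[i + 1] - betas[i])
               for i in range(n_temps - 1))


def log_evidence_exact(y):
    """Closed form for the toy model: marginally y ~ N(0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - y ** 2 / 4.0


if __name__ == "__main__":
    y = 1.0
    print("thermodynamic integration:", log_evidence_ti(y))
    print("closed form:              ", log_evidence_exact(y))
```

A log Bayes factor between two models is then the difference of their two log-evidence estimates. In a real analysis the per-temperature averages would come from the chains of the tempering ladder, and the spacing of the ladder would need more care than the uniform grid used here.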

    Interpreting Potential Groundwater Policies through Modeling of Market and Non-Market Benefits and Costs

    Current policies leveraging financial incentives and improved irrigation efficiency to mitigate groundwater scarcity have not been shown to curtail trends of resource depletion. Groundwater benefits cannot be appropriately valued solely on market forces, and so deeper policy consideration is warranted under a framework that considers the importance of groundwater across all its values to society. Understanding time preferences for groundwater management and preferences for alternative policies is vital to inform efficient policies. Furthermore, climate change remains politically controversial yet has important consequences for critical groundwater resources and their sustainable long-term management. Proliferating policy narratives concerning climate change could influence the way people think about managing groundwater resources. I present three empirical studies that address these issues. Chapter I examines irrigation efficiency technologies for improved outcomes using a market-based, spatially-dynamic optimization model to test the limitations of improvements alone and in tandem with typical environmental policy mechanisms. Improved efficiency induces some producers to plant more water-intensive crops such as rice, and best-case improvements fail to counter trends of groundwater depletion over a 30-year horizon. Chapter II elicits public willingness to pay (WTP) for long-term groundwater management and for market and non-market groundwater services. I employ time-discounted choice models to endogenously estimate time preferences under different forms of discounting. This is the first non-market valuation to estimate heterogeneity in time preferences using flexible mixing distributions. I find significant WTP for water quality provision, buffer against long-term drought, jobs from agriculture, and provision of wildlife habitat that promotes fishing and duck hunting, while most people display evidence of hyperbolic or quasi-hyperbolic discounting. Individual parameter distributions for WTP and time preferences are not normally distributed. Chapter III continues the Narrative Policy Framework (NPF) tradition to test for systematic influences of narrative frames about climate change on elicited groundwater and policy preferences. In a Choice Experiment (CE), some respondents were exposed to a structuralist, culturally-biased narrative frame about climate change and groundwater resources. Using theories about cultural risk perception and motivated reasoning for systematic evaluation, I find evidence for a cultural incongruency effect but no evidence for a congruency effect. This suggests that people could respond more strongly to incongruence than to congruence in the case of groundwater policy preferences.
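
For readers unfamiliar with the discounting forms named in this abstract, the sketch below writes out the standard textbook discount functions: exponential δ^t, hyperbolic 1/(1 + kt), and quasi-hyperbolic (1 at t = 0, then βδ^t). The parameter values are illustrative assumptions, not estimates from the study, and the code does not reproduce the study's choice models.

```python
def exponential(t, delta=0.95):
    """Exponential discounting: constant per-period discount factor delta."""
    return delta ** t


def hyperbolic(t, k=0.1):
    """Hyperbolic discounting: impatience declines as the delay t grows."""
    return 1.0 / (1.0 + k * t)


def quasi_hyperbolic(t, beta=0.7, delta=0.95):
    """Quasi-hyperbolic (beta-delta): a one-off penalty beta on any delay."""
    return 1.0 if t == 0 else beta * delta ** t


if __name__ == "__main__":
    # Present value of one unit of benefit received t years from now.
    for t in (0, 1, 10, 30):
        print(t, exponential(t), hyperbolic(t), quasi_hyperbolic(t))
```

Under exponential discounting the ratio between consecutive periods is always δ. Under the hyperbolic forms that ratio rises with the delay, which is the pattern the abstract reports for most respondents.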

    Identifying Graphs from Noisy Observational Data

    There is a growing amount of data describing networks -- examples include social networks, communication networks, and biological networks. As the amount of available data increases, so does our interest in analyzing the properties and characteristics of these networks. However, in most cases the data is noisy, incomplete, and the result of passively acquired observational data; naively analyzing these networks without taking these errors into account can result in inaccurate and misleading conclusions. In my dissertation, I study the tasks of entity resolution, link prediction, and collective classification to address these deficiencies. I describe these tasks in detail and discuss my own work on each of these tasks. For entity resolution, I develop a method for resolving the identities of name mentions in email communications. For link prediction, I develop a method for inferring subordinate-manager relationships between individuals in an email communication network. For collective classification, I propose an adaptive active surveying method to address node labeling in a query-driven setting on network data. In many real-world settings, however, these deficiencies are not found in isolation and all need to be addressed to infer the desired complete and accurate network. Furthermore, because of the dependencies typically found in these tasks, the tasks are inherently inter-related and must be performed jointly. I define the general problem of graph identification, which simultaneously performs these tasks, removing the noise and missing values in the observed input network and inferring the complete and accurate output network. I present a novel approach to graph identification using a collection of Coupled Collective Classifiers, C3, which, in addition to capturing the variety of features typically used for each task, can capture the intra- and inter-dependencies required to correctly infer nodes, edges, and labels in the output network. I discuss variants of C3 using different learning and inference paradigms and show the superior performance of C3, in terms of both prediction quality and runtime performance, over various previous approaches. I then conclude by presenting the Graph Alignment, Identification, and Analysis (GAIA) open-source software library, which not only provides an implementation of C3 but also algorithms for various tasks in network data such as entity resolution, link prediction, collective classification, clustering, active learning, data generation, and analysis.

    Making sense of the Internet of Things: A critical review of Internet of Things definitions between 2005 and 2019

    Purpose: This paper aims to study the evolution of definitions of internet of things (IoT) through time, critically assess the knowledge these definitions contain and facilitate sensemaking by providing those unfamiliar with IoT with a theoretical definition and an extended framework. Design/methodology/approach: 164 articles published between 2005 and 2019 are collected using snowball sampling. Further, 100 unique definitions are identified in the sample. Definitions are examined using content analysis and applying a theoretical framework of five knowledge dimensions. Findings: In declarative/relational dimensions of knowledge, increasing levels of agreement are observed in the sample. Sources of tautological reasoning are identified. In conditional and causal dimensions, definitions of IoT remain underdeveloped. In the former, potential limitations of IoT related to resource scarcity, privacy and security are overlooked. In the latter, three main loci of agreement are identified. Research limitations/implications: This study does not cover all published definitions of IoT. Some narratives may be omitted by our selection criteria and process. Practical implications: This study supports sensemaking of IoT. Main loci of agreement in definitions of IoT are identified. Avenues for further clarification and consensus are explored. A new framework that can facilitate further investigation and agreement is introduced. Originality/value: This is, to the authors’ knowledge, the first study that examines the historical evolution of definitions of IoT vis-à-vis its technological features. This study introduces an updated framework to critically assess and compare definitions, identify ambiguities and resolve conflicts among different interpretations. The framework can be used to compare past and future definitions and help actors unfamiliar with IoT to make sense of it in a way that reduces adoption costs. It can also support researchers in studying early discussions of IoT.

    A Comprehensive Review of Community Detection in Graphs

    The study of complex networks has significantly advanced our understanding of community structures, which serve as a crucial feature of real-world graphs. Detecting communities in graphs is a challenging problem with applications in sociology, biology, and computer science. Despite the efforts of an interdisciplinary community of scientists, a satisfactory solution to this problem has not yet been achieved. This review article delves into the topic of community detection in graphs, which plays a crucial role in understanding the organization and functioning of complex systems. We begin by introducing the concept of community structure, which refers to the arrangement of vertices into clusters, with strong internal connections and weaker connections between clusters. Then, we provide a thorough exposition of various community detection methods, including a new method designed by us. Additionally, we explore real-world applications of community detection in diverse networks. In conclusion, this comprehensive review provides a deep understanding of community detection in graphs. It serves as a valuable resource for researchers and practitioners in multiple disciplines, offering insights into the challenges, methodologies, and applications of community detection in complex networks.

    Streaming and Sketch Algorithms for Large Data NLP

    The availability of large and rich quantities of text data is due to the emergence of the World Wide Web, social media, and mobile devices. Such vast data sets have led to leaps in the performance of many statistically-based problems. Given the large volume of text data available, it is computationally prohibitive to train many complex Natural Language Processing (NLP) models on large data. This motivates the hypothesis that simple models trained on big data can outperform more complex models with small data. My dissertation provides a solution to effectively and efficiently exploit large data on many NLP applications. Datasets are growing at an exponential rate, much faster than the increase in memory. To provide a memory-efficient solution for handling large datasets, this dissertation shows limitations of existing streaming and sketch algorithms when applied to canonical NLP problems and proposes several new variants to overcome those shortcomings. Streaming and sketch algorithms process the large data sets in one pass and represent a large data set with a compact summary, much smaller than the full size of the input. These algorithms can easily be implemented in a distributed setting and provide a solution that is both memory- and time-efficient. However, the memory and time savings come at the expense of approximate solutions. In this dissertation, I demonstrate that approximate solutions achieved on large data are comparable to exact solutions on large data and outperform exact solutions on smaller data. I focus on many NLP problems that boil down to tracking many statistics, like storing approximate counts, computing approximate association scores like pointwise mutual information (PMI), finding frequent items (like n-grams), building streaming language models, and measuring distributional similarity. First, I introduce the concept of approximate streaming large-scale language models in NLP. Second, I present a novel variant of the Count-Min sketch that maintains approximate counts of all items. Third, I conduct a systematic study and compare many sketch algorithms that approximate counts of items, with a focus on large-scale NLP tasks. Last, I develop fast large-scale approximate graph (FLAG), a system that quickly constructs a large-scale approximate nearest-neighbor graph from a large corpus.
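
To make the one-pass, compact-summary idea in this abstract concrete, here is a minimal sketch of the standard Count-Min sketch, with approximate PMI computed from its counts. It is the textbook structure, not the novel variant the dissertation proposes. The class name, the `approx_pmi` helper, the table sizes, and the salted BLAKE2b hashing are all assumptions made for illustration.

```python
import hashlib
import math


class CountMinSketch:
    """Standard Count-Min sketch: `depth` rows of `width` counters each."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0  # sum of all counts added so far

    def _index(self, item, row):
        # One hash function per row, obtained by salting with the row number.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                                 salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the row minimum never
        # underestimates the true count and is the tightest available bound.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


def approx_pmi(unigrams, bigrams, x, y):
    """PMI(x, y) = log(P(x, y) / (P(x) * P(y))) from sketch estimates."""
    cxy = bigrams.estimate(x + " " + y)
    cx, cy = unigrams.estimate(x), unigrams.estimate(y)
    if cxy == 0 or cx == 0 or cy == 0:
        return float("-inf")
    p_xy = cxy / bigrams.total
    return math.log(p_xy / ((cx / unigrams.total) * (cy / unigrams.total)))


if __name__ == "__main__":
    tokens = "new york is big and new york is busy".split()
    unigrams, bigrams = CountMinSketch(), CountMinSketch()
    # Single pass over the stream: memory stays fixed at width * depth counters.
    for i, tok in enumerate(tokens):
        unigrams.add(tok)
        if i + 1 < len(tokens):
            bigrams.add(tok + " " + tokens[i + 1])
    print(unigrams.estimate("new"), approx_pmi(unigrams, bigrams, "new", "york"))
```

Memory use depends only on `width * depth`, not on vocabulary size. The price is that estimates can overshoot when distinct items collide in every row.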