124 research outputs found
Bootstrap methods for the empirical study of decision-making and information flows in social systems
Abstract: We characterize the statistical bootstrap for the estimation of information theoretic quantities from data, with particular reference to its use in the study of large-scale social phenomena. Our methods allow one to preserve, approximately, the underlying axiomatic relationships of information theory—in particular, consistency under arbitrary coarse-graining—that motivate use of these quantities in the first place, while providing reliability comparable to the state of the art for Bayesian estimators. We show how information-theoretic quantities allow for rigorous empirical study of the decision-making capacities of rational agents, and the time-asymmetric flows of information in distributed systems. We provide illustrative examples by reference to ongoing collaborative work on the semantic structure of the British Criminal Court system and the conflict dynamics of the contemporary Afghanistan insurgency
Statistical Analysis and Design of Crowdsourcing Applications
This thesis develops methods for the analysis and design of crowdsourced experiments and crowdsourced labeling tasks. Much of this document focuses on applications including running natural field experiments, estimating the number of objects in images and collecting labels for word sense disambiguation. Observed shortcomings of the crowdsourced experiments inspired the development of methodology for running more powerful experiments via matching on-the-fly. Using the label data to estimate response functions inspired work on non-parametric function estimation using Bayesian Additive Regression Trees (BART). This work then inspired extensions to BART such as incorporation of missing data as well as a user-friendly R package
Statistical Modelling
The book collects the proceedings of the 19th International Workshop on Statistical Modelling held in Florence on July 2004. Statistical modelling is an important cornerstone in many scientific disciplines, and the workshop has provided a rich environment for cross-fertilization of ideas from different disciplines. It consists in four invited lectures, 48 contributed papers and 47 posters. The contributions are arranged in sessions: Statistical Modelling; Statistical Modelling in Genomics; Semi-parametric Regression Models; Generalized Linear Mixed Models; Correlated Data Modelling; Missing Data, Measurement of Error and Survival Analysis; Spatial Data Modelling and Time Series and Econometrics
Recommended from our members
Accelerating Materials Discovery with Machine Learning
As we enter the data age, ever-increasing amounts of human knowledge are being recorded in machine-readable formats.
This has opened up new opportunities to leverage data to accelerate scientific discovery.
This thesis focuses on how we can use historical and computational data to aid the discovery and development of new materials.
We begin by looking at a traditional materials informatics task -- elucidating the structure-function relationships of high-temperature cuprate superconductors.
One of the most significant challenges for materials informatics is the limited availability of relevant data.
We propose a simple calibration-based approach to estimate the apical and in-plane copper-oxygen distances from more readily available lattice parameter data to address this challenge for cuprate superconductors.
Our investigation uncovers a large, unexplored region of materials space that may yield cuprates with higher critical temperatures.
We propose two experimental avenues that may enable this region to be accessed.
Computational materials exploration is bottle-necked by our ability to provide input structures to feed our workflows.
Whilst \textit{ab-intio} structure identification is possible, it is computationally burdensome and we lack design rules for deciding where to target searches in high-throughput setups.
To address this, there is a need to develop tools that suggest promising candidates, enabling automated deployment and increased efficiency.
Machine learning models are well suited to this task, however, current approaches typically use hand-engineered inputs.
This means that their performance is circumscribed by the intuitions reflected in the chosen inputs.
We propose a novel way to formulate the machine learning task as a set regression problem over the elements in a material.
We show that our approach leads to higher sample efficiency than other well-established composition-based approaches.
Having demonstrated the ability of machine learning to aid in the selection of promising compound compositions, we next explore how useful machine learning might be for identifying fabrication routes.
Using a recently released data-mined data set of solid-state synthesis reactions, we design a two-stage model to predict the products of inorganic reactions.
We critically explore the performance of this model, showing that whilst the predictions fall short of the accuracy required to be chemically discriminative, the model provides valuable insights into understanding inorganic reactions.
Through careful investigation of the model's failure modes, we explore the challenges that remain in the construction of forward inorganic reaction prediction models and suggest some pathways to tackle the identified issues.
One of the principal ways that material scientists understand and categorise materials is in terms of their symmetries.
Crystal structure prototypes are assigned based on the presence of symmetrically equivalent sites known as Wyckoff positions.
We show that a powerful coarse-grained representation of materials structures can be constructed from the Wyckoff positions by discarding information about their coordinates within crystal structures.
One of the strengths of this representation is that it maintains the ability of structure-based methods to distinguish polymorphs whilst also allowing combinatorial enumeration akin to composition-based approaches.
We construct an end-to-end differentiable model that takes our proposed Wyckoff representation as input.
The performance of this approach is examined on a suite of materials discovery experiments showing that it leads to strong levels of enrichment in materials discovery tasks.
The research presented in this thesis highlights the promise of applying data-driven workflows and machine learning in materials discovery and development.
This thesis concludes by speculating about promising research directions for applying machine learning within materials discovery
Statistical Knowledge and Learning in Phonology
This thesis deals with the theory of the phonetic component of grammar in a formal probabilistic inference framework: (1) it has been recognized since the beginning of generative phonology that some language-specific phonetic implementation is actually context-dependent, and thus it can be said that there are gradient "phonetic processes" in grammar in addition to categorical "phonological processes." However, no explicit theory has been developed to characterize these processes. Meanwhile, (2) it is understood that language acquisition and perception are both really informed guesswork: the result of both types of inference can be reasonably thought to be a less-than-perfect committment, with multiple candidate grammars or parses considered and each associated with some degree of credence. Previous research has used probability theory to formalize these inferences in implemented computational models, especially in phonetics and phonology. In this role, computational models serve to demonstrate the existence of working learning/per- ception/parsing systems assuming a faithful implementation of one particular theory of human language, and are not intended to adjudicate whether that theory is correct. The current thesis (1) develops a theory of the phonetic component of grammar and how it
relates to the greater phonological system and (2) uses a formal Bayesian treatment of learning to evaluate this theory of the phonological architecture and for making predictions about how the resulting grammars will be organized. The coarse description of the consequence for linguistic theory is that the processes we think of as "allophonic" are actually language-specific, gradient phonetic processes, assigned to the phonetic component of grammar; strict allophones have no representation in the output of the categorical phonological grammar
- …