Sharp and Practical Performance Bounds for MCMC and Multiple Testing
With the greater adoption of statistical and machine learning methods across science and industry, a greater awareness of the need to align statistical theory ever more closely with the demands of applications is developing. One recurring theme within this process is the re-examination of basic questions and core assumptions through the lens of modern mathematical statistics. This thesis targets two such basic questions in two different contexts: posterior simulation using Markov chain Monte Carlo (MCMC), on the one hand; and multiple hypothesis testing, on the other. For MCMC, we analyze convergence in terms of the expectations of a limited number of query functions, rather than the entire posterior. We show both theoretically and via simulations that the resultant theory predicts the required chain length more sharply than global convergence criteria. Furthermore, we provide matching lower bounds that show our bounds are essentially optimal in their dependence on chain parameters and target accuracy. For multiple testing, we provide the first general framework for establishing the optimal tradeoff between false positives (measured by the False Discovery Rate, or FDR) and false negatives (measured by the False Non-discovery Rate, or FNR). We begin by proving some of the first non-asymptotic results available on this topic – initially in the context of Gaussian-like test statistics. We then go on to develop the more general framework. The latter applies to test statistics with essentially arbitrary analytic forms and dependence, recovers previous results as special cases, and yields numerically simulable lower bounds that can be evaluated in almost any model of interest.
Abstract Syntax Networks for Code Generation and Semantic Parsing
Tasks like code generation and semantic parsing require mapping unstructured
(or partially structured) inputs to well-formed, executable outputs. We
introduce abstract syntax networks, a modeling framework for these problems.
The outputs are represented as abstract syntax trees (ASTs) and constructed by
a decoder with a dynamically-determined modular structure paralleling the
structure of the output tree. On the benchmark Hearthstone dataset for code
generation, our model obtains 79.2 BLEU and 22.7% exact match accuracy,
compared to previous state-of-the-art values of 67.1 and 6.1%. Furthermore, we
perform competitively on the Atis, Jobs, and Geo semantic parsing datasets with
no task-specific engineering.
Comment: ACL 2017. MR and MS contributed equally.
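To make the idea of an output represented as an abstract syntax tree concrete, here is a minimal sketch using Python's standard `ast` module. It is not the authors' model or decoder; it only shows what a tree-structured target for a short piece of code looks like. The snippet `SOURCE` and the helper `node_types` are hypothetical names chosen for illustration.

```python
import ast

# Hypothetical snippet standing in for a generated program (an assumption,
# not taken from the Hearthstone dataset).
SOURCE = "def attack(damage):\n    return damage + 1\n"


def node_types(source):
    """Return the AST node type names of `source`, in ast.walk order."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


# Parse the snippet: the result is a tree whose root is a Module node.
tree = ast.parse(SOURCE)
top = tree.body[0]  # the single top-level statement, a function definition

if __name__ == "__main__":
    print(ast.dump(tree))
    print(node_types(SOURCE))
```

Each node type (for example `FunctionDef`, `Return`, `BinOp`) determines which children are allowed beneath it, which is the kind of structure a decoder that mirrors the output tree can exploit.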
Quantitative criticism of literary relationships
Authors often convey meaning by referring to or imitating prior works of literature, a process that creates complex networks of literary relationships ("intertextuality") and contributes to cultural evolution. In this paper, we use techniques from stylometry and machine learning to address subjective literary critical questions about Latin literature, a corpus marked by an extraordinary concentration of intertextuality. Our work, which we term "quantitative criticism," focuses on case studies involving two influential Roman authors, the playwright Seneca and the historian Livy. We find that four plays related to but distinct from Seneca's main writings are differentiated from the rest of the corpus by subtle but important stylistic features. We offer literary interpretations of the significance of these anomalies, providing quantitative data in support of hypotheses about the use of unusual formal features and the interplay between sound and meaning. The second part of the paper describes a machine-learning approach to the identification and analysis of citational material that Livy loosely appropriated from earlier sources. We extend our approach to map the stylistic topography of Latin prose, identifying the writings of Caesar and his near-contemporary Livy as an inflection point in the development of Latin prose style. In total, our results reflect the integration of computational and humanistic methods to investigate a diverse range of literary questions.