Search CORE

29 research outputs found

Recommended from our members

Sharp and Practical Performance Bounds for MCMC and Multiple Testing

Author: Rabinovich Maxim
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

With the greater adoption of statistical and machine learning methods across science and industry, a greater awareness of the need to align statistical theory ever more closely with the demands of applications is developing. One recurring theme within this process is the re-examination of basic questions and core assumptions through the lens of modern mathematical statistics. This thesis targets two such basic questions in two different contexts: posterior simulation using Markov chain Monte Carlo (MCMC), on the one hand; and multiple hypothesis testing, on the other. For MCMC, we analyze convergence in terms of the expectations of a limited number of query functions, rather than the entire posterior. We show both theoretically and via simulations that the resultant theory predicts the required chain length more sharply than global convergence criteria. Furthermore, we provide match- ing lower bounds that show our bounds are essentially optimal in their dependence on chain parameters and target accuracy. For multiple testing, we provide the first general framework for establishing the optimal tradeoff between false positives (measured by the False Discovery Rate, or FDR) and false negatives (measured by the False Non-discovery rate, or FNR). We begin by proving some of the first non-asymptotic results available on this topic – initially in the context of Gaussian-like test statistics. We then go on to develop the more general framework. The latter applies to test statistics with essentially arbitrary analytic forms and dependence, recovers previous results as special cases, and yields numerically simulable lower bounds that can be evaluated in almost any model of interest

eScholarship - University of California

Abstract Syntax Networks for Code Generation and Semantic Parsing

Author: Klein Dan
Rabinovich Maxim
Stern Mitchell
Publication venue
Publication date: 01/01/2017
Field of study

Tasks like code generation and semantic parsing require mapping unstructured (or partially structured) inputs to well-formed, executable outputs. We introduce abstract syntax networks, a modeling framework for these problems. The outputs are represented as abstract syntax trees (ASTs) and constructed by a decoder with a dynamically-determined modular structure paralleling the structure of the output tree. On the benchmark Hearthstone dataset for code generation, our model obtains 79.2 BLEU and 22.7% exact match accuracy, compared to previous state-of-the-art values of 67.1 and 6.1%. Furthermore, we perform competitively on the Atis, Jobs, and Geo semantic parsing datasets with no task-specific engineering.Comment: ACL 2017. MR and MS contributed equall

arXiv.org e-Print Archive

Crossref

Quantitative criticism of literary relationships

Author: Bonilla Lopez Jorge A.
Brofos James A.
Casarez Adriana
Chaudhuri Pramit
Dasgupta Tathagata
Dexter Joseph P.
Haimson Lushkov Ayelet
Kannan Ajay
Katz Theodore R.
Rabinovich Maxim
Schroeder Lea A.
Tripuraneni Nilesh
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 03/04/2017
Field of study

Authors often convey meaning by referring to or imitating prior works of literature, a process that creates complex networks of literary relationships ("intertextuality") and contributes to cultural evolution. In this paper, we use techniques from stylometry and machine learning to address subjective literary critical questions about Latin literature, a corpus marked by an extraordinary concentration of intertextuality. Our work, which we term "quantitative criticism," focuses on case studies involving two influential Roman authors, the playwright Seneca and the historian Livy. We find that four plays related to but distinct from Seneca's main writings are differentiated from the rest of the corpus by subtle but important stylistic features. We offer literary interpretations of the significance of these anomalies, providing quantitative data in support of hypotheses about the use of unusual formal features and the interplay between sound and meaning. The second part of the paper describes a machine-learning approach to the identification and analysis of citational material that Livy loosely appropriated from earlier sources. We extend our approach to map the stylistic topography of Latin prose, identifying the writings of Caesar and his near-contemporary Livy as an inflection point in the development of Latin prose style. In total, our results reflect the integration of computational and humanistic methods to investigate a diverse range of literary questions

DSpace@MIT

Crossref

Inverse Regression Topic Modeling: Models, Inference and Applications

Author: Rabinovich Maxim
Publication venue
Publication date: 12/07/2013
Field of study

Dataspace