Rapid prediction of NMR spectral properties with quantified uncertainty
Accurate calculation of specific spectral properties for NMR is an important step for molecular structure elucidation. Here we report the development of a novel machine learning technique for accurately predicting chemical shifts of both 1H and 13C nuclei, which exceeds DFT-accessible accuracy for 13C and 1H for a subset of nuclei while being orders of magnitude more performant. Our method produces estimates of uncertainty, allowing for robust and confident predictions, and suggests future avenues for improved performance.
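The abstract does not specify how the uncertainty estimates are produced; one common way to attach error bars to ML predictions is a small ensemble of regressors, sketched below. All class names and numbers here are hypothetical, not taken from the paper.

```python
# Hypothetical sketch: uncertainty-quantified prediction via a small
# ensemble of regressors. The spread of the ensemble's predictions
# serves as the uncertainty estimate.

class EnsemblePredictor:
    def __init__(self, models):
        self.models = models  # any objects with a .predict(x) -> float

    def predict(self, x):
        """Return (mean prediction, standard deviation across models)."""
        preds = [m.predict(x) for m in self.models]
        mean = sum(preds) / len(preds)
        var = sum((p - mean) ** 2 for p in preds) / len(preds)
        return mean, var ** 0.5

class ToyModel:
    """Stand-in for one trained shift regressor (illustrative only)."""
    def __init__(self, bias):
        self.bias = bias

    def predict(self, x):
        return 2.0 * x + self.bias

ens = EnsemblePredictor([ToyModel(b) for b in (-0.1, 0.0, 0.1)])
shift, sigma = ens.predict(3.0)  # e.g. a predicted 13C shift in ppm
```

A prediction with a large sigma can then be flagged as low-confidence rather than trusted blindly.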
Learning Semantic Correspondences in Technical Documentation
We consider the problem of translating high-level textual descriptions to
formal representations in technical documentation as part of an effort to model
the meaning of such documentation. We focus specifically on the problem of
learning translational correspondences between text descriptions and grounded
representations in the target documentation, such as formal representation of
functions or code templates. Our approach exploits the parallel nature of such
documentation, or the tight coupling between high-level text and the low-level
representations we aim to learn. Data is collected by mining technical
documents for such parallel text-representation pairs, which we use to train a
simple semantic parsing model. We report new baseline results on sixteen novel
datasets, including the standard library documentation for nine popular
programming languages across seven natural languages, and a small collection of
Unix utility manuals.
Comment: accepted to ACL-201
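As a rough illustration of the approach described above (not the authors' actual model), mined (description, signature) pairs can drive a simple co-occurrence scorer that ranks candidate signatures for a query. The pairs below are invented:

```python
# Illustrative sketch: learn word-level correspondences between text
# descriptions and function signatures from mined parallel pairs, then
# rank candidate signatures by a simple co-occurrence score.
from collections import Counter

pairs = [  # toy "mined" (description, signature) pairs
    ("return the length of a string", "len(s: str) -> int"),
    ("convert text to upper case", "upper(s: str) -> str"),
]

def tokenize_sig(sig):
    return sig.replace("(", " ").replace(")", " ").split()

# co-occurrence table: (text word, signature token) -> count
cooc = Counter()
for text, sig in pairs:
    for w in text.split():
        for t in tokenize_sig(sig):
            cooc[(w, t)] += 1

def score(text, sig):
    """Sum of co-occurrence counts between text words and sig tokens."""
    return sum(cooc[(w, t)]
               for w in text.split()
               for t in tokenize_sig(sig))

query = "length of a string"
candidates = [sig for _, sig in pairs]
best = max(candidates, key=lambda s: score(query, s))
```

Real semantic parsing models are far richer, but the parallel text–representation pairs supply the training signal in the same way.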
Polyglot Semantic Parsing in APIs
Traditional approaches to semantic parsing (SP) work by training individual
models for each available parallel dataset of text-meaning pairs. In this
paper, we explore the idea of polyglot semantic translation, or learning
semantic parsing models that are trained on multiple datasets and natural
languages. In particular, we focus on translating text to code signature
representations using the software component datasets of Richardson and Kuhn
(2017a,b). The advantage of such models is that they can be used for parsing a
wide variety of input natural languages and output programming languages, or
mixed input languages, using a single unified model. To facilitate modeling of
this type, we develop a novel graph-based decoding framework that achieves
state-of-the-art performance on the above datasets, and apply this method to
two other benchmark SP tasks.
Comment: accepted for NAACL-2018 (camera ready version)
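One standard way to realize a single unified model over many input/output language pairs (a common pattern, not necessarily this paper's exact mechanism) is to prefix each example with artificial language-identifier tokens, so one parameter set serves all datasets:

```python
# Sketch of the polyglot idea: route each example through a shared
# model by tagging it with its source and target languages.

def tag_example(text, src_lang, tgt_lang):
    """Prefix language identifiers so one model can handle the pair."""
    return f"<{src_lang}2{tgt_lang}> {text}"

ex = tag_example("return the square root of x", "en", "py")
```

Mixed-language input then needs no special handling: each example simply carries its own tags.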
SLM-based Digital Adaptive Coronagraphy: Current Status and Capabilities
Active coronagraphy is deemed to play a key role for the next generation of
high-contrast instruments, notably in order to deal with large segmented
mirrors that might exhibit time-dependent pupil merit function, caused by
missing or defective segments. To this purpose, we recently introduced a new
technological framework called digital adaptive coronagraphy (DAC), making use
of liquid-crystal spatial light modulators (SLMs) display panels operating as
active focal-plane phase mask coronagraphs. Here, we first review the latest
contrast performance, measured in laboratory conditions with monochromatic
visible light, and describe a few potential pathways to improve SLM
coronagraphic nulling in the future. We then unveil a few unique capabilities
of SLM-based DAC that were recently, or are currently in the process of being,
demonstrated in our laboratory, including NCPA wavefront sensing,
aperture-matched adaptive phase masks, coronagraphic nulling of multiple star
systems, and coherent differential imaging (CDI).
Comment: 14 pages, 9 figures, to appear in Proceedings of the SPIE, paper 10706-9
Applying Occam's Razor to Transformer-Based Dependency Parsing: What Works, What Doesn't, and What is Really Necessary
The introduction of pre-trained transformer-based contextualized word
embeddings has led to considerable improvements in the accuracy of graph-based
parsers for frameworks such as Universal Dependencies (UD). However, previous
works differ in various dimensions, including their choice of pre-trained
language models and whether they use LSTM layers. With the aims of
disentangling the effects of these choices and identifying a simple yet widely
applicable architecture, we introduce STEPS, a new modular graph-based
dependency parser. Using STEPS, we perform a series of analyses on the UD
corpora of a diverse set of languages. We find that the choice of pre-trained
embeddings has by far the greatest impact on parser performance and identify
XLM-R as a robust choice across the languages in our study. Adding LSTM layers
provides no benefits when using transformer-based embeddings. A multi-task
training setup outputting additional UD features may contort results. Taking
these insights together, we propose a simple but widely applicable parser
architecture and configuration, achieving new state-of-the-art results (in
terms of LAS) for 10 out of 12 diverse languages.
Comment: 14 pages, 1 figure; camera-ready version for IWPT 202
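To make the graph-based setup concrete: such a parser assigns a score to every candidate head–dependent arc and then decodes a tree. Below is a minimal greedy sketch with invented scores; real parsers like STEPS use learned biaffine scores and tree-constrained (e.g. MST) decoding.

```python
# Greedy arc decoding sketch: rows are dependents, columns are
# candidate heads (index 0 = ROOT). Scores are made up for the toy
# sentence "She reads books".

scores = [
    # ROOT  She   reads books   <- candidate heads
    [0.1,  0.0,  0.9,  0.0],   # She   -> reads
    [0.8,  0.1,  0.0,  0.1],   # reads -> ROOT
    [0.2,  0.0,  0.7,  0.1],   # books -> reads
]

def greedy_heads(scores):
    """For each dependent, return the index of its best-scoring head."""
    return [max(range(len(row)), key=lambda h: row[h]) for row in scores]

heads = greedy_heads(scores)  # [2, 0, 2]
```

The paper's finding is that what feeds this scorer (the choice of pre-trained embeddings) matters far more than extra architecture such as LSTM layers.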
Towards Opinion Mining from Reviews for the Prediction of Product Rankings
Opinion mining aims at summarizing the content of reviews for a specific brand, product, or manufacturer. However, the actual desire of a user is often one step further: produce a ranking corresponding to specific needs such that a selection process is supported. In this work, we aim to close this gap. We present the task of ranking products based on sentiment information and discuss the steps necessary to address it. This includes, on the one hand, the identification of gold rankings as a foundation for an objective function and evaluation and, on the other hand, methods to rank products based on review information. To demonstrate early results on this task, we employ real-world rankings as gold standards that are of interest to potential customers as well as product managers: in our case, the sales ranking provided by Amazon.com and the quality ranking by Snapsort.com. As baseline methods, we use average star ratings and review frequencies. Our best text-based approximation of the sales ranking achieves a Spearman's correlation coefficient of ρ = 0.23. On the Snapsort data, a ranking based on extracting comparisons leads to ρ = 0.51. In addition, we show that aspect-specific rankings can be used to measure the impact of specific aspects on the ranking.
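The evaluation metric used above, Spearman's ρ, compares two rankings via their squared rank differences. A minimal sketch with invented rankings (the formula below is the standard tie-free form):

```python
# Spearman's rank correlation between a gold ranking and a ranking
# derived from review signals. Rank positions here are made up.

def spearman_rho(xs, ys):
    """Spearman's rho for two rankings of the same items, no ties."""
    n = len(xs)
    d2 = sum((x - y) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

gold = [1, 2, 3, 4, 5]   # e.g. positions in a sales ranking
pred = [2, 1, 3, 5, 4]   # e.g. positions induced by avg star rating

rho = spearman_rho(gold, pred)
```

A ρ of 1 means the predicted ranking matches the gold ranking exactly; 0 means no monotone relationship.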