233 research outputs found

Substring-based Machine Translation

Author: G Neubig
Graham Neubig
S Mori
Shinsuke Mori
T Kawahara
T Watanabe
Taro Watanabe
Tatsuya Kawahara
Publication venue
Publication date: 24/04/2020
Field of study

Machine translation is traditionally formulated as the transduction of strings of words from the source to the target language. As a result, additional lexical processing steps such as morphological analysis, transliteration, and tokenization are required to process the internal structure of words to help cope with data-sparsity issues that occur when simply dividing words according to white spaces. In this paper, we take a different approach: not dividing lexical processing and translation into two steps, but simply viewing translation as a single transduction between character strings in the source and target languages. In particular, we demonstrate that the key to achieving accuracies on a par with word-based translation in the character-based framework is the use of a many-to-many alignment strategy that can accurately capture correspondences between arbitrary substrings. We build on the alignment method proposed in Neubig et al. (2011), improving its efficiency and accuracy with a focus on character-based translation. Using a many-to-many aligner imbued with these improvements, we demonstrate that the traditional framework of phrase-based machine translation sees large gains in accuracy over character-based translation with more naive alignment methods, and achieves comparable results to word-based translation for two distant language pairs.

CiteSeerX

A covalent peptide inhibitor of RGS4 identified in a focused one-bead, one compound library screen

Author: Blazer Levi L
Clements Samuel T
Mosberg Henry I
Neubig Richard R
Ota Shodai
Roman David L
Roof Rebecca A
Sobczyk-Kojiro Katarzyna
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Crossref

Springer - Publisher Connector

PubMed Central

Molecular architecture of Gαo and the structural basis for RGS16-mediated deactivation

Author: Berman
C.-K. Chen
Chan
Chen
Chen
Chen
Coleman
De Vries
Grafstein-Dunn
He
Heximer
K. C. Slep
Kimple
Koelle
Kozasa
Kozasa
Kudlacek
Lodowski
M. A. Kercher
M. I. Simon
McIntire
Neubig
Noel
P. B. Sigler
Slep
Snow
Sondek
Sunahara
T. Wieland
Tesmer
Tesmer
Valenzuela
Wang
Wilkie
Willars
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2008
Field of study

Heterotrimeric G proteins relay extracellular cues from heptahelical transmembrane receptors to downstream effector molecules. Composed of an α subunit with intrinsic GTPase activity and a βγ heterodimer, the trimeric complex dissociates upon receptor-mediated nucleotide exchange on the α subunit, enabling each component to engage downstream effector targets for either activation or inhibition as dictated in a particular pathway. To mitigate excessive effector engagement and concomitant signal transmission, the Gα subunit's intrinsic activation timer (the rate of GTP hydrolysis) is regulated spatially and temporally by a class of GTPase accelerating proteins (GAPs) known as the regulator of G protein signaling (RGS) family. The array of G protein-coupled receptors, Gα subunits, RGS proteins and downstream effectors in mammalian systems is vast. Understanding the molecular determinants of specificity is critical for a comprehensive mapping of the G protein system. Here, we present the 2.9 Å crystal structure of the enigmatic, neuronal G protein Gαo in the GTP hydrolytic transition state, complexed with RGS16. Comparison with the 1.89 Å structure of apo-RGS16, also presented here, reveals plasticity upon Gαo binding, the determinants for GAP activity, and the structurally unique features of Gαo that likely distinguish it physiologically from other members of the larger Gαi family, affording insight into receptor, GAP and effector specificity.

Crossref

PubMed Central

Carolina Digital Repository

Caltech Authors

The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation

Author: Clark Jonathan H.
Deutsch Daniel
Fernandes Patrick
Finkelstein Mara
Firat Orhan
Freitag Markus
Garg Ankush
Martins André F. T.
Neubig Graham
Riley Parker
Publication venue
Publication date: 14/08/2023
Field of study

Automatic evaluation of machine translation (MT) is a critical tool driving the rapid iterative development of MT systems. While considerable progress has been made on estimating a single scalar quality score, current metrics lack the informativeness of more detailed schemes that annotate individual errors, such as Multidimensional Quality Metrics (MQM). In this paper, we help fill this gap by proposing AutoMQM, a prompting technique which leverages the reasoning and in-context learning capabilities of large language models (LLMs) and asks them to identify and categorize errors in translations. We start by evaluating recent LLMs, such as PaLM and PaLM-2, through simple score prediction prompting, and we study the impact of labeled data through in-context learning and finetuning. We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores (with particularly large gains for larger models) while providing interpretability through error spans that align with human annotations.

Comment: 19 pages
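As a rough illustration of how annotated error spans can be collapsed into a single score, the sketch below sums a weight per error severity. The `ErrorSpan` type, the `mqm_penalty` function, and the weights (1 for minor, 5 for major) are assumptions made for this example; the abstract does not specify a span format or a weighting scheme.

```python
from dataclasses import dataclass

# Assumed severity weights for illustration; not taken from the abstract.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0}


@dataclass
class ErrorSpan:
    """Hypothetical container for one annotated translation error."""
    start: int      # character offset where the error begins
    end: int        # character offset just past the error
    category: str   # e.g. "accuracy/mistranslation"
    severity: str   # "minor" or "major"


def mqm_penalty(spans):
    """Sum the severity weights of all annotated error spans."""
    return sum(SEVERITY_WEIGHTS[span.severity] for span in spans)


spans = [
    ErrorSpan(0, 5, "accuracy/mistranslation", "major"),
    ErrorSpan(12, 15, "fluency/grammar", "minor"),
]
print(mqm_penalty(spans))  # 6.0
```

A lower penalty means fewer or less severe errors; keeping the spans themselves, rather than only the total, is what preserves the interpretability the abstract describes.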

arXiv.org e-Print Archive

Neutrino Oscillations and the Supernova 1987A Signal

Author: A. Burrows
A. Yu. Smirnov
A. Yu. Smirnov
Beat Jegerlehner
C. B. Bratton
D. Nötzold
E. S. Myra
Frank Neubig
G. Raffelt
Georg Raffelt
H. Minakata
H. Minakata
H.-T. Janka
H.-T. Janka
H.-T. Janka
H.-T. Janka
H.-T. Janka
H.-T. Janka
J. Arafune
J. Arafune
K. S. Hirata
K. S. Hirata
K. Sato
L. Wolfenstein
M. G. Kendall
N. Hata
P. I. Krastev
P. J. Kernan
P. M. Giovanoni
P. O. Lagage
R. M. Bionta
R. Mayle
S. P. Mikheev
S. P. Rosen
S. W. Bruenn
T. J. Loredo
T. K. Kuo
T. P. Walker
V. Barger
W. T. Eadie
Publication venue: 'American Physical Society (APS)'
Publication date: 22/01/1996
Field of study

We study the impact of neutrino oscillations on the interpretation of the supernova (SN) 1987A neutrino signal by means of a maximum-likelihood analysis. We focus on oscillations between $\overline\nu_e$ with $\overline\nu_\mu$ or $\overline\nu_\tau$ with those mixing parameters that would solve the solar neutrino problem. For the small-angle MSW solution ($\Delta m^2\approx10^{-5}\,\rm eV^2$, $\sin^22\Theta_0\approx0.007$), there are no significant oscillation effects on the Kelvin-Helmholtz cooling signal; we confirm previous best-fit values for the neutron-star binding energy and average spectral $\overline\nu_e$ temperature. There is only marginal overlap between the upper end of the 95.4% CL inferred range of $\langle E_{\overline\nu_e}\rangle$ and the lower end of the range of theoretical predictions. Any admixture of the stiffer $\overline\nu_\mu$ spectrum by oscillations aggravates the conflict between experimentally inferred and theoretically predicted spectral properties. For mixing parameters in the neighborhood of the large-angle MSW solution ($\Delta m^2\approx10^{-5}\,\rm eV^2$, $\sin^22\Theta_0\approx0.7$) the oscillations in the SN are adiabatic, but one needs to include the regeneration effect in the Earth which causes the Kamiokande and IMB detectors to observe different $\overline\nu_e$ spectra. For the solar vacuum solution ($\Delta m^2\approx10^{-10}\,\rm eV^2$, $\sin^22\Theta_0\approx1$) the oscillations in the SN are nonadiabatic; vacuum oscillations take place between the SN and the detector. If either of the large-angle solutions were borne out by the upcoming round of solar neutrino experiments, one would have to conclude that the SN 1987A $\overline\nu_\mu$ and/or $\overline\nu_e$ spectra had been much softer than predicted by current

Comment: Final version with very minor wording changes, to be published in Phys. Rev.
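For orientation, the vacuum-oscillation case mentioned in this abstract can be sketched with the standard two-flavor conversion probability, P = sin²(2θ) · sin²(1.267 Δm² L / E), with Δm² in eV², L in km and E in GeV. This is a textbook formula, not the paper's maximum-likelihood analysis; the function names, the 50 kpc distance and the 15 MeV energy are illustrative assumptions that do not come from the abstract.

```python
import math

KM_PER_KPC = 3.0857e16  # kilometres in one kiloparsec


def vacuum_oscillation_probability(sin2_2theta, dm2_ev2, length_km, energy_gev):
    """Two-flavor vacuum conversion probability.

    P = sin^2(2*theta) * sin^2(1.267 * dm^2[eV^2] * L[km] / E[GeV])
    """
    phase = 1.267 * dm2_ev2 * length_km / energy_gev
    return sin2_2theta * math.sin(phase) ** 2


def averaged_probability(sin2_2theta):
    """Conversion probability once the fast-oscillating term averages to 1/2."""
    return 0.5 * sin2_2theta


# Assumed illustrative inputs: a source about 50 kpc away, a 15 MeV neutrino,
# and the solar vacuum parameters quoted in the abstract.
distance_km = 50 * KM_PER_KPC
energy_gev = 0.015
p = vacuum_oscillation_probability(1.0, 1e-10, distance_km, energy_gev)
print(0.0 <= p <= 1.0)            # True
print(averaged_probability(1.0))  # 0.5
```

With these inputs the oscillation phase is very large, so the pointwise value depends sensitively on distance and energy; the averaged form is usually the more meaningful quantity in that regime.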

arXiv.org e-Print Archive

Crossref

CERN Document Server

Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation

Author: Bertsch Amanda
de Souza José G. C.
Farinhas António
Fernandes Patrick
Liu Emmy
Madaan Aman
Martins André F. T.
Martins Pedro Henrique
Neubig Graham
Wu Tongshuang
Zhou Shuyan
Publication venue
Publication date: 31/05/2023
Field of study

Many recent advances in natural language generation have been fueled by training large language models on internet-scale data. However, this paradigm can lead to models that generate toxic, inaccurate, and unhelpful content, and automatic evaluation metrics often fail to identify these behaviors. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation. First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization. Next, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models. We also discuss existing datasets for human-feedback data collection, and concerns surrounding feedback collection. Finally, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention.

Comment: Work in Progress

arXiv.org e-Print Archive

High-Throughput Screening for Small-Molecule Inhibitors of LARG-Stimulated RhoA Nucleotide Binding via a Novel Fluorescence Polarization Assay

Author: Evelyn C. R.
Ferng T.
Larsen M. J.
Neubig R. R.
Rojas R. J.
Sondek J.
Publication venue
Publication date: 01/01/2009
Field of study

Guanine nucleotide-exchange factors (GEFs) stimulate guanine nucleotide exchange and the subsequent activation of Rho-family proteins in response to extracellular stimuli acting upon cytokine, tyrosine kinase, adhesion, integrin, and G-protein coupled receptors (GPCRs). Upon Rho activation, several downstream events occur, such as morphological and cytoskeletal changes, motility, growth, survival, and gene transcription. The RhoGEF Leukemia-Associated RhoGEF (LARG) is a member of the Regulators of G-protein Signaling Homology Domain (RH) family of GEFs originally identified as a result of chromosomal translocation in acute myeloid leukemia. Using a novel fluorescence polarization guanine nucleotide binding assay utilizing BODIPY-Texas Red-GTPγS (BODIPY-TR-GTPγS), we performed a ten-thousand compound high-throughput screen for inhibitors of LARG-stimulated RhoA nucleotide binding. Five compounds identified from the high-throughput screen were confirmed in a non-fluorescent radioactive guanine nucleotide binding assay measuring LARG-stimulated [35S] GTPγS binding to RhoA, thus ruling out non-specific fluorescent effects. All five compounds selectively inhibited LARG-stimulated RhoA [35S] GTPγS binding, but had little to no effect upon RhoA or Gαo [35S] GTPγS binding. Therefore, these five compounds should serve as promising starting points for the development of small molecule inhibitors of LARG-mediated nucleotide exchange as both pharmacological tools and therapeutics. In addition, the fluorescence polarization guanine nucleotide binding assay described here should serve as a useful approach for both high-throughput screening and general biological applications.
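The readout of a fluorescence polarization assay is conventionally computed from the emission intensities measured parallel and perpendicular to the excitation plane, P = (I∥ − I⊥) / (I∥ + I⊥), often reported in millipolarization units (mP = 1000 · P). The sketch below shows that general definition only; the function name and the optional `g_factor` instrument correction are assumptions for illustration, and the abstract does not describe how the authors processed their readings.

```python
def polarization_mp(i_parallel, i_perpendicular, g_factor=1.0):
    """Fluorescence polarization in millipolarization units (mP).

    g_factor is an assumed instrument correction applied to the
    perpendicular channel; 1.0 means no correction.
    """
    corrected = g_factor * i_perpendicular
    total = i_parallel + corrected
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return 1000.0 * (i_parallel - corrected) / total


# A small fluorescent nucleotide tumbles quickly when free (low polarization)
# and slowly when bound to a protein (higher polarization).
print(polarization_mp(300.0, 100.0))  # 500.0
print(polarization_mp(100.0, 100.0))  # 0.0
```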

PubMed Central

Carolina Digital Repository

Oral Ethanol Self-Administration in Rhesus Monkeys: Behavioral and Neurochemical Correlates

Author: Ballenger JC
Carroll ME
Cloninger CR
Devoto P
Erikkson K
Heinz A
Higley JD
Higley JD
Li T-K
Mardones J
Meisch RA
Naranjo CA
Neubig RR
O'Brien CP
Prescott CA
Rezvani AH
Tallarida RJ
Virkkunen M
Winger G
Yoshimoto KY
Zhou FC
Publication venue: 'Wiley'
Publication date: 01/08/1999
Field of study

Peer Reviewed

http://deepblue.lib.umich.edu/bitstream/2027.42/66306/1/j.1530-0277.1999.tb04357.x.pd

Crossref

Deep Blue Documents

Tiny-Scale Molecular Structures in the Magellanic Clouds (Part 1)

Author: A. Vidal-Madjar
Abgrall
Abgrall
Abgrall
Ballester
Bertin
Bluhm
Boehringer
Bolatto
Cardelli
Cecchi-Pestellini
Cecchi-Pestellini
Chin
Chu
Cohen
Curry
Curry
Danforth
de Boer
de Vaucouleurs
Dekker
Dickey
Diplas
Draine
Dufour
E. Roueff
Ehrenfreund
F. Le Petit
Faison
Ferlet
Ferlet
Ferlet
Fitzpatrick
Frail
Friedman
Garay
Garnett
Herbig
Hoopes
Houlahan
Howk
Hébrard
Hébrard
Israel
Israel
J-.M. Désert
Jenkins
Jenkins
Jenkins
Jura
Jura
Kimble
Kirkman
Kobulnicky
Koornneef
Lauroesch
Le Petit
Lehner
Lehner
Lemoine
Lemoine
Lequeux
M. K. André
Mallouris
McKee
Mebold
Moos
Morton
Neubig
Osterbrock
P. Sonnentrucker
Parravano
Pfenniger
Pottasch
Péquignot
R. Ferlet
Rachford
Rachford
Richter
Richter
Rollinde
Roueff
Russell
S. Lacour
Sahnow
Savage
Savage
Savage
Schectman
Schramm
Sembach
Snow
Snowden
Songaila
Sonneborn
Sonnentrucker
Sonnentrucker
Spitzer
Spitzer
T. Civeit
Tumlinson
van der Tak
Varshalovich
Vidal-Madjar
Vladilo
Wayte
Welty
Welty
Welty
Welty
Wood
Woodgate
Woosley
Wright
Wright
Publication venue: 'EDP Sciences'
Publication date: 01/01/2004
Field of study

We report on the FUSE detections of the HD and CO molecules on the lines of sight towards three Large Magellanic stars: Sk -67D05, Sk -68D135, and Sk -69D246. HD is also detected for the first time on the lines of sight towards two Small Magellanic Cloud stars: AV 95 and Sk 159. While the HD and CO abundances are expected to be lower in the Large Magellanic Cloud where molecular fractions are a third of the Galactic value and where the photodissociation flux is up to thousands of times larger, we report an average HD/H$_2$ ratio of $1.4 \pm 0.5$ ppm and CO/H$_2$ ratio ranging from 0.8 to 2.7 ppm similar to the Galactic ones. We tentatively identify a deuterium reservoir (hereafter D-reservoir) towards the Small Magellanic Cloud, along the light path to AV 95. We derive a D/H ratio ranging from $1 \times 10^{-6}$ to $1.1 \times 10^{-5}$.

Comment: 34 pages, 10 tables, 12 figures, accepted for publication in A&

arXiv.org e-Print Archive

CiteSeerX

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Highly Variable Chloroplast Markers for Evaluating Plant Phylogeny at Low Taxonomic Levels and for DNA Barcoding

Author: A Rambaut
Ahmed Moustafa
AJ Alverson
BA Whitlock
C Sass
CA Wilson
D Swofford
GDD Hurst
HL Lee
J Doyle
J Rozas
J Shaw
J Shaw
J Yu
JD Thompson
JH Li
Jing Liu
Jing Yu
JM Zhang
JP Londo
JR Starr
K Neubig
KF Müller
KJ Kim
KJ Kim
KM Neubig
KW Hilu
Ling Wang
LJ Zhao
LT Dunning
LT Lu
MA Larkin
MP Simmons
MW Chase
O Seberg
P Erixon
P Korall
PM Hollingsworth
PM Peterson
SA Kelchner
SG Newmaster
SG Newmaster
Shiliang Zhou
T Borsch
Wenpan Dong
WJ Kress
X Gao
X Quan
Y Matsuda1a
YJ Zuo
ZY Yang
Publication venue: Public Library of Science
Publication date: 01/04/2012
Field of study

BACKGROUND: At present, plant molecular systematics and DNA barcoding techniques rely heavily on the use of chloroplast gene sequences. Because of the relatively low evolutionary rates of chloroplast genes, there are very few choices suitable for molecular studies on angiosperms at low taxonomic levels, and for DNA barcoding of species. METHODOLOGY/PRINCIPAL FINDINGS: We scanned the entire chloroplast genomes of 12 genera to search for highly variable regions. The sequence data of 9 genera were from GenBank and 3 genera were of our own. We identified nearly 5% of the most variable loci from all variable loci in the chloroplast genomes of each genus, and then selected 23 loci that were present in at least three genera. The 23 loci included 4 coding regions, 2 introns, and 17 intergenic spacers. Of the 23 loci, the most variable (in order from highest variability to lowest) were intergenic regions ycf1-a, trnK, rpl32-trnL, and trnH-psbA, followed by trnS(UGA)-trnG(UCC), petA-psbJ, rps16-trnQ, ndhC-trnV, ycf1-b, ndhF, rpoB-trnC, psbE-petL, and rbcL-accD. Three loci, trnS(UGA)-trnG(UCC), trnT-psbD, and trnW-psaJ, showed very high nucleotide diversity per site (π values) across three genera. Other loci may have strong potential for resolving phylogenetic and species identification problems at the species level. The loci accD-psaI, rbcL-accD, rpl32-trnL, rps16-trnQ, and ycf1 are absent from some genera. To amplify and sequence the highly variable loci identified in this study, we designed primers from their conserved flanking regions. We tested the applicability of the primers to amplify target sequences in eight species representing basal angiosperms, monocots, eudicots, rosids, and asterids, and confirmed that the primers amplified the desired sequences of these species. SIGNIFICANCE/CONCLUSIONS: Chloroplast genome sequences contain regions that are highly variable. 
Such regions are the first to consider when screening for suitable loci to resolve closely related species or genera in phylogenetic analyses, and for DNA barcoding.
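The nucleotide diversity per site (π) mentioned in this abstract is commonly defined as the average number of differences per site between all pairs of aligned sequences. The following is a minimal sketch of that general definition, assuming equal-length aligned sequences and treating every mismatching character (including gaps) as a difference; the function name and the toy sequences are hypothetical, and the abstract does not say which software or gap treatment the authors used.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site over aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned and non-empty")

    pair_count = 0
    diff_total = 0
    for first, second in combinations(sequences, 2):
        # Count the sites at which this pair of sequences differs.
        diff_total += sum(1 for a, b in zip(first, second) if a != b)
        pair_count += 1
    return diff_total / (pair_count * length)


# Toy alignment: three sequences of four sites, one variable site.
alignment = ["ACGT", "ACGA", "ACGT"]
print(round(nucleotide_diversity(alignment), 4))  # 0.1667
```

Computing π locus by locus and ranking the results is one simple way to compare how variable candidate regions are.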

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

of Botany, Chinese Academy of Sciences

The Francis Crick Institute