Substring-based Machine Translation
Machine translation is traditionally formulated as the transduction of strings of words from the source to the target language. As a result, additional lexical processing steps such as morphological analysis, transliteration, and tokenization are required to process the internal structure of words to help cope with data-sparsity issues that occur when simply dividing words according to white spaces. In this paper, we take a different approach: not dividing lexical processing and translation into two steps, but simply viewing translation as a single transduction between character strings in the source and target languages. In particular, we demonstrate that the key to achieving accuracies on a par with word-based translation in the character-based framework is the use of a many-to-many alignment strategy that can accurately capture correspondences between arbitrary substrings. We build on the alignment method proposed in Neubig et al. (2011), improving its efficiency and accuracy with a focus on character-based translation. Using a many-to-many aligner imbued with these improvements, we demonstrate that the traditional framework of phrase-based machine translation sees large gains in accuracy over character-based translation with more naive alignment methods, and achieves comparable results to word-based translation for two distant language pairs.
Molecular architecture of Gαo and the structural basis for RGS16-mediated deactivation
Heterotrimeric G proteins relay extracellular cues from heptahelical transmembrane receptors to downstream effector molecules. Composed of an α subunit with intrinsic GTPase activity and a βγ heterodimer, the trimeric complex dissociates upon receptor-mediated nucleotide exchange on the α subunit, enabling each component to engage downstream effector targets for either activation or inhibition as dictated in a particular pathway. To mitigate excessive effector engagement and concomitant signal transmission, the Gα subunit's intrinsic activation timer (the rate of GTP hydrolysis) is regulated spatially and temporally by a class of GTPase accelerating proteins (GAPs) known as the regulator of G protein signaling (RGS) family. The array of G protein-coupled receptors, Gα subunits, RGS proteins and downstream effectors in mammalian systems is vast. Understanding the molecular determinants of specificity is critical for a comprehensive mapping of the G protein system. Here, we present the 2.9 Å crystal structure of the enigmatic, neuronal G protein Gαo in the GTP hydrolytic transition state, complexed with RGS16. Comparison with the 1.89 Å structure of apo-RGS16, also presented here, reveals plasticity upon Gαo binding, the determinants for GAP activity, and the structurally unique features of Gαo that likely distinguish it physiologically from other members of the larger Gαi family, affording insight to receptor, GAP and effector specificity
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
Automatic evaluation of machine translation (MT) is a critical tool driving
the rapid iterative development of MT systems. While considerable progress has
been made on estimating a single scalar quality score, current metrics lack the
informativeness of more detailed schemes that annotate individual errors, such
as Multidimensional Quality Metrics (MQM). In this paper, we help fill this gap
by proposing AutoMQM, a prompting technique which leverages the reasoning and
in-context learning capabilities of large language models (LLMs) and asks them
to identify and categorize errors in translations. We start by evaluating
recent LLMs, such as PaLM and PaLM-2, through simple score prediction
prompting, and we study the impact of labeled data through in-context learning
and finetuning. We then evaluate AutoMQM with PaLM-2 models, and we find that
it improves performance compared to just prompting for scores (with
particularly large gains for larger models) while providing interpretability
through error spans that align with human annotations.
Comment: 19 pages
Neutrino Oscillations and the Supernova 1987A Signal
We study the impact of neutrino oscillations on the interpretation of the
supernova (SN) 1987A neutrino signal by means of a maximum-likelihood analysis.
We focus on oscillations between with or
with those mixing parameters that would solve the solar
neutrino problem. For the small-angle MSW solution (, ), there are no
significant oscillation effects on the Kelvin-Helmholtz cooling signal; we
confirm previous best-fit values for the neutron-star binding energy and
average spectral temperature. There is only marginal overlap
between the upper end of the 95.4\% CL inferred range of and the lower end of the range of theoretical
predictions. Any admixture of the stiffer spectrum by
oscillations aggravates the conflict between experimentally inferred and
theoretically predicted spectral properties. For mixing parameters in the
neighborhood of the large-angle MSW solution (, ) the oscillations in the SN are adiabatic,
but one needs to include the regeneration effect in the Earth which causes the
Kamiokande and IMB detectors to observe different spectra. For
the solar vacuum solution (,
) the oscillations in the SN are nonadiabatic; vacuum
oscillations take place between the SN and the detector. If either of the
large-angle solutions were borne out by the upcoming round of solar neutrino
experiments, one would have to conclude that the SN~1987A
and/or spectra had been much softer than predicted by current
Comment: Final version with very minor wording changes, to be published in
Phys. Rev.
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Many recent advances in natural language generation have been fueled by
training large language models on internet-scale data. However, this paradigm
can lead to models that generate toxic, inaccurate, and unhelpful content, and
automatic evaluation metrics often fail to identify these behaviors. As models
become more capable, human feedback is an invaluable signal for evaluating and
improving models. This survey aims to provide an overview of the recent
research that has leveraged human feedback to improve natural language
generation. First, we introduce an encompassing formalization of feedback, and
identify and organize existing research into a taxonomy following this
formalization. Next, we discuss how feedback can be described by its format and
objective, and cover the two approaches proposed to use feedback (either for
training or decoding): directly using the feedback or training feedback models.
We also discuss existing datasets for human-feedback data collection, and
concerns surrounding feedback collection. Finally, we provide an overview of
the nascent field of AI feedback, which exploits large language models to make
judgments based on a set of principles and minimize the need for human
intervention.
Comment: Work in Progress
High-Throughput Screening for Small-Molecule Inhibitors of LARG-Stimulated RhoA Nucleotide Binding via a Novel Fluorescence Polarization Assay
Guanine nucleotide-exchange factors (GEFs) stimulate guanine nucleotide exchange and the subsequent activation of Rho-family proteins in response to extracellular stimuli acting upon cytokine, tyrosine kinase, adhesion, integrin, and G-protein coupled receptors (GPCRs). Upon Rho activation, several downstream events occur, such as morphological and cytoskeletal changes, motility, growth, survival, and gene transcription. The RhoGEF Leukemia-Associated RhoGEF (LARG) is a member of the Regulators of G-protein Signaling Homology Domain (RH) family of GEFs originally identified as a result of chromosomal translocation in acute myeloid leukemia. Using a novel fluorescence polarization guanine nucleotide binding assay utilizing BODIPY-Texas Red-GTPγS (BODIPY-TR-GTPγS), we performed a ten-thousand compound high-throughput screen for inhibitors of LARG-stimulated RhoA nucleotide binding. Five compounds identified from the high-throughput screen were confirmed in a non-fluorescent radioactive guanine nucleotide binding assay measuring LARG-stimulated [35S] GTPγS binding to RhoA, thus ruling out non-specific fluorescent effects. All five compounds selectively inhibited LARG-stimulated RhoA [35S] GTPγS binding, but had little to no effect upon RhoA or Gαo [35S] GTPγS binding. Therefore, these five compounds should serve as promising starting points for the development of small molecule inhibitors of LARG-mediated nucleotide exchange as both pharmacological tools and therapeutics. In addition, the fluorescence polarization guanine nucleotide binding assay described here should serve as a useful approach for both high-throughput screening and general biological applications.
Oral Ethanol Self-Administration in Rhesus Monkeys: Behavioral and Neurochemical Correlates
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/66306/1/j.1530-0277.1999.tb04357.x.pd
Tiny-Scale Molecular Structures in the Magellanic Clouds (Part 1)
We report on the FUSE detections of the HD and CO molecules on
the lines of sight towards three Large Magellanic Cloud stars: Sk 67D05, Sk
68D135, and Sk 69D246. HD is also detected for the first time on the
lines of sight towards two Small Magellanic Cloud stars: AV 95 and Sk 159.
While the HD and CO abundances are expected to be lower in the Large Magellanic
Cloud where molecular fractions are a third of the Galactic value and where the
photodissociation flux is up to thousands of times larger, we report an average
HD/H ratio of 1.40.5 ppm and CO/H ratio ranging from 0.8 to 2.7
ppm similar to the Galactic ones. We tentatively identify a deuterium reservoir
(hereafter D--reservoir) towards the Small Magellanic Cloud, along the light
path to AV 95. We derive a D/H ratio ranging from 1. 10 to 1.1
10.
Comment: 34 pages, 10 tables, 12 figures, accepted for publication in A&
Highly Variable Chloroplast Markers for Evaluating Plant Phylogeny at Low Taxonomic Levels and for DNA Barcoding
BACKGROUND: At present, plant molecular systematics and DNA barcoding techniques rely heavily on the use of chloroplast gene sequences. Because of the relatively low evolutionary rates of chloroplast genes, there are very few choices suitable for molecular studies on angiosperms at low taxonomic levels, and for DNA barcoding of species. METHODOLOGY/PRINCIPAL FINDINGS: We scanned the entire chloroplast genomes of 12 genera to search for highly variable regions. The sequence data of 9 genera were from GenBank and 3 genera were of our own. We identified nearly 5% of the most variable loci from all variable loci in the chloroplast genomes of each genus, and then selected 23 loci that were present in at least three genera. The 23 loci included 4 coding regions, 2 introns, and 17 intergenic spacers. Of the 23 loci, the most variable (in order from highest variability to lowest) were intergenic regions ycf1-a, trnK, rpl32-trnL, and trnH-psbA, followed by trnS(UGA)-trnG(UCC), petA-psbJ, rps16-trnQ, ndhC-trnV, ycf1-b, ndhF, rpoB-trnC, psbE-petL, and rbcL-accD. Three loci, trnS(UGA)-trnG(UCC), trnT-psbD, and trnW-psaJ, showed very high nucleotide diversity per site (π values) across three genera. Other loci may have strong potential for resolving phylogenetic and species identification problems at the species level. The loci accD-psaI, rbcL-accD, rpl32-trnL, rps16-trnQ, and ycf1 are absent from some genera. To amplify and sequence the highly variable loci identified in this study, we designed primers from their conserved flanking regions. We tested the applicability of the primers to amplify target sequences in eight species representing basal angiosperms, monocots, eudicots, rosids, and asterids, and confirmed that the primers amplified the desired sequences of these species. SIGNIFICANCE/CONCLUSIONS: Chloroplast genome sequences contain regions that are highly variable. 
Such regions are the first consideration when screening for suitable loci to resolve closely related species or genera in phylogenetic analyses, and for DNA barcoding.