Let's Reinforce Step by Step
While recent advances have boosted language model (LM) performance on linguistic benchmarks, LMs still struggle to reason correctly on complex tasks such as mathematics. We turn to Reinforcement Learning from Human Feedback (RLHF) as a method for shaping model reasoning processes. In particular, we explore two reward schemes, outcome-supervised reward models (ORMs) and process-supervised reward models (PRMs), to optimize for logical reasoning. Our results show that the fine-grained reward provided by PRM-based methods enhances accuracy on simple mathematical reasoning (GSM8K) while, unexpectedly, reducing performance on complex tasks (MATH). Furthermore, we show the critical role that reward aggregation functions play in model performance. Our study points to promising avenues for future research and underscores the need for further exploration of fine-grained reward modeling for more reliable language models.
Comment: NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following
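The abstract does not list which reward aggregation functions were compared. As a minimal sketch, assuming a PRM that assigns each reasoning step a probability of being correct, the hypothetical helper `prm_reward` below collapses those per-step scores into one solution-level reward using four common choices (minimum, product, mean, last step). An ORM, by contrast, would score only the final answer. The function name, the aggregator set, and the example scores are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Callable, Dict, List

# Hypothetical aggregators that collapse per-step PRM scores (each assumed
# to be a probability in [0, 1] that the step is correct) into one reward.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "min": min,                         # the weakest step dominates
    "product": math.prod,               # every step must be right
    "mean": lambda s: sum(s) / len(s),  # average step quality
    "last": lambda s: s[-1],            # only the final step counts
}


def prm_reward(step_scores: List[float], how: str = "min") -> float:
    """Aggregate per-step process rewards into one solution-level reward."""
    if not step_scores:
        raise ValueError("need at least one step score")
    if how not in AGGREGATORS:
        raise KeyError(f"unknown aggregator: {how}")
    return AGGREGATORS[how](step_scores)


# Illustrative scores for a four-step solution whose third step looks wrong.
steps = [0.9, 0.8, 0.3, 0.95]
for name in AGGREGATORS:
    print(name, round(prm_reward(steps, name), 4))
```

On the same step scores, the aggregators yield rewards ranging from about 0.21 (product) to 0.95 (last step), which illustrates why the choice of aggregation function can change the training signal.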
Genome-Resolved Proteomic Stable Isotope Probing of Soil Microbial Communities Using ¹³CO₂ and ¹³C-Methanol.
Stable isotope probing (SIP) enables tracking of nutrient flows from isotopically labeled substrates to specific microorganisms in microbial communities. In proteomic SIP, labeled proteins synthesized by the microbial consumers of labeled substrates are identified with a shotgun proteomics approach. Here, proteomic SIP was combined with targeted metagenomic binning to reconstruct metagenome-assembled genomes (MAGs) of the microorganisms producing labeled proteins. This approach was used to track carbon flows from ¹³CO₂ to the rhizosphere communities of Zea mays, Triticum aestivum, and Arabidopsis thaliana. Rhizosphere microorganisms that assimilated plant-derived ¹³C were capable of metabolic and signaling interactions with their plant hosts, as shown by their MAGs containing genes for phytohormone modulation, quorum sensing, and the transport and metabolism of nutrients typical of root exudates. XoxF-type methanol dehydrogenases were among the most abundant proteins identified in the rhizosphere metaproteomes. ¹³C-methanol proteomic SIP was used to test the hypothesis that XoxF is used to metabolize and assimilate methanol in the rhizosphere. We detected seven ¹³C-labeled XoxF proteins and identified methylotrophic pathways in the MAGs of eight ¹³C-labeled microorganisms, which supported the hypothesis. These two studies demonstrate the capability of proteomic SIP for functional characterization of active microorganisms in complex microbial communities.
A Little Blue Bird Told Me: Sentiment Change on Orphanage Tourism
This study examined Twitter conversations to understand changes in sentiment toward orphanage tourism between 2009 and 2019. Past research on orphanage tourism mostly took a qualitative approach, which provided in-depth knowledge but was limited to a single point in time and to those directly involved. This study fills the gap by analyzing tweets posted by different parties, including individuals who have participated or are willing to participate in orphanage tourism, institutions that have campaigned against the phenomenon, and organizations that have promoted orphanage volunteering. Analysis of 109,723 tweets using lexicons and supervised learning (Support Vector Machine, Random Forest, Naive Bayes, and Logistic Regression) revealed that 2014 marked a turning point for orphanage tourism conversations: the number of posts dropped significantly, and messages against orphanage tourism received attention. However, positive sentiment still prevailed throughout the decade when all tweets were considered.
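The lexicon half of such an analysis can be pictured with a minimal sketch: score each tweet by summing word polarities, then track the share of positive tweets per year. The tiny `LEXICON`, the helper names, and the sample tweets below are hypothetical illustrations, not the study's lexicon or data, and the supervised classifiers are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Tiny hypothetical lexicon for illustration only; real studies use
# established sentiment lexicons with thousands of scored terms.
LEXICON: Dict[str, int] = {"love": 1, "amazing": 1, "help": 1,
                           "harm": -1, "exploit": -1, "stop": -1}


def lexicon_score(text: str) -> int:
    """Sum word polarities: >0 positive, <0 negative, 0 neutral."""
    return sum(LEXICON.get(w.strip(".,!?#").lower(), 0) for w in text.split())


def positive_share_by_year(tweets: List[Tuple[int, str]]) -> Dict[int, float]:
    """Fraction of each year's tweets whose lexicon score is positive."""
    counts = defaultdict(lambda: [0, 0])  # year -> [positive, total]
    for year, text in tweets:
        counts[year][1] += 1
        if lexicon_score(text) > 0:
            counts[year][0] += 1
    return {y: pos / total for y, (pos, total) in sorted(counts.items())}


sample = [
    (2013, "Amazing week, love the kids!"),
    (2013, "Volunteering to help an orphanage"),
    (2014, "Orphanage tourism can harm children. Stop it!"),
    (2014, "Love my orphanage trip"),
]
print(positive_share_by_year(sample))
```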
Context, intelligence and interactions for personalized systems
This special issue on Context, Intelligence and Interactions for Personalized Systems provides a snapshot of the latest research activities, results, technologies, and application developments focusing on smart personalized systems in Ambient Intelligence and Humanized Computing. It is intended for researchers and practitioners in artificial intelligence (AI) with expertise in formal modeling, representation, and inference on situations, activities, and goals; researchers in ubiquitous computing and embedded systems with expertise in context-aware computing; and application developers or users with expertise and experience in user requirements, system implementation, and evaluation. The special issue also motivates application scenarios from various domains, including smart homes and cities, localization tracking, image analysis, and environmental monitoring. For solution developers and providers in specific application domains, this special issue provides an opportunity to convey needs and requirements and to obtain first-hand information on the latest technologies, prototypes, and application exemplars.
An Estimated Contribution of Glacier Runoff to Mongolia’s Upper Khovd River Basin in the Altai Mountains
In the semiarid climate of northwestern Mongolia, glaciers are critical contributors to water resources, particularly during the dry summer months. Nevertheless, knowledge of the glacier runoff contribution in the Upper Khovd River Basin (UKRB) is limited. This study investigates the impact of glacier recession on the UKRB's hydrology in western Mongolia's Altai Mountains. The analysis combined glaciological measurements, satellite-derived glacier extent records, and a simple ice ablation model. Our modeling used a mass balance gradient of 0.50 m water equivalent (w.e.) per 100 m for the years 2000, 2010, and 2016 and included a sensitivity analysis that applied lower and upper mass balance gradient values and shifted the equilibrium line altitude (ELA) by ±200 m. The glacier contribution to the UKRB's water resources decreased from almost 8% in 2000 to 6.7% in 2016. Hypsometries revealed that glacier areas decreased at all elevations, indicating that only small accumulation zones exist. Therefore, applying a modeled, raised ELA better represents the glacier contribution to total runoff, yielding 18.7% in 2000 and 15.4% in 2016. The decreasing glacier runoff contribution indicates that the UKRB glaciers have passed the tipping point beyond which the initial increase in contribution from enhanced melting gives way to a decline. The continued glacier recession and uncertain water availability pose challenges for water resource management and future human–water relations in the Mongolian Altai.
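A linear mass-balance-gradient model of the kind described can be sketched as follows, assuming the balance at elevation z is the gradient times (z − ELA), so bands below the ELA lose mass. Only the 0.50 m w.e. per 100 m gradient comes from the abstract; the helper names and the three-band hypsometry are hypothetical, and this is not the authors' ablation model. The sketch counts only net loss from bands with negative balance.

```python
from typing import List, Tuple

# Mass balance gradient from the abstract: 0.50 m w.e. per 100 m of elevation.
GRADIENT = 0.50 / 100.0  # m w.e. per m


def band_balance(z: float, ela: float, gradient: float = GRADIENT) -> float:
    """Linear mass balance at elevation z (m w.e./yr); negative below the ELA."""
    return gradient * (z - ela)


def glacier_melt_volume(bands: List[Tuple[float, float]], ela: float,
                        gradient: float = GRADIENT) -> float:
    """Net melt-water volume (m^3 w.e./yr) from bands with negative balance.

    bands: (mean elevation in m, area in m^2) for each hypsometric band.
    """
    total = 0.0
    for z, area in bands:
        b = band_balance(z, ela, gradient)
        if b < 0:
            total += -b * area
    return total


# Hypothetical hypsometry, not data from the study.
bands = [(3000.0, 2e6), (3200.0, 3e6), (3400.0, 1e6)]
for shift in (-200.0, 0.0, 200.0):  # sensitivity: ELA shifted by +/-200 m
    print(shift, glacier_melt_volume(bands, 3300.0 + shift))
```

Raising the ELA exposes more glacier area to negative balance, which is why the raised-ELA scenario yields a larger modeled glacier contribution.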
Timescales of glacial isostatic adjustment in Greenland: is transient rheology required?
The possibility of a transient rheological response to ice age loading, first discussed in the literature of the 1980s, has received renewed attention. Transient behaviour across centennial to millennial timescales has been invoked to reconcile apparently contradictory inferences of steady-state (Maxwell) viscosity based on two distinct Greenland data sets: Holocene sea-level curves and modern crustal uplift rates derived from the Global Navigation Satellite System (GNSS). To revisit this issue, we first compute depth-dependent Fréchet kernels using 1-D Maxwell viscoelastic Earth models and demonstrate that the mantle resolving power of the two Greenland data sets is highly distinct, reflecting the differing spatial scales of the associated surface loading: the sea-level records are sensitive to viscosity structure across the entire upper mantle, while uplift rates associated with post-1000 CE fluctuations of the Greenland Ice Sheet are dominantly sensitive to shallow asthenosphere viscosity. Guided by these results, we present forward models demonstrating that a moderate low-viscosity zone beneath the lithosphere in Maxwell Earth models offers a simple route to reconciling both data sets simultaneously: it significantly increases predictions of present-day uplift rates in Greenland whilst having negligible impact on predictions of Holocene relative sea-level curves from the region. Our analysis does not rule out transient deformation, but it suggests that such deformation is not required to explain these two data sets simultaneously. A definitive demonstration of transient behaviour requires accounting for the resolving power of the data sets when modelling the glacial isostatic adjustment process.
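As a rough intuition for why a shallow low-viscosity zone mainly affects modern uplift rates, the local Maxwell relaxation time τ = η/μ scales linearly with viscosity η. The sketch below evaluates it for a few viscosities; the 70 GPa shear modulus and the viscosity values are assumed order-of-magnitude inputs, not values from the study. This local time is only a heuristic: actual glacial isostatic adjustment relaxation times also depend on load wavelength and layering, which the Fréchet kernels capture.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def maxwell_time_years(viscosity_pa_s: float,
                       shear_modulus_pa: float = 7e10) -> float:
    """Local Maxwell relaxation time tau = eta / mu, converted to years.

    The default shear modulus (70 GPa) is an assumed, order-of-magnitude
    upper-mantle value, not one taken from the study.
    """
    return viscosity_pa_s / shear_modulus_pa / SECONDS_PER_YEAR


# Assumed viscosities spanning a low-viscosity zone to typical upper mantle.
for eta in (1e19, 1e20, 1e21):
    print(f"eta = {eta:.0e} Pa s -> tau ~ {maxwell_time_years(eta):.1f} yr")
```

Under these assumptions, lowering viscosity by two orders of magnitude shortens the local relaxation time from centuries to years, so a low-viscosity layer can respond strongly to post-1000 CE load changes.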
ADP Protects Cardiac Mitochondria under Severe Oxidative Stress.
ADP is not only a key substrate for ATP generation but also a potent inhibitor of the mitochondrial permeability transition pore (mPTP). In this study, we assessed how oxidative stress affects the potency of ADP as an mPTP inhibitor and whether a reduction in reactive oxygen species (ROS) production by ADP might be involved. We quantified the effects of ADP on mitochondrial Ca²⁺ retention capacity (CRC) until mPTP induction in normal and stressed isolated cardiac mitochondria. We used two models of chronic oxidative stress (old and diabetic mice) and two models of acute oxidative stress (ischemia reperfusion (IR) and tert-butyl hydroperoxide (t-BH)). In control mitochondria, the CRC was 344 ± 32 nmol/mg protein, and 500 μmol/L ADP increased it to 774 ± 65 nmol/mg protein. This effect of ADP appeared to be concentration-dependent, as 50 μmol/L ADP had a significantly smaller effect. In addition, oligomycin, which inhibits the conversion of ADP to ATP by F0F1-ATPase, significantly increased the effect of 50 μmol/L ADP. Chronic oxidative stress did not affect CRC or the effect of 500 μmol/L ADP. After IR or t-BH exposure, CRC was drastically reduced to 1 ± 0.2 and 32 ± 4 nmol/mg protein, respectively. Surprisingly, ADP increased CRC to 447 ± 105 and 514 ± 103 nmol/mg protein in IR and t-BH mitochondria, respectively, that is, by about the same amount as in control mitochondria. In control mitochondria, ADP decreased both the substrate-induced and the Ca²⁺-induced increase in ROS. However, in t-BH mitochondria the effect of ADP on ROS was relatively small. We conclude that ADP potently restores CRC in severely stressed mitochondria and that this effect is most likely not related to a reduction in ROS production. Because the effect of ADP depends on its concentration, increased ADP levels, as occur in pathophysiological situations, may protect mitochondrial integrity and function.
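The claim that ADP raised CRC by about the same amount in stressed and control mitochondria can be checked from the reported means (standard errors omitted; all values in nmol/mg protein):

```latex
\begin{aligned}
\Delta\mathrm{CRC}_{\text{control}} &= 774 - 344 = 430 \\
\Delta\mathrm{CRC}_{\text{IR}}      &= 447 - 1   = 446 \\
\Delta\mathrm{CRC}_{\text{t-BH}}    &= 514 - 32  = 482
\end{aligned}
```

The increments differ by at most about 50 nmol/mg protein, which is small relative to the reported standard errors (±65 to ±105). In control mitochondria, the ADP effect corresponds to a 2.25-fold increase (774/344).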
Eye-tracking based classification of Mandarin Chinese readers with and without dyslexia using neural sequence models
Eye movements are known to reflect cognitive processes in reading, and psychological reading research has shown that eye gaze patterns differ between readers with and without dyslexia. In recent years, researchers have attempted to classify readers with dyslexia based on their eye movements using Support Vector Machines (SVMs). However, these approaches (i) are based on highly aggregated features averaged over all words read by a participant, thus disregarding the sequential nature of eye movements, and (ii) do not consider the linguistic stimulus and its interaction with the reader's eye movements. In the present work, we propose two simple sequence models that process eye movements on the entire stimulus without the need to aggregate features across the sentence. Additionally, we incorporate the linguistic stimulus into the model in two ways: contextualized word embeddings and manually extracted linguistic features. The models are evaluated on a Mandarin Chinese dataset containing eye movements from children with and without dyslexia. Our results show that (i) even for a logographic script such as Chinese, sequence models are able to classify dyslexia from eye gaze sequences, reaching state-of-the-art performance, and (ii) incorporating the linguistic stimulus does not improve classification performance.