
    Inconsistent dialogue responses and how to recover from them

    One critical issue for chat systems is staying consistent about their own preferences, opinions, beliefs, and facts, which has been shown to be a difficult problem. In this work, we study methods to assess and bolster the utterance consistency of chat systems. A dataset is first developed for studying the inconsistencies, in which inconsistent dialogue responses, explanations of the inconsistencies, and recovery utterances are authored by annotators. This covers the life span of inconsistencies, namely introduction, understanding, and resolution. Building on this, we introduce a set of tasks centered on dialogue consistency, specifically focused on its detection and resolution. Our experimental findings indicate that our dataset significantly helps progress in identifying and resolving conversational inconsistencies, and that current popular large language models such as ChatGPT, although good at resolving inconsistencies, still struggle with detection. Comment: Accepted at EACL 2024. Code and dataset available at https://github.com/mianzhang/CIDE

    Discover, Explanation, Improvement: Automatic Slice Detection Framework for Natural Language Processing

    Current natural language processing (NLP) models such as BERT and RoBERTa have achieved high overall performance, but they often make systematic errors due to bias or to features that are difficult to learn. Research on slice detection models (SDMs), which automatically identify underperforming groups of datapoints, has therefore attracted growing attention; it aims both to understand model behavior and to provide insights for future model training and design. However, there is little systematic research on SDMs and little quantitative evaluation of them for NLP models. Our paper fills this gap by proposing the "Discover, Explanation, Improvement" framework, which discovers coherent and underperforming groups of datapoints and unites the datapoints of each slice under human-understandable concepts; it also provides comprehensive evaluation tasks and the corresponding quantitative metrics, which enable convenient comparison for future work. Results show that our framework can accurately select error-prone datapoints with informative semantic features that summarize error patterns; on this basis, it directly boosts the performance of trained models by an average of 2.85 points across multiple datasets without tuning any parameters. Comment: 15 pages, 5 figures

    Collaborative decoding of critical tokens for boosting factuality of large language models

    The most common training pipeline for large language models includes pretraining, finetuning, and aligning phases, each producing its own model, such as the pretrained model and the finetuned model. Finetuned and aligned models show improved instruction following and safer generation; however, their ability to stay factual about the world is impacted by the finetuning process. Furthermore, the common practice of sampling during generation also increases the chance of hallucination. In this work, we introduce a collaborative decoding framework to harness the high factuality within pretrained models through the concept of critical tokens. We first design a critical token classifier to decide which model to use for the next token, and then generate the next token using different decoding strategies. Experiments with different models and datasets show that our decoding framework is able to reduce model hallucination significantly, showcasing the importance of the collaborative decoding framework. Comment: work in progress
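
The routing idea in the abstract above (a classifier picks which model produces the next token, and each model uses its own decoding strategy) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the function name `collaborative_decode`, the toy callables standing in for models, the choice of greedy decoding for the pretrained model, and sampling for the aligned model.

```python
import random
from typing import Callable, Dict, List

Dist = Dict[str, float]  # hypothetical next-token distribution: token -> probability


def collaborative_decode(
    prompt: List[str],
    pretrained: Callable[[List[str]], Dist],   # stand-in for the pretrained model
    aligned: Callable[[List[str]], Dist],      # stand-in for the finetuned/aligned model
    is_critical: Callable[[List[str]], bool],  # stand-in for the critical token classifier
    max_new_tokens: int = 8,
    seed: int = 0,
) -> List[str]:
    """Generate tokens, routing each step to one of two models."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        if is_critical(tokens):
            # Assumption: critical tokens come from the pretrained model, decoded greedily.
            dist = pretrained(tokens)
            next_token = max(dist, key=dist.get)
        else:
            # Assumption: all other tokens are sampled from the aligned model.
            dist = aligned(tokens)
            next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens
```

In a real system the three callables would wrap actual language models and a trained classifier; the sketch only shows the per-token control flow.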

    Large-scale preparation of active caspase-3 in E. coli by designing its thrombin-activatable precursors

    Background: Caspase-3, a principal apoptotic effector that cleaves the majority of cellular substrates, is an important medicinal target for the treatment of cancers and neurodegenerative diseases. Large amounts of the protein are required for drug discovery research. However, previous efforts to express the full-length caspase-3 gene in E. coli have been unsuccessful. Results: Overproducers of thrombin-activatable full-length caspase-3 precursors were prepared by engineering the auto-activation sites of the caspase-3 precursor into a sequence susceptible to thrombin hydrolysis. The engineered precursors were highly expressed as soluble proteins in E. coli, easily purified by affinity chromatography to levels of 10–15 mg from 1 L of E. coli culture, and readily activated by thrombin digestion. Kinetic evaluation disclosed that thrombin digestion enhanced the catalytic activity (k_cat/K_M) of the precursor proteins by two orders of magnitude. Conclusion: A novel method for large-scale preparation of active caspase-3 was developed by engineering the precursor so that it lacks auto-activation during expression and carries amino acid sequences susceptible to thrombin, facilitating high-level expression in E. coli. The precursor protein was easily purified and activated through specific cleavage at the engineered sites by thrombin, generating active caspase-3 in high yields.

    A Questionnaire Study on Attitudes toward Birth and Child-rearing of University Students in Japan, China, and South Korea

    This study examines the attitudes of young Japanese, Chinese, and South Koreans toward birth and child-rearing. The survey targeted four-year university students (n=1,668) who responded to an anonymous survey using self-report questionnaires between December 2012 and April 2013. The collection rates were 72.5%, 94.7%, and 96.5% for the Japanese, Chinese, and South Korean students, respectively. Correlations among the respondents' attributes, medical and scientific literacy levels, and views of preferred qualities of children were analyzed using the chi-square test, supplemented by residual analysis (significance level set at p<0.05). Participants were asked whether they were willing to use the following methods for obtaining preferred qualities in their children: (1) choosing a spouse (43.2%, 72.6%, and 85.1% of the Japanese, Chinese, and South Koreans, respectively, agreed); (2) using a sperm bank (cryobank) (5.8%, 60.1%, and 81.7% of the Japanese, Chinese, and South Koreans, respectively, agreed); and (3) using an egg cell bank (ova bank or cryobank) (5.3%, 47.2%, and 70.3% of the Japanese, Chinese, and South Koreans, respectively, agreed). The proportion of affirmative responses (indicating "eugenic inclination") to these statements was significantly higher among the Chinese and South Korean participants than among their Japanese counterparts (p<0.001). Significant differences were also found in the attitudes of the three groups toward methods for obtaining the preferred qualities for their children: prenatal diagnosis, pre-implantation diagnosis, the environment during pregnancy, and child-rearing.

    Associations between Organochlorine Pesticides and Vitamin D Deficiency in the U.S. Population

    Background: Recently, low-dose organochlorine (OC) pesticides have been strongly linked to various chronic diseases, including diabetes and cardiovascular diseases. Both field and animal studies have suggested that persistent lipophilic chemicals like OC pesticides can cause vitamin D deficiency, but there have been no human studies of exposure to any chemical as a possible cause of vitamin D deficiency. This study was performed to examine whether serum concentrations of OC pesticides were associated with serum concentrations of 25-hydroxyvitamin D (25(OH)D) in the U.S. general population. Methodology/Principal Findings: Cross-sectional associations of serum OC pesticides with serum 25(OH)D were investigated in 1,275 subjects aged ≥20 in the National Health and Nutrition Examination Survey (NHANES), 2003–2004. We selected 7 OC pesticides detectable in ≥80% of participants. Among the 7 OC pesticides, p,p′-DDT (β = −0.022, P < 0.01), p,p′-DDE (β = −0.018, P = 0.04), and β-hexachlorocyclohexane (β = −0.022, P = 0.02) showed significant inverse associations with serum 25(OH)D levels. When study subjects were stratified by age, race, and the presence of various chronic diseases, p,p′-DDT showed consistent inverse associations in all subgroups, although stronger associations tended to be observed among subjects who were older, white, or had chronic diseases. Conclusion/Significance: The current study suggests that background exposure to some OC pesticides leads to vitamin D deficiency in humans. Considering the importance of vitamin D deficiency in the development of chronic diseases

    Interactions between Food Additive Silica Nanoparticles and Food Matrices

    Nanoparticles (NPs) have been widely utilized in the food industry as additives for their beneficial characteristics, such as improving sensory properties and processing suitability, enhancing functional and nutritional values, and extending the shelf-life of foods. Silica is used as an anti-caking agent to improve the flow properties of powdered ingredients and as a carrier for flavors or active compounds in food. With the rapid development of nanotechnology, the sizes of silica particles now fall into the nanoscale, raising concerns about the potential toxicity of nano-sized silica materials. A number of studies have investigated possible adverse effects of NPs on the gastrointestinal tract. The interactions between NPs and surrounding food matrices should also be taken into account, since these interactions can affect their bioavailability, efficacy, and toxicity. In the present study, we investigated the interactions between food additive silica NPs and food matrices, such as saccharides, proteins, lipids, and minerals. Quantitative analysis was performed to determine the food component-NP corona using HPLC, fluorescence quenching, GC-MS, and ICP-AES. The results demonstrate that the zeta potential and hydrodynamic radius of silica NPs changed in the presence of all food matrices, but their solubility was not affected. However, quantitative analysis of the interactions revealed that only a small portion of the food matrices interacted with silica NPs and that the interactions were highly dependent on the type of food component. Moreover, minor nutrients could also affect the interactions, as evidenced by higher NP interaction with honey than with a simple sugar mixture containing an equivalent amount of fructose, glucose, sucrose, and maltose. These findings provide fundamental information to extend our understanding of the interactions between silica NPs and food components and to predict the effect of these interactions on the safety aspects of food-grade NPs.

    The Trickle-down Impact of Reward (In-)consistency on RLHF

    Standard practice within Reinforcement Learning from Human Feedback (RLHF) involves optimizing against a Reward Model (RM), which itself is trained to reflect human preferences for desirable generations. A notable understudied subject is the (in-)consistency of RMs -- whether they can recognize semantic changes to different prompts and appropriately adapt their reward assignments -- and its impact on the downstream RLHF model. In this paper, we visit a series of research questions relevant to RM inconsistency: (1) How can we measure the consistency of reward models? (2) How consistent are the existing RMs and how can we improve them? (3) In what ways does reward inconsistency influence the chatbots resulting from RLHF model training? We propose Contrast Instructions -- a benchmarking strategy for the consistency of RMs. Each example in Contrast Instructions features a pair of lexically similar instructions with different ground truth responses. A consistent RM is expected to rank the corresponding instruction and response higher than other combinations. We observe that current RMs trained with the standard ranking objective fail miserably on Contrast Instructions compared to average humans. To show that RM consistency can be improved efficiently without using extra training budget, we propose two techniques, ConvexDA and RewardFusion, which enhance reward consistency through extrapolation during the RM training and inference stages, respectively. We show that RLHF models trained with a more consistent RM yield more useful responses, suggesting that reward inconsistency exhibits a trickle-down effect on the downstream RLHF process.
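
The consistency check described in the abstract above (a matched instruction-response pair should score higher than a mismatched one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's actual metric: the function `contrast_consistency`, the tuple layout of an example, and the toy `overlap_reward` scorer are all hypothetical.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical example layout: (instruction_a, response_a, instruction_b, response_b)
Example = Tuple[str, str, str, str]


def contrast_consistency(
    reward: Callable[[str, str], float],
    examples: Iterable[Example],
) -> float:
    """Fraction of comparisons in which the matched pair outscores the mismatched one."""
    hits = 0
    total = 0
    for instr_a, resp_a, instr_b, resp_b in examples:
        # For each instruction, its own response should beat the other's response.
        hits += reward(instr_a, resp_a) > reward(instr_a, resp_b)
        hits += reward(instr_b, resp_b) > reward(instr_b, resp_a)
        total += 2
    if total == 0:
        raise ValueError("no examples given")
    return hits / total


def overlap_reward(instruction: str, response: str) -> float:
    """Toy stand-in for a reward model: number of words shared by both texts."""
    return float(len(set(instruction.split()) & set(response.split())))
```

A real evaluation would replace `overlap_reward` with calls to a trained reward model; the sketch only shows how matched and mismatched combinations are compared.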