3,285 research outputs found

    Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems

    Despite tremendous advancements in dialogue systems, stable evaluation still requires human judgments producing notoriously high-variance metrics due to their inherent subjectivity. Moreover, methods and labels in dialogue evaluation are not fully standardized, especially for open-domain chats, with a lack of work to compare and assess the validity of those approaches. The use of inconsistent evaluation can misinform the performance of a dialogue system, which becomes a major hurdle to enhance it. Thus, a dimensional evaluation of chat-oriented open-domain dialogue systems that reliably measures several aspects of dialogue capabilities is desired. This paper presents a novel human evaluation method to estimate the rates of many dialogue system behaviors. Our method is used to evaluate four state-of-the-art open-domain dialogue systems and compared with existing approaches. The analysis demonstrates that our behavior method is more suitable than alternative Likert-style or comparative approaches for dimensional evaluation of these systems. Comment: Accepted to ACL 2023; first two authors contributed equally

    Leveraging Large Language Models for Automated Dialogue Analysis

    Developing high-performing dialogue systems benefits from the automatic identification of undesirable behaviors in system responses. However, detecting such behaviors remains challenging, as it draws on a breadth of general knowledge and understanding of conversational practices. Although recent research has focused on building specialized classifiers for detecting specific dialogue behaviors, the behavior coverage is still incomplete and there is a lack of testing on real-world human-bot interactions. This paper investigates the ability of a state-of-the-art large language model (LLM), ChatGPT-3.5, to perform dialogue behavior detection for nine categories in real human-bot dialogues. We aim to assess whether ChatGPT can match specialized models and approximate human performance, thereby reducing the cost of behavior detection tasks. Our findings reveal that neither specialized models nor ChatGPT have yet achieved satisfactory results for this task, falling short of human performance. Nevertheless, ChatGPT shows promising potential and often outperforms specialized detection models. We conclude with an in-depth examination of the prevalent shortcomings of ChatGPT, offering guidance for future research to enhance LLM capabilities. Comment: Accepted to SIGDIAL 202

    Factors Important to Older Adults Who Disagree With a Deprescribing Recommendation.

    IMPORTANCE Little is known about why older adults decline deprescribing recommendations, primarily because interventional studies rarely capture the reasons. OBJECTIVE To examine factors important to older adults who disagree with a deprescribing recommendation given by a primary care physician to a hypothetical patient experiencing polypharmacy. DESIGN, SETTING, AND PARTICIPANTS This online, vignette-based survey study was conducted from December 1, 2020, to March 31, 2021, with participants 65 years or older in the United Kingdom, the US, Australia, and the Netherlands. The primary outcome of the main study was disagreement with a deprescribing recommendation. A content analysis was subsequently conducted of the free-text reasons provided by participants who strongly disagreed or disagreed with deprescribing. Data were analyzed from August 22, 2022, to February 12, 2023. MAIN OUTCOMES AND MEASURES Attitudes, beliefs, fears, and recommended actions of older adults in response to deprescribing recommendations. RESULTS Of the 899 participants included in the analysis, the mean (SD) age was 71.5 (4.9) years; 456 participants (50.7%) were men. Attitudes, beliefs, and fears reported by participants included doubts about deprescribing (361 [40.2%]), valuing medications (139 [15.5%]), and a preference to avoid change (132 [14.7%]). Valuing medications was reported more commonly among participants who strongly disagreed compared with those who disagreed with deprescribing (48 of 205 [23.4%] vs 91 of 694 [13.1%], respectively; P < .001) or had personal experience with the same medication class as the vignette compared with no experience (93 of 517 [18.0%] vs 46 of 318 [12.1%], respectively; P = .02). Participants shared that improved communication (225 [25.0%]), alternative strategies (138 [15.4%]), and consideration of medication preferences (137 [15.2%]) may increase their agreement with deprescribing. 
Participants who disagreed compared with those who strongly disagreed were more interested in additional communication (196 [28.2%] vs 29 [14.2%], respectively; P < .001), alternative strategies (117 [16.9%] vs 21 [10.2%], respectively; P = .02), or consideration of medication preferences (122 [17.6%] vs 15 [7.3%], respectively; P < .001). CONCLUSIONS AND RELEVANCE In this survey study, older adults who disagreed with a deprescribing recommendation were more interested in additional communication, alternative strategies, or consideration of medication preferences compared with those who strongly disagreed. These findings suggest that identifying the degree of disagreement with deprescribing could be used to tailor patient-centered communication about deprescribing in older adults.

    Relationships between Endogenous Plasma Biomarkers of Constitutive Cytochrome P450 3A Activity and Single-Time-Point Oral Midazolam Microdose Phenotype in Healthy Subjects

    Due to high basal interindividual variation in cytochrome P450 3A (CYP3A) activity and susceptibility to drug interactions, there has been interest in the application of efficient probe drug phenotyping strategies, as well as endogenous biomarkers for assessment of in vivo CYP3A activity. The biomarkers 4β-hydroxycholesterol (4βHC) and 6β-hydroxycortisol (6βHCL) are sensitive to CYP3A induction and inhibition. However, their utility for the assessment of constitutive CYP3A activity remains uncertain. We investigated whether endogenous plasma biomarkers (4βHC and 6βHCL) are associated with basal CYP3A metabolic activity in healthy subjects assessed by a convenient single-time-point oral midazolam (MDZ) phenotyping strategy. Plasma 4βHC and 6βHCL metabolic ratios (MRs) were analysed in 51 healthy adult participants. CYP3A activity was determined after administration of an oral MDZ microdose (100 μg). Simple linear and multiple linear regression analyses were performed to assess relationships between MDZ oral clearance, biomarkers and subject covariates. Among study subjects, basal MDZ oral clearance, 4βHC and 6βHCL MRs ranged 6.5-, 10- and 13-fold, respectively. Participant age and alcohol consumption were negatively associated with MDZ oral clearance (p = 0.03 and p = 0.045, respectively), while weight and female sex were associated with lower plasma 4βHC MR (p = 0.0003 and p = 0.032, respectively). Neither 4βHC nor 6βHCL MRs were associated with MDZ oral clearance. Plasma 4βHC and 6βHCL MRs do not relate to MDZ single-time-point metabolic phenotype in the assessment of constitutive CYP3A activity among healthy individuals.

    Train Small, Model Big: Scalable Physics Simulators via Reduced Order Modeling and Domain Decomposition

    Numerous cutting-edge scientific technologies originate at the laboratory scale, but transitioning them to practical industry applications is a formidable challenge. Traditional pilot projects at intermediate scales are costly and time-consuming. An alternative, the E-pilot, relies on high-fidelity numerical simulations, but even these simulations can be computationally prohibitive at larger scales. To overcome these limitations, we propose a scalable, physics-constrained reduced order model (ROM) method. ROM identifies critical physics modes from small-scale unit components, projecting governing equations onto these modes to create a reduced model that retains essential physics details. We also employ Discontinuous Galerkin Domain Decomposition (DG-DD) to apply ROM to unit components and interfaces, enabling the construction of large-scale global systems without data at such large scales. This method is demonstrated on the Poisson and Stokes flow equations, showing that it can solve equations about 15–40 times faster with only ∼1% relative error. Furthermore, ROM takes one order of magnitude less memory than the full order model, enabling larger scale predictions at a given memory limitation. Comment: 40 pages, 12 figures. Submitted to Computer Methods in Applied Mechanics and Engineering

    Response to COVID-19 vaccination in patients on cancer therapy: Analysis in a SARS-CoV-2-naïve population

    Background: Cancer patients have increased morbidity and mortality from COVID-19, but may respond poorly to vaccination. The Evaluation of COVID-19 Vaccination Efficacy and Rare Events in Solid Tumors (EVEREST) study, comparing seropositivity between cancer patients and healthy controls in a low SARS-CoV-2 community-transmission setting, allows determination of vaccine response with minimal interference from infection. Methods: Solid tumor patients from The Canberra Hospital, Canberra, Australia, and healthy controls who received COVID-19 vaccination between March 2021 and January 2022 were included. Blood samples were collected at baseline, pre-second vaccine dose and at 1, 3 (primary endpoint), and 6 months post-second dose. SARS-CoV-2 anti-spike-RBD (S-RBD) and anti-nucleocapsid IgG antibodies were measured. Results: Ninety-six solid tumor patients and 20 healthy controls were enrolled, with median age 62 years, and 60% were female. Participants received either AZD1222 (65%) or BNT162b2 (35%) COVID-19 vaccines. Seropositivity 3 months post vaccination was 87% (76/87) in patients and 100% (20/20) in controls (p = .12). Seropositivity was observed in 84% of patients on chemotherapy, 80% on immunotherapy, and 96% on targeted therapy (differences not statistically significant). Seropositivity in cancer patients increased from 40% (6/15) after first dose, to 95% (35/37) 1 month after second dose, then dropped to 87% (76/87) 3 months after second dose. Conclusion: Most patients and all controls became seropositive after two vaccine doses. Antibody concentrations and seropositivity showed a decrease between 1 and 3 months post vaccination, highlighting need for booster vaccinations. SARS-CoV-2 infection amplifies S-RBD antibody responses; however, infection cannot be adequately identified using nucleocapsid serology. This underlines the value of our COVID-naïve population in studying vaccine immunogenicity.

    How are falls and fear of falling associated with objectively measured physical activity in a cohort of community-dwelling older men?

    BACKGROUND: Falls affect approximately one third of community-dwelling older adults each year and have serious health and social consequences. Fear of falling (FOF) (lack of confidence in maintaining balance during normal activities) affects many older adults, irrespective of whether they have actually experienced falls. Both falls and fear of falls may result in restrictions of physical activity, which in turn have health consequences. To date, the relations of (i) falls and (ii) fear of falling with physical activity have not been investigated using objectively measured activity data, which permit examination of different intensities of activity and sedentary behaviour. METHODS: Cross-sectional study of 1680 men aged 71-92 years recruited from primary care practices who were part of an on-going population-based cohort. Men reported falls history in previous 12 months, FOF, health status and demographic characteristics. Men wore a GT3x accelerometer over the hip for 7 days. RESULTS: Among the 12% of men who had recurrent falls, daily activity levels were lower than among non-fallers: 942 (95% CI 503, 1381) fewer steps/day, 12 (95% CI 2, 22) minutes less in light activity, 10 (95% CI 5, 15) minutes less in moderate to vigorous PA [MVPA] and 22 (95% CI 9, 35) minutes more in sedentary behaviour. 16% (n = 254) of men reported FOF, of whom 52% (n = 133) had fallen in the past year. Physical activity deficits were even greater in the men who reported that they were fearful of falling than in men who had fallen. Men who were fearful of falling took 1766 (95% CI 1391, 2142) fewer steps/day than men who were not fearful, and spent 27 (95% CI 18, 36) minutes less in light PA, 18 (95% CI 13, 22) minutes less in MVPA, and 45 (95% CI 34, 56) minutes more in sedentary behaviour. 
The significant differences in activity levels between (i) fallers and non-fallers and (ii) men who were fearful of falling or not fearful, were mediated by similar variables: lower exercise self-efficacy, fewer excursions from home and more mobility difficulties. CONCLUSIONS: Falls and in particular fear of falling are important barriers to older people gaining health benefits of walking and MVPA. Future studies should assess the longitudinal associations between falls and physical activity.