187 research outputs found

    Reliable and Interpretable Drift Detection in Streams of Short Texts

    Data drift is a change in model input data and one of the key factors leading to degradation of machine learning model performance over time. Monitoring drift helps detect these issues and prevent their harmful consequences. Meaningful drift interpretation is a fundamental step towards effective re-training of the model. In this study we propose an end-to-end framework for reliable, model-agnostic change-point detection and interpretation in large task-oriented dialog systems, proven effective in multiple customer deployments. We evaluate our approach and demonstrate its benefits with a novel variant of an intent classification training dataset, simulating customer requests to a dialog system. We make the data publicly available. Comment: ACL2023 industry track (9 pages)
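The abstract above leaves the detector unspecified; purely as an illustrative sketch (not the paper's actual method), model-agnostic drift detection over a stream of short-text embeddings can be framed as a two-sample permutation test between a reference window and a recent window. The window sizes, the mean-distance statistic, and the synthetic "embeddings" below are all assumptions.

```python
# Illustrative sketch only: a permutation test comparing a reference
# window of text embeddings with a recent window. All names and
# parameters are assumptions, not taken from the paper.
import numpy as np

def mean_diff(a, b):
    # Test statistic: L2 distance between the two windows' mean embeddings.
    return np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))

def drift_pvalue(reference, window, n_perm=500, seed=0):
    # Estimate how surprising the observed statistic is under the null
    # hypothesis that both windows come from the same distribution.
    rng = np.random.default_rng(seed)
    observed = mean_diff(reference, window)
    pooled = np.vstack([reference, window])
    n_ref = len(reference)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled rows
        if mean_diff(pooled[:n_ref], pooled[n_ref:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Synthetic stand-ins for sentence embeddings before/after a topic shift.
rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=(200, 16))
drifted = rng.normal(0.8, 1.0, size=(200, 16))
stable = rng.normal(0.0, 1.0, size=(200, 16))

p_drift = drift_pvalue(reference, drifted)   # small: change point likely
p_stable = drift_pvalue(reference, stable)   # larger: no evidence of drift
```

A low p-value flags a candidate change point; the interpretation step the abstract emphasises, i.e. explaining which inputs moved, is beyond this sketch.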

    Classifier Data Quality: A Geometric Complexity Based Method for Automated Baseline And Insights Generation

    Testing Machine Learning (ML) models and AI-Infused Applications (AIIAs), or systems that contain ML models, is highly challenging. In addition to the challenges of testing classical software, it is acceptable and expected that statistical ML models sometimes output incorrect results. A major challenge is to determine when the level of incorrectness, e.g., model accuracy or F1 score for classifiers, is acceptable and when it is not. In addition to business requirements that should provide a threshold, it is a best practice to require any proposed ML solution to outperform simple baseline models, such as a decision tree. We have developed complexity measures, which quantify how difficult given observations are to assign to their true class label; these measures can then be used to automatically determine a baseline performance threshold. These measures are superior to the best-practice baseline in that, for a linear computation cost, they also quantify each observation's classification complexity in an explainable form, regardless of the classifier model used. Our experiments with both numeric synthetic data and real natural-language chatbot data demonstrate that the complexity measures effectively highlight data regions and observations that are likely to be misclassified. Comment: Accepted to the EDSMLS workshop at the AAAI conference
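The complexity measures themselves are not defined in the abstract; as a hedged illustration of the general idea, a per-observation score can be computed as the fraction of an observation's k nearest neighbours carrying a different class label. The measure, names, and data below are assumptions for the sketch, not the paper's definitions.

```python
# Illustrative per-observation classification-complexity score:
# the share of an observation's k nearest neighbours with a different
# label. This is an assumption-based stand-in for the paper's measures.
import numpy as np

def knn_complexity(X, y, k=5):
    # Pairwise Euclidean distances between all observations.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude each point itself
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest
    # High score = neighbourhood dominated by other classes.
    return (y[nn] != y[:, None]).mean(axis=1)

# Two well-separated synthetic blobs: every point should score 0.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-4, 0.3, (30, 2)), rng.normal(4, 0.3, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
scores = knn_complexity(X, y, k=5)
```

Observations with high scores sit near class boundaries, the kind of "likely to be misclassified" regions the abstract mentions; a baseline performance threshold could then be derived from the score distribution.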

    Predicting Question-Answering Performance of Large Language Models through Semantic Consistency

    Semantic consistency of a language model is broadly defined as the model's ability to produce semantically equivalent outputs given semantically equivalent inputs. We address the task of assessing question-answering (QA) semantic consistency of contemporary large language models (LLMs) by manually creating a benchmark dataset with high-quality paraphrases of factual questions, and we release the dataset to the community. We further combine the semantic consistency metric with additional measurements suggested in prior work as correlating with LLM QA accuracy to build and evaluate a framework for reference-less prediction of factual QA performance: predicting the likelihood of a language model accurately answering a question. Evaluating the framework on five contemporary LLMs, we demonstrate encouraging results that significantly outperform the baselines. Comment: EMNLP2023 GEM workshop, 17 pages
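As a rough sketch of the consistency idea (the benchmark's actual metric may differ), one can query a model with several paraphrases of the same question and score the fraction of answer pairs that agree after normalisation. The stand-in model and questions below are invented for illustration.

```python
# Illustrative paraphrase-consistency score; the "model" is a stand-in
# callable backed by canned answers, not a real LLM interface.
from itertools import combinations

def normalise(answer):
    # Crude string normalisation; real systems would need semantic matching.
    return " ".join(answer.lower().split())

def consistency(model, paraphrases):
    answers = [normalise(model(q)) for q in paraphrases]
    pairs = list(combinations(answers, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

# Toy model answering two of three paraphrases identically.
canned = {
    "who wrote hamlet?": "Shakespeare",
    "hamlet was written by whom?": "Shakespeare",
    "name the author of hamlet.": "Francis Bacon",
}
score = consistency(canned.get, list(canned))  # 1 agreeing pair out of 3
```

A low score on semantically equivalent inputs is the kind of signal the abstract combines with other measurements to predict QA accuracy without reference answers.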

    Mental health problems among adolescents and young adults with childhood-onset physical disabilities: A scoping review

    Aim: This scoping review aims to better understand the extent and nature of research activity on the topic of mental health problems in young people with childhood-onset physical disabilities. Specifically, we document what has been investigated in terms of the occurrence and experience of mental health problems among young people with childhood-onset physical disabilities, and their access to mental health services. Methods: We searched four databases (Medline, PsycINFO, CINAHL, Embase) for articles published between 2007 and 2019. Studies were included if they addressed: (1) young people between the ages of 13 and 24 with a childhood-onset physical disability, and (2) mental health assessment, treatment, or service access and use. Results: We identified 33 peer-reviewed studies that focused mainly on young people with cerebral palsy, juvenile arthritis, and spina bifida. The most common mental health problems investigated were depression and mood-related difficulties (73%), anxiety (39%), and social/behavioural issues (33%), and the most common age range was 13 to 17. Ten studies explored access, use, and experiences of mental health services; stigma; caregiver mental health; and the value of comprehensive care, using qualitative, quantitative, or mixed methods. Conclusions: Findings suggest the importance of developing integrated models of service delivery to identify and address the mental health needs of this population, and of reaching consensus on best practices for assessment and for reporting rates of subclinical symptoms and psychiatric conditions.

    Understanding the Properties of Generated Corpora

    Models for text generation have become central to many research tasks, especially the generation of sentence corpora. However, understanding the properties of an automatically generated text corpus remains challenging. We propose a set of tools that examine the properties of generated text corpora. Applying these tools to various generated corpora allowed us to gain new insights into the properties of the generative models. As part of our characterization process, we found remarkable differences between the corpora generated by two leading generative technologies.

    A Preliminary Study to Develop a Collaborative Tiered School-Based Physical Therapy Service Delivery Model: Results from an International Delphi Consultation

    Background: Physical therapy (PT) is increasingly provided at schools to help students participate in educational activities. Recent rehabilitation models have emphasized the benefits of collaborative tiered services, yet no model is available to guide how these services should be delivered. Therefore, this study aims to determine the core attributes and PT interventions of a collaborative tiered school-based PT model that could guide how PT services are delivered in schools worldwide. Methods: A modified Delphi method was used to identify the core attributes and the PT interventions that would be part of the model. An introductory webinar followed by three Delphi rounds with 24 international experts was conducted. Similar ideas generated in Round 1 were combined into statements; the statements reaching the predetermined consensus level in Rounds 2 or 3 were retained. Categories were created to present the core attributes and tiered interventions that were retained. Results: 41 core attributes were identified and grouped under seven categories. Tiered interventions were grouped under 15 categories, which included 37 interventions for Tier 1, 24 for Tier 2, and 60 for Tier 3. Conclusion: The recommended core attributes and interventions will support the development of an international framework for school-based PT services, fostering health promotion for all children and supporting those with disabilities.

    Predictors of activities and participation six months after mild traumatic brain injury in children and adolescents

    OBJECTIVE: This study aimed to identify predictors of long-term consequences for activities and participation in children and adolescents with mild traumatic brain injury (mTBI). METHODS: A multicentre prospective longitudinal cohort study was conducted. The primary outcome measure was activities and participation, measured with the Child and Adolescent Scale of Participation (CASP) and completed by children (N = 156) and caregivers (N = 231) six months post-mTBI. The CASP items were categorized into home, community, school, and environment. Predictors were categorized according to the International Classification of Functioning, Disability and Health for Children and Youth, and included pre-injury personal and environmental factors, injury-related factors, symptoms, and resumption of activities in the first two weeks after mTBI. Univariate and multivariate logistic regression analyses were used to determine the predictive value of these factors. RESULTS: Results show that predictors differ across settings and perspectives (child or caregiver). Decreased activities and participation in children with mTBI can be predicted by adverse pre-injury behavioral functioning of the child (p < .001 - p = .038), adverse pre-injury family functioning (p = .001), lower parental SES (p = .038), more stress symptoms post-injury (p = .017 - p = .032), more post-concussive symptoms (p = .016 - p = .028), and less resumption of activities (p = .006 - p = .045). DISCUSSION: Pre-injury factors, more symptoms post-injury, and less resumption of activities should be considered when children are screened for unfavorable outcomes. Additional factors may add to the prediction, but injury-related factors do not. It is recommended that future research explore psychosocial factors, such as coping styles, emotion regulation, personality traits, social support, and other comorbid problems of both children and caregivers.

    Multi-criteria analysis of measures in benchmarking: Dependability benchmarking as a case study

    This is the author's version of a work that was accepted for publication in the Journal of Systems and Software. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published as "Multi-criteria analysis of measures in benchmarking: Dependability benchmarking as a case study", Journal of Systems and Software, 111, 2016. DOI: 10.1016/j.jss.2015.08.052. Benchmarks enable the comparison of computer-based systems according to a variable set of criteria, such as dependability, security, performance, cost, and/or power consumption. Although it can be approached with mathematical accuracy, multi-criteria analysis of results remains today a subjective process, rarely addressed in an explicit way in existing benchmarks. It is thus not surprising that industrial benchmarks rely on a reduced set of easy-to-understand measures, especially when considering complex systems. This keeps the process of result interpretation straightforward, unambiguous, and accurate, but it limits at the same time the richness and depth of the analysis. As a result, academia prefers to characterize complex systems with a wider set of measures. Marrying the requirements of industry and academia in a single proposal remains a challenge today. This paper addresses this question by reducing the uncertainty of the analysis process using quality (score-based) models. At measure definition time, these models make explicit (i) the requirements imposed on each type of measure, which may vary from one context of use to another, and (ii) the type, and intensity, of the relation between the considered measures.
At measure analysis time, they provide a consistent, straightforward, and unambiguous method to interpret the resulting measures. The methodology and its practical use are illustrated through three case studies from the dependability benchmarking domain, a domain where various criteria, including both performance and dependability, are typically considered when analysing benchmark results. Although the proposed approach is limited to dependability benchmarks in this document, its general formulation makes its usefulness for any type of benchmark evident. © 2015 Elsevier Inc. All rights reserved. This work is partially supported by the Spanish project ARENES (TIN2012-38308-C02-01), the ANR French project AMORES (ANR-11-INSE-010), the Intel Doctoral Student Honour Programme 2012, and the "Programa de Ayudas de Investigación y Desarrollo" (PAID) from the Universitat Politècnica de València.
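To make the idea of a quality (score-based) model concrete, here is a minimal sketch, assuming a simple linear normalisation of each measure against context-specific worst/best requirements followed by a weighted aggregation; the measure names, thresholds, and weights are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a score-based quality model. All requirements
# (worst/best bounds) and weights below are illustrative assumptions.
def score(value, worst, best):
    # Map a raw measure onto [0, 1]; handles both "higher is better"
    # (best > worst) and "lower is better" (best < worst) measures.
    s = (value - worst) / (best - worst)
    return min(max(s, 0.0), 1.0)

# Requirements made explicit at measure definition time.
measures = {
    "throughput_ops": score(850, worst=0, best=1000),        # performance
    "availability": score(0.995, worst=0.9, best=0.9999),    # dependability
    "recovery_time_s": score(12, worst=60, best=0),          # lower is better
}
# Weights encode the relative intensity of each measure's contribution.
weights = {"throughput_ops": 0.3, "availability": 0.5, "recovery_time_s": 0.2}
overall = sum(weights[m] * s for m, s in measures.items())  # a score in [0, 1]
```

At analysis time, the single aggregated score gives the straightforward interpretation industrial benchmarks favour, while the per-measure scores retain the richer detail academia prefers.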

    A strategic initiative to facilitate knowledge translation research in rehabilitation

    While there is a growing body of literature supporting clinical decision-making for rehabilitation professionals, suboptimal use of evidence-based practices in the field persists. A strategic initiative that ensures the relevance of research and its implementation in the context of rehabilitation could (1) help improve the coordination of knowledge translation (KT) research and (2) enhance the delivery of evidence-based rehabilitation services offered to patients with physical disabilities. This paper describes the process and methods used to develop a KT strategic initiative aimed at building capacity and coordinating KT research in physical rehabilitation, along with its strategic plan; it also reports on the initial applications of the strategic plan's implementation.