56 research outputs found

    Investigating Data Contamination for Pre-training Language Models

    Language models pre-trained on web-scale corpora demonstrate impressive capabilities on diverse downstream tasks. However, there is increasing concern whether such capabilities might arise from evaluation datasets being included in the pre-training corpus (a phenomenon known as data contamination) in a manner that artificially increases performance. There has been little understanding of how this potential contamination might influence LMs' performance on downstream tasks. In this paper, we explore the impact of data contamination at the pre-training stage by pre-training a series of GPT-2 models from scratch. We highlight the effect of both text contamination (i.e., input text of the evaluation samples) and ground-truth contamination (i.e., the prompts asked on the input and the desired outputs) from evaluation data. We also investigate the effects of repeating contamination for various downstream tasks. Additionally, we examine the prevailing n-gram-based definitions of contamination within current LLM reports, pinpointing their limitations and inadequacy. Our findings offer new insights into data contamination's effects on language model capabilities and underscore the need for independent, comprehensive contamination assessments in LLM studies. Comment: 16 pages, 5 figures

    The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions

    Recent progress in Large Language Models (LLMs) has produced models that exhibit remarkable performance across a variety of NLP tasks. However, it remains unclear whether the existing focus of NLP research accurately captures the genuine requirements of human users. This paper provides a comprehensive analysis of the divergence between current NLP research and the needs of real-world NLP applications via a large-scale collection of user-GPT conversations. We analyze a large-scale collection of real user queries to GPT. We compare these queries against existing NLP benchmark tasks and identify a significant gap between the tasks that users frequently request from LLMs and the tasks that are commonly studied in academic research. For example, we find that tasks such as "design" and "planning" are prevalent in user interactions but are largely neglected or different from traditional NLP benchmarks. We investigate these overlooked tasks, dissect the practical challenges they pose, and provide insights toward a roadmap to make LLMs better aligned with user needs. Comment: EMNLP 202

    Using Ai-Generated Suggestions From ChatGPT to Optimize Clinical Decision Support

    OBJECTIVE: To determine if ChatGPT can generate useful suggestions for improving clinical decision support (CDS) logic and to assess noninferiority compared to human-generated suggestions. METHODS: We supplied summaries of CDS logic to ChatGPT, an artificial intelligence (AI) tool for question answering that uses a large language model, and asked it to generate suggestions. We asked human clinician reviewers to review the AI-generated suggestions as well as human-generated suggestions for improving the same CDS alerts, and rate the suggestions for their usefulness, acceptance, relevance, understanding, workflow, bias, inversion, and redundancy. RESULTS: Five clinicians analyzed 36 AI-generated suggestions and 29 human-generated suggestions for 7 alerts. Of the 20 suggestions that scored highest in the survey, 9 were generated by ChatGPT. The suggestions generated by AI were found to offer unique perspectives and were evaluated as highly understandable and relevant, with moderate usefulness and low acceptance, bias, inversion, and redundancy. CONCLUSION: AI-generated suggestions could be an important complementary part of optimizing CDS alerts, can identify potential improvements to alert logic and support their implementation, and may even be able to assist experts in formulating their own suggestions for CDS improvement. ChatGPT shows great potential for using large language models and reinforcement learning from human feedback to improve CDS alert logic and potentially other medical areas involving complex, clinical logic, a key step in the development of an advanced learning health system.

    Leveraging Explainable Artificial Intelligence to Optimize Clinical Decision Support

    OBJECTIVE: To develop and evaluate a data-driven process to generate suggestions for improving alert criteria using explainable artificial intelligence (XAI) approaches. METHODS: We extracted data on alerts generated from January 1, 2019 to December 31, 2020, at Vanderbilt University Medical Center. We developed machine learning models to predict user responses to alerts. We applied XAI techniques to generate global explanations and local explanations. We evaluated the generated suggestions by comparing with the alert's historical change logs and stakeholder interviews. Suggestions that either matched (or partially matched) changes already made to the alert or were considered clinically correct were classified as helpful. RESULTS: The final dataset included 2 991 823 firings with 2689 features. Among the 5 machine learning models, the LightGBM model achieved the highest Area under the ROC Curve: 0.919 [0.918, 0.920]. We identified 96 helpful suggestions. A total of 278 807 firings (9.3%) could have been eliminated. Some of the suggestions also revealed workflow and education issues. CONCLUSION: We developed a data-driven process to generate suggestions for improving alert criteria using XAI techniques. Our approach could identify improvements regarding clinical decision support (CDS) that might be overlooked or delayed in manual reviews. It also unveils a secondary purpose for the XAI: to improve quality by discovering scenarios where CDS alerts are not accepted due to workflow, education, or staffing issues.
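The 9.3% figure in the abstract above can be checked against the two firing counts it reports. This is only an arithmetic sketch; the variable names are illustrative and not taken from the study.

```python
# Counts as reported in the abstract; names are hypothetical.
total_firings = 2_991_823
eliminable_firings = 278_807

# Share of all alert firings that the helpful suggestions could have removed.
eliminable_share = eliminable_firings / total_firings

print(f"{eliminable_share:.1%}")
```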

    Text Dialogue Analysis for Primary Screening of Mild Cognitive Impairment: Development and Validation Study

    Background: Artificial intelligence models tailored to diagnose cognitive impairment have shown excellent results. However, it is unclear whether large linguistic models can rival specialized models by text alone. Objective: In this study, we explored the performance of ChatGPT for primary screening of mild cognitive impairment (MCI) and standardized the design steps and components of the prompts. Methods: We gathered a total of 174 participants from the DementiaBank screening and classified 70% of them into the training set and 30% of them into the test set. Only text dialogues were kept. Sentences were cleaned using a macro code, followed by a manual check. The prompt consisted of 5 main parts, including character setting, scoring system setting, indicator setting, output setting, and explanatory information setting. Three dimensions of variables from published studies were included: vocabulary (ie, word frequency and word ratio, phrase frequency and phrase ratio, and lexical complexity), syntax and grammar (ie, syntactic complexity and grammatical components), and semantics (ie, semantic density and semantic coherence). We used R 4.3.0 for the analysis of variables and diagnostic indicators. Results: Three additional indicators related to the severity of MCI were incorporated into the final prompt for the model. These indicators were effective in discriminating between MCI and cognitively normal participants: tip-of-the-tongue phenomenon (P<.001), difficulty with complex ideas (P<.001), and memory issues (P<.001). The final GPT-4 model achieved a sensitivity of 0.8636, a specificity of 0.9487, and an area under the curve of 0.9062 on the training set; on the test set, the sensitivity, specificity, and area under the curve reached 0.7727, 0.8333, and 0.8030, respectively. Conclusions: ChatGPT was effective in the primary screening of participants with possible MCI. Improved standardization of prompts by clinicians would also improve the performance of the model. It is important to note that ChatGPT is not a substitute for a clinician making a diagnosis.
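The reported areas under the curve in the abstract above match the mean of sensitivity and specificity. That mean is what the ROC area reduces to when a classifier produces a single binary decision instead of a graded score. Whether the study computed it this way is an assumption; the sketch below only checks that the published numbers are consistent with it, and the function name is illustrative.

```python
def single_threshold_auc(sensitivity: float, specificity: float) -> float:
    """ROC area for one operating point: the trapezoid through
    (0, 0), (1 - specificity, sensitivity) and (1, 1)."""
    return (sensitivity + specificity) / 2

# Values as reported in the abstract.
train_auc = single_threshold_auc(0.8636, 0.9487)
test_auc = single_threshold_auc(0.7727, 0.8333)

print(f"train: {train_auc:.4f}, test: {test_auc:.4f}")
```

The training value agrees with the reported 0.9062 only up to rounding of the published sensitivity and specificity.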

    Field study on human thermal comfort and indoor air quality in university dormitory buildings

    Field studies on the environmental conditions and occupant thermal comfort were carried out in air-conditioned and non-air-conditioned buildings in Xi’an, China. The present study aimed to explore the effect of indoor thermal history on the thermal adaptation and indoor air quality of occupants. Based on a field study, 550 and 580 data sets were obtained in naturally ventilated (NV) and split air-conditioned (SAC) dormitory buildings, respectively. The physical environment parameters and subjective responses were explored. Most of the environments in NV mode were warmer than the current standard upper limit (28 °C). The neutral temperature of the NV group was 26.7 °C, 1.5 °C higher than that of the SAC group (24.6 °C). The upper limit of the 80% acceptable temperature range was 29.2 °C for the NV group, 1.7 °C higher than that of the SAC group (27.5 °C). Compared to the SAC group, the warm indoor thermal history of the NV group produced a shift to a higher neutral temperature and a higher acceptable temperature. Differences were found in the indoor environment quality and in the occupants’ subjective satisfaction between the two groups. Compared to the PMV model, the adaptive model was more applicable to split air-conditioned buildings.

    Risk factors for mortality among lung cancer patients with covid-19 infection: A systematic review and meta-analysis.

    Background: Lung cancer patients with coronavirus disease 2019 (COVID-19) infection experience high mortality rates. The study aims to determine the risk factors for mortality in lung cancer patients with COVID-19 infection. Materials and methods: Following the PRISMA reporting guidelines, PubMed, Embase, and Web of Science were systematically searched to February 20, 2023, for studies of lung cancer patients with COVID-19 infection. The main outcome of interest was the risk factor for mortality. We also compared the mortality rate of those patients among different continents. A pooled risk ratio (RR) with 95% CI was presented as the result of this meta-analysis. Results: Meta-analysis of 33 studies involving 5018 patients showed that the pooled mortality rate of lung cancer patients with COVID-19 was 0.31 (95% CI: 0.25-0.36). Subgroup analysis based on the continents showed that a significant difference in the mortality rate was observed between Asia and the rest of the world (χ2 = 98.96, P […] Conclusions: Findings of this meta-analysis confirm an increased risk of mortality in lung cancer patients with COVID-19 infection; this risk appears to be exacerbated by older age, advanced-stage lung cancer, and comorbidities such as hypertension and cardiovascular disease.

    Intelligent Metaverse Scene Content Construction

    The integration of artificial intelligence (AI) and virtual reality (VR) has revolutionized research across various scientific fields, with AI-driven VR simulations finding applications in education, healthcare, and entertainment. However, existing literature lacks a comprehensive investigation that systematically summarizes the fundamental characteristics and development trajectory of AI-generated visual content in the metaverse. This survey focuses on intelligent metaverse scene content construction, aiming to address this gap by exploring the application of AI in content generation. It investigates scene content generation, simulation biology, personalized content, and intelligent agents. Analyzing the current state and identifying common features, this survey provides a detailed description of methods for constructing intelligent metaverse scenes. The primary contribution is a comprehensive analysis of the current landscape of intelligent visual content production in the metaverse, highlighting emerging trends. The discussion on methods for constructing intelligent scene content in the metaverse suggests that in the era of intelligence, it has the potential to become the dominant approach for content creation in metaverse scenes

    Carbon dioxide generation rates and subjects’ perception of air quality of office activities under various ambient temperatures

    Indoor carbon dioxide (CO2) concentration is an important parameter that has been used to characterize and design indoor air quality and building ventilation. In indoor spaces, the primary source of CO2 is occupants, and the generation rate is always related to the intensity of occupants’ activities. However, the CO2 generation rates required by many applications are currently calculated from metabolic rates using equations given in the ASHRAE Handbook, which are based on averages for adults from Europe and North America that are several decades old. In addition, ambient temperature may also affect CO2 generation rates by affecting human metabolic reactions, but it has not been considered. There has been little systematic experimental determination of human CO2 generation rates at different activity levels and various ambient temperatures. This study experimentally determines the CO2 generation rates of Chinese office workers using 28 college students (14 women and 14 men) aged 20~30, who conducted office tasks (sitting and typing, standing and typing, walking at 1 km/h, and walking at 2 km/h) at 20, 23, 26, and 29 ℃. CO2 generation rates increase significantly as activity levels increase, and increase slightly with increasing ambient temperature. As activity intensity increases, the gender and temperature differences also grow.

    Analysis of adult disease characteristics and mortality on MIMIC-III.

    PURPOSE: To deeply analyze the basic information and disease information of adult patients in the MIMIC-III (Medical Information Mart for Intensive Care III) database, and provide data reference for clinicians and researchers. MATERIALS AND METHODS: Tableau 2019.1.0 and Navicat 12.0.29 were used for data analysis and extraction of disease distribution of adult patients in the MIMIC-III database. RESULT: A total of 38,163 adult patients were included in the MIMIC-III database. Only 38,156 patients with the first diagnosis were selected. Among them, 21,598 were males, accounting for 56.6%; the median age was 66 years (Q1-Q3: 53-78), the median length of a hospital stay was 7 days (Q1-Q3: 4-12), and the median length of an ICU stay was 2.1 days (Q1-Q3: 1.2-4.1). Septicemia was the disease with the highest mortality rate among patients and the total mortality rate was 48.9%. The disease with the largest number of patients at the last time was other forms of chronic ischemic heart disease. CONCLUSION: Analyzing the patients' basic information, the admission spectrum, and the disease morbidity and mortality can help more researchers understand the MIMIC-III database and facilitate further research.