
    Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology

    The potential of large language models in medicine for education and decision-making purposes has been demonstrated by their decent scores on medical exams such as the United States Medical Licensing Exam (USMLE) and the MedQA benchmark. In this work, we evaluate the performance of ChatGPT-4 in the specialized field of radiation oncology using the 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal gray zone cases. On the TXIT exam, ChatGPT-3.5 and ChatGPT-4 achieved scores of 63.65% and 74.57%, respectively, highlighting the advantage of the newer ChatGPT-4 model. Based on the TXIT exam, ChatGPT-4's strong and weak areas in radiation oncology can be identified to some extent. Specifically, ChatGPT-4 demonstrates good knowledge of statistics, CNS & eye, pediatrics, biology, and physics, but has limitations in bone & soft tissue and gynecology, per the ACR knowledge domains. Regarding clinical care paths, ChatGPT-4 performs well on diagnosis, prognosis, and toxicity, but lacks proficiency in topics related to brachytherapy and dosimetry, as well as in-depth questions drawn from clinical trials. For the gray zone cases, ChatGPT-4 is able to suggest a personalized treatment approach for each case with high correctness and comprehensiveness. Most importantly, it proposes novel treatment aspects for many cases that were not suggested by any of the human experts. Both evaluations demonstrate the potential of ChatGPT-4 in medical education for the general public and for cancer patients, as well as its potential to aid clinical decision-making, while acknowledging its limitations in certain domains. Because of the risk of hallucination, facts provided by ChatGPT always need to be verified.

    Evaluating AI-Generated informed consent documents in oral surgery: A comparative study of ChatGPT-4, Bard gemini advanced, and human-written consents

    This study evaluates the quality and readability of informed consent documents generated by the AI platforms ChatGPT-4 and Bard Gemini Advanced compared with those written by a first-year oral surgery resident for common oral surgery procedures. The evaluation, conducted by 18 experienced oral and maxillofacial surgeons, assessed the consents for accuracy, completeness, readability, and overall quality. ChatGPT-4 consistently outperformed both Bard and the human-written consents. ChatGPT-4 consents had a median accuracy score of 4 [IQR 4–4], compared with Bard's 3 [IQR 3–4] and the human's 4 [IQR 3–4]. Completeness scores were higher for ChatGPT-4 (4 [IQR 4–5]) than for Bard (3 [IQR 3–4]) and the human (4 [IQR 3–4]). Readability was also superior for ChatGPT-4, with a median score of 4 [IQR 4–5], compared with 4 [IQR 4–4] for Bard and 4 [IQR 3–4] for the human consents. The Gunning Fog Index for ChatGPT-4 was 17.2 [IQR 16.5–18.2], better (lower) than Bard's 23.1 [IQR 20.5–24.7] and the human consents' 20 [IQR 19.2–20.9]. Overall, ChatGPT-4's consents received the highest quality ratings, underscoring AI's potential to enhance patient communication and the informed consent process. The study suggests AI can reduce misinformation risks and improve patient understanding, but continuous evaluation, oversight, and the integration of patient feedback are crucial to ensure the effectiveness and appropriateness of AI-generated content in clinical practice.
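
    For context, the Gunning Fog Index estimates the years of formal education a reader needs to understand a text on first reading, so lower scores mean more readable consents. Below is a minimal Python sketch of the standard formula; the word, sentence, and complex-word counts are supplied by the caller, and the example figures are illustrative, not counts taken from the study.

    def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
        # Gunning Fog Index: 0.4 * (average sentence length +
        # percentage of words with three or more syllables).
        return 0.4 * (words / sentences + 100 * complex_words / words)

    # Illustrative text with 25-word sentences and 18% polysyllabic
    # words lands exactly at ChatGPT-4's median score of 17.2.
    print(gunning_fog(words=1000, sentences=40, complex_words=180))  # 17.2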

    ChatGPT-4 as a Tool for Reviewing Academic Books in Spanish

    This study evaluates the potential of ChatGPT-4, an artificial intelligence language model developed by OpenAI, as an editing tool for Spanish literary and academic books. The need for efficient and accessible reviewing and editing processes in the publishing industry has driven the search for automated solutions. ChatGPT-4, being one of the most advanced language models, offers notable capabilities in text comprehension and generation. In this study, the features and capabilities of ChatGPT-4 are analyzed in terms of grammatical correction, stylistic coherence, and linguistic enrichment of texts in Spanish. Tests were conducted with 100 literary and academic texts, in which the edits made by ChatGPT-4 were compared to those made by expert human reviewers and editors. The results show that while ChatGPT-4 is capable of making grammatical and orthographic corrections with high accuracy and in very little time, it still faces challenges in areas such as context sensitivity, bibliometric analysis, deep contextual understanding, and interaction with visual content such as graphs and tables. However, collaboration between ChatGPT-4 and human reviewers and editors can be a promising strategy for improving efficiency without compromising quality. Furthermore, the authors consider ChatGPT-4 a valuable tool in the editing process, but its use should complement the work of human editors to ensure high-caliber editing of Spanish literary and academic books.
    Comment: Preprint. Paper accepted at the 18th Latin American Conference on Learning Technologies (LACLO 2023), 14 pages.
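
    As a rough illustration of the kind of workflow the study describes, the sketch below sends a Spanish passage to a chat model and asks for grammatical and orthographic corrections only. The model identifier, prompt wording, and sample sentence are assumptions for illustration, not details from the paper; it assumes the openai Python package.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Sample text with a spelling error and an agreement error (assumed).
    texto = "Los resultados de la investigacion fueron publicado en 2023."

    respuesta = client.chat.completions.create(
        model="gpt-4",  # assumed model identifier
        messages=[
            {"role": "system",
             "content": "Eres un corrector de estilo. Corrige solo la "
                        "gramática y la ortografía; no cambies el contenido "
                        "ni el estilo del texto."},
            {"role": "user", "content": texto},
        ],
    )
    print(respuesta.choices[0].message.content)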

    Utilizing the Large Language Model ChatGPT-4 for Drone Control: A Case Study of the DJI Tello

    This study explores the potential of using ChatGPT-4, a Large Language Model (LLM) from OpenAI, for creating flight trajectories for drones. The primary focus is the DJI Tello drone, which is controlled through Python code that executes various flight commands. Unlike traditional approaches that involve programming directly in Python, this research uses ChatGPT-4 to automatically generate Python programs capable of instructing the drone. The results indicate ChatGPT-4's intriguing capability to produce the code necessary to fly the drone according to given commands. This suggests that LLMs like ChatGPT-4 can be used to specify drone flight trajectories in natural language, which is easier than working in traditional programming languages.
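
    For illustration, here is a minimal sketch of the kind of Python program such a natural-language prompt might yield. It assumes the open-source djitellopy library, which may differ from the tooling used in the study, and the square flight path is an invented example, not one from the paper.

    from djitellopy import Tello

    # Connect to the Tello over its Wi-Fi access point.
    tello = Tello()
    tello.connect()

    tello.takeoff()
    # Fly a 50 cm square: move forward, then turn 90 degrees, four times.
    for _ in range(4):
        tello.move_forward(50)      # distance in centimeters (20-500)
        tello.rotate_clockwise(90)  # angle in degrees
    tello.land()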

    Using ChatGPT and Other AI Engines to Vocalize Medieval Hebrew

    Hebrew is usually written without vowel points, making it challenging for some readers to decipher. This is especially true of medieval Hebrew, which can have nonstandard grammar and orthography. This paper tested four artificial intelligence (AI) tools by asking them to add vowel points to an unpublished medieval Hebrew translation of the Lord’s Prayer. The vocalization tools tested were OpenAI’s ChatGPT-3.5 and ChatGPT-4, Pellaworks’ DoItInHebrew, and Dicta’s Nakdan. ChatGPT-3.5 freely changed the text, even rewriting some phrases and adding an entire sentence, and it provided erroneous vowels in its rewritten Hebrew text. ChatGPT-4 did a moderately good job with only a few errors, but it also modified the orthography. One of ChatGPT-4’s errors was not trivial, resulting in the invention of a word. When challenged, ChatGPT-4 corrected this confabulation by inventing another word, which it claimed was a “rare form” and for which it provided a fictitious derivation. When challenged on this second made-up word, ChatGPT-4 replaced the word from the input text with a word based on an entirely different root. DoItInHebrew inserted vowels that produced a gibberish text. In contrast, Dicta’s Nakdan provided near-perfect vocalization with only one genuine error, although, like ChatGPT-4, it modified the orthography. ChatGPT-3.5, ChatGPT-4, and DoItInHebrew exhibited serious “hallucinations,” of both the “factual” and the “untruthful” varieties typical of other AIs, making them counterproductive for vocalizing historic Hebrew texts. Nakdan can be a powerful tool but still requires someone with expertise in Hebrew grammar to verify and correct the vocalization. Nakdan’s interface simplified correcting the vocalization, although it required its user to have advanced knowledge of Hebrew.

    Beyond the Hype—The Actual Role and Risks of AI in Today’s Medical Practice: Comparative-Approach Study

    Background: The evolution of artificial intelligence (AI) has significantly impacted various sectors, with health care witnessing some of its most groundbreaking contributions. Contemporary models, such as ChatGPT-4 and Microsoft Bing, have showcased capabilities beyond merely generating text, aiding in complex tasks like literature searches and refining web-based queries. Objective: This study explores a compelling query: can AI author an academic paper independently? Our assessment focuses on four core dimensions: relevance (to ensure that the AI’s response directly addresses the prompt), accuracy (to ascertain that the AI’s information is both factually correct and current), clarity (to examine the AI’s ability to present coherent and logical ideas), and tone and style (to evaluate whether the AI can align with the formality expected in academic writing). Additionally, we consider the ethical implications and practicality of integrating AI into academic writing. Methods: To assess the capabilities of ChatGPT-4 and Microsoft Bing in the context of academic paper assistance in general practice, we used a systematic approach. ChatGPT-4, an advanced AI language model by OpenAI, excels at generating human-like text and adapting responses based on user interactions, though it has a knowledge cutoff of September 2021. Microsoft Bing’s AI chatbot facilitates user navigation on the Bing search engine, offering tailored searches. Results: In terms of relevance, ChatGPT-4 delved deeply into AI’s health care role, citing academic sources and discussing diverse applications and concerns, while Microsoft Bing provided a concise, less detailed overview. In terms of accuracy, ChatGPT-4 correctly cited 72% (23/32) of its peer-reviewed articles but included some nonexistent references. Microsoft Bing’s accuracy stood at 46% (6/13), supplemented by relevant non-peer-reviewed articles. In terms of clarity, both models conveyed clear, coherent text; ChatGPT-4 was particularly adept at detailing technical concepts, while Microsoft Bing was more general. In terms of tone, both models maintained an academic tone, but ChatGPT-4 exhibited superior depth and breadth in content delivery. Conclusions: Comparing ChatGPT-4 and Microsoft Bing for academic assistance revealed strengths and limitations. ChatGPT-4 excels in depth and relevance but falters in citation accuracy. Microsoft Bing is concise but lacks robust detail. Though both models have potential, neither can independently handle comprehensive academic tasks. As AI evolves, combining ChatGPT-4’s depth with Microsoft Bing’s up-to-date referencing could optimize academic support. Researchers should critically assess AI outputs to maintain academic credibility.

    Evaluating ChatGPT-4’s historical accuracy: a case study on the origins of SWOT analysis

    In this study we test ChatGPT-4’s ability to provide accurate information about the origins and evolution of SWOT analysis, perhaps the most widely used strategy tool in practice worldwide. ChatGPT-4 is tested for historical accuracy and for hallucinations. The API is prompted from a Python script with a series of structured questions read from an Excel file; the responses are recorded in another Excel file and rated on a binary scale. Our findings present a nuanced view of ChatGPT-4’s capabilities. We observe that while ChatGPT-4 demonstrates a high level of proficiency in describing and outlining the general concept of SWOT analysis, there are notable discrepancies when it comes to detailing its origins and evolution. These inaccuracies range from minor factual errors to more serious hallucinations that deviate from the evidence in scholarly publications. However, we also find that ChatGPT-4 spontaneously produces historically accurate facts. Our interpretation of these results is that ChatGPT-4 has largely been trained on easily available websites and only to a very limited extent on scholarly publications about SWOT analysis, especially those behind paywalls. We conclude with four propositions for future research.
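
    The prompting pipeline the authors describe can be sketched in a few lines of Python. In the sketch below, the file names, column name, and model identifier are assumptions for illustration, not details from the paper; it assumes the pandas and openai packages.

    import pandas as pd
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Load the structured questions, one per row (column name assumed).
    questions = pd.read_excel("swot_questions.xlsx")

    answers = []
    for q in questions["question"]:
        response = client.chat.completions.create(
            model="gpt-4",  # assumed model identifier
            messages=[{"role": "user", "content": q}],
        )
        answers.append(response.choices[0].message.content)

    # Record the responses for manual binary rating (1 = accurate, 0 = not).
    questions["answer"] = answers
    questions.to_excel("swot_answers.xlsx", index=False)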

    A Structured Framework for AutoML: Integrating LLMs through Comparative Experiments

    Dissertation presented as a partial requirement for obtaining a Master’s degree in Information Management, specialization in Knowledge Management and Business Intelligence. This work explores the potential constructive interaction between Generative AI, specifically ChatGPT-4, and Automated Machine Learning (AutoML) frameworks. The study focuses on leveraging ChatGPT-4’s capabilities within the phases of the CRISP-DM (Cross-Industry Standard Process for Data Mining) model to improve the efficiency and effectiveness of data-driven tasks. Through a series of experiments involving classification, regression, and clustering, the research compares the performance of ChatGPT-4 in two settings: a global user perspective with general prompts, and a structured approach aligned with the CRISP-DM methodology, providing a comparative benchmark. The findings demonstrate that aligning ChatGPT-4’s tasks with the CRISP-DM phases yields better performance and more comprehensive insights than the general prompt approach. The study highlights the importance of prompt engineering in optimizing ChatGPT-4’s contributions to AutoML tasks, emphasizing its role in improving data preparation, model selection, and evaluation processes. Additionally, the research underscores the ethical considerations and potential challenges associated with integrating generative AI into AutoML, particularly concerning data quality, bias, and model interpretability.
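
    As a rough illustration of the structured approach the dissertation compares against general prompting, the sketch below issues one focused prompt per CRISP-DM phase instead of a single catch-all request. The phase prompts, the churn-prediction scenario, and the model identifier are assumptions for illustration, not material from the dissertation; it assumes the openai package.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # One focused prompt per CRISP-DM phase, rather than a single general
    # prompt such as "build me a classifier for this dataset".
    phases = {
        "Business Understanding": "State the business objective of predicting customer churn.",
        "Data Understanding": "Describe checks to profile a churn dataset with 20 features.",
        "Data Preparation": "Propose cleaning and encoding steps for mixed-type churn data.",
        "Modeling": "Recommend candidate classification models and key hyperparameters.",
        "Evaluation": "Suggest metrics and a validation strategy for an imbalanced target.",
        "Deployment": "Outline how to monitor the deployed churn model for drift.",
    }

    for phase, prompt in phases.items():
        reply = client.chat.completions.create(
            model="gpt-4",  # assumed model identifier
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"--- {phase} ---")
        print(reply.choices[0].message.content)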

    What Did I Miss? A Demonstration of the Differences Between ChatGPT-4 and 3.5 that Impact Legal Research and Writing

    Many news sources are raving about how much more advanced ChatGPT-4 is than ChatGPT-3.5. You may have heard that ChatGPT-4 outscored 90% of test takers on the Uniform Bar Exam, while ChatGPT-3.5 outscored only 10% of test takers. But what does this mean for teaching legal research and writing? In this presentation, we compare specific examples from ChatGPT-3.5 (the free version many of us tried in the spring) and ChatGPT-4 (the paid version released in March).