Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
Vision-language models, while effective in general domains and showing strong
performance in diverse multi-modal applications like visual question-answering
(VQA), struggle to maintain the same level of effectiveness in more specialized
domains, e.g., medical. We propose a medical vision-language model that
integrates large vision and language models adapted for the medical domain.
This model goes through three stages of parameter-efficient training using
three separate biomedical and radiology multi-modal visual and text datasets.
The proposed model achieves state-of-the-art performance on the SLAKE 1.0
medical VQA (MedVQA) dataset with an overall accuracy of 87.5% and demonstrates
strong performance on another MedVQA dataset, VQA-RAD, achieving an overall
accuracy of 73.2%.
Comment: Clinical NLP @ NAACL 202
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
With the surge of ChatGPT, the use of large models has increased significantly,
rapidly rising to prominence across industry and sweeping across the internet.
This article is a comprehensive review of fine-tuning methods for large models.
It investigates the latest technological advancements and the application of
advanced methods in aspects such as task-adaptive fine-tuning, domain-adaptive
fine-tuning, few-shot learning, knowledge distillation, multi-task learning,
parameter-efficient fine-tuning, and dynamic fine-tuning.
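Parameter-efficient fine-tuning, one of the surveyed families of methods, can be illustrated with a minimal LoRA-style sketch. The dimensions, scaling, and function names below are illustrative assumptions, not details taken from the review:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass through a frozen weight matrix W plus a trainable
    low-rank update B @ A (rank r << min(d_in, d_out)); during training,
    only A and B would receive gradients."""
    r = A.shape[0]                      # adapter rank
    scale = alpha / r                   # common LoRA-style scaling
    return x @ (W + scale * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-init

x = rng.normal(size=(8, d_in))
y = lora_forward(x, W, A, B)
# With B zero-initialized, the adapted layer initially matches the
# frozen layer exactly, so fine-tuning starts from the pretrained model.
```

The adapter trains only r * (d_in + d_out) parameters per layer instead of d_in * d_out, which is what makes the approach parameter-efficient.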
Parallel and Scalable Hyperparameter Optimization for Distributed Deep Learning Methods on High-Performance Computing Systems
The design of Deep Learning (DL) models is a complex task, involving decisions
on the general architecture of the model (e.g., the number of layers of the Neural
Network (NN)) and on the optimization algorithms (e.g., the learning rate). These
so-called hyperparameters significantly influence the performance (e.g., accuracy or
error rates) of the final DL model and are, therefore, of great importance. However,
optimizing these hyperparameters is a computationally intensive process due to the
necessity of evaluating many combinations to identify the best-performing ones. Often,
this optimization is performed manually.
This Ph.D. thesis leverages the power of High-Performance Computing (HPC) systems
to perform automatic and efficient Hyperparameter Optimization (HPO) for DL models
that are trained on large quantities of scientific data. On modern HPC systems, equipped
with a high number of Graphics Processing Units (GPUs), it becomes possible not
only to evaluate multiple models with different hyperparameter combinations in parallel but
also to distribute the training of the models themselves across multiple GPUs. State-of-the-art HPO methods, based on the concept of early stopping, have demonstrated significant
reductions in the runtime of the HPO process. Their performance at scale, particularly
in the context of HPC environments and when applied to large scientific datasets, has
remained unexplored. This thesis therefore researches parallel and scalable HPO methods
that leverage the inherent capabilities of HPC systems and innovative workflows
incorporating novel computing paradigms. The developed HPO methods are validated
on different scientific datasets ranging from the Computational Fluid Dynamics (CFD)
domain to the Remote Sensing (RS) domain, spanning several hundred Gigabytes (GBs) to several
Terabytes (TBs) in size.
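The early-stopping HPO methods the thesis builds on can be sketched with a minimal successive-halving loop. The configuration space, budget schedule, and toy objective below are illustrative, not the thesis's actual workloads:

```python
def successive_halving(configs, train_step, rungs=3, eta=2):
    """Evaluate all configs on a small budget, keep the best 1/eta at
    each rung, and multiply the budget by eta for the survivors, so
    expensive full training is spent only on promising configs."""
    budget = 1
    survivors = list(configs)
    for _ in range(rungs):
        scores = [(train_step(c, budget), c) for c in survivors]
        scores.sort(key=lambda s: s[0])          # lower loss is better
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scores[:keep]]
        budget *= eta
    return survivors[0]

# Toy objective: loss shrinks with budget toward a config-dependent floor,
# mimicking a learning curve; the best learning rate here is 0.1.
def train_step(lr, budget):
    return abs(lr - 0.1) + 1.0 / budget

best = successive_halving([0.001, 0.01, 0.1, 0.5], train_step)
```

On an HPC system, the per-rung evaluations are independent and could be dispatched to separate GPU nodes, which is the parallelism the thesis exploits.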
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Transformer-based Large Language Models (LLMs) have been applied in diverse
areas such as knowledge bases, human interfaces, and dynamic agents,
marking a stride towards achieving Artificial General Intelligence (AGI).
However, current LLMs are predominantly pretrained on short text snippets,
which compromises their effectiveness in processing the long-context prompts
that are frequently encountered in practical scenarios. This article offers a
comprehensive survey of recent advances in Transformer-based LLM
architectures aimed at enhancing the long-context capabilities of LLMs
throughout the entire model lifecycle, from pre-training through to inference.
We first delineate and analyze the problems of handling long-context input and
output with the current Transformer-based models. We then provide a taxonomy
and the landscape of upgrades on Transformer architecture to solve these
problems. Afterwards, we investigate widely used evaluation resources
tailored for long-context LLMs, including datasets, metrics, and
baseline models, as well as optimization toolkits such as libraries,
frameworks, and compilers to boost the efficacy of LLMs across different stages
in runtime. Finally, we discuss the challenges and potential avenues for future
research. A curated repository of relevant literature, continuously updated, is
available at https://github.com/Strivin0311/long-llms-learning.
Comment: 40 pages, 3 figures, 4 tables
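One family of architectural upgrades surveyed for long-context LLMs restricts each token's attention to a local window. A minimal sketch of such a mask (purely illustrative, not a specific method from the survey):

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Boolean causal mask where token i may attend only to tokens j
    with i - window < j <= i (local causal attention)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(6, 3)
# Each row has at most `window` True entries, so attention cost per
# token is O(window) rather than O(seq_len), which is what makes
# window-based variants attractive for long inputs.
```

Full dense attention would correspond to `window = seq_len`, recovering the quadratic cost that long-context methods try to avoid.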
Foundation Models for Natural Language Processing
This open access book provides a comprehensive overview of the state of the art in research and applications of Foundation Models and is intended for readers familiar with basic Natural Language Processing (NLP) concepts. In recent years, a revolutionary new paradigm has been developed for training models for NLP. These models are first pre-trained on large collections of text documents to acquire general syntactic knowledge and semantic information. Then, they are fine-tuned for specific tasks, which they can often solve with superhuman accuracy. When the models are large enough, they can be instructed by prompts to solve new tasks without any fine-tuning. Moreover, they can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning. Because they provide a blueprint for solving many tasks in artificial intelligence, they have been called Foundation Models. After a brief introduction to basic NLP models, the main pre-trained language models BERT, GPT, and the sequence-to-sequence transformer are described, as well as the concepts of self-attention and context-sensitive embedding. Then, different approaches to improving these models are discussed, such as expanding the pre-training criteria, increasing the length of input texts, or including extra knowledge. An overview of the best-performing models for about twenty application areas is then presented, e.g., question answering, translation, story generation, dialog systems, and generating images from text. For each application area, the strengths and weaknesses of current models are discussed, and an outlook on further developments is given. In addition, links are provided to freely available program code. A concluding chapter summarizes the economic opportunities, mitigation of risks, and potential developments of AI.
Beyond Extractive: Advancing Abstractive Automatic Text Summarization in Norwegian with Transformers
Automatic summarization is a key area in natural language processing (NLP) and machine learning which attempts to generate informative summaries of articles and documents. Despite its evolution since the 1950s, research on automatically summarizing Norwegian text has remained relatively underdeveloped. Though some strides have been made in extractive systems, which generate summaries by selecting and condensing key phrases directly from the source material, the field of abstractive summarization remains unexplored for the Norwegian language. Abstractive summarization is distinct in that it generates summaries incorporating new words and phrases not present in the original text.
This Master's thesis revolves around one key question: Is it possible to create a machine learning system capable of performing abstractive summarization in Norwegian? To answer this question, we generate and release the first two Norwegian datasets for creating and evaluating Norwegian summarization models. One of these datasets is a web scrape of Store Norske Leksikon (SNL), and the other is a machine-translated version of CNN/Daily Mail. Using these datasets, we fine-tune two Norwegian T5 language models with 580M and 1.2B parameters to create summaries. To assess the quality of the models, we employ both automatic ROUGE scores and human evaluations of the generated summaries. In an effort to better understand the models' behaviour, we measure how a model generates summaries with various metrics, including our own novel contribution, which we name "Match Ratio" and which measures sentence similarities between summaries and articles based on Levenshtein distances.
The top-performing models achieved ROUGE-1 scores of 35.07 and 34.02 on SNL and CNN/DM, respectively. In terms of human evaluation, the best model yielded an average score of 3.96/5.00 for SNL and 4.64/5.00 for CNN/Daily Mail across various criteria. Based on these results, we conclude that it is possible to perform abstractive summarization in Norwegian and produce high-quality summaries. With this research, we have laid a foundation that we hope will facilitate future research, empowering others to build upon our findings and contribute further to the development of Norwegian summarization models.
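The thesis describes "Match Ratio" only as measuring sentence similarities between summaries and articles via Levenshtein distances; one plausible reading of such a metric (the function names and the 0.8 threshold are hypothetical, not the thesis's definition) is:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def match_ratio(summary_sents, article_sents, threshold=0.8):
    """Fraction of summary sentences whose best length-normalized
    similarity to any article sentence exceeds `threshold`; high values
    would indicate near-extractive rather than abstractive behaviour."""
    def sim(a, b):
        m = max(len(a), len(b)) or 1
        return 1.0 - levenshtein(a, b) / m
    hits = sum(any(sim(s, t) >= threshold for t in article_sents)
               for s in summary_sents)
    return hits / max(len(summary_sents), 1)

ratio = match_ratio(["the cat sat"], ["the cat sat.", "dogs bark"])
```

A summary copied verbatim from the article would score 1.0 under this reading, while a fully rephrased one would score near 0.0.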
Efficient Methods for Natural Language Processing: A Survey
Recent work in natural language processing (NLP) has yielded appealing
results from scaling model parameters and training data; however, using only
scale to improve performance means that resource consumption also grows. Such
resources include data, time, storage, or energy, all of which are naturally
limited and unevenly distributed. This motivates research into efficient
methods that require fewer resources to achieve similar results. This survey
synthesizes and relates current methods and findings in efficient NLP. We aim
to provide both guidance for conducting NLP under limited resources, and point
towards promising research directions for developing more efficient methods.
Comment: Accepted at TACL, pre-publication version
Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature Review
This systematic literature review comprehensively examines the application of
Large Language Models (LLMs) in forecasting and anomaly detection, highlighting
the current state of research, inherent challenges, and prospective future
directions. LLMs have demonstrated significant potential in parsing and
analyzing extensive datasets to identify patterns, predict future events, and
detect anomalous behavior across various domains. However, this review
identifies several critical challenges that impede their broader adoption and
effectiveness, including the reliance on vast historical datasets, issues with
generalizability across different contexts, the phenomenon of model
hallucinations, limitations within the models' knowledge boundaries, and the
substantial computational resources required. Through detailed analysis, this
review discusses potential solutions and strategies to overcome these
obstacles, such as integrating multimodal data, advancements in learning
methodologies, and emphasizing model explainability and computational
efficiency. Moreover, this review outlines critical trends that are likely to
shape the evolution of LLMs in these fields, including the push toward
real-time processing, the importance of sustainable modeling practices, and the
value of interdisciplinary collaboration. In conclusion, this review underscores
the transformative impact LLMs could have on forecasting and anomaly detection
while emphasizing the need for continuous innovation, ethical considerations,
and practical solutions to realize their full potential.
A Survey of Large Language Models
Language is essentially a complex, intricate system of human expressions
governed by grammatical rules. Developing capable AI algorithms that can
comprehend and master a language poses a significant challenge. As a major
approach, language modeling has been widely studied for language understanding
and generation in the past two decades, evolving from statistical language
models to neural language models. Recently, pre-trained language models (PLMs)
have been proposed by pre-training Transformer models over large-scale corpora,
showing strong capabilities in solving various NLP tasks. Since researchers
have found that model scaling can lead to performance improvement, they further
study the scaling effect by increasing the model size to an even larger size.
Interestingly, when the parameter scale exceeds a certain level, these enlarged
language models not only achieve a significant performance improvement but also
show some special abilities that are not present in small-scale language
models. To discriminate the difference in parameter scale, the research
community has coined the term large language models (LLM) for the PLMs of
significant size. Recently, the research on LLMs has been largely advanced by
both academia and industry, and a remarkable milestone is the launch of ChatGPT,
which has attracted widespread attention from society. The technical evolution
of LLMs has been making an important impact on the entire AI community and
could revolutionize the way we develop and use AI algorithms. In this
survey, we review the recent advances of LLMs by introducing the background,
key findings, and mainstream techniques. In particular, we focus on four major
aspects of LLMs, namely pre-training, adaptation tuning, utilization, and
capacity evaluation. Besides, we also summarize the available resources for
developing LLMs and discuss the remaining issues for future directions.
Comment: ongoing work; 51 pages
Orchestration Systems to Support Deep Learning at Scale
Deep learning (DL)'s dramatic rise in popularity across the domain sciences and industry has been accompanied by a correspondingly aggressive increase in the scale and computational complexity of DL workloads. In order to adopt state-of-the-art techniques, practitioners must wrestle with systems challenges of performance, cost, and scalability. In this dissertation, we identify the need for orchestration systems, which ease scaling burdens across the DL lifecycle through holistic, workload-aware optimizations. Drawing on both established techniques from data management research and new bespoke algorithms, we build practical orchestration engines to optimize three common DL workloads in the large-scale setting: model selection, data processing, and high-throughput serving. Our systems, which exploit workload- and context-specific opportunities, address a new layer of the large-scale DL optimization stack, more granular than current cluster managers and data systems, but still abstracted away from low-level kernel and compiler optimizations. Empirical evaluations show that our orchestration techniques and systems can accelerate large-scale DL workloads by a large margin, even in complex, real-world settings. Our approach introduces a new technical lens, unifying systems, databases, and DL research, ultimately focused on democratizing and amplifying state-of-the-art DL innovations. Some of the systems proposed in this dissertation have already been adopted in production-scale industry pipelines, demonstrating the value of such orchestration optimizers for real-world DL.
