Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
Vision-language models, while effective in general domains and showing strong
performance in diverse multi-modal applications like visual question-answering
(VQA), struggle to maintain the same level of effectiveness in more specialized
domains, e.g., medical. We propose a medical vision-language model that
integrates large vision and language models adapted for the medical domain.
This model goes through three stages of parameter-efficient training using
three separate biomedical and radiology multi-modal visual and text datasets.
The proposed model achieves state-of-the-art performance on the SLAKE 1.0
medical VQA (MedVQA) dataset with an overall accuracy of 87.5% and demonstrates
strong performance on another MedVQA dataset, VQA-RAD, achieving an overall
accuracy of 73.2%.
Comment: Clinical NLP @ NAACL 202
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
With the surge of ChatGPT, the use of large models has increased significantly,
rapidly rising to prominence across industry and sweeping across the internet.
This article is a comprehensive review of fine-tuning methods for large models.
It investigates the latest technological advancements and the application of
advanced methods in aspects such as task-adaptive fine-tuning, domain-adaptive
fine-tuning, few-shot learning, knowledge distillation, multi-task learning,
parameter-efficient fine-tuning, and dynamic fine-tuning.
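Parameter-efficient fine-tuning, one of the surveyed families of methods, can be illustrated with a minimal LoRA-style sketch. The dimensions, scaling, and function names below are illustrative assumptions, not details taken from the review:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass through a frozen weight matrix W plus a trainable
    low-rank update B @ A (rank r << min(d_in, d_out)); during training,
    only A and B would receive gradients."""
    r = A.shape[0]                      # adapter rank
    scale = alpha / r                   # common LoRA-style scaling
    return x @ (W + scale * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-init

x = rng.normal(size=(8, d_in))
y = lora_forward(x, W, A, B)
# With B zero-initialized, the adapted layer initially matches the
# frozen layer exactly, so fine-tuning starts from the pretrained model.
```

The adapter trains only r * (d_in + d_out) parameters per layer instead of d_in * d_out, which is what makes the approach parameter-efficient.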
Parallel and Scalable Hyperparameter Optimization for Distributed Deep Learning Methods on High-Performance Computing Systems
The design of Deep Learning (DL) models is a complex task, involving decisions
on the general architecture of the model (e.g., the number of layers of the Neural
Network (NN)) and on the optimization algorithms (e.g., the learning rate). These
so-called hyperparameters significantly influence the performance (e.g., accuracy or
error rates) of the final DL model and are, therefore, of great importance. However,
optimizing these hyperparameters is a computationally intensive process due to the
necessity of evaluating many combinations to identify the best-performing ones. Often,
this optimization is performed manually.
This Ph.D. thesis leverages the power of High-Performance Computing (HPC) systems
to perform automatic and efficient Hyperparameter Optimization (HPO) for DL models
that are trained on large quantities of scientific data. On modern HPC systems, equipped
with a high number of Graphics Processing Units (GPUs), it becomes possible not
only to evaluate multiple models with different hyperparameter combinations in parallel but
also to distribute the training of the models themselves across multiple GPUs. State-of-the-art HPO methods, based on the concept of early stopping, have demonstrated significant
reductions in the runtime of the HPO process. Their performance at scale, particularly
in the context of HPC environments and when applied to large scientific datasets, has
remained unexplored. This thesis therefore researches parallel and scalable HPO methods
that leverage the inherent capabilities of HPC systems and innovative workflows
incorporating novel computing paradigms. The developed HPO methods are validated
on different scientific datasets ranging from the Computational Fluid Dynamics (CFD)
domain to the Remote Sensing (RS) domain, spanning several hundred Gigabytes (GBs) to several
Terabytes (TBs) in size.
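The early-stopping HPO methods the thesis builds on can be sketched with a minimal successive-halving loop. The configuration space, budget schedule, and toy objective below are illustrative, not the thesis's actual workloads:

```python
def successive_halving(configs, train_step, rungs=3, eta=2):
    """Evaluate all configs on a small budget, keep the best 1/eta at
    each rung, and multiply the budget by eta for the survivors, so
    expensive full training is spent only on promising configs."""
    budget = 1
    survivors = list(configs)
    for _ in range(rungs):
        scores = [(train_step(c, budget), c) for c in survivors]
        scores.sort(key=lambda s: s[0])          # lower loss is better
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scores[:keep]]
        budget *= eta
    return survivors[0]

# Toy objective: loss shrinks with budget toward a config-dependent floor,
# mimicking a learning curve; the best learning rate here is 0.1.
def train_step(lr, budget):
    return abs(lr - 0.1) + 1.0 / budget

best = successive_halving([0.001, 0.01, 0.1, 0.5], train_step)
```

On an HPC system, the per-rung evaluations are independent and could be dispatched to separate GPU nodes, which is the parallelism the thesis exploits.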
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Transformer-based Large Language Models (LLMs) have been applied in diverse
areas such as knowledge bases, human interfaces, and dynamic agents,
marking a stride towards achieving Artificial General Intelligence (AGI).
However, current LLMs are predominantly pretrained on short text snippets,
which compromises their effectiveness in processing the long-context prompts
that are frequently encountered in practical scenarios. This article offers a
comprehensive survey of recent advances in Transformer-based LLM
architectures aimed at enhancing the long-context capabilities of LLMs
throughout the entire model lifecycle, from pre-training through to inference.
We first delineate and analyze the problems of handling long-context input and
output with the current Transformer-based models. We then provide a taxonomy
and the landscape of upgrades on Transformer architecture to solve these
problems. Afterwards, we investigate widely used evaluation resources
tailored for long-context LLMs, including datasets, metrics, and
baseline models, as well as optimization toolkits such as libraries,
frameworks, and compilers to boost the efficacy of LLMs across different stages
in runtime. Finally, we discuss the challenges and potential avenues for future
research. A curated repository of relevant literature, continuously updated, is
available at https://github.com/Strivin0311/long-llms-learning.
Comment: 40 pages, 3 figures, 4 tables
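One family of architectural upgrades surveyed for long-context LLMs restricts each token's attention to a local window. A minimal sketch of such a mask (purely illustrative, not a specific method from the survey):

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Boolean causal mask where token i may attend only to tokens j
    with i - window < j <= i (local causal attention)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(6, 3)
# Each row has at most `window` True entries, so attention cost per
# token is O(window) rather than O(seq_len), which is what makes
# window-based variants attractive for long inputs.
```

Full dense attention would correspond to `window = seq_len`, recovering the quadratic cost that long-context methods try to avoid.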
Foundation Models for Natural Language Processing
This open access book provides a comprehensive overview of the state of the art in research and applications of Foundation Models and is intended for readers familiar with basic Natural Language Processing (NLP) concepts. In recent years, a revolutionary new paradigm has been developed for training models for NLP. These models are first pre-trained on large collections of text documents to acquire general syntactic knowledge and semantic information. Then, they are fine-tuned for specific tasks, which they can often solve with superhuman accuracy. When the models are large enough, they can be instructed by prompts to solve new tasks without any fine-tuning. Moreover, they can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning. Because they provide a blueprint for solving many tasks in artificial intelligence, they have been called Foundation Models. After a brief introduction to basic NLP models, the main pre-trained language models BERT, GPT, and the sequence-to-sequence transformer are described, as well as the concepts of self-attention and context-sensitive embedding. Then, different approaches to improving these models are discussed, such as expanding the pre-training criteria, increasing the length of input texts, or including extra knowledge. An overview of the best-performing models for about twenty application areas is then presented, e.g., question answering, translation, story generation, dialog systems, and generating images from text. For each application area, the strengths and weaknesses of current models are discussed, and an outlook on further developments is given. In addition, links are provided to freely available program code. A concluding chapter summarizes the economic opportunities, mitigation of risks, and potential developments of AI.
Beyond Extractive: Advancing Abstractive Automatic Text Summarization in Norwegian with Transformers
Automatic summarization is a key area in natural language processing (NLP) and machine learning which attempts to generate informative summaries of articles and documents. Despite its evolution since the 1950s, research on automatically summarizing Norwegian text has remained relatively underdeveloped. Though some strides have been made in extractive systems, which generate summaries by selecting and condensing key phrases directly from the source material, the field of abstractive summarization remains unexplored for the Norwegian language. Abstractive summarization is distinct in that it generates summaries incorporating new words and phrases not present in the original text.
This Master's thesis revolves around one key question: Is it possible to create a machine learning system capable of performing abstractive summarization in Norwegian? To answer this question, we generate and release the first two Norwegian datasets for creating and evaluating Norwegian summarization models. One of these datasets is a web scrape of Store Norske Leksikon (SNL), and the other is a machine-translated version of CNN/Daily Mail. Using these datasets, we fine-tune two Norwegian T5 language models with 580M and 1.2B parameters to create summaries. To assess the quality of the models, we employ both automatic ROUGE scores and human evaluations of the generated summaries. In an effort to better understand the models' behaviour, we measure how a model generates summaries with various metrics, including our own novel contribution, which we name "Match Ratio" and which measures sentence similarities between summaries and articles based on Levenshtein distances.
The top-performing models achieved ROUGE-1 scores of 35.07 and 34.02 on SNL and CNN/DM, respectively. In terms of human evaluation, the best model yielded an average score of 3.96/5.00 for SNL and 4.64/5.00 for CNN/Daily Mail across various criteria. Based on these results, we conclude that it is possible to perform abstractive summarization in Norwegian and produce high-quality summaries. With this research, we have laid a foundation that we hope will facilitate future research, empowering others to build upon our findings and contribute further to the development of Norwegian summarization models.
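The thesis describes "Match Ratio" only as measuring sentence similarities between summaries and articles via Levenshtein distances; one plausible reading of such a metric (the function names and the 0.8 threshold are hypothetical, not the thesis's definition) is:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def match_ratio(summary_sents, article_sents, threshold=0.8):
    """Fraction of summary sentences whose best length-normalized
    similarity to any article sentence exceeds `threshold`; high values
    would indicate near-extractive rather than abstractive behaviour."""
    def sim(a, b):
        m = max(len(a), len(b)) or 1
        return 1.0 - levenshtein(a, b) / m
    hits = sum(any(sim(s, t) >= threshold for t in article_sents)
               for s in summary_sents)
    return hits / max(len(summary_sents), 1)

ratio = match_ratio(["the cat sat"], ["the cat sat.", "dogs bark"])
```

A summary copied verbatim from the article would score 1.0 under this reading, while a fully rephrased one would score near 0.0.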
Efficient Methods for Natural Language Processing: A Survey
Recent work in natural language processing (NLP) has yielded appealing
results from scaling model parameters and training data; however, using only
scale to improve performance means that resource consumption also grows. Such
resources include data, time, storage, or energy, all of which are naturally
limited and unevenly distributed. This motivates research into efficient
methods that require fewer resources to achieve similar results. This survey
synthesizes and relates current methods and findings in efficient NLP. We aim
to provide both guidance for conducting NLP under limited resources, and point
towards promising research directions for developing more efficient methods.
Comment: Accepted at TACL, pre-publication version
Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature Review
This systematic literature review comprehensively examines the application of
Large Language Models (LLMs) in forecasting and anomaly detection, highlighting
the current state of research, inherent challenges, and prospective future
directions. LLMs have demonstrated significant potential in parsing and
analyzing extensive datasets to identify patterns, predict future events, and
detect anomalous behavior across various domains. However, this review
identifies several critical challenges that impede their broader adoption and
effectiveness, including the reliance on vast historical datasets, issues with
generalizability across different contexts, the phenomenon of model
hallucinations, limitations within the models' knowledge boundaries, and the
substantial computational resources required. Through detailed analysis, this
review discusses potential solutions and strategies to overcome these
obstacles, such as integrating multimodal data, advancements in learning
methodologies, and emphasizing model explainability and computational
efficiency. Moreover, this review outlines critical trends that are likely to
shape the evolution of LLMs in these fields, including the push toward
real-time processing, the importance of sustainable modeling practices, and the
value of interdisciplinary collaboration. In conclusion, this review underscores
the transformative impact LLMs could have on forecasting and anomaly detection
while emphasizing the need for continuous innovation, ethical considerations,
and practical solutions to realize their full potential.
A Survey of Large Language Models
Language is essentially a complex, intricate system of human expressions
governed by grammatical rules. Developing capable AI algorithms that can
comprehend and master a language poses a significant challenge. As a major
approach, language modeling has been widely studied for language understanding
and generation in the past two decades, evolving from statistical language
models to neural language models. Recently, pre-trained language models (PLMs)
have been proposed by pre-training Transformer models over large-scale corpora,
showing strong capabilities in solving various NLP tasks. Since researchers
have found that model scaling can lead to performance improvement, they further
study the scaling effect by increasing the model size to an even larger size.
Interestingly, when the parameter scale exceeds a certain level, these enlarged
language models not only achieve a significant performance improvement but also
show some special abilities that are not present in small-scale language
models. To discriminate the difference in parameter scale, the research
community has coined the term large language models (LLM) for the PLMs of
significant size. Recently, the research on LLMs has been largely advanced by
both academia and industry, and a remarkable milestone is the launch of ChatGPT,
which has attracted widespread attention from society. The technical evolution
of LLMs has been making an important impact on the entire AI community and
could revolutionize the way we develop and use AI algorithms. In this
survey, we review the recent advances of LLMs by introducing the background,
key findings, and mainstream techniques. In particular, we focus on four major
aspects of LLMs, namely pre-training, adaptation tuning, utilization, and
capacity evaluation. Besides, we also summarize the available resources for
developing LLMs and discuss the remaining issues for future directions.
Comment: ongoing work; 51 pages
Orchestration Systems to Support Deep Learning at Scale
Deep learning (DL)'s dramatic rise in popularity across the domain sciences and industry has been accompanied by a correspondingly aggressive increase in the scale and computational complexity of DL workloads. In order to adopt state-of-the-art techniques, practitioners must wrestle with systems challenges of performance, cost, and scalability. In this dissertation, we identify the need for orchestration systems, which ease scaling burdens across the DL lifecycle through holistic, workload-aware optimizations. Drawing on both established techniques from data management research and new bespoke algorithms, we build practical orchestration engines to optimize three common DL workloads in the large-scale setting: model selection, data processing, and high-throughput serving. Our systems, which exploit workload- and context-specific opportunities, address a new layer of the large-scale DL optimization stack, more granular than current cluster managers and data systems, but still abstracted away from low-level kernel and compiler optimizations. Empirical evaluations show that our orchestration techniques and systems can accelerate large-scale DL workloads by a large margin, even in complex, real-world settings. Our approach introduces a new technical lens, unifying systems, databases, and DL research, ultimately focused on democratizing and amplifying state-of-the-art DL innovations. Some of the systems proposed in this dissertation have already been adopted in production-scale industry pipelines, demonstrating the value of such orchestration optimizers for real-world DL.
