10 research outputs found
Overview of ImageCLEFmedical 2022 – Caption Prediction and Concept Detection
The 2022 ImageCLEFmedical caption prediction and concept detection tasks follow similar challenges that were already run from 2017–2021. The objective is to extract Unified Medical Language System (UMLS) concept annotations and/or captions from the image data that are then compared against the original text captions of the images. The images used for both tasks are a subset of the extended Radiology Objects in COntext (ROCO) data set which was used in ImageCLEFmedical 2020. In the caption prediction task, lexical similarity with the original image captions is evaluated with the BiLingual Evaluation Understudy (BLEU) score. In the concept detection task, UMLS terms are extracted from the original text captions, combined with manually curated concepts for image modality and anatomy, and compared against the predicted concepts in a multi-label way. The F1-score was used to assess the performance. The task attracted a strong participation with 20 registered teams. In the end, 12 teams submitted 157 graded runs for the two subtasks. Results show that there is a variety of techniques that can lead to good prediction results for the two tasks. Participants used image retrieval systems for both tasks, while multi-label classification systems were used mainly for the concept detection, and Transformer-based architectures
primarily for the caption prediction subtask.
On the Impact of Cross-Domain Data on German Language Models
Traditionally, large language models have been either trained on general web
crawls or domain-specific data. However, recent successes of generative large
language models have shed light on the benefits of cross-domain datasets. To
examine the significance of prioritizing data diversity over quality, we
present a German dataset comprising texts from five domains, along with another
dataset aimed at containing high-quality data. Through training a series of
models ranging between 122M and 750M parameters on both datasets, we conduct a
comprehensive benchmark on multiple downstream tasks. Our findings demonstrate
that the models trained on the cross-domain dataset outperform those trained on
quality data alone, leading to improvements up to over the previous
state-of-the-art. The models are available at
https://huggingface.co/ikim-uk-essen
Comment: 13 pages, 1 figure, accepted at Findings of the Association for
Computational Linguistics: EMNLP 202
Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding
Recent advances in natural language processing (NLP) can be largely
attributed to the advent of pre-trained language models such as BERT and
RoBERTa. While these models demonstrate remarkable performance on general
datasets, they can struggle in specialized domains such as medicine, where
unique domain-specific terminologies, domain-specific abbreviations, and
varying document structures are common. This paper explores strategies for
adapting these models to domain-specific requirements, primarily through
continuous pre-training on domain-specific data. We pre-trained several German
medical language models on 2.4B tokens derived from translated public English
medical data and 3B tokens of German clinical data. The resulting models were
evaluated on various German downstream tasks, including named entity
recognition (NER), multi-label classification, and extractive question
answering. Our results suggest that models augmented by clinical and
translation-based pre-training typically outperform general domain models in
medical contexts. We conclude that continuous pre-training has demonstrated the
ability to match or even exceed the performance of clinical models trained from
scratch. Furthermore, pre-training on clinical data and leveraging translated
texts have proven to be reliable methods for domain adaptation in medical NLP
tasks.
Comment: Accepted at LREC-COLING 202
ImageCLEF 2022: Multimedia Retrieval in Medical, Nature, Fusion, and Internet Applications
ImageCLEF has been part of the Conference and Labs of the Evaluation Forum (CLEF) since 2003. CLEF 2022 will take place in Bologna, Italy. ImageCLEF is an ongoing evaluation initiative which promotes the evaluation of technologies for annotation, indexing, and retrieval of visual data with the aim of providing information access to large collections of images in various usage scenarios and domains. In its 20th edition, ImageCLEF will have four main tasks: (i) a Medical task addressing concept annotation, caption prediction, and tuberculosis detection; (ii) a Coral task addressing the annotation and localisation of substrates in coral reef images; (iii) an Aware task addressing the prediction of real-life consequences of online photo sharing; and (iv) a new Fusion task addressing late fusion techniques based on the expertise of the pool of classifiers. In 2021, over 100 research groups registered at ImageCLEF with 42 groups submitting more than 250 runs. These numbers show that, despite the COVID-19 pandemic, there is strong interest in the evaluation campaign.
ImageCLEF 2023 highlight: multimedia retrieval in medical, social media and content recommendation applications
In this paper, we provide an overview of the upcoming ImageCLEF campaign. ImageCLEF has been part of the CLEF Conference and Labs of the Evaluation Forum since 2003. ImageCLEF, the Multimedia Retrieval task in CLEF, is an ongoing evaluation initiative that promotes the evaluation of technologies for annotation, indexing, and retrieval of multimodal data with the aim of providing information access to large collections of data in various usage scenarios and domains. In its 21st edition, ImageCLEF 2023 will have four main tasks: (i) a Medical task addressing automatic image captioning, synthetic medical images created with GANs, Visual Question Answering for colonoscopy images, and medical dialogue summarization; (ii) an Aware task addressing the prediction of real-life consequences of online photo sharing; (iii) a Fusion task addressing late fusion techniques based on the expertise of a pool of classifiers; and (iv) a Recommending task addressing cultural heritage content-recommendation. In 2022, ImageCLEF received the participation of over 25 groups submitting more than 258 runs. These numbers show the impact of the campaign. With the COVID-19 pandemic now over, we expect that the interest in participating, especially at the physical CLEF sessions, will increase significantly in 2023.
Overview of ImageCLEFmedical 2022: caption prediction and concept detection
The 2022 ImageCLEFmedical caption prediction and concept detection tasks follow similar challenges that were already run from 2017–2021. The objective is to extract Unified Medical Language System (UMLS) concept annotations and/or captions from the image data that are then compared against the original text captions of the images. The images used for both tasks are a subset of the extended Radiology Objects in COntext (ROCO) data set which was used in ImageCLEFmedical 2020. In the caption prediction task, lexical similarity with the original image captions is evaluated with the BiLingual Evaluation Understudy (BLEU) score. In the concept detection task, UMLS terms are extracted from the original text captions, combined with manually curated concepts for image modality and anatomy, and compared against the predicted concepts in a multi-label way. The F1-score was used to assess the performance. The task attracted a strong participation with 20 registered teams. In the end, 12 teams submitted 157 graded runs for the two subtasks. Results show that there is a variety of techniques that can lead to good prediction results for the two tasks. Participants used image retrieval systems for both tasks, while multi-label classification systems were used mainly for the concept detection, and Transformer-based architectures primarily for the caption prediction subtask
Overview of the ImageCLEF 2022: multimedia retrieval in medical, social media and nature applications
This paper presents an overview of the ImageCLEF 2022 lab that was organized as part of the Conference and Labs of the Evaluation Forum – CLEF Labs 2022. ImageCLEF is an ongoing evaluation initiative (first run in 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval of visual data with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2022, the 20th edition of ImageCLEF runs four main tasks: (i) a medical task that groups two previous tasks, i.e., caption analysis and tuberculosis prediction, (ii) a social media aware task on estimating potential real-life effects of online image sharing, (iii) a nature coral task about segmenting and labeling collections of coral reef images, and (iv) a new fusion task addressing the design of late fusion schemes for boosting the performance, with two real-world applications: image search diversification (retrieval) and prediction of visual interestingness (regression). The benchmark campaign received the participation of over 25 groups submitting more than 258 runs
Overview of the ImageCLEF 2023: Multimedia retrieval in medical, social media and internet applications
14th International Conference of the CLEF Association, CLEF 2023, Thessaloniki, Greece, September 18–21, 2023, Proceedings
This paper presents an overview of the ImageCLEF 2023 lab, which was organized in the frame of the Conference and Labs of the Evaluation Forum – CLEF Labs 2023. ImageCLEF is an ongoing evaluation event that started in 2003 and that encourages the evaluation of technologies for annotation, indexing and retrieval of multimodal data with the goal of providing information access to large collections of data in various usage scenarios and domains. In 2023, the 21st edition of ImageCLEF runs three main tasks: (i) a medical task which included the sequel of the caption analysis task and three new tasks, namely, GANs for medical images, Visual Question Answering for colonoscopy images, and medical dialogue summarization; (ii) a sequel of the fusion task addressing the design of late fusion schemes for boosting the performance, with two real-world applications: image search diversification (retrieval) and prediction of visual interestingness (regression); and (iii) a sequel of the social media aware task on potential real-life effects awareness of online image sharing. The benchmark campaign was a real success and received the participation of over 45 groups submitting more than 240 runs.
This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-42448-9_25
ImageCLEF 2023 highlight: Multimedia retrieval in medical, social media and content recommendation applications
ECIR 2023: Advances in Information Retrieval; 45th European Conference on Information Retrieval, Dublin, Ireland, April 2–6, 2023, Proceedings, Part III
In this paper, we provide an overview of the upcoming ImageCLEF campaign. ImageCLEF has been part of the CLEF Conference and Labs of the Evaluation Forum since 2003. ImageCLEF, the Multimedia Retrieval task in CLEF, is an ongoing evaluation initiative that promotes the evaluation of technologies for annotation, indexing, and retrieval of multimodal data with the aim of providing information access to large collections of data in various usage scenarios and domains. In its 21st edition, ImageCLEF 2023 will have four main tasks: (i) a Medical task addressing automatic image captioning, synthetic medical images created with GANs, Visual Question Answering for colonoscopy images, and medical dialogue summarization; (ii) an Aware task addressing the prediction of real-life consequences of online photo sharing; (iii) a Fusion task addressing late fusion techniques based on the expertise of a pool of classifiers; and (iv) a Recommending task addressing cultural heritage content-recommendation. In 2022, ImageCLEF received the participation of over 25 groups submitting more than 258 runs. These numbers show the impact of the campaign. With the COVID-19 pandemic now over, we expect that the interest in participating, especially at the physical CLEF sessions, will increase significantly in 202
Overview of the ImageCLEF 2024: Multimedia retrieval in medical applications
This paper presents an overview of the ImageCLEF 2024 lab, organized as part of the Conference and Labs of the Evaluation Forum – CLEF Labs 2024. ImageCLEF, an ongoing evaluation event since 2003, encourages the evaluation of technologies for annotation, indexing and retrieval of multimodal data. The goal is to provide information access to large collections of data across various usage scenarios and domains. In 2024, the 22nd edition of ImageCLEF runs three main tasks: (i) a medical task, continuing the caption analysis, Visual Question Answering for colonoscopy images alongside GANs for medical images, and medical dialogue summarization; (ii) a novel task related to image retrieval/generation for arguments for visual communication, aimed at augmenting the effectiveness of arguments; and (iii) ToPicto, a new task focused on translating natural language, whether spoken or textual, into a sequence of pictograms. The benchmarking campaign was a real success and received the participation of over 35 groups submitting more than 220 runs.