10 research outputs found

Overview of ImageCLEFmedical 2022 – Caption Prediction and Concept Detection

Author: Ben Abacha Asma
Bloch Louise
Brüngel Raphael
Friedrich Christoph M
Garcia Seco De Herrera Alba
Idrissi-Yaghir Ahmad
Müller Henning
Rückert Johannes
Schäfer Henning
Publication venue: CEUR Workshop Proceedings
Publication date: 01/07/2022

The 2022 ImageCLEFmedical caption prediction and concept detection tasks follow similar challenges that ran from 2017 to 2021. The objective is to extract Unified Medical Language System (UMLS) concept annotations and/or captions from the image data, which are then compared against the original text captions of the images. The images used for both tasks are a subset of the extended Radiology Objects in COntext (ROCO) data set, which was used in ImageCLEFmedical 2020. In the caption prediction task, lexical similarity with the original image captions is evaluated with the BiLingual Evaluation Understudy (BLEU) score. In the concept detection task, UMLS terms are extracted from the original text captions, combined with manually curated concepts for image modality and anatomy, and compared against the predicted concepts in a multi-label way. The F1-score was used to assess the performance. The task attracted strong participation, with 20 registered teams. In the end, 12 teams submitted 157 graded runs for the two subtasks. Results show that a variety of techniques can lead to good prediction results for the two tasks. Participants used image retrieval systems for both tasks, while multi-label classification systems were used mainly for concept detection, and Transformer-based architectures primarily for the caption prediction subtask.
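The multi-label F1 comparison described in the abstract above can be illustrated with a minimal sketch. It assumes the score is computed per image on sets of concept identifiers and then averaged over images; the abstract does not state the averaging scheme, so this is one plausible reading rather than the organisers' evaluation script. The function names and the identifiers in the example are placeholders made up for illustration.

```python
from typing import List, Set


def f1_for_image(predicted: Set[str], reference: Set[str]) -> float:
    """F1 between the predicted and reference concept sets of one image."""
    if not predicted and not reference:
        # Assumption: two empty sets are treated as a perfect match.
        return 1.0
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: List[Set[str]], references: List[Set[str]]) -> float:
    """Average the per-image F1 scores (assumed averaging scheme)."""
    scores = [f1_for_image(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


# Placeholder identifiers in the UMLS CUI format, for illustration only.
predictions = [{"C0000001", "C0000002"}, {"C0000003"}]
references = [{"C0000001"}, {"C0000003", "C0000004"}]
score = mean_f1(predictions, references)
print(round(score, 4))
```

Treating each image's concepts as a set makes the comparison independent of prediction order and ignores duplicate predictions, which matches the "multi-label" framing in the abstract.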

University of Essex Research Repository

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

On the Impact of Cross-Domain Data on German Language Models

Author: Bian Jiang
Chen Aokun
Dada Amin
Egger Jan
Friedrich Christoph M.
Heiliger Lars
Idrissi-Yaghir Ahmad
Kleesiek Jens
Li Jianning
Peng Cheng
Seibold Constantin Marc
Smith Kaleb E
Truhn Daniel
Wu Yonghui
Yang Xi
Publication date: 13/10/2023

Traditionally, large language models have been trained either on general web crawls or on domain-specific data. However, recent successes of generative large language models have shed light on the benefits of cross-domain datasets. To examine the significance of prioritizing data diversity over quality, we present a German dataset comprising texts from five domains, along with another dataset aimed at containing high-quality data. By training a series of models ranging between 122M and 750M parameters on both datasets, we conduct a comprehensive benchmark on multiple downstream tasks. Our findings demonstrate that the models trained on the cross-domain dataset outperform those trained on quality data alone, leading to improvements of up to 4.45% over the previous state-of-the-art. The models are available at https://huggingface.co/ikim-uk-essen

Comment: 13 pages, 1 figure, accepted at Findings of the Association for Computational Linguistics: EMNLP 202

arXiv.org e-Print Archive

Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding

Author: Arzideh Kamyar
Baldini Giulia
Bauer Marie
Bewersdorff Jeanette
Bian Jiang
Dada Amin
Friedrich Christoph M.
Hasin Max
Horn Peter A.
Idrissi-Yaghir Ahmad
Kleesiek Jens
Nensa Felix
Schlötterer Jörg
Schmidt Cynthia S.
Schäfer Henning
Seifert Christin
Smith Kaleb E.
Trienes Jan
Wu Yonghui
Zesch Torsten
Publication date: 08/05/2024

Recent advances in natural language processing (NLP) can be largely attributed to the advent of pre-trained language models such as BERT and RoBERTa. While these models demonstrate remarkable performance on general datasets, they can struggle in specialized domains such as medicine, where unique domain-specific terminologies, domain-specific abbreviations, and varying document structures are common. This paper explores strategies for adapting these models to domain-specific requirements, primarily through continuous pre-training on domain-specific data. We pre-trained several German medical language models on 2.4B tokens derived from translated public English medical data and 3B tokens of German clinical data. The resulting models were evaluated on various German downstream tasks, including named entity recognition (NER), multi-label classification, and extractive question answering. Our results suggest that models augmented by clinical and translation-based pre-training typically outperform general domain models in medical contexts. We conclude that continuous pre-training has demonstrated the ability to match or even exceed the performance of clinical models trained from scratch. Furthermore, pre-training on clinical data or leveraging translated texts have proven to be reliable methods for domain adaptation in medical NLP tasks.

Comment: Accepted at LREC-COLING 202

arXiv.org e-Print Archive

ImageCLEF 2022: Multimedia Retrieval in Medical, Nature, Fusion, and Internet Applications

ImageCLEF has been part of the Conference and Labs of the Evaluation Forum (CLEF) since 2003. CLEF 2022 will take place in Bologna, Italy. ImageCLEF is an ongoing evaluation initiative which promotes the evaluation of technologies for annotation, indexing, and retrieval of visual data with the aim of providing information access to large collections of images in various usage scenarios and domains. In its 20th edition, ImageCLEF will have four main tasks: (i) a Medical task addressing concept annotation, caption prediction, and tuberculosis detection; (ii) a Coral task addressing the annotation and localisation of substrates in coral reef images; (iii) an Aware task addressing the prediction of real-life consequences of online photo sharing; and (iv) a new Fusion task addressing late fusion techniques based on the expertise of the pool of classifiers. In 2021, over 100 research groups registered at ImageCLEF with 42 groups submitting more than 250 runs. These numbers show that, despite the COVID-19 pandemic, there is strong interest in the evaluation campaign.

University of Essex Research Repository

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

PubliDo (Fachhochschule Dortmund)

HAL-CEA

ImageCLEF 2023 highlight: multimedia retrieval in medical, social media and content recommendation applications

Author: Drăgulinescu Ana Maria
Idrissi-Yaghir Ahmad
Ionescu Bogdan
Müller Henning
Popescu Adrian
Publication venue: Cham, Springer
Publication date: 08/12/2023

In this paper, we provide an overview of the upcoming ImageCLEF campaign. ImageCLEF has been part of the CLEF Conference and Labs of the Evaluation Forum since 2003. ImageCLEF, the Multimedia Retrieval task in CLEF, is an ongoing evaluation initiative that promotes the evaluation of technologies for annotation, indexing, and retrieval of multimodal data with the aim of providing information access to large collections of data in various usage scenarios and domains. In its 21st edition, ImageCLEF 2023 will have four main tasks: (i) a Medical task addressing automatic image captioning, synthetic medical images created with GANs, Visual Question Answering for colonoscopy images, and medical dialogue summarization; (ii) an Aware task addressing the prediction of real-life consequences of online photo sharing; (iii) a Fusion task addressing late fusion techniques based on the expertise of a pool of classifiers; and (iv) a Recommending task addressing cultural heritage content-recommendation. In 2022, ImageCLEF received the participation of over 25 groups submitting more than 258 runs. These numbers show the impact of the campaign. With the COVID-19 pandemic now over, we expect that the interest in participating, especially at the physical CLEF sessions, will increase significantly in 2023.

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

Overview of ImageCLEFmedical 2022: caption prediction and concept detection

Author: Ben Abacha Asma
Bloch Louise
Brüngel Raphael
Friedrich Christoph M.
García Seco de Herrera Alba
Idrissi-Yaghir Ahmad
Müller Henning
Rückert Johannes
Schäfer Henning
Publication date: 29/11/2022

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

Overview of the ImageCLEF 2022: multimedia retrieval in medical, social media and nature applications

Author: Abacha Asma Ben
Bloch Louise
Brüngel Raphael
Campello Antonio
Chamberlain Jon
Cid Yashin Dicente
Clark Adrian
Constantin Mihai Gabriel
de Herrera Alba G. Seco
Deshayes-Chossart Jérôme
Dogariu Mihai
Friedrich Christoph M.
Idrissi-Yaghir Ahmad
Ionescu Bogdan
Kovalev Vassili
Kozlovski Serge
Müller Henning
Popescu Adrian
Péteri Renaud
Rückert Johannes
Schindler Hugo
Schäfer Henning
Ştefan Liviu-Daniel
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

This paper presents an overview of the ImageCLEF 2022 lab that was organized as part of the Conference and Labs of the Evaluation Forum – CLEF Labs 2022. ImageCLEF is an ongoing evaluation initiative (first run in 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval of visual data with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2022, the 20th edition of ImageCLEF runs four main tasks: (i) a medical task that groups two previous tasks, i.e., caption analysis and tuberculosis prediction; (ii) a social media aware task on estimating potential real-life effects of online image sharing; (iii) a nature coral task about segmenting and labeling collections of coral reef images; and (iv) a new fusion task addressing the design of late fusion schemes for boosting performance, with two real-world applications: image search diversification (retrieval) and prediction of visual interestingness (regression). The benchmark campaign received the participation of over 25 groups submitting more than 258 runs.
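As an illustration of what a late fusion scheme can look like, the sketch below combines the output scores of several already-trained classifiers with a weighted average. This is only one simple instance of late fusion and is not taken from the ImageCLEF fusion task; the function name, the scores, and the weights are all assumed values chosen for the example.

```python
from typing import List


def weighted_late_fusion(
    scores_per_classifier: List[List[float]], weights: List[float]
) -> List[float]:
    """Fuse per-item scores from several classifiers by weighted average."""
    if len(scores_per_classifier) != len(weights):
        raise ValueError("one weight is required per classifier")
    total_weight = sum(weights)
    n_items = len(scores_per_classifier[0])
    fused = []
    for item in range(n_items):
        # Weighted sum of every classifier's score for this item.
        weighted_sum = sum(
            w * scores[item] for w, scores in zip(weights, scores_per_classifier)
        )
        fused.append(weighted_sum / total_weight)
    return fused


# Hypothetical scores: three classifiers scoring the same three items.
classifier_scores = [
    [0.9, 0.2, 0.4],
    [0.7, 0.4, 0.6],
    [0.8, 0.1, 0.2],
]
# Hypothetical weights, e.g. reflecting each classifier's validation quality.
weights = [0.5, 0.3, 0.2]

fused = weighted_late_fusion(classifier_scores, weights)
# Rank items by fused score, highest first (the retrieval use case).
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
print(fused, ranking)
```

The fusion happens after each classifier has produced its scores, which is what distinguishes late fusion from combining features before classification. The same fused scores can be used directly as regression outputs or sorted to produce a ranking.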

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

Warwick Research Archives Portal Repository

HAL-CEA

Overview of the ImageCLEF 2023: Multimedia retrieval in medical, social media and internet applications

14th International Conference of the CLEF Association, CLEF 2023, Thessaloniki, Greece, September 18–21, 2023, Proceedings.

This paper presents an overview of the ImageCLEF 2023 lab, which was organized in the frame of the Conference and Labs of the Evaluation Forum – CLEF Labs 2023. ImageCLEF is an ongoing evaluation event that started in 2003 and that encourages the evaluation of technologies for annotation, indexing and retrieval of multimodal data with the goal of providing information access to large collections of data in various usage scenarios and domains. In 2023, the 21st edition of ImageCLEF runs three main tasks: (i) a medical task which included the sequel of the caption analysis task and three new tasks, namely, GANs for medical images, Visual Question Answering for colonoscopy images, and medical dialogue summarization; (ii) a sequel of the fusion task addressing the design of late fusion schemes for boosting performance, with two real-world applications: image search diversification (retrieval) and prediction of visual interestingness (regression); and (iii) a sequel of the social media aware task on potential real-life effects awareness of online image sharing. The benchmark campaign was a real success and received the participation of over 45 groups submitting more than 240 runs.

This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-42448-9_25

HAL-CEA

ImageCLEF 2023 highlight: Multimedia retrieval in medical, social media and content recommendation applications

ECIR 2023: Advances in Information Retrieval; 45th European Conference on Information Retrieval, Dublin, Ireland, April 2–6, 2023, Proceedings, Part III.

In this paper, we provide an overview of the upcoming ImageCLEF campaign. ImageCLEF has been part of the CLEF Conference and Labs of the Evaluation Forum since 2003. ImageCLEF, the Multimedia Retrieval task in CLEF, is an ongoing evaluation initiative that promotes the evaluation of technologies for annotation, indexing, and retrieval of multimodal data with the aim of providing information access to large collections of data in various usage scenarios and domains. In its 21st edition, ImageCLEF 2023 will have four main tasks: (i) a Medical task addressing automatic image captioning, synthetic medical images created with GANs, Visual Question Answering for colonoscopy images, and medical dialogue summarization; (ii) an Aware task addressing the prediction of real-life consequences of online photo sharing; (iii) a Fusion task addressing late fusion techniques based on the expertise of a pool of classifiers; and (iv) a Recommending task addressing cultural heritage content-recommendation. In 2022, ImageCLEF received the participation of over 25 groups submitting more than 258 runs. These numbers show the impact of the campaign. With the COVID-19 pandemic now over, we expect that the interest in participating, especially at the physical CLEF sessions, will increase significantly in 202

HAL-CEA

Overview of the ImageCLEF 2024: Multimedia retrieval in medical applications

This paper presents an overview of the ImageCLEF 2024 lab, organized as part of the Conference and Labs of the Evaluation Forum – CLEF Labs 2024. ImageCLEF, an ongoing evaluation event since 2003, encourages the evaluation of technologies for annotation, indexing and retrieval of multimodal data. The goal is to provide information access to large collections of data across various usage scenarios and domains. In 2024, the 22nd edition of ImageCLEF runs three main tasks: (i) a medical task, continuing the caption analysis, Visual Question Answering for colonoscopy images alongside GANs for medical images, and medical dialogue summarization; (ii) a novel task related to image retrieval/generation for arguments for visual communication, aimed at augmenting the effectiveness of arguments; and (iii) ToPicto, a new task focused on translating natural language, whether spoken or textual, into a sequence of pictograms. The benchmarking campaign was a real success and received the participation of over 35 groups submitting more than 220 runs.

Hal - Université Grenoble Alpes