Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs

Author: Akbik Alan
Golde Jonas
Haller Patrick
Hamborg Felix
Risch Julian
Publication date: 02/02/2024

Most NLP tasks are modeled as supervised learning and thus require labeled training data to train effective models. However, manually producing such data at sufficient quality and quantity is known to be costly and time-intensive. Current research addresses this bottleneck by exploring a novel paradigm called zero-shot learning via dataset generation. Here, a powerful LLM is prompted with a task description to generate labeled data that can be used to train a downstream NLP model. For instance, an LLM might be prompted to "generate 500 movie reviews with positive overall sentiment, and another 500 with negative sentiment." The generated data could then be used to train a binary sentiment classifier, effectively leveraging an LLM as a teacher to a smaller student model. With this demo, we introduce Fabricator, an open-source Python toolkit for dataset generation. Fabricator implements common dataset generation workflows, supports a wide range of downstream NLP tasks (such as text classification, question answering, and entity recognition), and is integrated with well-known libraries to facilitate quick experimentation. With Fabricator, we aim to support researchers in conducting reproducible dataset generation experiments using LLMs and help practitioners apply this approach to train models for downstream tasks.

Comment: 3 figures and 2 tables
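
The teacher-student workflow described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Fabricator's actual API: `build_prompt`, `generate_dataset`, and `fake_teacher` are hypothetical names, and `fake_teacher` is a canned stand-in for a real LLM call.

```python
from typing import Callable, Dict, List


def build_prompt(task_description: str, label: str) -> str:
    """Compose a label-conditioned generation prompt (hypothetical format)."""
    return f"{task_description}\nWrite one example with label: {label}\nText:"


def generate_dataset(
    teacher: Callable[[str], str],
    task_description: str,
    labels: List[str],
    per_label: int,
) -> List[Dict[str, str]]:
    """Request `per_label` texts per label and pair each text with its label."""
    dataset = []
    for label in labels:
        prompt = build_prompt(task_description, label)
        for _ in range(per_label):
            text = teacher(prompt).strip()
            if text:  # drop empty generations
                dataset.append({"text": text, "label": label})
    return dataset


def fake_teacher(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned review per label."""
    return "A wonderful film." if "positive" in prompt else "A dull film."


reviews = generate_dataset(
    fake_teacher,
    "Generate a short movie review.",
    ["positive", "negative"],
    per_label=2,
)
```

The resulting list of text/label pairs is the kind of data a smaller student classifier could then be trained on; swapping `fake_teacher` for a real model call is the only change the sketch would need.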

arXiv.org e-Print Archive

Analysis of riboflavin/ultraviolet A corneal cross-linking by molecular spectroscopy

Author: Galli Roberta
Golde Jonas
Herber Robert
Koch Edmund
Melcher Steven
Raiskup Frederik
Steiner Gerald
Zimmerer Cordelia
Publication venue: London [u.a.] : Elsevier
Publication date: 01/01/2023

Corneal cross-linking (CXL) with riboflavin and ultraviolet A light is a therapeutic procedure to restore the mechanical stability of corneal tissue. The treatment is applied to pathological tissue, such as in keratoconus, and induces the formation of new cross-links. At present, the molecular mechanisms of induced cross-linking are still not known exactly. In this study, we investigated molecular alterations within porcine cornea tissue after treatment with riboflavin and ultraviolet A light by surface-enhanced Raman spectroscopy (SERS). For that purpose, after CXL treatment a thin silver layer was vapor-deposited onto cornea flaps. To explore molecular alterations induced by the photochemical process, hierarchical cluster analysis (HCA) was used. The detailed analysis of SERS spectra reveals that there is no general change in collagen secondary structure, while modifications of amino acid side chains are the most dominant outcome. The formation of secondary and aromatic amine groups as well as methylene and carbonyl groups was observed. Even though successful cross-linking could not be registered in all treated samples, Raman signals of newly formed chemical groups are already present in corneas treated with riboflavin only.

PubMed Central

Repositorium für Naturwissenschaften und Technik (TIB Hannover)

Imaging the tympanic membrane oscillation ex vivo with Doppler optical coherence tomography during simulated Eustachian catarrh

Author: Bornitz Matthias
Burkhardt Anke
Golde Jonas
Kemper Max
Kirsten Lars
Koch Edmund
Stoppe Thomas
Walther Julia
Zahnert Thomas
Publication venue: SPIE-Intl Soc Optical Eng
Publication date: 29/08/2019

Recently, optical coherence tomography (OCT) was utilized in multiple studies for structural and functional imaging of the middle ear and the tympanic membrane. Since Doppler OCT allows both the spatially resolved measurement of the tympanic membrane oscillation and high-resolution imaging, it is regarded as a promising tool for future in vivo applications. In this study, Doppler OCT is utilized for the visualization of the tympanic membrane oscillation in temporal bones with simulated Eustachian catarrh, which was realized by generating a depression in the tympanic cavity. The transfer function, meaning the oscillation amplitude normalized to the applied sound pressure, is measured in a frequency-resolved manner in the range from 0.5 kHz to 6 kHz and with a lateral spatial resolution of 0.4 mm. Typical oscillation patterns could be observed in the case of ambient pressure in the tympanic cavity. Under depression, the characteristic oscillation patterns were observed with widely congruent appearance but at higher frequencies.
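
The transfer function named in this abstract is a per-frequency ratio, which a short Python sketch can make concrete. The units (nm for amplitude, Pa for sound pressure), the function name `transfer_function`, and all numbers below are assumptions for illustration; none of them are measurements from the study.

```python
from typing import Dict


def transfer_function(
    amplitude_nm: Dict[float, float],
    pressure_pa: Dict[float, float],
) -> Dict[float, float]:
    """Return oscillation amplitude per unit sound pressure (nm/Pa) per frequency."""
    result = {}
    for freq_khz, amplitude in amplitude_nm.items():
        pressure = pressure_pa[freq_khz]
        if pressure <= 0:
            raise ValueError(f"non-positive sound pressure at {freq_khz} kHz")
        result[freq_khz] = amplitude / pressure
    return result


# Hypothetical values at three frequencies within the 0.5 kHz to 6 kHz range.
amplitudes = {0.5: 12.0, 1.0: 20.0, 6.0: 3.0}  # oscillation amplitude in nm
pressures = {0.5: 2.0, 1.0: 2.0, 6.0: 1.5}     # applied sound pressure in Pa
h = transfer_function(amplitudes, pressures)
```

Normalizing by the applied pressure makes oscillation patterns comparable across frequencies even when the stimulus level is not constant.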

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Core–shell bioprinting as a strategy to apply differentiation factors in a spatially defined manner inside osteochondral tissue substitutes

Author: Ahlfeld Tilman
Bernhardt Anne
Cometta Silvia
Emmermacher Julia
Gelinsky Michael
Golde Jonas
Kilian David
Lode Anja
Taymour Rania
Publication venue: IOP Publishing
Publication date: 06/06/2024

One of the key challenges in osteochondral tissue engineering is to define specified zones with varying material properties, cell types and biochemical factors supporting locally adjusted differentiation into the osteogenic and chondrogenic lineage, respectively. Herein, extrusion-based core–shell bioprinting is introduced as a potent tool allowing a spatially defined delivery of cell types and differentiation factors TGF-β3 and BMP-2 in separated compartments of hydrogel strands, and, therefore, a local supply of matching factors for chondrocytes and osteoblasts. Ink development was based on blends of alginate and methylcellulose, in combination with varying concentrations of the nanoclay Laponite, whose high-affinity binding capacity for various molecules was exploited. Release kinetics of model molecules was successfully tuned by Laponite addition. Core–shell bioprinting was proven to generate well-oriented compartments within one strand as monitored by optical coherence tomography in a non-invasive manner. Chondrocytes and osteoblasts were applied each in the shell while the respective differentiation factors (TGF-β3, BMP-2) were provided by a Laponite-supported core serving as central factor depot within the strand, allowing directed differentiation of cells in close contact to the core. Experiments with bi-zonal constructs, comprising an osteogenic and a chondrogenic zone, revealed that the local delivery of the factors from the core reduces effects of these factors on the cells in the other scaffold zone. These observations prove the general suitability of the suggested system for co-differentiation of different cell types within a zonal construct.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Non-rigid Point Cloud Registration for Middle Ear Diagnostics with Endoscopic Optical Coherence Tomography

Author: Bodenstedt Sebastian
Chen Zhaoyu
Golde Jonas
Hu Yujia
Koch Edmund
Li Chenpan
Liu Peng
Morgenstern Joseph
Neudert Marcus
Speidel Stefanie
Publication date: 26/04/2023

Purpose: Middle ear infection is the most prevalent inflammatory disease, especially among the pediatric population. Current diagnostic methods are subjective and depend on visual cues from an otoscope, which limits the ability of otologists to identify pathology. To address this shortcoming, endoscopic optical coherence tomography (OCT) provides both morphological and functional in-vivo measurements of the middle ear. However, due to the shadow of prior structures, interpretation of OCT images is challenging and time-consuming. To facilitate fast diagnosis and measurement, improvement in the readability of OCT data is achieved by merging morphological knowledge from ex-vivo middle ear models with OCT volumetric data, so that OCT applications can be further promoted in daily clinical settings. Methods: We propose C2P-Net: a two-staged non-rigid registration pipeline for complete to partial point clouds, which are sampled from ex-vivo and in-vivo OCT models, respectively. To overcome the lack of labeled training data, a fast and effective generation pipeline in Blender3D is designed to simulate middle ear shapes and extract in-vivo noisy and partial point clouds. Results: We evaluate the performance of C2P-Net through experiments on both synthetic and real OCT datasets. The results demonstrate that C2P-Net generalizes to unseen middle ear point clouds and is capable of handling realistic noise and incompleteness in synthetic and real OCT data. Conclusion: In this work, we aim to enable diagnosis of middle ear structures with the assistance of OCT images. We propose C2P-Net: a two-staged non-rigid registration pipeline for point clouds to support the interpretation of in-vivo noisy and partial OCT images for the first time. Code is available at: https://gitlab.com/nct_tso_public/c2p-net

arXiv.org e-Print Archive

In vivo imaging of human oral hard and soft tissues by polarization-sensitive optical coherence tomography

Author: Golde Jonas
Hannig Christian
Hempel Franz
Kirsten Lars
Koch Edmund
Rosenauer Tobias
Tetschke Florian
Walther Julia
Publication venue: SPIE-Intl Soc Optical Eng
Publication date: 09/09/2019

Since optical coherence tomography (OCT) provides three-dimensional high-resolution images of biological tissue, the benefit of polarization contrast in the field of dentistry is highlighted in this study. Polarization-sensitive OCT (PS OCT) with phase-sensitive recording is used for imaging dental and mucosal tissues in the human oral cavity in vivo. An enhanced polarization contrast of oral structures is reached by analyzing the signals of the co- and cross-polarized channels of the swept source PS OCT system quantitatively with respect to reflectivity, retardation, optic axis orientation, and depolarization. The calculation of these polarization parameters enables a high tissue-specific contrast imaging for the detailed physical interpretation of human oral hard and soft tissues. For the proof-of-principle, imaging of composite restorations and mineralization defects at premolars as well as gingival, lingual, and labial oral mucosa was performed in vivo within the anterior oral cavity. The achieved contrast-enhanced results of the investigated human oral tissues by means of polarization-sensitive imaging are evaluated by the comparison with conventional intensity-based OCT.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Author: Abdollahi Arezoo
Abdulmumin Idris
Abrar Nafis
Adelani David Ifeoluwa
Aghagol Arash
Aji Alham Fikri
Ajibade Benjamin
Akiki Christopher
Akinlolu Martha
Al-shaibani Maged S.
Albanie Samuel
Alfassy Amit
Alizadeh Samira
Allal Loubna Ben
Almubarak Khalid
Altay Gabriel
Alyafeai Zaid
Ammanamanchi Pawan Sasanka
Amuok Priscilla
An Ran
Antverg Omer
Bach Stephen H.
Bajaj Yash Shailesh
Bamberger Zachary
Bari M Saiful
Barth Fabio
Baruwa Ahmed
Bawden Rachel
Baylor Emi
Bayrak Giyaseddin
Behroozi Bahareh
Beilharz Benjamin
Bekman Stas
Belinkov Yonatan
Belkada Younes
Bello Imane
Beltagy Iz
Ben-David Srulik
Benyamina Hamza
Bers Tali
Bharati Sushil
Bhattacharjee Joydeep
Bhattacharya Indrani
Biderman Stella
Bogdanov Eli
Bommasani Rishi
Bose Shamik
Bourfoune Hatim
Bras Mathilde
Brito Caio
Broad Nicholas Michio
Brody Shaked
Bulchandani Lokesh
Burns Gully
Burynok Mykola
Cahyawijaya Samuel
Callahan Alison
Canalli Rodrigo
Carpuat Marine
Casper Jared
Castagné Roman
Castillo Maria A
Chaffin Antoine
Chandrasekhar Ramya
Chang Jonathan
Chen Kimbo
Cheng Newton
Cheveleva Anastasia
Chhablani Gunjan
Chim Jenny
Chung Hyung Won
Clinciu Miruna
Clive Jordan
Coavoux Maximin
Colombo Pierre
Contractor Danish
Cornette Pierre
Cullan Michael
Dahlberg Nathan
Danchev Valentin
Dash Ishani
Datta Debajyoti
David Davis
de Bykhovetz Madeleine Hahn
de Gibert Ona
de la Rosa Javier
De Toni Francesco
De Wolf Michiel
del Moral Albert Villanova
Deshmukh Shlok S
Dettmers Tim
Dey Manan
Dodge Jesse
Dupont Gérard
Dutra Livia
Eisenberg Renata
Elbadri Maraim
Elkott Nour
Elsahar Hady
Emezue Chris
Espejel Omar
Fahmy Nour
Fan Angela
Faranak Amy
Feizpour Amir
Ferrandis Carlos Muñoz
Fevry Thibault
Forde Jessica Zosa
Fourrier Clémentine
Freidank Moritz
Fries Jason Alan
Frohberg Jörg
Fuhrimann Florian
Fung Pascale
Gallé Matthias
Gandhi Sanchit
Gao Leo
Garda Samuele
Garrette Dan
Gehrmann Sebastian
Gerchick Marissa
Ghaleb Mustafa
Ghauri Muhammed
Gigant Théo
Giorgi John
Gokaslan Aaron
Golde Jonas
Gonzalez-Dios Itziar
Grandury María
HajiHosseini Azadeh
Haller Patrick
Hao Ryan
Harliman Rheza
Hazan Liam
Heinzerling Benjamin
Henderson Peter
Hesslow Daniel
Hevia Anthony
Huang Max
Ilić Suzana
Jain Chirag
Jauhar Mohammad A.
Jernite Yacine
Jiang Mike Tian-Jian
Johnson Isaac
Jones Hessie
Kainuma Tomoya
Kalo Jan-Christoph
Kang Jihyun
Kang Myungsun
Kasai Jungo
Kashyap Abhinav Ramesh
Kasner Zdeněk
Kassner Nora
Kawamura Ken
Khamis Nurulaqilla
Khan Ammar
Kiblawi Sid
Kiela Douwe
Kim Ethan
Kim Najoung
Kim Taewoon
Klamm Christopher
Kromann Rasmus
Kruszewski Germán
Kumar Srishti
Kusa Wojciech
Labrak Yanis
Lacroix Rémi
Laippala Veronika
Lansky David
Laud Tanmay
Launay Julien
Laurençon Hugo
Lavallée Pierre François
Le Thanh
Le Trieu
Lee Wilson Y.
Leong Colin
Lepercq Violette
Levkovizh Efrat
Lhoest Quentin
Li Conglong
Ligozat Anne-Laure
Limisiewicz Tomasz
Liu Lu
Liu Minna
Lo Kyle
Longpre Shayne
Lovering Charles
Luccioni Alexandra Sasha
López Roberto Luis
Manica Matteo
Manjavacas Enrique
Martin Robert
Masoud Maraim
McKenna Michael
McMillan-Major Angelina
Mielke Sabrina J.
Mieskes Margot
Mihaljcic Mina
Mikhailov Vladislav
Miranda-Escalada Antonio
Mirkin Shachar
Mirza Fatima
Mishra Mayank
Mishra Shubhanshu
Mitchell Margaret
Molano Daniel
Mou Chenghao
Muellner Nikolaus
Muennighoff Niklas
Muhammad Shamsuddeen Hassan
Muñoz Manuel Romero
Nagel Sebastian
Narayanan Deepak
Natan Eyal Bar
Nayak Nihal
Neeraj Trishala
Nejadgholi Isar
Nezhurina Marianna
Nguyen Duong A.
Nguyen Huu
Nguyen Olivier
Nguyen Zach
Nikoulina Vassilina
Nikpoor Somaieh
Nitzav Ariel Kreisberg
Novikova Jekaterina
Névéol Aurélie
Ononiwu Frankline
Osei Salomey
Ott Simon
Oyebade Tobi
Ozoani Ezinwanne
Pai Suhas
Pais Shani
Palasciano Alfredo
Pandey Harshit
Passmore Jesse
Patil Suraj
Patry Nicolas
Pavlick Ellie
Periñán Daniel León
Pestana Amanda
Peyrounette Myriam
Phan Long
Phang Jason
Pistilli Giada
Ponferrada Eduardo González
Posada Jose David
Prabhu Vrinda
Press Ofir
Protasov Vitaly
Pruksachatkun Yada
Pyysalo Sampo
Pàmies Marc
Qiu Mike
Radev Dragomir
Raffel Colin
Raja Arun
Rajani Nazneen
Rajbhandari Samyam
Rasley Jeff
Raunak Vikas
Reiter Ehud
Requena Stéphane
Rezanejad Habib
Ribeiro Rui
Rieser Verena
Roberts Adam
Rogers Anna
Roy Sourav
Rozen Jos
Rueda Alice
Rush Alexander M.
Ruwase Olatunji
Ryabinin Max
Sagot Benoît
Salesky Elizabeth
Samagaio Mairon
Samuel Olanrewaju
Samwald Matthias
Sang-aroonsiri Sinee
Sanh Victor
Sanseviero Omar
Santilli Andrea
Santos Ana
Sanz Julio Bonis
Saulnier Lucile
Saxena Bharat
Scao Teven Le
Schick Timo
Schoelkopf Hailey
Schweter Stefan
Scialom Thomas
Sedenko Irina
Seelam Natasha
Seltzer Josh
Serikov Oleg
Sharma Abheesht
Sharma Shanya
Shavrina Tatiana
Shen Sheng
Shinzato Luisa
Shoeybi Mohammad
Shubber Sarmad
Shukla Anima
Si Chenglei
Silberberg Stanislav
Simhi Adi
Singh Amanpreet
Singh Ayush
Singh Mayank
Sivaraman Karthik Rangasai
Smith Shaden
Solaiman Irene
Soroa Aitor
Stiegler Arnaud
Strobelt Hendrik
Su Rosaline
Su Ruisi
Suarez Pedro Ortiz
Subramani Nishant
Subramonian Arjun
Sun Zhiqing
Sutawika Lintang
Szczechla Eliza
Sänger Mario
Tae Jaesung
Takeuchi Maiko
Taktasheva Ekaterina
Talat Zeerak
Tammour Aycha
Tan Edward
Tan Samson
Tan Zhe
Tang Xiangru
Tanguy Ludovic
Tazi Nouamane
Taşar Davut Emre
Teehan Ryan
Thakker Urmish
Thrush Tristan
Tobing Joseph
Tojarieh Hadar
Torrent Tiago Timponi
Tow Jonathan
Tran Hieu
Tunuguntla Deepak
Unldreaj Antigona
Uri Yallow
van der Wal Oskar
van Strien Daniel
Venkatraman Yash
Viguier Sylvain
Villegas Paulo
Voloshina Ekaterina
von Platen Patrick
Von Werra Leandro
Vrabec Helena U.
Vu Minh Chien
Wang Bo
Wang Han
Wang Silas
Wang Thomas
Weber Leon
Webson Albert
Weinberg Michael
Winata Genta Indra
Wolf Thomas
Workshop BigScience
Xie Zhongli
Xu Canwen
Xu Chuxin
Xu Yifan
Xu Yingxin
Xu Yu
Yang Yoyo
Ye Zifan
Yong Zheng-Xin
Yu Dian
Yu Ian
Yun Tian
Yvon François
Zhang Minjia
Zhang Rui
Zhang Ruochen
Zhou Chenxi
Zhu Jian
Zink Sydney
Šaško Mario
Publication date: 10/12/2022

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

arXiv.org e-Print Archive

Polarization sensitive optical coherence tomography utilizing a buffered swept source laser

Author: Golde Jonas
Kirsten Lars
Koch Edmund
Publication venue: Walter de Gruyter GmbH
Publication date: 01/09/2017

We present an approach for polarization-sensitive optical coherence tomography (PS-OCT) that solely requires a modification of the light source, a buffered swept source laser. For this purpose, a single-mode fiber-based Fourier domain mode locked laser is extended by fourfold buffering with manual fiber polarization controllers to emit alternating sweep polarizations, while the polarization contrast calibration is realized by a high-speed polarimeter. As the introduced setup utilizes standard scanning and detection units, the proposed method is a promising way to enhance various swept source OCT systems by polarization-sensitive imaging. Preliminary measurements of a human fingernail with different polarization contrasts demonstrate the feasibility of the concept.

Directory of Open Access Journals

Large-Scale Label Interpretation Learning for Few-Shot Named Entity Recognition

Author: Akbik Alan
Golde Jonas
Hamborg Felix
Publication date: 21/03/2024

Few-shot named entity recognition (NER) detects named entities within text using only a few annotated examples. One promising line of research is to leverage natural language descriptions of each entity type: the common label PER might, for example, be verbalized as "person entity." In an initial label interpretation learning phase, the model learns to interpret such verbalized descriptions of entity types. In a subsequent few-shot tagset extension phase, this model is then given a description of a previously unseen entity type (such as "music album") and optionally a few training examples to perform few-shot NER for this type. In this paper, we systematically explore the impact of a strong semantic prior to interpret verbalizations of new entity types by massively scaling up the number and granularity of entity types used for label interpretation learning. To this end, we leverage an entity linking benchmark to create a dataset with orders of magnitude more distinct entity types and descriptions than currently used datasets. We find that this increased signal yields strong results in zero- and few-shot NER in in-domain, cross-domain, and even cross-lingual settings. Our findings indicate significant potential for improving few-shot NER through heuristic data-based optimization.

Comment: 8 pages

arXiv.org e-Print Archive

Differentiation of Occlusal Discolorations and Carious Lesions with Hyperspectral Imaging In Vitro

Author: Golde Jonas
Hannig Christian
Koch Edmund
Tetschke Florian
Vosahlo Robin
Walther Julia
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 19/04/2024

Stains and stained incipient lesions can be challenging to differentiate with established clinical tools. New diagnostic techniques are required for improved distinction to enable early noninvasive treatment. This in vitro study evaluates the performance of artificial intelligence (AI)-based classification of hyperspectral imaging data for early occlusal lesion detection and differentiation from stains. Sixty-five extracted permanent human maxillary and mandibular bicuspids and molars (International Caries Detection and Assessment System [ICDAS] II 0–4) were imaged with a hyperspectral camera (Diaspective Vision TIVITA® Tissue, Diaspective Vision, Pepelow, Germany) at a distance of 350 mm, acquiring spatial and spectral information in the wavelength range 505–1000 nm; 650 fissural spectra were used to train classification algorithms (models) for automated distinction between stained but sound enamel and stained lesions. Stratified 10-fold cross-validation was used. The model with the highest classification performance, a fine k-nearest neighbor classification algorithm, was used to classify five additional tooth fissural areas. Polarization microscopy of ground sections served as reference. Compared to stained lesions, stained intact enamel showed higher reflectance in the wavelength range 525–710 nm but lower reflectance in the wavelength range 710–1000 nm. A fine k-nearest neighbor classification algorithm achieved the highest performance with a Matthews correlation coefficient (MCC) of 0.75, a sensitivity of 0.95 and a specificity of 0.80 when distinguishing between intact stained and stained lesion spectra. The superposition of color-coded classification results on further tooth occlusal projections enabled qualitative assessment of the entire fissure’s enamel health. AI-based evaluation of hyperspectral images is highly promising as a complementary method to visual and radiographic examination for early occlusal lesion detection.
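
The three metrics reported in this abstract all derive from the four cells of a binary confusion matrix, as the following Python sketch shows. The counts are hypothetical and chosen only to give a sensitivity of 0.95 and a specificity of 0.80 on balanced classes; they are not the study's data, and the function names are illustrative.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positives (stained lesions) that are detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives (stained sound enamel) correctly rejected."""
    return tn / (tn + fp)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 if any marginal is empty."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion-matrix counts (100 lesion spectra, 100 sound spectra).
tp, fn, tn, fp = 95, 5, 80, 20

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

Unlike sensitivity or specificity alone, the MCC uses all four cells, so it stays informative when the two classes are of unequal size.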

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa