45 research outputs found
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Most NLP tasks are modeled as supervised learning and thus require labeled
training data to train effective models. However, manually producing such data
at sufficient quality and quantity is known to be costly and time-intensive.
Current research addresses this bottleneck by exploring a novel paradigm called
zero-shot learning via dataset generation. Here, a powerful LLM is prompted
with a task description to generate labeled data that can be used to train a
downstream NLP model. For instance, an LLM might be prompted to "generate 500
movie reviews with positive overall sentiment, and another 500 with negative
sentiment." The generated data could then be used to train a binary sentiment
classifier, effectively leveraging an LLM as a teacher to a smaller student
model. With this demo, we introduce Fabricator, an open-source Python toolkit
for dataset generation. Fabricator implements common dataset generation
workflows, supports a wide range of downstream NLP tasks (such as text
classification, question answering, and entity recognition), and is integrated
with well-known libraries to facilitate quick experimentation. With Fabricator,
we aim to support researchers in conducting reproducible dataset generation
experiments using LLMs and help practitioners apply this approach to train
models for downstream tasks. Comment: 3 figures and 2 tables
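The prompting workflow described in this abstract can be sketched in plain Python. This is a minimal illustration and not Fabricator's API: the names `build_prompt`, `generate_labeled_dataset`, and `stub_teacher` are hypothetical, and the stub stands in for a real call to a teacher LLM.

```python
from typing import Callable, Dict, List


def build_prompt(label: str, n: int) -> str:
    """Compose a task description asking for n examples of one class."""
    return f"Generate {n} movie reviews with {label} overall sentiment."


def generate_labeled_dataset(
    labels: List[str],
    n_per_label: int,
    generate: Callable[[str, int], List[str]],
) -> List[Dict[str, str]]:
    """Prompt the teacher once per label and attach that label to each output."""
    dataset = []
    for label in labels:
        prompt = build_prompt(label, n_per_label)
        for text in generate(prompt, n_per_label):
            dataset.append({"text": text, "label": label})
    return dataset


def stub_teacher(prompt: str, n: int) -> List[str]:
    """Hypothetical stand-in for an LLM call; returns placeholder strings."""
    return [f"<generated text {i} for: {prompt}>" for i in range(n)]


# Three placeholder examples per class; a real run would use a teacher LLM.
dataset = generate_labeled_dataset(["positive", "negative"], 3, stub_teacher)
```

The resulting list of text/label pairs is the kind of data a smaller student classifier could then be trained on.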
Analysis of riboflavin/ultraviolet A corneal cross-linking by molecular spectroscopy
Corneal cross-linking (CXL) with riboflavin and ultraviolet A light is a therapeutic procedure to restore the mechanical stability of corneal tissue. The treatment is applied to pathological tissue, for example in keratoconus, and induces the formation of new cross-links. At present, the molecular mechanisms of induced cross-linking are still not known exactly. In this study, we investigated molecular alterations within porcine cornea tissue after treatment with riboflavin and ultraviolet A light by surface-enhanced Raman spectroscopy (SERS). For that purpose, a thin silver layer was vapor-deposited onto cornea flaps after CXL treatment. To explore molecular alterations induced by the photochemical process, hierarchical cluster analysis (HCA) was used. The detailed analysis of SERS spectra reveals that there is no general change in collagen secondary structure, while modifications of amino acid side chains are the most dominant outcome. The formation of secondary and aromatic amine groups as well as methylene and carbonyl groups was observed. Even though successful cross-linking could not be registered in all treated samples, Raman signals of newly formed chemical groups are already present in corneas treated with riboflavin only.
Imaging the tympanic membrane oscillation ex vivo with Doppler optical coherence tomography during simulated Eustachian catarrh
Recently, optical coherence tomography (OCT) was utilized in multiple studies for structural and functional imaging of the middle ear and the tympanic membrane. Since Doppler OCT allows both spatially resolved measurement of the tympanic membrane oscillation and high-resolution imaging, it is regarded as a promising tool for future in vivo applications. In this study, Doppler OCT is utilized for the visualization of the tympanic membrane oscillation in temporal bones with simulated Eustachian catarrh, which was realized by generating a depression (negative pressure) in the tympanic cavity. The transfer function, meaning the oscillation amplitude normalized to the applied sound pressure, is measured in a frequency-resolved manner in the range from 0.5 kHz to 6 kHz and with a lateral spatial resolution of 0.4 mm. Typical oscillation patterns could be observed at ambient pressure in the tympanic cavity. Under depression, the characteristic oscillation patterns were observed with largely congruent appearance but at higher frequencies.
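The transfer function defined in this abstract, the oscillation amplitude normalized to the applied sound pressure, can be illustrated with a short sketch. The function name, the units, and all numbers below are assumptions chosen for illustration; they are not measurements from the study.

```python
from typing import Dict


def transfer_function(
    amplitude: Dict[float, float],
    sound_pressure: Dict[float, float],
) -> Dict[float, float]:
    """Normalize the oscillation amplitude to the applied sound pressure.

    Both inputs map a frequency in kHz to a measured value; the units
    (for example nm and Pa) are assumptions made for this sketch.
    """
    result = {}
    for freq_khz, amp in amplitude.items():
        pressure = sound_pressure[freq_khz]
        if pressure <= 0:
            raise ValueError(f"sound pressure must be positive at {freq_khz} kHz")
        result[freq_khz] = amp / pressure
    return result


# Hypothetical values at three frequencies within the 0.5 kHz to 6 kHz range.
amplitude_nm = {0.5: 10.0, 1.0: 30.0, 6.0: 4.0}
pressure_pa = {0.5: 2.0, 1.0: 2.0, 6.0: 2.0}
tf = transfer_function(amplitude_nm, pressure_pa)
```

In a spatially resolved measurement, the same normalization would be repeated for each lateral position on the membrane.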
Core–shell bioprinting as a strategy to apply differentiation factors in a spatially defined manner inside osteochondral tissue substitutes
One of the key challenges in osteochondral tissue engineering is to define specified zones with varying material properties, cell types and biochemical factors supporting locally adjusted differentiation into the osteogenic and chondrogenic lineage, respectively. Herein, extrusion-based core–shell bioprinting is introduced as a potent tool allowing a spatially defined delivery of cell types and differentiation factors TGF-β3 and BMP-2 in separated compartments of hydrogel strands, and, therefore, a local supply of matching factors for chondrocytes and osteoblasts. Ink development was based on blends of alginate and methylcellulose, in combination with varying concentrations of the nanoclay Laponite, whose high-affinity binding capacity for various molecules was exploited. Release kinetics of model molecules was successfully tuned by Laponite addition. Core–shell bioprinting was proven to generate well-oriented compartments within one strand as monitored by optical coherence tomography in a non-invasive manner. Chondrocytes and osteoblasts were applied each in the shell while the respective differentiation factors (TGF-β3, BMP-2) were provided by a Laponite-supported core serving as central factor depot within the strand, allowing directed differentiation of cells in close contact to the core. Experiments with bi-zonal constructs, comprising an osteogenic and a chondrogenic zone, revealed that the local delivery of the factors from the core reduces effects of these factors on the cells in the other scaffold zone. These observations prove the general suitability of the suggested system for co-differentiation of different cell types within a zonal construct.
Non-rigid Point Cloud Registration for Middle Ear Diagnostics with Endoscopic Optical Coherence Tomography
Purpose: Middle ear infection is the most prevalent inflammatory disease,
especially among the pediatric population. Current diagnostic methods are
subjective and depend on visual cues from an otoscope, which limits the ability
of otologists to identify pathology. To address this shortcoming, endoscopic
optical coherence tomography (OCT) provides both morphological and functional
in-vivo measurements of the middle ear. However, due to the shadow of prior
structures, interpretation of OCT images is challenging and time-consuming. To
facilitate fast diagnosis and measurement, improvement in the readability of
OCT data is achieved by merging morphological knowledge from ex-vivo middle ear
models with OCT volumetric data, so that OCT applications can be further
promoted in daily clinical settings. Methods: We propose C2P-Net: a two-staged
non-rigid registration pipeline for complete to partial point clouds, which are
sampled from ex-vivo and in-vivo OCT models, respectively. To overcome the lack
of labeled training data, a fast and effective generation pipeline in Blender3D
is designed to simulate middle ear shapes and extract in-vivo noisy and partial
point clouds. Results: We evaluate the performance of C2P-Net through
experiments on both synthetic and real OCT datasets. The results demonstrate
that C2P-Net generalizes to unseen middle ear point clouds and is capable of
handling realistic noise and incompleteness in synthetic and real OCT data.
Conclusion: In this work, we aim to enable diagnosis of middle ear structures
with the assistance of OCT images. We propose C2P-Net: a two-staged non-rigid
registration pipeline for point clouds to support the interpretation of in-vivo
noisy and partial OCT images for the first time. Code is available at:
https://gitlab.com/nct_tso_public/c2p-net
In vivo imaging of human oral hard and soft tissues by polarization-sensitive optical coherence tomography
Since optical coherence tomography (OCT) provides three-dimensional high-resolution images of biological tissue, the benefit of polarization contrast in the field of dentistry is highlighted in this study. Polarization-sensitive OCT (PS OCT) with phase-sensitive recording is used for imaging dental and mucosal tissues in the human oral cavity in vivo. An enhanced polarization contrast of oral structures is reached by analyzing the signals of the co- and cross-polarized channels of the swept source PS OCT system quantitatively with respect to reflectivity, retardation, optic axis orientation, and depolarization. The calculation of these polarization parameters enables a high tissue-specific contrast imaging for the detailed physical interpretation of human oral hard and soft tissues. For the proof-of-principle, imaging of composite restorations and mineralization defects at premolars as well as gingival, lingual, and labial oral mucosa was performed in vivo within the anterior oral cavity. The achieved contrast-enhanced results of the investigated human oral tissues by means of polarization-sensitive imaging are evaluated by the comparison with conventional intensity-based OCT.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Large language models (LLMs) have been shown to be able to perform new tasks
based on a few demonstrations or natural language instructions. While these
capabilities have led to widespread adoption, most LLMs are developed by
resource-rich organizations and are frequently kept from the public. As a step
towards democratizing this powerful technology, we present BLOOM, a
176B-parameter open-access language model designed and built thanks to a
collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer
language model that was trained on the ROOTS corpus, a dataset comprising
hundreds of sources in 46 natural and 13 programming languages (59 in total).
We find that BLOOM achieves competitive performance on a wide variety of
benchmarks, with stronger results after undergoing multitask prompted
finetuning. To facilitate future research and applications using LLMs, we
publicly release our models and code under the Responsible AI License.
Polarization sensitive optical coherence tomography utilizing a buffered swept source laser
We present an approach for polarization sensitive optical coherence tomography (PS-OCT) that solely requires a modification of the light source, a buffered swept source laser. For this purpose, a single-mode fiber-based Fourier domain mode locked laser is extended by fourfold buffering with manual fiber polarization controllers to emit alternating sweep polarizations, while the polarization contrast calibration is realized by a high-speed polarimeter. As the introduced setup utilizes standard scanning and detection units, the proposed method is a promising way to enhance various swept source OCT systems by polarization sensitive imaging. Preliminary measurements of a human fingernail with different polarization contrasts demonstrate the feasibility of the concept.
Large-Scale Label Interpretation Learning for Few-Shot Named Entity Recognition
Few-shot named entity recognition (NER) detects named entities within text
using only a few annotated examples. One promising line of research is to
leverage natural language descriptions of each entity type: the common label
PER might, for example, be verbalized as "person entity." In an initial label
interpretation learning phase, the model learns to interpret such verbalized
descriptions of entity types. In a subsequent few-shot tagset extension phase,
this model is then given a description of a previously unseen entity type (such
as "music album") and optionally a few training examples to perform few-shot
NER for this type. In this paper, we systematically explore the impact of a
strong semantic prior to interpret verbalizations of new entity types by
massively scaling up the number and granularity of entity types used for label
interpretation learning. To this end, we leverage an entity linking benchmark
to create a dataset with orders of magnitude more distinct entity types and
descriptions than currently used datasets. We find that this increased signal
yields strong results in zero- and few-shot NER in in-domain, cross-domain, and
even cross-lingual settings. Our findings indicate significant potential for
improving few-shot NER through heuristical data-based optimization. Comment: 8 pages
Differentiation of Occlusal Discolorations and Carious Lesions with Hyperspectral Imaging In Vitro
Stains and stained incipient lesions can be challenging to differentiate with established clinical tools. New diagnostic techniques are required for improved distinction to enable early noninvasive treatment. This in vitro study evaluates the performance of artificial intelligence (AI)-based classification of hyperspectral imaging data for early occlusal lesion detection and differentiation from stains. Sixty-five extracted permanent human maxillary and mandibular bicuspids and molars (International Caries Detection and Assessment System [ICDAS] II 0–4) were imaged with a hyperspectral camera (Diaspective Vision TIVITA® Tissue, Diaspective Vision, Pepelow, Germany) at a distance of 350 mm, acquiring spatial and spectral information in the wavelength range 505–1000 nm; 650 fissural spectra were used to train classification algorithms (models) for automated distinction between stained but sound enamel and stained lesions. Stratified 10-fold cross-validation was used. The model with the highest classification performance, a fine k-nearest neighbor classification algorithm, was used to classify five additional tooth fissural areas. Polarization microscopy of ground sections served as reference. Compared to stained lesions, stained intact enamel showed higher reflectance in the wavelength range 525–710 nm but lower reflectance in the wavelength range 710–1000 nm. A fine k-nearest neighbor classification algorithm achieved the highest performance with a Matthews correlation coefficient (MCC) of 0.75, a sensitivity of 0.95 and a specificity of 0.80 when distinguishing between intact stained and stained lesion spectra. The superposition of color-coded classification results on further tooth occlusal projections enabled qualitative assessment of the entire fissure’s enamel health. AI-based evaluation of hyperspectral images is highly promising as a complementary method to visual and radiographic examination for early occlusal lesion detection.
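The three performance measures named in this abstract (MCC, sensitivity, specificity) follow from the standard confusion-matrix definitions, sketched below. The counts are hypothetical and chosen only to illustrate the arithmetic; they are not the study's data. Treating "stained lesion" as the positive class is likewise an assumption of this sketch.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positives that were classified as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives that were classified as negative."""
    return tn / (tn + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal sum is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: positive = stained lesion, negative = stained sound enamel.
tp, fn, tn, fp = 40, 10, 45, 5
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
score = mcc(tp, tn, fp, fn)
```

Unlike sensitivity or specificity alone, the MCC uses all four cells of the confusion matrix, which is why it is often reported alongside them for two-class problems.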