Few-shot adaptation for morphology-independent cell instance segmentation
Microscopy data collections are becoming larger and more frequent. Accurate
and precise quantitative analysis tools such as cell instance segmentation are
necessary to benefit from them. This is challenging because of the variability
in the data, which requires retraining the segmentation model to maintain high
accuracy on each new collection, especially when segmenting cells with
elongated, non-convex morphologies such as bacteria. We propose to reduce the
amount of annotation and computing power needed for retraining by introducing a
few-shot domain adaptation approach that requires annotating only one to five
cells of the new data and that quickly adapts the model to maintain high
accuracy. Our results show a significant boost in accuracy after adaptation to
very challenging bacteria datasets. Comment: ISBI 202
Automated Classification of Vowel Category and Speaker Type in the High-Frequency Spectrum
The high-frequency region of vowel signals (above the third formant, F3) has received little research attention. Recent evidence, however, has documented the perceptual utility of high-frequency information in the speech signal, above the traditional frequency band known to contain important cues for speech and speaker recognition. The purpose of this study was to determine whether high-pass filtered vowels could be separated by vowel category and speaker type in a supervised learning framework. Mel frequency cepstral coefficients (MFCCs) were extracted from productions of six vowel categories by two male, two female, and two child speakers. Results revealed that the filtered vowels were well separated by vowel category and speaker type using MFCCs from the high-frequency spectrum. This demonstrates that the high-frequency region carries information useful for automated classification, and this is the first study to report findings of this nature in a supervised learning framework.
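As an illustration of the feature-extraction step, the following is a minimal sketch of computing MFCCs for one frame of a high-pass filtered signal using only NumPy. It is not the study's pipeline: the idealized high-pass (zeroing spectral bins below a cutoff), the hypothetical 3 kHz cutoff, the HTK-style mel formula, and the filter and coefficient counts are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other mel variants exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr, fmin, fmax):
    """Triangular filters evenly spaced on the mel scale between fmin and fmax."""
    mel_points = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x, n_out):
    """Unnormalized DCT-II, keeping only the first n_out coefficients."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    j = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * j + 1) / (2 * n)) @ x

def mfcc_frame(frame, sr, cutoff_hz, n_filters=26, n_mfcc=13):
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2 / n_fft
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    power[freqs < cutoff_hz] = 0.0  # idealized high-pass: drop bins below the cutoff
    # Place the mel filters only between the (hypothetical) cutoff and Nyquist
    fb = mel_filterbank(n_filters, n_fft, sr, cutoff_hz, sr / 2)
    log_energies = np.log(fb @ power + 1e-10)  # small offset avoids log(0)
    return dct_ii(log_energies, n_mfcc)

# Usage on a synthetic frame: a 500 Hz tone (removed) plus a weaker 4.5 kHz tone
sr = 16000
t = np.arange(512) / sr
frame = np.sin(2 * np.pi * 500 * t) + 0.3 * np.sin(2 * np.pi * 4500 * t)
coeffs = mfcc_frame(frame, sr, cutoff_hz=3000.0)
print(coeffs.shape)  # (13,)
```

Per-frame vectors like these are the kind of input a supervised classifier would be trained on to separate vowel category and speaker type.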
Self-supervised Interest Point Detection and Description for Fisheye and Perspective Images
Keypoint detection and matching is a fundamental task in many computer vision
problems, from shape reconstruction, to structure from motion, to AR/VR
applications and robotics. It is a well-studied problem with remarkable
successes such as SIFT, and more recent deep learning approaches. While these
techniques are highly robust to noise, illumination variation, and rigid motion
transformations, less attention has been paid to their sensitivity to image
distortion. In this work, we focus on distortion caused by the geometry of the
cameras used for image acquisition, and consider the keypoint detection and
matching problem in the hybrid scenario of a fisheye image and a projective
image. We build on a state-of-the-art approach and derive a self-supervised
procedure that enables training an interest point detector and descriptor
network. We also collected two new datasets for additional training and testing
in this unexplored scenario, and we demonstrate that current approaches are
suboptimal because they are designed for traditional projective conditions,
whereas the proposed approach is the most effective. Comment: CVPR Workshop on
Omnidirectional Computer Vision, 202
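The abstract does not specify how descriptors are matched. A common baseline for matching keypoints between two images, shown here as a hedged sketch rather than the authors' method, is mutual nearest-neighbor matching on L2-normalized descriptors: a pair is kept only if each descriptor is the other's best match. The function name `mutual_nn_match` and the random test descriptors are hypothetical.

```python
import numpy as np

def mutual_nn_match(desc_a, desc_b):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are mutual nearest neighbors."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T  # cosine similarity matrix, shape (len(a), len(b))
    nn_ab = sim.argmax(axis=1)  # best match in b for each descriptor in a
    nn_ba = sim.argmax(axis=0)  # best match in a for each descriptor in b
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# Usage: the "perspective" descriptors are a shuffled, slightly noisy copy
# of the "fisheye" ones, so every keypoint should find its counterpart.
rng = np.random.default_rng(0)
desc_fisheye = rng.normal(size=(5, 8))
perm = [2, 0, 4, 1, 3]
desc_persp = desc_fisheye[perm] + 0.01 * rng.normal(size=(5, 8))
matches = mutual_nn_match(desc_fisheye, desc_persp)
print(matches)  # [(0, 1), (1, 3), (2, 0), (3, 4), (4, 2)]
```

The mutual check discards one-sided matches, which tend to be less reliable when distortion makes descriptors of the same point differ between the two views.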
Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation
Precise 3D environmental mapping is pivotal in robotics. Existing methods
often rely on predefined concepts during training or are time-intensive when
generating semantic maps. This paper presents Open-Fusion, a groundbreaking
approach for real-time open-vocabulary 3D mapping and queryable scene
representation using RGB-D data. Open-Fusion harnesses the power of a
pre-trained vision-language foundation model (VLFM) for open-set semantic
comprehension and employs the Truncated Signed Distance Function (TSDF) for
swift 3D scene reconstruction. By leveraging the VLFM, we extract region-based
embeddings and their associated confidence maps. These are then integrated with
3D knowledge from TSDF using an enhanced Hungarian-based feature-matching
mechanism. Notably, Open-Fusion delivers outstanding annotation-free
open-vocabulary 3D segmentation without requiring additional 3D training.
Benchmark tests on the ScanNet dataset against leading zero-shot methods
highlight Open-Fusion's superiority. Furthermore, it seamlessly combines the
strengths of region-based VLFM and TSDF, facilitating real-time 3D scene
comprehension that includes object concepts and open-world semantics. We
encourage readers to view the demos on our project page:
https://uark-aicv.github.io/OpenFusio
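TSDF fusion, which Open-Fusion employs for reconstruction, is typically implemented as a per-voxel weighted running average of truncated signed distances. The sketch below shows that standard update for voxels along a single camera ray; the function name `tsdf_update`, the truncation distance, the unit observation weight, and the convention of skipping voxels far behind the surface are assumptions, not details taken from the paper.

```python
def tsdf_update(tsdf, weight, voxel_depth, measured_depth, trunc, obs_weight=1.0):
    """Fuse one depth observation into a single voxel (weighted running average)."""
    sdf = measured_depth - voxel_depth  # positive between the camera and the surface
    if sdf < -trunc:
        return tsdf, weight  # voxel is far behind the surface: leave it unchanged
    d = min(1.0, sdf / trunc)  # truncate at +1 and normalize (d >= -1 after the check)
    new_weight = weight + obs_weight
    new_tsdf = (tsdf * weight + d * obs_weight) / new_weight
    return new_tsdf, new_weight

# Usage: four voxels along one ray; the surface is observed at depth 1.0 twice.
voxel_depths = [0.75, 1.0, 1.25, 2.0]
state = [(1.0, 0.0)] * len(voxel_depths)  # (tsdf, weight): unobserved voxels start at +1 with zero weight
for measured in [1.0, 1.0]:
    state = [tsdf_update(t, w, z, measured, trunc=0.5) for (t, w), z in zip(state, voxel_depths)]
print(state)  # [(0.5, 2.0), (0.0, 2.0), (-0.5, 2.0), (1.0, 0.0)]
```

After fusion, the surface lies at the zero crossing of the TSDF values (here, the voxel at depth 1.0); averaging over many frames is what smooths out depth noise.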
SAM3D: Segment Anything Model in Volumetric Medical Images
Image segmentation remains a pivotal component in medical image analysis,
aiding in the extraction of critical information for precise diagnostic
practices. With the advent of deep learning, automated image segmentation
methods have risen to prominence, showcasing exceptional proficiency in
processing medical imagery. Motivated by the Segment Anything Model (SAM), a
foundation model renowned for its remarkable precision and robust
generalization in segmenting 2D natural images, we introduce SAM3D, an
innovative adaptation tailored for 3D volumetric medical image analysis.
Unlike current SAM-based methods that segment volumetric data by converting the
volume into separate 2D slices for individual analysis, our SAM3D model
processes the entire 3D volume in a unified manner. Extensive experiments on
multiple medical image datasets demonstrate that our network attains
competitive results compared with other state-of-the-art methods in 3D medical
segmentation tasks while being significantly more parameter-efficient. Code and
checkpoints are available at https://github.com/UARK-AICV/SAM3D. Comment:
Accepted at ISBI 202
Current Topological and Machine Learning Applications for Bias Detection in Text
Institutional bias can impact patient outcomes, educational attainment, and
legal system navigation. Written records often reflect bias, and once bias is
identified, it is possible to refer individuals for training to reduce bias.
Many machine learning tools exist to explore text data and create predictive
models that can search written records to identify bias in real time. However,
few previous studies investigate large language model embeddings and geometric
models of biased text data to understand geometry's impact on bias modeling
accuracy. To address this gap, this study uses the RedditBias database to
analyze textual biases. Four transformer models, including BERT and RoBERTa
variants, were explored. After embedding, t-SNE was used to visualize the data
in two dimensions. K-nearest-neighbor (KNN) classifiers differentiated bias
types, with lower values of k proving more effective. Findings suggest that
BERT, particularly mini BERT, excels in bias classification, while
multilingual models lag. The recommendations emphasize refining monolingual
models and exploring domain-specific biases.
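To make the role of k concrete, here is a minimal pure-Python k-nearest-neighbor classifier applied to hypothetical 2-D points standing in for t-SNE coordinates. The data, the labels "A" and "B", and the function name `knn_predict` are invented for illustration. On this toy data, a small minority class is recognized with k = 1 or 3 but outvoted by the majority class at k = 7, which is one possible way a lower k can help; the abstract does not say why lower k worked better in the study.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k training points closest (Euclidean) to the query."""
    dists = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D embeddings: a large class "A" and a small class "B"
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2), (5, 5), (5, 6)]
labels = ["A"] * 6 + ["B"] * 2
query = (5.2, 5.4)  # lies inside the small "B" cluster
for k in (1, 3, 7):
    print(k, knn_predict(points, labels, query, k))
# 1 B
# 3 B
# 7 A
```

With k = 7 the vote must include at least five "A" points, so the two nearby "B" points can never win, regardless of how close they are.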
- …