Deep Learning of Unified Region, Edge, and Contour Models for Automated Image Segmentation
Image segmentation is a fundamental and challenging problem in computer
vision with applications spanning multiple areas, such as medical imaging,
remote sensing, and autonomous vehicles. Recently, convolutional neural
networks (CNNs) have gained traction in the design of automated segmentation
pipelines. Although CNN-based models are adept at learning abstract features
from raw image data, their performance is dependent on the availability and
size of suitable training datasets. Additionally, these models are often unable
to capture the details of object boundaries and generalize poorly to unseen
classes. In this thesis, we devise novel methodologies that address these
issues and establish robust representation learning frameworks for
fully-automatic semantic segmentation in medical imaging and mainstream
computer vision. In particular, our contributions include (1) state-of-the-art
2D and 3D image segmentation networks for computer vision and medical image
analysis, (2) an end-to-end trainable image segmentation framework that unifies
CNNs and active contour models with learnable parameters for fast and robust
object delineation, (3) a novel approach for disentangling edge and texture
processing in segmentation networks, and (4) a novel few-shot learning model in
both supervised settings and semi-supervised settings where synergies between
latent and image spaces are leveraged to learn to segment images given limited
training data.
Comment: PhD dissertation, UCLA, 202
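As a rough illustration of how a CNN and an active contour model can be unified in one differentiable objective (a minimal Chan-Vese-style sketch under stated assumptions, not the dissertation's actual formulation), a contour-length term and two region terms can be computed directly on the network's soft mask; the weights lambda_length and lambda_region below are illustrative placeholders for the learnable trade-off parameters.

```python
import torch

def active_contour_loss(pred, target, lambda_length=1.0, lambda_region=1.0, eps=1e-8):
    """Chan-Vese-style active-contour loss on a soft CNN mask (sketch only).

    pred:   (B, 1, H, W) sigmoid probabilities from a segmentation CNN
    target: (B, 1, H, W) binary ground-truth masks
    """
    # Contour-length term: total variation of the predicted mask.
    dx = pred[:, :, 1:, :] - pred[:, :, :-1, :]   # vertical finite differences
    dy = pred[:, :, :, 1:] - pred[:, :, :, :-1]   # horizontal finite differences
    length = torch.sqrt(dx[:, :, :, :-1] ** 2 + dy[:, :, :-1, :] ** 2 + eps).mean()

    # Region terms with constants c1 = 1 (inside) and c2 = 0 (outside):
    # penalize foreground probability where the target is background and vice versa.
    region_in = (pred * (target - 1.0) ** 2).mean()
    region_out = ((1.0 - pred) * target ** 2).mean()

    return lambda_length * length + lambda_region * (region_in + region_out)
```

In such a setup the term would typically be added to (or replace) a pixel-wise loss during end-to-end training, with the trade-off weights tuned or learned.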
ViR: Vision Retention Networks
Vision Transformers (ViTs) have attracted a lot of popularity in recent
years, due to their exceptional capabilities in modeling long-range spatial
dependencies and scalability for large-scale training. Although the training
parallelism of the self-attention mechanism plays an important role in retaining
great performance, its quadratic complexity hinders the application of ViTs in
many scenarios that demand fast inference. This effect is even more pronounced
in applications in which autoregressive modeling of input features is required.
In Natural Language Processing (NLP), a new stream of efforts has proposed
parallelizable models with a recurrent formulation that allows for efficient
inference in generative applications. Inspired by this trend, we propose a new
class of computer vision models, dubbed Vision Retention Networks (ViR), with
dual parallel and recurrent formulations, which strike an optimal balance
between fast inference and parallel training with competitive performance. In
particular, ViR scales favorably for image throughput and memory consumption in
tasks that require higher-resolution images due to its flexible formulation in
processing large sequence lengths. ViR is the first attempt to realize dual
parallel and recurrent equivalence in a general vision backbone for recognition
tasks. We have validated the effectiveness of ViR through extensive experiments
with different dataset sizes and various image resolutions and achieved
competitive performance. Our code and pretrained models will be made publicly
available.
Comment: Tech Report
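To make the dual formulation concrete, here is a minimal single-head sketch of a retention-style operator with equivalent parallel and recurrent forms, in the spirit of the mechanism ViR builds on; the decay factor gamma and the unscaled, single-head formulation are simplifying assumptions that omit the paper's multi-head and normalization details.

```python
import torch

def retention_parallel(q, k, v, gamma):
    """Parallel form: all positions at once, suited to training."""
    n = q.shape[0]
    idx = torch.arange(n)
    # Causal decay mask: D[i, j] = gamma**(i - j) for j <= i, else 0.
    D = (gamma ** (idx[:, None] - idx[None, :])) * (idx[:, None] >= idx[None, :])
    return (q @ k.T * D) @ v

def retention_recurrent(q, k, v, gamma):
    """Recurrent form: constant-size state per step, suited to fast inference."""
    state = torch.zeros(k.shape[1], v.shape[1])
    outs = []
    for t in range(q.shape[0]):
        state = gamma * state + k[t].unsqueeze(1) @ v[t].unsqueeze(0)
        outs.append(q[t] @ state)
    return torch.stack(outs)

# The two forms produce the same output for the same inputs.
q, k, v = torch.randn(3, 8, 16).unbind(0)
assert torch.allclose(retention_parallel(q, k, v, 0.9),
                      retention_recurrent(q, k, v, 0.9), atol=1e-4)
```

The equivalence checked by the final assertion is what allows training in the parallel form while serving autoregressive or long-sequence workloads with the cheap recurrent form.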
Global Context Vision Transformers
We propose global context vision transformer (GC ViT), a novel architecture
that enhances parameter and compute utilization for computer vision tasks. The
core of the model consists of global context self-attention modules, combined with
standard local self-attention, to effectively yet efficiently model both long-
and short-range spatial interactions, as an alternative to complex operations
such as attention masks or local window shifting. While the local
self-attention modules are responsible for modeling short-range information,
the global query tokens are shared across all global self-attention modules to
interact with local keys and values. In addition, we address the lack of
inductive bias in ViTs and improve the modeling of inter-channel dependencies
by proposing a novel downsampler which leverages a parameter-efficient fused
inverted residual block. The proposed GC ViT achieves new state-of-the-art
performance across image classification, object detection and semantic
segmentation tasks. On ImageNet-1K dataset for classification, the tiny, small
and base variants of GC ViT with 28M, 51M and 90M parameters achieve 83.4%,
83.9% and 84.4% Top-1 accuracy, respectively, surpassing comparably-sized prior
art such as CNN-based ConvNeXt and ViT-based Swin Transformer. Pre-trained GC
ViT backbones in downstream tasks of object detection, instance segmentation,
and semantic segmentation on MS COCO and ADE20K datasets outperform prior work
consistently, sometimes by large margins. Code and pre-trained models are
available at https://github.com/NVlabs/GCViT.
Comment: 15 pages, 8 figures
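As a hedged sketch of the global-query idea (not the exact GC ViT module), the snippet below lets one set of pooled global query tokens attend to the local keys and values of every non-overlapping window; the average-pooling query generator and the single-head projection w_qkv are stand-in assumptions for the paper's fused inverted-residual token generator and multi-head attention.

```python
import torch
import torch.nn.functional as F

def global_query_window_attention(x, w_qkv, window=7):
    """Single-head sketch: shared global queries attend to per-window keys/values.

    x:     (B, H, W, C) feature map, with H and W divisible by `window`
    w_qkv: (C, 3 * C) projection producing queries, keys, and values
    """
    B, H, W, C = x.shape
    q, k, v = (x @ w_qkv).chunk(3, dim=-1)

    # Global queries: pool the query map down to one window's worth of tokens,
    # shared by every window (a pooling stand-in for GC ViT's token generator).
    q_g = F.adaptive_avg_pool2d(q.permute(0, 3, 1, 2), window)   # (B, C, w, w)
    q_g = q_g.flatten(2).transpose(1, 2)                         # (B, w*w, C)

    # Local keys/values: partition the map into non-overlapping windows.
    def to_windows(t):
        t = t.view(B, H // window, window, W // window, window, C)
        return t.permute(0, 1, 3, 2, 4, 5).reshape(B, -1, window * window, C)

    k_w, v_w = to_windows(k), to_windows(v)                      # (B, nW, w*w, C)
    attn = torch.einsum('bqc,bnkc->bnqk', q_g, k_w) / C ** 0.5
    out = torch.einsum('bnqk,bnkc->bnqc', attn.softmax(dim=-1), v_w)

    # Scatter the windows back to the (B, H, W, C) layout.
    out = out.view(B, H // window, W // window, window, window, C)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

# Example: a 14x14 feature map with 7x7 windows.
y = global_query_window_attention(torch.randn(2, 14, 14, 32), torch.randn(32, 96))
```

Because the global queries are computed once and reused across all windows, the global interaction scales with the number of windows rather than quadratically with the full token count.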
Well-Being and Coping Capacities of Adolescent Students with Hearing Loss in Mainstream Schools
Objectives: The coping strategies used by adolescents play an important role in preventing or decreasing their stress and in increasing their well-being. This study aimed to evaluate the coping capacity and well-being of adolescent students with hearing loss in mainstream schools, and to examine the correlations between their coping strategies and the positive characteristics of well-being: engagement, perseverance, optimism, connectedness, and happiness (EPOCH).
Materials & Methods: In this correlational study, 122 adolescent students with hearing loss were randomly selected from mainstream schools. Data were collected with the EPOCH Measure of Adolescent Well-Being and the Ways of Coping Questionnaire (WAYS). The Spearman correlation coefficient was used to determine the correlations between variables.
Results: The mean scores for the coping strategies ranged from 1.36 for problem solving to 1.44 for seeking support. Among the positive characteristics of well-being, happiness had the lowest score (11.04) and connectedness the highest (12.33). The findings also showed significant correlations between all coping strategies and the EPOCH characteristics; in particular, the total coping-strategy score was strongly and positively correlated with perseverance (0.648) and happiness (0.629).
Conclusion: The happiness score of students with hearing loss was the lowest among the positive characteristics of well-being, and happiness showed a strong association with the total coping-strategy score. Accordingly, interventional studies are needed to examine whether training students with hearing loss to use coping strategies is effective in increasing their happiness and overall well-being.