Learning Multiscale Consistency for Self-supervised Electron Microscopy Instance Segmentation
Instance segmentation in electron microscopy (EM) volumes is challenging due to
complex shapes and sparse annotations. Self-supervised learning helps, but it
still struggles with the intricate visual patterns of EM data. To address this, we propose a
pretraining framework that enhances multiscale consistency in EM volumes. Our
approach leverages a Siamese network architecture, integrating both strong and
weak data augmentations to effectively extract multiscale features. We uphold
voxel-level coherence by reconstructing the original input data from these
augmented instances. Furthermore, we incorporate cross-attention mechanisms to
facilitate fine-grained feature alignment between these augmentations. Finally,
we apply contrastive learning techniques across a feature pyramid, allowing us
to distill distinctive representations spanning various scales. After
pretraining on four large-scale EM datasets, our framework significantly
improves downstream tasks like neuron and mitochondria segmentation, especially
with limited finetuning data. It effectively captures voxel- and feature-level
consistency, showing promise for learning transferable representations for EM
analysis.
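The pyramid-level contrastive objective described above can be illustrated with a minimal NumPy sketch. This is not the authors' released code: the InfoNCE-style loss, the function names, and the feature shapes are all assumptions, and a real pipeline would operate on network features rather than random arrays.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE-style loss between two batches of embeddings.

    z_a, z_b: (N, D) features from strongly/weakly augmented views;
    matching rows are positives, all other rows serve as negatives.
    """
    # L2-normalise so dot products become cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives sit on the diagonal

def pyramid_contrastive_loss(feats_strong, feats_weak):
    """Average the contrastive loss over all feature-pyramid scales."""
    return np.mean([info_nce(a, b) for a, b in zip(feats_strong, feats_weak)])

rng = np.random.default_rng(0)
# Three pyramid levels, each flattened to (N, D) feature vectors
pyr_strong = [rng.normal(size=(8, 32)) for _ in range(3)]
# The weak view is modelled as a small perturbation of the strong view
pyr_weak = [p + 0.05 * rng.normal(size=p.shape) for p in pyr_strong]
loss = pyramid_contrastive_loss(pyr_strong, pyr_weak)
```

Averaging the per-scale losses is one simple way to "distill distinctive representations spanning various scales"; per-scale weighting is an obvious variant.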
Continuous Cross-resolution Remote Sensing Image Change Detection
Most contemporary supervised Remote Sensing (RS) image Change Detection (CD)
approaches are customized for equal-resolution bitemporal images. Real-world
applications raise the need for cross-resolution change detection, aka, CD
based on bitemporal images with different spatial resolutions. Given training
samples of a fixed bitemporal resolution difference (ratio) between the
high-resolution (HR) image and the low-resolution (LR) one, current
cross-resolution methods may fit a certain ratio but lack adaptation to other
resolution differences. Toward continuous cross-resolution CD, we propose
scale-invariant learning that enforces the model to predict consistent HR
results given synthesized samples of varying resolution differences.
Concretely, we synthesize blurred versions of the HR image by random
downsampled reconstructions to reduce the gap between HR and LR images. We
introduce coordinate-based representations to decode per-pixel predictions by
feeding the coordinate query and corresponding multi-level embedding features
into an MLP that implicitly learns the shape of land-cover changes, thereby
aiding the recognition of blurred objects in the LR image. Moreover, considering
that spatial resolution mainly affects the local textures, we apply
local-window self-attention to align bitemporal features during the early
stages of the encoder. Extensive experiments on two synthesized and one
real-world different-resolution CD datasets verify the effectiveness of the
proposed method. Our method significantly outperforms several vanilla CD
methods and two cross-resolution CD methods on the three datasets in both
in-distribution and out-of-distribution settings. The empirical results suggest
that our method could yield relatively consistent HR change predictions
regardless of varying bitemporal resolution ratios. Our code is available at
\url{https://github.com/justchenhao/SILI_CD}.
Comment: 21 pages, 11 figures. Accepted article by IEEE TGR
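The coordinate-based decoding step can be sketched as follows: features are bilinearly sampled from each pyramid level at a continuous query coordinate and fed, together with the coordinate itself, into a small MLP. This is an illustrative NumPy example with assumed layer sizes and a sigmoid change head, not the paper's implementation (see the repository above for that).

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Sample a (H, W, C) feature map at continuous coords (x, y) in [0, 1]."""
    H, W, _ = feat.shape
    fx, fy = x * (W - 1), y * (H - 1)
    x0, y0 = int(np.floor(fx)), int(np.floor(fy))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = fx - x0, fy - y0
    return ((1 - wx) * (1 - wy) * feat[y0, x0] + wx * (1 - wy) * feat[y0, x1]
            + (1 - wx) * wy * feat[y1, x0] + wx * wy * feat[y1, x1])

def decode_pixel(coord, pyramid, W1, b1, W2, b2):
    """Predict a change score for one (x, y) query from a feature pyramid.

    The query coordinate is concatenated with features sampled at that
    location from every pyramid level, then passed through a 2-layer MLP.
    """
    x, y = coord
    sampled = [bilinear_sample(f, x, y) for f in pyramid]
    inp = np.concatenate([np.array(coord)] + sampled)
    h = np.maximum(0.0, W1 @ inp + b1)            # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid change probability

rng = np.random.default_rng(1)
# Multi-level embeddings: three spatial scales, 8 channels each (assumed sizes)
pyramid = [rng.normal(size=(s, s, 8)) for s in (32, 16, 8)]
d_in = 2 + 3 * 8                                  # coordinate + sampled features
W1, b1 = 0.1 * rng.normal(size=(64, d_in)), np.zeros(64)
W2, b2 = 0.1 * rng.normal(size=(1, 64)), np.zeros(1)
score = decode_pixel((0.3, 0.7), pyramid, W1, b1, W2, b2)
```

Because the decoder is queried per coordinate rather than per grid cell, the same network can emit HR predictions regardless of the input resolution ratio.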
Convolutional nets for reconstructing neural circuits from brain images acquired by serial section electron microscopy
Neural circuits can be reconstructed from brain images acquired by serial
section electron microscopy. Image analysis has been performed by manual labor
for half a century, and efforts at automation date back almost as far.
Convolutional nets were first applied to neuronal boundary detection a dozen
years ago, and have now achieved impressive accuracy on clean images. Robust
handling of image defects is a major outstanding challenge. Convolutional nets
are also being employed for other tasks in neural circuit reconstruction:
finding synapses and identifying synaptic partners, extending or pruning
neuronal reconstructions, and aligning serial section images to create a 3D
image stack. Computational systems are being engineered to handle petavoxel
images of cubic-millimeter brain volumes.
Unsupervised Training of Deep Neural Networks for Motion Estimation
This PhD thesis addresses the problem of motion estimation, that is, the estimation of a field that describes how pixels move from a reference frame to a target frame, using Deep Neural Networks (DNNs). In contrast to classic methods, we do not solve an optimization problem at test time. We train DNNs once and apply them in one pass at test time, which reduces the computational complexity. The major contribution is that, in contrast to supervised methods, we train our DNNs in an unsupervised way. By unsupervised, we mean without the need for ground-truth motion fields, which are expensive to obtain for real scenes. More specifically, we have trained our networks by designing cost functions inspired by classical optical flow estimation schemes and generative methods in Computer Vision. We first propose a straightforward CNN method that is trained to optimize the brightness constancy constraint and embed it in a classical multiscale scheme in order to predict motions that are large in magnitude (GradNet). We show that GradNet generalizes well to an unknown dataset and performed comparably with state-of-the-art unsupervised methods at that time. Second, we propose a convolutional Siamese architecture in which a new soft warping scheme is embedded, applied in a multiscale framework and trained to optimize a higher-level feature constancy constraint (LikeNet). The architecture of LikeNet allows a trade-off between computational load and memory, and is 98% smaller than other SOA methods in terms of learned parameters. We show that LikeNet performs on par with SOA approaches and best among uni-directional methods, i.e., methods that calculate the motion field in one pass. Third, we propose a novel approach to distill the slower LikeNet into a much faster regression neural network without losing much accuracy (QLikeNet). The results show that using DNNs is a promising direction for motion estimation, although further improvements are required, as classical methods still perform best.
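The brightness constancy objective at the heart of this line of work can be sketched in a few lines of NumPy. This is an illustrative toy example, not the thesis code: it uses nearest-neighbour backward warping and a hand-set flow field, whereas GradNet predicts dense flow with a CNN inside a multiscale scheme.

```python
import numpy as np

def warp(img, flow):
    """Backward-warp img by a dense flow field (nearest-neighbour for brevity)."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xs2 = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    ys2 = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    return img[ys2, xs2]

def brightness_constancy_loss(ref, tgt, flow):
    """Mean photometric error between the reference frame and the target
    frame warped back with the predicted flow."""
    return np.mean(np.abs(ref - warp(tgt, flow)))

rng = np.random.default_rng(2)
ref = rng.random((16, 16))
# Target frame: reference shifted 2 px to the right (wrap-around at the border)
tgt = np.roll(ref, 2, axis=1)
# The flow that explains this motion: x-displacement of +2 everywhere
true_flow = np.zeros((16, 16, 2))
true_flow[..., 0] = 2.0
loss_true = brightness_constancy_loss(ref, tgt, true_flow)
```

A network trained unsupervised minimizes exactly this kind of loss over its predicted flow, so no ground-truth motion fields are ever required; the correct flow drives the photometric error toward zero (up to border effects).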
A survey of face recognition techniques under occlusion
The limited capacity to recognize faces under occlusions is a long-standing
problem that presents a unique challenge for face recognition systems and even
for humans. Compared with other challenges such as pose variation and varying
expressions, the occlusion problem has received less research attention.
Nevertheless, occluded face recognition is imperative to exploit the full
potential of face recognition for real-world applications. In this paper, we
restrict the scope to occluded face recognition. First, we explore what the
occlusion problem is and what inherent difficulties can arise. As a part of
this review, we introduce face detection under occlusion, a preliminary step in
face recognition. Second, we present how existing face recognition methods cope
with the occlusion problem and classify them into three categories, which are
1) occlusion robust feature extraction approaches, 2) occlusion aware face
recognition approaches, and 3) occlusion recovery based face recognition
approaches. Furthermore, we analyze the motivations, innovations, pros and
cons, and the performance of representative approaches for comparison. Finally,
future challenges and method trends of occluded face recognition are thoroughly
discussed.
Synthetic Aperture Radar (SAR) Meets Deep Learning
This reprint focuses on applications that combine synthetic aperture radar and deep learning technology. It aims to further promote the development of SAR image intelligent interpretation technology. A synthetic aperture radar (SAR) is an important active microwave imaging sensor, whose all-day, all-weather imaging capability gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is valuable and meaningful, therefore, to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in the computer vision community, e.g., in face recognition, autonomous driving, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction, which can greatly improve the performance of various applications. This reprint provides a platform for researchers to address the above challenges and present their innovative and cutting-edge research results when applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews, and technical reports.
Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability
Video segmentation encompasses a wide range of categories of problem
formulation, e.g., object, scene, actor-action and multimodal video
segmentation, for delineating task-specific scene components with pixel-level
masks. Recently, approaches in this research area shifted from concentrating on
ConvNet-based to transformer-based models. In addition, various
interpretability approaches have appeared for transformer models and video
temporal dynamics, motivated by the growing interest in basic scientific
understanding, model diagnostics and societal implications of real-world
deployment. Previous surveys mainly focused on ConvNet models on a subset of
video segmentation tasks or transformers for classification tasks. Moreover,
component-wise discussion of transformer-based video segmentation models has
not yet received due focus. Likewise, previous reviews of interpretability
methods have focused on transformers for classification, while the temporal
dynamics modelling capabilities of video models have received less
attention. In this survey, we address the above with a thorough discussion of
various categories of video segmentation, a component-wise discussion of the
state-of-the-art transformer-based models, and a review of related
interpretability methods. We first present an introduction to the different
video segmentation task categories, their objectives, specific challenges and
benchmark datasets. Next, we provide a component-wise review of recent
transformer-based models and document the state of the art on different video
segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc
interpretability methods for transformer models and interpretability methods
for understanding the role of the temporal dimension in video models. Finally,
we conclude our discussion with future research directions.