3D MRI brain tumor segmentation using autoencoder regularization
Automated segmentation of brain tumors from 3D magnetic resonance images
(MRIs) is necessary for the diagnosis, monitoring, and treatment planning of
the disease. Manual delineation practices require anatomical knowledge, are
expensive, time-consuming, and can be inaccurate due to human error. Here, we
describe a semantic segmentation network for tumor subregion segmentation from
3D MRIs based on encoder-decoder architecture. Due to a limited training
dataset size, a variational auto-encoder branch is added to reconstruct the
input image itself in order to regularize the shared decoder and impose
additional constraints on its layers. The current approach won 1st place in the
BraTS 2018 challenge
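The regularization described above combines the segmentation loss with the variational auto-encoder branch's reconstruction and KL terms. A minimal numpy sketch of such a combined objective; the weights `w_recon` and `w_kl` are illustrative defaults, not necessarily the paper's values:

```python
import numpy as np

def kl_divergence(mu, logvar):
    # KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian, averaged over dims
    return 0.5 * float(np.mean(mu ** 2 + np.exp(logvar) - logvar - 1.0))

def total_loss(seg_loss, recon_loss, mu, logvar, w_recon=0.1, w_kl=0.1):
    # segmentation loss regularized by the VAE branch:
    # image reconstruction error plus a KL penalty on the latent code
    return seg_loss + w_recon * recon_loss + w_kl * kl_divergence(mu, logvar)
```

Only the shared encoder and the segmentation decoder are needed at inference; the VAE branch acts purely as a training-time constraint.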
Going Deeper in Facial Expression Recognition using Deep Neural Networks
Automated Facial Expression Recognition (FER) has remained a challenging and
interesting problem. Despite efforts made in developing various methods for
FER, existing approaches traditionally lack generalizability when applied to
unseen images or those captured in the wild. Most of the existing
approaches are based on engineered features (e.g. HOG, LBPH, and Gabor) where
the classifier's hyperparameters are tuned to give best recognition accuracies
across a single database, or a small collection of similar databases.
Nevertheless, the results are not significant when they are applied to novel
data. This paper proposes a deep neural network architecture to address the FER
problem across multiple well-known standard face datasets. Specifically, our
network consists of two convolutional layers each followed by max pooling and
then four Inception layers. The network is a single component architecture that
takes registered facial images as the input and classifies them into either of
the six basic or the neutral expressions. We conducted comprehensive
experiments on seven publicly available facial expression databases, viz.
MultiPIE, MMI, CK+, DISFA, FERA, SFEW, and FER2013. The results of the
proposed architecture are comparable to or better than the state-of-the-art
methods and better than traditional convolutional neural networks in both
accuracy and training time.
Comment: To appear in IEEE Winter Conference on Applications of Computer
Vision (WACV), 2016 (accepted in first-round submission)
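As a rough illustration of the single-component design (two convolution + max-pooling stages feeding the Inception layers), the spatial sizes can be walked through with the standard output-size formula. The 48x48 input and the kernel/padding choices below are assumptions for illustration, not the paper's exact configuration:

```python
def conv_out(size, kernel, stride=1, pad=0):
    # spatial output size of a convolution (floor division)
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride):
    # spatial output size of a pooling layer
    return (size - kernel) // stride + 1

# hypothetical walk-through for a 48x48 registered face crop
s = conv_out(48, 7, stride=1, pad=3)   # conv 7x7, "same" padding -> 48
s = pool_out(s, 2, 2)                  # max pool 2x2             -> 24
s = conv_out(s, 5, stride=1, pad=2)    # conv 5x5, "same" padding -> 24
s = pool_out(s, 2, 2)                  # max pool 2x2             -> 12
# 12x12 feature maps would then enter the Inception layers
```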
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Masked visual modeling (MVM) has been recently proven effective for visual
pre-training. While similar reconstructive objectives on video inputs (e.g.,
masked frame modeling) have been explored in video-language (VidL)
pre-training, previous studies fail to find a truly effective MVM strategy that
can largely benefit the downstream performance. In this work, we systematically
examine the potential of MVM in the context of VidL learning. Specifically, we
base our study on a fully end-to-end VIdeO-LanguagE Transformer (VIOLET), where
the supervision from MVM training can be backpropagated to the video pixel
space. In total, eight different reconstructive targets of MVM are explored,
from low-level pixel values and oriented gradients to high-level depth maps,
optical flow, discrete visual tokens, and latent visual features. We conduct
comprehensive experiments and provide insights into the factors leading to
effective MVM training, resulting in an enhanced model VIOLETv2. Empirically,
we show VIOLETv2 pre-trained with MVM objective achieves notable improvements
on 13 VidL benchmarks, ranging from video question answering, video captioning,
to text-to-video retrieval.
Comment: CVPR'23; the first two authors contributed equally; code is available
at https://github.com/tsujuifu/pytorch_empirical-mv
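A minimal sketch of the masked-modeling objective itself: mask a subset of video patch positions and score the reconstruction only there. The eight targets studied above differ only in what `targets` holds (pixel values, oriented gradients, depth, flow, discrete tokens, or latent features); the mask ratio here is illustrative:

```python
import numpy as np

def sample_mask(n_patches, mask_ratio=0.15, seed=0):
    # boolean mask over patch positions; only these are reconstructed
    rng = np.random.default_rng(seed)
    mask = rng.random(n_patches) < mask_ratio
    if not mask.any():                      # always mask at least one patch
        mask[rng.integers(n_patches)] = True
    return mask

def mvm_loss(predictions, targets, mask):
    # reconstruction error restricted to masked positions
    diff = predictions[mask] - targets[mask]
    return float(np.mean(diff ** 2))
```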
Dense soft tissue 3D reconstruction refined with super-pixel segmentation for robotic abdominal surgery
Purpose: Single-incision laparoscopic surgery decreases postoperative infections, but introduces limitations in the surgeon’s maneuverability and in the surgical field of view. This work aims at enhancing intra-operative surgical visualization by exploiting the 3D information about the surgical site. An interactive guidance system is proposed wherein the pose of preoperative tissue models is updated online. A critical process involves the intra-operative acquisition of tissue surfaces. It can be achieved using stereoscopic imaging and 3D reconstruction techniques. This work contributes to this process by proposing new methods for improved dense 3D reconstruction of soft tissues, which allows a more accurate deformation identification and facilitates the registration process.
Methods: Two methods for soft tissue 3D reconstruction are proposed: Method 1 follows the traditional approach of the block matching algorithm. Method 2 performs a nonparametric modified census transform to be more robust to illumination variation. The simple linear iterative clustering (SLIC) super-pixel algorithm is exploited for disparity refinement by filling holes in the disparity images.
Results: The methods were validated using two video datasets from the Hamlyn Centre, achieving an accuracy of 2.95 and 1.66 mm, respectively. A comparison with ground-truth data demonstrated that the disparity refinement procedure (1) increases the number of reconstructed points by up to 43% and (2) does not significantly affect the accuracy of the 3D reconstructions.
Conclusion: Both methods give results that compare favorably with the state-of-the-art methods. The computational time constrains their applicability in real time, but it can be greatly reduced with a GPU implementation.
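Method 2's robustness to illumination variation comes from the census-style descriptor: each pixel is encoded by intensity comparisons within its window rather than by raw intensity, so matching costs become Hamming distances. A small sketch of a modified census transform (comparing against the window mean, one common variant) on a grayscale image:

```python
import numpy as np

def census_transform(img, win=3):
    # encode each pixel as a bit string: 1 where a window pixel exceeds
    # the window mean (the "modified" census variant); border pixels stay 0
    r = win // 2
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = img[y - r:y + r + 1, x - r:x + r + 1]
            bits = (patch > patch.mean()).astype(np.uint8).ravel()
            out[y, x] = int("".join(map(str, bits)), 2)
    return out

def hamming(a, b):
    # matching cost between two census codes
    return bin(int(a) ^ int(b)).count("1")
```

Because the code depends only on intensity orderings within the window, adding a constant offset (or gain) to the image leaves the descriptor unchanged.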
Calibrating Depth Sensors with a Genetic Algorithm
In this report, we deal with the optimization of the transformation estimate between the coordinate systems of depth sensors, i.e., sensors that produce 3D measurements. For that, we present a novel method using a genetic algorithm to refine the six degrees of freedom (6 DoF) transformation via three rotational and three translational offsets. First, we demonstrate the necessity for an accurate depth sensor calibration using a depth error model of stereo cameras. The fusion of stereo disparity assumes a Gaussian disparity error distribution, which we examine with different stereo matching algorithms on the widely used KITTI visual odometry dataset. Our analysis shows that the existing calibration is not adequate for accurate disparity fusion. As a consequence, we employ our genetic algorithm on this particular dataset, which results in a greatly improved calibration between the mounted stereo camera and the Lidar. Thus, stereo disparity estimates show improved results in quantitative evaluations.
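A compact sketch of such a refinement loop under simple assumptions (truncation selection, blend crossover, Gaussian mutation). The real method's fitness would score agreement between stereo and Lidar measurements; here `fitness` is any user-supplied cost over the six offsets:

```python
import numpy as np

def genetic_refine(fitness, pop_size=40, gens=60, span=0.1, seed=0):
    # fitness maps a 6-vector (rx, ry, rz, tx, ty, tz) of offsets to a cost
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-span, span, size=(pop_size, 6))
    for _ in range(gens):
        cost = np.array([fitness(p) for p in pop])
        elite = pop[np.argsort(cost)[:pop_size // 4]]        # selection
        parents = elite[rng.integers(len(elite), size=(pop_size, 2))]
        pop = parents.mean(axis=1)                           # blend crossover
        pop += rng.normal(0.0, span * 0.05, size=pop.shape)  # mutation
        pop[0] = elite[0]                                    # elitism
    cost = np.array([fitness(p) for p in pop])
    return pop[np.argmin(cost)]
```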
Regularized Inverse Holographic Volume Reconstruction for 3D Particle Tracking
The key limitations of digital inline holography (DIH) for particle tracking
applications are poor longitudinal resolution, particle concentration limits,
and case-specific processing. We utilize an inverse problem method with fused
lasso regularization to perform full volumetric reconstructions of particle
fields. By exploiting data sparsity in the solution and utilizing GPU
processing, we dramatically reduce the computational cost usually associated
with inverse reconstruction approaches. We demonstrate the accuracy of the
proposed method using synthetic and experimental holograms. Finally, we present
two practical applications (high concentration microorganism swimming and
microfiber rotation) to extend the capabilities of DIH beyond what was possible
using prior methods.
Comment: 15 pages, 6 figures
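The fused-lasso objective being minimized can be written as data fidelity plus an l1 sparsity term and an l1 penalty on neighbouring differences; proximal (ISTA-style) solvers handle the non-smooth terms with soft-thresholding. A sketch for a 1-D unknown `x` (the actual reconstructions are volumetric):

```python
import numpy as np

def fused_lasso_objective(A, x, b, lam1, lam2):
    # data fidelity + sparsity (l1) + fused penalty on adjacent differences
    fidelity = 0.5 * float(np.sum((A @ x - b) ** 2))
    sparsity = lam1 * float(np.sum(np.abs(x)))
    fusion = lam2 * float(np.sum(np.abs(np.diff(x))))
    return fidelity + sparsity + fusion

def soft_threshold(v, t):
    # proximal operator of the l1 norm, the core step of ISTA-style solvers
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```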
FastEMRIWaveforms: New tools for millihertz gravitational-wave data analysis
We present the FastEMRIWaveforms (FEW) package, a collection of tools to
build and analyze extreme mass ratio inspiral (EMRI) waveforms. Here, we expand
on the Physical Review Letter that introduced the first fast and accurate
fully-relativistic EMRI waveform template model. We discuss the construction of
the overall framework; constituent modules; and the general methods used to
accelerate EMRI waveforms. Because the fully relativistic FEW model waveforms
are for now limited to eccentric orbits in the Schwarzschild spacetime, we also
introduce an improved Augmented Analytic Kludge (AAK) model that describes
generic Kerr inspirals. Both waveform models can be accelerated using graphics
processing unit (GPU) hardware. With the GPU-accelerated waveforms in hand, a
variety of studies are performed including an analysis of EMRI mode content,
template mismatch, and fully Bayesian Markov Chain Monte Carlo-based EMRI
parameter estimation. We find relativistic EMRI waveform templates can be
generated with fewer harmonic modes without biasing signal
extraction. However, we show for the first time that extraction of a
relativistic injection with semi-relativistic amplitudes can lead to strong
bias and anomalous structure in the posterior distribution for certain regions
of parameter space.
Comment: 26 pages, 12 figures, FastEMRIWaveforms Package:
bhptoolkit.org/FastEMRIWaveforms
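The template-mismatch studies mentioned above rest on the normalized overlap between waveforms. A simplified sketch that omits the detector-noise weighting (a real analysis uses a frequency-domain, PSD-weighted inner product):

```python
import numpy as np

def overlap(h1, h2):
    # normalized inner product between two sampled waveforms
    # (white-noise, time-domain simplification)
    inner = lambda a, b: float(np.sum(a * np.conj(b)).real)
    return inner(h1, h2) / np.sqrt(inner(h1, h1) * inner(h2, h2))

def mismatch(h1, h2):
    # 0 for identical templates, approaching 1 for orthogonal ones
    return 1.0 - overlap(h1, h2)
```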
Test-Time Training for Deformable Multi-Scale Image Registration
Registration is a fundamental task in medical robotics and is often a crucial
step for many downstream tasks such as motion analysis, intra-operative
tracking and image segmentation. Popular registration methods such as ANTs and
NiftyReg optimize objective functions for each pair of images from scratch,
which are time-consuming for 3D and sequential images with complex
deformations. Recently, deep learning-based registration approaches such as
VoxelMorph have been emerging and achieve competitive performance. In this
work, we construct a test-time training scheme for deep deformable image
registration to improve the generalization ability of conventional
learning-based registration models. We design multi-scale deep networks to
consecutively model the residual deformations, which is effective for highly
variable deformations. Extensive experiments validate the effectiveness of
multi-scale deep registration with test-time training, using the Dice
coefficient for image segmentation, and mean square error (MSE) and normalized
local cross-correlation (NLCC) for dense tissue tracking. Two videos are
available at https://www.youtube.com/watch?v=NvLrCaqCiAE and
https://www.youtube.com/watch?v=pEA6ZmtTNuQ
Comment: ICRA 2021; 8 pages, 4 figures, 2 big tables
Leveraging Crowdsourced GPS Data for Road Extraction from Aerial Imagery
Deep learning is revolutionizing the mapping industry. Under lightweight
human curation, computers have generated almost half of the roads in Thailand
on OpenStreetMap (OSM) using high-resolution aerial imagery. Bing Maps
displays 125 million computer-generated building polygons in the U.S. While
tremendously more efficient than manual mapping, one cannot map out everything
from the air. Especially for roads, a small prediction gap by image occlusion
renders the entire road useless for routing. Misconnections can be more
dangerous. Therefore, computer-based mapping often requires local
verification, which is still labor-intensive. In this paper, we propose to leverage
crowdsourced GPS data to improve and support road extraction from aerial
imagery. Through novel data augmentation, GPS rendering, and 1D transpose
convolution techniques, we show almost 5% improvements over previous
competition winning models, and much better robustness when predicting new
areas without any new training data or domain adaptation.
Comment: To be published in IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR) 201
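The GPS-rendering idea boils down to rasterizing crowdsourced traces into an image-aligned channel that is stacked with the aerial RGB input. A minimal sketch; a real rendering would also encode attributes such as heading, speed, or trace density:

```python
import numpy as np

def render_gps(points, h, w):
    # rasterize (row, col) GPS samples into a count image; out-of-bounds
    # samples are dropped. Stack the result as an extra network input channel.
    img = np.zeros((h, w), dtype=np.float32)
    for y, x in points:
        if 0 <= y < h and 0 <= x < w:
            img[int(y), int(x)] += 1.0
    return img
```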
Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning
In this paper, we present a new intrinsically motivated actor-critic
algorithm for learning continuous motor skills directly from raw visual input.
Our neural architecture is composed of a critic and an actor network. Both
networks receive the hidden representation of a deep convolutional autoencoder
which is trained to reconstruct the visual input, while the centre-most hidden
representation is also optimized to estimate the state value. Separately, an
ensemble of predictive world models generates, based on its learning progress,
an intrinsic reward signal which is combined with the extrinsic reward to guide
the exploration of the actor-critic learner. Our approach is more
data-efficient and inherently more stable than the existing actor-critic
methods for continuous control from pixel data. We evaluate our algorithm for
the task of learning robotic reaching and grasping skills on a realistic
physics simulator and on a humanoid robot. The results show that the control
policies learned with our approach can achieve better performance than the
compared state-of-the-art and baseline algorithms in both dense-reward and
challenging sparse-reward settings
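The intrinsic signal described above rewards the learning progress of the predictive world-model ensemble. A minimal sketch of how such a bonus can be combined with the extrinsic reward; the clipping at zero and the mixing weight `beta` are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def intrinsic_reward(errors_before, errors_after):
    # learning progress: drop in the ensemble's mean prediction error between
    # two snapshots; clipped at zero so regressions give no bonus
    return max(0.0, float(np.mean(errors_before) - np.mean(errors_after)))

def combined_reward(extrinsic, intrinsic, beta=0.5):
    # mix the task reward with the curiosity bonus to guide exploration
    return extrinsic + beta * intrinsic
```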