3D MRI brain tumor segmentation using autoencoder regularization
Automated segmentation of brain tumors from 3D magnetic resonance images
(MRIs) is necessary for the diagnosis, monitoring, and treatment planning of
the disease. Manual delineation practices require anatomical knowledge, are
expensive, time-consuming, and can be inaccurate due to human error. Here, we
describe a semantic segmentation network for tumor subregion segmentation from
3D MRIs based on encoder-decoder architecture. Due to a limited training
dataset size, a variational auto-encoder branch is added to reconstruct the
input image itself in order to regularize the shared decoder and impose
additional constraints on its layers. The current approach won 1st place in the
BraTS 2018 challenge
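The regularization described above combines the segmentation loss with the variational auto-encoder branch's reconstruction and KL terms. A minimal numpy sketch of such a combined objective; the weights `w_recon` and `w_kl` are illustrative defaults, not necessarily the paper's values:

```python
import numpy as np

def kl_divergence(mu, logvar):
    # KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian, averaged over dims
    return 0.5 * float(np.mean(mu ** 2 + np.exp(logvar) - logvar - 1.0))

def total_loss(seg_loss, recon_loss, mu, logvar, w_recon=0.1, w_kl=0.1):
    # segmentation loss regularized by the VAE branch:
    # image reconstruction error plus a KL penalty on the latent code
    return seg_loss + w_recon * recon_loss + w_kl * kl_divergence(mu, logvar)
```

Only the shared encoder and the segmentation decoder are needed at inference; the VAE branch acts purely as a training-time constraint.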
Going Deeper in Facial Expression Recognition using Deep Neural Networks
Automated Facial Expression Recognition (FER) has remained a challenging and
interesting problem. Despite efforts made in developing various methods for
FER, existing approaches traditionally lack generalizability when applied to
unseen images or those captured in the wild. Most of the existing
approaches are based on engineered features (e.g. HOG, LBPH, and Gabor) where
the classifier's hyperparameters are tuned to give best recognition accuracies
across a single database, or a small collection of similar databases.
Nevertheless, the results are not significant when they are applied to novel
data. This paper proposes a deep neural network architecture to address the FER
problem across multiple well-known standard face datasets. Specifically, our
network consists of two convolutional layers each followed by max pooling and
then four Inception layers. The network is a single component architecture that
takes registered facial images as the input and classifies them into either of
the six basic or the neutral expressions. We conducted comprehensive
experiments on seven publicly available facial expression databases, viz.
MultiPIE, MMI, CK+, DISFA, FERA, SFEW, and FER2013. The results of the
proposed architecture are comparable to or better than the state-of-the-art
methods and better than traditional convolutional neural networks in both
accuracy and training time.
Comment: To appear in IEEE Winter Conference on Applications of Computer
Vision (WACV), 2016 (accepted in first-round submission)
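As a rough illustration of the single-component design (two convolution + max-pooling stages feeding the Inception layers), the spatial sizes can be walked through with the standard output-size formula. The 48x48 input and the kernel/padding choices below are assumptions for illustration, not the paper's exact configuration:

```python
def conv_out(size, kernel, stride=1, pad=0):
    # spatial output size of a convolution (floor division)
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride):
    # spatial output size of a pooling layer
    return (size - kernel) // stride + 1

# hypothetical walk-through for a 48x48 registered face crop
s = conv_out(48, 7, stride=1, pad=3)   # conv 7x7, "same" padding -> 48
s = pool_out(s, 2, 2)                  # max pool 2x2             -> 24
s = conv_out(s, 5, stride=1, pad=2)    # conv 5x5, "same" padding -> 24
s = pool_out(s, 2, 2)                  # max pool 2x2             -> 12
# 12x12 feature maps would then enter the Inception layers
```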
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Masked visual modeling (MVM) has been recently proven effective for visual
pre-training. While similar reconstructive objectives on video inputs (e.g.,
masked frame modeling) have been explored in video-language (VidL)
pre-training, previous studies fail to find a truly effective MVM strategy that
can largely benefit the downstream performance. In this work, we systematically
examine the potential of MVM in the context of VidL learning. Specifically, we
base our study on a fully end-to-end VIdeO-LanguagE Transformer (VIOLET), where
the supervision from MVM training can be backpropagated to the video pixel
space. In total, eight different reconstructive targets of MVM are explored,
from low-level pixel values and oriented gradients to high-level depth maps,
optical flow, discrete visual tokens, and latent visual features. We conduct
comprehensive experiments and provide insights into the factors leading to
effective MVM training, resulting in an enhanced model VIOLETv2. Empirically,
we show VIOLETv2 pre-trained with MVM objective achieves notable improvements
on 13 VidL benchmarks, ranging from video question answering, video captioning,
to text-to-video retrieval.
Comment: CVPR'23; the first two authors contributed equally; code is available
at https://github.com/tsujuifu/pytorch_empirical-mv
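A minimal sketch of the masked-modeling objective itself: mask a subset of video patch positions and score the reconstruction only there. The eight targets studied above differ only in what `targets` holds (pixel values, oriented gradients, depth, flow, discrete tokens, or latent features); the mask ratio here is illustrative:

```python
import numpy as np

def sample_mask(n_patches, mask_ratio=0.15, seed=0):
    # boolean mask over patch positions; only these are reconstructed
    rng = np.random.default_rng(seed)
    mask = rng.random(n_patches) < mask_ratio
    if not mask.any():                      # always mask at least one patch
        mask[rng.integers(n_patches)] = True
    return mask

def mvm_loss(predictions, targets, mask):
    # reconstruction error restricted to masked positions
    diff = predictions[mask] - targets[mask]
    return float(np.mean(diff ** 2))
```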
Dense soft tissue 3D reconstruction refined with super-pixel segmentation for robotic abdominal surgery
Purpose: Single-incision laparoscopic surgery decreases postoperative infections, but introduces limitations in the surgeon’s maneuverability and in the surgical field of view. This work aims at enhancing intra-operative surgical visualization by exploiting the 3D information about the surgical site. An interactive guidance system is proposed wherein the pose of preoperative tissue models is updated online. A critical process involves the intra-operative acquisition of tissue surfaces. It can be achieved using stereoscopic imaging and 3D reconstruction techniques. This work contributes to this process by proposing new methods for improved dense 3D reconstruction of soft tissues, which allows a more accurate deformation identification and facilitates the registration process.
Methods: Two methods for soft tissue 3D reconstruction are proposed: Method 1 follows the traditional approach of the block matching algorithm. Method 2 performs a nonparametric modified census transform to be more robust to illumination variation. The simple linear iterative clustering (SLIC) super-pixel algorithm is exploited for disparity refinement by filling holes in the disparity images.
Results: The methods were validated using two video datasets from the Hamlyn Centre, achieving an accuracy of 2.95 and 1.66 mm, respectively. A comparison with ground-truth data demonstrated that the disparity refinement procedure (1) increases the number of reconstructed points by up to 43% and (2) does not significantly affect the accuracy of the 3D reconstructions.
Conclusion: Both methods give results that compare favorably with the state-of-the-art methods. The computational time constrains their applicability in real time, but it can be greatly reduced with a GPU implementation.
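Method 2's robustness to illumination variation comes from the census-style descriptor: each pixel is encoded by intensity comparisons within its window rather than by raw intensity, so matching costs become Hamming distances. A small sketch of a modified census transform (comparing against the window mean, one common variant) on a grayscale image:

```python
import numpy as np

def census_transform(img, win=3):
    # encode each pixel as a bit string: 1 where a window pixel exceeds
    # the window mean (the "modified" census variant); border pixels stay 0
    r = win // 2
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = img[y - r:y + r + 1, x - r:x + r + 1]
            bits = (patch > patch.mean()).astype(np.uint8).ravel()
            out[y, x] = int("".join(map(str, bits)), 2)
    return out

def hamming(a, b):
    # matching cost between two census codes
    return bin(int(a) ^ int(b)).count("1")
```

Because the code depends only on intensity orderings within the window, adding a constant offset (or gain) to the image leaves the descriptor unchanged.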
Calibrating Depth Sensors with a Genetic Algorithm
In this report, we deal with the optimization of the transformation estimate between the coordinate systems of depth sensors, i.e., sensors that produce 3D measurements. For that, we present a novel method using a genetic algorithm to refine the six degrees of freedom (6 DoF) transformation via three rotational and three translational offsets. First, we demonstrate the necessity for an accurate depth sensor calibration using a depth error model of stereo cameras. The fusion of stereo disparity assumes a Gaussian disparity error distribution, which we examine with different stereo matching algorithms on the widely used KITTI visual odometry dataset. Our analysis shows that the existing calibration is not adequate for accurate disparity fusion. As a consequence, we employ our genetic algorithm on this particular dataset, which results in a greatly improved calibration between the mounted stereo camera and the Lidar. Thus, stereo disparity estimates show improved results in quantitative evaluations.
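A compact sketch of such a refinement loop under simple assumptions (truncation selection, blend crossover, Gaussian mutation). The real method's fitness would score agreement between stereo and Lidar measurements; here `fitness` is any user-supplied cost over the six offsets:

```python
import numpy as np

def genetic_refine(fitness, pop_size=40, gens=60, span=0.1, seed=0):
    # fitness maps a 6-vector (rx, ry, rz, tx, ty, tz) of offsets to a cost
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-span, span, size=(pop_size, 6))
    for _ in range(gens):
        cost = np.array([fitness(p) for p in pop])
        elite = pop[np.argsort(cost)[:pop_size // 4]]        # selection
        parents = elite[rng.integers(len(elite), size=(pop_size, 2))]
        pop = parents.mean(axis=1)                           # blend crossover
        pop += rng.normal(0.0, span * 0.05, size=pop.shape)  # mutation
        pop[0] = elite[0]                                    # elitism
    cost = np.array([fitness(p) for p in pop])
    return pop[np.argmin(cost)]
```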
Regularized Inverse Holographic Volume Reconstruction for 3D Particle Tracking
The key limitations of digital inline holography (DIH) for particle tracking
applications are poor longitudinal resolution, particle concentration limits,
and case-specific processing. We utilize an inverse problem method with fused
lasso regularization to perform full volumetric reconstructions of particle
fields. By exploiting data sparsity in the solution and utilizing GPU
processing, we dramatically reduce the computational cost usually associated
with inverse reconstruction approaches. We demonstrate the accuracy of the
proposed method using synthetic and experimental holograms. Finally, we present
two practical applications (high concentration microorganism swimming and
microfiber rotation) to extend the capabilities of DIH beyond what was possible
using prior methods.
Comment: 15 pages, 6 figures
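The fused-lasso objective being minimized can be written as data fidelity plus an l1 sparsity term and an l1 penalty on neighbouring differences; proximal (ISTA-style) solvers handle the non-smooth terms with soft-thresholding. A sketch for a 1-D unknown `x` (the actual reconstructions are volumetric):

```python
import numpy as np

def fused_lasso_objective(A, x, b, lam1, lam2):
    # data fidelity + sparsity (l1) + fused penalty on adjacent differences
    fidelity = 0.5 * float(np.sum((A @ x - b) ** 2))
    sparsity = lam1 * float(np.sum(np.abs(x)))
    fusion = lam2 * float(np.sum(np.abs(np.diff(x))))
    return fidelity + sparsity + fusion

def soft_threshold(v, t):
    # proximal operator of the l1 norm, the core step of ISTA-style solvers
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```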
FastEMRIWaveforms: New tools for millihertz gravitational-wave data analysis
We present the FastEMRIWaveforms (FEW) package, a collection of tools to
build and analyze extreme mass ratio inspiral (EMRI) waveforms. Here, we expand
on the Physical Review Letter that introduced the first fast and accurate
fully-relativistic EMRI waveform template model. We discuss the construction of
the overall framework; constituent modules; and the general methods used to
accelerate EMRI waveforms. Because the fully relativistic FEW model waveforms
are for now limited to eccentric orbits in the Schwarzschild spacetime, we also
introduce an improved Augmented Analytic Kludge (AAK) model that describes
generic Kerr inspirals. Both waveform models can be accelerated using graphics
processing unit (GPU) hardware. With the GPU-accelerated waveforms in hand, a
variety of studies are performed including an analysis of EMRI mode content,
template mismatch, and fully Bayesian Markov Chain Monte Carlo-based EMRI
parameter estimation. We find relativistic EMRI waveform templates can be
generated with fewer harmonic modes without biasing signal
extraction. However, we show for the first time that extraction of a
relativistic injection with semi-relativistic amplitudes can lead to strong
bias and anomalous structure in the posterior distribution for certain regions
of parameter space.
Comment: 26 pages, 12 figures, FastEMRIWaveforms Package:
bhptoolkit.org/FastEMRIWaveforms
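The template-mismatch studies mentioned above rest on the normalized overlap between waveforms. A simplified sketch that omits the detector-noise weighting (a real analysis uses a frequency-domain, PSD-weighted inner product):

```python
import numpy as np

def overlap(h1, h2):
    # normalized inner product between two sampled waveforms
    # (white-noise, time-domain simplification)
    inner = lambda a, b: float(np.sum(a * np.conj(b)).real)
    return inner(h1, h2) / np.sqrt(inner(h1, h1) * inner(h2, h2))

def mismatch(h1, h2):
    # 0 for identical templates, approaching 1 for orthogonal ones
    return 1.0 - overlap(h1, h2)
```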
Test-Time Training for Deformable Multi-Scale Image Registration
Registration is a fundamental task in medical robotics and is often a crucial
step for many downstream tasks such as motion analysis, intra-operative
tracking and image segmentation. Popular registration methods such as ANTs and
NiftyReg optimize objective functions for each pair of images from scratch,
which are time-consuming for 3D and sequential images with complex
deformations. Recently, deep learning-based registration approaches such as
VoxelMorph have been emerging and achieve competitive performance. In this
work, we construct a test-time training scheme for deep deformable image
registration to improve the generalization ability of conventional
learning-based registration models. We design multi-scale deep networks to
consecutively model the residual deformations, which is effective for highly
variable deformations. Extensive experiments validate the effectiveness of
multi-scale deep registration with test-time training, using the Dice
coefficient for image segmentation, and mean square error (MSE) and normalized
local cross-correlation (NLCC) for dense tissue tracking. Two videos are
available at https://www.youtube.com/watch?v=NvLrCaqCiAE and
https://www.youtube.com/watch?v=pEA6ZmtTNuQ
Comment: ICRA 2021; 8 pages, 4 figures, 2 big tables
Leveraging Crowdsourced GPS Data for Road Extraction from Aerial Imagery
Deep learning is revolutionizing the mapping industry. Under lightweight
human curation, computers have generated almost half of the roads in Thailand
on OpenStreetMap (OSM) using high-resolution aerial imagery. Bing Maps
displays 125 million computer-generated building polygons in the U.S. While
tremendously more efficient than manual mapping, one cannot map out everything
from the air. Especially for roads, a small prediction gap by image occlusion
renders the entire road useless for routing. Misconnections can be more
dangerous. Therefore, computer-based mapping often requires local
verification, which is still labor-intensive. In this paper, we propose to leverage
crowdsourced GPS data to improve and support road extraction from aerial
imagery. Through novel data augmentation, GPS rendering, and 1D transpose
convolution techniques, we show almost 5% improvements over previous
competition winning models, and much better robustness when predicting new
areas without any new training data or domain adaptation.
Comment: To be published in IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR) 201
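The GPS-rendering idea boils down to rasterizing crowdsourced traces into an image-aligned channel that is stacked with the aerial RGB input. A minimal sketch; a real rendering would also encode attributes such as heading, speed, or trace density:

```python
import numpy as np

def render_gps(points, h, w):
    # rasterize (row, col) GPS samples into a count image; out-of-bounds
    # samples are dropped. Stack the result as an extra network input channel.
    img = np.zeros((h, w), dtype=np.float32)
    for y, x in points:
        if 0 <= y < h and 0 <= x < w:
            img[int(y), int(x)] += 1.0
    return img
```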
Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning
In this paper, we present a new intrinsically motivated actor-critic
algorithm for learning continuous motor skills directly from raw visual input.
Our neural architecture is composed of a critic and an actor network. Both
networks receive the hidden representation of a deep convolutional autoencoder
which is trained to reconstruct the visual input, while the centre-most hidden
representation is also optimized to estimate the state value. Separately, an
ensemble of predictive world models generates, based on its learning progress,
an intrinsic reward signal which is combined with the extrinsic reward to guide
the exploration of the actor-critic learner. Our approach is more
data-efficient and inherently more stable than the existing actor-critic
methods for continuous control from pixel data. We evaluate our algorithm for
the task of learning robotic reaching and grasping skills on a realistic
physics simulator and on a humanoid robot. The results show that the control
policies learned with our approach can achieve better performance than the
compared state-of-the-art and baseline algorithms in both dense-reward and
challenging sparse-reward settings
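The intrinsic signal described above rewards the learning progress of the predictive world-model ensemble. A minimal sketch of how such a bonus can be combined with the extrinsic reward; the clipping at zero and the mixing weight `beta` are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def intrinsic_reward(errors_before, errors_after):
    # learning progress: drop in the ensemble's mean prediction error between
    # two snapshots; clipped at zero so regressions give no bonus
    return max(0.0, float(np.mean(errors_before) - np.mean(errors_after)))

def combined_reward(extrinsic, intrinsic, beta=0.5):
    # mix the task reward with the curiosity bonus to guide exploration
    return extrinsic + beta * intrinsic
```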