991 research outputs found
Studies on Next-Gen Vehicular Detection: A Fusion of RF Signal and Fisheye Camera Technologies
Waseda University Master's thesis (Engineering)
Multi-Viewpoint and Multi-Evaluation with Felicitous Inductive Bias Boost Machine Abstract Reasoning Ability
Great efforts have been made to study AI's ability in abstract reasoning, and
different versions of RAVEN's Progressive Matrices (RPM) have been proposed as
benchmarks. Previous works suggest that, without sophisticated design or extra
meta-data containing semantic information, neural networks may remain
indecisive on RPM problems even after relentless training. Through thorough
experiments and ablation studies, we show that end-to-end neural networks
endowed with a felicitous inductive bias, whether intentionally designed or
serendipitously matched, can solve RPM problems elegantly, without the aid of
any extra meta-data or a preference for any specific backbone. Our work also
reveals that multi-viewpoint with multi-evaluation is a key learning strategy
for successful reasoning. Finally, potential explanations for the failure of
connectionist models in generalization are provided. We hope these results
will serve as an inspection of AI's ability beyond perception and toward
abstract reasoning. Source code is available at
https://github.com/QinglaiWeiCASIA/RavenSolver
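As a purely illustrative sketch of the multi-viewpoint, multi-evaluation idea (not the authors' architecture), the following hypothetical scorer evaluates each candidate answer under several random "viewpoints" (random linear projections of the panel embeddings) and averages the per-view scores before the final choice; the random projections, the pooled-context comparison, and all names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def multi_view_scores(panel_embeddings, answer_embeddings, n_views=4):
    """Hypothetical multi-viewpoint / multi-evaluation scorer: each
    'viewpoint' is a random linear projection; each candidate answer is
    scored once per view, and the scores are averaged. Real models derive
    their viewpoints from the data rather than at random."""
    d = panel_embeddings.shape[-1]
    ctx = panel_embeddings.mean(axis=0)            # pooled context panels
    scores = np.zeros(len(answer_embeddings))
    for _ in range(n_views):                       # multi-viewpoint
        W = rng.standard_normal((d, d)) / np.sqrt(d)
        c = ctx @ W
        for i, ans in enumerate(answer_embeddings):    # multi-evaluation
            scores[i] -= np.linalg.norm(c - ans @ W)   # closer = better
    return scores / n_views
```

A candidate whose embedding matches the pooled context exactly scores 0, the maximum possible here, so the argmax selects it regardless of the sampled viewpoints.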
FAS-UNet: A Novel FAS-driven Unet to Learn Variational Image Segmentation
Solving variational image segmentation problems with hidden physics is often
expensive and requires different algorithms and manual tuning of model
parameters. Deep learning methods based on the U-Net structure have achieved
outstanding performance on many medical image segmentation tasks, but
designing such networks requires many parameters and much training data, which
are not always available for practical problems. In this paper, inspired by
the traditional multiphase convex Mumford-Shah variational model and the full
approximation scheme (FAS) for solving nonlinear systems, we propose a novel
variational-model-informed network (denoted FAS-Unet) that exploits model and
algorithm priors to extract multi-scale features. The proposed model-informed
network integrates image data and mathematical models, and implements them by
learning a few convolution kernels. Based on variational theory and the FAS
algorithm, we first design a feature-extraction sub-network (FAS-Solution
module) to solve the model-driven nonlinear system, in which skip connections
are employed to fuse multi-scale features. Secondly, we design a convolution
block to fuse the features extracted in the previous stage, yielding the final
segmentation probability. Experimental results on three different medical
image segmentation tasks show that the proposed FAS-Unet is highly competitive
with state-of-the-art methods in qualitative, quantitative, and
model-complexity evaluations. Moreover, it may also be possible to train
specialized network architectures that automatically satisfy some of the
mathematical and physical laws in other image problems, for better accuracy,
faster training, and improved generalization. The code is available at
\url{https://github.com/zhuhui100/FASUNet}.
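For reference, the algorithm prior that the network unrolls can be illustrated with a minimal numpy sketch of a classical FAS two-grid cycle on a toy 1-D nonlinear problem -u'' + u^3 = f with zero Dirichlet boundary conditions; the discretisation, the damped nonlinear Jacobi smoother, and the grid sizes are illustrative choices of ours, not the paper's FAS-Solution module.

```python
import numpy as np

def N(u, h):
    """Discrete nonlinear operator -u'' + u^3 (zero Dirichlet BCs): a toy
    stand-in for the model-driven nonlinear system in the text."""
    up = np.zeros(u.size + 2)
    up[1:-1] = u
    return (2.0 * u - up[:-2] - up[2:]) / h**2 + u**3

def smooth(u, f, h, sweeps):
    """Damped nonlinear Jacobi: one Newton step per grid point per sweep."""
    for _ in range(sweeps):
        u = u + 0.8 * (f - N(u, h)) / (2.0 / h**2 + 3.0 * u**2)
    return u

def restrict(v):                      # full weighting, n -> (n - 1) // 2
    return 0.25 * v[:-2:2] + 0.5 * v[1:-1:2] + 0.25 * v[2::2]

def prolong(vc, n):                   # linear interpolation, nc -> n
    v = np.zeros(n)
    v[1::2] = vc
    v[2:-1:2] = 0.5 * (vc[:-1] + vc[1:])
    v[0], v[-1] = 0.5 * vc[0], 0.5 * vc[-1]
    return v

def fas_two_grid(u, f, h):
    """One FAS cycle: smooth, transfer the *full approximation* (not just
    the residual) to the coarse grid, solve there cheaply, correct, smooth."""
    u = smooth(u, f, h, 3)
    uc = restrict(u)
    fc = N(uc, 2 * h) + restrict(f - N(u, h))   # FAS coarse right-hand side
    uc_new = smooth(uc, fc, 2 * h, 200)         # approximate coarse solve
    u = u + prolong(uc_new - uc, u.size)        # coarse-grid correction
    return smooth(u, f, h, 3)

n = 63
h = 1.0 / (n + 1)
x = np.arange(1, n + 1) * h
f = N(np.sin(np.pi * x), h)           # manufactured right-hand side
u = np.zeros(n)
r0 = np.linalg.norm(f - N(u, h))
for _ in range(10):
    u = fas_two_grid(u, f, h)
r = np.linalg.norm(f - N(u, h))       # residual drops by orders of magnitude
```

The network in the paper replaces the hand-coded smoother and transfer operators of such a cycle with a few learned convolution kernels.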
On Deep Machine Learning for Multi-view Object Detection and Neural Scene Rendering
This thesis addresses two contemporary computer vision tasks using a set of multiple-view imagery, namely the joint use of multi-view images to improve object detection and neural scene rendering via a novel volumetric input encoding for Neural Radiance Fields (NeRF). While the former focuses on improving the accuracy of object detection, the latter contribution allows for better scene reconstruction, which ultimately can be exploited to generate novel views and perform multi-view object detection.
Notwithstanding the significant advances in automatic object detection in the last decade, multi-view object detection has received little attention. For this reason, two contributions regarding multi-view object detection in the absence of explicit camera pose information are presented in this thesis. First, a multi-view epipolar filtering technique is introduced, using the distance of the detected object centre to a corresponding epipolar line as an additional probabilistic confidence. This technique removes false positives without a corresponding detection in other views, giving greater confidence to consistent detections across the views. The second contribution adds an attention-based layer, called Multi-view Vision Transformer, to the backbone of a deep machine learning object detector, effectively aggregating features from different views and creating a multi-view aware representation.
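The epipolar filtering step described above can be sketched as follows: given a fundamental matrix F relating two views, the distance from a detected object centre to the epipolar line of its counterpart becomes a confidence weight. The Gaussian mapping from distance to confidence and the 5-pixel sigma are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def epipolar_confidence(F, x_ref, x_det, sigma=5.0):
    """Distance (pixels) from detection centre `x_det` in one view to the
    epipolar line induced by the corresponding centre `x_ref` in another
    view, plus a Gaussian confidence weight derived from that distance."""
    x1 = np.append(np.asarray(x_ref, float), 1.0)   # homogeneous coords
    x2 = np.append(np.asarray(x_det, float), 1.0)
    line = F @ x1                 # epipolar line a*x + b*y + c = 0
    d = abs(line @ x2) / np.hypot(line[0], line[1])
    return d, np.exp(-0.5 * (d / sigma) ** 2)

# Toy rig: pure horizontal translation, so epipolar lines are horizontal
# and the distance is simply the vertical offset between the two centres.
F = np.array([[0., 0.,  0.],
              [0., 0., -1.],
              [0., 1.,  0.]])
d, conf = epipolar_confidence(F, (100., 50.), (140., 60.))  # d = 10.0 px
```

A detection lying exactly on the epipolar line keeps full confidence; one ten pixels away (as here) is down-weighted to exp(-2) ≈ 0.135, and detections with no consistent counterpart in other views are suppressed.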
The final contribution explores another application for multi-view imagery, namely novel volumetric input encoding of NeRF. The proposed method derives an analytical solution for the average value of a sinusoidal (inducing a high-frequency component) within a pyramidal frustum region, whereas previous state-of-the-art NeRF methods approximate this with a Gaussian distribution. This parameterisation obtains a better representation of regions where the Gaussian approximation is poor, allowing more accurate synthesis of distant areas and depth map estimation.
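The analytical average underlying the final contribution can be illustrated in one dimension: the mean of a sinusoid over an interval has a closed form, and it shrinks as the frequency grows, which is the anti-aliasing behaviour the encoding exploits. The actual method integrates over a 3-D pyramidal frustum; the interval here is a deliberate simplification of ours.

```python
import numpy as np

def mean_sin(omega, a, b):
    """Closed-form average of sin(omega * t) over [a, b]:
    (1/(b-a)) * integral = (cos(omega*a) - cos(omega*b)) / (omega * (b-a))."""
    return (np.cos(omega * a) - np.cos(omega * b)) / (omega * (b - a))

# Cross-check against midpoint-rule quadrature on a fine grid.
omega, a, b = 40.0, 1.0, 1.3
n = 200_000
t = a + (np.arange(n) + 0.5) * (b - a) / n
numeric = float(np.mean(np.sin(omega * t)))
```

For regions where a Gaussian approximation of the frustum is poor (e.g. very elongated, distant frusta), such a closed form stays exact, which is what enables the more accurate synthesis of distant areas reported above.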
Experimental evaluation is carried out across multiple established benchmark datasets, comparing the proposed methods against contemporary state-of-the-art architectures so that their efficacy can be illustrated both quantitatively and qualitatively.
HoEnTOA: Holoentropy and Taylor Assisted Optimization based Novel Image Quality Enhancement Algorithm for Multi-Focus Image Fusion
In machine vision and image processing applications, multi-focus image fusion plays a prominent role. Image fusion merges information extracted from two or more source images into a single image that is more informative and better suited to computer processing and visual perception. In this paper, the authors devise a novel image quality enhancement algorithm that fuses multi-focus images, termed HoEnTOA. Initially, the contourlet transform is applied to both input images to generate four sub-bands for each. Holoentropy together with the proposed HoEnTOA optimization, an integration of the Taylor series with ASSCA, is then used to fuse the multi-focus images. After fusion, the inverse contourlet transform is applied to obtain the final fused image. The proposed HoEnTOA thus performs image fusion effectively and demonstrates better performance on five metrics: a minimum Root Mean Square Error of 3.687, a highest universal quality index of 0.984, a maximum Peak Signal-to-Noise Ratio of 42.08 dB, a maximum structural similarity index of 0.943, and a maximum mutual information of 1.651.
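Two of the reported metrics are straightforward to state precisely; as a minimal numpy sketch (not the paper's code), RMSE and PSNR between a fused image and a reference can be computed as follows, where the peak value of 255 assumes 8-bit images.

```python
import numpy as np

def rmse(a, b):
    """Root Mean Square Error between two images of equal shape."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(a, b, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB (peak=255 assumes 8-bit images)."""
    e = rmse(a, b)
    return float('inf') if e == 0.0 else float(20.0 * np.log10(peak / e))
```

A fused image identical to the reference gives RMSE 0 and infinite PSNR; a uniform error of 10 grey levels gives RMSE 10 and PSNR of about 28.1 dB, so the 42.08 dB reported above corresponds to a very small residual error.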