991 research outputs found

    Studies on Next-Gen Vehicular Detection: A Fusion of RF Signal and Fisheye Camera Technologies

    Waseda University Master's thesis (Engineering)

    Multi-Viewpoint and Multi-Evaluation with Felicitous Inductive Bias Boost Machine Abstract Reasoning Ability

    Great efforts have been made to study AI's ability in abstract reasoning, and different versions of RAVEN's Progressive Matrices (RPM) have been proposed as benchmarks. Previous works suggest that, without sophisticated design or extra meta-data containing semantic information, neural networks may remain indecisive on RPM problems even after relentless training. Through thorough experiments and ablation studies, we show that end-to-end neural networks equipped with a felicitous inductive bias, whether intentionally designed or serendipitously matched, can solve RPM problems elegantly, without any extra meta-data or preference for any specific backbone. Our work also reveals that multi-viewpoint with multi-evaluation is a key learning strategy for successful reasoning. Finally, potential explanations for the failure of connectionist models to generalize are provided. We hope these results will serve as an inspection of AI's ability beyond perception and toward abstract reasoning. Source code can be found at https://github.com/QinglaiWeiCASIA/RavenSolver

    FAS-UNet: A Novel FAS-driven Unet to Learn Variational Image Segmentation

    Solving variational image segmentation problems with hidden physics is often expensive and requires different algorithms and manual tuning of model parameters. Deep learning methods based on the U-Net structure have achieved outstanding performance in many medical image segmentation tasks, but designing such networks requires many parameters and much training data, which are not always available for practical problems. In this paper, inspired by the traditional multi-phase convexity Mumford-Shah variational model and the full approximation scheme (FAS) for solving nonlinear systems, we propose a novel variational-model-informed network (denoted FAS-Unet) that exploits model and algorithm priors to extract multi-scale features. The proposed model-informed network integrates image data and mathematical models, implementing them by learning a few convolution kernels. Based on variational theory and the FAS algorithm, we first design a feature extraction sub-network (FAS-Solution module) to solve the model-driven nonlinear systems, where a skip connection is employed to fuse the multi-scale features. Second, we design a convolution block to fuse the features extracted in the previous stage, producing the final segmentation probability. Experimental results on three different medical image segmentation tasks show that the proposed FAS-Unet is very competitive with other state-of-the-art methods in qualitative, quantitative and model-complexity evaluations. Moreover, it may also be possible to train specialized network architectures that automatically satisfy some of the mathematical and physical laws in other image problems, for better accuracy, faster training and improved generalization. The code is available at \url{https://github.com/zhuhui100/FASUNet}. Comment: 18 pages
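    The abstract's FAS-Solution module is modeled on the classical full approximation scheme for nonlinear systems. As a point of reference, here is a minimal two-grid FAS cycle for a simple 1D nonlinear problem, -u'' + u^2 = f with Dirichlet boundaries; the problem, grid sizes, and smoother counts are illustrative choices, not the paper's network.

    ```python
    import numpy as np

    def N(u, h):
        """Nonlinear operator N(u) = -u'' + u^2 at interior points (Dirichlet BCs)."""
        r = np.zeros_like(u)
        r[1:-1] = (2*u[1:-1] - u[:-2] - u[2:]) / h**2 + u[1:-1]**2
        return r

    def smooth(u, f, h, sweeps):
        """Nonlinear Gauss-Seidel: one Newton step per point, lexicographic order."""
        for _ in range(sweeps):
            for i in range(1, len(u) - 1):
                g  = (2*u[i] - u[i-1] - u[i+1]) / h**2 + u[i]**2 - f[i]
                dg = 2 / h**2 + 2*u[i]
                u[i] -= g / dg
        return u

    def restrict(v):
        """Injection onto the coarse grid (every second point)."""
        return v[::2].copy()

    def prolong(v):
        """Linear interpolation back to the fine grid."""
        fine = np.zeros(2*len(v) - 1)
        fine[::2] = v
        fine[1::2] = 0.5 * (v[:-1] + v[1:])
        return fine

    def fas_two_grid(u, f, h, nu1=3, nu2=3, coarse_sweeps=50):
        u = smooth(u, f, h, nu1)                        # pre-smoothing
        r = f - N(u, h)                                 # fine-grid residual
        uc = restrict(u)
        fc = N(uc, 2*h) + restrict(r)                   # FAS coarse right-hand side
        vc = smooth(uc.copy(), fc, 2*h, coarse_sweeps)  # approximate coarse solve
        u = u + prolong(vc - uc)                        # coarse-grid correction
        return smooth(u, f, h, nu2)                     # post-smoothing

    # Manufactured problem with exact solution u*(x) = sin(pi x)
    n = 65
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    f = np.pi**2 * np.sin(np.pi * x) + np.sin(np.pi * x)**2
    u = np.zeros(n)
    for _ in range(10):
        u = fas_two_grid(u, f, h)
    ```

    The key FAS idea visible above is that the coarse problem is posed for the full solution (fc includes N applied to the restricted solution), not just for the error, which is what makes the scheme applicable to nonlinear operators.
    
    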

    On Deep Machine Learning for Multi-view Object Detection and Neural Scene Rendering

    This thesis addresses two contemporary computer vision tasks using a set of multiple-view imagery, namely the joint use of multi-view images to improve object detection and neural scene rendering via a novel volumetric input encoding for Neural Radiance Fields (NeRF). While the former focuses on improving the accuracy of object detection, the latter contribution allows for better scene reconstruction, which ultimately can be exploited to generate novel views and perform multi-view object detection. Notwithstanding the significant advances in automatic object detection in the last decade, multi-view object detection has received little attention. For this reason, two contributions regarding multi-view object detection in the absence of explicit camera pose information are presented in this thesis. First, a multi-view epipolar filtering technique is introduced, using the distance of the detected object centre to a corresponding epipolar line as an additional probabilistic confidence. This technique removes false positives without a corresponding detection in other views, giving greater confidence to consistent detections across the views. The second contribution adds an attention-based layer, called Multi-view Vision Transformer, to the backbone of a deep machine learning object detector, effectively aggregating features from different views and creating a multi-view aware representation. The final contribution explores another application for multi-view imagery, namely a novel volumetric input encoding of NeRF. The proposed method derives an analytical solution for the average value of a sinusoid (inducing a high-frequency component) within a pyramidal frustum region, whereas previous state-of-the-art NeRF methods approximate this with a Gaussian distribution. This parameterisation obtains a better representation of regions where the Gaussian approximation is poor, allowing more accurate synthesis of distant areas and depth map estimation.
Experimental evaluation is carried out across multiple established benchmark datasets, comparing the proposed methods against contemporary state-of-the-art architectures so that their efficacy can be illustrated both quantitatively and qualitatively.
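The epipolar filtering contribution reduces to a point-to-line distance turned into a confidence weight. The thesis does not spell out the weighting function here, so this sketch assumes a Gaussian fall-off with a hypothetical pixel tolerance `sigma`; the fundamental matrix `F` would in practice be estimated between the two views.

```python
import numpy as np

def epipolar_confidence(centre, F, centre_other, sigma=10.0):
    """Down-weight a detection by its distance to the corresponding epipolar line.

    centre, centre_other: (x, y) detection centres in view 1 and view 2.
    F: 3x3 fundamental matrix mapping view-2 points to epipolar lines in view 1.
    sigma: pixel tolerance (a hypothetical tuning parameter).
    """
    p2 = np.array([centre_other[0], centre_other[1], 1.0])
    a, b, c = F @ p2                            # epipolar line ax + by + c = 0 in view 1
    x, y = centre
    d = abs(a*x + b*y + c) / np.hypot(a, b)     # point-to-line distance in pixels
    return np.exp(-0.5 * (d / sigma)**2)        # confidence in (0, 1]
```

A detection whose centre lies exactly on the epipolar line of a detection in the other view keeps full confidence; detections far from any epipolar line are suppressed, matching the abstract's description of removing false positives that lack a consistent detection in other views.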

    HoEnTOA: Holoentropy and Taylor Assisted Optimization based Novel Image Quality Enhancement Algorithm for Multi-Focus Image Fusion 

    In machine vision and image processing applications, multi-focus image fusion plays a prominent role. Image fusion merges information extracted from two or more source images into a single image that is more informative and better suited to computer processing and visual perception. In this paper, the authors devise a novel image-quality enhancement algorithm that fuses multi-focus images, termed HoEnTOA. First, a contourlet transform is applied to both input images to generate four sub-bands for each. Holoentropy, together with the proposed HoEnTOA, is then used to fuse the multi-focus sub-bands; the developed HoEnTOA integrates a Taylor series with ASSCA. After fusion, the inverse contourlet transform is applied to obtain the final fused image. The proposed HoEnTOA thus performs image fusion effectively and demonstrates better performance on five metrics: a minimum Root Mean Square Error of 3.687, a highest universal quality index of 0.984, a maximum Peak Signal-to-Noise Ratio of 42.08 dB, a maximal structural similarity index of 0.943, and a maximum mutual information of 1.651
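    Contourlet transforms and the ASSCA optimizer are not standard-library components, so the following sketch swaps in a much simpler spatial-domain activity measure (local variance) to illustrate the core multi-focus fusion principle the paper builds on: per pixel, keep the source image that is in better focus.

    ```python
    import numpy as np

    def focus_measure(img, k=3):
        """Local variance over a k x k window as a simple per-pixel focus measure."""
        pad = k // 2
        p = np.pad(img.astype(float), pad, mode='reflect')
        h, w = img.shape
        out = np.empty((h, w))
        for i in range(h):          # naive loops, kept simple for clarity
            for j in range(w):
                out[i, j] = p[i:i+k, j:j+k].var()
        return out

    def fuse(img_a, img_b):
        """Pick, per pixel, the source image with the higher focus measure."""
        mask = focus_measure(img_a) >= focus_measure(img_b)
        return np.where(mask, img_a, img_b)
    ```

    For example, if one source is sharp on the left half and blurred on the right while the other is the opposite, the fused result recovers the sharp content from each away from the seam. The paper's transform-domain fusion pursues the same goal but selects coefficients in contourlet sub-bands instead of pixels.
    
    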