
    Tracking, Detection and Registration in Microscopy Material Images

    Fast and accurate characterization of fiber micro-structures plays a central role for material scientists in analyzing the physical properties of continuous fiber reinforced composite materials. In materials science, this is usually achieved by continuously cross-sectioning a 3D material sample into a sequence of 2D microscopic images, followed by a fiber detection/tracking algorithm applied through the obtained image sequence. To speed up this process and handle larger material samples, we propose sparse sampling with a larger inter-slice distance in cross-sectioning and develop a new algorithm that can robustly track large-scale fibers from such a sparsely sampled image sequence. In particular, the problem is formulated as multi-target tracking and Kalman filters are applied to track each fiber along the image sequence. One main challenge in this tracking process is to correctly associate each fiber to its observation, given that (1) fiber observations are large-scale, crowded, and show very similar appearances in a 2D slice, and (2) there may be a large gap between the predicted location of a fiber and its observation under sparse sampling. To address this challenge, a novel group-wise association algorithm is developed by leveraging the fact that fibers are implanted in bundles and that fibers in the same bundle are highly correlated through the image sequence. Tracking-by-detection algorithms rely heavily on detection accuracy, especially recall performance. State-of-the-art fiber detection algorithms perform well under ideal conditions, but are not accurate where there are local degradations of image quality due to contaminants on the material surface and/or defocus blur. Convolutional Neural Networks (CNN) could be used for this problem, but would require a large number of manually annotated fibers, which are not available. We propose an unsupervised learning method to accurately detect fibers at large scale, which is robust against local degradations of image quality. The proposed method does not require manual annotations, but uses fiber shape/size priors and spatio-temporal consistency in tracking to simulate supervision in training the CNN. Due to significant microscope movement during data acquisition, the sampled microscopy images might not be well aligned, which further complicates large-scale fiber tracking. In this dissertation, we design an object tracking system that can accurately track large-scale fibers and simultaneously perform satisfactory image registration. The large-scale fiber tracking task is accomplished by Kalman filter-based tracking methods. With the assistance of fiber tracking, the image registration is performed in a coarse-to-fine way. To evaluate the proposed methods, a dataset was collected by the Air Force Research Laboratory (AFRL). The material scientists at AFRL used a serial sectioning instrument to cross-section the 3D material samples. During sample preparation, the samples are ground, cleaned, and then imaged. Experimental results on this collected dataset have demonstrated that the proposed methods yield significant improvements in large-scale fiber tracking and detection, together with satisfactory image registration.
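    The Kalman filter component described above is standard enough to sketch. Below is a minimal, hypothetical example of a constant-velocity Kalman filter tracking one fiber's (x, y) cross-section center across slices; the dissertation's actual state model, noise parameters, and group-wise association step are not specified here, so all values below are illustrative placeholders.

```python
import numpy as np

# Minimal constant-velocity Kalman filter for one fiber center (x, y) across slices.
# State: [x, y, vx, vy]; the "velocity" is the per-slice drift of the fiber cross-section.
# All noise parameters below are illustrative placeholders, not values from the dissertation.

dt = 1.0                                  # one slice step
F = np.array([[1, 0, dt, 0],              # state transition (constant velocity)
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],               # we only observe the (x, y) center
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01                      # process noise (placeholder)
R = np.eye(2) * 1.0                       # measurement noise (placeholder)

def predict(x, P):
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P

def update(x, P, z):
    y = z - H @ x                         # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Toy usage: a fiber drifting roughly 2 px per slice in x.
x = np.array([10.0, 20.0, 0.0, 0.0])      # initial state
P = np.eye(4) * 10.0                      # initial covariance
for z in [np.array([12.1, 20.2]), np.array([14.0, 20.1]), np.array([15.9, 20.3])]:
    x, P = predict(x, P)
    x, P = update(x, P, z)                # in the full system, z comes from data association
print("estimated center and drift:", x)
```

    In the full system, one such filter would run per fiber, with the group-wise association step supplying the measurement z for each slice.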

    Co-interest Person Detection from Multiple Wearable Camera Videos

    Wearable cameras, such as Google Glass and GoPro, enable video data collection over larger areas and from different views. In this paper, we tackle a new problem of locating the co-interest person (CIP), i.e., the one who draws attention from most camera wearers, from temporally synchronized videos taken by multiple wearable cameras. Our basic idea is to exploit the motion patterns of people and use them to correlate the persons across different videos, instead of performing appearance-based matching as in traditional video co-segmentation/localization. This way, we can identify the CIP even if a group of people with similar appearance are present in the view. More specifically, we detect a set of persons in each frame as candidates for the CIP and then build a Conditional Random Field (CRF) model to select the one with consistent motion patterns in different videos and high spatio-temporal consistency in each video. We collect three sets of wearable-camera videos for testing the proposed algorithm. All the involved people have similar appearances in the collected videos and the experiments demonstrate the effectiveness of the proposed algorithm. Comment: ICCV 2015
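    The appearance-free idea of correlating candidates by motion rather than looks can be illustrated with a toy score. The sketch below is hypothetical: it simply compares per-candidate motion descriptors across two synchronized videos by a normalized similarity, whereas the paper couples such evidence with spatio-temporal consistency inside a CRF.

```python
import numpy as np

def motion_similarity(track_a, track_b):
    """Toy similarity between two motion descriptors.

    track_a, track_b: (T, 2) arrays describing a candidate's motion over T frames
    (e.g., a short history of positions) in two temporally synchronized videos.
    Returns the cosine similarity of the flattened, zero-mean motion patterns.
    """
    a = track_a.flatten() - track_a.mean()
    b = track_b.flatten() - track_b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-8
    return float(a @ b / denom)

# Hypothetical example: candidate 0 in video A moves like candidate 1 in video B.
cands_a = [np.cumsum(np.random.randn(10, 2), axis=0) for _ in range(3)]
cands_b = [cands_a[1] + 0.1 * np.random.randn(10, 2), cands_a[0], cands_a[2]]

scores = np.array([[motion_similarity(a, b) for b in cands_b] for a in cands_a])
print("best cross-video match per candidate in A:", scores.argmax(axis=1))
```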

    Bridging the Domain Gap for Multi-Agent Perception

    Existing multi-agent perception algorithms usually choose to share deep neural features extracted from raw sensing data between agents, achieving a trade-off between accuracy and communication bandwidth. However, these methods assume all agents have identical neural networks, which might not be practical in the real world. The transmitted features can have a large domain gap when the models differ, leading to a dramatic performance drop in multi-agent perception. In this paper, we propose the first lightweight framework to bridge such domain gaps for multi-agent perception, which can serve as a plug-in module for most existing systems while maintaining confidentiality. Our framework consists of a learnable feature resizer to align features in multiple dimensions and a sparse cross-domain transformer for domain adaptation. Extensive experiments on the public multi-agent perception dataset V2XSet have demonstrated that our method can effectively bridge the gap for features from different domains and outperform other baseline methods significantly, by at least 8%, for point-cloud-based 3D object detection. Comment: Accepted by ICRA 2023. Code: https://github.com/DerrickXuNu/MPD
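    A learnable feature resizer of the kind described, aligning the channel and spatial dimensions of features received from another agent's (different) backbone, might look like the hypothetical PyTorch sketch below; the paper's actual module and its sparse cross-domain transformer are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureResizer(nn.Module):
    """Hypothetical resizer: maps (B, C_in, H_in, W_in) features from another agent's
    backbone to the ego agent's expected (C_out, H_out, W_out) feature shape."""

    def __init__(self, c_in, c_out, out_hw):
        super().__init__()
        self.out_hw = out_hw
        self.channel_align = nn.Conv2d(c_in, c_out, kernel_size=1)    # channel alignment
        self.refine = nn.Sequential(                                   # light refinement
            nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat):
        feat = self.channel_align(feat)
        feat = F.interpolate(feat, size=self.out_hw, mode="bilinear",
                             align_corners=False)                      # spatial alignment
        return self.refine(feat)

# Toy usage: received features are 256x100x176, the ego model expects 128x50x88.
resizer = FeatureResizer(c_in=256, c_out=128, out_hw=(50, 88))
shared = torch.randn(1, 256, 100, 176)
print(resizer(shared).shape)   # torch.Size([1, 128, 50, 88])
```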

    Detecting phone-related pedestrian distracted behaviours via a two-branch convolutional neural network

    The distracted phone-use behaviours among pedestrians, like Texting, Game Playing and Phone Calls, have caused increasing fatalities and injuries. However, phone-related distracted behaviour by pedestrians has not been systematically studied. It is therefore desirable to improve both driving and pedestrian safety by automatically discovering phone-related pedestrian distracted behaviours. Herein, a new computer vision-based method is proposed to detect phone-related pedestrian distracted behaviours from the perspective of intelligent and autonomous driving. Specifically, the first end-to-end deep learning based Two-Branch Convolutional Neural Network (CNN) is designed for this task. Taking a synchronised image pair from two front on-car GoPro cameras as input, the proposed two-branch CNN extracts features for each camera, fuses the extracted features and performs a robust classification. The method can also be easily extended to video-based classification by confidence accumulation and voting. A new benchmark dataset of 448 synchronised video pairs, comprising 53,760 images collected on a vehicle, is proposed for this research. The experimental results show that using two synchronised cameras achieves better performance than using a single camera. Finally, the proposed method achieved a best overall classification accuracy of 84.3% on the new benchmark when compared to other methods.
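    A two-branch network of the general shape described, one backbone per camera followed by feature fusion and classification, could be sketched as follows; the backbone choice, the concatenation-based fusion, and the assumed four behaviour classes are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
from torchvision import models

class TwoBranchCNN(nn.Module):
    """Hypothetical two-branch classifier for a synchronised image pair."""

    def __init__(self, num_classes=4):      # e.g., texting / game playing / phone call / no phone (assumed)
        super().__init__()
        self.branch_left = models.resnet18(weights=None)
        self.branch_right = models.resnet18(weights=None)
        feat_dim = self.branch_left.fc.in_features
        self.branch_left.fc = nn.Identity()   # keep pooled backbone features only
        self.branch_right.fc = nn.Identity()
        self.classifier = nn.Linear(feat_dim * 2, num_classes)  # late fusion by concatenation

    def forward(self, img_left, img_right):
        fused = torch.cat([self.branch_left(img_left),
                           self.branch_right(img_right)], dim=1)
        return self.classifier(fused)

# Toy forward pass with a random synchronised image pair.
model = TwoBranchCNN()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
print(logits.shape)   # torch.Size([2, 4])
```

    Video-level classification by confidence accumulation would then amount to summing per-frame softmax scores over a clip and voting on the resulting totals.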

    Visual Attention Consistency under Image Transforms for Multi-Label Image Classification

    Human visual perception shows good consistency for many multi-label image classification tasks under certain spatial transforms, such as scaling, rotation, flipping and translation. This has motivated the data augmentation strategy widely used in CNN classifier training -- transformed images are included for training by assuming the same class labels as their original images. In this paper, we further propose the assumption of perceptual consistency of visual attention regions for classification under such transforms, i.e., the attention region for a classification follows the same transform if the input image is spatially transformed. While the attention regions of CNN classifiers can be derived as an attention heatmap in the middle layers of the network, we find that their consistency under many transforms is not preserved. To address this problem, we propose a two-branch network with an original image and its transformed image as inputs and introduce a new attention consistency loss that measures the attention heatmap consistency between the two branches. This new loss is then combined with the multi-label image classification loss for network training. Experiments on three datasets verify the superiority of the proposed network, which achieves new state-of-the-art classification performance.
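    The attention consistency idea lends itself to a compact sketch: transform the original image's attention heatmap the same way the input was transformed, and penalize its difference from the transformed image's heatmap. The code below assumes a horizontal flip as the transform and an L2 penalty; the paper's heatmap derivation and exact loss form are not reproduced here.

```python
import torch
import torch.nn.functional as F

def attention_consistency_loss(heatmap_orig, heatmap_flipped):
    """Hypothetical consistency loss for a horizontal-flip transform.

    heatmap_orig:    (B, K, H, W) class-wise attention heatmaps of the original images.
    heatmap_flipped: (B, K, H, W) heatmaps of the horizontally flipped images.
    If attention is perceptually consistent, flipping heatmap_orig should match
    heatmap_flipped, so we penalize their squared difference.
    """
    return F.mse_loss(torch.flip(heatmap_orig, dims=[-1]), heatmap_flipped)

# Toy check: heatmaps that are identical up to the flip give (near) zero loss.
h = torch.rand(2, 5, 7, 7)
print(attention_consistency_loss(h, torch.flip(h, dims=[-1])).item())  # ~0.0
```

    In training, a term like this would be added to the multi-label classification loss (e.g., binary cross-entropy over labels) with a weighting factor.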

    SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

    Recently, self-supervised monocular depth estimation has gained popularity with numerous applications in autonomous driving and robotics. However, existing solutions primarily seek to estimate depth from immediate visual features, and struggle to recover fine-grained scene details with limited generalization. In this paper, we introduce SQLdepth, a novel approach that can effectively learn fine-grained scene structures from motion. In SQLdepth, we propose a novel Self Query Layer (SQL) to build a self-cost volume and infer depth from it, rather than inferring depth from feature maps. The self-cost volume implicitly captures the intrinsic geometry of the scene within a single frame. Each individual slice of the volume signifies the relative distances between points and objects within a latent space. Ultimately, this volume is compressed to the depth map via a novel decoding approach. Experimental results on KITTI and Cityscapes show that our method attains remarkable state-of-the-art performance (AbsRel = 0.082 on KITTI, 0.052 on KITTI with improved ground truth, and 0.106 on Cityscapes), achieving 9.9%, 5.5% and 4.5% error reductions over the previous best. In addition, our approach showcases reduced training complexity, computational efficiency, improved generalization, and the ability to recover fine-grained scene details. Moreover, the self-supervised pre-trained and metric fine-tuned SQLdepth can surpass existing supervised methods by significant margins (AbsRel = 0.043, a 14% error reduction). The self-matching-oriented relative distance querying in SQL improves the robustness and zero-shot generalization capability of SQLdepth. Code and the pre-trained weights will be publicly available at https://github.com/hisfog/SQLdepth-Impl. Comment: 14 pages, 9 figures
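    The self-cost-volume idea can be illustrated with a deliberately loose toy: compare every pixel feature against a small set of latent descriptors pooled from the same frame, treat the resulting stack of similarity maps as a volume, and decode it into depth. The sketch below is an approximation of the concept only; the pooling scheme, query count, depth bins, and decoder are all assumptions and do not reflect the actual SQLdepth architecture.

```python
import torch
import torch.nn as nn

class ToySelfQueryLayer(nn.Module):
    """Loose toy approximation of a self-cost volume: compare every pixel feature
    against latent 'object' descriptors pooled from the same frame, then decode the
    resulting volume into a single-channel depth map."""

    def __init__(self, c_feat=64, n_queries=16, depth_bins=32, max_depth=80.0):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d((4, 4))          # 16 coarse regions -> 16 latent queries
        self.query_proj = nn.Linear(c_feat, c_feat)
        self.decode = nn.Conv2d(n_queries, depth_bins, kernel_size=1)  # volume -> depth-bin logits
        bins = torch.linspace(1.0, max_depth, depth_bins)
        self.register_buffer("bin_centers", bins.view(1, depth_bins, 1, 1))

    def forward(self, feat):                               # feat: (B, C, H, W)
        B, C, H, W = feat.shape
        queries = self.pool(feat).flatten(2).transpose(1, 2)    # (B, 16, C)
        queries = self.query_proj(queries)
        pixels = feat.flatten(2)                                 # (B, C, H*W)
        volume = torch.bmm(queries, pixels).view(B, -1, H, W)    # (B, 16, H, W) "self-cost volume"
        probs = self.decode(volume).softmax(dim=1)               # per-pixel distribution over depth bins
        depth = (probs * self.bin_centers).sum(dim=1, keepdim=True)
        return depth                                             # (B, 1, H, W)

layer = ToySelfQueryLayer()
print(layer(torch.randn(1, 64, 48, 160)).shape)   # torch.Size([1, 1, 48, 160])
```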

    Fusion of 3D LIDAR and Camera Data for Object Detection in Autonomous Vehicle Applications

    It is critical for an autonomous vehicle to acquire accurate and real-time information about the objects in its vicinity, which is essential to guarantee the safety of the passengers and the vehicle in various environments. A 3D LIDAR can directly obtain the position and geometrical structure of objects within its detection range, while a vision camera is well suited for object recognition. Accordingly, this paper presents a novel object detection and identification method that fuses the complementary information of the two kinds of sensors. We first utilize the 3D LIDAR data to generate accurate object-region proposals effectively. Then, these candidates are mapped into the image space, where the regions of interest (ROI) of the proposals are selected and input to a convolutional neural network (CNN) for further object recognition. In order to identify objects of all sizes precisely, we combine the features of the last three layers of the CNN to extract multi-scale features of the ROIs. The evaluation results on the KITTI dataset demonstrate that: (1) unlike sliding windows that produce thousands of candidate object-region proposals, 3D LIDAR provides an average of 86 real candidates per frame and the minimum recall rate is higher than 95%, which greatly lowers the proposal extraction time; (2) the average processing time per frame of the proposed method is only 66.79 ms, which meets the real-time demand of autonomous vehicles; (3) the average identification accuracies of our method for cars and pedestrians at the moderate level are 89.04% and 78.18% respectively, which outperforms most previous methods.
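    The step of mapping 3D LIDAR proposals into image ROIs is a straightforward projective-geometry operation; the sketch below uses a made-up intrinsics-only projection matrix in place of the real LIDAR-to-camera calibration supplied with KITTI, and assumes the cluster has already been transformed into the camera frame.

```python
import numpy as np

def lidar_cluster_to_roi(points_xyz, P):
    """Project a clustered LIDAR object (N, 3), already expressed in the camera
    frame (z forward), into the image and return its bounding box
    (x_min, y_min, x_max, y_max). P is a 3x4 projection matrix; in practice it
    comes from the LIDAR-to-camera calibration, here it is a placeholder."""
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])   # (N, 4)
    proj = (P @ pts_h.T).T                                               # (N, 3)
    uv = proj[:, :2] / proj[:, 2:3]                                      # perspective divide
    x_min, y_min = uv.min(axis=0)
    x_max, y_max = uv.max(axis=0)
    return x_min, y_min, x_max, y_max

# Illustrative intrinsics-only projection (f = 700 px, principal point at (620, 190)).
P = np.array([[700.0,   0.0, 620.0, 0.0],
              [  0.0, 700.0, 190.0, 0.0],
              [  0.0,   0.0,   1.0, 0.0]])
cluster = np.array([[-1.0, 0.5, 10.0], [-0.6, 0.2, 10.2], [-1.2, 0.4, 10.1]])
print(lidar_cluster_to_roi(cluster, P))   # a small box left of the image centre
```

    Each such ROI would then be cropped or ROI-pooled from the image and passed to the CNN for recognition.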

    Domain Adaptation For Vehicle Detection In Traffic Surveillance Images From Daytime To Nighttime

    Vehicle detection in traffic surveillance images is an important approach to obtaining vehicle data and rich traffic flow parameters. Recently, deep learning based methods have been widely used in vehicle detection with high accuracy and efficiency. However, deep learning based methods require a large number of manually labeled ground truths (a bounding box for each vehicle in each image) to train the Convolutional Neural Networks (CNN). For modern urban surveillance cameras, there are already many manually labeled ground truths in daytime images for training CNNs, while there are few or far fewer manually labeled ground truths in nighttime images. In this paper, we focus on making maximum use of labeled daytime images (Source Domain) to help vehicle detection in unlabeled nighttime images (Target Domain). For this purpose, we propose a new method based on Faster R-CNN with Domain Adaptation (DA) to improve vehicle detection at nighttime. With the assistance of DA, the domain distribution discrepancy between the Source and Target Domains is reduced. We collected a new dataset of 2,200 traffic images (1,200 for daytime and 1,000 for nighttime) with 57,059 vehicles for training and testing the CNN. In the experiments, using only the manually labeled ground truths of the daytime data, Faster R-CNN obtained an F-measure of 82.84% on nighttime vehicle detection, while the proposed method (Faster R-CNN+DA) achieved an F-measure of 86.39%.
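    One common way to implement the general domain-adaptation idea (reducing the feature distribution gap between daytime and nighttime images) is a domain classifier trained through a gradient reversal layer, as in DANN-style adversarial adaptation; the sketch below illustrates that generic mechanism and is not necessarily the DA module used in this paper.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated (scaled) gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class DomainClassifier(nn.Module):
    """Predicts daytime (0) vs. nighttime (1) from backbone features; the backbone
    is trained to fool it, which pulls the two feature distributions together."""
    def __init__(self, c_feat=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c_feat, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 2),
        )

    def forward(self, feat, lamb=1.0):
        return self.head(GradReverse.apply(feat, lamb))

# Toy usage on a batch mixing daytime (label 0) and nighttime (label 1) feature maps.
feat = torch.randn(4, 256, 38, 50, requires_grad=True)
domain_labels = torch.tensor([0, 0, 1, 1])
loss = nn.CrossEntropyLoss()(DomainClassifier()(feat, lamb=0.5), domain_labels)
loss.backward()          # gradients reaching `feat` are reversed, i.e., adversarial
print(loss.item())
```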

    Heterogeneous Trajectory Forecasting via Risk and Scene Graph Learning

    Heterogeneous trajectory forecasting is critical for intelligent transportation systems, yet challenging because of the difficulty of modeling the complex interaction relations among heterogeneous road agents as well as their agent-environment constraints. In this work, we propose a risk and scene graph learning method for trajectory forecasting of heterogeneous road agents, which consists of a Heterogeneous Risk Graph (HRG) and a Hierarchical Scene Graph (HSG) built from the aspects of agent category and their movable semantic regions. The HRG groups each kind of road agent and calculates their interaction adjacency matrix based on an effective collision risk metric. The HSG of the driving scene is modeled by inferring the relationship between road agents and the road semantic layout, aligned by the road scene grammar. Based on this formulation, we obtain effective trajectory forecasting in driving situations, and exhaustive experiments on the nuScenes, ApolloScape, and Argoverse datasets demonstrate superior performance over other state-of-the-art approaches. Comment: Submitted to IEEE Transactions on Intelligent Transportation Systems, 202
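    The collision-risk-based interaction adjacency can be illustrated with a simple proxy. Since the abstract does not define the paper's collision risk metric, the sketch below uses the minimum predicted distance under constant velocity as a stand-in risk score between agent pairs.

```python
import numpy as np

def risk_adjacency(positions, velocities, horizon=5.0, eps=1e-6):
    """Toy interaction adjacency for N agents.

    positions, velocities: (N, 2) arrays. For each ordered pair (i, j) we compute
    the minimum distance over the next `horizon` seconds under constant velocity
    and turn it into a risk in (0, 1]: a closer predicted approach gives a higher risk.
    """
    n = positions.shape[0]
    A = np.zeros((n, n))
    ts = np.linspace(0.0, horizon, 50)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rel_p = positions[j] - positions[i]
            rel_v = velocities[j] - velocities[i]
            dists = np.linalg.norm(rel_p[None, :] + ts[:, None] * rel_v[None, :], axis=1)
            A[i, j] = 1.0 / (1.0 + dists.min() + eps)   # placeholder risk metric
    return A

# Hypothetical scene: a car converging on a stopped cyclist, plus a distant pedestrian.
pos = np.array([[0.0, 0.0], [10.0, 2.0], [50.0, 40.0]])
vel = np.array([[5.0, 1.0], [0.0, 0.0], [0.0, 1.0]])
print(np.round(risk_adjacency(pos, vel), 3))
```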