
    Control Design and Implementation of Autonomous 2-DOF Wireless Visual Object Tracking System

    Due to the large-scale adoption of visual detection and tracking as sensing and navigation tools, target detection and tracking through image processing for autonomous robotic systems has become an active area of study. There have been many attempts to develop systems that detect and track a moving target using real-time image or video processing. However, visual object tracking is susceptible to noise introduced by image manipulation. This noise creates uncertainty in the state and observation models and can lead to control instability, especially in remote operation. An effective filter that reduces this noise is therefore needed when developing a visual object tracking system. In this work, a 2-degree-of-freedom (2-DOF) visual object tracking system was developed with an information filter. The system consists of an image capture unit, an image processing unit, a wireless communication unit, and a manipulator. To assess the filter's effectiveness for real-time visual object tracking in remote operation, the system's performance with and without the filter was tested on both recorded video and live tracking. In the live-streaming test, the information filter reduced measurement error by about 30% compared to operation without it.
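    As a rough illustration of the filtering step described above, the sketch below runs a scalar information filter (the inverse-covariance form of the Kalman filter) over noisy angle measurements for one axis of a 2-DOF tracker. The random-walk motion model and the noise variances are illustrative assumptions, not the paper's actual parameters.

```python
# Minimal sketch of an information filter smoothing one tracked angle
# (pan or tilt). Noise values and the constant-position motion model
# are assumptions for demonstration only.

def information_filter(measurements, q=0.01, r=1.0):
    """Scalar information filter over a stream of angle measurements.

    q -- process-noise variance (assumed random-walk motion)
    r -- measurement-noise variance of the vision pipeline
    """
    y = 0.0    # information state  y = x / P
    Y = 1e-6   # information matrix Y = 1 / P (near zero = vague prior)
    estimates = []
    for z in measurements:
        # Predict: convert to covariance form, inflate by process noise.
        P = 1.0 / Y
        x = y * P
        P = P + q
        Y = 1.0 / P
        y = x * Y
        # Update: in information form this is a simple addition.
        Y += 1.0 / r
        y += z / r
        estimates.append(y / Y)
    return estimates

# Noisy observations of a target sitting near 10 degrees:
estimates = information_filter([10.4, 9.7, 10.2, 9.9, 10.1])
```

    The update step is where the information form pays off: fusing a measurement is a constant-time addition, which is convenient when measurements arrive over a lossy wireless link.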

    Multi-modal Queried Object Detection in the Wild

    We introduce MQ-Det, an efficient architecture and pre-training strategy designed to utilize both textual descriptions, with their open-set generalization, and visual exemplars, with their rich description granularity, as category queries (namely, Multi-modal Queried object Detection) for real-world detection with both open-vocabulary categories and various granularities. MQ-Det incorporates vision queries into existing, well-established language-queried-only detectors. A plug-and-play gated class-scalable perceiver module on top of the frozen detector is proposed to augment category text with class-wise visual information. To address the learning inertia brought on by the frozen detector, a vision-conditioned masked language prediction strategy is proposed. MQ-Det's simple yet effective architecture and training design is compatible with most language-queried object detectors, yielding versatile applications. Experimental results demonstrate that multi-modal queries largely boost open-world detection: for instance, MQ-Det improves the state-of-the-art open-set detector GLIP by +7.8% zero-shot AP on the LVIS benchmark and by +6.3% AP on average across 13 few-shot downstream tasks, with merely 3% of GLIP's pre-training time. Code is available at https://github.com/YifanXu74/MQ-Det. Comment: Under revie
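    A conceptual sketch of the gated augmentation idea (an assumption about the mechanism, not MQ-Det's actual code): each class's text embedding is augmented with cross-attended visual-exemplar features, scaled by a per-class gate that starts at zero so the frozen detector's behavior is initially undisturbed. All shapes and the tanh gating are illustrative.

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
text = rng.normal(size=(3, d))    # one text query embedding per class
vis = rng.normal(size=(3, 5, d))  # 5 visual exemplars per class

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def augment(text, vis, gate):
    """Augment each class text query with gated cross-attended visuals."""
    out = np.empty_like(text)
    for c in range(text.shape[0]):
        # Attention of the class text query over its visual exemplars.
        attn = softmax(text[c] @ vis[c].T / np.sqrt(d))      # shape (5,)
        # Gate starts at 0, so tanh(gate) = 0 leaves the query unchanged.
        out[c] = text[c] + np.tanh(gate[c]) * (attn @ vis[c])
    return out

gate = np.zeros(3)                # per-class gate, initialized to zero
augmented = augment(text, vis, gate)
```

    With the gate at zero the augmented queries equal the original text queries, which is one common way to bolt a new module onto a frozen model without disturbing it at initialization.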

    Measuring in-plane deflections and strains through visual sensing techniques for civil infrastructure applications

    Maintaining the integrity and safety of civil infrastructure such as bridges, dams, tunnels, and high-rise buildings is an essential task for civil engineers. Collapse of or damage to these structures can lead to a tremendous number of injuries and casualties. To alleviate this situation, a real-time surveillance method based on visual sensing techniques is proposed in this thesis. Advances in visual sensing now allow practical deployment for large extended systems in a cost-effective way, and the image or video data can easily be reused for long-term condition assessment.

    The proposed method applies visual sensing techniques to measure in-plane deflections and strains of structural members. Specifically, it employs visual sensors (digital/industrial cameras) to capture a series of continuous image frames of the targets. Automated feature detection and matching algorithms then detect and match object features in consecutive frames. Based on the locations of the detected features, the in-plane object displacement is accurately calculated by tracking those features across the frame sequence. Next, an optimized interpolation procedure yields a dense displacement field for the object, and the strains are recovered from the displacement field by computing its derivatives.

    In this research, the first task was to identify the optimum feature detection and matching algorithm, which is key to accurate surveillance. A series of experiments compared three algorithms: DIC (Digital Image Correlation), SIFT (Scale-Invariant Feature Transform), and SURF (Speeded-Up Robust Features). The results indicated that DIC outperforms the other two and holds the most potential for measuring in-plane deflections and strains of civil infrastructure. To further validate the method, a high-speed industrial camera (Manta G223B) was used to capture continuous image frames of deforming real-world scenes, with DIC adopted for feature detection and matching. The computed displacements and strains were compared with ground truth to evaluate the accuracy of the method, and colored strain maps were generated to reflect different strain levels intuitively. The experiments indicated that the method achieves highly accurate in-plane displacement and strain measurements for civil infrastructure applications. Compared with pre-existing approaches such as sensor networks, the proposed method generates accurate full-field deflections and strains of the target, and its cost-effective equipment and more convenient set-up allow engineers to operate it periodically and apply it at different scales of civil infrastructure.
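    The last step of the pipeline described above, recovering strains by differentiating the dense displacement field, can be sketched numerically. The synthetic field below (a uniform 0.1% stretch in x on a regular grid) is an assumption for demonstration, not data from the thesis.

```python
import numpy as np

# Small-strain recovery from a dense in-plane displacement field (u, v):
#   eps_xx = du/dx,  eps_yy = dv/dy,  gamma_xy = du/dy + dv/dx.
h = 1.0                                    # grid spacing (pixels)
y, x = np.mgrid[0:50, 0:50].astype(float)  # y varies along axis 0, x along axis 1
u = 0.001 * x                              # x-displacement: uniform 0.1% stretch
v = np.zeros_like(u)                       # y-displacement: none

eps_xx = np.gradient(u, h, axis=1)         # normal strain in x
eps_yy = np.gradient(v, h, axis=0)         # normal strain in y
gamma_xy = np.gradient(u, h, axis=0) + np.gradient(v, h, axis=1)  # shear strain
```

    For this uniform stretch, eps_xx is 0.001 everywhere and the other strain components vanish; in practice the displacement field is noisy, which is why the thesis interpolates it before differentiating.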

    Towards Developing Computer Vision Algorithms and Architectures for Real-world Applications

    abstract: Computer vision technology automatically extracts high-level, meaningful information from visual data such as images or videos, and object recognition and detection algorithms are essential to most computer vision applications. This dissertation focuses on algorithms for real-life computer vision applications: innovative algorithms for object segmentation and feature extraction for object and action recognition in video, sparse feature selection for medical image analysis, and automated feature extraction with convolutional neural networks for blood cancer grading.

    To detect and classify objects in video, the objects must first be separated from the background, and discriminant features are then extracted from the region of interest before being fed to a classifier. Effective object segmentation and feature extraction are often application-specific and pose major challenges for detection and classification tasks. We present an effective optical-flow-based ROI generation algorithm for segmenting moving objects in video, applicable to surveillance and self-driving vehicles. Optical flow can also serve as a feature for human action recognition, and we show that feeding optical-flow features into a pre-trained convolutional neural network improves action recognition performance. Both algorithms outperformed the state of the art at the time of publication.

    Medical images and videos pose unique challenges for image understanding, mainly because tissues and cells are often irregularly shaped, colored, and textured, and hand-selecting the most discriminant features is difficult, so an automated feature selection method is desired. Sparse learning extracts the most discriminant and representative features from raw visual data. However, sparse learning with L1 regularization only accounts for sparsity along the feature dimension; we improve the algorithm so that it also selects among feature types, removing less important or noisy feature types entirely. We demonstrate this algorithm on endoscopy images to detect unhealthy abnormalities of the esophagus and stomach, such as ulcers and cancer. Beyond the sparsity constraint, other application-specific constraints and prior knowledge may need to be incorporated into the loss function; we show how to incorporate a similar-inhibition constraint and gaze and attention priors into sparse dictionary selection for gastroscopic video summarization, enabling intelligent key-frame extraction. With recent advances in multi-layer neural networks, automatic end-to-end feature learning has become feasible. Convolutional neural networks mimic the mammalian visual cortex and extract the most discriminant features automatically from training samples. We use a convolutional neural network with a hierarchical classifier to grade the severity of follicular lymphoma, a type of blood cancer, reaching 91% accuracy, on par with analysis by expert pathologists.

    Developing real-world computer vision applications involves more than core vision algorithms; it is also subject to practical requirements and constraints such as hardware and computing infrastructure, cost, robustness to lighting changes and deformation, and ease of use and deployment. The processing pipelines and system architectures of computer-vision-based applications share many design principles. We developed common processing components and a generic framework for computer vision applications, along with a versatile scale-adaptive template matching algorithm for object detection. We demonstrate these design principles and best practices by building and deploying a complete application, a multi-channel water-level monitoring system, whose techniques and design methodology generalize to other real-life applications. General software engineering principles such as modularity, abstraction, robustness to requirement change, and generality are all demonstrated in this research.
    Dissertation/Thesis Doctoral Dissertation Computer Science 201
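    The template-matching component mentioned above can be sketched at its simplest: fixed-scale normalized cross-correlation, sliding a template over an image and reporting the best match. This is a hedged illustration of the general technique only; the dissertation's scale-adaptive variant is not reproduced, and the synthetic image is an assumption.

```python
import numpy as np

def ncc_match(image, template):
    """Brute-force normalized cross-correlation; returns (top-left, score)."""
    th, tw = template.shape
    t = template - template.mean()
    tn = np.sqrt((t * t).sum())
    best, best_pos = -2.0, (0, 0)
    H, W = image.shape
    for i in range(H - th + 1):
        for j in range(W - tw + 1):
            w = image[i:i + th, j:j + tw]
            wz = w - w.mean()
            denom = np.sqrt((wz * wz).sum()) * tn
            if denom == 0:
                continue  # flat window: correlation undefined
            score = (wz * t).sum() / denom
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos, best

# Synthetic scene: a 4x4 gradient patch at row 5, col 7.
img = np.zeros((20, 20))
img[5:9, 7:11] = np.arange(16, dtype=float).reshape(4, 4)
tmpl = img[5:9, 7:11].copy()
pos, score = ncc_match(img, tmpl)
```

    An exact match scores 1.0 at the patch location; real systems use FFT-based correlation or image pyramids rather than this O(n^4) scan.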

    Receptive Field Block Net for Accurate and Fast Object Detection

    Current top-performing object detectors depend on deep CNN backbones such as ResNet-101 and Inception, benefiting from their powerful feature representations but suffering from high computational cost. Conversely, some detectors based on lightweight models achieve real-time processing, but their accuracy is often criticized. In this paper, we explore an alternative for building a fast yet accurate detector by strengthening lightweight features with a hand-crafted mechanism. Inspired by the structure of receptive fields (RFs) in the human visual system, we propose a novel RF Block (RFB) module, which accounts for the relationship between the size and eccentricity of RFs to enhance feature discriminability and robustness. We further assemble the RFB module on top of SSD to construct the RFB Net detector. Experiments on two major benchmarks show that RFB Net reaches the performance of very deep state-of-the-art detectors while keeping real-time speed. Code is available at https://github.com/ruinmessi/RFBNet. Comment: Accepted by ECCV 201

    Vision-based Real-Time Aerial Object Localization and Tracking for UAV Sensing System

    The paper focuses on vision-based obstacle detection and tracking for unmanned aerial vehicle navigation. A real-time object localization and tracking strategy for monocular image sequences is developed by integrating detection and tracking into a dynamic Kalman model. At the detection stage, the object of interest is automatically detected and localized from a saliency map computed via the image background connectivity cue at each frame; at the tracking stage, a Kalman filter provides a coarse prediction of the object state, which is further refined by a local detector incorporating the saliency map and the temporal information between consecutive frames. Compared with existing methods, the proposed approach requires no manual initialization, runs much faster than state-of-the-art trackers of its kind, and achieves competitive tracking performance on a large number of image sequences. Extensive experiments demonstrate the effectiveness and superior performance of the proposed approach. Comment: 8 pages, 7 figure
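    The coarse-prediction step described above can be sketched as a constant-velocity Kalman filter over detected positions. The state is [x, y, vx, vy] and only the position is observed; the motion model and noise magnitudes are illustrative assumptions, not the paper's tuning.

```python
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],     # constant-velocity motion model
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],      # we observe position only
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)             # process noise (assumed)
R = 1.0 * np.eye(2)              # detector noise (assumed)

x = np.zeros(4)
P = 100.0 * np.eye(4)            # vague initial belief

def step(x, P, z):
    # Predict with the motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the saliency-based detection z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Target moving one pixel per frame in x, with a small fixed offset:
for t in range(10):
    z = np.array([float(t), 0.0]) + 0.01
    x, P = step(x, P, z)
```

    After a few frames the filter locks onto the linear track; the predicted state then narrows the search region for the local detector on the next frame, which is what removes the need for manual initialization.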