    Coding local and global binary visual features extracted from video sequences

    Binary local features represent an effective alternative to real-valued descriptors, leading to comparable results for many visual analysis tasks, while being characterized by significantly lower computational complexity and memory requirements. When dealing with large collections, a more compact representation based on global features is often preferred, which can be obtained from local features by means of, e.g., the Bag-of-Visual-Word (BoVW) model. Several applications, including for example visual sensor networks and mobile augmented reality, require visual features to be transmitted over a bandwidth-limited network, thus calling for coding techniques that aim at reducing the required bit budget, while attaining a target level of efficiency. In this paper we investigate a coding scheme tailored to both local and global binary features, which aims at exploiting both spatial and temporal redundancy by means of intra- and inter-frame coding. In this respect, the proposed coding scheme can be conveniently adopted to support the Analyze-Then-Compress (ATC) paradigm. That is, visual features are extracted from the acquired content, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast with the traditional approach, in which visual content is acquired at a node, compressed and then sent to a central unit for further processing, according to the Compress-Then-Analyze (CTA) paradigm. In this paper we experimentally compare ATC and CTA by means of rate-efficiency curves in the context of two different visual analysis tasks: homography estimation and content-based retrieval. Our results show that the novel ATC paradigm based on the proposed coding primitives can be competitive with CTA, especially in bandwidth-limited scenarios. Comment: submitted to IEEE Transactions on Image Processing.
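The idea of inter-frame coding for binary descriptors can be illustrated with a small sketch. This is not the coding scheme of the paper: it only assumes that each descriptor is stored as an integer bit string, that the reference is the closest descriptor (in Hamming distance) from the previous frame, and that the residual is a plain XOR. The function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def inter_frame_residuals(prev, curr):
    """For each current descriptor, pick the closest previous-frame descriptor
    and return (reference index, XOR residual)."""
    out = []
    for d in curr:
        ref = min(range(len(prev)), key=lambda i: hamming(prev[i], d))
        out.append((ref, prev[ref] ^ d))
    return out


def decode(prev, residuals):
    """Invert the XOR residuals given the previous frame's descriptors."""
    return [prev[ref] ^ r for ref, r in residuals]


# Toy 8-bit descriptors (illustrative values, not real feature data).
prev = [0b10110010, 0b01001101]
curr = [0b10110011, 0b01101101]

res = inter_frame_residuals(prev, curr)
assert decode(prev, res) == curr  # the residuals are lossless
```

When consecutive frames are similar, the residuals contain few set bits, which is the kind of temporal redundancy an entropy coder could exploit.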

    Semantic Validation in Structure from Motion

    The Structure from Motion (SfM) challenge in computer vision is the process of recovering the 3D structure of a scene from a series of projective measurements that are calculated from a collection of 2D images, taken from different perspectives. SfM consists of three main steps: feature detection and matching, camera motion estimation, and recovery of 3D structure from estimated intrinsic and extrinsic parameters and features. A problem encountered in SfM is that scenes lacking texture or with repetitive features can cause erroneous feature matching between frames. Semantic segmentation offers a route to validate and correct SfM models by labelling pixels in the input images with the use of a deep convolutional neural network. The semantic and geometric properties associated with classes in the scene can be exploited to apply prior constraints to each class of object. The SfM pipeline COLMAP and the semantic segmentation pipeline DeepLab were used. These, along with planar reconstruction of the dense model, were used to determine erroneous points that may be occluded from the calculated camera position, given the semantic label and thus the prior constraint of the reconstructed plane. Herein, semantic segmentation is integrated into SfM to apply priors on the 3D point cloud, given the object detection in the 2D input images. Additionally, the semantic labels of matched keypoints are compared and points with inconsistent semantic labels are discarded. Furthermore, semantic labels on input images are used for the removal of objects associated with motion in the output SfM models. The proposed approach is evaluated on a dataset of 1102 images of a repetitive architecture scene. This project offers a novel method for improved validation of 3D SfM models.
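The label-consistency check on matched keypoints can be sketched as follows. This is a minimal illustration, not the COLMAP or DeepLab API: it assumes each keypoint already carries a semantic label looked up from the segmentation map, and the function name and label strings are hypothetical.

```python
def filter_matches(matches, labels_a, labels_b):
    """Keep only matches whose two keypoints carry the same semantic label.

    matches  : list of (index in image A, index in image B)
    labels_a : semantic label of each keypoint in image A
    labels_b : semantic label of each keypoint in image B
    """
    return [(i, j) for i, j in matches if labels_a[i] == labels_b[j]]


# Toy labels (assumed to come from a per-pixel segmentation lookup).
labels_a = ["building", "building", "sky", "person"]
labels_b = ["building", "sky", "sky", "building"]
matches = [(0, 0), (1, 1), (2, 2), (3, 3)]

kept = filter_matches(matches, labels_a, labels_b)
print(kept)
```

Here the matches (1, 1) and (3, 3) are dropped because their labels disagree across the two images.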

    Plant image retrieval using color, shape and texture features

    We present a content-based image retrieval system for plant image retrieval, intended especially for the house plant identification problem. A plant image consists of a collection of overlapping leaves and possibly flowers, which makes the problem challenging. We studied the suitability of various well-known color, shape and texture features for this problem, and also introduce some new texture matching techniques and shape features. Feature extraction is applied after segmenting the plant region from the background using the max-flow min-cut technique. Results on a database of 380 plant images belonging to 78 different types of plants show the promise of the proposed new techniques and the overall system: in 55% of the queries, the correct plant image is retrieved among the top-15 results. Furthermore, the accuracy goes up to 73% when a 132-image subset of well-segmented plant images is considered.

    A novel sketch based face recognition in unconstrained video for criminal investigation

    Face recognition in video surveillance helps to identify an individual by comparing the facial features of a given photograph or sketch with a video for criminal investigations. Generally, a face sketch is used by the police when a suspect’s photo is not available. Manual matching of a facial sketch with a suspect’s image in a long video is a tedious and time-consuming task. To overcome these drawbacks, this paper proposes an accurate face recognition technique to recognize a person based on his sketch in unconstrained video surveillance. In the proposed method, a surveillance video and a sketch of the suspect are taken as input. First, the input video is converted into frames and summarized using the proposed quality indexed three step cross search algorithm. Next, faces are detected by the proposed modified Viola-Jones algorithm. Then, the necessary features are selected using the proposed salp-cat optimization algorithm. Finally, these features are fused with scale-invariant feature transform (SIFT) features and the Euclidean distance is computed between the feature vectors of the sketch and each face in the video. The face from the video with the lowest Euclidean distance to the query sketch is considered the suspect’s face. The proposed method’s performance is analyzed on the Chokepoint dataset, and the system achieves 89.02% precision, 91.25% recall and 90.13% F-measure.
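The final matching step and the F-measure arithmetic can be sketched as below. The feature vectors are made-up toy values, and the function names are hypothetical; only the nearest-neighbour rule under Euclidean distance and the reported precision and recall figures come from the abstract.

```python
import math


def nearest_face(sketch, faces):
    """Return (index, distance) of the face vector closest to the sketch vector."""
    dists = [math.dist(sketch, f) for f in faces]
    best = min(range(len(faces)), key=dists.__getitem__)
    return best, dists[best]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy 3-dimensional feature vectors (illustrative only).
sketch = [0.2, 0.7, 0.1]
faces = [[0.9, 0.1, 0.4], [0.25, 0.65, 0.1], [0.0, 0.0, 1.0]]

best_index, best_distance = nearest_face(sketch, faces)
f = f_measure(0.8902, 0.9125)
print(best_index, round(f, 4))
```

With the reported precision and recall, the harmonic mean comes out at about 0.9012, close to the reported F-measure of 90.13%.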

    Large-scale environment mapping and immersive human-robot interaction for agricultural mobile robot teleoperation

    Remote operation is a crucial solution to problems encountered in agricultural machinery operations. However, traditional video streaming control methods fall short in overcoming the challenges of single-perspective views and the inability to obtain 3D information. In light of these issues, our research proposes a large-scale digital map reconstruction and immersive human-machine remote control framework for agricultural scenarios. In our methodology, a DJI unmanned aerial vehicle (UAV) was utilized for data collection, and a novel video segmentation approach based on feature points was introduced. To tackle texture richness variability, an enhanced Structure from Motion (SfM) using superpixel segmentation was implemented. This method integrates the open Multiple View Geometry (openMVG) framework along with Local Features from Transformers (LoFTR). The enhanced SfM results in a point cloud map, which is further processed through Multi-View Stereo (MVS) to generate a complete map model. For control, a closed-loop system utilizing TCP for VR control and positioning of agricultural machinery was introduced. Our system offers a fully visual-based immersive control method, where, upon connection to the local area network, operators can utilize VR for immersive remote control. The proposed method enhances both the robustness and convenience of the reconstruction process, thereby helping operators acquire more comprehensive on-site information and engage in immersive remote control operations. The code is available at: https://github.com/LiuTao1126/Enhance-SF

    Research on a modified RANSAC and its applications to ellipse detection from a static image and motion detection from active stereo video sequences

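    System: new; Report number: Kō No. 3091; Degree type: Doctorate (Global Information and Telecommunication Studies); Date conferred: 2010/2/24; Waseda University degree certificate number: Shin 535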