
    A Performance Evaluation of Exact and Approximate Match Kernels for Object Recognition

    Local features have repeatedly shown their effectiveness for object recognition in recent years, and they have consequently become the preferred descriptor for this type of problem. The correspondence problem is traditionally approached with exact or approximate techniques. In this paper we are interested in methods that solve the correspondence problem via the definition of a kernel function that makes it possible to use local features as input to a support vector machine. We single out the match kernel, an exact approach, and the pyramid match kernel, which instead uses an approximate strategy. We present a thorough experimental evaluation of the two methods on three different databases. Results show that the exact method performs consistently better than the approximate one, especially for the object identification task, when training on a decreasing number of images. Based on these findings and on the computational cost of each approach, we suggest some criteria for choosing between the two kernels given the application at hand.
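
    As a rough illustration of the exact approach, the sketch below implements a symmetrized best-match kernel between two sets of local descriptors; this is one common formulation, and the local RBF similarity and its parameter are assumptions rather than the paper's exact definition.

        import numpy as np

        def match_kernel(X, Y, gamma=0.1):
            # X: (n, d) local descriptors of image 1, Y: (m, d) of image 2.
            # RBF similarity between every descriptor pair.
            sims = np.exp(-gamma * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
            # Match each descriptor to its best counterpart in the other set and
            # average both directions so the kernel stays symmetric.
            return 0.5 * (sims.max(axis=1).mean() + sims.max(axis=0).mean())

        # A Gram matrix over a list of descriptor sets can then be passed to
        # sklearn.svm.SVC(kernel="precomputed"):
        # K = np.array([[match_kernel(A, B) for B in images] for A in images])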

    Progressive Domain Adaptation with Contrastive Learning for Object Detection in the Satellite Imagery

    State-of-the-art object detection methods applied to satellite and drone imagery largely fail to identify small and dense objects. One reason is the high variability of content in overhead imagery, caused by the terrestrial region captured and the variability of acquisition conditions. Another reason is that the number and size of objects in aerial imagery differ substantially from those in consumer data. In this work, we propose a small object detection pipeline that improves feature extraction through spatial pyramid pooling, cross-stage partial networks and a heatmap-based region proposal network, and improves object localization and identification through a novel image difficulty score that adapts the overall focal loss based on the image difficulty. Next, we propose a novel contrastive learning scheme with progressive domain adaptation to produce domain-invariant features across aerial datasets using local and global components. We show that we can alleviate the degradation of object identification in previously unseen datasets. We create a first-ever domain adaptation benchmark using contrastive learning for the object detection task in highly imbalanced satellite datasets with significant domain gaps and dominant small objects. The proposed method yields a 7.4% increase in mAP over the best state-of-the-art method.
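
    The image-difficulty adaptation of the focal loss can be pictured with a minimal sketch like the one below, which simply scales the focusing exponent by a per-image difficulty score in [0, 1]; the exact adaptation rule, the definition of the score, and the hyper-parameters used in the paper are not given here and are assumptions.

        import torch
        import torch.nn.functional as F

        def difficulty_focal_loss(logits, targets, difficulty, gamma=2.0, alpha=0.25):
            # logits, targets: (N,) binary detection outputs and float {0, 1} labels
            # difficulty: scalar tensor in [0, 1] for the whole image (assumed form)
            ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
            p = torch.sigmoid(logits)
            p_t = p * targets + (1 - p) * (1 - targets)            # prob. of the true class
            alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
            focusing = (1 - p_t) ** (gamma * (1.0 + difficulty))   # harder image -> stronger focusing
            return (alpha_t * focusing * ce).mean()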

    Learning and Visualizing Localized Geometric Features Using 3D-CNN: An Application to Manufacturability Analysis of Drilled Holes

    3D Convolutional Neural Networks (3D-CNNs) have been used for object recognition based on the voxelized shape of an object. However, interpreting the decision-making process of these 3D-CNNs is still an infeasible task. In this paper, we present a unique 3D-CNN-based Gradient-weighted Class Activation Mapping method (3D-GradCAM) for visual explanations of the distinct local geometric features of interest within an object. To enable efficient learning of 3D geometries, we augment the voxel data with surface normals of the object boundary. We then train a 3D-CNN with this augmented data and identify the local features critical for decision-making using 3D-GradCAM. An application of this feature identification framework is recognizing difficult-to-manufacture drilled hole features in a complex CAD geometry. The framework can be extended to identify difficult-to-manufacture features at multiple spatial scales, leading to a real-time design for manufacturability decision support system. This is a preprint of Ghadai, Sambit, Aditya Balu, Adarsh Krishnamurthy, and Soumik Sarkar, "Learning and visualizing localized geometric features using 3d-cnn: An application to manufacturability analysis of drilled holes," arXiv preprint arXiv:1711.04851 (2017), doi: https://doi.org/10.48550/arXiv.1711.04851. Copyright 2017 The Authors.
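
    A minimal PyTorch-style sketch of the 3D Grad-CAM step is shown below; the model, the chosen convolutional layer, and the input layout (occupancy voxels augmented with surface-normal channels) are placeholders, not the authors' implementation.

        import torch

        def grad_cam_3d(model, conv_layer, voxels, class_idx):
            # voxels: (1, C, D, H, W), e.g. occupancy plus surface-normal channels
            feats, grads = [], []
            h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.append(o))
            h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
            score = model(voxels)[0, class_idx]     # class score to be explained
            model.zero_grad()
            score.backward()
            h1.remove(); h2.remove()
            fmap, grad = feats[0], grads[0]                    # (1, K, d, h, w) feature maps
            weights = grad.mean(dim=(2, 3, 4), keepdim=True)   # average gradient per channel
            cam = torch.relu((weights * fmap).sum(dim=1))      # weighted sum, ReLU as in Grad-CAM
            return cam / (cam.max() + 1e-8)                    # normalized 3D saliency volume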

    Multi-Object tracking using Multi-Channel Part Appearance Representation

    Appearance-based multi-object tracking (MOT) is a challenging task, especially in complex scenes where objects have similar appearance or are occluded by the background or other objects. Such factors motivate researchers to propose effective trackers that satisfy real-time processing and object trajectory recovery criteria. To handle both requirements, we propose a robust online multi-object tracking method that extends the features and methods proposed for re-identification to MOT. The proposed tracker combines a local and a global tracker in a comprehensive two-step framework. In the local tracking step, we use frame-to-frame association to generate online object trajectories. Each object trajectory, called a tracklet, is represented by a set of multi-modal feature distributions modeled by GMMs. In the global tracking step, occlusions and mis-detections are recovered by a tracklet bipartite association method based on learning a Mahalanobis metric between GMM components using the KISSME metric learning algorithm. Experiments on two public datasets show that our tracker performs well when compared to state-of-the-art tracking algorithms.
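
    The KISSME step used for tracklet association can be sketched as below: the Mahalanobis matrix is the difference of the inverse covariances of similar-pair and dissimilar-pair feature differences. The sketch operates on generic feature vectors; the GMM-component pairing used in the paper is omitted.

        import numpy as np

        def kissme(diff_pos, diff_neg):
            # diff_pos / diff_neg: (n, d) difference vectors of similar / dissimilar pairs
            cov_pos = diff_pos.T @ diff_pos / len(diff_pos)
            cov_neg = diff_neg.T @ diff_neg / len(diff_neg)
            M = np.linalg.inv(cov_pos) - np.linalg.inv(cov_neg)
            # clip negative eigenvalues so M defines a valid (pseudo-)metric
            w, V = np.linalg.eigh(M)
            return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

        def mahalanobis(x, y, M):
            d = x - y
            return float(d @ M @ d)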

    Effective identification of terrain positions from gridded DEM data using multimodal classification integration

    Terrain positions are widely used to describe the Earth’s topographic features and play an important role in studies of landform evolution, soil erosion and hydrological modeling. This work develops a new multimodal classification system with enhanced classification performance by integrating different approaches for terrain position identification. The adopted classification approaches include local terrain attribute (LA)-based and regional terrain attribute (RA)-based, rule-based and supervised, and pixel-based and object-oriented methods. Firstly, a double-level definition scheme is presented for terrain positions. Then, utilizing a hierarchical framework, a multimodal approach is developed by integrating the different classification techniques. Finally, an assessment method is established to evaluate the new classification system from different aspects. The experimental results, obtained for a Loess Plateau region in northern China on a 5 m digital elevation model (DEM), show reasonable positional relationships, larger inter-class variances and smaller intra-class variances. This indicates that the identified terrain positions are consistent with the actual topography from both overall and local perspectives, and have relatively good integrity and rationality. This study demonstrates that the multimodal classification system, developed by taking advantage of various classification methods, can reflect the geographic meanings and topographic features of terrain positions at different levels.
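
    To make the local versus regional attribute distinction concrete, the sketch below computes one of each from a gridded DEM: slope as a local attribute and a topographic position index (TPI) as a regional one. The attribute choices, window size and thresholds are illustrative assumptions, not the paper's exact configuration.

        import numpy as np
        from scipy import ndimage

        def slope_deg(dem, cell=5.0):
            # local attribute: slope in degrees from central differences on a 5 m grid
            dzdy, dzdx = np.gradient(dem, cell)
            return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

        def tpi(dem, radius=10):
            # regional attribute: cell elevation minus mean elevation in a circular window
            y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
            footprint = (x * x + y * y) <= radius * radius
            return dem - ndimage.generic_filter(dem, np.mean, footprint=footprint)

        # A simple rule-based labelling could then mark ridges where TPI is strongly
        # positive, valleys where it is strongly negative, and slopes/flats otherwise.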

    Method for identification of geopulses to include into the Geophysical Signal Catalogue

    A new system approach to the identification and systematization of geophysical pulses is described. It includes stages of detection, analysis, object and structural description, and pulse classification. A method based on adaptive threshold calculation is proposed for pulse detection in geoacoustic emission signals. The authors propose to analyze detected pulses using sparse approximation methods, in particular the adaptive matching pursuit method. This method allows one to decompose a pulse into basis functions of a combined Gauss-Berlage dictionary with minimum spatial and temporal costs and with the required accuracy of the constructed approximations. The obtained sparse representations are described under the object approach as a combination of informative features identified during the analysis, for example the number of functions in the decomposition, function parameters, etc. The object description of geoacoustic emission pulses is supplemented by an original structural description based on the relations of pulse local extrema, which makes it possible to significantly reduce the variety of pulse shapes for their further identification. The results obtained from applying the described system approach to the analysis of geoacoustic signals are summarized in a Geophysical Signal Catalogue. The research was supported by the Russian Science Foundation, project No. 18-11-00087.
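
    The matching pursuit decomposition at the core of the analysis stage can be sketched as follows; the dictionary here is an arbitrary matrix of unit-norm atoms (for example, sampled Gauss and Berlage functions), and the paper's adaptive stopping rule is replaced by a simple residual tolerance.

        import numpy as np

        def matching_pursuit(signal, dictionary, max_atoms=20, tol=1e-2):
            # dictionary: (len(signal), n_atoms) matrix of unit-norm atoms
            residual = signal.astype(float).copy()
            coeffs = np.zeros(dictionary.shape[1])
            for _ in range(max_atoms):
                corr = dictionary.T @ residual          # correlation with every atom
                k = int(np.argmax(np.abs(corr)))
                coeffs[k] += corr[k]                    # greedy update of the best atom
                residual -= corr[k] * dictionary[:, k]
                if np.linalg.norm(residual) < tol * np.linalg.norm(signal):
                    break
            return coeffs, residual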

    Curvature scale space corner detector with adaptive threshold and dynamic region of support

    Corners play an important role in object identification methods used in machine vision and image processing systems. Single-scale feature detection struggles to detect both fine and coarse features at the same time. Multi-scale feature detection, on the other hand, is inherently able to solve this problem. This paper proposes an improved multi-scale corner detector with a dynamic region of support, based on the Curvature Scale Space (CSS) technique. The proposed detector first uses an adaptive local curvature threshold instead of the single global threshold used in the original and enhanced CSS methods. Second, the angles of corner candidates are checked in a dynamic region of support to eliminate falsely detected corners. The proposed method has been evaluated over a number of images and compared with some popular corner detectors. The results show that the proposed method offers a robust and effective solution for images containing features of widely different sizes.
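
    The quantity the detector thresholds is the curvature of an edge contour smoothed at a given scale; a minimal sketch is shown below, with an illustrative adaptive threshold rather than the paper's exact rule.

        import numpy as np
        from scipy.ndimage import gaussian_filter1d

        def contour_curvature(x, y, sigma=3.0):
            # x, y: coordinates of a closed contour; sigma: CSS smoothing scale
            xs = gaussian_filter1d(x.astype(float), sigma, mode="wrap")
            ys = gaussian_filter1d(y.astype(float), sigma, mode="wrap")
            dx, dy = np.gradient(xs), np.gradient(ys)
            ddx, ddy = np.gradient(dx), np.gradient(dy)
            return (dx * ddy - dy * ddx) / ((dx ** 2 + dy ** 2) ** 1.5 + 1e-12)

        # Adaptive thresholding (illustrative): keep point i as a corner candidate if
        # |kappa[i]| exceeds a multiple of the mean |kappa| over its neighbourhood,
        # rather than comparing against a single global threshold.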

    Visual Concept Detection in Images and Videos

    The rapidly increasing proliferation of digital images and videos leads to a situation where content-based search in multimedia databases becomes more and more important. A prerequisite for effective image and video search is to analyze and index media content automatically. Current approaches in the field of image and video retrieval focus on semantic concepts serving as an intermediate description to bridge the “semantic gap” between the data representation and the human interpretation. Due to the large complexity and variability in the appearance of visual concepts, the detection of arbitrary concepts represents a very challenging task. In this thesis, the following aspects of visual concept detection systems are addressed.

    First, enhanced local descriptors for mid-level feature coding are presented. Based on the observation that scale-invariant feature transform (SIFT) descriptors with different spatial extents yield large performance differences, a novel concept detection system is proposed that combines feature representations for different spatial extents using multiple kernel learning (MKL). A multi-modal video concept detection system is presented that relies on bag-of-words representations for visual and, in particular, audio features. Furthermore, a method for the SIFT-based integration of color information, called color moment SIFT, is introduced. Comparative experimental results demonstrate the superior performance of the proposed systems on the Mediamill and VOC Challenges.

    Second, an approach is presented that systematically utilizes the results of object detectors. Novel object-based features are generated from object detection results using different pooling strategies. For videos, detection results are assembled into object sequences, and a shot-based confidence score as well as further features, such as position, frame coverage or movement, are computed for each object class. These features are used as additional input for the support vector machine (SVM)-based concept classifiers, so that other related concepts can also profit from object-based features. Extensive experiments on the Mediamill, VOC and TRECVid Challenges show significant improvements in retrieval performance not only for the object classes, but in particular also for a large number of indirectly related concepts. Moreover, it is demonstrated that a few object-based features are beneficial for a large number of concept classes. On the VOC Challenge, the additional use of object-based features led to a superior performance of 63.8% mean average precision (AP) for the image classification task. Furthermore, the generalization capabilities of concept models are investigated. It is shown that different source and target domains lead to a severe loss in concept detection performance. In these cross-domain settings, object-based features achieve a significant performance improvement. Since it is inefficient to run a large number of single-class object detectors, it is additionally demonstrated how a concurrent multi-class object detection system can be constructed to speed up the detection of many object classes in images.

    Third, a novel, purely web-supervised learning approach for modeling heterogeneous concept classes in images is proposed. Tags and annotations of multimedia data in the WWW are rich sources of information that can be employed for learning visual concepts. The presented approach is aimed at continuous long-term learning of appearance models and at improving these models periodically. For this purpose, several components have been developed: a crawling component, a multi-modal clustering component for spam detection and subclass identification, a novel learning component called “random savanna”, a validation component, an updating component, and a scalability manager. Only a single word describing the visual concept is required to initiate the learning process. Experimental results demonstrate the capabilities of the individual components.

    Finally, a generic concept detection system is applied to support interdisciplinary research efforts in the fields of psychology and media science. The psychological research question addressed in the field of behavioral sciences is whether and how playing violent content in computer games may induce aggression. Therefore, novel semantic concepts, most notably “violence”, are detected in computer game videos to gain insights into the interrelationship of violent game events and the brain activity of the player. Experimental results demonstrate the excellent performance of the proposed automatic concept detection approach for such interdisciplinary research.
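
    The combination of feature representations for different SIFT spatial extents can be pictured as a weighted sum of precomputed kernels; the sketch below uses uniform weights as a simple baseline, whereas MKL would learn the weights jointly with the SVM. The kernel names are placeholders, not the thesis's actual setup.

        import numpy as np
        from sklearn.svm import SVC

        def combine_kernels(grams, weights=None):
            # grams: list of precomputed Gram matrices, e.g. bag-of-words kernels
            # built from SIFT descriptors with different spatial extents
            if weights is None:
                weights = np.full(len(grams), 1.0 / len(grams))  # uniform baseline
            return sum(w * K for w, K in zip(weights, grams))

        # K_train = combine_kernels([K_small, K_medium, K_large])
        # clf = SVC(kernel="precomputed").fit(K_train, y_train)
        # At test time, the combined kernel is computed between test and training images.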

    Deep Feature Learning and Adaptation for Computer Vision

    We are living in times when a revolution in deep learning is taking place. In general, deep learning models have a backbone that extracts features from the input data, followed by task-specific layers, e.g. for classification. This dissertation proposes various deep feature extraction and adaptation methods to improve task-specific learning, such as visual re-identification, tracking, and domain adaptation. The vehicle re-identification (VRID) task requires identifying a given vehicle among a set of vehicles under variations in viewpoint, illumination, partial occlusion, and background clutter. We propose a novel local graph aggregation module for feature extraction to improve VRID performance. We also utilize a class-balanced loss to compensate for the unbalanced class distribution in the training dataset. Overall, our framework achieves state-of-the-art (SOTA) performance on multiple VRID benchmarks. We further extend our VRID method to visual object tracking under occlusion conditions. We motivate visual object tracking from aerial platforms by benchmarking tracking methods on aerial datasets. Our study reveals that current techniques have limited capabilities to re-identify objects when they are fully occluded or out of view. Siamese network-based trackers perform well compared to others in overall tracking performance. We utilize our VRID work in visual object tracking and propose Siam-ReID, a novel tracking method using a Siamese network and the VRID technique. In another approach, we propose SiamGauss, a novel Siamese network with a Gaussian head for improved confuser suppression and real-time performance. Our approach achieves SOTA performance on aerial visual object tracking datasets. A related area of research is the development of deep learning based domain adaptation techniques. We propose continual unsupervised domain adaptation, a novel paradigm for domain adaptation in data-constrained environments. We show that existing works fail to generalize when the target domain data are acquired in small batches. We propose to use a buffer to store samples previously seen by the network and a novel loss function to improve the performance of continual domain adaptation. We further extend our continual unsupervised domain adaptation research to gradually varying domains. Our method outperforms several SOTA methods even though they have the entire domain data available during adaptation.
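
    The buffer of previously seen target-domain samples can be sketched as a simple reservoir, as below; this is a minimal sketch of the general idea, and the dissertation's actual sampling strategy and loss function are not reproduced here.

        import random

        class ReplayBuffer:
            # keeps an approximately uniform subset of all target-domain samples seen so far
            def __init__(self, capacity=512):
                self.capacity = capacity
                self.data = []
                self.seen = 0

            def add(self, sample):
                self.seen += 1
                if len(self.data) < self.capacity:
                    self.data.append(sample)
                else:
                    j = random.randrange(self.seen)     # reservoir sampling
                    if j < self.capacity:
                        self.data[j] = sample

            def sample(self, k):
                # mix these stored samples into the current adaptation batch
                return random.sample(self.data, min(k, len(self.data)))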