42 research outputs found

    A Siamese transformer network for zero-shot ancient coin classification

    Ancient numismatics, the study of ancient coins, has in recent years become an attractive domain for the application of computer vision and machine learning. Though rich in research problems, the predominant focus in this area to date has been on the task of attributing a coin from an image, that is, of identifying its issue. This may be considered the cardinal problem in the field and it continues to challenge automatic methods. In the present paper, we address a number of limitations of previous work. Firstly, the existing methods approach the problem as a classification task. As such, they are unable to deal with classes with no or few exemplars (which would be most, given over 50,000 issues of Roman Imperial coins alone), and require retraining when exemplars of a new class become available. Hence, rather than seeking to learn a representation that distinguishes a particular class from all the others, herein we seek a representation that is overall best at distinguishing classes from one another, thus relinquishing the demand for exemplars of any specific class. This leads to our adoption of the paradigm of pairwise coin matching by issue, rather than the usual classification paradigm, and the specific solution we propose in the form of a Siamese neural network. Furthermore, while adopting deep learning, motivated by its successes in the field and its unchallenged superiority over classical computer vision approaches, we also seek to leverage the advantages that transformers have over the previously employed convolutional neural networks, and in particular their non-local attention mechanisms, which ought to be particularly useful in ancient coin analysis by associating semantically but not visually related distal elements of a coin’s design.
Evaluated on a large data corpus of 14,820 images and 7,605 issues, using transfer learning and only a small training set of 542 images of 24 issues, our Double Siamese ViT model is shown to surpass the state of the art by a large margin, achieving an overall accuracy of 81%. Moreover, our further investigation of the results shows that the majority of the method’s errors are unrelated to the intrinsic aspects of the algorithm itself, but are rather a consequence of unclean data, which is a problem that can be easily addressed in practice by simple pre-processing and quality checking. Publisher PDF. Peer reviewed.
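
The pairwise matching paradigm described in this abstract can be illustrated with a minimal sketch. It assumes each coin image has already been mapped to an embedding vector by some shared encoder (not shown). The function names, the cosine similarity measure, and the 0.8 threshold are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_issue(emb_a, emb_b, threshold=0.8):
    """Pairwise decision: do two coins belong to the same issue?

    The threshold is a hypothetical value; in practice it would be
    tuned on held-out pairs.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


def attribute(query_emb, gallery):
    """Return the label of the most similar reference embedding.

    `gallery` maps an issue label to the embedding of one reference
    exemplar. Adding a new issue only means adding a gallery entry;
    no retraining of the encoder is involved.
    """
    return max(gallery, key=lambda label: cosine_similarity(query_emb, gallery[label]))


if __name__ == "__main__":
    # Toy two-dimensional embeddings, purely for illustration.
    gallery = {"issue_a": [1.0, 0.0], "issue_b": [0.0, 1.0]}
    print(attribute([0.9, 0.1], gallery))
```

Because the decision depends only on similarity between pairs, an issue with a single exemplar can still be matched, which is the property the abstract contrasts with fixed-class classification.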

    CherryPicker: Semantic Skeletonization and Topological Reconstruction of Cherry Trees

    In plant phenotyping, accurate trait extraction from 3D point clouds of trees is still an open problem. For automatic modeling and trait extraction of tree organs such as blossoms and fruits, the semantically segmented point cloud of a tree and the tree skeleton are necessary. Therefore, we present CherryPicker, an automatic pipeline that reconstructs photometric point clouds of trees, performs semantic segmentation, and extracts their topological structure in the form of a skeleton. Our system combines several state-of-the-art algorithms to enable automatic processing for further usage in 3D plant phenotyping applications. Within this pipeline, we present a method to automatically estimate the scale factor of a monocular reconstruction to overcome scale ambiguity and obtain metrically correct point clouds. Furthermore, we propose a semantic skeletonization algorithm built upon Laplacian-based contraction. We also show that, by weighting different tree organs semantically, our approach can effectively remove artifacts induced by occlusion and structural size variations. CherryPicker obtains high-quality topology reconstructions of cherry trees with precise details. Comment: Accepted by CVPR 2023 Vision for Agriculture Workshop.

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
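
The evaluation protocol mentioned in this abstract, perturbing test images with Gaussian and uniform noise, can be sketched as below. This is only an illustration of the perturbation step: it assumes grayscale images stored as nested lists of floats in [0, 1], and the function names and noise parameters are hypothetical. It does not implement the CORF operator itself.

```python
import random


def _clip(value):
    """Keep a pixel intensity inside the valid [0, 1] range."""
    return min(1.0, max(0.0, value))


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return [[_clip(p + rng.gauss(0.0, sigma)) for p in row] for row in image]


def add_uniform_noise(image, amplitude, rng):
    """Add noise drawn uniformly from [-amplitude, amplitude]."""
    return [[_clip(p + rng.uniform(-amplitude, amplitude)) for p in row] for row in image]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbation is repeatable
    image = [[0.0, 0.5], [1.0, 0.25]]
    # Increasing noise levels, as one might use to chart accuracy against noise.
    for sigma in (0.05, 0.1, 0.2):
        noisy = add_gaussian_noise(image, sigma, rng)
        print(sigma, noisy)
```

A robustness study of this kind would classify each noisy copy with and without the pre-processing step and compare accuracy at every noise level.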

    Radar and RGB-depth sensors for fall detection: a review

    This paper reviews recent works in the literature on the use of systems based on radar and RGB-Depth (RGB-D) sensors for fall detection, and discusses outstanding research challenges and trends related to this research field. Systems to reliably detect fall events and promptly alert carers and first responders have gained significant interest in the past few years in order to address the societal issue of an increasing number of elderly people living alone, with the associated risk of them falling and the consequences in terms of health treatments, reduced well-being, and costs. The interest in radar and RGB-D sensors is related to their capability to enable contactless and non-intrusive monitoring, which is an advantage for practical deployment and users’ acceptance and compliance, compared with other sensor technologies, such as video-cameras, or wearables. Furthermore, the possibility of combining and fusing information from heterogeneous types of sensors is expected to improve the overall performance of practical fall detection systems. Researchers from different fields can benefit from multidisciplinary knowledge and awareness of the latest developments in radar and RGB-D sensors that this paper discusses.

    Revealing More Details: Image Super-Resolution for Real-World Applications

    Motorcycles that see: Multifocal stereo vision sensor for advanced safety systems in tilting vehicles

    Advanced driver assistance systems (ADAS) have shown the possibility to anticipate crash accidents and effectively assist road users in critical traffic situations. This is not the case for motorcyclists; in fact, ADAS for motorcycles are still barely developed. Our aim was to study a camera-based sensor for the application of preventive safety in tilting vehicles. We identified two road conflict situations for which automotive remote sensors installed in a tilting vehicle are likely to fail in the identification of critical obstacles. Accordingly, we set up two experiments, conducted in real traffic conditions, to test our stereo vision sensor. Our promising results support the application of this type of sensor for advanced motorcycle safety applications.

    FULL 3D RECONSTRUCTION OF DYNAMIC NON-RIGID SCENES: ACQUISITION AND ENHANCEMENT

    Recent advances in commodity depth or 3D sensing technologies have enabled us to move closer to the goal of accurately sensing and modeling the 3D representations of complex dynamic scenes. Indeed, in domains such as virtual reality, security, surveillance and e-health, there is now a greater demand for affordable and flexible vision systems which are capable of acquiring high quality 3D reconstructions. Available commodity RGB-D cameras, though easily accessible, have limited field-of-view, and acquire noisy and low-resolution measurements which restrict their direct usage in building such vision systems. This thesis targets these limitations and builds approaches around commodity 3D sensing technologies to acquire noise-free and feature preserving full 3D reconstructions of dynamic scenes containing static or moving, rigid or non-rigid objects. A mono-view system based on a single RGB-D camera is incapable of acquiring full 360 degrees 3D reconstruction of a dynamic scene instantaneously. For this purpose, a multi-view system composed of several RGB-D cameras covering the whole scene is used. In the first part of this thesis, the domain of correctly aligning the information acquired from RGB-D cameras in a multi-view system to provide full and textured 3D reconstructions of dynamic scenes, instantaneously, is explored. This is achieved by solving the extrinsic calibration problem. This thesis proposes an extrinsic calibration framework which uses the 2D photometric and 3D geometric information, acquired with RGB-D cameras, according to their relative (in)accuracies, affected by the presence of noise, in a single weighted bi-objective optimization. An iterative scheme is also proposed, which estimates the parameters of the noise model affecting both 2D and 3D measurements, and solves the extrinsic calibration problem simultaneously. Results show improvement in calibration accuracy as compared to state-of-art methods.
In the second part of this thesis, the domain of enhancement of noisy and low-resolution 3D data acquired with commodity RGB-D cameras in both mono-view and multi-view systems is explored. This thesis extends the state-of-art in mono-view template-free recursive 3D data enhancement which targets dynamic scenes containing rigid objects, and thus requires tracking only the global motions of those objects for view-dependent surface representation and filtering. This thesis proposes to target dynamic scenes containing non-rigid objects, which introduces the complex requirements of tracking relatively large local motions and maintaining data organization for view-dependent surface representation. The proposed method is shown to be effective in handling non-rigid objects of changing topologies. Building upon the previous work, this thesis overcomes the requirement of data organization by proposing an approach based on view-independent surface representation. View-independence decreases the complexity of the proposed algorithm and allows it the flexibility to process and enhance noisy data, acquired with multiple cameras in a multi-view system, simultaneously. Moreover, qualitative and quantitative experimental analysis shows this method to be more accurate in removing noise to produce enhanced 3D reconstructions of non-rigid objects. Although extending this method to a multi-view system would allow for obtaining instantaneous enhanced full 360 degrees 3D reconstructions of non-rigid objects, it still lacks the ability to explicitly handle low-resolution data. Therefore, this thesis proposes a novel recursive dynamic multi-frame 3D super-resolution algorithm together with a novel 3D bilateral total variation regularization to filter out the noise, recover details and enhance the resolution of data acquired from commodity cameras in a multi-view system.
Results show that this method is able to build accurate, smooth and feature preserving full 360 degrees 3D reconstructions of the dynamic scenes containing non-rigid objects.

    Three-dimensional reconstruction optimization of tunnel face and intelligent extraction of discontinuity orientation based on binocular stereo vision

    In the process of grading and dynamically optimizing the design and construction parameters of the surrounding rock mass of a rock tunnel face, efficiently and accurately acquiring the geometrical parameters of the rock discontinuities is an important basic task. To address the long acquisition times, low accuracy, and high danger associated with traditional methods of obtaining the structural information of rock mass, this paper proposes a method for three-dimensional reconstruction and intelligent information extraction of tunnel face based on binocular stereo vision (BSV). First, the parallel binocular device with a single camera was improved and calibrated using the checkerboard calibration method. By integrating with the semi-global matching algorithm, the BSV-based method for the three-dimensional reconstruction of the rock mass of the tunnel face was optimized. Furthermore, based on the results from on-site engineering applications, this study leveraged two parameters, point cloud density and algorithm runtime, to determine the optimal values for the disparity range and window size parameters within the semi-global stereo matching algorithm. This enhancement improved the performance of the 3D reconstruction method based on binocular stereo vision. Finally, efficient and refined intelligent methods for extracting structural parameters of the rock mass were proposed based on k-nearest neighbor search and kernel density estimation. The research results can provide reliable technical support for the intelligent and efficient acquisition of rock mass structural information in rock tunnel engineering faces.
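
For a parallel (rectified) binocular setup such as the one this abstract describes, depth is commonly recovered from disparity with the relation Z = f * B / d, where f is the focal length in pixels, B the baseline, and d the disparity in pixels. The sketch below illustrates that standard relation only. The function names and the numeric values are assumptions for illustration and are not taken from the paper.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres for one pixel of a rectified stereo pair.

    Returns None for non-positive disparities, which correspond to
    unmatched pixels or points at infinity.
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


def depth_map(disparity_map, focal_length_px, baseline_m):
    """Apply the depth relation to every pixel of a disparity map."""
    return [
        [depth_from_disparity(d, focal_length_px, baseline_m) for d in row]
        for row in disparity_map
    ]


if __name__ == "__main__":
    # Hypothetical camera: 1000 px focal length, 0.1 m baseline.
    disparities = [[50.0, 25.0], [0.0, 100.0]]
    print(depth_map(disparities, 1000.0, 0.1))
```

Since depth is inversely proportional to disparity, the disparity range chosen for the matcher bounds the nearest and farthest depths that can be reconstructed, which is one reason that parameter affects point cloud density.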

    Eye detection using discriminatory features and an efficient support vector machine

    Accurate and efficient eye detection has broad applications in computer vision, machine learning, and pattern recognition. This dissertation presents a number of accurate and efficient eye detection methods using various discriminatory features and a new efficient Support Vector Machine (eSVM). This dissertation first introduces five popular image representation methods - the gray-scale image representation, the color image representation, the 2D Haar wavelet image representation, the Histograms of Oriented Gradients (HOG) image representation, and the Local Binary Patterns (LBP) image representation - and then applies these methods to derive five types of discriminatory features. Comparative assessments are then presented to evaluate the performance of these discriminatory features on the problem of eye detection. This dissertation further proposes two discriminatory feature extraction (DFE) methods for eye detection. The first DFE method, discriminant component analysis (DCA), improves upon the popular principal component analysis (PCA) method. The PCA method can derive the optimal features for data representation but not for classification. In contrast, the DCA method, which applies a new criterion vector that is defined on two novel measure vectors, derives the optimal discriminatory features in the whitened PCA space for two-class classification problems. The second DFE method, clustering-based discriminant analysis (CDA), improves upon the popular Fisher linear discriminant (FLD) method. A major disadvantage of the FLD is that it may not be able to extract adequate features in order to achieve satisfactory performance, especially for two-class problems. To address this problem, three CDA models (CDA-1, -2, and -3) are proposed by taking advantage of the clustering technique. For every CDA model a new between-cluster scatter matrix is defined. The CDA method thus can derive adequate features to achieve satisfactory performance for eye detection.
Furthermore, the clustering nature of the three CDA models and the nonparametric nature of the CDA-2 and -3 models can further improve the detection performance upon the conventional FLD method. This dissertation finally presents a new efficient Support Vector Machine (eSVM) for eye detection that improves the computational efficiency of the conventional Support Vector Machine (SVM). The eSVM first defines a Θ set that consists of the training samples on the wrong side of their margin derived from the conventional soft-margin SVM. The Θ set plays an important role in controlling the generalization performance of the eSVM. The eSVM then introduces only a single slack variable for all the training samples in the Θ set, and as a result, only a very small number of those samples in the Θ set become support vectors. The eSVM hence significantly reduces the number of support vectors and improves the computational efficiency without sacrificing the generalization performance. A modified Sequential Minimal Optimization (SMO) algorithm is then presented to solve the large Quadratic Programming (QP) problem defined in the optimization of the eSVM. Three large-scale face databases, the Face Recognition Grand Challenge (FRGC) version 2 database, the BioID database, and the FERET database, are applied to evaluate the proposed eye detection methods. Experimental results show the effectiveness of the proposed methods that improve upon some state-of-the-art eye detection methods.
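
Of the five image representations listed in this abstract, Local Binary Patterns is the simplest to show in a few lines. The sketch below computes the basic 8-neighbour LBP code for the centre of a 3x3 patch. The neighbour ordering (clockwise from the top-left) and the use of `>=` for the comparison are conventions assumed here; the dissertation may use a different variant.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for the centre pixel of a 3x3 patch.

    Each neighbour contributes one bit: 1 if it is at least as bright
    as the centre, 0 otherwise. Bits are assigned clockwise starting
    at the top-left neighbour (an assumed convention).
    """
    center = patch[1][1]
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2],
        patch[2][2], patch[2][1], patch[2][0],
        patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code


def lbp_histogram(codes):
    """256-bin histogram of LBP codes, usable as a feature vector."""
    histogram = [0] * 256
    for code in codes:
        histogram[code] += 1
    return histogram


if __name__ == "__main__":
    flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
    peak = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
    print(lbp_code(flat), lbp_code(peak))
```

Because each code depends only on whether neighbours are brighter or darker than the centre, the descriptor is unaffected by monotonic changes in illumination, which is one reason it is popular for face and eye analysis.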

    Image-Based Approaches to Hair Modeling

    Hair is a relevant characteristic of virtual characters; therefore, the modeling of plausible facial hair and hairstyles is an essential step in the generation of computer-generated (CG) avatars. However, the inherent geometric complexity of hair, together with the huge number of filaments on an average human head, makes the task of modeling hairstyles a very challenging one. To date this is commonly a manual process which requires artist skills or very specialized and costly acquisition software. In this work we present an image-based approach to model facial hair (beard and eyebrows) and (head) hairstyles. Since facial hair is usually much shorter than the average head hair, two different methods are presented, adapted to the characteristics of the hair to be modeled. Facial hair is modeled using data extracted from facial texture images, and missing information is inferred by means of a database-driven prior model. Our hairstyle reconstruction technique employs images of the hair to be modeled taken with a thermal camera. The major advantage of our thermal image-based method over conventional image-based techniques lies in the fact that during data capture the hairstyle is "lit from the inside": the thermal camera captures heat irradiated by the head and actively re-emitted by the hair filaments almost isotropically. Following this approach we can avoid several issues of conventional image-based techniques, like shadowing or anisotropy in reflectance. The presented technique requires minimal user interaction and a simple acquisition setup. Several challenging examples demonstrate the potential of the proposed approach.