
    Distribution of local curvature values as a structural feature for off-line handwritten signature verification

    In this paper, a new feature for describing a digital image of a handwritten signature, based on the frequency distribution of local curvature values along the signature contours, is proposed. The computation of this feature on the binary image of a signature is described in detail. A normalized histogram of the distribution of local curvature values over 40 bins is formed; the frequency values, recorded as a 40-dimensional vector, are called the local curvature code of the signature. During verification, the proximity of a pair of signatures is determined by the correlation between their curvature codes and the LBP codes described by the authors in [23]. To perform the signature verification procedure, a two-dimensional feature space is constructed containing patterns of the correlational proximity of signature pairs. When verifying a signature against N genuine signatures of the same person, the feature space contains N(N-1)/2 proximity patterns of pairs of genuine signatures and N proximity patterns of pairs formed by the analyzed signature and the genuine ones. A Support Vector Machine (SVM) is used as the classifier. Experimental studies were carried out on digitized images of genuine and forged signatures from two databases. The accuracy of automatic signature verification was 99.77% on the publicly available CEDAR database and 88.62% on the TUIT database.
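
    A minimal Python sketch of the two ingredients described above, the 40-bin curvature code and the correlation-based proximity (illustrative only; the discrete turning-angle curvature estimate and all names here are assumptions, not the authors' implementation):

    import numpy as np

    def curvature_code(contour, bins=40):
        """Normalized histogram of local curvature estimates along a closed contour.
        contour: (N, 2) array of x, y points. Curvature is approximated by the
        turning angle between successive contour segments (a common discrete proxy)."""
        d = np.diff(contour, axis=0, append=contour[:1])   # segment vectors
        ang = np.arctan2(d[:, 1], d[:, 0])                 # segment directions
        turn = np.diff(ang, append=ang[:1])                # turning angle per vertex
        turn = (turn + np.pi) % (2 * np.pi) - np.pi        # wrap to [-pi, pi)
        hist, _ = np.histogram(turn, bins=bins, range=(-np.pi, np.pi))
        return hist / max(hist.sum(), 1)                   # normalize to frequencies

    def proximity(code_a, code_b):
        """Pearson correlation between two curvature codes, used as a proximity measure."""
        return float(np.corrcoef(code_a, code_b)[0, 1])

    In the scheme described above, such a proximity value, paired with the analogous LBP-code correlation from [23], would form one point of the two-dimensional feature space classified by the SVM.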

    The Bubble Box: Towards an Automated Visual Sensor for 3D Analysis and Characterization of Marine Gas Release Sites

    Several acoustic and optical techniques have been used for characterizing natural and anthropogenic gas leaks (carbon dioxide, methane) from the ocean floor. Here, single-camera methods for bubble stream observation have become an important tool, as they help estimate flux and bubble sizes under certain assumptions. However, they record only a projection of each bubble onto the image plane and therefore cannot capture the full 3D shape, which is particularly important for larger, non-spherical bubbles. The unknown distance of the bubble to the camera (making it appear larger or smaller than expected) as well as refraction at the camera interface introduce further uncertainties. In this article, we introduce our wide-baseline stereo-camera deep-sea sensor, the bubble box, which overcomes these limitations by observing bubbles from two orthogonal directions using calibrated cameras. Besides the setup and hardware of the system, we discuss appropriate calibration and the automated processing steps (deblurring, detection, tracking, and 3D fitting) that are crucial to arrive at a 3D ellipsoidal shape and rise speed for each bubble. The obtained per-bubble values can be aggregated into statistical bubble size distributions or fluxes for extrapolation based on diffusion and dissolution models and large-scale acoustic surveys. We demonstrate and evaluate the wide-baseline stereo measurement model using a controlled test setup with ground-truth information.
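
    As a sketch of the aggregation step mentioned above (illustrative only; the spherical-equivalent-radius convention and all names are assumptions, not the authors' code), per-bubble ellipsoid fits can be reduced to equivalent radii, a size distribution, and a volume flux:

    import numpy as np

    def equivalent_radius(a, b, c):
        """Radius of a sphere with the same volume as an ellipsoid with semi-axes a, b, c."""
        return (a * b * c) ** (1.0 / 3.0)

    def bubble_statistics(semi_axes, duration_s):
        """Aggregate per-bubble ellipsoid fits into a size distribution and a volume flux.
        semi_axes: (N, 3) array of semi-axes in mm; duration_s: observation time in seconds."""
        r = np.array([equivalent_radius(*ax) for ax in semi_axes])   # mm
        volumes = 4.0 / 3.0 * np.pi * r ** 3                         # mm^3 per bubble
        hist, edges = np.histogram(r, bins=20)                       # bubble size distribution
        flux = volumes.sum() / duration_s                            # mm^3 / s
        return hist, edges, flux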

    Movie101: A New Movie Understanding Benchmark

    To help the visually impaired enjoy movies, automatic movie narrating systems are expected to produce accurate, coherent, and role-aware plot narration whenever no actors are speaking. Existing works benchmark this challenge as an ordinary video captioning task via simplifications such as removing role names and evaluating narrations with n-gram-based metrics, which makes it difficult for automatic systems to meet the needs of real application scenarios. To narrow this gap, we construct a large-scale Chinese movie benchmark named Movie101. Closer to real scenarios, the Movie Clip Narrating (MCN) task in our benchmark asks models to generate role-aware narration paragraphs for complete movie clips in which no actors are speaking. External knowledge, such as role information and movie genres, is also provided for better movie understanding. In addition, we propose a new metric for movie narration evaluation, the Movie Narration Score (MNScore), which achieves the best correlation with human evaluation. Our benchmark also supports the Temporal Narration Grounding (TNG) task, which investigates clip localization given text descriptions. For both tasks, our proposed methods leverage external knowledge well and outperform carefully designed baselines. The dataset and codes are released at https://github.com/yuezih/Movie101. (Accepted to ACL 2023.)
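
    For the Temporal Narration Grounding task, a standard way to score a predicted clip span against a ground-truth span is temporal IoU (a generic sketch, not necessarily the benchmark's official evaluation code):

    def temporal_iou(pred, gt):
        """IoU between two time spans given as (start_s, end_s) tuples in seconds."""
        inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
        union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
        return inter / union if union > 0 else 0.0

    # Example: a prediction covering 12-30 s against a ground truth of 10-25 s.
    print(temporal_iou((12.0, 30.0), (10.0, 25.0)))  # 13/20 = 0.65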

    A Siamese transformer network for zero-shot ancient coin classification

    Ancient numismatics, the study of ancient coins, has in recent years become an attractive domain for the application of computer vision and machine learning. Though rich in research problems, the predominant focus in this area to date has been on the task of attributing a coin from an image, that is, of identifying its issue. This may be considered the cardinal problem in the field, and it continues to challenge automatic methods. In the present paper, we address a number of limitations of previous work. Firstly, the existing methods approach the problem as a classification task. As such, they are unable to deal with classes with no or few exemplars (which would be most, given over 50,000 issues of Roman Imperial coins alone) and require retraining when exemplars of a new class become available. Hence, rather than seeking to learn a representation that distinguishes a particular class from all the others, herein we seek a representation that is overall best at distinguishing classes from one another, thus relinquishing the demand for exemplars of any specific class. This leads to our adoption of the paradigm of pairwise coin matching by issue, rather than the usual classification paradigm, and to the specific solution we propose in the form of a Siamese neural network. Furthermore, while adopting deep learning, motivated by its successes in the field and its unchallenged superiority over classical computer vision approaches, we also seek to leverage the advantages that transformers have over the previously employed convolutional neural networks, in particular their non-local attention mechanisms, which ought to be particularly useful in ancient coin analysis by associating semantically but not visually related distal elements of a coin's design. Evaluated on a large data corpus of 14,820 images and 7,605 issues, using transfer learning and only a small training set of 542 images of 24 issues, our Double Siamese ViT model is shown to surpass the state of the art by a large margin, achieving an overall accuracy of 81%. Moreover, our further investigation of the results shows that the majority of the method's errors are unrelated to the intrinsic aspects of the algorithm itself and are rather a consequence of unclean data, a problem that can easily be addressed in practice by simple pre-processing and quality checking.
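
    A minimal sketch of the pairwise-matching idea (a generic Siamese setup with a shared encoder and cosine similarity in PyTorch; the encoder and all names are placeholders, not the paper's Double Siamese ViT):

    import torch
    import torch.nn as nn

    class SiameseMatcher(nn.Module):
        """Shared-weight encoder applied to both coin images; similarity decides same-issue."""
        def __init__(self, encoder: nn.Module):
            super().__init__()
            self.encoder = encoder  # any image encoder, e.g. a ViT backbone

        def forward(self, img_a, img_b):
            emb_a = self.encoder(img_a)  # (B, D) embeddings
            emb_b = self.encoder(img_b)
            return nn.functional.cosine_similarity(emb_a, emb_b, dim=1)  # (B,)

    # Example with a toy encoder standing in for a pretrained backbone; a pair is
    # declared the same issue when similarity exceeds a validation-tuned threshold.
    toy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 128))
    matcher = SiameseMatcher(toy)
    sim = matcher(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))

    Because the network scores pairs rather than predicting a class label, unseen issues need no retraining: a questioned coin is simply matched against reference images of candidate issues.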

    Improving Road Surface Area Extraction via Semantic Segmentation with Conditional Generative Learning for Deep Inpainting Operations

    The road surface area extraction task is generally carried out via semantic segmentation over remotely sensed imagery. However, this supervised learning task is often costly, as it requires remote sensing images labelled at the pixel level, and the results are not always satisfactory (discontinuities, overlooked connection points, or isolated road segments). Unsupervised learning, on the other hand, does not require labelled data and can be employed to post-process the geometries of geospatial objects extracted via semantic segmentation. In this work, we implement a conditional Generative Adversarial Network to reconstruct road geometries via deep inpainting procedures on a new dataset containing unlabelled road samples from challenging areas present in official cartographic support from Spain. The goal is to improve the initial road representations obtained with semantic segmentation models via generative learning. The performance of the model was evaluated on unseen data through a metrical comparison, in which a maximum Intersection over Union (IoU) score improvement of 1.3% was observed over the initial semantic segmentation result. We then assessed the appropriateness of applying unsupervised generative learning through a qualitative perceptual validation, identifying the strengths and weaknesses of the proposed method in very complex scenarios, gaining a better intuition of the model's behaviour during large-scale post-processing with generative learning and deep inpainting procedures, and observing important improvements in the generated data.
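
    As a sketch of the conditional-GAN inpainting idea (a generic pix2pix-style masked-reconstruction objective in PyTorch; the architectures, channel layout, and loss weighting are assumptions, not the paper's model):

    import torch
    import torch.nn as nn

    # G takes the gappy raster plus the mask (2 channels) and proposes a completed raster;
    # D judges (condition, raster) pairs (2 channels).
    def inpainting_step(G, D, opt_g, opt_d, roads, mask, l1_weight=100.0):
        """One cGAN training step: roads is a (B,1,H,W) road raster, mask marks gaps to fill."""
        bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
        masked = roads * (1 - mask)                    # condition: raster with gaps
        fake = G(torch.cat([masked, mask], dim=1))     # generator's completed raster

        # Discriminator: real pairs vs. generated pairs
        d_real = D(torch.cat([masked, roads], dim=1))
        d_fake = D(torch.cat([masked, fake.detach()], dim=1))
        loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator: fool the discriminator and stay close to ground truth in the gaps
        d_fake = D(torch.cat([masked, fake], dim=1))
        loss_g = bce(d_fake, torch.ones_like(d_fake)) + l1_weight * l1(fake * mask, roads * mask)
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_d.item(), loss_g.item()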

    Perception thresholds of small deformations on specular surfaces

    Two models are developed and evaluated that describe the perception thresholds of small shape deviations on specular surfaces. Both models take the mirrored environment into account and trace the perception thresholds back to the angular resolution of the human eye. A further goal of this work is to extend the models to surface roughness and waviness. The predicted perception thresholds are compared with the data of two perception studies.
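
    To get a feel for the order of magnitude such an angular-resolution argument implies (a back-of-the-envelope illustration only, not one of the models developed in the work): by the law of reflection, a local surface slope error δ deflects the reflected ray by 2δ, so a deformation becomes noticeable roughly when 2δ reaches the eye's angular resolution of about one arcminute:

    import math

    eye_resolution = math.radians(1.0 / 60.0)   # ~1 arcminute, approx. 2.9e-4 rad
    min_slope = eye_resolution / 2.0            # reflection doubles a slope error

    # A sinusoidal ripple h(x) = A*sin(2*pi*x/L) has maximum slope 2*pi*A/L,
    # so the smallest noticeable amplitude at wavelength L = 10 mm would be:
    L = 10.0                                    # mm
    A_min = min_slope * L / (2.0 * math.pi)     # mm
    print(f"{A_min * 1000:.2f} micrometres")    # approx. 0.23 micrometres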

    3D data fusion from multiple sensors and its applications

    The introduction of depth cameras to the mass market helped make computer vision applicable to many real-world applications, such as human interaction in virtual environments, autonomous driving, robotics, and 3D reconstruction. All these problems were originally tackled by means of standard cameras, but the intrinsic ambiguity of two-dimensional images led to the development of depth camera technologies. Stereo vision was introduced first, providing an estimate of the 3D geometry of the scene. Structured-light depth cameras were developed to use the same concepts as stereo vision while overcoming some of the problems of passive technologies. Finally, Time-of-Flight (ToF) depth cameras solve the same depth estimation problem with a different technology. This thesis focuses on the acquisition of depth data from multiple sensors and presents techniques to efficiently combine the information of different acquisition systems. The three main technologies developed for depth estimation are first reviewed, presenting the operating principles and practical issues of each family of sensors. The use of multiple sensors is then investigated, providing practical solutions to the problems of 3D reconstruction and gesture recognition. Data from stereo vision systems and ToF depth cameras are combined to provide a higher-quality depth map, with a confidence measure of the depth data from the two systems used to guide the fusion. The lack of datasets with data from multiple sensors is addressed by proposing a system for the collection of data and ground-truth depth, and a tool to generate synthetic data from standard cameras and ToF depth cameras. For gesture recognition, a depth camera is paired with a Leap Motion device to boost the performance of the recognition task: a set of features from the two devices is used in a classification framework based on Support Vector Machines and Random Forests.
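
    A minimal sketch of confidence-guided fusion of two co-registered depth maps (an illustrative per-pixel weighting only; the thesis's actual confidence measures and fusion scheme are more elaborate):

    import numpy as np

    def fuse_depth(d_stereo, c_stereo, d_tof, c_tof, eps=1e-6):
        """Confidence-weighted fusion of co-registered stereo and ToF depth maps.
        d_*: (H, W) depth in metres; c_*: (H, W) confidences in [0, 1] (0 = invalid)."""
        w = c_stereo + c_tof
        fused = (c_stereo * d_stereo + c_tof * d_tof) / np.maximum(w, eps)
        fused[w < eps] = np.nan    # no reliable measurement from either sensor
        return fused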