20 research outputs found

    Accuracy analysis of 3D object reconstruction using RGB-D sensor

    Get PDF
    In this paper, we propose a new method for 3D object reconstruction using RGB-D sensor. The RGB-D sensor provides RGB images as well as depth images. Since the depth and RGB color images are captured with one sensor of a RGB-D camera placed in different locations, the depth image should be related to the color image. After matching of the images (registration), point-to-point corresponding between two images is found, and they can be combined and represented in the 3D space. In order to obtain a dense 3D map of the 3D object, we design an algorithm for merging information from all used cameras. First, features extracted from color and depth images are used to localize them in a 3D scene. Next, Iterative Closest Point (ICP) algorithm is used to align all frames. As a result, a new frame is added to the dense 3D model. However, the spatial distribution and resolution of depth data affect to the performance of 3D scene reconstruction system based on ICP. The presented computer simulation results show an improvement in accuracy of 3D object reconstruction using real data.This work was supported by the Russian Science Foundation, grant no. 17-76-20045

    Understanding the message of images

    Full text link
    We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that have previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image or image-text understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a widecoverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries

    Stable Backward Diffusion Models that Minimise Convex Energies

    Get PDF
    The inverse problem of backward diffusion is known to be ill-posed and highly unstable. Backward diffusion processes appear naturally in image enhancement and deblurring applications. It is therefore greatly desirable to establish a backward diffusion model which implements a smart stabilisation approach that can be used in combination with an easy to handle numerical scheme. So far, existing stabilisation strategies in literature require sophisticated numerics to solve the underlying initial value problem. We derive a class of space-discrete one-dimensional backward diffusion as gradient descent of energies where we gain stability by imposing range constraints. Interestingly, these energies are even convex. Furthermore, we establish a comprehensive theory for the time-continuous evolution and we show that stability carries over to a simple explicit time discretisation of our model. Finally, we confirm the stability and usefulness of our technique in experiments in which we enhance the contrast of digital greyscale and colour images

    Machine Learning Architectures for Video Annotation and Retrieval

    Get PDF
    PhDIn this thesis we are designing machine learning methodologies for solving the problem of video annotation and retrieval using either pre-defined semantic concepts or ad-hoc queries. Concept-based video annotation refers to the annotation of video fragments with one or more semantic concepts (e.g. hand, sky, running), chosen from a predefined concept list. Ad-hoc queries refer to textual descriptions that may contain objects, activities, locations etc., and combinations of the former. Our contributions are: i) A thorough analysis on extending and using different local descriptors towards improved concept-based video annotation and a stacking architecture that uses in the first layer, concept classifiers trained on local descriptors and improves their prediction accuracy by implicitly capturing concept relations, in the last layer of the stack. ii) A cascade architecture that orders and combines many classifiers, trained on different visual descriptors, for the same concept. iii) A deep learning architecture that exploits concept relations at two different levels. At the first level, we build on ideas from multi-task learning, and propose an approach to learn concept-specific representations that are sparse, linear combinations of representations of latent concepts. At a second level, we build on ideas from structured output learning, and propose the introduction, at training time, of a new cost term that explicitly models the correlations between the concepts. By doing so, we explicitly model the structure in the output space (i.e., the concept labels). iv) A fully-automatic ad-hoc video search architecture that combines concept-based video annotation and textual query analysis, and transforms concept-based keyframe and query representations into a common semantic embedding space. Our architectures have been extensively evaluated on the TRECVID SIN 2013, the TRECVID AVS 2016, and other large-scale datasets presenting their effectiveness compared to other similar approaches

    Analysis and Modular Approach for Text Extraction from Scientific Figures on Limited Data

    Get PDF
    Scientific figures are widely used as compact, comprehensible representations of important information. The re-usability of these figures is however limited, as one can rarely search directly for them, since they are mostly indexing by their surrounding text (e. g., publication or website) which often does not contain the full-message of the figure. In this thesis, the focus is on making the content of scientific figures accessible by extracting the text from these figures. A modular pipeline for unsupervised text extraction from scientific figures, based on a thorough analysis of the literature, was built to address the problem. This modular pipeline was used to build several unsupervised approaches, to evaluate different methods from the literature and new methods and method combinations. Some supervised approaches were built as well for comparison. One challenge, while evaluating the approaches, was the lack of annotated data, which especially needed to be considered when building the supervised approach. Three existing datasets were used for evaluation as well as two datasets of 241 scientific figures which were manually created and annotated. Additionally, two existing datasets for text extraction from other types of images were used for pretraining the supervised approach. Several experiments showed the superiority of the unsupervised pipeline over common Optical Character Recognition engines and identified the best unsupervised approach. This unsupervised approach was compared with the best supervised approach, which, despite of the limited amount of training data available, clearly outperformed the unsupervised approach.Infografiken sind ein viel verwendetes Medium zur kompakten Darstellung von Kernaussagen. Die Nachnutzbarkeit dieser Abbildungen ist jedoch häufig limitiert, da sie schlecht auffindbar sind, da sie meist über die umschließenden Medien, wie beispielsweise Publikationen oder Webseiten, und nicht über ihren Inhalt indexiert sind. Der Fokus dieser Arbeit liegt auf der Extraktion der textuellen Inhalte aus Infografiken, um deren Inhalt zu erschließen. Ausgehend von einer umfangreichen Analyse verwandter Arbeiten, wurde ein generalisierender, modularer Ansatz für die unüberwachte Textextraktion aus wissenschaftlichen Abbildungen entwickelt. Mit diesem modularen Ansatz wurden mehrere unüberwachte Ansätze und daneben auch noch einige überwachte Ansätze umgesetzt, um diverse Methoden aus der Literatur sowie neue und bisher noch nicht genutzte Methoden zu vergleichen. Eine Herausforderung bei der Evaluation war die geringe Menge an annotierten Abbildungen, was insbesondere beim überwachten Ansatz Methoden berücksichtigt werden musste. Für die Evaluation wurden drei existierende Datensätze verwendet und zudem wurden zusätzlich zwei Datensätze mit insgesamt 241 Infografiken erstellt und mit den nötigen Informationen annotiert, sodass insgesamt 5 Datensätze für die Evaluation verwendet werden konnten. Für das Pre-Training des überwachten Ansatzes wurden zudem zwei Datensätze aus verwandten Textextraktionsbereichen verwendet. In verschiedenen Experimenten wird gezeigt, dass der unüberwachte Ansatz besser funktioniert als klassische Texterkennungsverfahren und es wird aus den verschiedenen unüberwachten Ansätzen der beste ermittelt. Dieser unüberwachte Ansatz wird mit dem überwachten Ansatz verglichen, der trotz begrenzter Trainingsdaten die besten Ergebnisse liefert
    corecore