
    A Learnable Model with Calibrated Uncertainty Quantification for Estimating Canopy Height from Spaceborne Sequential Imagery

    Global-scale canopy height mapping is an important tool for ecosystem monitoring and sustainable forest management. Various studies have demonstrated the ability to estimate canopy height from a single spaceborne multispectral image using end-to-end learning techniques. In addition to the texture information of a single-shot image, our study exploits the multitemporal information of image sequences to improve estimation accuracy. We adopt a convolutional variant of the long short-term memory (LSTM) model for canopy height estimation from multitemporal instances of Sentinel-2 products. Furthermore, we use the deep ensembles technique to obtain meaningful uncertainty estimates on the predictions, and a post-processing isotonic regression model to calibrate them. Our lightweight model (∼320k trainable parameters) achieves a mean absolute error (MAE) of 1.29 m on a European test area of 79 km². It outperforms state-of-the-art methods based on single-shot spaceborne images as well as on costly airborne images, while providing additional confidence maps that are shown to be well calibrated. Moreover, the trained model transfers to a different European country using a fine-tuning area as small as ∼2 km², with MAE = 1.94 m.
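    The abstract does not detail how the calibration is wired up; the following is a minimal sketch, assuming a deep ensemble whose members each predict a height map, of turning per-pixel ensemble spread into calibrated confidence intervals. The function names, the confidence-level grid, and the Gaussian-interval assumption are all illustrative, not the paper's code.

```python
# A minimal sketch (not the paper's code): deep-ensemble mean/spread plus an
# isotonic-regression calibrator fitted on held-out coverage statistics.
import numpy as np
from scipy.stats import norm
from sklearn.isotonic import IsotonicRegression

def ensemble_predict(models, x):
    """Per-pixel mean and standard deviation over ensemble members."""
    preds = np.stack([m(x) for m in models])   # shape: (n_models, n_pixels)
    return preds.mean(axis=0), preds.std(axis=0)

def fit_coverage_calibrator(mu, sigma, y_true,
                            levels=np.linspace(0.05, 0.95, 19)):
    """Map nominal interval levels to the coverage actually observed."""
    observed = []
    for p in levels:
        z = norm.ppf(0.5 + p / 2.0)            # interval half-width in sigma units
        observed.append(np.mean(np.abs(y_true - mu) <= z * sigma))
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(levels, np.array(observed))
    return iso   # iso.predict(level) then approximates the true coverage
```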

    Video metadata extraction in a videoMail system

    The world is swiftly adapting to visual communication. Online services like YouTube and Vine show that video is no longer the domain of broadcast television alone. Video is used for many purposes, such as entertainment, information, education, and communication. The rapid growth of today's video archives, with sparsely available editorial data, makes retrieval a major problem. Humans perceive a video as a complex interplay of cognitive concepts, so there is a need to build a bridge between numeric values and semantic concepts, a connection that will make videos retrievable by humans. The critical element of this bridge is video annotation. The process can be done manually or automatically. Manual annotation is tedious, subjective, and expensive; therefore, automatic annotation is being actively studied. In this thesis we focus on automatic annotation of multimedia content, namely the use of analysis techniques for information retrieval to automatically extract metadata from video in a videomail system, including the identification of text, people, actions, spaces, and objects, animals and plants among them. It then becomes possible to align multimedia content with the text of the email message and to build applications for semantic video database indexing and retrieval.
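    The abstract does not specify the extraction pipeline; as a minimal illustration of one standard first step in video metadata extraction, here is a hedged sketch of histogram-based shot-boundary detection with OpenCV. The function name and threshold are illustrative assumptions.

```python
# Hedged sketch: detect hard cuts by comparing HSV color histograms of
# consecutive frames; a low correlation suggests a shot boundary.
import cv2

def detect_shot_boundaries(path, threshold=0.5):
    """Return frame indices where the HSV histogram changes sharply."""
    cap = cv2.VideoCapture(path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                boundaries.append(idx)   # likely a cut between idx-1 and idx
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```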

    Registration and categorization of camera captured documents

    Camera-captured document image analysis concerns the processing of documents captured with hand-held sensors, smartphones, or other capturing devices using advanced image processing, computer vision, pattern recognition, and machine learning techniques. Because real-world capture is unconstrained, the captured documents suffer from illumination variation, viewpoint variation, highly variable scale/resolution, background clutter, occlusion, and non-rigid deformations, e.g., folds and crumples. Document registration is the problem of registering the image of a template document, whose layout is known, with a test document image. The literature on camera-captured document mosaicing addresses the registration of captured documents under the assumption of a considerable amount of single-chunk overlapping content. These methods cannot be directly applied to the registration of forms, bills, and other commercial documents where the fixed content is distributed in tiny portions across the document. On the other hand, most existing document image registration methods work with scanned documents under affine transformation. The literature on document image retrieval addresses categorization of documents based on text, figures, etc.; however, the scalability of existing document categorization methodologies based on logo identification is very limited.

    This dissertation focuses on two problems: (i) registration of captured documents where the overlapping content is distributed in tiny portions across the documents, and (ii) categorization of captured documents into predefined logo classes that scales to large datasets using local invariant features. A novel methodology is proposed for the registration of user-defined Regions Of Interest (ROI) using corresponding local features from their neighborhood. The methodology extends prior approaches to point-pattern-based registration, such as RANdom SAmple Consensus (RANSAC) and Thin Plate Spline-Robust Point Matching (TPS-RPM), to enable registration of cell-phone- and camera-captured documents under non-rigid transformations. Three novel aspects are embedded in the methodology: (i) histogram-based uniformly transformed correspondence estimation, (ii) clustering of points located near the ROI to select only nearby regions for matching, and (iii) validation of the registration within the RANSAC and TPS-RPM algorithms. Experimental results on a dataset of 480 images captured with an iPhone 3GS and a Logitech Webcam Pro 9000 show an average registration accuracy of 92.75% using the Scale Invariant Feature Transform (SIFT). Robust local features for logo identification are determined empirically by comparing SIFT, Speeded-Up Robust Features (SURF), Hessian-Affine, Harris-Affine, and Maximally Stable Extremal Regions (MSER). Two matching methods are presented for categorization: matching all features extracted from the query document as a single set, and segment-wise matching of query document features using a segmentation obtained by grouping the area under intersecting dense local affine covariant regions. The latter approach not only gives an approximate location of the predicted logo classes in the query document but also helps to increase prediction accuracy. To facilitate scalability to large datasets, inverted indexing of logo class features has been incorporated into both approaches. Experimental results on a dataset of real camera-captured documents show a peak 13.25% increase in F-measure accuracy using the latter approach compared to the former.
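    As context for the point-pattern baseline named above, here is a minimal SIFT-plus-RANSAC registration sketch in OpenCV. A planar homography stands in for the thesis's non-rigid TPS-RPM extension, and all names are illustrative.

```python
# Hedged sketch: register a captured document page against a known template
# via SIFT correspondences filtered by Lowe's ratio test and RANSAC.
import cv2
import numpy as np

def register_to_template(template_gray, captured_gray):
    """Estimate a homography mapping the captured page onto the template."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(template_gray, None)
    kp2, des2 = sift.detectAndCompute(captured_gray, None)

    # Keep only distinctive correspondences (ratio test).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des2, des1, k=2)
            if m.distance < 0.75 * n.distance]

    src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H, int(inliers.sum())   # transform plus RANSAC inlier count
```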

    Quantifying Membrane Topology at the Nanoscale

    Changes in the shape of cellular membranes are linked with viral replication, Alzheimer's, heart disease, and an abundance of other maladies. Some membranous organelles, such as the endoplasmic reticulum and the Golgi, are only 50 nm in diameter. As such, membrane shape changes are conventionally studied with electron microscopy (EM), which preserves cellular ultrastructure and achieves a resolution of 2 nm or better. However, immunolabeling in EM is challenging and often destroys the cell, making it difficult to study interactions between membranes and other proteins. Additionally, cells must be fixed for EM imaging, making it impossible to study mechanisms of disease. To address these problems, this thesis advances nanoscale imaging and analysis of membrane shape changes and their associated proteins using super-resolution single-molecule localization microscopy. The thesis is divided into three parts. In the first, a novel correlative orientation-independent differential interference contrast (OI-DIC) and single-molecule localization microscopy (SMLM) instrument is designed to address challenges in live-cell imaging of membrane nanostructure. SMLM super-resolution fluorescence techniques image with ~20 nm resolution and are compatible with live-cell imaging; however, due to SMLM's slow imaging speeds, most cell movement is under-sampled. OI-DIC images quickly, is gentle enough to be used with living cells, and can image cellular structure without labelling, but is diffraction-limited. Combining SMLM with OI-DIC allows imaging of cellular context that can supplement sparse super-resolution data in real time. The second part of the thesis describes an open-source software package for visualizing and analyzing SMLM data. SMLM imaging yields localization point clouds, which require non-standard visualization and analysis techniques. Existing techniques are described, and necessary new ones are implemented. These tools are designed to interpret data collected from the OI-DIC/SMLM microscope, as well as from other optical setups. Finally, a tool for extracting membrane structure from SMLM point clouds is described. SMLM data is often noisy, containing multiple localizations per fluorophore and many non-specific localizations. SMLM's resolution reveals labelling discontinuities, which exacerbate the sparsity of localizations. It is non-trivial to reconstruct the continuous shape of a membrane from a discrete set of points, and even more difficult in the presence of the noise profile characteristic of most SMLM point clouds. To address this, a surface reconstruction algorithm for extracting continuous surfaces from SMLM data is implemented. This method employs biophysical curvature constraints to improve the accuracy of the surface.
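    As an illustration of one common preprocessing step before surface reconstruction, the sketch below removes sparse non-specific localizations by density clustering (DBSCAN via scikit-learn). The eps and min_samples values are illustrative, not the thesis's parameters.

```python
# Hedged sketch: density-based cleanup of an SMLM localization point cloud.
import numpy as np
from sklearn.cluster import DBSCAN

def denoise_localizations(points_nm, eps=50.0, min_samples=5):
    """Drop sparse non-specific localizations; keep dense membrane clusters."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_nm)
    return points_nm[labels != -1]   # label -1 marks DBSCAN noise points
```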

    Visual Concept Detection in Images and Videos

    The rapidly increasing proliferation of digital images and videos leads to a situation where content-based search in multimedia databases becomes more and more important. A prerequisite for effective image and video search is to analyze and index media content automatically. Current approaches in the field of image and video retrieval focus on semantic concepts serving as an intermediate description to bridge the “semantic gap” between the data representation and the human interpretation. Due to the large complexity and variability in the appearance of visual concepts, the detection of arbitrary concepts is a very challenging task. In this thesis, the following aspects of visual concept detection systems are addressed.

    First, enhanced local descriptors for mid-level feature coding are presented. Based on the observation that scale-invariant feature transform (SIFT) descriptors with different spatial extents yield large performance differences, a novel concept detection system is proposed that combines feature representations for different spatial extents using multiple kernel learning (MKL). A multi-modal video concept detection system is presented that relies on Bag-of-Words representations for visual and, in particular, for audio features. Furthermore, a method for the SIFT-based integration of color information, called color moment SIFT, is introduced. Comparative experimental results demonstrate the superior performance of the proposed systems on the Mediamill and VOC Challenges.

    Second, an approach is presented that systematically utilizes the results of object detectors. Novel object-based features are generated from object detection results using different pooling strategies. For videos, detection results are assembled into object sequences, and a shot-based confidence score as well as further features, such as position, frame coverage, or movement, are computed for each object class. These features are used as additional input for the support vector machine (SVM)-based concept classifiers, so that other related concepts can also profit from object-based features. Extensive experiments on the Mediamill, VOC, and TRECVid Challenges show significant improvements in retrieval performance, not only for the object classes but in particular for a large number of indirectly related concepts. Moreover, it is demonstrated that a few object-based features are beneficial for a large number of concept classes. On the VOC Challenge, the additional use of object-based features led to a superior image classification performance of 63.8% mean average precision (AP). Furthermore, the generalization capabilities of concept models are investigated. It is shown that different source and target domains lead to a severe loss in concept detection performance; in these cross-domain settings, object-based features achieve a significant performance improvement. Since it is inefficient to run a large number of single-class object detectors, it is additionally demonstrated how a concurrent multi-class object detection system can be constructed to speed up the detection of many object classes in images.

    Third, a novel, purely web-supervised learning approach for modeling heterogeneous concept classes in images is proposed. Tags and annotations of multimedia data on the WWW are rich sources of information that can be employed for learning visual concepts. The presented approach is aimed at continuous long-term learning of appearance models and at improving these models periodically. For this purpose, several components have been developed: a crawling component, a multi-modal clustering component for spam detection and subclass identification, a novel learning component called “random savanna”, a validation component, an updating component, and a scalability manager. Only a single word describing the visual concept is required to initiate the learning process. Experimental results demonstrate the capabilities of the individual components.

    Finally, a generic concept detection system is applied to support interdisciplinary research efforts in psychology and media science. The psychological research question addressed in the behavioral sciences is whether, and how, playing violent computer games may induce aggression. Novel semantic concepts, most notably “violence”, are therefore detected in computer game videos to gain insights into the interrelationship between violent game events and the brain activity of the player. Experimental results demonstrate the excellent performance of the proposed automatic concept detection approach for such interdisciplinary research.
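    As context for the Bag-of-Words pipeline the thesis builds on, here is a minimal sketch of a single-concept detector: SIFT descriptors, a k-means codebook, per-image histograms, and an SVM. The codebook size, kernel choice, and data handling are illustrative assumptions, not the thesis's configuration.

```python
# Hedged sketch: Bag-of-Visual-Words concept classification.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def sift_descriptors(image_gray):
    _, des = cv2.SIFT_create().detectAndCompute(image_gray, None)
    return des if des is not None else np.empty((0, 128), np.float32)

def bow_histogram(des, codebook):
    """Quantize descriptors against the codebook; L1-normalized word counts."""
    words = codebook.predict(des.astype(np.float32))
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_concept_detector(train_images, labels, k=256):
    """Codebook on pooled descriptors, then an SVM on per-image histograms."""
    pooled = np.vstack([sift_descriptors(im) for im in train_images])
    codebook = KMeans(n_clusters=k, n_init=3).fit(pooled)
    X = np.array([bow_histogram(sift_descriptors(im), codebook)
                  for im in train_images])
    clf = SVC(kernel="rbf", probability=True).fit(X, labels)
    return codebook, clf
```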

    Structured light assisted real-time stereo photogrammetry for robotics and automation. Novel implementation of stereo matching

    In this Master’s thesis project a novel implementation of a stereo matching based method is proposed, together with an exhaustive analysis of the state-of-the-art algorithms in the field. Specifically, both standard and deep learning based methods have been extensively investigated to provide useful insights for the designed implementation. The work is structured as follows. First, a research phase was carried out to quickly test the proposed strategy. Subsequently, a first implementation of the algorithm was designed and tested on data from the Middlebury 2014 dataset, one of the most widely used datasets in computer vision. At this stage, numerous tests were run, and various changes were made to the algorithm pipeline to improve the final result. Finally, after that exhaustive research phase, the actual method was designed and tested on real-environment images obtained from the stereo device developed by the company in which this work was produced. That stereo device is a fundamental element of the project: the designed algorithm is based on the data produced by its cameras. The main goal of the system designed by LaDiMo is to make the stereo matching based procedure simultaneously fast and accurate, one of the main requirements of the project being an algorithm with potential real-time performance. This has been achieved by one of the two methods created: a lightweight implementation that strongly exploits the information coming from the LaDiMo device to provide accurate results while keeping the computational time short. At the end of this Master’s thesis, images showing the main outcomes are presented, together with a discussion of further improvements planned for the project. The method implemented, not being optimized, only demonstrates a potential real-time implementation, which could be achieved through an efficient refactoring of the main pipeline.
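    The abstract does not disclose LaDiMo's method; as a point of reference, here is a standard semi-global block matching baseline of the kind such stereo pipelines are compared against, using OpenCV's SGBM. Parameter values are illustrative.

```python
# Hedged sketch: dense disparity via semi-global block matching (SGBM).
import cv2
import numpy as np

def disparity_map(left_gray, right_gray, max_disp=128):
    """Compute disparity for a rectified stereo pair."""
    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=max_disp,     # must be a multiple of 16
        blockSize=5,
        P1=8 * 5 * 5,                # penalty for small disparity changes
        P2=32 * 5 * 5,               # penalty for large disparity changes
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    # OpenCV returns fixed-point disparities scaled by 16.
    disp = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
    return disp   # depth = focal_length * baseline / disparity, for disp > 0
```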

    Logo recognition in videos: an automated brand analysis system

    Every year companies spend sizeable budgets on marketing, a large portion of which goes to advertising their product brands on TV broadcasts. These advertisements are usually emblazoned with the company's name, logo, and trademark brand. Given these astronomical sums, companies are extremely keen to verify that their brand receives the level of visibility they expect for such expenditure. In other words, advertisers want to verify that their contracts with broadcasters are fulfilled as promised, since the price of a commercial depends primarily on the popularity of the show it interrupts or sponsors. Such verification is essential for major companies to justify advertising budgets and to ensure their brands achieve the desired visibility. Currently, brand visibility is verified manually by human annotators who view a broadcast and annotate every appearance of a company's trademark. In this thesis a novel brand logo analysis system that uses shape-based matching and scale invariant feature transform (SIFT) based matching on a graphics processing unit (GPU) is proposed, developed, and tested. The system detects and retrieves trademark logos appearing in commercial videos. A compact representation of trademark logos and video frame content based on global (shape-based) and local (SIFT) feature points is proposed. These representations can be used to robustly detect, recognize, localize, and retrieve trademarks as they appear in a variety of commercial video types. Classification of trademarks is performed by shape-based matching and by matching a set of SIFT feature descriptors for each trademark instance against the set of SIFT features detected in each frame of the video. The system can automatically recognize the logos in video frames and summarize the logo content of the broadcast with the detected size, position, and score. The output of the system can be used to summarize or check the time and duration of commercial video blocks in a broadcast or on a DVD. Experimental results are provided, along with an analysis of the processed frames. The results show that the proposed technique efficiently and effectively recognizes and classifies trademark logos.
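    As an illustration of the SIFT-based matching stage described above, here is a hedged sketch that scores each trademark template against a video frame using Lowe's ratio test. Names and the ratio threshold are illustrative; the thesis additionally uses shape-based matching and GPU acceleration, which this sketch omits.

```python
# Hedged sketch: count ratio-test SIFT matches per trademark template.
import cv2

def score_logos(frame_gray, logo_templates):
    """Return {logo_name: number_of_good_matches} for one video frame."""
    sift = cv2.SIFT_create()
    _, frame_des = sift.detectAndCompute(frame_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    scores = {}
    for name, logo_gray in logo_templates.items():
        _, logo_des = sift.detectAndCompute(logo_gray, None)
        matches = matcher.knnMatch(logo_des, frame_des, k=2)
        scores[name] = sum(1 for m, n in matches
                           if m.distance < 0.7 * n.distance)
    return scores   # report a logo as present if its score exceeds a threshold
```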

    Single Particle Tracking: Analysis Techniques for Live Cell Nanoscopy.

    Single molecule experiments are a set of experiments designed specifically to study the properties of individual molecules. It has only been in the last three decades that single molecule experiments have been applied to the life sciences, where they have been successfully used in systems biology for probing the behavior of sub-cellular mechanisms. The advent and growth of super-resolution techniques in single molecule experiments has made the fundamental behavior of light and of the associated nano-probes a necessary concern for life scientists wishing to advance the state of human knowledge in biology. This dissertation disseminates some of the practices learned in experimental live cell microscopy. The topic of single particle tracking is addressed in a format designed for the physicist who embarks on single molecule studies. Specifically, the focus is on the procedures needed to build single particle tracking analysis techniques that can be used to answer biological questions. These analysis techniques range from designing and testing a particle tracking algorithm to inferring model parameters once an image has been processed. The intellectual contributions of the author include techniques in diffusion estimation, localization filtering, and trajectory association for tracking, all discussed in detail in later chapters. The author has also contributed to the software development of automated gain calibration, live cell particle simulations, and various single particle tracking packages. Future work includes further evaluation of this laboratory's single particle tracking software, entropy-based approaches to hypothesis validation, and the uncertainty quantification of gain calibration.
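    As an example of the diffusion estimation mentioned among the contributions, here is a minimal mean-squared-displacement (MSD) sketch for a 2-D trajectory. For pure 2-D Brownian motion, MSD(t) = 4Dt, so D follows from the slope of the first few lags; this textbook estimator is not necessarily the dissertation's method.

```python
# Hedged sketch: diffusion coefficient from a single-particle trajectory.
import numpy as np

def msd(track_xy):
    """MSD at each time lag for an (N, 2) array of positions."""
    n = len(track_xy)
    return np.array([np.mean(np.sum((track_xy[lag:] - track_xy[:-lag]) ** 2,
                                    axis=1))
                     for lag in range(1, n)])

def diffusion_coefficient(track_xy, dt, n_lags=4):
    """Least-squares slope over the first lags; D = slope / 4 in 2-D."""
    lags = np.arange(1, n_lags + 1) * dt
    slope = np.polyfit(lags, msd(track_xy)[:n_lags], 1)[0]
    return slope / 4.0
```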

    Automatic human face detection in color images

    Automatic human face detection in digital images has been an active area of research over the past decade. Among its numerous applications, face detection plays a key role in face recognition systems for biometric personal identification, face tracking for intelligent human-computer interfaces (HCI), and face segmentation for object-based video coding. Despite significant progress in the field in recent years, detecting human faces in unconstrained and complex images remains a challenging problem in computer vision. An automatic system that possesses a capability similar to the human vision system in detecting faces is still a far-reaching goal. This thesis focuses on the problem of detecting human faces in color images. Although many early face detection algorithms were designed to work on gray-scale images, strong evidence exists that face detection can be done more efficiently by taking into account the color characteristics of the human face. In this thesis, we present a complete and systematic face detection algorithm that combines the strengths of both analytic and holistic approaches to face detection. The algorithm is developed to detect quasi-frontal faces in complex color images. This face class, which represents typical detection scenarios in most practical applications of face detection, covers a wide range of face poses including all in-plane rotations and some out-of-plane rotations. The algorithm is organized into a number of cascading stages: skin region segmentation, face candidate selection, and face verification. In each of these stages, various visual cues are used to narrow the search space for faces. We present a comprehensive analysis of skin detection using color pixel classification, and of the effects of factors such as the color space and the color classification algorithm on segmentation performance. We also propose a novel and efficient face candidate selection technique based on color-based eye region detection and a geometric face model. This candidate selection technique eliminates the computation-intensive step of window scanning often employed in holistic face detection and simplifies the task of detecting rotated faces. Besides various heuristic techniques for face candidate verification, we develop face/nonface classifiers based on the naive Bayesian model and investigate three feature extraction schemes, namely intensity, projection on a face subspace, and edge-based features. Techniques for improving face/nonface classification are also proposed, including bootstrapping, classifier combination, and the use of contextual information. On a test set of face and nonface patterns, the combination of three Bayesian classifiers achieves a correct detection rate of 98.6% at a false positive rate of 10%. Extensive testing has shown that the proposed face detector achieves good performance in terms of both detection rate and alignment between the detected and true faces. On a test set of 200 images containing 231 faces taken from the ECU face detection database, the proposed face detector has a correct detection rate of 90.04% and makes 10 false detections. We have found that the proposed face detector is more robust in detecting in-plane rotated faces than existing face detectors.
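    As an illustration of the skin region segmentation stage, here is a hedged sketch using fixed chrominance bounds in YCbCr. The bounds are a commonly cited heuristic range; the thesis itself analyzes several color spaces and classifiers, so treat them as illustrative, not as the thesis's fitted classifier.

```python
# Hedged sketch: rule-based skin segmentation by chrominance thresholding.
import cv2
import numpy as np

def skin_mask(image_bgr):
    """Binary mask of likely-skin pixels based on Cr/Cb bounds."""
    ycrcb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)
    lower = np.array([0, 133, 77], dtype=np.uint8)     # Y, Cr, Cb lower bounds
    upper = np.array([255, 173, 127], dtype=np.uint8)  # Y, Cr, Cb upper bounds
    mask = cv2.inRange(ycrcb, lower, upper)
    # Morphological opening removes isolated speckle from the mask.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```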