
    Robust Subspace Estimation via Low-Rank and Sparse Decomposition and Applications in Computer Vision

    Recent advances in robust subspace estimation have made dimensionality reduction and noise and outlier suppression an active area of research, alongside continuous improvements in computer vision applications. Because image and video signals need a high-dimensional representation, storing, processing, transmitting, and analysing such signals is often difficult. It is therefore desirable to obtain a low-dimensional representation for such signals, and at the same time correct for corruptions, errors, and outliers, so that the signals can be readily used for later processing. Major recent advances in low-rank modelling in this context were initiated by the work of Candès et al. [17], where the authors provided a solution for the long-standing problem of decomposing a matrix into low-rank and sparse components in a Robust Principal Component Analysis (RPCA) framework. However, for computer vision applications RPCA is often too complex, and/or may not yield desirable results. The low-rank component obtained by RPCA usually has an unnecessarily high rank, while certain tasks require lower-dimensional representations. RPCA can robustly estimate noise and outliers and separate them from the low-rank component as a sparse part. However, it has no mechanism for providing insight into the structure of the sparse solution, nor a way to further decompose the sparse part into random noise and a structured sparse component, which would be advantageous in many computer vision tasks. As video signals are usually captured by a moving camera, obtaining a low-rank component by RPCA becomes impossible. In this thesis, novel Approximated RPCA algorithms are presented, targeting different shortcomings of RPCA. The Approximated RPCA was analysed to identify the most time-consuming RPCA solutions and replace them with simpler yet tractable alternative solutions.
The proposed method is able to obtain the exact desired rank for the low-rank component while estimating a global transformation to describe camera-induced motion. Furthermore, it is able to decompose the sparse part into a foreground sparse component and a random noise part that contains no useful information for computer vision processing. The foreground sparse component is obtained by several novel structured sparsity-inducing norms that better encapsulate the needed pixel structure in visual signals. Moreover, algorithms for reducing the complexity of low-rank estimation have been proposed that achieve significant complexity reduction without sacrificing the visual representation of video and image information. The proposed algorithms are applied to several fundamental computer vision tasks, namely high efficiency video coding; batch image alignment, inpainting, and recovery; video stabilisation; background modelling and foreground segmentation; robust subspace clustering and motion estimation; face recognition; and ultra high definition image and video super-resolution. The algorithms proposed in this thesis, including batch image alignment and recovery, background modelling and foreground segmentation, robust subspace clustering and motion segmentation, and ultra high definition image and video super-resolution, achieve either state-of-the-art results or results comparable to existing methods.
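
For orientation, the low-rank plus sparse decomposition referred to in this abstract is commonly posed as the convex Principal Component Pursuit program of Candès et al. The notation below is assumed here for illustration and is not taken from the thesis: M is the observed n_1 x n_2 data matrix, L the low-rank part, S the sparse part, and \lambda the trade-off weight. The thesis's Approximated RPCA variants modify this baseline and are not reproduced here.

```latex
\min_{L,\,S}\; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad M = L + S,
\qquad \lambda = \frac{1}{\sqrt{\max(n_1, n_2)}}
```

Here \|L\|_{*} is the nuclear norm (the sum of the singular values of L) and \|S\|_{1} is the sum of the absolute values of the entries of S. The value of \lambda shown is the standard choice from the original RPCA analysis.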

    Human Pose Tracking from Monocular Image Sequences

    This thesis proposes several novel approaches for improving the performance of automatic 2D human pose tracking systems, including a multi-scale strategy, mid-level spatial dependencies that constrain more relations among multiple body parts, additional constraints between symmetric body parts, and correction of left/right confusion by a head orientation estimator. These approaches are employed to develop a complete human pose tracking system. The experimental results demonstrate that all the proposed approaches significantly improve accuracy and efficiency.

    Video content analysis for intelligent forensics

    The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely: (1) moving object detection and recognition; (2) correction of colours in the video frames and recognition of colours of moving objects; (3) make and model recognition of vehicles and identification of their type; (4) detection and recognition of text information in outdoor scenes. To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex background. The object detection part of the framework relies on a background modelling technique and a novel post-processing step in which the contours of the foreground regions (i.e. moving objects) are refined by classifying edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouette of foreground objects. To address the second issue, a framework for the correction and recognition of true colours of objects in videos is presented with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects in multiple frames.
The proposed framework is specifically designed to perform robustly on videos that have poor quality because of surrounding illumination, camera sensor imperfection and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As a part of this work, a novel feature representation technique for distinctive representation of vehicle images has emerged. The feature representation technique uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of the vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image. The proposed framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image for the identification of text regions. Apart from detection, the colour information is also used to segment characters from the words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals.
The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique when used within various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.
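
The abstract does not say how the lexicon-based alignment step works. As a rough illustration of the general idea only, the sketch below snaps a raw recognised string to its nearest lexicon entry by Levenshtein edit distance, which is a common stand-in and not necessarily the thesis's procedure. The function names `edit_distance` and `snap_to_lexicon` and the sample lexicon are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting i characters of a to reach the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def snap_to_lexicon(raw, lexicon):
    """Return the lexicon word closest to the raw recognition output.

    Ties are resolved in favour of the earliest lexicon entry.
    """
    return min(lexicon, key=lambda word: edit_distance(raw, word))


# Hypothetical example: the recogniser confused "I" with "1".
print(snap_to_lexicon("EX1T", ["EXIT", "EAST", "TAXI"]))
```

A real system would probably weight substitutions by how visually confusable two characters are rather than using unit costs.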

    Object Tracking Based on Satellite Videos: A Literature Review

    Video satellites have recently become an attractive method of Earth observation, providing consecutive images of the Earth’s surface for continuous monitoring of specific events. The development of on-board optical and communication systems has enabled various applications of satellite image sequences. However, satellite video-based target tracking is a challenging research topic in remote sensing due to its relatively low spatial and temporal resolution. Thus, this survey systematically investigates current satellite video-based tracking approaches and benchmark datasets, focusing on five typical tracking applications: traffic target tracking, ship tracking, typhoon tracking, fire tracking, and ice motion tracking. The essential aspects of each tracking target are summarized, such as the tracking architecture, the fundamental characteristics, primary motivations, and contributions. Furthermore, popular visual tracking benchmarks and their respective properties are discussed. Finally, a revised multi-level dataset based on WPAFB videos is generated and quantitatively evaluated for future development in the satellite video-based tracking area. In addition, 54.3% of the tracklets with lower Difficulty Score (DS) are selected and renamed as the Easy group, while 27.2% and 18.5% of the tracklets are grouped into the Medium-DS group and the Hard-DS group, respectively.

    Laser Scanner Technology

    Laser scanning technology plays an important role in science and engineering. The aim of scanning is usually to create a digital version of the object surface. Multiple scans are sometimes performed with multiple cameras to capture all sides of the scene under study. Usually, optical tests are used to elucidate the power of laser scanning technology in modern industry and in research laboratories. This book describes recent contributions of laser scanning technology in different areas around the world. The main topics of laser scanning described in this volume include full body scanning, traffic management, 3D survey process, bridge monitoring, tracking of scanning, human sensing, three-dimensional modelling, glacier monitoring and digitizing heritage monuments.

    Acquisition, Processing, and Analysis of Video, Audio and Meteorological Data in Multi-Sensor Electronic Beehive Monitoring

    In recent years, a widespread decline has been seen in the honey bee population, and this is widely attributed to colony collapse disorder. Hence, it is of utmost importance that a system is designed to gather relevant information. This will allow for a deeper understanding of the possible reasons behind the above phenomenon to aid in the design of suitable countermeasures. Electronic Beehive Monitoring is one such way of gathering critical information regarding a colony’s health and behavior without invasive beehive inspections. In this dissertation, we have presented an electronic beehive monitoring system called BeePi that can be placed on top of a super and requires no structural modifications to a standard beehive (Langstroth or Dadant beehive), thereby preserving the sacredness of the bee space without disturbing the natural beehive cycles. The system is capable of capturing videos of forager traffic through a camera placed over the landing pad. Audio of bee buzzing is also recorded through microphones attached outside just above the landing pad. The above sensors are connected to a low-cost Raspberry Pi computer, and the data is saved on the Raspberry Pi itself or an external hard drive. In this dissertation, we have developed an algorithm that analyzes those video recordings and returns the number of bees that have moved in each video. The algorithm is also able to distinguish between incoming, outgoing, and lateral bee movements. We believe this would help commercial and amateur beekeepers or even citizen scientists to observe the bee traffic near their respective hives to identify the state of the corresponding bee colonies. This information helps those mentioned above because it is believed that honeybee traffic carries information on colony behavior and phenology. Next, we analyzed the audio recordings and presented a system that can classify those recordings into bee buzzing, cricket chirping, and ambient noise.
We later saw how a long-term analysis of the intensity of bee buzzing could help us understand the hive’s development through an entire beekeeping season. We also investigated the effect of local weather conditions on the forager traffic, using 21 different meteorological variables. We collected the meteorological data from a weather station located on the campus of Utah State University. Through our study, we were able to show that, without the use of additional costly intrusive hardware to count the bees, we can use our bee motion counting algorithm to calculate the bee motions and then use the counts to investigate the relationship between foraging activity and local weather. To ensure that our findings and algorithms can be reproduced, we have made our datasets and source code public for interested research and citizen science communities.

    Fruit Detection and Tree Segmentation for Yield Mapping in Orchards

    Accurate information gathering and processing is critical for precision horticulture, as growers aim to optimise their farm management practices. An accurate inventory of the crop that details its spatial distribution along with health and maturity, can help farmers efficiently target processes such as chemical and fertiliser spraying, crop thinning, harvest management, labour planning and marketing. Growers have traditionally obtained this information by using manual sampling techniques, which tend to be labour intensive, spatially sparse, expensive, inaccurate and prone to subjective biases. Recent advances in sensing and automation for field robotics allow for key measurements to be made for individual plants throughout an orchard in a timely and accurate manner. Farmer operated machines or unmanned robotic platforms can be equipped with a range of sensors to capture a detailed representation over large areas. Robust and accurate data processing techniques are therefore required to extract high level information needed by the grower to support precision farming. This thesis focuses on yield mapping in orchards using image and light detection and ranging (LiDAR) data captured using an unmanned ground vehicle (UGV). The contribution is the framework and algorithmic components for orchard mapping and yield estimation that is applicable to different fruit types and orchard configurations. The framework includes detection of fruits in individual images and tracking them over subsequent frames. The fruit counts are then associated to individual trees, which are segmented from image and LiDAR data, resulting in a structured spatial representation of yield. The first contribution of this thesis is the development of a generic and robust fruit detection algorithm. Images captured in the outdoor environment are susceptible to highly variable external factors that lead to significant appearance variations. 
Specifically in orchards, variability is caused by changes in illumination, target pose, tree types, etc. The proposed techniques address these issues by using state-of-the-art feature learning approaches for image classification, while investigating the utility of orchard domain knowledge for fruit detection. Detection is performed using both pixel-wise classification of images followed by instance segmentation, and bounding-box regression approaches. The experimental results illustrate the versatility of complex deep learning approaches over a multitude of fruit types. The second contribution of this thesis is a tree segmentation approach to detect the individual trees that serve as a standard unit for structured orchard information systems. The work focuses on trellised trees, which present unique challenges for segmentation algorithms due to their intertwined nature. LiDAR data are used to segment the trellis face, and to generate proposals for individual tree trunks. Additional trunk proposals are provided using pixel-wise classification of the image data. The multi-modal observations are fine-tuned by modelling trunk locations using a hidden semi-Markov model (HSMM), within which prior knowledge of tree spacing is incorporated. The final component of this thesis addresses the visual occlusion of fruit within geometrically complex canopies by using a multi-view detection and tracking approach. Single image fruit detections are tracked over a sequence of images, and associated to individual trees or farm rows, with the spatial distribution of the fruit counts forming a yield map over the farm. The results show the advantage of using multi-view imagery (instead of single view analysis) for fruit counting and yield mapping. This thesis includes extensive experimentation in almond, apple and mango orchards, with data captured by a UGV spanning a total of 5 hectares of farm area, over 30 km of vehicle traversal and more than 7,000 trees.
The validation of the different processes is performed using manual annotations, which include fruit and tree locations in image and LiDAR data respectively. Additional evaluation of yield mapping is performed by comparison against fruit counts on trees at the farm and counts made by the growers post-harvest. The framework developed in this thesis is demonstrated to be accurate compared to ground truth at all scales of the pipeline, including fruit detection and tree mapping, leading to accurate yield estimation, per tree and per row, for the different crops. Through the multitude of field experiments conducted over multiple seasons and years, the thesis presents key practical insights necessary for commercial development of an information gathering system in orchards.

    Scene text localization and recognition in images and videos

    Scene Text Localization and Recognition methods find all areas in an image or a video that would be considered as text by a human, mark boundaries of the areas and output a sequence of characters associated with their content. They are used to process images and videos taken by a digital camera or a mobile phone and to "read" the content of each text area into a digital format, typically a list of Unicode character sequences, that can be processed in further applications. Three different methods for Scene Text Localization and Recognition were proposed in the course of the research, each one advancing the state of the art and improving the accuracy. The first method detects individual characters as Extremal Regions (ER), where the probability of each ER being a character is estimated using novel features with O(1) complexity and only ERs with locally maximal probability are selected across several image projections for the second stage, where the classification is improved using more computationally expensive features. The method was the first published method to address the complete problem of scene text localization and recognition as a whole; all previous work in the literature focused solely on different subproblems. Secondly, a novel easy-to-implement stroke detector was proposed. The detector is significantly faster and produces significantly fewer false detections than the commonly used ER detector. The detector efficiently produces character stroke segmentations, which are exploited in a subsequent classification phase based on features effectively calculated as part of the segmentation process. Additionally, an efficient text clustering algorithm based on text direction voting is proposed, which, like the previous stages, is scale- and rotation-invariant and supports a wide variety of scripts and fonts. The third method exploits a deep-learning model, which is trained for both text detection and recognition in a single trainable pipeline.
The method localizes and recognizes text in an image in a single feed-forward pass. It is trained purely on synthetic data, so it does not require expensive human annotations for training, and it achieves state-of-the-art accuracy in end-to-end text recognition on two standard datasets, whilst being an order of magnitude faster than the previous methods: the whole pipeline runs at 10 frames per second. (Department of Cybernetics)

    Multimedia Forensics

    This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data traveling on the net and the preferred means of communication for most users, it has also become an integral part of the most innovative applications in the digital information ecosystem that serves various sectors of society, from entertainment to journalism to politics. Undoubtedly, the advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape powered by innovative imaging technologies and sophisticated tools, based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensics capabilities that relate to media attribution, integrity and authenticity verification, and counter forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students a holistic view of the field.

    Expert System with an Embedded Imaging Module for Diagnosing Lung Diseases

    Lung diseases are one of the major causes of suffering and death in the world. Survival rates could be improved if the diseases were detected at an early stage. Specialist doctors with the expertise and experience to interpret medical images and diagnose complex lung diseases are scarce. In this work, a rule-based expert system with an embedded imaging module is developed to assist general physicians in hospitals and clinics in diagnosing lung diseases whenever the services of specialist doctors are not available. The rule-based expert system contains a large knowledge base of data from various categories such as the patient's personal and medical history, clinical symptoms, clinical test results and radiological information. An imaging module is integrated into the expert system for the enhancement of chest X-ray images. The goal of this module is to enhance the chest X-ray images so that they can provide details similar to more expensive methods such as MRI and CT scans. A new algorithm, a modified morphological grayscale top-hat transform, is introduced to increase the visibility of lung nodules in chest X-rays. A fuzzy inference technique is used to predict the probability of malignancy of the nodules. The output generated by the expert system was compared with the diagnosis made by the specialist doctors. The system is able to produce results which are similar to the diagnosis made by the doctors and are acceptable by clinical standards.
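
The abstract does not describe the authors' modification, so the sketch below shows only the standard (unmodified) white top-hat transform that it builds on: the signal minus its morphological opening. For brevity it works on a 1D intensity profile with a flat structuring element. The function names and the sample profile are assumptions made for illustration.

```python
def erode(signal, width):
    """Flat grayscale erosion: minimum over a centred window of odd width."""
    half = width // 2
    n = len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def dilate(signal, width):
    """Flat grayscale dilation: maximum over a centred window of odd width."""
    half = width // 2
    n = len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def white_top_hat(signal, width):
    """Signal minus its opening (erosion followed by dilation).

    Bright features narrower than the window survive; broader bright
    structures and the slowly varying background are suppressed.
    """
    opened = dilate(erode(signal, width), width)
    return [s - o for s, o in zip(signal, opened)]


# Hypothetical intensity profile: a narrow bright blob on a flat background.
profile = [10, 10, 11, 10, 30, 31, 10, 11, 10, 10]
print(white_top_hat(profile, 5))  # [0, 0, 1, 0, 20, 21, 0, 1, 0, 0]
```

The window width sets the largest feature that is kept, so in a nodule-enhancement setting it would be chosen slightly larger than the expected nodule size. The same operation extends to 2D images by taking the minimum and maximum over a 2D neighbourhood.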