159 research outputs found

    Robust and flexible multi-scale medial axis computation

    The principle of the multi-scale medial axis (MMA) is important in that any object is detected at a blurring scale proportional to its size, providing a sound balance between noise removal and the preservation of detail. The robustness of the MMA is reflected in many existing applications in object segmentation, recognition, description and registration. This thesis aims to improve the computational aspects of the MMA. The MMA is obtained by computing ridges in a “medialness” scale-space derived from an image. For computing the medialness scale-space, we propose an edge-free medialness algorithm, the Concordance-based Medial Axis Transform (CMAT). It depends not only on the symmetry of boundary positions but also on the symmetry of the intensity contrasts at boundaries, and therefore excludes spurious MMA branches arising from isolated boundaries. In addition, the CMAT preserves the localisation accuracy for the position and width of an object, as well as robustness under noisy conditions. For computing ridges in the medialness space, we propose the sliding window algorithm for extracting locally optimal scale ridges. It is simple and efficient in that it readily separates the scale dimension from the search space while avoiding the difficult task of constructing surfaces of connected maxima. It can extract a complete set of MMA branches for interfering objects in scale-space, e.g. embedded or adjacent objects. These algorithms are evaluated through a quantitative study of their performance on 1-D signals and qualitative testing on 2-D images.
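The abstract does not give the sliding window algorithm's internals; the following is a minimal sketch of the idea for a 1-D signal, assuming a precomputed medialness map indexed as M[scale, position]. The function name, window size and threshold are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def locally_optimal_scale_ridges(medialness, half_width=2, threshold=0.1):
    """Extract ridge points from a 1-D medialness scale-space M[s, x].

    For each position x, a sliding window over neighbouring positions
    selects the scale s* at which medialness is locally maximal; x is
    kept as a ridge point if it is also a spatial maximum at that
    scale.  Illustrative re-creation, not the thesis's actual code.
    """
    n_scales, n_pos = medialness.shape
    ridge_points = []
    for x in range(n_pos):
        # Sliding window of neighbouring positions around x.
        lo, hi = max(0, x - half_width), min(n_pos, x + half_width + 1)
        # Optimal scale at x: argmax over the scale axis, pooled over
        # the window so the estimate is spatially stable.
        s_star = int(np.argmax(medialness[:, lo:hi].max(axis=1)))
        m = medialness[s_star, x]
        # Keep x only if it attains the window maximum at its optimal
        # scale, i.e. it is a spatial maximum there as well.
        if m >= threshold and m == medialness[s_star, lo:hi].max():
            ridge_points.append((x, s_star, m))
    return ridge_points

# Toy usage: a bar of width ~8 centred at x = 50 in a 1-D signal would
# yield a medialness peak, and hence a ridge point, near (x=50, s~4).
```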

    Analysis Of Behaviors In Crowd Videos

    In this dissertation, we address the problem of discovery and representation of group activity of humans and objects in a variety of scenarios commonly encountered in vision applications. The overarching goal is to devise a discriminative representation of human motion in social settings, which captures the wide variety of human activities observable in video sequences. Such motion emerges from the collective behavior of individuals and their interactions, and is a significant source of information typically employed for applications such as event detection, behavior recognition, and activity recognition. We present new representations of human group motion for static cameras, and propose algorithms for their application to a variety of problems.

We first propose a method to model and learn the scene activity of a crowd using the Social Force Model, for the first time in the computer vision community. We present a method to densely estimate the interaction forces between people in a crowd observed by a static camera. Latent Dirichlet Allocation (LDA) is used to learn the model of normal activities over extended periods of time: randomly selected spatio-temporal volumes of interaction forces are used to learn the model of normal behavior of the scene, and the model encodes the latent topics of social interaction forces for normal behaviors. We classify a short video sequence of n frames as normal or abnormal using the learnt model. Once a sequence of frames is classified as abnormal, the regions of anomalies in the abnormal frames are localized using the magnitude of the interaction forces.

The representation and estimation framework proposed above, however, has a few limitations. The algorithm uses a global estimate of the interaction forces within the crowd and is therefore incapable of identifying different groups of objects based on motion or behavior in the scene. Although the algorithm is capable of learning normal behavior and detecting abnormality, it cannot capture the dynamics of different behaviors. To overcome these limitations, we then propose a method based on the Lagrangian framework for fluid dynamics, introducing a streakline representation of flow. Streaklines are traced in a fluid flow by injecting a colored material, such as smoke or dye, which is transported with the flow and used for visualization. In the context of computer vision, streaklines may be used in a similar way to transport information about a scene: they are obtained by repeatedly initializing a fixed grid of particles at each frame, then moving both current and past particles using optical flow. Streaklines are the locus of points that connect particles which originated from the same initial position. This approach is advantageous over previous representations in two respects: first, its rich representation captures the dynamics of the crowd and changes in space and time in the scene where optical flow alone is not enough; and second, the model is capable of discovering groups of similar behavior within a crowd scene by performing motion segmentation. We propose a method to distinguish different group behaviors, such as divergent/convergent motion and lanes, using this framework. Finally, we introduce flow potentials as a discriminative feature to recognize crowd behaviors in a scene. Results of extensive experiments are presented for multiple real-life crowd sequences involving pedestrian and vehicular traffic.
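As a hedged sketch of the streakline construction just described (a fixed grid of particles re-seeded every frame and advected by optical flow), consider the following; the data layout and function name are illustrative assumptions, and the dense flow field could come from, e.g., cv2.calcOpticalFlowFarneback.

```python
import numpy as np

def update_streaklines(streaklines, flow, grid_points):
    """Advance streaklines by one frame.

    streaklines: dict mapping a seed grid point (x0, y0) to the list
        of current positions of all particles released from it so far.
    flow: dense optical flow for this frame, shape (H, W, 2).
    grid_points: fixed seed grid, list of (x, y) tuples.
    """
    h, w = flow.shape[:2]
    for seed in grid_points:
        particles = streaklines.setdefault(seed, [])
        # Move every particle already in flight with the current flow...
        for i, (x, y) in enumerate(particles):
            xi = int(round(min(max(x, 0), w - 1)))
            yi = int(round(min(max(y, 0), h - 1)))
            dx, dy = flow[yi, xi]
            particles[i] = (x + dx, y + dy)
        # ...then release a fresh particle from the fixed seed position.
        particles.append(seed)
    return streaklines

# The streakline for a seed is the polyline through its particle
# positions ordered by release time, i.e. the locus of points that
# connect particles originating from the same initial position.
```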
The proposed method exploits optical flow as the low-level feature and performs integration and clustering to obtain coherent group motion patterns. However, we observe that in crowd video sequences, as well as in a variety of other vision applications, the co-occurrence and inter-relation of motion patterns are the main characteristics of group behaviors. In other words, the group behavior of objects is a mixture of individual actions or behaviors in a specific geometrical layout and temporal order. We therefore propose a new representation for group behaviors of humans using the interrelation of motion patterns in a scene. The representation is based on a bag of visual phrases of spatio-temporal visual words. We present a method to match the high-order spatial layout of visual words that preserves the geometry of the visual words under similarity transformations. To perform the experiments, we collected a dataset of group choreography performances from the YouTube website. The dataset currently contains four categories of group dances.
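The abstract leaves the visual-phrase matching details to the dissertation; one standard way to test whether two spatial layouts of visual words agree under a similarity transformation is an orthogonal Procrustes fit, sketched below under the assumption that word-to-word correspondences are already known. This is an illustrative stand-in, not the dissertation's matcher.

```python
import numpy as np

def similarity_residual(P, Q):
    """Residual of the best similarity transform (translation,
    rotation, uniform scale) mapping point set P onto Q.

    P, Q: (n, 2) arrays of visual-word positions in two frames or
    clips, rows in correspondence.  A small residual means the two
    layouts agree up to a similarity transform.
    """
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)   # remove translation
    U, S, Vt = np.linalg.svd(Pc.T @ Qc)               # cross-covariance SVD
    d = np.sign(np.linalg.det(U @ Vt))                # guard against reflection
    D = np.diag([1.0, d])
    R = U @ D @ Vt                                    # optimal rotation
    scale = (S * np.diag(D)).sum() / (Pc ** 2).sum()  # optimal uniform scale
    return np.linalg.norm(scale * Pc @ R - Qc) / np.sqrt(len(P))
```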

    An investigation into common challenges of 3D scene understanding in visual surveillance

    Nowadays, video surveillance systems are ubiquitous. Most installations simply consist of CCTV cameras connected to a central control room and rely on human operators to interpret what they see on the screen in order to, for example, detect a crime (either during or after an event). Some modern computer vision systems aim to automate the process, at least to some degree, and various algorithms have been somewhat successful in certain limited areas. However, such systems remain inefficient in general circumstances and present real challenges yet to be solved. These challenges include the ability to recognise and ultimately predict and prevent abnormal behaviour, or even reliably recognise objects, for example in order to detect left luggage or suspicious items. This thesis first aims to study the state of the art and identify the major challenges and possible requirements of future automated and semi-automated CCTV technology in the field. It then presents the application of a suite of 2D and highly novel 3D methodologies that go some way towards overcoming current limitations. The methods presented here are based on the analysis of object features extracted directly from the geometry of the scene, and start with a consideration of mainly existing techniques, such as the use of lines, vanishing points (VPs) and planes, applied to real scenes. An investigation is then presented into the use of richer 2.5D/3D surface normal data. In all cases the aim is to combine 2D and 3D data to obtain a better understanding of the scene, ultimately in order to capture what is happening within it and so move towards automated scene analysis. Although this thesis focuses on the widespread application of video surveillance, the railway station environment is used as an example case to represent typical real-world challenges; the principles can be readily extended elsewhere, such as to airports, motorways, households, shopping malls, etc. The context of this research work, together with an overall presentation of existing methods used in video surveillance and their challenges, is described in chapter 1.

Common computer vision techniques, such as VP detection, camera calibration, 3D reconstruction and segmentation, can be applied in an effort to extract meaning in video surveillance applications. According to the literature, these methods have been well researched, and their use is assessed in the context of current surveillance requirements in chapter 2. While existing techniques can perform well in some contexts, such as an architectural environment composed of simple geometrical elements, their robustness and performance in feature extraction and object recognition tasks are not sufficient to solve the key challenges encountered in a general video surveillance context. This is largely due to issues such as variable lighting, weather conditions and shadows, and the general complexity of the real-world environment. Chapter 3 presents the research contribution on these topics, namely methods to extract optimal features for a specific CCTV application, together with their strengths and weaknesses, showing that the proposed algorithm obtains better results than most due to its specific design.

The comparison of current surveillance systems and methods from the literature shows, however, that 2D data are used almost universally across applications.
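As a hedged illustration of one of the 2D techniques assessed here, the following sketches classic RANSAC-based vanishing point detection from line segments; the parameter values and the angular consensus measure are common choices, not necessarily those used in the thesis.

```python
import numpy as np

def ransac_vanishing_point(segments, iters=500, tol_deg=2.0, seed=0):
    """Estimate a vanishing point from 2-D line segments by RANSAC.

    segments: (n, 4) float array of (x1, y1, x2, y2).  Two random
    segments propose a VP (intersection of their homogeneous lines);
    a segment is an inlier if the direction from its midpoint to the
    VP differs from its own direction by less than tol_deg.
    """
    rng = np.random.default_rng(seed)
    p1 = np.c_[segments[:, :2], np.ones(len(segments))]
    p2 = np.c_[segments[:, 2:], np.ones(len(segments))]
    hlines = np.cross(p1, p2)                 # homogeneous line per segment
    dirs = segments[:, 2:] - segments[:, :2]
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    mids = (segments[:, :2] + segments[:, 2:]) / 2.0
    best_vp, best_inliers = None, 0
    for _ in range(iters):
        i, j = rng.choice(len(segments), 2, replace=False)
        vp = np.cross(hlines[i], hlines[j])   # intersection of the two lines
        if abs(vp[2]) < 1e-9:                 # parallel pair: VP at infinity
            continue
        vp = vp[:2] / vp[2]
        to_vp = vp - mids
        to_vp /= np.linalg.norm(to_vp, axis=1, keepdims=True) + 1e-12
        cosang = np.abs((to_vp * dirs).sum(axis=1))
        inliers = int((cosang > np.cos(np.radians(tol_deg))).sum())
        if inliers > best_inliers:
            best_vp, best_inliers = vp, inliers
    return best_vp, best_inliers
```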
Industrial systems as well as the research community have indeed been intensively improving 2D feature extraction methods for as long as image analysis and scene understanding have been of interest. This constant progress makes 2D feature extraction almost effortless nowadays, given the large variety of available techniques. Moreover, even if 2D data do not solve every challenge in video surveillance or other applications, they still serve as the starting stage towards scene understanding and image analysis. Chapter 4 therefore explores 2D feature extraction via vanishing point detection and segmentation methods. A combination of the most common techniques and a novel approach is proposed to extract vanishing points from video surveillance environments, and segmentation techniques are explored with the aim of determining how they can complement vanishing point detection and lead towards 3D data extraction and analysis.

In spite of the above contribution, 2D data are insufficient for all but the simplest applications aimed at understanding a scene, where the goal is robust detection of, say, left luggage or abnormal behaviour without significant a priori information about the scene geometry. More information is therefore required to design a more automated and intelligent algorithm that obtains richer information from the scene geometry, and so a better understanding of what is happening within it. This can be addressed by the use of 3D data (in addition to 2D data), allowing objects to be classified and, from this, a map of functionality to be inferred, describing feasible and infeasible object functions in a given environment. Chapter 5 presents how 3D data can be beneficial for this task, the various solutions investigated to recover 3D data, and some preliminary work towards plane extraction.

It is apparent that VPs and planes give useful information about a scene’s perspective and can assist in 3D data recovery within a scene. However, neither VP nor plane detection techniques alone allow the recovery of more complex generic object shapes (for example those composed of spheres, cylinders, etc.), and any simple model will suffer in the presence of non-Manhattan features, e.g. those introduced by an escalator. For this reason, a novel photometric stereo-based surface normal retrieval methodology is introduced to capture the 3D geometry of the whole scene or part of it. Chapter 6 describes how photometric stereo allows the recovery of 3D information in order to obtain a better understanding of a scene, while also partially overcoming some current surveillance challenges, such as the difficulty of resolving fine detail, particularly at large stand-off distances, and of isolating and recognising more complex objects in real scenes, where items of interest may be obscured by complex, rapidly changing environmental factors that make, for example, the detection of suspicious objects and behaviour highly problematic. Innovative use is made of an untapped latent capability offered within modern surveillance environments, introducing a form of environmental structuring to good advantage in order to achieve a richer form of data acquisition.
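The Lambertian least-squares core of photometric stereo is textbook material and can be sketched compactly; the thesis's methodology builds on this with surveillance-specific environmental structuring, which is not reproduced here. Array shapes, names and the ideal-lighting assumptions (no shadows or specularities) are illustrative.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Recover per-pixel surface normals and albedo from k >= 3 images
    of a static scene lit from known directions (Lambertian model).

    images: (k, H, W) array of grayscale intensities.
    light_dirs: (k, 3) array of unit lighting directions.
    Solves I = L @ (albedo * n) per pixel by least squares.
    """
    k, h, w = images.shape
    I = images.reshape(k, -1)                            # (k, H*W)
    # G holds albedo-scaled normals, one column per pixel.
    G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)   # (3, H*W)
    albedo = np.linalg.norm(G, axis=0)
    normals = np.where(albedo > 1e-8, G / np.maximum(albedo, 1e-8), 0.0)
    return normals.reshape(3, h, w), albedo.reshape(h, w)
```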
Chapter 6 also explores the novel application of photometric stereo across such diverse settings, shows how the algorithm can be incorporated into an existing surveillance system, and considers a typical real commercial application.

One of the most important aspects of this research work is its applicability. While most of the research literature is based on relatively simple structured environments, the approach here has been designed for real surveillance environments, such as railway stations, airports and waiting rooms, and for surveillance cameras that may be fixed or may in future form part of a free-roaming mobile robotic surveillance device that must continually reinterpret its changing environment. So, as mentioned previously, while the main focus has been railway station environments, the work has been approached in a way that allows adaptation to many other applications, such as autonomous robotics, and to motorway, shopping centre, street and home environments. All of these applications require a better understanding of the scene for security or safety purposes. Finally, chapter 7 presents a global conclusion and an outlook on future work.

    A machine vision approach to rock fragmentation analysis

    This thesis is concerned with the development of an instrument for performing online measurement of rock size distribution using machine vision. The instrument has application in the gold mining industry, where it could be used to measure the fragmentation of gold ore on the conveyor belt feed to an autogenous mill, for the purpose of controlling the mill. The gold ore can range in size from fine material (< 20 mm) to very large rocks (0.5 m). A machine vision approach is only capable of directly measuring the projected area of particles at the surface of the rock stream; a volume distribution has to be estimated from this using a stereological method. Such methods have been investigated previously and are typically error-prone; they are not investigated here. An investigation of lighting demonstrates that a diffuse lighting arrangement is suitable for this application, with two advantages: specular reflection from wet material is suppressed, and intensity values can be used to predict the orientation of particle surfaces. A computational structure has been developed to identify and delineate rocks in an image for the purpose of measuring their areas. It is based on the human visual system in that it consists of a low-level preattentive vision stage and a higher-level stage of attention focusing. Multiscale image processing techniques have also been integrated in order to improve the detection of rocks across a wide range of sizes; a performance advantage is obtained because all the algorithms can be matched to the size of the objects being detected. Results have been obtained with an average true detection rate of 69% and a further close-miss rate of 14%, with very few false alarms. The overall result is that the measured projected area distribution closely matches the true value for each test image.
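The abstract does not spell out the detection algorithms; as a hedged analogue of multiscale detection matched to object size, the following uses scale-normalised Laplacian-of-Gaussian blob detection from scikit-image to estimate projected rock areas over a wide size range. This is an illustration of the multiscale principle, not the thesis's actual pipeline; the size bounds and threshold are assumed values.

```python
import numpy as np
from skimage.feature import blob_log

def rock_area_distribution(gray, min_r=4, max_r=120, threshold=0.05):
    """Detect roughly blob-like rocks across a wide size range and
    return their projected areas in pixels^2.

    gray: float image in [0, 1], assumed to show bright rocks on a
    darker background.  blob_log searches scale-normalised
    Laplacian-of-Gaussian maxima, so each detector scale is matched
    to a rock size, echoing the multiscale idea described above.
    """
    blobs = blob_log(gray,
                     min_sigma=min_r / np.sqrt(2),
                     max_sigma=max_r / np.sqrt(2),
                     num_sigma=12,
                     threshold=threshold)
    radii = blobs[:, 2] * np.sqrt(2)     # LoG sigma -> blob radius (2-D)
    return np.pi * radii ** 2            # projected areas
```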

    Artificial intelligence and image processing applications for high-throughput phenotyping

    Doctor of Philosophy, Department of Computer Science, Mitchell L. Neilsen.
    The areas of Computer Vision and Scientific Computing have witnessed rapid growth in the last decade, with the fields of industrial robotics, automotive and healthcare acting as the primary vehicles for research and advancement. However, related research in other fields, such as agriculture, remains understudied. This dissertation explores the application of Computer Vision and Scientific Computing in an agricultural domain known as High-throughput Phenotyping (HTP). HTP is the assessment of complex seed traits such as growth, development, tolerance, resistance, ecology and yield, and the measurement of parameters that form more complex traits.

The dissertation makes the following contributions. The first contribution is the development of algorithms to estimate morphometric traits such as length, width, area, and seed kernel count using 3-D graphics and static image processing, and the extension of existing algorithms for the same. The second contribution is the development of lightweight frameworks to aid in synthetic image dataset creation and image cropping for deep neural networks in HTP. Deep neural networks require a plethora of training data to yield results of the highest quality, but no such training datasets are readily available for HTP research, especially on seed kernels. The proposed synthetic image generation framework helps generate a profusion of training data at will from a meager sample of seed kernels. Besides requiring large quantities of data, deep neural networks require inputs of a certain size, and not all available data meet that size requirement. The proposed image cropper resizes images without introducing distortion, thereby making image data fit for consumption. The third contribution is the design and analysis of supervised and self-supervised neural network architectures trained on synthetic images to perform the tasks of seed kernel classification, counting and morphometry. In the area of supervised image classification, the state-of-the-art neural network models VGG-16, VGG-19 and ResNet-101 are investigated. A Simple framework for Contrastive Learning of visual Representations (SimCLR) [137], Momentum Contrast (MoCo) [55] and Bootstrap Your Own Latent (BYOL) [123] are leveraged for self-supervised image classification. The instance segmentation deep neural network models Mask R-CNN and YOLO are utilized to perform the tasks of seed kernel classification, segmentation and counting. The results demonstrate the feasibility of deep neural networks for their respective tasks of classification and instance segmentation. In addition to estimating seed kernel count from static images, algorithms that aid in seed kernel counting from videos are proposed and analyzed. One proposed algorithm creates a slit image that can be analyzed to estimate seed count, as sketched below; once the slit image has been created, the video is no longer required, significantly lowering the computational resources needed for the estimation. The fourth contribution is the development of an end-to-end, automated image capture system for single seed kernel analysis. In addition to estimating length and width from 2-D images, the proposed system estimates the volume of a seed kernel from 2-D images using the technique of volume sculpting.
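As a hedged sketch of the slit-image idea (the abstract describes the principle but not the implementation), the following stacks one fixed pixel row per frame and counts connected blobs; the row choice, the threshold and the assumption of dark seeds on a bright background are illustrative.

```python
import numpy as np
from scipy import ndimage

def build_slit_image(frames, slit_row):
    """Stack a single pixel row from every frame into one image.

    frames: iterable of grayscale frames (H, W).  Seeds fall past the
    fixed row `slit_row`, so each seed leaves a blob in the stacked
    (n_frames, W) slit image; once built, the video itself is no
    longer needed for counting.
    """
    return np.stack([f[slit_row, :] for f in frames])

def count_seeds(slit_image, thresh=0.5):
    """Count seeds as connected dark regions in the slit image."""
    mask = slit_image < thresh          # assumes dark seeds, bright belt
    _, n_blobs = ndimage.label(mask)
    return n_blobs
```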
For volume estimation, the relative standard deviation of the results produced by the proposed technique is lower (better) than that of volumetric estimation using the ellipsoid slicing technique. The fifth contribution is the development of image processing algorithms that provide feature enhancements to mobile applications in order to improve on-site phenotyping capabilities. Algorithms are developed for two high-value features, namely leaf angle estimation and fractional plant cover estimation. The leaf angle estimation feature estimates the angle between stem and leaf in images captured with mobile phone cameras (as sketched below), whereas fractional plant cover estimation helps determine companion plants, i.e., plants that can co-exist and mutually benefit. The proposed techniques, frameworks and findings lay a solid foundation for future Computer Vision and Scientific Computing research in the domain of agriculture. The contributions are significant in that the dissertation not only proposes techniques, but also develops low-cost, end-to-end frameworks to leverage the proposed techniques in a scalable fashion.
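For the leaf angle feature referenced above, the geometric core reduces to the angle between two direction vectors; a minimal sketch follows, assuming the stem and leaf segments have already been obtained (e.g. marked by the user or fitted to a segmentation mask), which the abstract does not specify.

```python
import numpy as np

def leaf_angle_deg(stem, leaf):
    """Angle in degrees between a stem segment and a leaf segment.

    stem, leaf: ((x1, y1), (x2, y2)) image-space endpoints.  How the
    app extracts these segments is not described in the abstract;
    this only shows the angle computation itself.
    """
    v1 = np.subtract(stem[1], stem[0]).astype(float)
    v2 = np.subtract(leaf[1], leaf[0]).astype(float)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

# Example: a vertical stem and a leaf rising at 45 degrees:
# leaf_angle_deg(((0, 0), (0, 100)), ((0, 0), (70, 70)))  ->  45.0
```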

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging at the interface between technologies for effective visual features and the human-brain cognition process. Effective visual features are made possible through rapid developments in appropriate sensor equipment, novel filter designs and viable information processing architectures, while understanding of the human-brain cognition process broadens the ways in which computers can perform pattern recognition tasks. The present book collects representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition.