25 research outputs found

    Long Range Automated Persistent Surveillance

    This dissertation addresses long range automated persistent surveillance with a focus on three topics: sensor planning, size preserving tracking, and high magnification imaging. In sensor planning, sufficient overlap between cameras' fields of view should be reserved so that camera handoff can be executed successfully before the object of interest becomes unidentifiable or untraceable. We design a sensor planning algorithm that not only maximizes coverage but also ensures uniform and sufficient overlap between cameras' fields of view for an optimal handoff success rate. This algorithm works for environments with multiple dynamic targets using different types of cameras. Significantly improved handoff success rates are illustrated via experiments using floor plans of various scales. Size preserving tracking automatically adjusts the camera's zoom for a consistent view of the object of interest. Target scale estimation is carried out based on the paraperspective projection model, which compensates for the center offset and considers system latency and tracking errors. A computationally efficient foreground segmentation strategy, 3D affine shapes, is proposed. The 3D affine shapes feature direct and real-time implementation and improved flexibility in accommodating the target's 3D motion, including off-plane rotations. The effectiveness of the scale estimation and foreground segmentation algorithms is validated via both offline and real-time tracking of pedestrians at various resolution levels. Face image quality assessment and enhancement compensate for the degradation in face recognition rates caused by high system magnifications and long observation distances. A class of adaptive sharpness measures is proposed to evaluate and predict this degradation. A wavelet based enhancement algorithm with automated frame selection is developed and is shown to be effective by a considerably higher face recognition rate for severely blurred long range face images.

    Bio-inspired foveal and peripheral visual sensing for saliency-based decision making in robotics

    Computer vision is an area of research that has grown at immense speed in the last few decades, tackling problems in scene understanding from very diverse fronts, such as image classification, object detection, localization, mapping and tracking. It has also long been understood that there are very valuable lessons to learn from biology and to apply to this research field, where the human visual system is very likely the most studied brain mechanism. The eye foveation system is a very good example of such lessons, since both machines and animals often face a similar dilemma: how to prioritize visual areas of interest so as to process information faster, given limited computing power and a field of view that is too wide to be attended to all at once. While extensive models of artificial foveation have been presented, the re-emerging area of machine learning with deep neural networks has opened the question of how these two approaches can contribute to each other. Novel deep learning models often rely on the availability of substantial computing power, but areas of application face strict constraints. A good example is unmanned aerial vehicles, which, in order to be autonomous, must lift and power all their computing equipment. This work studies how applying a foveation principle to down-scale images can reduce the number of operations required for object detection, and compares its effect to that of normally down-sampled images, given that Convolutional Neural Network (CNN) layers account for most of the operations. Foveation requires prior knowledge of regions of interest on which to center the fovea; this is addressed by merging bottom-up saliency with top-down feedback about objects that the CNN has been trained to detect. Although saliency models have also been studied extensively in the last couple of decades, most often by comparing their performance to human observer datasets, the question remains open of how they fit into wider information processing paradigms and into functional representations of the human brain. An information flow scheme that encompasses these principles is proposed here. Finally, to give the model the capacity to operate coherently in the time domain, it adapts a representation of a well-established theory of the decision-making process that takes place in the basal ganglia region of the brain. The behaviour of this representation is then tested against human observers' data in an omnidirectional field of view, where the importance of selecting the most contextually relevant region of interest at each time-step is highlighted.

    A computational model of visual attention.

    Visual attention is a process by which the Human Visual System (HVS) selects the most important information from a scene. Visual attention models are computational or mathematical models developed to predict this information. The performance of state-of-the-art visual attention models is limited in terms of prediction accuracy and computational complexity. Despite a significant amount of active research in this area, modelling visual attention is still an open research challenge. This thesis proposes a novel computational model of visual attention that achieves higher prediction accuracy with low computational complexity. A new bottom-up visual attention model based on in-focus regions is proposed. To develop the model, an image dataset is created by capturing images with in-focus and out-of-focus regions. The Discrete Cosine Transform (DCT) spectrum of these images is investigated qualitatively and quantitatively to discover the key frequency coefficients that correspond to the in-focus regions. The model detects these key coefficients by formulating a novel relation between the in-focus and out-of-focus regions in the frequency domain. These frequency coefficients are used to detect the salient in-focus regions. The simulation results show that this attention model achieves good prediction accuracy with low complexity. The prediction accuracy of the proposed in-focus visual attention model is further improved by incorporating the sensitivity of the HVS towards the image centre and human faces. Moreover, the computational complexity is further reduced by using the Integer Cosine Transform (ICT). The model's parameters are tuned using the hill climbing approach to optimise accuracy. The performance has been analysed qualitatively and quantitatively using two large image datasets with eye tracking fixation ground truth. The results show that the model achieves higher prediction accuracy with a lower computational complexity compared to the state-of-the-art visual attention models. The proposed model is useful for predicting human fixations in computationally constrained environments, mainly in applications such as perceptual video coding, image quality assessment, object recognition and image segmentation.
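
As a rough illustration of the frequency-domain idea in the abstract above, the sketch below scores an 8x8 block by the energy of its non-DC DCT coefficients, on the assumption that in-focus regions carry more high-frequency energy than blurred ones. This is not the thesis's model: the abstract does not say which coefficients are the "key" ones, and the names `dct1d`, `dct2` and `ac_energy` are hypothetical.

```python
import math


def dct1d(values):
    """Unnormalised DCT-II of a 1-D sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct2(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct1d(row) for row in block]
    cols = [dct1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def ac_energy(block):
    """Sum of squared DCT coefficients, excluding the DC term at (0, 0).

    Used here as a crude focus cue (an assumption of this sketch).
    """
    coeffs = dct2(block)
    return sum(
        c * c
        for u, row in enumerate(coeffs)
        for v, c in enumerate(row)
        if (u, v) != (0, 0)
    )


# A sharp checkerboard block versus the same block blurred to its mean value.
sharp = [[255.0 if (r + c) % 2 == 0 else 0.0 for c in range(8)] for r in range(8)]
blurred = [[127.5] * 8 for _ in range(8)]
print(ac_energy(sharp) > ac_energy(blurred))
```

A block-wise map of such scores could serve as a very coarse saliency estimate; the thesis additionally weights for image centre and faces, which this sketch omits.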

    Foveation for 3D visualization and stereo imaging

    Even though computer vision and digital photogrammetry share a number of goals, techniques, and methods, the potential for cooperation between these fields is not fully exploited. In an attempt to help bridge the two, this work takes a well-known computer vision and image processing technique called foveation and introduces it to photogrammetry, creating a hybrid application. The results may be beneficial for both fields, as well as for the general stereo imaging community and virtual reality applications. Foveation is a biologically motivated image compression method that is often used for transmitting videos and images over networks. It is possible to view foveation as an area of interest management method as well as a compression technique. While the most common foveation applications are in 2D, there are a number of binocular approaches as well. For this research, the current state of the art in the literature on level of detail, the human visual system, stereoscopic perception, stereoscopic displays, 2D and 3D foveation, and digital photogrammetry was reviewed. After the review, a stereo-foveation model was constructed and an implementation was realized to demonstrate a proof of concept. The conceptual approach is treated as generic, while the implementation was conducted under certain limitations, which are documented in the relevant context. A stand-alone program called Foveaglyph was created in the implementation process. Foveaglyph takes a stereo pair as input and uses an image matching algorithm to find the parallax values. It then calculates the 3D coordinates for each pixel from the geometric relationships between the object and the camera configuration or via a parallax function. Once 3D coordinates are obtained, a 3D image pyramid is created. Then, using a distance dependent level of detail function, spherical volume rings with varying resolutions throughout the 3D space are created. The user determines the area of interest. The result of the application is a user controlled, highly compressed non-uniform 3D anaglyph image. 2D foveation is also provided as an option. This type of development in a photogrammetric visualization unit is beneficial for system performance. The research is particularly relevant for large displays and head mounted displays, although the implementation, because it is done for a single user, would possibly be best suited to a head mounted display (HMD) application. The resulting stereo-foveated image can be loaded moderately faster than the uniform original. Therefore, the program can potentially be adapted to an active vision system and manage the scene as the user glances around, given that an eye tracker determines exactly where the eyes accommodate. This exploration may also be extended to robotics and other robot vision applications. Additionally, it can be used for attention management, directing the viewer to the object(s) of interest the demonstrator would like to present (e.g. in 3D cinema). Based on the literature, we also believe this approach should help resolve several problems associated with stereoscopic displays, such as the accommodation convergence problem and diplopia. While the available literature provides some empirical evidence to support the usability and benefits of stereo foveation, further tests are needed. User surveys related to the human factors in using stereo foveated images, such as its possible contribution to preventing user discomfort and virtual simulator sickness (VSS) in virtual environments, are left as future work.
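
The distance dependent level of detail step described above can be pictured with a small sketch. It assumes equally wide spherical rings around the user-chosen area of interest, with one coarser pyramid level per ring; the abstract does not give Foveaglyph's actual function, so `lod_level`, `ring_width` and `max_level` are hypothetical names and the linear spacing is an assumption.

```python
import math


def lod_level(point, fixation, ring_width, max_level):
    """Pyramid level for a 3-D point.

    Level 0 (full resolution) applies inside the innermost sphere around the
    fixation point; each further ring of width `ring_width` is one level
    coarser, capped at `max_level`.
    """
    distance = math.dist(point, fixation)  # Euclidean distance in 3-D
    return min(max_level, int(distance // ring_width))


fixation = (0.0, 0.0, 0.0)
for p in [(0.5, 0.0, 0.0), (3.0, 4.0, 0.0), (30.0, 40.0, 0.0)]:
    print(p, lod_level(p, fixation, ring_width=2.0, max_level=4))
```

Each pixel's 3D coordinates would then select which level of the 3D image pyramid supplies its value; a non-linear ring spacing could be swapped in without changing the interface.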

    Content-prioritised video coding for British Sign Language communication.

    Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications, which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to reconcile the conflicting requirements of high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. The research community benefits from a new approach to video coding optimisation and a better understanding of the communication needs of deaf people.

    Wide-Area Surveillance System using a UAV Helicopter Interceptor and Sensor Placement Planning Techniques

    This project proposes and describes the implementation of a wide-area surveillance system comprising a sensor/interceptor placement planning system and an interceptor unmanned aerial vehicle (UAV) helicopter. First, given the 2-D layout of an area, the planning system optimally places perimeter cameras based on maximum coverage and minimal cost. Part of this planning system includes the MATLAB implementation of Erdem and Sclaroff's Radial Sweep algorithm for visibility polygon generation. Additionally, 2-D camera modeling is proposed for both fixed and PTZ cases. The interceptor is also placed so as to minimize the shortest-path flight time to any point on the perimeter during a detection event. Second, a basic flight control system for the UAV helicopter is designed and implemented. The flight control system's primary goal is to hover the helicopter in place when a human operator holds an automatic-flight switch. This system represents the first step in a complete waypoint-navigation flight control system. The flight control system is based on an inertial measurement unit (IMU) and a proportional-integral-derivative (PID) controller. This system is implemented using a general-purpose personal computer (GPPC) running Windows XP and other commercial off-the-shelf (COTS) hardware. This setup differs from other helicopter control systems, which typically use custom embedded solutions or micro-controllers. Experiments demonstrate the sensor placement planning achieving >90% coverage at optimized cost for several typical areas given multiple camera types and parameters. Furthermore, the helicopter flight control system experiments achieve hovering success over short flight periods. However, the final conclusion is that the COTS IMU is insufficient for high-speed, high-frequency applications such as a helicopter control system.
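
For readers unfamiliar with PID control, the following minimal sketch shows a discrete PID loop holding a single axis (altitude) at a setpoint. It is not the project's flight controller: the gains, the time step, and the toy plant in which the controller output is applied directly as a climb rate are all assumptions made for illustration, and a real helicopter would need one loop per controlled axis fed by IMU measurements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float  # proportional gain
    ki: float  # integral gain
    kd: float  # derivative gain
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        """Return the control output for one time step of length dt."""
        error = setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call, when there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_altitude_hold(target=1.0, steps=1000, dt=0.02):
    """Toy plant (assumption): the controller output is a climb rate in m/s."""
    pid = PID(kp=1.5, ki=0.5, kd=0.05)  # illustrative gains, not tuned for any vehicle
    altitude = 0.0
    for _ in range(steps):
        climb_rate = pid.update(target, altitude, dt)
        altitude += climb_rate * dt
    return altitude


print(round(simulate_altitude_hold(), 3))
```

The loop rate matters in practice: the abstract's conclusion that the COTS IMU was too slow for helicopter control corresponds to `dt` being too large for the vehicle's dynamics.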

    Neuronal encoding of natural imagery in dragonfly motion pathways

    Vision is the primary sense of humans and most other animals. While the act of seeing seems easy, the neuronal architectures that underlie this ability are some of the most complex of the brain. Insects represent an excellent model for investigating how vision operates, as they often lead rich visual lives while possessing relatively simple brains. Among insects, aerial predators such as the dragonfly face additional survival tasks. Not only must aerial predators successfully navigate three-dimensional visual environments, they must also be able to identify and track their prey. This task is made even more difficult by the complexity of visual scenes that contain detail on all scales of magnification. Here I investigate the physiology of neurons accessible through tracts in the third neuropil of the optic lobe of the dragonfly. It is at this stage of processing that the first evidence of both wide-field motion and object detection emerges. My research extends the current understanding of two main pathways in the dragonfly visual system, the wide-field motion pathway and the target-tracking pathway. While wide-field motion pathways have been studied in numerous insects, until now the dragonfly wide-field motion pathway has remained unstudied. Investigation of this pathway has revealed properties novel among insects, specifically the purely optical adaptation to motion at both high and low velocities through motion adaptation. Here I characterise these newly described neurons and investigate their adaptation properties. The dragonfly target-tracking pathway has been studied extensively, but most research has focussed on classical stimuli such as gratings and small black objects moving on white monitors. Here I extend previous research, which characterised the behaviour of target tracking neurons in cluttered environments, developing a paradigm to allow numerous properties of targets to be changed while still measuring tracking performance. I show that dragonfly neurons interact with clutter through the previously discovered selective attention system, treating cluttered scenes as collections of target-like features. I further show that this system uses the direction and speed of the target and background as one of the key parameters for tracking success. I also elucidate some additional properties of selective attention, including the capacity to select for inhibitory targets or weakly salient features in preference to strongly excitatory ones. In collaboration with colleagues, I have also performed some limited modelling to demonstrate that a selective attention model that includes switching best explains the experimental data. Finally, I explore a mathematical model called divisive normalisation, which may partially explain how neurons with large receptive fields can be used to re-establish target position information (lost in a position invariant system) through relatively simple integrations of multiple large receptive field neurons. In summary, my thesis provides a broad investigation into several questions about how dragonflies can function in natural environments. More broadly, my thesis addresses general questions about vision and how complicated visual tasks can be solved via clever strategies employed in neuronal systems and their modelled equivalents. Thesis (Ph.D.) -- University of Adelaide, Adelaide Medical School, 201

    Development of a practical and mobile brain-computer communication device for profoundly paralyzed individuals

    Thesis (Ph.D.)--Boston University. Brain-computer interface (BCI) technology has seen tremendous growth over the past several decades, with numerous groundbreaking research studies demonstrating technical viability (Sellers et al., 2010; Silvoni et al., 2011). Despite this progress, BCIs have remained primarily in controlled laboratory settings. This dissertation proffers a blueprint for translating research-grade BCI systems into real-world applications that are noninvasive and fully portable, and that employ intelligent user interfaces for communication. The proposed architecture is designed to be used by severely motor-impaired individuals, such as those with locked-in syndrome, while reducing the effort and cognitive load needed to communicate. Such a system requires the merging of two primary research fields: 1) electroencephalography (EEG)-based BCIs and 2) intelligent user interface design. The EEG-based BCI portion of this dissertation provides a history of the field, details of our software and hardware implementation, and results from an experimental study aimed at verifying the utility of a BCI based on the steady-state visual evoked potential (SSVEP), a robust brain response to visual stimulation at controlled frequencies. The visual stimulation, feature extraction, and classification algorithms for the BCI were specially designed to achieve successful real-time performance on a laptop computer. The BCI was also developed in Python, an open-source programming language that combines programming ease with effective handling of hardware and software requirements. The result of this work was The Unlock Project app software for BCI development. Using it, a four-choice SSVEP BCI setup was implemented and tested with five severely motor-impaired and fourteen control participants. The system showed a wide range of usability across participants, with classification rates ranging from 25% to 95%. The second portion of the dissertation discusses the viability of intelligent user interface design as a method for obtaining a more user-focused vocal output communication aid tailored to motor-impaired individuals. A proposed blueprint of this communication "app" was developed in this dissertation. It would make use of readily available laptop sensors to perform facial recognition, speech-to-text decoding, and geo-location. The ultimate goal is to couple sensor information with natural language processing to construct an intelligent user interface that shapes communication in a practical SSVEP-based BCI.
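
To make the four-choice SSVEP idea concrete, here is a minimal sketch of one common baseline: compare the signal's spectral power at each candidate stimulus frequency and pick the largest. The abstract does not state which feature extraction or classification algorithm the dissertation uses, so this is a stand-in; the sampling rate, the candidate frequencies and the function names are all hypothetical.

```python
import math


def power_at(signal, fs, freq):
    """Power of `signal` at `freq` Hz via a single-frequency DFT projection."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return (re * re + im * im) / (n * n)


def classify_ssvep(signal, fs, candidate_freqs):
    """Return the candidate stimulus frequency with the highest power."""
    return max(candidate_freqs, key=lambda f: power_at(signal, fs, f))


# Synthetic single-channel "EEG": a 15 Hz response plus 50 Hz interference.
fs = 256  # samples per second (assumed)
samples = [
    math.sin(2 * math.pi * 15 * i / fs) + 0.5 * math.sin(2 * math.pi * 50 * i / fs)
    for i in range(2 * fs)  # two seconds of data
]
print(classify_ssvep(samples, fs, [12.0, 13.0, 14.0, 15.0]))
```

Real EEG is far noisier than this synthetic trace, which is consistent with the wide spread of classification rates the abstract reports across participants.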