143 research outputs found

    Salient Object Detection Techniques in Computer Vision - A Survey

    Detecting and localizing the regions of an image that attract immediate human visual attention is currently an intensive area of research in computer vision. The ability to automatically identify and segment such salient image regions has immediate consequences for applications in computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to mimic the human visual system's capability to detect salient regions in images. Based on their feature engineering mechanism, these methods can be broadly divided into two categories: conventional and deep learning-based. This survey reviews in detail the most influential advances in image-based SOD from both categories. Relevant saliency modeling trends, with key issues, core techniques, and the scope for future research, are discussed in the context of the difficulties often faced in salient object detection. Results are presented for various challenging cases on several large-scale public datasets, and the metrics used to assess the performance of state-of-the-art SOD models are also covered. Some future directions for SOD are presented towards the end.
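
    The evaluation metrics referred to above are standard in the SOD literature; two of the most common are mean absolute error (MAE) and the F-measure with β² = 0.3. The following is a minimal NumPy sketch of both, as a generic illustration rather than code from any surveyed model:

```python
import numpy as np

def mae(saliency: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a saliency map and a binary
    ground-truth mask, both scaled to [0, 1]."""
    return float(np.mean(np.abs(saliency - gt)))

def f_measure(saliency: np.ndarray, gt: np.ndarray,
              beta2: float = 0.3) -> float:
    """F-measure at an adaptive threshold (twice the mean saliency),
    with beta^2 = 0.3 as is conventional in the SOD literature."""
    thresh = min(2.0 * saliency.mean(), 1.0)
    pred = saliency >= thresh
    tp = np.logical_and(pred, gt > 0.5).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max((gt > 0.5).sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

# Illustrative usage on a random map and mask:
sal = np.random.rand(240, 320)
mask = (np.random.rand(240, 320) > 0.8).astype(float)
print(mae(sal, mask), f_measure(sal, mask))
```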

    Principles and Guidelines for Advancement of Touchscreen-Based Non-visual Access to 2D Spatial Information

    Graphical materials such as graphs and maps are often inaccessible to millions of blind and visually impaired (BVI) people, which negatively impacts their educational prospects, ability to travel, and vocational opportunities. To address this longstanding issue, a three-phase research program was conducted that builds on and extends previous work establishing touchscreen-based haptic cuing as a viable alternative for conveying digital graphics to BVI users. Although promising, this approach poses unique challenges that can only be addressed by schematizing the underlying graphical information based on perceptual and spatio-cognitive characteristics pertinent to touchscreen-based haptic access. Towards this end, this dissertation empirically identified a set of design parameters and guidelines through a logical progression of seven experiments. Phase I investigated perceptual characteristics related to touchscreen-based graphical access using vibrotactile stimuli, with results establishing three core perceptual guidelines: (1) a minimum line width of 1mm should be maintained for accurate line detection (Exp-1), (2) a minimum interline gap of 4mm should be used for accurate discrimination of parallel vibrotactile lines (Exp-2), and (3) a minimum angular separation of 4mm should be used for accurate discrimination of oriented vibrotactile lines (Exp-3). Building on these parameters, Phase II studied the core spatio-cognitive characteristics pertinent to touchscreen-based non-visual learning of graphical information, with results leading to the specification of three design guidelines: (1) a minimum width of 4mm should be used to support tasks that require tracing vibrotactile lines and judging their orientation (Exp-4), (2) a minimum width of 4mm should be maintained for accurate line tracing and learning of complex spatial path patterns (Exp-5), and (3) vibrotactile feedback should be used as a guiding cue to support the most accurate line-tracing performance (Exp-6). Finally, Phase III demonstrated that schematizing line-based maps according to these design guidelines leads to the development of an accurate cognitive map. Results from Experiment-7 provide theoretical evidence that learning from vision and touch leads to the development of functionally equivalent amodal spatial representations in memory. Findings from all seven experiments contribute to new theories of haptic information processing that can guide the development of new touchscreen-based non-visual graphical access solutions.
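
    Because these guidelines are specified in millimetres, any touchscreen implementation has to convert them to pixels for the target display density. The snippet below is a hypothetical illustration of how the reported thresholds could be encoded and applied; the constant names and the Display helper are invented for this sketch and do not come from the dissertation:

```python
from dataclasses import dataclass

# Thresholds reported in the abstract (all in millimetres).
MIN_LINE_WIDTH_MM = 1.0     # line detection (Exp-1)
MIN_INTERLINE_GAP_MM = 4.0  # parallel-line discrimination (Exp-2)
MIN_TRACING_WIDTH_MM = 4.0  # line tracing / orientation (Exp-4, Exp-5)

@dataclass
class Display:
    dpi: float  # dots per inch of the target touchscreen

    def mm_to_px(self, mm: float) -> float:
        return mm * self.dpi / 25.4  # 25.4 mm per inch

def line_width_px(display: Display, task: str) -> float:
    """Minimum vibrotactile line width in pixels for a given task."""
    mm = MIN_TRACING_WIDTH_MM if task == "tracing" else MIN_LINE_WIDTH_MM
    return display.mm_to_px(mm)

tablet = Display(dpi=264)  # e.g. a common 10-inch tablet
print(round(line_width_px(tablet, "detection")))  # ~10 px
print(round(line_width_px(tablet, "tracing")))    # ~42 px
```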

    Integrated multi-scale architecture of the cortex with application to computer vision

    Doctoral thesis, Electronic and Computer Engineering, Faculdade de Ciência e Tecnologia, Universidade do Algarve, 2007. The main goal of this thesis is to understand the functioning of the visual cortex through the development of computational models. In V1, the input layer of the visual cortex, there are simple, complex and end-stopped cells. These provide a multi-scale representation of objects and scenes in terms of lines, edges and keypoints. In this thesis we combine recent progress on computational models of these and other cells with processing in higher cortical areas such as V2 and V4. Three pertinent challenges are discussed: (i) object recognition embedded in a cortical architecture; (ii) brightness perception; and (iii) painterly rendering based on human vision. Specific aspects are Focus-of-Attention by means of keypoint-based saliency maps, the dynamic routing of features from V1 through higher cortical areas in order to obtain translation, rotation and size invariance, and the construction of normalized object templates with canonical views in visual memory. Our simulations show that the multi-scale representations can be integrated into a cortical architecture in order to model subsequent processing steps: from segregation, via different categorization levels, to final object recognition. As in real cortical processing, the system starts with coarse-scale information, refines categorization using medium-scale information, and employs all scales in recognition. We also show that a 2D brightness model can be based on the multi-scale symbolic representation of lines and edges, with an additional low-pass channel and nonlinear amplitude transfer functions, such that object recognition and brightness perception are combined processes based on the same information. The brightness model can predict many different effects, such as Mach bands, grating induction, the Craik-O'Brien-Cornsweet illusion and brightness induction, i.e. the opposite effects of assimilation (White effect) and simultaneous brightness contrast. Finally, a novel application is introduced: painterly rendering has traditionally been linked to computer vision, but we propose to link it to human vision, because perception and painting are two strongly interwoven processes.
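
    The simple and complex cells mentioned above are commonly modelled as quadrature pairs of Gabor filters whose local energy gives a complex-cell-like response; applying such pairs at several scales and orientations yields the kind of multi-scale representation the thesis builds on. The sketch below is a generic textbook formulation of that front end, not the thesis's actual cell models:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_pair(size: int, wavelength: float, theta: float):
    """Even/odd (quadrature) Gabor kernels at one scale and orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # axis along the grating
    sigma = 0.56 * wavelength                    # common bandwidth choice
    gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    even = gauss * np.cos(2 * np.pi * xr / wavelength)
    odd = gauss * np.sin(2 * np.pi * xr / wavelength)
    return even, odd

def complex_cell_energy(img, wavelength, theta):
    """Local energy of a quadrature pair: a complex-cell-like response."""
    even, odd = gabor_pair(4 * int(wavelength) + 1, wavelength, theta)
    re = fftconvolve(img, even, mode="same")
    im = fftconvolve(img, odd, mode="same")
    return np.sqrt(re**2 + im**2)

# A small multi-scale, multi-orientation front end on a test image:
img = np.random.rand(128, 128)
pyramid = [[complex_cell_energy(img, w, t)
            for t in np.linspace(0, np.pi, 4, endpoint=False)]
           for w in (4, 8, 16)]
```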

    Modelling visual search for surface defects

    Much work has been done on developing algorithms for automated surface defect detection. However, comparisons between these models and human perception are rarely carried out. This thesis aims to investigate how well human observers can find defects in textured surfaces, over a wide range of task difficulties. Stimuli for experiments will be generated using texture synthesis methods, and human search strategies will be captured by use of an eye tracker. Two different modelling approaches will be explored. A computational LNL-based model will be developed and compared to human performance in terms of the number of fixations required to find the target. Secondly, a stochastic simulation, based on empirical distributions of saccades, will be compared to human search strategies.
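
    An LNL (linear-nonlinear-linear) observer model of the kind referred to above typically filters the stimulus, applies a pointwise nonlinearity, and pools the result with a second linear stage. The following is a minimal generic sketch with illustrative parameters (a difference-of-Gaussians front end, a power nonlinearity, and Gaussian pooling), not the fitted model described in the thesis:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lnl_response(stimulus: np.ndarray,
                 sigma1: float = 1.5,    # first (bandpass) linear stage
                 exponent: float = 2.0,  # pointwise nonlinearity
                 sigma2: float = 4.0):   # second (pooling) linear stage
    """Generic linear-nonlinear-linear cascade over a 2D stimulus."""
    # L: difference-of-Gaussians as a simple linear bandpass filter
    linear = (gaussian_filter(stimulus, sigma1)
              - gaussian_filter(stimulus, 2 * sigma1))
    # N: rectifying power nonlinearity
    nonlinear = np.abs(linear) ** exponent
    # L: spatial pooling; peaks mark candidate defect locations
    return gaussian_filter(nonlinear, sigma2)

texture = np.random.rand(256, 256)
resp = lnl_response(texture)
print(np.unravel_index(resp.argmax(), resp.shape))  # most conspicuous point
```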

    Human action recognition using saliency-based global and local features

    Recognising human actions from video sequences is one of the most important topics in computer vision and has been extensively researched over the last decades; however, it is still regarded as a challenging task, especially in real scenarios, due to difficulties mainly resulting from background clutter, partial occlusion, and changes in scale, viewpoint, lighting, and appearance. Human action recognition underpins many applications, including video surveillance systems, human-computer interaction, and robotics for human behaviour characterisation. In this thesis, we introduce new features and methods to enhance and develop human action recognition systems. Specifically, we introduce three methods for human action recognition. In the first approach, we present a novel framework for human action recognition based on salient object detection and a combination of local and global descriptors. Saliency Guided Feature Extraction (SGFE) is proposed to detect salient objects and extract features on the detected objects. We then propose a simple strategy to identify and process only those video frames that contain salient objects. Processing salient objects instead of all frames not only makes the algorithm more efficient, but, more importantly, also suppresses the interference of background pixels. We combine this approach with a new combination of local and global descriptors, namely 3D SIFT and Histograms of Oriented Optical Flow (HOOF). The resulting Saliency Guided 3D SIFT and HOOF (SGSH) feature is used along with a multi-class support vector machine (SVM) classifier for human action recognition. The second proposed method is a novel 3D extension of Gradient Location and Orientation Histograms (3D GLOH), which provides discriminative local features representing both gradient orientations and their relative locations. We further propose a human action recognition system based on the Bag of Visual Words model, combining the new 3D GLOH local features with HOOF global features. Together with the idea from our first approach of extracting features only in salient regions, our overall system outperforms existing feature descriptors for human action recognition on challenging video datasets. Finally, we propose to extract minimal representative information, namely deforming skeleton graphs corresponding to foreground shapes, to effectively represent actions and remove the influence of changes in illumination, subject appearance and backgrounds. We propose a novel approach to action recognition based on the matching of skeleton graphs, combining a static pairwise graph similarity measure using Optimal Subsequence Bijection with Dynamic Time Warping to robustly handle topological and temporal variations. We have evaluated the proposed methods by conducting extensive experiments on widely-used human action datasets, including the KTH, UCF Sports, TV Human Interaction (TVHI), Olympic Sports and UCF11 datasets. Experimental results show the effectiveness of our methods for action recognition.
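
    Among the descriptors above, HOOF is the most compact to state: each optical-flow vector votes into an orientation-histogram bin with a weight equal to its magnitude, and the histogram is normalised to sum to one. The sketch below shows this basic magnitude-weighted variant; it omits the left-right symmetry folding described in the original HOOF formulation:

```python
import numpy as np

def hoof(flow: np.ndarray, n_bins: int = 30) -> np.ndarray:
    """Histogram of Oriented Optical Flow for one frame.

    flow: (H, W, 2) array of per-pixel (dx, dy) optical-flow vectors.
    Each vector votes into an angle bin, weighted by its magnitude;
    the histogram is L1-normalised so it sums to 1.
    """
    dx, dy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx)  # flow direction in (-pi, pi]
    hist, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi),
                           weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Illustrative usage on a random flow field:
flow = np.random.randn(120, 160, 2).astype(np.float32)
print(hoof(flow).shape)  # (30,)
```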