
    Unsupervised image saliency detection with Gestalt-laws guided optimization and visual attention based refinement.

    Visual attention is a fundamental cognitive capability that allows human beings to focus on regions of interest (ROIs) in complex natural environments. Which ROIs we attend to depends mainly on two distinct attentional mechanisms. The bottom-up mechanism guides our detection of salient objects and regions through externally driven factors such as color and location, whilst the top-down mechanism biases attention according to prior knowledge and cognitive strategies provided by the visual cortex. However, how to practically use and fuse both attentional mechanisms for salient object detection has not been sufficiently explored. To this end, we propose in this paper an integrated framework consisting of bottom-up and top-down attention mechanisms that enables attention to be computed at the level of salient objects and/or regions. Within our framework, the bottom-up model is guided by the Gestalt laws of perception. We interpret the Gestalt laws of homogeneity, similarity, proximity, and figure-ground in terms of color and spatial contrast at the level of regions and objects to produce a feature contrast map. The top-down model uses a formal computational description of background connectivity to produce a priority map. Integrating both mechanisms and applying them to salient object detection, our results demonstrate that the proposed method consistently outperforms a number of existing unsupervised approaches on five challenging and complicated datasets in terms of higher precision and recall rates, AP (average precision) and AUC (area under curve) values.
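    A minimal sketch of how the two maps might be fused, in Python; the multiplicative fusion rule and the [0, 1] normalization are illustrative assumptions, not the authors' published formulation:

        import numpy as np

        def normalize(m):
            # Rescale a map to [0, 1] so the two cues are comparable.
            m = m.astype(float)
            return (m - m.min()) / (m.max() - m.min() + 1e-12)

        def fuse_saliency(feature_contrast, priority):
            # Combine the bottom-up feature contrast map with the
            # top-down priority map multiplicatively, so a region must
            # be supported by both cues to remain salient.
            return normalize(normalize(feature_contrast) * normalize(priority))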

    SR-POD : sample rotation based on principal-axis orientation distribution for data augmentation in deep object detection

    Convolutional neural networks (CNNs) have outperformed most state-of-the-art methods in object detection. However, CNNs struggle to detect rotated objects, because the dataset used to train the CNNs often does not contain sufficient samples at various angles of orientation. In this paper, we propose a novel data-augmentation approach that handles rotated samples by exploiting the distribution of object orientations, without the time-consuming process of rotating the sample images. Firstly, we present an orientation descriptor, named "principal-axis orientation", to describe the orientation of an object's principal axis in an image, and estimate the distribution of the objects' principal-axis orientations (POD) over the whole dataset. Secondly, we define a similarity metric to calculate the POD similarity between the training set and an additional dataset, built by randomly selecting images from the benchmark ImageNet ILSVRC2012 dataset. Finally, we optimize a cost function to obtain the rotation angle that yields the highest POD similarity between the two aforementioned datasets. To evaluate our data-augmentation method for object detection, we conduct experiments on the benchmark PASCAL VOC2007 dataset, which show that with the training set augmented by our method, the average precision (AP) of Faster R-CNN on the TV-monitor class is improved by 7.5%. In addition, our experimental results demonstrate that new samples generated by purely random rotation are more likely to degrade object detection performance.
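    A Python sketch of the orientation descriptor and POD comparison; the moment-based orientation estimate and the histogram-intersection similarity are standard choices assumed here, not necessarily the paper's exact definitions:

        import numpy as np

        def principal_axis_orientation(mask):
            # Orientation (radians) of the principal axis of a binary
            # object mask, from second-order central image moments.
            ys, xs = np.nonzero(mask)
            x0, y0 = xs.mean(), ys.mean()
            mu20 = ((xs - x0) ** 2).mean()
            mu02 = ((ys - y0) ** 2).mean()
            mu11 = ((xs - x0) * (ys - y0)).mean()
            return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)

        def pod_histogram(masks, bins=36):
            # Distribution of principal-axis orientations over a dataset.
            angles = [principal_axis_orientation(m) for m in masks]
            hist, _ = np.histogram(angles, bins=bins,
                                   range=(-np.pi / 2, np.pi / 2))
            return hist / max(hist.sum(), 1)

        def pod_similarity(p, q):
            # Histogram intersection as the similarity metric (assumed).
            return np.minimum(p, q).sum()

    The optimal rotation angle would then be the shift of the training-set histogram that maximizes this similarity against the reference set.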

    VIP-STB farm: scale-up village to county/province level to support science and technology at backyard (STB) program.

    In this paper, we introduce VIP-STB, a project funded through Agri-Tech in China: Newton Network+ (ATCNN), which develops feasible solutions for scaling up STB from the village level to higher levels via generic models and systems. The project has three tasks: normalized difference vegetation index (NDVI) estimation, wheat density estimation, and household-based small farms (HBSF) engagement. In the first task, several machine learning models are evaluated on NDVI estimation. In the second task, crop density/population is predicted by conventional image processing techniques. In the third task, integrated software built with Python and Twilio improves communication services and engagement for HBSFs and provides technical capabilities. The objectives and strategy of VIP-STB are described, experimental results on each task are presented, and details of each implemented model are provided along with guidance for future development.
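    For the first task, the NDVI itself follows a standard formula; a minimal Python sketch (the band variable names and the small stabilizing epsilon are assumptions):

        import numpy as np

        def ndvi(nir, red):
            # Normalized difference vegetation index from the
            # near-infrared and red reflectance bands; values lie
            # in [-1, 1], with higher values indicating vegetation.
            nir = nir.astype(float)
            red = red.astype(float)
            return (nir - red) / (nir + red + 1e-12)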

    Hypothesis-based image segmentation for object learning and recognition

    Denecke A. Hypothesis-based image segmentation for object learning and recognition. Bielefeld: Universität Bielefeld; 2010. This thesis addresses the figure-ground segmentation problem in the context of complex systems for automatic object recognition as well as for the online and interactive acquisition of visual representations. First, the problem of image segmentation is introduced in general terms, together with its importance for object learning in current state-of-the-art systems. Secondly, a method using artificial neural networks is presented. This approach, based on Generalized Learning Vector Quantization, is investigated in challenging scenarios such as the real-time figure-ground segmentation of complex-shaped objects under continuously changing environmental conditions. The ability to fulfill these requirements characterizes the novelty of the approach compared to state-of-the-art methods. Finally, our technique is extended towards online adaptation of model complexity and the integration of several segmentation cues. This yields a framework for object segmentation that can improve current systems for visual object learning and recognition.
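    A minimal Python sketch of a single Generalized Learning Vector Quantization update; this simplified rule omits the sigmoid scaling of the GLVQ cost function and all segmentation-specific machinery from the thesis:

        import numpy as np

        def glvq_step(x, y, prototypes, labels, lr=0.01):
            # One simplified GLVQ update (after Sato & Yamada): pull the
            # closest correct prototype towards x, push the closest
            # incorrect one away, weighted by the relative distances.
            d = ((prototypes - x) ** 2).sum(axis=1)   # squared distances
            same = np.where(labels == y)[0]
            diff = np.where(labels != y)[0]
            i = same[d[same].argmin()]                # nearest correct
            j = diff[d[diff].argmin()]                # nearest incorrect
            dp, dm = d[i], d[j]
            denom = (dp + dm) ** 2 + 1e-12
            prototypes[i] += lr * (dm / denom) * (x - prototypes[i])
            prototypes[j] -= lr * (dp / denom) * (x - prototypes[j])
            return prototypes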

    Deep background subtraction of thermal and visible imagery for pedestrian detection in videos

    In this paper, we introduce an efficient framework to subtract the background from both visible and thermal imagery for pedestrian detection in urban scenes. We use a deep neural network (DNN) to train the background subtraction model. To train the DNN, we first generate an initial background map and then employ a randomly selected 5% of the video frames, the background map, and manually segmented ground truth. We then apply cognition-based post-processing to further smooth the foreground detection result. We evaluate our method against our previous work and 11 recent, widely cited methods on three challenging video sequences selected from OTCBVS, a publicly available color-thermal benchmark dataset. The promising results show that the proposed DNN-based approach successfully detects pedestrians with good shape in most scenes, regardless of illumination changes and occlusion problems.
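    One plausible way to build the initial background map is a temporal median over sampled frames; a Python sketch, where the median construction, threshold, and function names are assumptions (in the paper the foreground decision is made by the trained DNN, not a fixed threshold):

        import numpy as np

        def initial_background(frames):
            # Temporal median across sampled frames as a simple
            # initial background map.
            stack = np.stack(frames).astype(float)
            return np.median(stack, axis=0)

        def foreground_mask(frame, background, thresh=30):
            # Per-pixel absolute difference against the background map;
            # cognition-based post-processing would further smooth this.
            diff = np.abs(frame.astype(float) - background)
            return (diff.max(axis=-1) if diff.ndim == 3 else diff) > thresh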

    Incremental learning-based visual tracking with weighted discriminative dictionaries

    Existing sparse representation-based visual tracking methods detect target positions by minimizing the reconstruction error. However, due to complex backgrounds, illumination changes, and occlusion, these methods often fail to locate the target properly. In this article, we propose a novel visual tracking method based on weighted discriminative dictionaries and a pyramidal feature selection strategy. First, we utilize the color and texture features of the training samples to obtain multiple discriminative dictionaries. Then, we use the position information of those samples to assign weights to the base vectors in the dictionaries. For robust visual tracking, we propose a pyramidal sparse feature selection strategy in which the weights of the base vectors and the reconstruction errors of the different features are integrated to select the best target regions. At the same time, we measure feature reliability to dynamically adjust the weights of the different features. In addition, we introduce a scenario-aware mechanism and an incremental dictionary update method based on noise energy analysis. Comparison experiments show that the proposed algorithm outperforms several state-of-the-art methods, and useful quantitative and qualitative analyses are also carried out.
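    A Python sketch of scoring candidate regions by weighted sparse reconstruction error, using scikit-learn's sparse_encode for the sparse coding step; the multiplicative atom weighting and the function name are assumptions standing in for the paper's weighting scheme:

        import numpy as np
        from sklearn.decomposition import sparse_encode

        def score_candidates(candidates, dictionary, base_weights, alpha=0.05):
            # candidates: (n_candidates, n_features) feature vectors;
            # dictionary: (n_atoms, n_features); base_weights: (n_atoms,).
            # The candidate with the lowest reconstruction error wins.
            D = dictionary * base_weights[:, None]     # weight base vectors
            codes = sparse_encode(candidates, D, alpha=alpha)
            errors = ((candidates - codes @ D) ** 2).sum(axis=1)
            return errors.argmin(), errors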

    A Survey for Graphic Design Intelligence

    Graphic design is an effective language for visual communication. Using complex compositions of visual elements (e.g., shape, color, font) guided by design principles and aesthetics, design helps produce more visually appealing content. Creating a harmonious design requires carefully selecting and combining different visual elements, which can be challenging and time-consuming. To expedite the design process, emerging AI techniques have been proposed to automate tedious tasks and facilitate human creativity. However, most current works focus only on specific tasks targeting particular scenarios, without a high-level abstraction. This paper aims to provide a systematic overview of graphic design intelligence and to summarize the literature in a taxonomy of representation, understanding, and generation. Specifically, we consider related works for individual visual elements as well as the overall design composition. Furthermore, we highlight some potential directions for future exploration.
    Comment: 10 pages, 2 figures

    Compressive sensing based secret signals recovery for effective image steganalysis in secure communications

    Conventional image steganalysis mainly focuses on presence detection rather than recovery of the original secret messages embedded in the host image. To address this issue, we propose an image steganalysis method in the compressive sensing (CS) domain, where a block CS measurement matrix senses the transform coefficients of the stego-image to reflect the statistical differences between the cover and stego-images. With multi-hypothesis prediction in the CS domain, the reconstruction of hidden signals is achieved efficiently. Extensive experiments have been carried out on five diverse image databases and benchmarked against four typical steganographic algorithms. The comprehensive results demonstrate the efficacy of the proposed approach as a universal scheme for effective detection of steganography in secure communications, whilst greatly reducing the number of features required for secret signal reconstruction.
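    A generic Python sketch of block CS measurement of transform coefficients, y = Phi x per block; the Gaussian measurement matrix, block size, and sampling ratio are assumptions, not the paper's exact construction:

        import numpy as np

        def block_cs_measure(coeffs, ratio=0.25, block=16, seed=0):
            # Sense non-overlapping blocks of a 2-D coefficient array
            # with one shared random Gaussian measurement matrix Phi.
            rng = np.random.default_rng(seed)
            n = block * block
            m = int(ratio * n)
            phi = rng.standard_normal((m, n)) / np.sqrt(m)
            h, w = coeffs.shape
            measurements = []
            for i in range(0, h - block + 1, block):
                for j in range(0, w - block + 1, block):
                    x = coeffs[i:i + block, j:j + block].reshape(-1)
                    measurements.append(phi @ x)   # y = Phi x
            return np.array(measurements)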

    Eigenface algorithm-based facial expression recognition in conversations - an experimental study

    Recognising facial expressions is important in many fields such as human-computer interaction. Although different approaches have been widely used in facial expression recognition systems, many problems remain in achieving the best implementation outcomes in practice. Most systems are tested on lab-based facial expressions, which may be unnatural; in particular, many systems have problems recognising the facial expressions used during conversation. This paper conducts an experimental study of Eigenface algorithm-based facial expression recognition. It primarily aims to compare performance on lab-based facial expressions against facial expressions used during conversation, and to probe the problems arising from recognising facial expressions in conversation. The study uses the author's own facial expressions as the basis for the lab-based expressions, and the facial expressions of one elderly person during conversation. The experiment showed good results on lab-based facial expressions, but several issues were observed for the facial expressions obtained in conversation. Based on the experimental results, future research should investigate how to recognise special emotions such as a wry smile, and how to deal with interference in the lower part of the face when speaking.
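    A minimal Python sketch of the Eigenface pipeline via PCA with 1-nearest-neighbour classification in eigenspace; the component count, whitening, and function names are assumptions about one common variant of the algorithm:

        import numpy as np
        from sklearn.decomposition import PCA

        def train_eigenfaces(faces, labels, n_components=20):
            # faces: (n_samples, h*w) flattened grayscale face images.
            pca = PCA(n_components=n_components, whiten=True).fit(faces)
            return pca, pca.transform(faces), np.asarray(labels)

        def classify_expression(face, pca, train_proj, train_labels):
            # Project onto the eigenfaces and label with the nearest
            # training projection (1-NN in eigenspace).
            proj = pca.transform(face.reshape(1, -1))
            dists = ((train_proj - proj) ** 2).sum(axis=1)
            return train_labels[dists.argmin()]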
    • …