12 research outputs found

    Audiovisual Saliency Prediction in Uncategorized Video Sequences based on Audio-Video Correlation

    Substantial research has been done in saliency modeling to develop intelligent machines that can perceive and interpret their surroundings. However, existing models treat videos merely as image sequences, excluding any audio information, and are therefore unable to cope with inherently varying content. Based on the hypothesis that an audiovisual saliency model will improve over traditional saliency models for natural, uncategorized videos, this work aims to provide a generic audiovisual saliency model that augments a visual saliency map with an audio saliency map computed by synchronizing low-level audio and visual features. The proposed model was evaluated using different criteria against eye-fixation data for the publicly available DIEM video dataset. The results show that the model outperformed two state-of-the-art visual saliency models.
    Comment: 9 pages, 2 figures, 4 tables
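The abstract describes augmenting a visual saliency map with an audio saliency map. A common baseline for such augmentation is a normalized weighted sum of the two maps; the sketch below shows only that generic fusion step. The function names and the weight `alpha` are illustrative assumptions, not the paper's actual formulation, and the synchronization of low-level audio/visual features is not reproduced.

```python
import numpy as np

def normalize(smap):
    """Scale a saliency map to [0, 1]; constant maps become all zeros."""
    rng = smap.max() - smap.min()
    return (smap - smap.min()) / rng if rng > 0 else np.zeros_like(smap)

def fuse_audiovisual(visual_map, audio_map, alpha=0.7):
    """Weighted fusion of a visual and an audio saliency map.

    alpha weights the visual map; (1 - alpha) weights the audio map.
    Both maps are normalized before and after mixing so the result
    stays in [0, 1] regardless of the inputs' dynamic ranges.
    """
    v = normalize(visual_map)
    a = normalize(audio_map)
    return normalize(alpha * v + (1 - alpha) * a)
```

A smaller `alpha` lets transient audio events (e.g. a sudden sound source) pull attention away from purely visual contrast.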

    Incorporating a humanoid robot to motivate the geometric figures learning

    Technology has been introduced into educational environments to facilitate learning and engage students' interest. Robotics can be an interesting alternative for exploring theoretical concepts covered in class. In this paper, a computational system capable of detecting objects was incorporated into the NAO robot so that it can interact with students, recognizing overlapping geometric shapes. The system consists of two neural network models and was evaluated through a sequence of didactic activities presented to 5th-year students, aiming to encourage them to perform the tasks. The robot operates autonomously, recognizing and counting the different objects in the image. The results show that the children felt very motivated and engaged in fulfilling the tasks.
    São Paulo State Research Foundation (FAPESP); Brazilian National Research Council (CNPq)

    A computational visual saliency model for images.

    Human eyes receive an enormous amount of information from the visual world. It is highly difficult for the human brain to process all of this information simultaneously, so the human visual system selectively processes the incoming information by attending only to the relevant regions of interest in a scene. Visual saliency characterises the parts of a scene that appear to stand out from their neighbouring regions and attract the human gaze. Modelling saliency-based visual attention has been an active research area in recent years. Saliency models are of vital importance in many computer vision tasks such as image and video compression, object segmentation, target tracking, remote sensing and robotics. Many of these applications deal with high-resolution images and real-time videos, and it is a challenge to process this excessive amount of information with limited computational resources. Employing saliency models in these applications limits the processing of irrelevant information and further improves their efficiency and performance. Therefore, a saliency model with good prediction accuracy and low computation time is highly desirable. This thesis presents a low-computation wavelet-based visual saliency model designed to predict the regions of human eye fixations in images. The proposed model uses two channels of information, luminance (Y) and chrominance (Cr) in the YCbCr colour space, for saliency computation. These two channels are decomposed to their lowest resolution using the two-dimensional Discrete Wavelet Transform (DWT) to extract local contrast features at multiple scales. The extracted local contrast features are integrated at multiple levels using a two-dimensional entropy-based feature combination scheme to derive a combined map. The combined map is normalised and enhanced using a natural logarithm transformation to derive the final saliency map.
The performance of the model has been evaluated qualitatively and quantitatively using two large benchmark image datasets. The experimental results show that the proposed model achieves better prediction accuracy, both qualitatively and quantitatively, with a significant reduction in computation time compared to existing benchmark models. It achieves nearly 25% computational savings compared to the benchmark model with the lowest computation time.
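The multi-scale DWT contrast extraction and natural-log enhancement described above can be sketched with a hand-rolled one-level Haar transform. This is a simplifying illustration only: the thesis's actual wavelet filters, decomposition depth, and two-dimensional entropy-based combination scheme are not reproduced — the squared detail-band energy used below as the per-scale contrast measure is an assumption.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2D Haar DWT: approximation (LL) and detail (LH, HL, HH) bands."""
    img = img[:img.shape[0] // 2 * 2, :img.shape[1] // 2 * 2]  # crop to even size
    lo = (img[0::2, :] + img[1::2, :]) / 2.0   # vertical low-pass
    hi = (img[0::2, :] - img[1::2, :]) / 2.0   # vertical high-pass
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2.0
    lh = (lo[:, 0::2] - lo[:, 1::2]) / 2.0
    hl = (hi[:, 0::2] + hi[:, 1::2]) / 2.0
    hh = (hi[:, 0::2] - hi[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

def contrast_saliency(channel, levels=2):
    """Accumulate detail-band energy across scales, then log-enhance the result."""
    h, w = channel.shape
    acc = np.zeros((h, w))
    cur = channel.astype(float)
    for _ in range(levels):
        cur, (lh, hl, hh) = haar_dwt2(cur)          # cur becomes the LL band
        energy = lh ** 2 + hl ** 2 + hh ** 2        # local contrast at this scale
        # upsample the coarse energy map back to full resolution by repetition
        fy = -(-h // energy.shape[0])
        fx = -(-w // energy.shape[1])
        acc += np.repeat(np.repeat(energy, fy, axis=0), fx, axis=1)[:h, :w]
    if acc.max() > 0:
        acc = np.log1p(acc / acc.max())             # natural-log enhancement
    return acc
```

Running this separately on the Y and Cr channels and summing the two maps would mirror the two-channel design, with the channel split assumed to follow the YCbCr conversion named in the abstract.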

    A brief survey of visual saliency detection


    Advanced Visual Computing for Image Saliency Detection

    Saliency detection is a category of computer vision algorithms that aims to identify the most salient object in a given image. Existing saliency detection methods can generally be categorized as bottom-up or top-down, and the prevalent deep neural network (DNN) has begun to show its applications in saliency detection in recent years. However, challenges in existing methods, such as problematic pre-assumptions, inefficient feature integration and the absence of high-level feature learning, prevent them from achieving superior performance. In this thesis, to address the limitations above, we propose multiple novel models with favorable performance. Specifically, we first systematically review the development of saliency detection and its related works, and then propose four new methods: two based on low-level image features and two based on DNNs. The regularized random walks ranking method (RR) and its reversion-correction-improved version (RCRR) are based on conventional low-level image features and exhibit higher accuracy and robustness in extracting image-boundary-based foreground/background queries, while the background search and foreground estimation (BSFE) and dense and sparse labeling (DSL) methods are based on DNNs, which have shown dominant advantages in high-level image feature extraction as well as the combined strength of multi-dimensional features. Each of the proposed methods is evaluated by extensive experiments, and all of them perform favorably against the state of the art, especially the DSL method, which achieves remarkably higher performance than sixteen state-of-the-art methods (ten conventional and six learning-based) on six well-recognized public datasets. The successes of the proposed methods reveal further potential and meaningful applications of saliency detection in real-life computer vision tasks.
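Graph-based ranking methods such as RR build on the classic closed-form manifold-ranking solution f = (I − αS)⁻¹y over an affinity graph of image regions. The sketch below shows only that base formulation; the thesis's regularization and reversion-correction steps are not reproduced, and the toy graph in the usage note is an illustrative assumption rather than a real superpixel graph.

```python
import numpy as np

def manifold_ranking(W, y, alpha=0.99):
    """Closed-form graph ranking: f = (I - alpha * S)^-1 y.

    W     : symmetric non-negative affinity matrix between graph nodes
    y     : indicator vector of query (e.g. boundary background) nodes
    alpha : trade-off between graph smoothness and fitting the queries
    S is the symmetrically normalized affinity D^-1/2 W D^-1/2.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, y)
```

On a simple chain graph with the query placed at one end, the ranking scores decay with graph distance from the query, which is the behaviour saliency methods exploit when ranking regions against boundary queries.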

    Uma abordagem baseada em filtragem colaborativa integrada a mapas de saliência para a recomendação de imagens [A collaborative filtering approach integrated with saliency maps for image recommendation]

    Nowadays, the number of customers using websites for shopping is greatly increasing, mainly due to the easiness and rapidity of this way of consumption. Websites, differently from physical stores, can make anything available to customers. In this context, Recommender Systems (RS) have become indispensable to help consumers find products that may please or be useful to them. These systems often use Collaborative Filtering (CF) techniques, whose main underlying idea is that products are recommended to a given user based on purchase information and past evaluations from a group of users similar to the user requesting the recommendation. One of the main challenges faced by such a technique is that the user needs to provide some information about her preferences on products in order to get further recommendations from the system. When there are items that have no ratings, or very few ratings available, the recommender system performs poorly. This problem is known as new-item cold-start. In this paper, we propose to investigate to what extent information on visual attention can help to produce more accurate recommendation models. We present a new CF strategy, called IKB-MS, that uses visual attention to characterize images and alleviate the new-item cold-start problem. In order to validate this strategy, we created a clothing image database and used three well-known algorithms for the extraction of visual attention from these images. An extensive set of experiments shows that our approach is efficient and outperforms state-of-the-art CF RS.
    Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; Master's dissertation (Dissertação de Mestrado)
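The abstract does not give IKB-MS's exact formulation, but the general idea of alleviating new-item cold-start with visual similarity can be sketched as follows: when an item has no ratings, predict a user's rating for it by weighting her ratings of the visually most similar items. All names, the feature vectors, and the neighbourhood size `k` below are illustrative assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors (0 for zero vectors)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0

def predict_cold_start(new_item_feat, item_feats, ratings, k=2):
    """Predict a user's rating for an unrated (cold-start) item.

    new_item_feat : feature vector of the new item (e.g. a saliency-map descriptor)
    item_feats    : feature vectors of the items the user has already rated
    ratings       : the user's ratings for those items
    k             : number of visually nearest neighbours to average over
    """
    sims = np.array([cosine_sim(new_item_feat, f) for f in item_feats])
    top = np.argsort(sims)[-k:]              # indices of the k most similar items
    w = sims[top]
    if w.sum() <= 0:
        return float(np.mean(ratings))       # fall back to the user's mean rating
    return float(w @ np.asarray(ratings)[top] / w.sum())
```

Deriving `item_feats` from saliency maps, as the paper proposes, focuses the similarity on the regions of each product image a shopper would actually attend to.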