3 research outputs found
High-throughput visual knowledge analysis and retrieval in big data ecosystems
Visual knowledge plays an important role in many highly skilled applications, such as medical diagnosis, geospatial image analysis and pathology diagnosis. Medical practitioners are able to interpret and reason about diagnostic images based on not only primitive-level image features such as color, texture, and spatial distribution but also their experience and tacit knowledge, which are seldom articulated explicitly. This reasoning process is dynamic and closely related to real-time human cognition. Due to a lack of visual knowledge management and sharing tools, it is difficult to capture and transfer such tacit and hard-won expertise to novices. Moreover, many mission-critical applications require the ability to process such tacit visual knowledge in real time. Precisely how to index this visual knowledge computationally and systematically still poses a challenge to the computing community. My dissertation research results in novel computational approaches for high-throughput visual knowledge analysis and retrieval from large-scale databases using the latest technologies in big data ecosystems. To provide a better understanding of visual reasoning, human gaze patterns are measured spatially and temporally to model observers' cognitive processes. These gaze patterns are then indexed in a NoSQL distributed database as a visual knowledge repository, which is accessed using various unique retrieval methods developed through this dissertation work. To provide meaningful retrievals in real time, deep-learning methods for automatic annotation of visual activities and streaming similarity comparisons are developed under a gaze-streaming framework using Apache Spark. This research has several potential applications that offer a broader impact on the scientific community and in the practical world. First, the proposed framework can be adapted to different domains, such as fine arts and life sciences, with minimal effort to capture human reasoning processes.
Second, with its real-time visual knowledge search function, this framework can be used to train novices in the interpretation of domain images by helping them learn experts' reasoning processes. Third, by helping researchers understand human visual reasoning, it may shed light on modeling human semantics. Finally, by integrating the reasoning process with multimedia data, future media retrieval could embed human perceptual reasoning into database search, going beyond traditional content-based media retrieval.
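The streaming similarity comparison of gaze patterns described above can be illustrated with a minimal sketch. It assumes each observer's scanpath is encoded as a string of region-of-interest (ROI) labels in fixation order and compares scanpaths by normalized edit distance, a common string-edit measure for gaze-pattern similarity; the function names and encoding are illustrative, not the dissertation's actual implementation, and the Spark streaming layer is omitted.

```python
# Illustrative sketch: comparing two observers' gaze scanpaths.
# Each scanpath is a string of ROI labels in fixation order,
# e.g. "ABCBD" = fixated region A, then B, then C, then B, then D.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two ROI sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def scanpath_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical scanpaths."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

print(scanpath_similarity("ABCBD", "ABCCD"))  # one substitution out of five fixations
```

In a streaming setting, each incoming gaze sequence would be compared against the indexed repository with a function like this, keeping only the nearest matches.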
Sistema de recomendação de imagens baseado em atenção visual [Image recommendation system based on visual attention]
Nowadays, the number of users shopping on e-commerce sites is increasing greatly, mainly because of the ease and speed of this form of consumption. Unlike physical stores, many e-commerce sites can offer their users a wide range of products and services, and users may find it very difficult to locate products matching their preferences. Typically, a user's preference for a product can be influenced by the visual appearance of the product image. In this context, Image Recommendation Systems (IRS) have become indispensable for helping users find products that may be pleasing or useful to them. Generally, IRS use the past behavior of users (clicks, purchases, reviews, ratings, etc.) and/or attributes of the products to determine users' preferences. One of the main challenges faced by IRS is the need for users to provide some information about their preferences for products in order to receive recommendations from
the system. Unfortunately, users are not always willing to provide such information explicitly. To cope with this challenge, methods for obtaining users' implicit feedback are desirable. In this work, the author proposes an investigation into the extent to which information about user visual attention can help improve rating prediction and hence produce more accurate IRS. This work proposes two new methods: one based on Collaborative Filtering (CF), which combines ratings and visual attention data to represent the past behavior of users, and another based on the content of the items, which combines textual attributes, visual features, and visual attention data to compose the profiles of the items. The proposed methods were evaluated on a painting dataset and a clothing dataset. The experimental results show significant improvements in rating prediction and recommendation precision when compared to
the state-of-the-art. It is worth mentioning that the proposed techniques are flexible and can be applied in other scenarios that exploit the visual attention of the recommended items.
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Tese (Doutorado) [Doctoral thesis]
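The CF-based method described above can be sketched in miniature: a user-based collaborative filter whose user-to-user similarity blends a rating profile with a visual-attention profile (e.g., normalized dwell time on each item's image). The blending weight `alpha` and all names here are assumptions for illustration, not the thesis's actual formulation.

```python
# Illustrative sketch: user-based CF blending explicit ratings with
# implicit visual-attention signals. Each user is a dict with
# 'ratings' (item -> score) and 'attention' (item -> normalized dwell
# time on the item's image).
from math import sqrt

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def predict(target, others, item, alpha=0.7):
    """Predict `target`'s rating for `item` from similar users.

    Similarity = alpha * cosine(rating profiles)
               + (1 - alpha) * cosine(attention profiles),
    computed over the items the target has rated.
    """
    items = sorted(target["ratings"])
    def vec(user, key):
        return [user[key].get(i, 0.0) for i in items]
    num = den = 0.0
    for other in others:
        if other["ratings"].get(item, 0) == 0:
            continue  # this neighbour never rated the item
        sim = (alpha * cosine(vec(target, "ratings"), vec(other, "ratings"))
               + (1 - alpha) * cosine(vec(target, "attention"), vec(other, "attention")))
        num += sim * other["ratings"][item]
        den += abs(sim)
    return num / den if den else 0.0

# Toy example: one neighbour with similar taste and gaze, one dissimilar.
target = {"ratings": {"a": 5, "b": 1}, "attention": {"a": 0.9, "b": 0.1}}
neighbours = [
    {"ratings": {"a": 5, "b": 1, "c": 4}, "attention": {"a": 0.8, "b": 0.2}},
    {"ratings": {"a": 1, "b": 5, "c": 1}, "attention": {"a": 0.1, "b": 0.9}},
]
print(predict(target, neighbours, "c"))  # pulled toward the similar user's rating of 4
```

The attention term lets users who looked at items the same way reinforce each other even when their explicit ratings are sparse, which is the intuition behind combining the two signals.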
Interactive video retrieval using implicit user feedback.
PhD
In recent years, the rapid development of digital technologies and the low
cost of recording media have led to a great increase in the availability of
multimedia content worldwide. This availability places the demand for the
development of advanced search engines. Traditionally, manual annotation of
video was one of the usual practices to support retrieval. However, the vast
amounts of multimedia content make such practices very expensive in terms of
human effort. At the same time, the availability of low cost wearable sensors
delivers a plethora of user-machine interaction data. Therefore, there is an
important challenge of exploiting implicit user feedback (such as user navigation
patterns and eye movements) during interactive multimedia retrieval sessions
with a view to improving video search engines. In this thesis, we focus on
automatically annotating video content by exploiting aggregated implicit
feedback of past users expressed as click-through data and gaze movements.
Towards this goal, we have conducted interactive video retrieval experiments, in
order to collect click-through and eye movement data in not strictly controlled
environments. First, we generate semantic relations between the multimedia
items by proposing a graph representation of aggregated past interaction data and
exploit them to generate recommendations, as well as to improve content-based
search. Then, we investigate the role of user gaze movements in interactive video
retrieval and propose a methodology for inferring user interest by employing
support vector machines and gaze movement-based features. Finally, we propose
an automatic video annotation framework, which combines query clustering into
topics by constructing gaze movement-driven random forests and temporally
enhanced dominant sets, as well as video shot classification for predicting the
relevance of viewed items with respect to a topic. The results show that
exploiting heterogeneous implicit feedback from past users is of added value for
future users of interactive video retrieval systems.
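The graph representation of aggregated past interaction data mentioned above can be sketched as a simple co-click graph: two videos are linked when past users clicked both in the same query session, and recommendations rank a video's neighbours by co-click weight. This is a simplified stand-in for the thesis's graph model; the session format and function names are assumptions for illustration.

```python
# Illustrative sketch: recommending videos from aggregated click-through
# data via a weighted co-click graph.
from collections import defaultdict

def build_coclick_graph(sessions):
    """sessions: list of (query, clicked_items) pairs from past users."""
    graph = defaultdict(lambda: defaultdict(int))
    for _query, items in sessions:
        for a in items:
            for b in items:
                if a != b:
                    graph[a][b] += 1  # weight = number of co-click sessions
    return graph

def recommend(graph, item, k=3):
    """Top-k items most often co-clicked with `item` (ties broken by name)."""
    neighbours = graph.get(item, {})
    return sorted(neighbours, key=lambda n: (-neighbours[n], n))[:k]

# Toy example: three past retrieval sessions.
sessions = [("cats", ["v1", "v2"]),
            ("cats", ["v1", "v2", "v3"]),
            ("dogs", ["v1", "v3"])]
g = build_coclick_graph(sessions)
print(recommend(g, "v1"))
```

Edge weights from many past users smooth out individual noise, which is why aggregated implicit feedback can help future users even though any single session is unreliable.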