A system for large-scale image and video retrieval on everyday scenes
The amount of multimedia data generated on the web today is growing in both size and diversity. This has made accurate content retrieval over these large and complex collections a challenging problem. Motivated by the need for systems that enable scalable and efficient search, we propose QIK (Querying Images Using Contextual Knowledge). QIK leverages advances in deep learning (DL) and natural language processing (NLP) for scene understanding to enable large-scale multimedia retrieval on everyday scenes with common objects. The system consists of three major components: Indexer, Query Processor, and Video Processor. Given an image, the Indexer performs probabilistic image understanding (PIU). The generated PIU consists of the most probable captions, parsed and represented as tree structures using NLP techniques, along with the detected objects. The PIUs are stored and indexed in a database system. For a query image, the Query Processor generates the most probable caption and parses it into the corresponding tree structure. An optimized tree-pattern query is then constructed and executed on the database to retrieve a set of candidate images. The fetched candidate images are ranked using the tree-edit distance metric computed on their tree structures. Given a video, the Video Processor extracts a sequence of key scenes that are posed to the Query Processor to retrieve a set of candidate scenes. The candidate scene parse trees corresponding to each video are extracted and ranked based on the number of matching scenes. We evaluated the performance of our system on large-scale image and video retrieval tasks over datasets containing everyday scenes and observed that it outperformed state-of-the-art techniques in terms of mean average precision.
Includes bibliographical references
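The abstract does not give QIK's actual tree-pattern query language or its tree-edit distance implementation, but the ranking step can be illustrated with a minimal sketch: a memoized recursive edit distance over ordered trees with unit insert, delete, and relabel costs (exponential in the worst case, but workable for small caption parse trees), used to sort hypothetical candidate images by distance to the query's parse tree. All names and tree encodings below are assumptions for illustration, not QIK's code.

```python
from functools import lru_cache

# A parse tree is a (label, children) pair, where children is itself a
# tuple of trees. Tuples keep the structures hashable for memoization.

def tree_size(t):
    """Number of nodes in a tree."""
    label, children = t
    return 1 + sum(tree_size(c) for c in children)

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Edit distance between two ordered forests (tuples of trees),
    with unit costs for node insertion, deletion, and relabeling."""
    if not f1 and not f2:
        return 0
    if not f1:
        return sum(tree_size(t) for t in f2)   # insert everything in f2
    if not f2:
        return sum(tree_size(t) for t in f1)   # delete everything in f1
    l1, c1 = f1[-1]
    l2, c2 = f2[-1]
    delete = forest_dist(f1[:-1] + c1, f2) + 1   # delete root of f1's last tree
    insert = forest_dist(f1, f2[:-1] + c2) + 1   # insert root of f2's last tree
    match = (forest_dist(f1[:-1], f2[:-1])       # match the two roots,
             + forest_dist(c1, c2)               # recurse on their children,
             + (l1 != l2))                       # relabel cost if labels differ
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))

def rank_candidates(query_tree, candidates):
    """Order (image_id, parse_tree) candidates by distance to the query tree."""
    return sorted(candidates, key=lambda c: tree_edit_distance(query_tree, c[1]))
```

For example, a query caption parsed as `("S", (("NP", (("dog", ()),)), ("VP", (("runs", ()),))))` is at distance 0 from an identical candidate tree and at distance 1 from one whose only difference is a relabeled leaf, so the identical candidate ranks first.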
Enabling Automated, Conversational Health Coaching with Human-Centered Artificial Intelligence
Health coaching is a promising approach to support self-management of chronic conditions like type 2 diabetes; however, there aren’t enough coaching practitioners to support those in need. Advances in Artificial Intelligence (AI) and Machine Learning (ML) have the potential to enable innovative, automated health coaching interventions, but important gaps remain in applying AI and ML to coaching interventions. This thesis aims to identify computational approaches and interactive technologies that enable automated health coaching systems. First, I utilized computational approaches that leverage individuals’ self-tracking and health data and used an expert system to translate ML inferences into personalized nutrition goal recommendations. The system, GlucoGoalie, was evaluated in multiple studies including a 4-week deployment study which demonstrated the feasibility of the approach.
Second, I compared human-powered and automated/chatbot approaches to health coaching in a 3-week study, which found that t2.coach, a scripted, theoretically grounded chatbot designed through an iterative, user-centered process, cultivated a coach-like experience with many similarities to the experience of messaging with actual health coaches, and outlined directions for automated, conversational coaching interventions. Third, I examined multiple AI approaches to enable micro-coaching dialogs (brief coaching conversations related to specific meals that support achievement of nutrition goals), including a knowledge-based system for natural language understanding and a data-driven, reinforcement learning approach for dialog management. Together, the results of these studies contribute methods and insights that take steps toward more intelligent conversational coaching systems, with resonance to research in informatics, human-computer interaction, and health coaching.