Structured Knowledge Representation for Image Retrieval
We propose a structured approach to content-based image retrieval and present
a description logic devised for the semantic indexing and retrieval of images
containing complex objects. As other
approaches do, we start from low-level features extracted with image analysis
to detect and characterize regions in an image. However, in contrast with
feature-based approaches, we provide a syntax to describe segmented regions as
basic objects and complex objects as compositions of basic ones. Then we
introduce a companion extensional semantics for defining reasoning services,
such as retrieval, classification, and subsumption. These services can be used
for both exact and approximate matching, using similarity measures. Using our
logical approach as a formal specification, we implemented a complete
client-server image retrieval system, which allows a user to pose both queries
by sketch and queries by example. A set of experiments has been carried out on
a testbed of images to assess the retrieval capabilities of the system in
comparison with expert users' rankings. Results are presented using a
well-established measure of quality borrowed from textual information
retrieval.
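The core idea of describing complex objects as compositions of basic ones, and then retrieving via subsumption or a similarity measure, can be sketched in a few lines. This is a toy illustration, not the paper's actual description logic: the class names, the attribute set, and the overlap-based similarity are assumptions chosen for clarity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BasicObject:
    """A segmented region characterized by (here, symbolic) features."""
    shape: str
    color: str

@dataclass(frozen=True)
class ComplexObject:
    """A composition of basic objects, mirroring the paper's syntax."""
    parts: frozenset

def subsumes(query: ComplexObject, image_desc: ComplexObject) -> bool:
    """Exact matching via subsumption: every part required by the query
    must occur in the image description."""
    return query.parts <= image_desc.parts

def similarity(query: ComplexObject, image_desc: ComplexObject) -> float:
    """Approximate matching: fraction of query parts found in the image."""
    if not query.parts:
        return 1.0
    return len(query.parts & image_desc.parts) / len(query.parts)

# A "house" described as a composition of two basic objects
house = ComplexObject(frozenset({BasicObject("triangle", "red"),
                                 BasicObject("square", "white")}))
query = ComplexObject(frozenset({BasicObject("triangle", "red")}))

print(subsumes(query, house))    # True: exact match by subsumption
print(similarity(query, house))  # 1.0
```

In the actual system the parts would carry low-level features extracted by image analysis, and the similarity measure would rank images for approximate retrieval rather than returning a single score.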
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns
visual concepts, words, and semantic parsing of sentences without explicit
supervision on any of them; instead, our model learns by simply looking at
images and reading paired questions and answers. Our model builds an
object-based scene representation and translates sentences into executable,
symbolic programs. To bridge the learning of two modules, we use a
neuro-symbolic reasoning module that executes these programs on the latent
scene representation. Analogous to human concept learning, the perception
module learns visual concepts based on the language description of the object
being referred to. Meanwhile, the learned visual concepts facilitate learning
new words and parsing new sentences. We use curriculum learning to guide the
searching over the large compositional space of images and language. Extensive
experiments demonstrate the accuracy and efficiency of our model on learning
visual concepts, word representations, and semantic parsing of sentences.
Further, our method allows easy generalization to new object attributes,
compositions, language concepts, scenes and questions, and even new program
domains. It also empowers applications including visual question answering and
bidirectional image-text retrieval.
Comment: ICLR 2019 (Oral). Project page: http://nscl.csail.mit.edu
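The pipeline of translating a question into a symbolic program and executing it on an object-based scene representation can be sketched as follows. This is a simplified stand-in: the operation names and the exact-match filters are assumptions, whereas in NS-CL the filters are learned, differentiable concept classifiers operating on latent object embeddings.

```python
# Toy object-based scene representation: one attribute dict per object
scene = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "blue", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]

def filter_attr(objs, attr, value):
    """Select objects matching a visual concept. In NS-CL this test is a
    learned concept classifier, not an exact string comparison."""
    return [o for o in objs if o[attr] == value]

def execute(program, scene):
    """Run a parsed symbolic program step by step on the scene
    representation, as the neuro-symbolic reasoning module does."""
    result = list(scene)
    for op, *args in program:
        if op == "filter":
            result = filter_attr(result, *args)
        elif op == "count":
            result = len(result)
    return result

# "How many blue cubes are there?" parsed into an executable program
program = [("filter", "color", "blue"), ("filter", "shape", "cube"), ("count",)]
print(execute(program, scene))  # 1
```

Because execution is transparent, the answer can be traced back through each program step, which is what makes the intermediate concepts interpretable.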
Journalistic image access: description, categorization and searching
The quantity of digital imagery continues to grow, creating a pressing need for efficient methods of organizing and retrieving images. Knowledge of user behavior in image description and search is required for creating effective and satisfying search experiences. The nature of visual information and of journalistic images creates challenges in representing images and matching them with user needs.
The goal of this dissertation was to understand the processes in journalistic image access (description, categorization, and searching), and the effects of contextual factors on preferred access points. These were studied using multiple data collection and analysis methods across several studies. Image attributes used to describe journalistic imagery were analyzed based on description tasks and compared to a typology developed through a meta-analysis of literature on image attributes. Journalistic image search processes and query types were analyzed through a field study and multimodal image retrieval experiment. Image categorization was studied via sorting experiments leading to a categorization model. Advances to research methods concerning search tasks and categorization procedures were implemented.
Contextual effects on image access were found, related to organizational context, work and search tasks, and publication context. Image retrieval in a journalistic work context was contextual at the level of image needs and of the search process. While text queries, together with browsing, remained the key access mode to journalistic imagery, participants also used visual access modes in the experiment, constructing multimodal queries. The assigned search task type and searcher expertise affected which query modes were utilized. Journalistic images were mostly described and queried for on the semantic level, but syntactic attributes were also used. Constraining the description led to more abstract descriptions. Image similarity was evaluated mainly on the basis of generic semantics; however, functionally oriented categories were also constructed, especially by domain experts. Availability of page context promoted thematic rather than object-based categorization.
The findings increase our understanding of user behavior in image description, categorization, and searching, and have implications for future solutions in journalistic image access. The contexts of image production, use, and search merit more attention in research, as they could be leveraged to support annotation and retrieval. Multiple access points should be created for journalistic images based on image content and function, and support for multimodal query formulation should also be offered. The contributions of this dissertation may be used to create evaluation criteria for journalistic image access systems.
EgoTV: Egocentric Task Verification from Natural Language Task Descriptions
To enable progress towards egocentric agents capable of understanding
everyday tasks specified in natural language, we propose a benchmark and a
synthetic dataset called Egocentric Task Verification (EgoTV). EgoTV contains
multi-step tasks with multiple sub-task decompositions, state changes, object
interactions, and sub-task ordering constraints, in addition to abstracted task
descriptions that contain only partial details about ways to accomplish a task.
We also propose a novel Neuro-Symbolic Grounding (NSG) approach to enable the
causal, temporal, and compositional reasoning of such tasks. We demonstrate
NSG's capability towards task tracking and verification on our EgoTV dataset
and a real-world dataset derived from CrossTask (CTV). Our contributions
include the release of the EgoTV and CTV datasets, and the NSG model for future
research on egocentric assistive agents.
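The sub-task ordering constraints that EgoTV tasks impose can be verified with a simple partial-order check over a trace of completed sub-tasks. This is a toy stand-in for the temporal reasoning NSG performs over grounded sub-task detections; the task, sub-task names, and constraint encoding below are illustrative assumptions.

```python
def verify_ordering(trace, constraints):
    """Check that a sequence of completed sub-tasks satisfies ordering
    constraints, given as pairs (a, b) meaning a must finish before b.
    Fails if a required sub-task is missing or out of order."""
    position = {step: i for i, step in enumerate(trace)}
    for before, after in constraints:
        if before not in position or after not in position:
            return False  # a required sub-task was never observed
        if position[before] >= position[after]:
            return False  # ordering constraint violated
    return True

# Hypothetical task with two ordering constraints on its sub-tasks
constraints = [("boil_water", "pour_water"), ("grind_beans", "pour_water")]
print(verify_ordering(["grind_beans", "boil_water", "pour_water"], constraints))  # True
print(verify_ordering(["pour_water", "boil_water", "grind_beans"], constraints))  # False
```

In the benchmark itself the trace is not given symbolically: the model must first ground each sub-task and its state changes in the egocentric video before any such temporal check can be made, which is where the causal and compositional reasoning comes in.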