67,230 research outputs found
Visual Question Answering: A Survey of Methods and Datasets
Visual Question Answering (VQA) is a challenging task that has received
increasing attention from both the computer vision and the natural language
processing communities. Given an image and a question in natural language, it
requires reasoning over visual elements of the image and general knowledge to
infer the correct answer. In the first part of this survey, we examine the
state of the art by comparing modern approaches to the problem. We classify
methods by their mechanism to connect the visual and textual modalities. In
particular, we examine the common approach of combining convolutional and
recurrent neural networks to map images and questions to a common feature
space. We also discuss memory-augmented and modular architectures that
interface with structured knowledge bases. In the second part of this survey,
we review the datasets available for training and evaluating VQA systems. The
various datatsets contain questions at different levels of complexity, which
require different capabilities and types of reasoning. We examine in depth the
question/answer pairs from the Visual Genome project, and evaluate the
relevance of the structured annotations of images with scene graphs for VQA.
Finally, we discuss promising future directions for the field, in particular
the connection to structured knowledge bases and the use of natural language
processing models.Comment: 25 page
Recommended from our members
Artificial intelligence approaches to predicting and detecting cognitive decline in older adults: A conceptual review.
Preserving cognition and mental capacity is critical to aging with autonomy. Early detection of pathological cognitive decline facilitates the greatest impact of restorative or preventative treatments. Artificial Intelligence (AI) in healthcare is the use of computational algorithms that mimic human cognitive functions to analyze complex medical data. AI technologies like machine learning (ML) support the integration of biological, psychological, and social factors when approaching diagnosis, prognosis, and treatment of disease. This paper serves to acquaint clinicians and other stakeholders with the use, benefits, and limitations of AI for predicting, diagnosing, and classifying mild and major neurocognitive impairments, by providing a conceptual overview of this topic with emphasis on the features explored and AI techniques employed. We present studies that fell into six categories of features used for these purposes: (1) sociodemographics; (2) clinical and psychometric assessments; (3) neuroimaging and neurophysiology; (4) electronic health records and claims; (5) novel assessments (e.g., sensors for digital data); and (6) genomics/other omics. For each category we provide examples of AI approaches, including supervised and unsupervised ML, deep learning, and natural language processing. AI technology, still nascent in healthcare, has great potential to transform the way we diagnose and treat patients with neurocognitive disorders
- …