6,148 research outputs found

Crowdsourcing in Computer Vision

Author: Fei-Fei Li
Grauman Kristen
Kovashka Adriana
Russakovsky Olga
Publication venue: 'Now Publishers'
Publication date: 01/01/2016

Computer vision systems require large amounts of manually annotated data to properly learn challenging visual concepts. Crowdsourcing platforms offer an inexpensive method to capture human knowledge and understanding for a vast number of visual perception tasks. In this survey, we describe the types of annotations computer vision researchers have collected using crowdsourcing, and how they have ensured that this data is of high quality while annotation effort is minimized. We begin by discussing data collection on both classic (e.g., object recognition) and recent (e.g., visual story-telling) vision tasks. We then summarize key design decisions for creating effective data collection interfaces and workflows, and present strategies for intelligently selecting the most important data instances to annotate. Finally, we conclude with some thoughts on the future of crowdsourcing in computer vision. Comment: A 69-page meta review of the field, Foundations and Trends in Computer Graphics and Vision, 201

arXiv.org e-Print Archive

Crossref

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

Author: Gan Chuang
Gong Boqing
Li Haoxiang
Li Yandong
Sun Chen
Publication date: 15/08/2017

Rich and dense human-labeled datasets are among the main enabling factors for the recent advances in vision-language understanding. Many seemingly distant annotations (e.g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understanding about the same visual scenes, and even the same set of images (e.g., of COCO). The popularity of COCO correlates those annotations and tasks. Explicitly linking them up may significantly benefit both individual tasks and the unified vision and language modeling. We present the preliminary work of linking the instance segmentations provided by COCO to the questions and answers (QAs) in the VQA dataset, and name the collected links visual questions and segmentation answers (VQS). They transfer human supervision between the previously separate tasks, offer more effective leverage to existing problems, and also open the door for new research problems and models. We study two applications of the VQS data in this paper: supervised attention for VQA and a novel question-focused semantic segmentation task. For the former, we obtain state-of-the-art results on the VQA real multiple-choice task by simply augmenting the multilayer perceptrons with some attention features that are learned using the segmentation-QA links as explicit supervision. To put the latter in perspective, we study two plausible methods and compare them to an oracle method assuming that the instance segmentations are given at the test stage. Comment: To appear on ICCV 201

arXiv.org e-Print Archive

Crossref

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Discovering visual concept structure with sparse and incomplete tags

Author: Jingya Wang
Shaogang Gong
Xiatian Zhu
Publication venue: 'Elsevier BV'
Publication date: 30/05/2017

This work was partially supported by the China Scholarship Council, Vision Semantics Limited, and Royal Society Newton Advanced Fellowship Programme (NA150459)

arXiv.org e-Print Archive

Crossref

Queen Mary Research Online

Structural advances for pattern discovery in multi-relational databases

Author: Kanodia Juveria
Publication venue: RIT Scholar Works
Publication date: 01/01/2005

With ever-growing storage needs and a drift towards very large relational storage settings, multi-relational data mining has become a prominent and pertinent field for discovering unique and interesting relational patterns. As a consequence, a whole suite of multi-relational data mining techniques is being developed. These techniques may either be extensions to the already existing single-table mining techniques or may be developed from scratch. For the traditionalists, single-table mining algorithms can be used to work on multi-relational settings by making inelegant and time-consuming joins of all target relations. However, complex relational patterns cannot be expressed in a single-table format and thus cannot be discovered. This work presents a new multi-relational frequent pattern mining algorithm termed Multi-Relational Frequent Pattern Growth (MRFP Growth). MRFP Growth is capable of mining multiple relations, linked with referential integrity, for frequent patterns that satisfy a user-specified support threshold. Empirical results on MRFP Growth performance and its comparison with state-of-the-art multi-relational data mining algorithms like WARMR and Decentralized Apriori are discussed at length. MRFP Growth scores over the latter two techniques in number of patterns generated and speed. The realm of multi-relational clustering is also explored in this thesis. A multi-Relational Item Clustering approach based on Hypergraphs (RICH) is proposed. Experimentally, RICH combined with MRFP Growth proves to be a competitive approach for clustering multi-relational data. The performance and quality of clusters generated by RICH are compared with other clustering algorithms. Finally, the thesis demonstrates the applied utility of the theoretical implications of the above-mentioned algorithms in an application framework for auto-annotation of images in an image database. The system is called CoMMA, which stands for Combining Multi-relational Multimedia for Associations.

RIT Scholar Works

Very-High-Resolution SAR Images and Linked Open Data Analytics Based on Ontologies

Author: Bereta Konstantina
Datcu Mihai
Dumitru Corneliu
Espinoza-Molina Daniela
Koubarakis Manolis
Nikolaou Charalampos
Schwarz Gottfried
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

In this paper, we deal with the integration of multiple sources of information such as Earth observation (EO) synthetic aperture radar (SAR) images and their metadata, semantic descriptors of the image content, as well as other publicly available geospatial data sources expressed as linked open data for posing complex queries in order to support geospatial data analytics. Our approach lays the foundations for the development of richer tools and applications that focus on EO image analytics using ontologies and linked open data. We introduce a system architecture where a common satellite image product is transformed from its initial format into actionable intelligence information, which includes image descriptors, metadata, image tiles, and semantic labels, resulting in an EO-data model. We also create a SAR image ontology based on our EO-data model and a two-level taxonomy classification scheme of the image content. We demonstrate our approach by linking high-resolution TerraSAR-X images with information from CORINE Land Cover (CLC), Urban Atlas (UA), GeoNames, and OpenStreetMap (OSM), which are represented in the standard triple model of the Resource Description Framework (RDF).

Institute of Transport Research:Publications

A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images

Author: Akagunduz Erdem
Ulku Irem
Publication date: 14/05/2020

Semantic segmentation is the pixel-wise labelling of an image. Since the problem is defined at the pixel level, determining image class labels alone is not sufficient; localising them at the original image pixel resolution is also necessary. Boosted by the extraordinary ability of convolutional neural networks (CNN) in creating semantic, high-level and hierarchical image features, a very large number of deep learning-based 2D semantic segmentation approaches have been proposed within the last decade. In this survey, we mainly focus on the recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images. We started with an analysis of the public image sets and leaderboards for 2D semantic segmentation, with an overview of the techniques employed in performance evaluation. In examining the evolution of the field, we chronologically categorised the approaches into three main periods, namely the pre- and early deep learning era, the fully convolutional era, and the post-FCN era. We technically analysed the solutions put forward in terms of solving the fundamental problems of the field, such as fine-grained localisation and scale invariance. Before drawing our conclusions, we present a table of methods from all mentioned eras, with a brief summary of each approach that explains their contribution to the field. We conclude the survey by discussing the current challenges of the field and to what extent they have been solved. Comment: Updated with new studie

arXiv.org e-Print Archive

Directory of Open Access Journals

Generating semantically enriched diagnostics for radiological images using machine learning

Author: Gasimova Aydan
Publication venue: Computing, Imperial College London
Publication date: 01/01/2023

Development of Computer Aided Diagnostic (CAD) tools to aid radiologists in pathology detection and decision making relies considerably on manually annotated images. With the advancement of deep learning techniques for CAD development, these expert annotations no longer need to be hand-crafted; however, deep learning algorithms require large amounts of data in order to generalise well. One way in which to access large volumes of expert-annotated data is through radiological exams consisting of images and reports. Using past radiological exams obtained from hospital archiving systems has many advantages: they are expert annotations available in large quantities, covering a population-representative variety of pathologies, and they provide additional context to pathology diagnoses, such as anatomical location and severity. Learning to auto-generate such reports from images presents many challenges, such as the difficulty in representing and generating long, unstructured textual information, accounting for spelling errors and repetition or redundancy, and the inconsistency across different annotators. In this thesis, the problem of learning to automate disease detection from radiological exams is approached from three directions. Firstly, a report generation model is developed such that it is conditioned on radiological image features. Secondly, a number of approaches are explored aimed at extracting diagnostic information from free-text reports. Finally, an alternative approach to image latent space learning from current state-of-the-art is developed that can be applied to accelerated image acquisition. Open Acces

Spiral - Imperial College Digital Repository

Semantic multimedia modelling & interpretation for annotation

Author: Ullah I.
Publication date: 01/01/2011

The emergence of multimedia-enabled devices, particularly the incorporation of cameras in mobile phones, together with rapid advances in low-cost storage, has drastically boosted the rate of multimedia data production. Witnessing such ubiquity of digital images and videos, the research community has raised the issue of their effective utilization and management. Stored in monumental multimedia corpora, digital data need to be retrieved and organized in an intelligent way, leaning on the rich semantics involved. The utilization of these image and video collections demands proficient image and video annotation and retrieval techniques. Recently, the multimedia research community has been progressively shifting its emphasis to the personalization of these media. The main impediment in image and video analysis is the semantic gap, which is the discrepancy between a user's high-level interpretation of an image or video and its low-level computational interpretation. Content-based image and video annotation systems are remarkably susceptible to the semantic gap due to their reliance on low-level visual features for delineating semantically rich image and video contents. However, visual similarity is not semantic similarity, so an alternative way through this dilemma is needed. The semantic gap can be narrowed by including high-level and user-generated information in the annotation. High-level descriptions of images and/or videos are more capable of capturing the semantic meaning of multimedia content, but it is not always feasible to collect this information. It is commonly agreed that the problem of high-level semantic annotation of multimedia is still far from being solved. This dissertation puts forward approaches for intelligent multimedia semantic extraction for high-level annotation, and intends to bridge the gap between visual features and semantics. It proposes a framework for annotation enhancement and refinement for object/concept-annotated image and video datasets. The overall idea is to first purify the datasets of noisy keywords and then expand the concepts lexically and through commonsense knowledge to fill the vocabulary and lexical gap and achieve high-level semantics for the corpus. This dissertation also explores a novel approach for high-level semantic (HLS) propagation through image corpora. HLS propagation takes advantage of the semantic intensity (SI), which is the concept dominance factor in the image, and of annotation-based semantic similarity between images. An image is a combination of various concepts, some of which are more dominant than others, while the semantic similarity of a pair of images is based on the SI and the concept semantic similarity between them. Moreover, HLS exploits clustering techniques to group similar images, so that a human expert assigns high-level semantics to a single randomly selected image and they are propagated to the other images through clustering. The investigation has been made on the LabelMe image and LabelMe video datasets. Experiments show that the proposed approaches make a noticeable improvement towards bridging the semantic gap and that the proposed system outperforms traditional systems.

Middlesex University Research Repository