Search CORE

11,384 research outputs found

ImageSpirit: Verbal Guided Image Parsing

Author: Cheng Ming-Ming
Crook Nigel
Lin Wen-Yan
Mitra Niloy
Sturgess Paul
Torr Philip
Vineet Vibhav
Warrell Jonathan
Zheng Shuai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

Humans describe images in terms of nouns and adjectives while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images versus their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixel. In this paper we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interests enables a novel and natural interaction modality that can possibly be used to interact with new generation devices (e.g. smart phones, Google Glass, living room devices). We demonstrate our system on a large number of real-world images with varying complexity. To help understand the tradeoffs compared to traditional mouse based interactions, results are reported for both a large scale quantitative evaluation and a user study.Comment: http://mmcheng.net/imagespirit

arXiv.org e-Print Archive

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University

UCL Discovery

Oxford University Research Archive

Oxford Brookes University: RADAR

Computer-aided boundary delineation of agricultural lands

Author: Angelici Gary L.
Cheng Thomas D.
Ma Matt
Slye Robert E.
Publication venue
Publication date
Field of study

The National Agricultural Statistics Service of the United States Department of Agriculture (USDA) presently uses labor-intensive aerial photographic interpretation techniques to divide large geographical areas into manageable-sized units for estimating domestic crop and livestock production. Prototype software, the computer-aided stratification (CAS) system, was developed to automate the procedure, and currently runs on a Sun-based image processing system. With a background display of LANDSAT Thematic Mapper and United States Geological Survey Digital Line Graph data, the operator uses a cursor to delineate agricultural areas, called sampling units, which are assigned to strata of land-use and land-cover types. The resultant stratified sampling units are used as input into subsequent USDA sampling procedures. As a test, three counties in Missouri were chosen for application of the CAS procedures. Subsequent analysis indicates that CAS was five times faster in creating sampling units than the manual techniques were

NASA Technical Reports Server

A study of the very high order natural user language (with AI capabilities) for the NASA space station common module

Author: Gill E. N.
Publication venue
Publication date
Field of study

The requirements are identified for a very high order natural language to be used by crew members on board the Space Station. The hardware facilities, databases, realtime processes, and software support are discussed. The operations and capabilities that will be required in both normal (routine) and abnormal (nonroutine) situations are evaluated. A structure and syntax for an interface (front-end) language to satisfy the above requirements are recommended

NASA Technical Reports Server

Object Referring in Visual Scene with Spoken Language

Author: Dai Dengxin
Van Gool Luc
Vasudevan Arun Balajee
Publication venue
Publication date: 05/12/2017
Field of study

Object referring has important applications, especially for human-machine interaction. While having received great attention, the task is mainly attacked with written language (text) as input rather than spoken language (speech), which is more natural. This paper investigates Object Referring with Spoken Language (ORSpoken) by presenting two datasets and one novel approach. Objects are annotated with their locations in images, text descriptions and speech descriptions. This makes the datasets ideal for multi-modality learning. The approach is developed by carefully taking down ORSpoken problem into three sub-problems and introducing task-specific vision-language interactions at the corresponding levels. Experiments show that our method outperforms competing methods consistently and significantly. The approach is also evaluated in the presence of audio noise, showing the efficacy of the proposed vision-language interaction methods in counteracting background noise.Comment: 10 pages, Submitted to WACV 201

arXiv.org e-Print Archive

Crossref

Implementation of a Human-Computer Interface for Computer Assisted Translation and Handwritten Text Recognition

Author: Ocampo Sepúlveda Jorge Carlos
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 11/01/2012
Field of study

A human-computer interface is developed to provide services of computer assisted machine translation (CAT) and computer assisted transcription of handwritten text images (CATTI). The back-end machine translation (MT) and handwritten text recognition (HTR) systems are provided by the Pattern Recognition and Human Language Technology (PRHLT) research group. The idea is to provide users with easy to use tools to convert interactive translation and transcription feasible tasks. The assisted service is provided by remote servers with CAT or CATTI capabilities. The interface supplies the user with tools for efficient local edition: deletion, insertion and substitution.Ocampo Sepúlveda, JC. (2009). Implementation of a Human-Computer Interface for Computer Assisted Translation and Handwritten Text Recognition. http://hdl.handle.net/10251/14318Archivo delegad

RiuNet

Semi-Automated SVG Programming via Direct Manipulation

Author: Cypher A.
Kurlander D.
Lieberman H.
Lieberman H.
McDirmid
Pierra G.
Schuster C.
Silva P. B.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/08/2016
Field of study

Direct manipulation interfaces provide intuitive and interactive features to a broad range of users, but they often exhibit two limitations: the built-in features cannot possibly cover all use cases, and the internal representation of the content is not readily exposed. We believe that if direct manipulation interfaces were to (a) use general-purpose programs as the representation format, and (b) expose those programs to the user, then experts could customize these systems in powerful new ways and non-experts could enjoy some of the benefits of programmable systems. In recent work, we presented a prototype SVG editor called Sketch-n-Sketch that offered a step towards this vision. In that system, the user wrote a program in a general-purpose lambda-calculus to generate a graphic design and could then directly manipulate the output to indirectly change design parameters (i.e. constant literals) in the program in real-time during the manipulation. Unfortunately, the burden of programming the desired relationships rested entirely on the user. In this paper, we design and implement new features for Sketch-n-Sketch that assist in the programming process itself. Like typical direct manipulation systems, our extended Sketch-n-Sketch now provides GUI-based tools for drawing shapes, relating shapes to each other, and grouping shapes together. Unlike typical systems, however, each tool carries out the user's intention by transforming their general-purpose program. This novel, semi-automated programming workflow allows the user to rapidly create high-level, reusable abstractions in the program while at the same time retaining direct manipulation capabilities. In future work, our approach may be extended with more graphic design features or realized for other application domains.Comment: In 29th ACM User Interface Software and Technology Symposium (UIST 2016

arXiv.org e-Print Archive

Crossref