Extracting Visual Information from Text: Using Captions to Label Human Faces in Newspaper Photographs

Authors: William J. Rapaport, Rohini K. Srihari
Publication venue: eScholarship, University of California
Publication date: 01/01/1989

There are many situations where linguistic and pictorial data are jointly presented to communicate information. A computer model for synthesising information from the two sources requires an initial interpretation of both the text and the picture, followed by consolidation of information. The problem of performing general-purpose vision (without a priori knowledge) would make this a nearly impossible task. However, in some situations, the text describes salient aspects of the picture. In such situations, it is possible to extract visual information from the text, resulting in a relational graph describing the structure of the accompanying picture. This graph can then be used by a computer vision system to guide the interpretation of the picture. This paper discusses an application whereby information obtained from parsing a caption of a newspaper photograph is used to identify human faces in the photograph. Heuristics are described for extracting information from the caption which contributes to the hypothesised structure of the picture. The top-down processing of the image using this information is discussed.

eScholarship - University of California

Using Speech Input for Image Interpretation, Annotation, and Retrieval

Author: Rohini K. Srihari
Publication venue: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Publication date: 01/01/1997

This research explores the interaction of textual and photographic information in an integrated text/image database environment. Specifically, three different applications involving the exploitation of linguistic context in vision are presented. Linguistic context is qualitative in nature and is obtained dynamically. By understanding text accompanying images or video, we are able to extract information useful in retrieving the picture and directing an image interpretation system to identify relevant objects (e.g., faces) in the picture. The latter constitutes a powerful technique for automatically indexing images. A multistage system, PICTION, which uses captions to identify human faces in an accompanying photograph, has been developed. We discuss the use of PICTION's output in content-based retrieval of images to satisfy focus of attention in queries. The design and implementation of a system called Show&Tell, a multimedia system for semi-automated image annotation, is discussed. This system, which combines advances in speech recognition, natural language processing (NLP), and image understanding (IU), is designed to assist in image annotation and to enhance image retrieval capabilities. An extension of this work to video annotation and retrieval is also presented.

published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository

Exploiting Multimodal Context in Image Retrieval

Authors: Rohini K. Srihari, Zhongfei Zhang
Publication venue: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Publication date: 01/01/1999

published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository

Use of lexical and syntactic techniques in recognizing handwritten text

Author: Rohini K. Srihari
Publication venue: Morgan Kaufmann Publishers, Inc.
Publication date: 01/01/1994

The output of handwritten word recognizers (WR) tends to be very noisy due to various factors. In order to compensate for this behaviour, several choices of the WR must be initially considered. In the case of handwritten sentence/phrase recognition, linguistic constraints may be applied in order to improve the results of the WR. This paper discusses two statistical methods of applying linguistic constraints to the output of a WR on input consisting of sentences/phrases. The first is based on collocations and can be used to promote lower ranked word choices or to propose new words. The second is a Markov model of syntax and is based on syntactic categories (tags) associated with words. In each case, we show the improvement in the word recognition rate as a result of applying these constraints.
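
As a rough illustration of the first idea in this abstract (using collocations to promote lower ranked word choices), the following Python sketch re-ranks recognizer candidates. It is not the paper's method: the function name `rerank_with_collocations`, the additive scoring rule, the `weight` and `window` parameters, and the toy collocation table are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]  # (word, recognizer confidence); hypothetical format


def rerank_with_collocations(
    candidates: List[List[Candidate]],
    collocations: Dict[Tuple[str, str], float],
    weight: float = 0.5,
    window: int = 2,
) -> List[List[Candidate]]:
    """Re-rank each position's candidates using collocation evidence.

    Assumed scoring rule: recognizer score + weight * strongest collocation
    with any candidate at a position within `window` words.
    """
    reranked = []
    for i, options in enumerate(candidates):
        lo = max(0, i - window)
        hi = min(len(candidates), i + window + 1)
        rescored = []
        for word, score in options:
            best = 0.0
            for j in range(lo, hi):
                if j == i:
                    continue
                for other, _ in candidates[j]:
                    # Order-insensitive lookup in the collocation table.
                    strength = max(
                        collocations.get((word, other), 0.0),
                        collocations.get((other, word), 0.0),
                    )
                    best = max(best, strength)
            rescored.append((word, score + weight * best))
        # Highest combined score first.
        rescored.sort(key=lambda c: c[1], reverse=True)
        reranked.append(rescored)
    return reranked


# Toy data: "tea" starts below "ten" but collocates with "strong".
candidates = [
    [("strong", 0.6), ("string", 0.5)],
    [("ten", 0.5), ("tea", 0.4)],
]
collocations = {("strong", "tea"): 0.8}
result = rerank_with_collocations(candidates, collocations)
top_words = [options[0][0] for options in result]
```

In this toy run the lower ranked candidate "tea" overtakes "ten" because it collocates with "strong". A real system would estimate collocation strengths from a corpus, which this sketch does not attempt.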

CiteSeerX

Crossref

Use of Language Models in Handwriting Recognition

Authors: Rohini K. Srihari, Sargur N. Srihari, Shravya Shetty

Language models have been extensively used in natural language applications such as speech recognition, part-of-speech tagging, information extraction, etc. To a lesser extent the value of language models in text recognition has also been proved, e.g., recognition of poor quality printed text and the recognition of extended handwriting. This survey describes how linguistic context, particularly probabilistic language models, is used in the recognition of handwritten text. The survey begins with how two handwriting recognition techniques, segmentation-free and segmentation-based, are integrated with language models in the recognition process. Next, language models at the word level in the post-processing step to improve the recognition results, and at the character level for handwriting recognition and correction of recognition results, are described. Finally, syntax-based techniques like lexical analysis using collocations, syntactic (n-gram) analysis using part-of-speech (POS) tags, and a hybrid syntactic technique comprising both a statistical and an analytical component are described. Language modeling has been found to be very helpful for all natural language applications. Language models have been seen to improve the performance of these applications by 25-50% when the text used in training is representative of that for which the model is intended.
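
To make the word-level post-processing step mentioned in this abstract more concrete, here is a minimal Python sketch that combines recognizer scores with a bigram language model and picks the best word sequence by Viterbi search. It is a generic textbook formulation, not the survey's specific procedure; the function name `viterbi_decode`, the `floor` probability for unseen bigrams, and the toy probabilities are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple


def viterbi_decode(
    candidates: List[List[Tuple[str, float]]],
    bigram: Dict[Tuple[str, str], float],
    floor: float = 1e-4,
) -> List[str]:
    """Pick the word sequence maximising recognizer x bigram probability.

    `candidates[i]` lists (word, recognizer probability) pairs for position i
    and is assumed non-empty. Unseen bigrams get the assumed `floor` value.
    """
    # best[w] = (log score of the best path ending in w, that path)
    best = {w: (math.log(p), [w]) for w, p in candidates[0]}
    for options in candidates[1:]:
        nxt = {}
        for w, p in options:
            # Best predecessor for w under the bigram model.
            score, path = max(
                (prev_score + math.log(bigram.get((prev, w), floor)), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            nxt[w] = (score + math.log(p), path + [w])
        best = nxt
    return max(best.values())[1]


# Toy data: the recognizer prefers "clog", the language model prefers "dog".
candidates = [
    [("the", 0.9)],
    [("clog", 0.5), ("dog", 0.4)],
    [("barked", 0.8)],
]
bigram = {
    ("the", "dog"): 0.1,
    ("dog", "barked"): 0.2,
    ("the", "clog"): 0.01,
}
decoded = viterbi_decode(candidates, bigram)
```

Here the language model overrides the recognizer's top choice at the second position, so `decoded` is the sequence "the dog barked". Working in log space avoids numeric underflow on longer sentences.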

CiteSeerX