Search CORE

4 research outputs found

Matching sets of features for efficient retrieval and recognition

Author: Grauman Kristen Lorraine, 1979-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2006
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.Includes bibliographical references (p. 145-153).In numerous domains it is useful to represent a single example by the collection of local features or parts that comprise it. In computer vision in particular, local image features are a powerful way to describe images of objects and scenes. Their stability under variable image conditions is critical for success in a wide range of recognition and retrieval applications. However, many conventional similarity measures and machine learning algorithms assume vector inputs. Comparing and learning from images represented by sets of local features is therefore challenging, since each set may vary in cardinality and its elements lack a meaningful ordering. In this thesis I present computationally efficient techniques to handle comparisons, learning, and indexing with examples represented by sets of features. The primary goal of this research is to design and demonstrate algorithms that can effectively accommodate this useful representation in a way that scales with both the representation size as well as the number of images available for indexing or learning. I introduce the pyramid match algorithm, which efficiently forms an implicit partial matching between two sets of feature vectors.(cont.) The matching has a linear time complexity, naturally forms a Mercer kernel, and is robust to clutter or outlier features, a critical advantage for handling images with variable backgrounds, occlusions, and viewpoint changes. I provide bounds on the expected error relative to the optimal partial matching. For very large databases, even extremely efficient pairwise comparisons may not offer adequately responsive query times. I show how to perform sub-linear time retrievals under the matching measure with randomized hashing techniques, even when input sets have varying numbers of features. My results are focused on several important vision tasks, including applications to content-based image retrieval, discriminative classification for object recognition, kernel regression, and unsupervised learning of categories. I show how the dramatic increase in performance enables accurate and flexible image comparisons to be made on large-scale data sets, and removes the need to artificially limit the number of local descriptions used per image when learning visual categories.by Kristen Lorraine Grauman.Ph.D

DSpace@MIT

Fourth NASA Goddard Conference on Mass Storage Systems and Technologies

Author: Hariharan P. C.
Kobler Benjamin
Publication venue
Publication date
Field of study

This report contains copies of all those technical papers received in time for publication just prior to the Fourth Goddard Conference on Mass Storage and Technologies, held March 28-30, 1995, at the University of Maryland, University College Conference Center, in College Park, Maryland. This series of conferences continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include new storage technology, stability of recorded media, performance studies, storage system solutions, the National Information infrastructure (Infobahn), the future for storage technology, and lessons learned from various projects. There also will be an update on the IEEE Mass Storage System Reference Model Version 5, on which the final vote was taken in July 1994

NASA Technical Reports Server

The Third NASA Goddard Conference on Mass Storage Systems and Technologies

Author: Hariharan P. C.
Kobler Benjamin
Publication venue
Publication date
Field of study

This report contains copies of nearly all of the technical papers and viewgraphs presented at the Goddard Conference on Mass Storage Systems and Technologies held in October 1993. The conference served as an informational exchange forum for topics primarily relating to the ingestion and management of massive amounts of data and the attendant problems involved. Discussion topics include the necessary use of computers in the solution of today's infinitely complex problems, the need for greatly increased storage densities in both optical and magnetic recording media, currently popular storage media and magnetic media storage risk factors, data archiving standards including a talk on the current status of the IEEE Storage Systems Reference Model (RM). Additional topics addressed System performance, data storage system concepts, communications technologies, data distribution systems, data compression, and error detection and correction

NASA Technical Reports Server

Automating the conversion of natural language fiction to multi-modal 3D animated virtual environments

Author: Glass Kevin Robert
Publication venue: Faculty of Science, Computer Science
Publication date: 01/01/2009
Field of study

Popular fiction books describe rich visual environments that contain characters, objects, and behaviour. This research develops automated processes for converting text sourced from fiction books into animated virtual environments and multi-modal films. This involves the analysis of unrestricted natural language fiction to identify appropriate visual descriptions, and the interpretation of the identified descriptions for constructing animated 3D virtual environments. The goal of the text analysis stage is the creation of annotated fiction text, which identifies visual descriptions in a structured manner. A hierarchical rule-based learning system is created that induces patterns from example annotations provided by a human, and uses these for the creation of additional annotations. Patterns are expressed as tree structures that abstract the input text on different levels according to structural (token, sentence) and syntactic (parts-of-speech, syntactic function) categories. Patterns are generalized using pair-wise merging, where dissimilar sub-trees are replaced with wild-cards. The result is a small set of generalized patterns that are able to create correct annotations. A set of generalized patterns represents a model of an annotator's mental process regarding a particular annotation category. Annotated text is interpreted automatically for constructing detailed scene descriptions. This includes identifying which scenes to visualize, and identifying the contents and behaviour in each scene. Entity behaviour in a 3D virtual environment is formulated using time-based constraints that are automatically derived from annotations. Constraints are expressed as non-linear symbolic functions that restrict the trajectories of a pair of entities over a continuous interval of time. Solutions to these constraints specify precise behaviour. We create an innovative quantified constraint optimizer for locating sound solutions, which uses interval arithmetic for treating time and space as contiguous quantities. This optimization method uses a technique of constraint relaxation and tightening that allows solution approximations to be located where constraint systems are inconsistent (an ability not previously explored in interval-based quantified constraint solving). 3D virtual environments are populated by automatically selecting geometric models or procedural geometry-creation methods from a library. 3D models are animated according to trajectories derived from constraint solutions. The final animated film is sequenced using a range of modalities including animated 3D graphics, textual subtitles, audio narrations, and foleys. Hierarchical rule-based learning is evaluated over a range of annotation categories. Models are induced for different categories of annotation without modifying the core learning algorithms, and these models are shown to be applicable to different types of books. Models are induced automatically with accuracies ranging between 51.4% and 90.4%, depending on the category. We show that models are refined if further examples are provided, and this supports a boot-strapping process for training the learning mechanism. The task of interpreting annotated fiction text and populating 3D virtual environments is successfully automated using our described techniques. Detailed scene descriptions are created accurately, where between 83% and 96% of the automatically generated descriptions require no manual modification (depending on the type of description). The interval-based quantified constraint optimizer fully automates the behaviour specification process. Sample animated multi-modal 3D films are created using extracts from fiction books that are unrestricted in terms of complexity or subject matter (unlike existing text-to-graphics systems). These examples demonstrate that: behaviour is visualized that corresponds to the descriptions in the original text; appropriate geometry is selected (or created) for visualizing entities in each scene; sequences of scenes are created for a film-like presentation of the story; and that multiple modalities are combined to create a coherent multi-modal representation of the fiction text. This research demonstrates that visual descriptions in fiction text can be automatically identified, and that these descriptions can be converted into corresponding animated virtual environments. Unlike existing text-to-graphics systems, we describe techniques that function over unrestricted natural language text and perform the conversion process without the need for manually constructed repositories of world knowledge. This enables the rapid production of animated 3D virtual environments, allowing the human designer to focus on creative aspects

SEALS Digital commons

South East Academic Libraries System (SEALS)