Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.
Multimodal Indexing of Presentation Videos
This thesis presents four novel methods to help users efficiently and effectively retrieve information from unstructured and unsourced multimedia sources, in particular the increasing amount and variety of presentation videos such as those in e-learning, conference recordings, corporate talks, and student presentations. We demonstrate a system to summarize, index and cross-reference such videos, and measure the quality of the produced indexes as perceived by the end users. We introduce four major semantic indexing cues: text, speaker faces, graphics, and mosaics, going beyond standard tag-based searches and simple video playbacks. This work aims at recognizing visual content "in the wild", where the system cannot rely on any additional information besides the video itself. For text, within a scene text detection and recognition framework, we present a novel locally optimal adaptive binarization algorithm, implemented with integral histograms. It determines an optimal threshold that maximizes the between-classes variance within a subwindow, with computational complexity independent of the size of the window itself. We obtain character recognition rates of 74%, as validated against ground truth of 8 presentation videos spanning over 1 hour and 45 minutes, which almost doubles the baseline performance of an open source OCR engine. For speaker faces, we detect, track, match, and finally select a humanly preferred face icon per speaker, based on three quality measures: resolution, amount of skin, and pose. We register an 87% accordance (51 out of 58 speakers) between the face indexes automatically generated from three unstructured presentation videos of approximately 45 minutes each, and human preferences recorded through Mechanical Turk experiments.
For diagrams, we locate graphics inside frames showing a projected slide, cluster them according to an on-line algorithm based on a combination of visual and temporal information, and select and color-correct their representatives to match human preferences recorded through Mechanical Turk experiments. We register 71% accuracy (57 out of 81 unique diagrams properly identified, selected and color-corrected) on three hours of videos containing five different presentations. For mosaics, we combine two existing suturing measures to extend video images into an in-the-world coordinate system. The set of frames to be registered into a mosaic is sampled according to the PTZ camera movement, which is computed through least-squares estimation starting from the luminance constancy assumption. A local-feature-based stitching algorithm is then applied to estimate the homography among a set of video frames, and median blending is used to render pixels in overlapping regions of the mosaic. For two of these indexes, namely faces and diagrams, we present two novel MTurk-derived user data collections to determine viewer preferences, and show that they are matched in selection by our methods. The net result of this thesis allows users to search, inside a video collection as well as within a single video clip, for a segment of a presentation by professor X on topic Y, containing graph Z.
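The binarization step described in this abstract (a threshold that maximizes between-class variance inside a subwindow, computed from integral histograms) can be illustrated with a minimal sketch. This is not the thesis's algorithm: it applies the standard Otsu criterion to a window histogram, and the function names `integral_histogram`, `window_histogram`, and `otsu_threshold` are hypothetical. The point it shows is that once the integral histogram is built, the histogram of any rectangular window costs four lookups per bin, regardless of window size.

```python
from typing import List

def integral_histogram(img: List[List[int]], bins: int = 256) -> List[List[List[int]]]:
    """H[y][x][b] = number of pixels with value b in rows [0, y) and columns [0, x)."""
    h, w = len(img), len(img[0])
    H = [[[0] * bins for _ in range(w + 1)] for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            cur = H[y + 1][x + 1]
            up, left, diag = H[y][x + 1], H[y + 1][x], H[y][x]
            for b in range(bins):
                cur[b] = up[b] + left[b] - diag[b]
            cur[img[y][x]] += 1  # add the current pixel to its bin
    return H

def window_histogram(H, y0: int, x0: int, y1: int, x1: int) -> List[int]:
    """Histogram of rows [y0, y1) and columns [x0, x1) from four corner lookups."""
    a, b, c, d = H[y1][x1], H[y0][x1], H[y1][x0], H[y0][x0]
    return [a[i] - b[i] - c[i] + d[i] for i in range(len(a))]

def otsu_threshold(hist: List[int]) -> int:
    """Threshold t maximizing between-class variance; class 0 holds values <= t."""
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t, n in enumerate(hist):
        w0 += n
        sum0 += t * n
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no valid split at this t
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Toy 4x4 image with 8 gray levels: dark left half, bright right half.
img = [[1, 1, 6, 6] for _ in range(4)]
H = integral_histogram(img, bins=8)
t = otsu_threshold(window_histogram(H, 0, 1, 2, 3))
print(t)  # pixels in the window with value > t would be labeled foreground
```

The sketch trades memory for speed: the integral histogram stores one histogram per pixel, so a practical implementation would likely use fewer bins or a tiled layout.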
Visual Perception for Manipulation and Imitation in Humanoid Robots
This thesis deals with visual perception for manipulation and imitation in humanoid robots. In particular, real-time applicable methods for object recognition and pose estimation as well as for markerless human motion capture have been developed. The only sensor used was a small-baseline stereo camera system (approx. human eye distance). An extensive experimental evaluation has been performed on simulated as well as real image data from real-world scenarios using the humanoid robot ARMAR-III.
Computer vision and optimization methods applied to the measurements of in-plane deformations
Peer reviewed
Modelling Colour Appearance: Applications in Skin Image Perception
Humans are trichromatic, and yet their perception of colours is rich and complex. The research presented in this thesis explores the process of colour appearance of uniform patches and natural polychromatic stimuli. This is done through the measurement and analysis of the achromatic locus (Chapter 2), modelling of chromatic adaptation in a large dataset of unique hues settings (Chapter 3), and measurement of thresholds for uniform and polychromatic stimuli derived from simulated skin images (Chapter 4). Chapter 2 proposes a novel navigation scheme based on unique hues for traversing colour space. The results show that when colour adjustments are made using this novel scheme, the variability of achromatic settings made by observers is reduced compared to the classical method of making colour adjustments along the cardinal axes of the CIELUV colour space. This result holds across the tested luminance levels (5, 20 and 50 cd/m²) in each of the three tested ambient illumination conditions: dark, simulated daylight and cool white fluorescent lighting. The analysis also shows that the direction of maximum variance of the achromatic settings lies along the daylight locus. Chapter 3 evaluates models of chromatic adaptation by using unique hues settings measured under different ambient illumination conditions. It is shown that a simple diagonal model in cone excitation space is the most efficient in terms of the trade-off between accuracy and degrees of freedom. It is also found that diagonal and linear models show similar performances, reiterating their theoretical equivalence. Performances of these diagonalisable models are found to be worse for UR and UG unique hue planes compared to UY and UB planes. Chapter 4 presents a set of three experiments reporting estimations of perceptual thresholds for polychromatic and uniform stimuli in a 3-D chromaticity-luminance colour space.
The first experiment reports thresholds for simulated skin images and uniform stimuli of the corresponding mean CIELAB colour. The second and third experiments investigate the effect of ambient illumination and the location of the stimuli in colour space. The thresholds for the polychromatic stimuli are found to be consistently higher than those for the uniform patches, for both the chromatic and the luminance projections. The area of the chromaticity ellipses shows a gradual increase with distance from the illuminant chromaticity. The orientations of these ellipses for simulated skin are found to align with the vector joining the mean patch chromaticity and the illuminant chromaticity.
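As an illustration of the "simple diagonal model in cone excitation space" evaluated in Chapter 3 of this abstract, the following is a minimal sketch of a von Kries-style adaptation: each cone signal (L, M, S) is scaled independently by the ratio of the destination and source illuminant responses. The function name `diagonal_adaptation` and the numeric cone values are hypothetical, and the thesis may parameterise or fit its diagonal model differently.

```python
from typing import Sequence, Tuple

def diagonal_adaptation(
    lms: Sequence[float],
    white_src: Sequence[float],
    white_dst: Sequence[float],
) -> Tuple[float, ...]:
    """Scale each cone excitation by the per-channel illuminant ratio.

    Equivalent to multiplying by a diagonal matrix
    diag(Ld/Ls, Md/Ms, Sd/Ss), which has three degrees of freedom.
    """
    return tuple(c * wd / ws for c, ws, wd in zip(lms, white_src, white_dst))

# Hypothetical cone excitations, for illustration only.
source_white = (1.0, 1.0, 0.5)
dest_white = (0.8, 1.0, 1.0)
stimulus = (0.5, 0.4, 0.2)

adapted = diagonal_adaptation(stimulus, source_white, dest_white)
print(adapted)
```

By construction the source white maps exactly onto the destination white, which is the defining property of this family of models. A full linear model would replace the three gains with a 3x3 matrix (nine degrees of freedom).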
An investigation into the requirements for an efficient image transmission system over an ATM network
This thesis looks into the problems arising in an image transmission system when transmitting over an ATM network. Two main areas were investigated: (i) an alternative coding technique to reduce the bit rate required; and (ii) concealment of errors due to cell loss, with emphasis on processing in the transform domain of DCT-based images. [Continues.]