77,413 research outputs found
Image Semantics in the Description and Categorization of Journalistic Photographs
This paper reports a study on the description and categorization of images. The aim of the study was to evaluate existing indexing frameworks in the context of reportage photographs and to find out how the use of this particular image genre influences the results. The effect of different tasks on image description and categorization was also studied. Subjects performed keywording and free description tasks and the elicited terms were classified using the most extensive one of the reviewed frameworks. Differences were found in the terms used in constrained and unconstrained descriptions. Summarizing terms such as abstract concepts, themes, settings and emotions were
used more frequently in keywording than in free description. Free descriptions included more terms referring to locations within the images, people and descriptive terms due to the narrative form the subjects used without prompting. The evaluated framework was found to lack some syntactic and semantic classes present in the data and modifications were suggested. According to the results of this study image categorization is based on high-level interpretive concepts,
including affective and abstract themes. The results indicate that image genre influences categorization and keywording modifies and truncates natural image description
Training methods for facial image comparison: a literature review
This literature review was commissioned to explore the psychological literature relating to facial image comparison with a particular emphasis on whether individuals can be trained to improve performance on this task. Surprisingly few studies have addressed this question directly. As a consequence, this review has been extended to cover training of face recognition and training of different kinds of perceptual comparisons where we are of the opinion that the methodologies or findings of such studies are informative. The majority of studies of face processing have examined face recognition, which relies heavily on memory. This may be memory for a face that was learned recently (e.g. minutes or hours previously) or for a face learned longer ago, perhaps after many exposures (e.g. friends, family members, celebrities). Successful face recognition, irrespective of the type of face, relies on the ability to retrieve the to-berecognised face from long-term memory. This memory is then compared to the physically present image to reach a recognition decision. In contrast, in face matching task two physical representations of a face (live, photographs, movies) are compared and so long-term memory is not involved. Because the comparison is between two present stimuli rather than between a present stimulus and a memory, one might expect that face matching, even if not an easy task, would be easier to do and easier to learn than face recognition. In support of this, there is evidence that judgment tasks where a presented stimulus must be judged by a remembered standard are generally more cognitively demanding than judgments that require comparing two presented stimuli Davies & Parasuraman, 1982; Parasuraman & Davies, 1977; Warm and Dember, 1998). Is there enough overlap between face recognition and matching that it is useful to look at the literature recognition? No study has directly compared face recognition and face matching, so we turn to research in which people decided whether two non-face stimuli were the same or different. In these studies, accuracy of comparison is not always better when the comparator is present than when it is remembered. Further, all perceptual factors that were found to affect comparisons of simultaneously presented objects also affected comparisons of successively presented objects in qualitatively the same way. Those studies involved judgments about colour (Newhall, Burnham & Clark, 1957; Romero, Hita & Del Barco, 1986), and shape (Larsen, McIlhagga & Bundesen, 1999; Lawson, Bülthoff & Dumbell, 2003; Quinlan, 1995). Although one must be cautious in generalising from studies of object processing to studies of face processing (see, e.g., section comparing face processing to object processing), from these kinds of studies there is no evidence to suggest that there are qualitative differences in the perceptual aspects of how recognition and matching are done. As a result, this review will include studies of face recognition skill as well as face matching skill. The distinction between face recognition involving memory and face matching not involving memory is clouded in many recognition studies which require observers to decide which of many presented faces matches a remembered face (e.g., eyewitness studies). And of course there are other forensic face-matching tasks that will require comparison to both presented and remembered comparators (e.g., deciding whether any person in a video showing a crowd is the target person). For this reason, too, we choose to include studies of face recognition as well as face matching in our revie
The Right (Angled) Perspective: Improving the Understanding of Road Scenes Using Boosted Inverse Perspective Mapping
Many tasks performed by autonomous vehicles such as road marking detection,
object tracking, and path planning are simpler in bird's-eye view. Hence,
Inverse Perspective Mapping (IPM) is often applied to remove the perspective
effect from a vehicle's front-facing camera and to remap its images into a 2D
domain, resulting in a top-down view. Unfortunately, however, this leads to
unnatural blurring and stretching of objects at further distance, due to the
resolution of the camera, limiting applicability. In this paper, we present an
adversarial learning approach for generating a significantly improved IPM from
a single camera image in real time. The generated bird's-eye-view images
contain sharper features (e.g. road markings) and a more homogeneous
illumination, while (dynamic) objects are automatically removed from the scene,
thus revealing the underlying road layout in an improved fashion. We
demonstrate our framework using real-world data from the Oxford RobotCar
Dataset and show that scene understanding tasks directly benefit from our
boosted IPM approach.Comment: equal contribution of first two authors, 8 full pages, 6 figures,
accepted at IV 201
Getting the message across : ten principles for web animation
The growing use of animation in Web pages testifies to the increasing ease with which such multimedia components can be created. This trend indicates a commitment to animation that is often unmatched by the skill of the implementers. The present paper details a set of ten commandments for web animation, intending to sensitise budding animators to key aspects that may impair the communicational effectiveness of their animation. These guidelines are drawn from an extensive literature survey coloured by personal experience of using Web animation packages. Our ten principles are further elucidated by a Web-based on-line tutorial
On the Design and Analysis of Multiple View Descriptors
We propose an extension of popular descriptors based on gradient orientation
histograms (HOG, computed in a single image) to multiple views. It hinges on
interpreting HOG as a conditional density in the space of sampled images, where
the effects of nuisance factors such as viewpoint and illumination are
marginalized. However, such marginalization is performed with respect to a very
coarse approximation of the underlying distribution. Our extension leverages on
the fact that multiple views of the same scene allow separating intrinsic from
nuisance variability, and thus afford better marginalization of the latter. The
result is a descriptor that has the same complexity of single-view HOG, and can
be compared in the same manner, but exploits multiple views to better trade off
insensitivity to nuisance variability with specificity to intrinsic
variability. We also introduce a novel multi-view wide-baseline matching
dataset, consisting of a mixture of real and synthetic objects with ground
truthed camera motion and dense three-dimensional geometry
Human-display interactions: Context-specific biases
Recent developments in computer engineering have greatly enhanced the capabilities of display technology. As displays are no longer limited to simple alphanumeric output, they can present a wide variety of graphic information, using either static or dynamic presentation modes. At the same time that interface designers exploit the increased capabilities of these displays, they must be aware of the inherent limitation of these displays. Generally, these limitations can be divided into those that reflect limitations of the medium (e.g., reducing three-dimensional representations onto a two-dimensional projection) and those reflecting the perceptual and conceptual biases of the operator. The advantages and limitations of static and dynamic graphic displays are considered. Rather than enter into the discussion of whether dynamic or static displays are superior, general advantages and limitations are explored which are contextually specific to each type of display
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypothesis assumed and thus, the constraints imposed on the type of video
that each technique is able to address. Expliciting the hypothesis and
constraints makes the framework particularly useful to select a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion in the end of
the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
- …