47,686 research outputs found
Ridgelet-based signature for natural image classification
This paper presents an approach to grouping natural scenes into (semantically) meaningful categories. The proposed approach exploits the statistics of natural scenes to define
relevant image categories. A ridgelet-based signature is used to represent images. This signature is used by a support vector classifier that is well designed to support high dimensional features, resulting in an effective recognition system. As an illustration of the potential of the approach several experiments of binary classifications (e.g. city/landscape or indoor/outdoor) are conducted on databases of natural scenes
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we have identified and analyzed gaps within European research effort during our second year.
In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio-
economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown
of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on
requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as
National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core
technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research
challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal
challenges
Steered mixture-of-experts for light field images and video : representation and coding
Research in light field (LF) processing has heavily increased over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as it is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model consists thus of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparable to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In case of 5-D LF video, we observe superior decorrelation and coding performance with coding gains of a factor of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution
On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances and Million-AID
The past years have witnessed great progress on remote sensing (RS) image
interpretation and its wide applications. With RS images becoming more
accessible than ever before, there is an increasing demand for the automatic
interpretation of these images. In this context, the benchmark datasets serve
as essential prerequisites for developing and testing intelligent
interpretation algorithms. After reviewing existing benchmark datasets in the
research community of RS image interpretation, this article discusses the
problem of how to efficiently prepare a suitable benchmark dataset for RS image
interpretation. Specifically, we first analyze the current challenges of
developing intelligent algorithms for RS image interpretation with bibliometric
investigations. We then present the general guidances on creating benchmark
datasets in efficient manners. Following the presented guidances, we also
provide an example on building RS image dataset, i.e., Million-AID, a new
large-scale benchmark dataset containing a million instances for RS image scene
classification. Several challenges and perspectives in RS image annotation are
finally discussed to facilitate the research in benchmark dataset construction.
We do hope this paper will provide the RS community an overall perspective on
constructing large-scale and practical image datasets for further research,
especially data-driven ones
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypothesis assumed and thus, the constraints imposed on the type of video
that each technique is able to address. Expliciting the hypothesis and
constraints makes the framework particularly useful to select a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion in the end of
the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
Training of Crisis Mappers and Map Production from Multi-sensor Data: Vernazza Case Study (Cinque Terre National Park, Italy)
This aim of paper is to presents the development of a multidisciplinary project carried out by the cooperation between Politecnico di Torino and ITHACA (Information Technology for Humanitarian Assistance, Cooperation and Action). The goal of the project was the training in geospatial data acquiring and processing for students attending Architecture and Engineering Courses, in order to start up a team of "volunteer mappers". Indeed, the project is aimed to document the environmental and built heritage subject to disaster; the purpose is to improve the capabilities of the actors involved in the activities connected in geospatial data collection, integration and sharing. The proposed area for testing the training activities is the Cinque Terre National Park, registered in the World Heritage List since 1997. The area was affected by flood on the 25th of October 2011. According to other international experiences, the group is expected to be active after emergencies in order to upgrade maps, using data acquired by typical geomatic methods and techniques such as terrestrial and aerial Lidar, close-range and aerial photogrammetry, topographic and GNSS instruments etc.; or by non conventional systems and instruments such us UAV, mobile mapping etc. The ultimate goal is to implement a WebGIS platform to share all the data collected with local authorities and the Civil Protectio
Change blindness: eradication of gestalt strategies
Arrays of eight, texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43149–164]. Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference seen in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored and retrieved from a pre-attentional store during this task
- …