604 research outputs found
MusA: Using Indoor Positioning and Navigation to Enhance Cultural Experiences in a museum
In recent years there has been growing interest in the use of multimedia mobile guides in museum environments. Mobile devices can detect the user's context and provide information that helps visitors discover and follow the logical and emotional connections that develop during a visit. In this scenario, location-based services (LBS) currently represent an asset, and the choice of technology for determining users' positions, combined with the definition of methods that convey information effectively, becomes a key issue in the design process. In this work, we present MusA (Museum Assistant), a general framework for the development of multimedia interactive guides for mobile devices. Its main feature is a vision-based indoor positioning system that enables several LBS, from way-finding to the contextualized communication of cultural content, aimed at providing a meaningful exploration of exhibits according to visitors' personal interests and curiosity. Starting from a thorough description of the system architecture, the article presents the implementation of two mobile guides, developed to address adults and children respectively, and discusses the evaluation of the user experience and the visitors' appreciation of these applications.
Computer Vision algorithms performance in architectural heritage multi-image based projects. General overview and operative evaluation: the North Tower of Buñol's Castle (Spain)
Multi-image based modeling has proven effective in providing solutions for surveying and documenting cultural heritage, and architectural heritage in particular. Beyond the issues related to instruments and capture strategy, the operation of these projects rests on three bases: Computer Vision (CV) algorithms, analytical close-range photogrammetry, and the theory of errors. In this work we propose an approach that examines the importance of the first from two points of view. On the one hand, we present a brief overview of its role in the different processing stages, both in photomodeling and in image-stitching projects, thereby reviewing the fundamentals of the two classic branches of architectural photogrammetry. On the other, we review the operational strategy for these algorithms through a case study that evaluates the results of two software applications, advancing some methodological improvements.

Cabanes Ginés, JL.; Bonafé, C. (2021). Computer Vision algorithms performance in architectural heritage multi-image based projects. General overview and operative evaluation: the North Tower of Buñol's Castle (Spain). SCIRES-IT. 11(2):125-138. https://doi.org/10.2423/i22394303v11n2p125
Discriminative learning of local image descriptors
In this paper, we explore methods for learning local image descriptors from training data. We describe a set of building blocks for constructing descriptors which can be combined together and jointly optimized so as to minimize the error of a nearest-neighbor classifier. We consider both linear and nonlinear transforms with dimensionality reduction, and make use of discriminant learning techniques such as Linear Discriminant Analysis (LDA) and Powell minimization to solve for the parameters. Using these techniques, we obtain descriptors that exceed state-of-the-art performance with low dimensionality. In addition to new experiments and recommendations for descriptor learning, we are also making available a new and realistic ground truth data set based on multiview stereo data.
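The LDA-plus-nearest-neighbor idea above can be sketched as follows. This is a minimal illustration on synthetic patch descriptors, not the paper's pipeline: the data, dimensions, and the leave-one-out matching test are all assumptions.

```python
# Sketch: LDA-based dimensionality reduction of patch descriptors, then
# nearest-neighbor matching in the reduced space (synthetic data).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical "descriptors": 20 3D points seen in 10 views each, 64-D features.
n_points, views, dim = 20, 10, 64
centers = rng.normal(size=(n_points, dim))
X = np.repeat(centers, views, axis=0) + 0.3 * rng.normal(size=(n_points * views, dim))
y = np.repeat(np.arange(n_points), views)

# Learn a discriminative low-dimensional projection (at most n_classes - 1 dims).
lda = LinearDiscriminantAnalysis(n_components=15).fit(X, y)
Z = lda.transform(X)

# Leave-one-out nearest-neighbor matching in the 15-D space.
d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)
nn = d.argmin(axis=1)
accuracy = (y[nn] == y).mean()
print(f"15-D NN matching accuracy: {accuracy:.2f}")
```

The point of the sketch is that the projection is chosen to minimize nearest-neighbor confusion between descriptors of different 3D points, which is the discriminative objective the abstract describes.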
Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data
Retinal image matching plays a crucial role in monitoring disease progression and treatment response. However, datasets with matched keypoints between temporally separated pairs of images are not available in abundance to train transformer-based models. We propose a novel approach based on reverse knowledge distillation to train large models with limited data while preventing overfitting. First, we propose architectural modifications to a CNN-based semi-supervised method called SuperRetina that improve its results on a publicly available dataset. Then, we train a computationally heavier model based on a vision transformer encoder using the lighter CNN-based model, which is counter-intuitive in the field of knowledge distillation research, where training lighter models based on heavier ones is the norm. Surprisingly, such reverse knowledge distillation improves generalization even further. Our experiments suggest that high-dimensional fitting in representation space may prevent overfitting, unlike training directly to match the final output. We also provide a public dataset with annotations for retinal image keypoint detection and matching to help the research community develop algorithms for retinal image applications.
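The representation-space fitting mentioned above can be sketched as a distillation loss that matches student and teacher embeddings rather than final outputs. The shapes and the plain MSE objective here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def representation_distillation_loss(student_feats, teacher_feats):
    """Mean squared error between student and teacher feature maps.

    In the reverse setting, the larger (ViT-based) student is trained to
    match the frozen lighter (CNN-based) teacher's intermediate
    representations instead of its final keypoint heatmaps.
    """
    assert student_feats.shape == teacher_feats.shape
    return float(np.mean((student_feats - teacher_feats) ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 256))                # hypothetical: 4 patches, 256-D embeddings
student = teacher + 0.1 * rng.normal(size=teacher.shape)

loss = representation_distillation_loss(student, teacher)
print(f"distillation loss: {loss:.4f}")
```

Matching the high-dimensional embedding gives the student a much richer training signal per image than the final output alone, which is one plausible reading of why the abstract reports reduced overfitting.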
Low–High Orthoimage Pairs-Based 3D Reconstruction for Elevation Determination Using Drone
This paper presents a 3D reconstruction method for fast elevation determination on construction sites. The proposed method is intended to automatically and accurately determine construction site elevations using drone-based, low–high orthoimage pairs. This method requires fewer images than other methods for covering a large target area of a construction site. An up–forward–down path was designed to capture approximately -scale images at different altitudes over target stations. A pixel grid matching and elevation determination algorithm was developed to automatically match images in a dense pixel-grid style via self-adaptive patch feature descriptors, and to simultaneously determine elevations based on a virtual elevation model. The 3D reconstruction results were an elevation map and an orthoimage at each station. The large-scale results for the entire site were then easily stitched from adjacent results with narrow overlaps. Moreover, alignment of the results was performed automatically via the U-net-detected ground control point. Experiments validated that in 10–20 and 20–40 orthoimage pairs, 92% of the 2,500 and 4,761 pixels were matched at the strongest and strong levels, which was better than sparse reconstruction via structure from motion; moreover, the elevation measurements were as accurate as photogrammetry using multiscale overlapping images.
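The dense patch-matching step can be illustrated with a simple normalized cross-correlation (NCC) search between an image pair. The fixed patch size and exhaustive search below are illustrative stand-ins, not the paper's self-adaptive descriptors.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_patch(low_img, high_img, y, x, half=3):
    """Find the pixel in high_img whose patch best matches the one at (y, x) in low_img."""
    template = low_img[y - half:y + half + 1, x - half:x + half + 1]
    best, best_yx = -1.0, None
    h, w = high_img.shape
    for yy in range(half, h - half):
        for xx in range(half, w - half):
            patch = high_img[yy - half:yy + half + 1, xx - half:xx + half + 1]
            c = ncc(template, patch)
            if c > best:
                best, best_yx = c, (yy, xx)
    return best_yx, best

rng = np.random.default_rng(1)
low = rng.normal(size=(32, 32))
high = low + 0.05 * rng.normal(size=low.shape)   # same scene, slight noise

(yy, xx), score = match_patch(low, high, 10, 12)
print(yy, xx, round(score, 3))
```

Running such a match at every node of a regular pixel grid, rather than only at sparse feature points, is what makes the reconstruction dense compared with structure-from-motion.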
The Exploitation of Data from Remote and Human Sensors for Environment Monitoring in the SMAT Project
In this paper, we outline the functionalities of a system that integrates and controls a fleet of Unmanned Aircraft Vehicles (UAVs). The UAVs carry a set of payload sensors employed for territorial surveillance, whose outputs are stored in the system and analysed by the data exploitation functions at different levels. In particular, we detail the second-level data exploitation function, whose aim is to improve the interpretation of sensor data in post-mission activities. It is concerned with the mosaicking of aerial images and the enrichment of cartography by human sensors, i.e. social media users. We also describe the software architecture for the development of a mash-up (the integration of information and functionalities coming from the Web) and the possibility of using human sensors in the monitoring of the territory, a field in which, traditionally, the sensors involved were only hardware ones.

JRC.H.6-Digital Earth and Reference Data
Painting-to-3D Model Alignment Via Discriminative Visual Elements
This paper describes a technique that can reliably align arbitrary 2D depictions of an architectural site, including drawings, paintings and historical photographs, with a 3D model of the site. This is a tremendously difficult task as the appearance and scene structure in the 2D depictions can be very different from the appearance and geometry of the 3D model, e.g., due to the specific rendering style, drawing error, age, lighting or change of seasons. In addition, we face a hard search problem: the number of possible alignments of the painting to a large 3D model, such as a partial reconstruction of a city, is huge. To address these issues, we develop a new compact representation of complex 3D scenes. The 3D model of the scene is represented by a small set of discriminative visual elements that are automatically learnt from rendered views. Similar to object detection, the set of visual elements, as well as the weights of individual features for each element, are learnt in a discriminative fashion. We show that the learnt visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g. watercolor, sketch, historical photograph) and structural changes (e.g. missing scene parts, large occluders) of the scene. We demonstrate an application of the proposed approach to automatic re-photography to find an approximate viewpoint of historical paintings and photographs with respect to a 3D model of the site. The proposed alignment procedure is validated via a human user study on a new database of paintings and sketches spanning several sites. The results demonstrate that our algorithm produces significantly better alignments than several baseline methods.
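The per-element matching described above can be sketched as scoring candidate patches of a depiction with a linear detector learnt for one visual element. The features and weights below are synthetic stand-ins for the learnt mid-level elements, not the paper's actual descriptors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: one visual element with a learnt weight vector w,
# scored against feature vectors extracted from candidate patches of a painting.
dim = 128
element_feat = rng.normal(size=dim)                  # feature of the element's rendered view
w = element_feat / np.linalg.norm(element_feat)      # stand-in for discriminatively learnt weights

# Candidate patch features: one true match (a perturbed copy of the element)
# hidden among random distractors.
n_candidates = 50
candidates = rng.normal(size=(n_candidates, dim))
true_idx = 17
candidates[true_idx] = element_feat + 0.2 * rng.normal(size=dim)

scores = candidates @ w                              # linear detector response per patch
best = int(scores.argmax())
print(best, round(float(scores[best]), 2))
```

Because each element is a detector rather than a raw template, the response stays high even when the depiction's rendering style differs from the rendered views the element was learnt from, which is the robustness the abstract claims.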