1,262 research outputs found
Representation Learning by Learning to Count
We introduce a novel method for representation learning that uses an
artificial supervision signal based on counting visual primitives. This
supervision signal is obtained from an equivariance relation, which does not
require any manual annotation. We relate transformations of images to
transformations of the representations. More specifically, we look for the
representation that satisfies such relation rather than the transformations
that match a given representation. In this paper, we use two image
transformations in the context of counting: scaling and tiling. The first
transformation exploits the fact that the number of visual primitives should be
invariant to scale. The second transformation allows us to equate the total
number of visual primitives in each tile to that in the whole image. These two
transformations are combined in one constraint and used to train a neural
network with a contrastive loss. The proposed task produces representations
that perform on par or exceed the state of the art in transfer learning
benchmarks.Comment: ICCV 2017(oral
Automatic camera selection for activity monitoring in a multi-camera system for tennis
In professional tennis training matches, the coach needs to be able to view play from the most appropriate angle in order to monitor players' activities. In this paper, we describe and evaluate a system for automatic camera selection from a network of synchronised cameras within a tennis sporting arena. This work combines synchronised video streams from multiple cameras into a single summary video suitable for critical review by both tennis players and coaches. Using an overhead camera view, our system automatically determines the 2D tennis-court calibration resulting in a mapping that relates a player's position in the overhead camera to their position and size in another camera view in the network. This allows the system to determine the appearance of a player in each of the other cameras and thereby choose the best view for each player via a novel technique. The video summaries are evaluated in end-user studies and shown to provide an efficient means of multi-stream visualisation for tennis player activity monitoring
Topic segmentation of TV-streams by watershed transform and vectorization
International audienceA fine-grained segmentation of Radio or TV broadcasts is an essential step for most multimedia processing tasks. Applying segmentation algorithms to the speech transcripts seems straightforward. Yet, most of these algorithms are not suited when dealing with short segments or noisy data. In this paper, we present a new segmentation technique inspired from the image analysis field and relying on a new way to compute similarities between candidate segments called Vectorization. Vectorization makes it possible to match text segments that do not share common words; this property is shown to be particularly useful when dealing with transcripts in which transcription errors and short segments makes the segmentation difficult. This new topic segmen-tation technique is evaluated on two corpora of transcripts from French TV broadcasts on which it largely outperforms other existing approaches from the state-of-the-art
Multimodal segmentation of lifelog data
A personal lifelog of visual and audio information can be very helpful as a human memory augmentation tool. The SenseCam, a passive wearable camera, used in conjunction with an iRiver MP3 audio recorder, will capture over 20,000 images and 100 hours of audio per week. If used constantly, very soon this would build up to a substantial collection of personal data. To gain real value from this collection it is important to automatically segment the data into meaningful units or activities. This paper investigates the optimal combination of data sources to segment personal data into such activities. 5 data sources were logged and processed to segment a collection of personal data, namely: image processing on captured SenseCam images; audio processing on captured iRiver audio data; and processing of the temperature, white light level, and accelerometer sensors onboard the SenseCam device. The results indicate that a combination of the image, light and accelerometer sensor data segments our collection of personal data better than a combination of all 5 data sources. The accelerometer sensor is good for detecting when the user moves to a new location, while the image and light sensors are good for detecting changes in wearer activity within the same location, as well as detecting when the wearer socially interacts with others
Evaluation of pointer click relevance feedback in PicSOM : deliverable D1.2 of FP7 project nÂș 216529 PinView
This report presents the results of a series of experiments where knowledge of the most relevant part of images is given as additional information to a content-based image retrieval system. The most relevant parts have been identified by search-task-dependent pointer clicks on the images. As such they provide a rudimentary form of explicit enriched relevance feedback and to some extent mimic genuine implicit eye movement measurements which are essential ingredients of the PinView project
MusA: Using Indoor Positioning and Navigation to Enhance Cultural Experiences in a museum
In recent years there has been a growing interest into the use of multimedia mobile guides in museum environments. Mobile devices have the capabilities to detect the user context and to provide pieces of information suitable to help visitors discovering and following the logical and emotional connections that develop during the visit. In this scenario, location based services (LBS) currently represent an asset, and the choice of the technology to determine users' position, combined with the definition of methods that can effectively convey information, become key issues in the design process. In this work, we present MusA (Museum Assistant), a general framework for the development of multimedia interactive guides for mobile devices. Its main feature is a vision-based indoor positioning system that allows the provision of several LBS, from way-finding to the contextualized communication of cultural contents, aimed at providing a meaningful exploration of exhibits according to visitors' personal interest and curiosity. Starting from the thorough description of the system architecture, the article presents the implementation of two mobile guides, developed to respectively address adults and children, and discusses the evaluation of the user experience and the visitors' appreciation of these application
- âŠ