Attentive monitoring of multiple video streams driven by a Bayesian foraging strategy
In this paper we shall consider the problem of deploying attention to subsets
of the video streams for collating the most relevant data and information of
interest related to a given task. We formalize this monitoring problem as a
foraging problem. We propose a probabilistic framework to model observer's
attentive behavior as the behavior of a forager. The forager, moment to moment,
focuses its attention on the most informative stream/camera, detects
interesting objects or activities, or switches to a more profitable stream. The
approach proposed here is suitable to be exploited for multi-stream video
summarization. Meanwhile, it can serve as a preliminary step for more
sophisticated video surveillance, e.g. activity and behavior analysis.
Experimental results achieved on the UCR Videoweb Activities Dataset, a
publicly available dataset, are presented to illustrate the utility of the
proposed technique. Comment: Accepted to IEEE Transactions on Image Processing
Functional Skills Support Programme: Developing functional skills in modern foreign languages
This booklet is part of "... a series of 11 booklets which helps schools to implement functional skills across the curriculum. The booklets illustrate how functional skills can be applied and developed in different subjects and contexts, supporting achievement at Key Stage 3 and Key Stage 4.
Each booklet contains an introduction to functional skills for subject teachers, three practical planning examples with links to related websites and resources, a process for planning and a list of additional resources to support the teaching and learning of functional skills." - The National Strategies website
Measurements by A LEAP-Based Virtual Glove for the hand rehabilitation
Hand rehabilitation is fundamental after stroke or surgery. Traditional rehabilitation
requires a therapist and implies high costs, stress for the patient, and subjective evaluation of
the therapy effectiveness. Alternative approaches, based on mechanical and tracking-based gloves,
can be really effective when used in virtual reality (VR) environments. Mechanical devices are often
expensive, cumbersome, patient specific and hand specific, while tracking-based devices are not
affected by these limitations but, especially if based on a single tracking sensor, could suffer from
occlusions. In this paper, the implementation of a multi-sensor approach, the Virtual Glove (VG),
based on the simultaneous use of two orthogonal LEAP motion controllers, is described. The VG is
calibrated and static positioning measurements are compared with those collected with an accurate
spatial positioning system. The positioning error is lower than 6 mm in a cylindrical region of interest
of radius 10 cm and height 21 cm. Real-time hand tracking measurements are also performed, analysed
and reported. Hand tracking measurements show that VG operated in real-time (60 fps), reduced
occlusions, and managed two LEAP sensors correctly, without any temporal and spatial discontinuity
when switching from one sensor to the other. A video demonstrating the good performance of VG
is also collected and presented in the Supplementary Materials. Results are promising but further
work must be done to allow the calculation of the forces exerted by each finger when constrained by
mechanical tools (e.g., peg-boards) and for reducing occlusions when grasping these tools. Although
the VG is proposed for rehabilitation purposes, it could also be used for tele-operation of tools and
robots, and for other VR applications.
Automatic Synchronization of Multi-User Photo Galleries
In this paper we address the issue of photo galleries synchronization, where
pictures related to the same event are collected by different users. Existing
solutions to address the problem are usually based on unrealistic assumptions,
like time consistency across photo galleries, and often heavily rely on
heuristics, limiting therefore the applicability to real-world scenarios. We
propose a solution that achieves better generalization performance for the
synchronization task compared to the available literature. The method is
characterized by three stages: at first, deep convolutional neural network
features are used to assess the visual similarity among the photos; then, pairs
of similar photos are detected across different galleries and used to construct
a graph; eventually, a probabilistic graphical model is used to estimate the
temporal offset of each pair of galleries, by traversing the minimum spanning
tree extracted from this graph. The experimental evaluation is conducted on
four publicly available datasets covering different types of events,
demonstrating the strength of our proposed method. A thorough discussion of the
obtained results is provided for a critical assessment of the quality in
synchronization. Comment: Accepted to IEEE Transactions on Multimedia
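The final stage described above, estimating one temporal offset per gallery by traversing a minimum spanning tree, can be illustrated with a minimal sketch. It is not the authors' method: the probabilistic graphical model is replaced here by plain additive propagation of pairwise offsets, and the function name `estimate_offsets`, the tuple layout of `pair_estimates`, and the notion of a per-pair `cost` are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def estimate_offsets(num_galleries, pair_estimates, reference=0):
    """Propagate pairwise clock offsets along a minimum spanning tree.

    pair_estimates: list of (i, j, offset_ij, cost), where offset_ij is the
    estimated clock offset of gallery j relative to gallery i and a lower
    cost means a more reliable estimate (hypothetical layout).
    Returns {gallery: offset relative to the reference gallery}.
    """
    # Undirected graph; the offset changes sign when an edge is reversed.
    adj = defaultdict(list)
    for i, j, off, cost in pair_estimates:
        adj[i].append((cost, j, off))
        adj[j].append((cost, i, -off))

    offsets = {reference: 0.0}
    heap = [(cost, reference, nbr, off) for cost, nbr, off in adj[reference]]
    heapq.heapify(heap)

    # Prim's algorithm: always follow the cheapest edge leaving the tree,
    # and assign the new gallery its parent's offset plus the edge offset.
    while heap and len(offsets) < num_galleries:
        cost, src, dst, off = heapq.heappop(heap)
        if dst in offsets:
            continue
        offsets[dst] = offsets[src] + off
        for c, nbr, o in adj[dst]:
            if nbr not in offsets:
                heapq.heappush(heap, (c, dst, nbr, o))
    return offsets


# Three galleries; the direct 0-2 estimate is unreliable (high cost),
# so the offset of gallery 2 is obtained through gallery 1 instead.
estimates = [(0, 1, 10.0, 1.0), (1, 2, -4.0, 1.0), (0, 2, 100.0, 5.0)]
result = estimate_offsets(3, estimates)
```

In this toy example the unreliable direct estimate between galleries 0 and 2 is never used, because the tree reaches gallery 2 through the two cheaper edges.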
Finding perceptually optimal operating points of a real time interactive video-conferencing system
This research aims to address issues faced by real time video-conferencing systems in locating a perceptually optimal operating point under various network and conversational conditions.
In order to determine the perceptually optimal operating point of a video-conferencing system, we must first be able to conduct a fair assessment of the quality of the current operating point in the system and compare it with another operating point to determine if one is better than the other in terms of perceptual quality. However, no single objective quality metric currently exists that can accurately and fully describe the perceptual quality of a real time video conversation. Hence there is a need for a controlled environment in which tests can be conducted and in which we can study different metrics and identify the best trade-offs between them.
We begin by studying the components of a typical setup of a real time video-conferencing system and the impacts that various network and conversation conditions can have on the overall perceptual quality. We also look into different metrics available to measure those impacts.
We then create a platform to perform black box testing on current video-conferencing systems and observe how they handle changes in operating conditions. The platform is then used to conduct a brief evaluation of the performance of Skype, a popular commercial video-conferencing system. However, we are not able to modify the system parameters of Skype.
The main contribution of this thesis is the design of a new testbed that provides a controlled environment in which tests can be conducted to determine the perceptually optimal operating point of a video conversation under specified network and conversation conditions. This testbed allows us to modify certain parameters, such as frame rate and frame size, which was not previously possible.
The testbed takes as input two recorded videos of the two speakers of a face-to-face conversation and the desired output video parameters, such as frame rate, frame size and delay. A video generation algorithm is designed as part of the testbed to handle modifications to the frame rate and frame size of the videos, as well as delays inserted into the recorded video conversation to simulate the effects of network delays. The most important issue addressed is the generation of new frames to fill the gaps created by a change in frame rate or an inserted delay, unlike in the case of voice, where a period of silence can simply be used to handle these situations.
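As a rough illustration of the gap-filling problem described above, the sketch below maps each output frame to a source frame when the frame rate changes or a delay is inserted. It assumes the simplest possible strategy, holding (repeating) the most recent source frame; the abstract does not say how the thesis generates new frames, and the function name and parameters are hypothetical.

```python
def resample_frame_indices(num_frames, src_fps, dst_fps, delay_s=0.0):
    """Return, for each output frame, the index of the source frame to show.

    Assumed strategy: gaps are filled by repeating the latest available
    source frame. During the inserted delay, source frame 0 is held.
    """
    duration = num_frames / src_fps
    n_out = int(round((duration + delay_s) * dst_fps))
    indices = []
    for k in range(n_out):
        # Time of output frame k on the source timeline, after the delay.
        t = k / dst_fps - delay_s
        src = int(t * src_fps) if t > 0 else 0
        indices.append(min(src, num_frames - 1))
    return indices


# Doubling the frame rate repeats every source frame once.
upsampled = resample_frame_indices(4, src_fps=2, dst_fps=4)
# A 0.5 s delay holds the first frame for two extra output frames.
delayed = resample_frame_indices(4, src_fps=2, dst_fps=4, delay_s=0.5)
# Halving the frame rate drops every other source frame.
downsampled = resample_frame_indices(4, src_fps=4, dst_fps=2)
```

Repeating frames is the video analogue of inserting silence in audio, but it is visible to the viewer as a freeze, which is presumably why the abstract treats frame generation as the central difficulty.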
The testbed uses a packetization strategy that is based on an uneven packet transmission rate (UPTR) and handles the packetization of interleaved video and audio data; it also uses piggybacking to provide redundancy if required. Losses can be injected either randomly or based on packet traces collected via PlanetLab. The processed videos are then pieced together side by side to give the viewpoint of a third party observing the video conversation from the site of the first speaker. Hence the first speaker, who is unaffected by network delays, is observed to have a faster reaction time than the second speaker, who is simulated to be located at the remote end. The video of the second speaker also reflects the degradations in perceptual quality induced by the network conditions, whereas the video of the first speaker is of perfect quality. Hence, with the testbed, we are able to generate output videos for different operating points under the same network and conversational conditions and thus to compare two operating points.
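The interaction between loss injection and piggybacked redundancy can be sketched as follows. This is an illustrative model only: it assumes each packet carries its own payload plus copies of the previous `redundancy` payloads, it does not model UPTR or the interleaving of audio and video, and the function names are hypothetical. The loss trace can come from a random generator or from a recorded trace supplied as a list of booleans.

```python
import random


def random_trace(num_packets, loss_rate, seed=0):
    """Random loss injection: True means the packet arrived."""
    rng = random.Random(seed)
    return [rng.random() >= loss_rate for _ in range(num_packets)]


def recover_payloads(received, redundancy=1):
    """Return the set of payload indices recoverable at the receiver.

    Assumption: packet k piggybacks the payloads of packets
    k - redundancy .. k - 1 in addition to its own payload.
    """
    recovered = set()
    for k, arrived in enumerate(received):
        if arrived:
            for p in range(max(0, k - redundancy), k + 1):
                recovered.add(p)
    return recovered


# Trace-driven example: packets 1, 3 and 4 are lost.
trace = [True, False, True, False, False, True]
with_piggyback = recover_payloads(trace, redundancy=1)
without_piggyback = recover_payloads(trace, redundancy=0)
```

With one level of piggybacking, the isolated loss of packet 1 is repaired by packet 2, while the burst covering packets 3 and 4 is only half repaired: payload 4 arrives with packet 5, but payload 3 is lost.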
With the testbed in place, we demonstrate how it can be used to evaluate the effects of various parameters on the overall perceptual quality.
Lastly, we demonstrate the results of applying to a video conversation an existing efficient search algorithm used for estimating the perceptually optimal mouth-to-ear delay (MED) of a Voice-over-IP (VoIP) conversation. This is achieved by using the network simulator designed to conduct a series of subjective and objective tests to identify the perceptually optimal MED under specific network and conversational conditions.