Services surround you: physical-virtual linkage with contextual bookmarks
Our daily life is pervaded by digital information and devices, not least the common mobile phone. However, a seamless connection between our physical world, such as a movie trailer on a screen in the main rail station, and its digital counterparts, such as an online ticket service, remains difficult. In this paper, we present contextual bookmarks that enable users to capture information of interest with a mobile camera phone. Depending on the user's context, the snapshot is mapped to a digital service such as ordering tickets for a nearby movie theater or a link to the upcoming movie's Web page.
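As a rough illustration of how a snapshot might be mapped to a context-dependent service, here is a minimal Python sketch; the names (Context, SERVICES, resolve_bookmark), the context rule, and the URLs are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: map a recognised snapshot to a service based on context.
from dataclasses import dataclass

@dataclass
class Context:
    location: str   # e.g. "main_rail_station"
    hour: int       # local hour of day

# Illustrative catalogue: recognised content id -> candidate services.
SERVICES = {
    "movie_trailer_42": {
        "tickets": "https://example.org/tickets/42",
        "webpage": "https://example.org/movies/42",
    },
}

def resolve_bookmark(content_id: str, ctx: Context) -> str:
    """Pick the service most useful in the user's current context."""
    candidates = SERVICES[content_id]
    # Near a station in the evening, ordering tickets is likely the goal;
    # otherwise fall back to the movie's web page.
    if ctx.location == "main_rail_station" and 17 <= ctx.hour <= 23:
        return candidates["tickets"]
    return candidates["webpage"]

print(resolve_bookmark("movie_trailer_42", Context("main_rail_station", 19)))
```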
Multimodal Indexing of Presentation Videos
This thesis presents four novel methods to help users efficiently and effectively retrieve information from unstructured and unsourced multimedia sources, in particular the increasing amount and variety of presentation videos such as those in e-learning, conference recordings, corporate talks, and student presentations. We demonstrate a system to summarize, index and cross-reference such videos, and measure the quality of the produced indexes as perceived by the end users. We introduce four major semantic indexing cues: text, speaker faces, graphics, and mosaics, going beyond standard tag-based searches and simple video playback. This work aims at recognizing visual content "in the wild", where the system cannot rely on any additional information besides the video itself. For text, within a scene text detection and recognition framework, we present a novel locally optimal adaptive binarization algorithm, implemented with integral histograms. It determines an optimal threshold that maximizes the between-class variance within a subwindow, with computational complexity independent of the size of the window itself. We obtain character recognition rates of 74%, as validated against ground truth of 8 presentation videos spanning over 1 hour and 45 minutes, which almost doubles the baseline performance of an open source OCR engine. For speaker faces, we detect, track, match, and finally select a humanly preferred face icon per speaker, based on three quality measures: resolution, amount of skin, and pose. We register an 87% accordance (51 out of 58 speakers) between the face indexes automatically generated from three unstructured presentation videos of approximately 45 minutes each, and human preferences recorded through Mechanical Turk experiments. For diagrams, we locate graphics inside frames showing a projected slide, cluster them according to an online algorithm based on a combination of visual and temporal information, and select and color-correct their representatives to match human preferences recorded through Mechanical Turk experiments. We register 71% accuracy (57 out of 81 unique diagrams properly identified, selected and color-corrected) on three hours of videos containing five different presentations. For mosaics, we combine two existing suturing measures to extend video images into a world coordinate system. The set of frames to be registered into a mosaic is sampled according to the PTZ camera movement, which is computed through least-squares estimation starting from the luminance constancy assumption. A local-feature-based stitching algorithm is then applied to estimate the homography among a set of video frames, and median blending is used to render pixels in overlapping regions of the mosaic. For two of these indexes, namely faces and diagrams, we present two novel MTurk-derived user data collections to determine viewer preferences, and show that our methods match those preferences in their selections. The net result of this thesis is that users can search, inside a video collection as well as within a single video clip, for a segment of a presentation by professor X on topic Y, containing graph Z.
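As a point of reference for the thresholding step described in this abstract, the sketch below applies an Otsu-style threshold (maximising between-class variance) independently to each subwindow of a grey-level image. It uses a plain histogram per block rather than the integral-histogram acceleration the thesis proposes, so it is an illustrative stand-in rather than the author's implementation.

```python
# Per-subwindow Otsu thresholding: a simplified stand-in for the thesis's
# integral-histogram variant. Expects a uint8 grey-level image. Requires numpy.
import numpy as np

def otsu_threshold(block: np.ndarray) -> int:
    """Return the grey level (0-255) that maximises between-class variance."""
    hist = np.bincount(block.ravel(), minlength=256).astype(float)
    probs = hist / hist.sum()
    omega = np.cumsum(probs)                   # class-0 probability up to t
    mu = np.cumsum(probs * np.arange(256))     # cumulative mean up to t
    mu_t = mu[-1]                              # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0       # ignore degenerate splits
    return int(np.argmax(sigma_b))

def binarize_locally(img: np.ndarray, win: int = 32) -> np.ndarray:
    """Binarize a grey-level image with one Otsu threshold per subwindow."""
    out = np.zeros_like(img)
    for y in range(0, img.shape[0], win):
        for x in range(0, img.shape[1], win):
            block = img[y:y + win, x:x + win]
            t = otsu_threshold(block)
            out[y:y + win, x:x + win] = (block > t).astype(np.uint8) * 255
    return out
```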
An examination of automatic video retrieval technology on access to the contents of an historical video archive
Purpose – This paper aims to provide an initial understanding of the constraints that historical video collections pose to video retrieval technology and the potential that online access offers to both archive and users.
Design/methodology/approach – A small and unique collection of videos on customs and folklore was used as a case study. Multiple methods were employed to investigate the effectiveness of technology and the modality of user access. Automatic keyframe extraction was tested on the visual content (a generic sketch of such extraction is given after this abstract), while the audio stream was used for automatic classification of speech and music clips. The user access (search vs browse) was assessed in a controlled user evaluation. A focus group and a survey provided insight on the actual use of the analogue archive. The results of these multiple studies were then compared and integrated (triangulation).
Findings – The amateur material challenged automatic techniques for video and audio indexing, suggesting that the technology must be tested against the material before deciding on a digitisation strategy. Two user interaction modalities, browsing vs searching, were tested in a user evaluation. Results show users preferred searching, but browsing becomes essential when the search engine fails to match the query to the indexed words. Browsing was also valued for serendipitous discovery; however, the organisation of the archive was judged cryptic and therefore of limited use. This indicates that the categorisation of an online archive should be designed with users in mind, as they might not understand the current classification. The focus group and the survey clearly showed the advantage of online access even when the quality of the video surrogate is poor. The evidence gathered suggests that the creation of a digital version of a video archive requires a rethinking of the collection in terms of the new medium: a new archive should be specially designed to exploit the potential that the digital medium offers. Similarly, users' needs have to be considered before designing the digital library interface, as those needs are likely to differ from those imagined.
Originality/value – This paper is the first attempt to understand the advantages and limitations of video retrieval technology for small video archives such as those often found in special collections.
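The keyframe extraction mentioned in the abstract above is not described in detail; purely as an illustration of the kind of technique that was tested, the sketch below selects a new keyframe whenever the colour histogram drifts sufficiently far from the last selected one. It assumes OpenCV, and the 0.4 threshold is an arbitrary choice, not a value from the study.

```python
# Generic colour-histogram-difference keyframe selection (illustrative only).
# Requires opencv-python and numpy.
import cv2

def extract_keyframes(path: str, threshold: float = 0.4) -> list:
    """Return indices of frames that differ strongly from the last keyframe."""
    cap = cv2.VideoCapture(path)
    keyframes, last_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if last_hist is None or \
           cv2.compareHist(last_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
            keyframes.append(idx)
            last_hist = hist
        idx += 1
    cap.release()
    return keyframes
```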
Stress and productivity patterns of interrupted, synergistic, and antagonistic office activities.
We describe a controlled experiment, aiming to study productivity and stress effects of email interruptions and activity interactions in the modern office. The measurement set includes multimodal data for n = 63 knowledge workers who volunteered for this experiment and were randomly assigned into four groups: (G1/G2) Batch email interruptions with/without exogenous stress. (G3/G4) Continual email interruptions with/without exogenous stress. To provide context, the experiment's email treatments were surrounded by typical office tasks. The captured variables include physiological indicators of stress, measures of report writing quality and keystroke dynamics, as well as psychometric scores and biographic information detailing participants' profiles. Investigations powered by this dataset are expected to lead to personalized recommendations for handling email interruptions and a deeper understanding of synergistic and antagonistic office activities. Given the centrality of email in the modern office, and the importance of office work to people's lives and the economy, the present data have a valuable role to play.
Video browsing interfaces and applications: a review
We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption and the inherent characteristics of video data (which, if presented in its raw format, is rather unwieldy and costly) have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare them against each other.
Look at Me: Early Gaze Engagement Enhances Corticospinal Excitability During Action Observation
Direct gaze is a powerful social cue able to capture the onlooker's attention. Besides gaze, head and limb movements can also provide relevant sources of information for social interaction. This study investigated the joint role of direct gaze and hand gestures on onlookers' corticospinal excitability (CE). In two experiments we manipulated the temporal and spatial aspects of observed gaze and hand behavior to assess their role in affecting motor preparation. To do this, transcranial magnetic stimulation (TMS) over the primary motor cortex (M1) coupled with electromyography (EMG) recording was used in both experiments. In the crucial manipulation, we showed participants four video clips of an actor who initially displayed eye contact while starting a social request gesture, and then completed the action while directing his gaze toward an object salient for the interaction. In this way, the observed gaze potentially expressed the intention to interact. Eye tracking data confirmed that the gaze manipulation was effective in drawing observers' attention to the actor's hand gesture. In the attempt to reveal possible time-locked modulations, we tracked CE at the onset and offset of the request gesture. Neurophysiological results showed an early CE modulation when the actor was about to start the request gesture while looking straight at the participants, compared to when his gaze was averted from the gesture. This effect was time-locked to the kinematics of the actor's arm movement. Overall, data from the two experiments seem to indicate that the joint contribution of direct gaze and precocious kinematic information, gained while a request gesture is on the verge of beginning, increases the subjective experience of involvement and allows observers to prepare for an appropriate social interaction. On the contrary, the separation of gaze cues and body kinematics can have adverse effects on social motor preparation. CE is highly susceptible to biological cues, such as averted gaze, which can automatically capture and divert the observer's attention. This points to the existence of heuristics based on early action and gaze cues that would allow observers to interact appropriately.
Dublin City University video track experiments for TREC 2002
Dublin City University participated in the Feature Extraction task and the Search task of the TREC-2002 Video Track. In the Feature Extraction task, we submitted 3 features: Face, Speech, and Music. In the Search task, we developed an interactive video retrieval system, which incorporated the 40 hours of the video search test collection and supported user searching using our own feature extraction data along with the donated feature data and ASR transcripts from other Video Track groups. This video retrieval system allows a user to specify a query based on the 10 features and the ASR transcript, and the query result is a ranked list of videos that can be further browsed at the shot level. To evaluate the usefulness of the feature-based query, we developed a second system interface that provides only ASR transcript-based querying, and we conducted an experiment with 12 test users to compare these 2 systems. Results were submitted to NIST and we are currently conducting further analysis of user performance with these 2 systems.
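The system above ranks videos by a query over extracted features and the ASR transcript. The sketch below shows one generic way such evidence could be combined into a single ranked list of shots; the data model, the term-overlap text score, and the 0.6/0.4 weighting are illustrative assumptions, not the DCU system's actual scoring.

```python
# Illustrative fusion of per-shot feature confidences and ASR-text matching.
from dataclasses import dataclass, field

@dataclass
class Shot:
    video_id: str
    shot_id: int
    asr_text: str
    features: dict = field(default_factory=dict)  # e.g. {"face": 0.9, "music": 0.1}

def score(shot: Shot, query_terms: set, wanted_features: dict) -> float:
    # Text evidence: fraction of query terms present in the ASR transcript.
    words = set(shot.asr_text.lower().split())
    text_score = len(query_terms & words) / max(len(query_terms), 1)
    # Feature evidence: weighted sum of the requested feature confidences.
    feat_score = sum(w * shot.features.get(f, 0.0) for f, w in wanted_features.items())
    return 0.6 * text_score + 0.4 * feat_score   # illustrative weighting

def search(shots: list, query: str, wanted_features: dict) -> list:
    """Return shots sorted by combined text and feature relevance."""
    terms = set(query.lower().split())
    return sorted(shots, key=lambda s: score(s, terms, wanted_features), reverse=True)
```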
Interactive searching and browsing of video archives: using text and using image matching
Over the last few decades, much research work has been done in the general area of video and audio analysis. Initially the applications driving this included capturing video in digital form and then being able to store, transmit and render it, which involved a large effort to develop compression and encoding standards. The technology needed to do all this is now easily available and cheap, with applications of digital video processing now commonplace, ranging from CCTV (Closed Circuit TV) for security, to home capture of broadcast TV on DVRs for personal viewing.
One consequence of the development in technology for creating, storing and distributing digital video is that there has been a huge increase in the volume of digital video, and this in turn has created a need for techniques to allow effective management of this video, and by that we mean content management. In the BBC, for example, the archives department receives approximately 500,000 queries per year and has over 350,000 hours of content in its library. Having huge archives of video information is of little benefit if we have no effective means of locating the video clips that are relevant to our information needs. In this chapter we report our work on developing two specific retrieval and browsing tools for digital video information. Both of these are based on an analysis of the captured video for the purpose of automatically structuring it into shots or higher-level semantic units such as TV news stories. Some also include analysis of the video for the automatic detection of features such as the presence or absence of faces. Both include some elements of searching, where a user specifies a query or information need, and browsing, where a user is allowed to browse through sets of retrieved video shots. We support the presentation of these tools with illustrations of actual video retrieval systems developed and working on hundreds of hours of video content.
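As one concrete example of the face-presence detection mentioned in this chapter summary, the sketch below runs OpenCV's stock frontal-face Haar cascade over a single frame. The systems described used their own detectors, so this is only a generic stand-in.

```python
# Generic face-presence check with OpenCV's bundled Haar cascade.
# Requires opencv-python; the cascade file ships with the package.
import cv2

_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def has_face(frame_bgr) -> bool:
    """Return True if at least one frontal face is detected in the frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = _CASCADE.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0
```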