Video browsing interfaces and applications: a review
We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption, and the inherent characteristics of video data (which, if presented in its raw format, is rather unwieldy and costly), have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare them against each other.
Information access tasks and evaluation for personal lifelogs
Emerging personal lifelog (PL) collections contain permanent digital records of information associated with individuals' daily lives. This can include materials such as emails received and sent, web content and other documents with which they have interacted, photographs, videos and music experienced passively or created, logs of phone calls and text messages, and also personal and contextual data such as location (e.g. via GPS sensors), persons and objects present (e.g. via Bluetooth) and physiological state (e.g. via biometric sensors). PLs can be collected by individuals over very extended periods, potentially running to many years. Such archives have many potential applications, including helping individuals recover partially forgotten information, sharing experiences with friends or family, telling the story of one's life, clinical applications for the memory impaired, and fundamental psychological investigations of memory. The Centre for Digital Video Processing (CDVP) at Dublin City University is currently engaged in the collection and exploration of applications of large PLs. We are collecting rich archives of daily life, including textual and visual materials and contextual data. An important part of this work is to consider how the effectiveness of our ideas can be measured in terms of metrics and experimental design. While these studies have considerable similarity with traditional evaluation activities in areas such as information retrieval and summarization, the characteristics of PLs mean that new challenges and questions emerge. We are currently exploring the issues through a series of pilot studies and questionnaires. Our initial results indicate that there are many research questions to be explored and that the relationships between personal memory, context and content for these tasks are complex and fascinating.
The Físchlár-News-Stories system: personalised access to an archive of TV news
The "Físchlár" systems are a family of tools for capturing, analysis, indexing, browsing, searching and summarisation of digital video information. Físchlár-News-Stories, described in this paper, is one of those systems, and provides access to a growing archive of broadcast TV news. Físchlár-News-Stories has several notable features, including the fact that it automatically records TV news and segments a broadcast news program into stories, eliminating advertisements and credits at the start/end of the broadcast. Físchlár-News-Stories supports access to individual stories via calendar lookup, text search through closed captions, automatically-generated links between related stories, and personalised access using a personalisation and recommender system based on collaborative filtering. Access to individual news stories is supported either by browsing keyframes with synchronised closed captions, or by playback of the recorded video. One strength of the Físchlár-News-Stories system is that it is actually used, in daily practice, to access news. Several aspects of the Físchlár systems have been published before, but in this paper we give a summary of the Físchlár-News-Stories system in operation by following a scenario in which it is used, and also outlining how the underlying system realises the functions it offers.
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.
Audio-visual football video analysis, from structure detection to attention analysis
Sport video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academic fields. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. It is necessary to develop specific techniques for content-based sports video analysis to utilise these characteristics.
For an efficient and effective sports video analysis system, there are three fundamental questions: (1) what are the key stories of a sports video; (2) what incurs viewers' interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approached them from two different perspectives, and in turn three research contributions are presented, namely, replay detection, attack temporal structure decomposition, and attention-based highlight identification.
Replay segments convey the most important contents in sports videos, so detecting replay segments is an efficient approach to collecting game highlights. However, replay is an artefact of editing, and its composition grows more sophisticated with advances in video editing tools: it includes logo transitions, slow motions, viewpoint switches and normal-speed video clips. Since logo transition clips are pervasive in game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement for replay detection. A two-pass system was developed, comprising a five-layer adaboost classifier and logo template matching throughout an entire video. The five-layer adaboost utilises shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions to filter out logo transition candidates. Subsequently, a logo template is constructed and employed to find all logo transition sequences. The precision and recall of this system in replay detection are both 100% on a five-game evaluation collection.
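The layered filtering idea can be sketched as a cascade in which each layer tests one shot-level feature and a candidate must pass every layer. This is an illustrative reconstruction, not the thesis's implementation: the feature names mirror the abstract, but the thresholds and predicate directions are invented for the example.

```python
# Hypothetical sketch of a five-layer cascade over shot-level features.
# Thresholds below are illustrative assumptions, not the paper's values.
from dataclasses import dataclass

@dataclass
class Shot:
    duration: float          # shot length in seconds
    pitch_ratio: float       # fraction of frames dominated by pitch colour
    motion: float            # average motion magnitude
    hist_similarity: float   # colour-histogram similarity to a logo exemplar
    shot_frequency: float    # shots per minute between neighbouring candidates

def cascade_filter(shots, layers):
    """Keep only shots that pass every (feature, predicate) layer in order."""
    candidates = list(shots)
    for feature, predicate in layers:
        candidates = [s for s in candidates if predicate(getattr(s, feature))]
    return candidates

# Logo transitions tend to be short, low-pitch-ratio, logo-like shots.
LAYERS = [
    ("duration",        lambda v: v < 2.0),
    ("pitch_ratio",     lambda v: v < 0.3),
    ("motion",          lambda v: v < 0.5),
    ("hist_similarity", lambda v: v > 0.8),
    ("shot_frequency",  lambda v: v > 1.0),
]

shots = [
    Shot(1.2, 0.1, 0.2, 0.9, 2.0),   # plausible logo transition
    Shot(8.0, 0.7, 0.4, 0.3, 0.5),   # ordinary play shot
]
print(len(cascade_filter(shots, LAYERS)))  # -> 1
```

A cascade of cheap threshold tests is a common way to prune candidates before the more expensive second pass (here, template matching over the whole video).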
An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video, as well as of other sports videos. We review the literature on content-based temporal structures, such as the play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely play, focus, replay and break, were identified by low-level visual features. A four-state hidden Markov model was trained to simulate transition processes among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repetitive substring in the label sequence of shot class transitions. The occurrences of this substring are regarded as kernels of an attack hidden Markov process. The decomposition of attack structure therefore becomes a boundary likelihood comparison between two Markov chains.
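The key combinatorial step, finding the longest repeated substring of the shot-class label sequence, can be illustrated compactly. The thesis uses a suffix tree; the sketch below recovers the same quantity with a simpler sorted-suffix comparison (O(n² log n), adequate for illustration). The one-letter labels are hypothetical codes for the four shot classes (P=play, F=focus, R=replay, B=break).

```python
# Toy stand-in for the suffix-tree step: the longest repeated substring of a
# label sequence equals the longest common prefix of some pair of adjacent
# suffixes in sorted order.
def longest_repeated_substring(s: str) -> str:
    suffixes = sorted(s[i:] for i in range(len(s)))
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        # length of the common prefix of two adjacent sorted suffixes
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        if k > len(best):
            best = a[:k]
    return best

labels = "PFRBPFRBPPFB"                     # hypothetical shot-class sequence
print(longest_repeated_substring(labels))   # -> PFRBP
```

Each occurrence of this substring would then seed ("kernel") one attack hidden Markov process, with boundaries refined by the likelihood comparison described above.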
Highlights are what attract notice, and attention is a psychological measurement of 'notice'. A brief survey of the psychological background of attention, attention estimation from visual and auditory modalities, and multi-modality attention fusion is presented. We propose two attention models for sports video analysis, namely the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the perception structure formed while watching video. This model removes reflection bias among modality salient signals and combines these signals by reflectors. The multiresolution autoregressive (MAR) framework treats salient signals as a group of smooth random processes, which follow a similar trend but are filled with noise. This framework estimates a noiseless signal from these coarse, noisy observations by multiple-resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real-time event detection. Experiments show that these attention-based approaches can find goal events at high precision. Moreover, results of MAR-based highlight detection on the final games of FIFA 2002 and 2006 are highly similar to highlights professionally labelled by the BBC and FIFA.
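The intuition behind the MAR framework, estimating a smooth underlying trend from noisy salient-signal observations at several resolutions, can be shown with a toy stand-in. This is not the thesis's model: it simply averages dyadic-block reconstructions across a few scales, with invented numbers for an attention curve.

```python
# Toy multiresolution smoothing: not the MAR model itself, just the idea of
# fusing coarse reconstructions of a noisy salient signal.
def smooth_at_scale(x, block):
    """Replace each dyadic block of samples with its mean (coarser view)."""
    out = []
    for i in range(0, len(x), block):
        chunk = x[i:i + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out

def multiresolution_estimate(x, scales=(1, 2, 4)):
    """Average the reconstructions across scales to suppress noise."""
    recons = [smooth_at_scale(x, s) for s in scales]
    return [sum(vals) / len(vals) for vals in zip(*recons)]

noisy = [0.1, 0.3, 0.2, 0.9, 1.0, 0.8, 0.2, 0.1]   # attention curve with a peak
est = multiresolution_estimate(noisy)
peak = max(range(len(est)), key=est.__getitem__)
print(peak)   # -> 4: the highlight peak survives the smoothing
```

An event (e.g. a goal) would then be segmented around the peaks of the smoothed curve rather than the raw, noisy one.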
Addressing the challenge of managing large-scale digital multimedia libraries
Traditional digital libraries require human editorial control over the lifecycles of the digital objects contained therein. This imposes an inherent (human) overhead on the maintenance of these digital libraries, which becomes unwieldy once the number of important information units in the digital library grows too large. A revised framework is needed for digital libraries that takes the onus off the editor and allows the digital library to directly control digital object lifecycles, by employing a set of transformation rules that operate directly on the digital objects themselves. In this paper we motivate and describe a revised digital library framework that utilises transformation rules to automatically optimise system resources. We evaluate this library in three scenarios and also outline how we could apply concepts from this revised framework to address other challenges for digital libraries and digital information access in general.
Multimedia and Decision-Making Process
Multimedia technology has changed the way we use computers, transforming the computer into a kind of second person: it has made it possible for us to see, hear, read, feel, and talk to computers. Multimedia presentation is also one of the fastest-growing sectors of the computer industry, with applications in many areas such as training, education, business presentation, merchandising, and communications.
Keywords: multimedia, decision, studies, mining, architecture
Content-based video retrieval: three example systems from TRECVid
The growth in video material available over the internet is generally accompanied by user-assigned tags or content descriptions, which are the mechanism by which we then access such video. However, user-assigned tags have limitations for retrieval, and often we want access where the content of the video itself is directly matched against a user's query rather than against some manually assigned surrogate tag. Content-based video retrieval techniques are not yet scalable enough to allow interactive searching at internet scale, but they are proving robust and effective for smaller collections. In this paper we show three exemplar systems which demonstrate the state of the art in interactive, content-based retrieval of video shots; these are just three of the more than 20 systems developed for the 2007 iteration of the annual TRECVid benchmarking activity. The contribution of our paper is to show that retrieving from video using content-based methods is now viable, that it works, and that there are many systems which now do this, such as the three outlined herein. These systems, and others, can provide effective search over hundreds of hours of video content, and are samples of the kind of content-based search functionality we can expect to see on larger video archives when issues of scale are addressed.