11,688 research outputs found

    Semi-automatic semantic enrichment of raw sensor data

    One of the more recent sources of large volumes of generated data is sensor devices, where dedicated sensing equipment is used to monitor events and happenings in a wide range of domains, including monitoring human biometrics. In recent trials to examine the effects that key moments in movies have on the human body, we fitted participants with a number of biometric sensor devices and monitored them as they watched a range of different movies in groups. The purpose of these experiments was to examine the correlation between humans' highlights in movies as observed from biometric sensors, and highlights in the same movies as identified by our automatic movie analysis techniques. However, the problem with this type of experiment is that both the analysis of the video stream and the sensor data readings are not directly usable in their raw form because of the sheer volume of low-level data values generated both from the sensors and from the movie analysis. This work describes the semi-automated enrichment of both video analysis and sensor data and the mechanism used to query the data in both centralised environments, and in a peer-to-peer architecture when the number of sensor devices grows to large numbers. We present and validate a scalable means of semi-automating the semantic enrichment of sensor data, thereby providing a means of large-scale sensor management

    Interactive product browsing and configuration using remote augmented reality sales services

    Real-time remote sales assistance is an underdeveloped component of online sales services. Solutions involving web page text chat, telephony and video support prove problematic when seeking to remotely guide customers in their sales processes, especially with configurations of physically complex artefacts. Recently, there has been great interest in the application of virtual worlds and augmented reality to create synthetic environments for remote sales of physical artefacts. However, there is a lack of analysis and development of appropriate software services to support these processes. We extend our previous work with the detailed design of configuration context services to support the management of an interactive sales session using augmented reality. We detail the context and configuration services required, presenting a novel data service streaming configuration information to the vendor for business analytics. We expect that a fully implemented configuration management service, based on our design, will improve the remote sales experience for both customers and vendors alike via analysis of the streamed information

    COSMOS-7: Video-oriented MPEG-7 scheme for modelling and filtering of semantic content

    MPEG-7 prescribes a format for semantic content models for multimedia to ensure interoperability across a multitude of platforms and application domains. However, the standard leaves it open as to how the models should be used and how their content should be filtered. Filtering is a technique used to retrieve only content relevant to user requirements, thereby reducing the necessary content-sifting effort of the user. This paper proposes an MPEG-7 scheme that can be deployed for semantic content modelling and filtering of digital video. The proposed scheme, COSMOS-7, produces rich and multi-faceted semantic content models and supports a content-based filtering approach that only analyses content relating directly to the preferred content requirements of the user

    Unsupervised mining of audiovisually consistent segments in videos with application to structure analysis

    In this paper, a multimodal event mining technique is proposed to discover repeating video segments exhibiting audio and visual consistency in a totally unsupervised manner. The mining strategy first exploits independent audio and visual cluster analysis to provide segments which are consistent in both their visual and audio modalities, thus likely corresponding to a unique underlying event. A subsequent modeling stage using discriminative models enables accurate detection of the underlying event throughout the video. Event mining is applied to unsupervised video structure analysis, using simple heuristics on occurrence patterns of the events discovered to select those relevant to the video structure. Results on TV programs ranging from news to talk shows and games show that structurally relevant events are discovered with precisions ranging from 87% to 98% and recalls from 59% to 94%

    ARTICONF decentralized social media platform for democratic crowd journalism

    Media production and consumption behaviors are changing in response to new technologies and demands, giving birth to a new generation of social applications. Among them, crowd journalism represents a novel way of constructing democratic and trustworthy news relying on ordinary citizens arriving at breaking news locations and capturing relevant videos using their smartphones. The ARTICONF project as reported by Prodan (Euro-Par 2019: parallel processing workshops, Springer, 2019) proposes a trustworthy, resilient, and globally sustainable toolset for developing decentralized applications (DApps) to address this need. Its goal is to overcome the privacy, trust, and autonomy-related concerns associated with proprietary social media platforms overflowed by fake news. Leveraging the ARTICONF tools, we introduce a new DApp for crowd journalism called MOGPlay. MOGPlay collects and manages audiovisual content generated by citizens and provides a secure blockchain platform that rewards all stakeholders involved in professional news production. Besides live streaming, MOGPlay offers a marketplace for audiovisual content trading among citizens and free journalists with an internal token ecosystem. We discuss the functionality and implementation of the MOGPlay DApp and illustrate four pilot crowd journalism live scenarios that validate the prototype

    Fourteenth Biennial Status Report: March 2017 - February 2019


    Audio-visual football video analysis, from structure detection to attention analysis

    Sport video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academic fields. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. It is necessary to develop specific techniques for content-based sports video analysis to utilise these characteristics. For an efficient and effective sports video analysis system, there are three fundamental questions: (1) what are key stories for sports videos; (2) what incurs viewer’s interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approached these questions from two different perspectives and in turn three research contributions are presented, namely, replay detection, attack temporal structure decomposition, and attention-based highlight identification. Replay segments convey the most important contents in sports videos. It is an efficient approach to collect game highlights by detecting replay segments. However, replay is an artefact of editing, which improves with advances in video editing tools. The composition of replay is complex, which includes logo transitions, slow motions, viewpoint switches and normal speed video clips. Since logo transition clips are pervasive in game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement of replay detection. A two-pass system was developed, including a five-layer adaboost classifier and a logo template matching throughout an entire video. The five-layer adaboost utilises shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions, to filter out logo transition candidates. 
    Subsequently, a logo template is constructed and employed to find all transition logo sequences. The precision and recall of this system in replay detection are both 100% in a five-game evaluation collection. An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as other sports videos. We review the literature of content-based temporal structures, such as play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely, play, focus, replay and break were identified by low level visual features. A four-state hidden Markov model was trained to simulate transition processes among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repetitive substring in the label sequence of shot class transitions. The occurrences of this substring are regarded as a kernel of an attack hidden Markov process. Therefore, the decomposition of attack structure becomes a boundary likelihood comparison between two Markov chains. Highlights are what attract notice. Attention is a psychological measurement of “notice”. A brief survey of attention psychological background, attention estimation from visual and auditory modalities, and multiple modality attention fusion is presented. We propose two attention models for sports video analysis, namely, the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the perception structure during watching video. This model removes reflection bias among modality salient signals and combines these signals by reflectors. The multiresolution autoregressive framework (MAR) treats salient signals as a group of smooth random processes, which follow a similar trend but are filled with noise. 
    This framework tries to estimate a noise-less signal from these coarse noisy observations by a multiple resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real time event detection. The experiment shows that these attention-based approaches can find goal events with high precision. Moreover, results of MAR-based highlight detection on the final game of FIFA 2002 and 2006 are highly similar to professionally labelled highlights by BBC and FIFA

    Judging a video by its bitstream cover

    Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval, especially in an age where an immense volume of video content is constantly being generated. Traditional methods require video decompression to extract pixel-level features like color, texture, and motion, thereby increasing computational and storage demands. Moreover, these methods often suffer from performance degradation in low-quality videos. We present a novel approach that examines only the post-compression bitstream of a video to perform classification, eliminating the need for decompression. We validate our approach using a custom-built data set comprising over 29,000 YouTube video clips, totaling 6,000 hours and spanning 11 distinct categories. Our preliminary evaluations indicate precision, accuracy, and recall rates well over 80%. The algorithm operates approximately 15,000 times faster than real-time for 30fps videos, outperforming the traditional Dynamic Time Warping (DTW) algorithm by six orders of magnitude