
    Semantic protection and personalization of video content. PIAF: MPEG-compliant multimedia adaptation framework for preserving user-perceived quality.

    UME (Universal Multimedia Experience) is the notion that a user should receive informative, adapted content anytime and anywhere. Personalization of videos, which adapts their content according to user preferences, is a vital aspect of achieving the UME vision. User preferences can be translated into several types of constraints that must be considered by the adaptation process, including semantic constraints directly related to the content of the video. To deal with these semantic constraints, a fine-grained adaptation, which can go down to the level of individual video objects, is necessary. The overall goal of this adaptation process is to provide users with adapted content that maximizes their Quality of Experience (QoE). This QoE depends simultaneously on the user's satisfaction in perceiving the adapted content, the amount of knowledge assimilated by the user, and the adaptation execution time. In video adaptation frameworks, the Adaptation Decision Taking Engine (ADTE), which can be considered the "brain" of the adaptation engine, is responsible for achieving this goal. The task of the ADTE is challenging because many adaptation operations can satisfy the same semantic constraint, resulting in several feasible adaptation plans. Indeed, for each entity undergoing the adaptation process, the ADTE must decide on the adequate adaptation operator that satisfies the user's preferences while maximizing his/her quality of experience. The first challenge is to objectively measure the quality of the adapted video, taking into consideration the multiple aspects of the QoE. The second challenge is to assess this quality beforehand in order to choose the most appropriate adaptation plan among all possible plans. The third challenge is to resolve conflicting or overlapping semantic constraints, in particular conflicts arising from constraints expressed by the owner's intellectual property rights concerning the modification of the content.
In this thesis, we tackle the aforementioned challenges by proposing a Utility Function (UF) that integrates semantic concerns with the user's perceptual considerations. This UF models the relationships among adaptation operations, user preferences, and the quality of the video content. We integrated this UF into an ADTE, which performs multi-level piecewise reasoning to choose the adaptation plan that maximizes the user-perceived quality. Furthermore, we included intellectual property rights in the adaptation process by modeling content-owner constraints, and we dealt with the problem of conflicting user and owner constraints by mapping it to a known optimization problem. Moreover, we developed the Semantic Video Content Annotation Tool (SVCAT), which produces structural and high-level semantic annotations according to an original object-based video content model. We also modeled the user's preferences by proposing extensions to MPEG-7 and MPEG-21. All the developed contributions were carried out as part of a coherent framework called PIAF, a complete, modular, MPEG-standard-compliant framework that covers the whole process of semantic video adaptation. We validated this research with qualitative and quantitative evaluations that assess the performance and efficiency of the proposed adaptation decision-taking engine within PIAF. The experimental results show that the proposed UF has a high correlation with subjective video quality evaluation.

The term "Universal Multimedia Experience" (UME) describes the vision that a user can consume video content tailored to his or her individual preferences. In this dissertation, UME additionally takes into account semantic constraints that are directly connected with the consumption of the video content. The goal is to maximize the quality of the video experience for the user; in this dissertation, this quality is represented by the user's satisfaction in perceiving the modification of the videos.
The modification of the videos is produced by video adaptation, e.g. by deleting or altering scenes or objects that do not comply with a semantic constraint. The core of the video adaptation is the Adaptation Decision Taking Engine (ADTE). It determines the operators that resolve the semantic constraints and then computes possible adaptation plans to be applied to the video. Furthermore, for each adaptation step, the ADTE must determine, based on the operators, how the user's preferences can be taken into account. The second challenge is assessing and maximizing the quality of an adapted video. The third challenge is handling contradictory semantic constraints, in particular those connected with intellectual property rights. In this dissertation, the above challenges are solved with the help of the Personalized video Adaptation Framework (PIAF), which is based on the Moving Picture Experts Group (MPEG) standards MPEG-7 and MPEG-21. PIAF is a framework that covers the entire video adaptation process. It models the relationship between the adaptation operators, the users' preferences, and the quality of the videos. Furthermore, the problem of optimally selecting an adaptation plan for maximal video quality is investigated. To this end, a Utility Function (UF) is defined and employed in the ADTE, which unites the semantic constraints with the preferences expressed by the user. In addition, the Semantic Video Content Annotation Tool (SVCAT) was developed to perform structural and semantic annotation. Likewise, the users' preferences are captured with MPEG-7 and MPEG-21 descriptors. The development of these software tools and algorithms is necessary to obtain a complete and modular framework.
PIAF thus covers the complete field of semantic video adaptation. The ADTE was validated in qualitative and quantitative evaluations. Among other things, the evaluation results show that the UF exhibits a high correlation with the subjective quality perception of selected users.
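The plan-selection step described above — scoring each feasible adaptation plan and taking the maximizer of a utility that combines perceived quality, semantic-constraint satisfaction, and execution time — can be sketched as follows. This is an illustrative sketch, not the thesis implementation: the `Plan` fields, the weights, and the candidate scores are hypothetical.

```python
# Illustrative ADTE-style selection loop: score candidate adaptation plans
# with a utility function and keep the plan that maximizes it.
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    quality: float         # predicted perceived quality in [0, 1] (assumed)
    semantic_score: float  # fraction of semantic constraints satisfied
    exec_time: float       # estimated adaptation time in seconds


def utility(p: Plan, w_q: float = 0.5, w_s: float = 0.4, w_t: float = 0.1,
            t_max: float = 10.0) -> float:
    """Weighted utility; the weights are illustrative, not from the thesis."""
    time_term = max(0.0, 1.0 - p.exec_time / t_max)  # faster plans score higher
    return w_q * p.quality + w_s * p.semantic_score + w_t * time_term


def choose_plan(plans: list[Plan]) -> Plan:
    """Return the feasible plan with maximal utility."""
    return max(plans, key=utility)


# Hypothetical candidate plans for one entity undergoing adaptation:
plans = [
    Plan("drop_scene", quality=0.6, semantic_score=1.0, exec_time=1.0),
    Plan("blur_object", quality=0.8, semantic_score=0.9, exec_time=2.5),
    Plan("transcode_only", quality=0.9, semantic_score=0.4, exec_time=0.5),
]
print(choose_plan(plans).name)
```

In this toy setup, the operator that best balances semantic satisfaction against perceived quality wins, even though another plan has higher raw quality.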

    Adaptive video delivery using semantics

    The diffusion of network appliances such as cellular phones, personal digital assistants and hand-held computers has created the need to personalize the way media content is delivered to the end user. Moreover, recent devices, such as digital radio receivers with graphics displays, and new applications, such as intelligent visual surveillance, require novel forms of video analysis for content adaptation and summarization. To cope with these challenges, we propose an automatic method for the extraction of semantics from video, and we present a framework that exploits these semantics in order to provide adaptive video delivery. First, an algorithm that relies on motion information to extract multiple semantic video objects is proposed. The algorithm operates in two stages. In the first stage, a statistical change detector produces the segmentation of moving objects from the background. This process is robust with regard to camera noise and does not need manual tuning along a sequence or for different sequences. In the second stage, feedbacks between an object partition and a region partition are used to track individual objects along the frames. These interactions allow us to cope with multiple, deformable objects, occlusions, splitting, appearance and disappearance of objects, and complex motion. Subsequently, semantics are used to prioritize visual data in order to improve the performance of adaptive video delivery. The idea behind this approach is to organize the content so that a particular network or device does not inhibit the main content message. Specifically, we propose two new video adaptation strategies. The first strategy combines semantic analysis with a traditional frame-based video encoder. Background simplifications resulting from this approach do not penalize overall quality at low bitrates. The second strategy uses metadata to efficiently encode the main content message. 
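The first stage described above can be sketched as a statistical change detector that compares each pixel to a background model and derives its decision threshold from an estimate of the noise spread, so no per-sequence manual tuning is needed. This is a minimal, hypothetical sketch; the noise-estimation scheme and the threshold factor `k` are assumptions, not the algorithm from the dissertation.

```python
# Minimal statistical change detector: flag a pixel as "moving" when its
# difference from the background exceeds k times an estimated noise spread.
import statistics


def detect_changes(frame, background, k=3.0):
    """Return a binary motion mask; frame/background are flat lists of
    grayscale values of equal length."""
    diffs = [f - b for f, b in zip(frame, background)]
    # Estimate the noise spread from the differences themselves (assumes
    # most pixels are static, so the spread mainly reflects sensor noise).
    sigma = statistics.pstdev(diffs) or 1.0
    thresh = k * sigma
    return [1 if abs(d) > thresh else 0 for d in diffs]


background = [10] * 16
frame = list(background)
frame[5] = 200   # a bright moving object covers one pixel
frame[0] = 12    # a small fluctuation: camera noise, should not be flagged
mask = detect_changes(frame, background)
print(mask)
```

Because the threshold scales with the measured spread rather than a hand-picked constant, the same detector works across sequences with different noise levels.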
The metadata-based representation of the objects' shape and motion suffices to convey the meaning and action of a scene when the objects are familiar. The impact of different video adaptation strategies is then quantified with subjective experiments. We ask a panel of human observers to rate the quality of adapted video sequences on a normalized scale. From these results, we further derive an objective quality metric, the semantic peak signal-to-noise ratio (SPSNR), that accounts for different image areas and for their relevance to the observer in order to reflect the focus of attention of the human visual system. Finally, we determine the adaptation strategy that provides maximum value for the end user by maximizing the SPSNR for given client resources at the time of delivery. By combining semantic video analysis and adaptive delivery, the solution presented in this dissertation permits the distribution of video in complex media environments and supports a large variety of content-based applications.
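A metric in the spirit of the SPSNR described above can be sketched by weighting squared errors with a per-pixel relevance map (e.g. full weight on semantic objects, reduced weight on background) before applying the usual PSNR formula. The weighting scheme and values below are illustrative assumptions, not the thesis definition.

```python
# Relevance-weighted PSNR sketch: errors on semantically important pixels
# count more than errors on the background.
import math


def spsnr(ref, dist, relevance, peak=255.0):
    """ref/dist: flat lists of grayscale values; relevance: per-pixel weights."""
    num = sum(w * (r - d) ** 2 for r, d, w in zip(ref, dist, relevance))
    wmse = num / sum(relevance)  # relevance-weighted mean squared error
    if wmse == 0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / wmse)


ref  = [100, 100, 50, 50]
dist = [102, 101, 40, 60]    # first two pixels: lightly distorted background
rel  = [0.2, 0.2, 1.0, 1.0]  # last two pixels: semantic object, full weight
print(round(spsnr(ref, dist, rel), 2))
```

With uniform relevance this reduces to ordinary PSNR; lowering background weights makes the score track distortions where the observer is assumed to look.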

    Content-prioritised video coding for British Sign Language communication.

    Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. 
The research community benefits from a new approach to video coding optimisation and a better understanding of the communication needs of deaf people.
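A foveation model of the kind applied at the pre-processing stage can be sketched as a per-pixel weight map that falls off with eccentricity from a fixation point — for BSL, typically the signer's face — which a pre-filter could then use to smooth or quantize peripheral regions more aggressively before standard H.264 coding. This is an assumed Gaussian-falloff sketch for illustration, not the thesis's HVS model; the fixation point and `sigma` are hypothetical parameters.

```python
# Foveation weight map: weight 1.0 at the fixation point, decaying with
# squared distance (eccentricity) following a Gaussian falloff.
import math


def foveation_weights(width, height, fx, fy, sigma):
    """Per-pixel weights in (0, 1] centred on the fixation point (fx, fy)."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            d2 = (x - fx) ** 2 + (y - fy) ** 2
            row.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        rows.append(row)
    return rows


# 9x9 toy frame with the fixation point (e.g. the signer's face) at centre:
w = foveation_weights(9, 9, fx=4, fy=4, sigma=3.0)
print(round(w[4][4], 3), round(w[0][0], 3))  # centre vs. corner weight
```

Regions near the fixation point keep full quality, while the weight — and hence the bit budget — drops smoothly toward the periphery, mirroring the falloff of visual acuity with eccentricity.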