PRNU Estimation based on Weighted Averaging for Source Smartphone Video Identification
Photo response non-uniformity (PRNU) noise is a sensor pattern noise characterizing imperfections in the imaging device. The PRNU is unique to each sensor, and it has been widely used in the literature for source camera identification and image authentication. In video forensics, the traditional approach estimates the PRNU by averaging a set of noise residuals obtained from multiple video frames. However, video data is affected by lossy compression and other non-unique, content-dependent noise components, and constant averaging does not take the varying intensity of these undesirable components into account. Different from the traditional approach, we propose a video PRNU estimation method based on weighted averaging. The noise residual is first extracted from each video frame. Then, the estimated noise residuals are combined by a weighted averaging method to optimize the PRNU estimate. Experimental results on two video datasets captured by various smartphone devices show a significant gain of the proposed approach over the conventional state-of-the-art one.
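The contrast between constant and weighted averaging of noise residuals can be sketched as follows. The specific weighting scheme below (down-weighting frames with high residual energy, which are assumed to carry more content-dependent noise) is an illustrative assumption, not necessarily the scheme proposed in the paper:

```python
import numpy as np

def estimate_prnu_constant(residuals):
    """Baseline: plain (unweighted) averaging of per-frame residuals."""
    return np.mean(np.asarray(residuals, dtype=np.float64), axis=0)

def estimate_prnu_weighted(residuals):
    """Combine per-frame noise residuals into one PRNU estimate.

    residuals: array-like of shape (n_frames, H, W).
    The inverse-energy weighting here is an illustrative choice:
    frames whose residuals have high variance are assumed to be
    dominated by content-dependent noise and receive smaller weights.
    """
    residuals = np.asarray(residuals, dtype=np.float64)
    energy = residuals.reshape(len(residuals), -1).var(axis=1)
    weights = 1.0 / (energy + 1e-12)   # small epsilon avoids division by zero
    weights /= weights.sum()           # normalise weights to sum to 1
    # Weighted sum over the frame axis -> (H, W) PRNU estimate.
    return np.tensordot(weights, residuals, axes=1)
```

When all frames contribute equally informative residuals the two estimators coincide; the gain of weighting appears when some frames are noisier than others.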
A Comprehensive Study of Tone Mapping of High Dynamic Range Images with Subjective Tests
A high dynamic range (HDR) image has a very wide range of luminance levels that traditional low dynamic range (LDR) displays cannot visualize. For this reason, HDR images are usually transformed to 8-bit representations in which the alpha channel of each pixel is used as an exponent value, sometimes referred to as exponential notation [43]. Tone mapping operators (TMOs) transform the high dynamic range to the low dynamic range domain by compressing pixel values so that traditional LDR displays can visualize them. The purpose of this thesis is to identify and analyse the differences and similarities between the wide range of tone mapping operators available in the literature. Each TMO has been analysed using subjective studies under different conditions, including environment, luminance, and colour. Several inverse tone mapping operators, HDR mappings with exposure fusion, histogram adjustment, and retinex have also been analysed in this study. 19 different TMOs have been examined using a variety of HDR images. The mean opinion score (MOS) was calculated for the selected TMOs by surveying 25 independent participants, taking the candidates' age, vision, and colour blindness into account.
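As a concrete illustration of what a TMO does, the sketch below implements one widely cited global operator, the Reinhard photographic tone reproduction curve, which compresses HDR luminance into a displayable [0, 1) range. The `key` default of 0.18 is a common convention and should be treated as a tunable assumption; this is one of many operators the thesis compares, not a summary of its findings:

```python
import numpy as np

def reinhard_global_tmo(luminance, key=0.18):
    """Global Reinhard tone mapping of an HDR luminance map.

    luminance: positive HDR luminance values, any shape.
    key: "key value" controlling overall brightness (0.18 is a
         common default; treat it as an assumption).
    Returns display luminance in [0, 1).
    """
    lum = np.asarray(luminance, dtype=np.float64)
    # Log-average (geometric mean) luminance summarises the scene.
    log_avg = np.exp(np.mean(np.log(lum + 1e-8)))
    scaled = key * lum / log_avg        # scale scene to the key value
    return scaled / (1.0 + scaled)      # compress into [0, 1)
```

The operator is monotone, so relative brightness ordering is preserved while very large luminance values are squeezed towards 1.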
Annotation of multimedia learning materials for semantic search
Multimedia is the main source of online learning materials, such as videos, slides and textbooks, and its volume is growing with the popularity of online programs offered by universities and Massive Open Online Courses (MOOCs). The increasing amount of multimedia learning resources available online makes it very challenging to browse through the materials or find where a specific concept of interest is covered. To enable semantic search over lecture materials, their content must be annotated and indexed. Manual annotation of learning materials such as videos is tedious and cannot be envisioned for the growing quantity of online materials. One of the most commonly used methods for learning-video annotation is to index the video based on the transcript obtained by translating the audio track of the video into text. Existing speech-to-text translators require extensive training, especially for non-native English speakers, and are known to have low accuracy.
This dissertation proposes to index the slides based on keywords. The keywords extracted from the textbook index and the presentation slides form the basis of the indexing scheme. Two types of lecture videos are generally used (i.e., classroom recordings made with a regular camera, or slide-presentation screen captures made with specific software), and their quality varies widely. Screen-capture videos generally have good quality and sometimes come with metadata, but the metadata is often unreliable, and hence image processing techniques are used to segment the videos. Since lecture videos have a static slide background, detecting shot boundaries is challenging. A comparative analysis of state-of-the-art techniques, aimed at determining the feature descriptors best suited for detecting transitions in a lecture video, is presented in this dissertation. The videos are indexed with keywords obtained from the slides, and a correspondence is established by segmenting the video temporally using feature descriptors to match and align the video segments with the presentation slides converted into images. Classroom recordings made with regular video cameras often have poor illumination, with objects partially or totally occluded. For such videos, slide localization techniques based on segmentation and heuristics are presented to improve the accuracy of transition detection.
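The temporal segmentation step can be sketched with a minimal transition detector. The dissertation compares several feature descriptors; the global intensity histogram used below is a deliberately simple stand-in, and the threshold value is an illustrative assumption:

```python
import numpy as np

def detect_slide_transitions(frames, threshold=0.25):
    """Detect slide-change boundaries in a screen-capture lecture video.

    frames: sequence of grayscale frames (2-D uint8 arrays).
    threshold: fraction of histogram mass that must change between
        consecutive frames to declare a transition (illustrative value).
    Returns indices of frames that start a new slide segment.
    """
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=64, range=(0, 256))
        hist = hist / max(hist.sum(), 1)   # normalise to a distribution
        if prev_hist is not None:
            # Total-variation distance between consecutive histograms.
            if 0.5 * np.abs(hist - prev_hist).sum() > threshold:
                boundaries.append(i)
        prev_hist = hist
    return boundaries
```

Because slides are mostly static, consecutive frames within one slide produce near-identical histograms, so only genuine slide changes cross the threshold; richer descriptors are needed for classroom footage with camera motion.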
A region-prioritized ranking mechanism is proposed that integrates the location of the keyword on the slide into the ranking of the slides when searching for a slide that covers a given keyword. This helps surface the most relevant results first. With the increasing amount of course material gathered online, a user trying to understand a given concept can get overwhelmed. The standard "one size fits all" approach is no longer the best way for millennials to learn, so personalized concept recommendation based on the user's background knowledge is also presented.
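The region-prioritized idea can be sketched as a weighted keyword count, where a hit in a more prominent slide region contributes more to the score. The region names and weight values below are hypothetical, chosen only to illustrate the mechanism:

```python
def rank_slides(slides, keyword, region_weights=None):
    """Rank slides for a keyword, weighting where on the slide it occurs.

    slides: list of dicts mapping region name -> text,
        e.g. {"title": "...", "bullet": "...", "body": "..."}.
    region_weights: importance per region; the defaults below are
        illustrative, not taken from the dissertation.
    Returns slide indices sorted by descending relevance score.
    """
    if region_weights is None:
        region_weights = {"title": 3.0, "bullet": 2.0, "body": 1.0}
    scores = []
    for idx, slide in enumerate(slides):
        # Count case-insensitive keyword occurrences per region.
        score = sum(
            weight * slide.get(region, "").lower().count(keyword.lower())
            for region, weight in region_weights.items()
        )
        scores.append((score, idx))
    # Highest score first; ties broken by slide order.
    return [idx for score, idx in sorted(scores, key=lambda s: (-s[0], s[1]))]
```

A slide whose title contains the query term then outranks a slide that only mentions it in passing in the body text.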
Finally, the contributions of this dissertation have been integrated into the Ultimate Course Search (UCS), a tool for effective search of course materials. UCS integrates presentation slides, lecture videos and textbook content into a single platform with topic-based search capabilities and easy navigation of lecture materials.
Sensor Pattern Noise Estimation Using Non-textured Video Frames for Efficient Source Smartphone Identification and Verification
Photo response non-uniformity (PRNU) noise is a sensor pattern noise characterizing the imaging device. It has been broadly used in the literature for image authentication and source camera identification. The abundant information that the PRNU carries in terms of frequency content makes it unique, and therefore suitable for identifying the source camera and detecting forgeries in digital images. However, PRNU estimation from smartphone videos is challenging due to the presence of frame-dependent content (very dark or highly textured frames), as well as other non-unique noise components and distortions due to lossy compression. In this paper, we propose an approach that considers only the non-textured frames when estimating the PRNU, because its estimation from highly textured images has been shown to be inaccurate in image forensics. Furthermore, lossy compression distortions tend to affect mainly textured and high-activity regions and consequently weaken the presence of the PRNU in such areas. The proposed technique uses a number of texture measures obtained from the Grey Level Co-occurrence Matrix (GLCM) prior to an unsupervised learning process that splits the feature space of training video frames into two sub-spaces, i.e., the textured space and the non-textured space. Non-textured video frames are then selected and used for estimating the PRNU. Experimental results on a public video dataset captured by various smartphone devices show a significant gain of the proposed approach over the conventional state-of-the-art approach.
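The GLCM-plus-clustering pipeline can be sketched in a few lines. The paper uses several GLCM texture measures; for brevity this sketch computes only the contrast measure (for horizontal pixel pairs) and performs a simple one-dimensional two-means split as a stand-in for the unsupervised learning step. Quantization level, neighbour offset, and the clustering choice are all illustrative assumptions:

```python
import numpy as np

def glcm_contrast(frame, levels=8):
    """GLCM 'contrast' of a grayscale frame, horizontal neighbours only.

    A minimal NumPy re-implementation of one texture measure from the
    grey level co-occurrence matrix.
    """
    # Quantize intensities into `levels` grey levels.
    q = np.clip((np.asarray(frame, dtype=np.float64) / 256.0 * levels).astype(int),
                0, levels - 1)
    # Count co-occurrences of horizontally adjacent grey-level pairs.
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    glcm /= glcm.sum()
    i, j = np.indices((levels, levels))
    return float(np.sum(glcm * (i - j) ** 2))

def split_non_textured(frames):
    """Two-means split of frames by contrast; returns the low-texture cluster."""
    contrasts = np.array([glcm_contrast(f) for f in frames])
    centers = np.array([contrasts.min(), contrasts.max()], dtype=np.float64)
    for _ in range(20):                                    # Lloyd iterations
        labels = np.abs(contrasts[:, None] - centers[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = contrasts[labels == k].mean()
    return [f for f, lab in zip(frames, labels) if lab == 0]
```

Frames landing in the low-contrast cluster would then feed the PRNU averaging stage.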
Distributed multimedia quality: The user perspective
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.

Distributed multimedia supports a symbiotic infotainment duality, i.e. the ability to transfer information to the user while also providing the user with a level of satisfaction. As multimedia is ultimately produced for the education and/or enjoyment of viewers, the user's perspective on presentation quality is surely of equal importance to objective Quality of Service (QoS) technical parameters in defining distributed multimedia quality. In order to extensively measure the user perspective of multimedia video quality, we introduce an extended model of distributed multimedia quality that segregates quality into three discrete levels: the network-level, the media-level and the content-level, using two distinct quality perspectives: the user-perspective and the technical-perspective.
Since experimental questionnaires do not provide continuous monitoring of user attention, eye tracking was used in our study to provide a better understanding of the role that the human element plays in the reception, analysis and synthesis of multimedia data. Results showed that video content adaptation results in disparity in user video eye-paths when: i) no single/obvious point of focus exists; or ii) the point of attention changes dramatically.
Accordingly, appropriate technical- and user-perspective parameter adaptation is implemented for all quality abstractions of our model, i.e. the network-level (via simulated delay and jitter), the media-level (via a technical- and user-perspective manipulated region-of-interest attentive display) and the content-level (via display type and video clip type). Our work has shown that user-perceived distributed multimedia quality cannot be ensured by means of purely technical-perspective QoS parameter adaptation.
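The network-level manipulation mentioned above (simulated delay and jitter) can be sketched as a simple timestamp transformation. The base-delay and jitter magnitudes below are illustrative values, not the experimental settings used in the thesis:

```python
import random

def simulate_delay_jitter(send_times, base_delay=0.10, jitter=0.03, seed=0):
    """Apply a fixed base delay plus uniform random jitter to packet
    send timestamps (seconds), a rough stand-in for network-level
    quality degradation. Parameter values are illustrative.

    Returns per-packet arrival times; note that packet reordering can
    occur whenever the jitter exceeds the inter-packet gap.
    """
    rng = random.Random(seed)  # seeded for reproducible experiments
    return [t + base_delay + rng.uniform(-jitter, jitter) for t in send_times]
```

Sweeping `base_delay` and `jitter` over a grid, and collecting user ratings at each setting, is one way such network-level parameters can be related to perceived quality.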