DCT-based video downscaling transcoder using split and merge technique
2005-2006 · Refereed journal publication · Version of Record
An Efficient Motion Estimation Method for H.264-Based Video Transcoding with Arbitrary Spatial Resolution Conversion
As wireless and wired network connectivity rapidly expands
and the number of network users steadily grows, it has become
increasingly important to support universal access to multimedia
content across the whole network. A major challenge, however, is
the great diversity of network devices, from full-screen computers
to small smartphones. This has led to research on transcoding,
which involves efficiently reformatting compressed data from
its original high resolution to a desired spatial resolution
supported by the display device. In particular, there is
great momentum in the multimedia industry for H.264-based
transcoding, as H.264 has been widely adopted as a mandatory
player feature in applications ranging from television broadcast
to video for mobile devices.
While H.264 contains many new features that enable effective video
coding with excellent rate-distortion (RD) performance, a major issue
in transcoding H.264 compressed video from one spatial resolution
to another is the computational complexity of the motion compensated
prediction (MCP) stage. MCP is the main contributor to the excellent
RD performance of H.264 video compression, yet it is very time
consuming: in general, a brute-force search is used to find the best
motion vectors. In the transcoding scenario, however, an immediate
idea for improving MCP efficiency in the re-encoding procedure is to
reuse the motion vectors in the original compressed stream, since
motion in the high-resolution scene is intuitively highly correlated
with motion in the down-scaled scene.
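A minimal sketch of this naive reuse idea (the function name and tuple representation are our own illustration, not from the thesis):

```python
def scale_mv(mv, ratio_x, ratio_y):
    """Naive candidate motion vector for the downscaled frame: scale the
    original vector's components by the horizontal/vertical resolution
    ratios of the spatial conversion."""
    return (mv[0] * ratio_x, mv[1] * ratio_y)

# e.g. downscaling 1920x1080 -> 960x540 halves both components
candidate = scale_mv((8, -4), 960 / 1920, 540 / 1080)
```

Such a scaled vector is typically only a starting point; the thesis refines the estimate well beyond this simple heuristic.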
In this thesis, we study homogeneous video transcoding from H.264
to H.264. Specifically, for video transcoding with arbitrary
spatial resolution conversion, we propose a motion vector estimation
algorithm based on a multiple linear regression model that
systematically exploits the motion information in the original scenes.
We also propose a practical solution for efficiently selecting a
reference frame, taking advantage of H.264's new multiple-reference-frame
feature. The performance of the algorithm was assessed
in an H.264 transcoder. Experimental results show that, compared
with a benchmark solution, the proposed method significantly reduces
transcoding complexity without noticeably degrading video quality.
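The regression idea can be illustrated with a toy sketch: given training pairs of original-resolution motion vectors and the corresponding downscaled vectors, a least-squares fit yields a linear predictor. The feature choice here (raw vector components plus a bias term) is a simplification and not necessarily the feature set used in the thesis:

```python
import numpy as np

def fit_mv_regression(orig_mvs, target_mvs):
    """Least-squares fit of a multiple linear regression mapping
    original-resolution motion vectors (N x 2) to downscaled ones (N x 2)."""
    X = np.hstack([orig_mvs, np.ones((len(orig_mvs), 1))])  # add bias column
    coef, *_ = np.linalg.lstsq(X, target_mvs, rcond=None)
    return coef  # shape (3, 2): weights for (mv_x, mv_y, bias)

def predict_mv(coef, orig_mv):
    """Predict the downscaled motion vector for one original vector."""
    return np.append(orig_mv, 1.0) @ coef
```

In a transcoder, the predicted vector would seed a small local refinement search instead of a brute-force search, which is where the complexity saving comes from.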
Datasets, Clues and State-of-the-Arts for Multimedia Forensics: An Extensive Review
With large volumes of social media data being created daily and the
parallel rise of realistic multimedia tampering methods, detecting and
localising tampering in images and videos has become essential. This survey
focuses on approaches for tampering detection in multimedia data using deep
learning models. Specifically, it presents a detailed analysis of publicly
available benchmark datasets for malicious manipulation detection. It
also offers a comprehensive list of tampering clues and commonly used deep
learning architectures. Next, it discusses the current state-of-the-art
tampering detection methods, categorising them into meaningful types, such as
deepfake detection, splice tampering detection, and copy-move tampering
detection, and discussing their strengths and weaknesses. Top results achieved
on benchmark datasets, comparisons of deep learning approaches against
traditional methods, and critical insights from recent tampering detection
methods are also discussed. Lastly, the research gaps, future directions, and
conclusions are discussed to provide an in-depth understanding of the
tampering detection research arena.
Steered mixture-of-experts for light field images and video: representation and coding
Research in light field (LF) processing has increased heavily over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, such 2-D regular grids are less suited for high-dimensional data such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays arriving at a certain region from any angle. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid-range bitrates with respect to the subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently offers functionality desirable for LF rendering that is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
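The kernel-based representation can be illustrated with a minimal 1-D sketch. Real SMoE models operate in 4-D/5-D and use steered, anisotropic kernels with richer experts; the constant experts and fixed bandwidths below are simplifications of our own:

```python
import numpy as np

def smoe_reconstruct(x, centers, bandwidths, expert_vals):
    """Evaluate a toy 1-D mixture-of-experts model: each Gaussian kernel
    softly gates a constant expert, and the normalized blend yields a
    continuous approximation of the underlying signal at positions x."""
    d = (x[:, None] - centers[None, :]) / bandwidths[None, :]
    w = np.exp(-0.5 * d ** 2)          # unnormalized kernel responses
    w /= w.sum(axis=1, keepdims=True)  # soft-gating weights (sum to 1 per x)
    return w @ expert_vals             # gated blend of expert outputs
```

Because the model is continuous in x, it can be evaluated at any position, which mirrors the zero-delay random access and intrinsic interpolation properties described above.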
Resampling Forgery Detection Using Deep Learning and A-Contrario Analysis
The amount of digital imagery recorded has grown exponentially in recent
years, and with the advancement of software such as Photoshop or GIMP, it has
become easier to manipulate images. However, most images on the internet have
not been manipulated, so any automated manipulation detection algorithm must
carefully control the false alarm rate. In this paper we discuss a method to
automatically detect local resampling using deep learning while controlling the
false alarm rate using a-contrario analysis. The automated procedure consists
of three primary steps. First, resampling features are calculated for image
blocks. A deep learning classifier is then used to generate a heatmap that
indicates whether each image block has been resampled. Since we expect some of
these blocks to be falsely identified as resampled, we use a-contrario
hypothesis testing both to decide, from the pattern of flagged blocks, whether
the image has been tampered with, and to localize the manipulation. We
demonstrate that this strategy is effective at indicating whether an image has
been manipulated and at localizing the manipulations.
Efficient Learning-based Image Enhancement: Application to Compression Artifact Removal and Super-resolution
Many computer vision and computational photography applications essentially solve an image enhancement problem. The image has been degraded by a specific noise process that we would like to remove, such as aberrations from camera optics or compression artifacts. We describe a framework for learning-based image enhancement. At the core of our algorithm lies a generic regularization framework that comprises a prior on natural images, as well as an application-specific conditional model based on Gaussian processes. In contrast to prior learning-based approaches, our algorithm can instantly learn task-specific degradation models from sample images, which enables users to easily adapt the algorithm to a specific problem and data set of interest. This is facilitated by our efficient approximation scheme for large-scale Gaussian processes. We demonstrate the efficiency and effectiveness of our approach by applying it to example enhancement applications, including single-image super-resolution as well as artifact removal in JPEG- and JPEG 2000-encoded images.
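Gaussian-process regression of the kind underlying such conditional models can be sketched in a few lines. An RBF kernel and 1-D inputs are used here for brevity; the paper's actual model, features, and large-scale approximation scheme are more involved:

```python
import numpy as np

def gp_posterior_mean(x_train, y_train, x_test, length=1.0, noise=1e-2):
    """Posterior mean of GP regression with an RBF kernel: a toy stand-in
    for an application-specific conditional model learned from samples."""
    def rbf(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-0.5 * (d / length) ** 2)
    # Kernel matrix on training inputs, regularized by the noise variance
    K = rbf(x_train, x_train) + noise**2 * np.eye(len(x_train))
    # Predictive mean: cross-covariance times weighted training targets
    return rbf(x_test, x_train) @ np.linalg.solve(K, y_train)
```

The exact GP scales cubically with the number of training samples, which is why the paper relies on an efficient approximation to handle large-scale data.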