User Constrained Thumbnail Generation using Adaptive Convolutions
Thumbnails are widely used as previews of digital
images. In this work we propose a deep neural framework to generate thumbnails
of any size and aspect ratio, even for unseen values during training, with high
accuracy and precision. We use Global Context Aggregation (GCA) and a modified
Region Proposal Network (RPN) with adaptive convolutions to generate thumbnails
in real time. GCA is used to selectively attend and aggregate the global
context information from the entire image while the RPN is used to predict
candidate bounding boxes for the thumbnail image. Adaptive convolution
eliminates the problem of generating thumbnails of various aspect ratios by
using filter weights dynamically generated from the aspect ratio information.
The experimental results indicate the superior performance of the proposed
model over existing state-of-the-art techniques.
Comment: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 201
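The adaptive-convolution idea above — deriving filter weights from the target aspect ratio instead of training a separate filter bank per ratio — can be sketched as follows. This is a minimal illustration, not the paper's architecture: the tiny random-projection generator in `make_filters` stands in for the learned weight-generating network, and all names are assumptions.

```python
import numpy as np

def make_filters(aspect_ratio, rng, k=3, n_filters=4):
    # Hypothetical weight generator: maps the scalar aspect ratio to a
    # bank of k x k convolution filters via a fixed random projection.
    # In the paper this would be a small learned network.
    h = np.tanh(rng.standard_normal(8) * aspect_ratio)
    w = rng.standard_normal((n_filters * k * k, 8)) @ h
    return w.reshape(n_filters, k, k)

def adaptive_conv(image, aspect_ratio, seed=0):
    """Convolve `image` with filters generated from `aspect_ratio`."""
    rng = np.random.default_rng(seed)
    filters = make_filters(aspect_ratio, rng)
    k = filters.shape[-1]
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty((filters.shape[0],) + image.shape)
    for f, filt in enumerate(filters):
        for i in range(image.shape[0]):
            for j in range(image.shape[1]):
                out[f, i, j] = np.sum(padded[i:i + k, j:j + k] * filt)
    return out
```

The key property the sketch demonstrates is that two different aspect ratios yield two different filter banks from the same generator, which is what lets a single model handle ratios unseen at training time.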
Perceptually based downscaling of images
We propose a perceptually based method for downscaling images that provides a better apparent depiction of the input image. We formulate image downscaling as an optimization problem where the difference between the input and output images is measured using a widely adopted perceptual image quality metric. The downscaled images retain perceptually important features and details, resulting in an accurate and spatio-temporally consistent representation of the high resolution input. We derive the solution of the optimization problem in closed-form, which leads to a simple, efficient and parallelizable implementation with sums and convolutions. The algorithm has running times similar to linear filtering and is orders of magnitude faster than the state-of-the-art for image downscaling. We validate the effectiveness of the technique with extensive tests on many images, video, and by performing a user study, which indicates a clear preference for the results of the new algorithm.
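The flavor of the approach — downscaling built from plain sums while restoring the contrast a box filter destroys — can be illustrated with a much-simplified variant. This global contrast-matching step is a stand-in sketch, not the paper's per-pixel closed-form solution:

```python
import numpy as np

def box_downscale(img, s):
    """Downscale a 2-D image by integer factor s with a plain box filter
    (average pooling over s x s blocks)."""
    h, w = img.shape
    return img[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def contrast_matched_downscale(img, s, eps=1e-8):
    """Box-downscale, then rescale deviations from the mean so the output's
    variance matches the input's. A crude global stand-in for the paper's
    per-pixel closed form, which uses local sums and convolutions."""
    l = box_downscale(img, s)
    gain = img.std() / (l.std() + eps)
    return l.mean() + (l - l.mean()) * gain
```

Box filtering averages away fine detail, which lowers variance; multiplying deviations by the ratio of standard deviations restores the overall contrast, hinting at why a statistics-matching formulation can be solved with simple filtering operations.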
Mobile graphics: SIGGRAPH Asia 2017 course
Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing. Research spans a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years these efforts have improved image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Technologies for media creation, processing, editing, and scenario generation are likewise important research areas in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing: computer vision, speech/sound/text processing, and content analysis/information mining.
IMAGE MANAGEMENT USING PATTERN RECOGNITION SYSTEMS
With the popular usage of personal image devices and the continued increase of computing power, casual users need to handle a large number of images on computers. Image management is challenging because, in addition to searching and browsing textual metadata, we must address two further challenges. First, thumbnails, which are representative forms of original images, require significant screen space to be presented meaningfully. Second, while image metadata is crucial for managing images, creating metadata for images is expensive. My research comprises three components that address these problems.
First, I explore a new way of browsing a large number of images. I redesign and implement a zoomable image browser, PhotoMesa, which is capable of showing thousands of images clustered by metadata. Combined with its simple navigation strategy, the zoomable image environment allows users to scale up the size of an image collection they can comfortably browse.
Second, I examine tradeoffs of displaying thumbnails in limited screen space. While bigger thumbnails use more screen space, smaller thumbnails are hard to recognize. I introduce an automatic thumbnail cropping algorithm based on a computer vision saliency model. The cropped thumbnails keep the core informative part and remove the less informative periphery. My user study shows that users performed visual searches more than 18% faster with cropped thumbnails.
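Saliency-based cropping of this kind can be sketched with a simple gradient-magnitude proxy in place of the full computer-vision saliency model described above; the `frac` threshold and the proxy itself are assumptions for illustration only:

```python
import numpy as np

def saliency_crop(img, frac=0.1):
    """Crop `img` to the bounding box of its most salient pixels.
    Saliency here is approximated by gradient magnitude, a stand-in
    for the saliency model used in the thesis."""
    gy, gx = np.gradient(img.astype(float))
    sal = np.hypot(gx, gy)
    if sal.max() == 0:
        return img  # flat image: nothing salient, keep everything
    ys, xs = np.nonzero(sal >= frac * sal.max())
    return img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```

The crop keeps the informative core (where intensity varies) and discards the low-saliency periphery, which is the intuition behind the thumbnail-cropping result.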
Finally, I explore semi-automatic annotation techniques to help users make accurate annotations with low effort. Automatic metadata extraction is typically fast but inaccurate, while manual annotation is slow but accurate. I investigate techniques to combine these two approaches. My semi-automatic annotation prototype, SAPHARI, generates image clusters which facilitate efficient bulk annotation. For automatic clustering, I present hierarchical event clustering and clothing-based human recognition. Experimental results demonstrate the effectiveness of semi-automatic annotation when applied to personal photo collections. Users were able to make annotations 49% and 6% faster with the semi-automatic annotation interface on event and face tasks, respectively.