283 research outputs found

    Browsing Large Image Datasets through Voronoi Diagrams

    Conventional browsing of image collections uses mechanisms such as thumbnails arranged on a regular grid or on a line, often mounted over a scrollable panel. However, this approach does not scale well with the size of the dataset (number of images). In this paper, we propose a new thumbnail-based interface to browse large collections of images. Our approach is based on weighted centroidal anisotropic Voronoi diagrams. A dynamically changing subset of images is represented by thumbnails and shown on the screen. Thumbnails are shaped like general polygons, to better cover screen space, while still reflecting the original aspect ratios or orientation of the represented images. During the browsing process, thumbnails are dynamically rearranged, reshaped and rescaled. The objective is to devote more screen space (more numerous and larger thumbnails) to the parts of the dataset closer to the current region of interest, and progressively less to parts farther away from it, while still making the dataset visible as a whole. During the entire process, temporal coherence is always maintained. A GPU implementation easily guarantees the frame rates needed for fully smooth interactivity.
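As a rough illustration of the idea behind a weighted centroidal Voronoi diagram, the sketch below runs a discrete Lloyd relaxation in which each site moves to the density-weighted centroid of its cell. It is a simplified, isotropic stand-in and not the paper's anisotropic GPU method; the names `weighted_lloyd` and `focus_density`, the grid resolution, and the Gaussian density around a region of interest are all assumptions made for illustration.

```python
import numpy as np

def weighted_lloyd(sites, density, iterations=20, grid=64):
    """Move each site to the density-weighted centroid of its Voronoi cell."""
    # Sample the unit square on a regular grid of points (cell centers).
    axis = (np.arange(grid) + 0.5) / grid
    gx, gy = np.meshgrid(axis, axis)
    points = np.stack([gx.ravel(), gy.ravel()], axis=1)
    weights = density(points)  # one non-negative weight per sample point
    sites = np.array(sites, dtype=float)  # copy, so the caller's array is untouched
    for _ in range(iterations):
        # Assign every sample point to its nearest site (discrete Voronoi cells).
        d2 = ((points[:, None, :] - sites[None, :, :]) ** 2).sum(axis=2)
        owner = d2.argmin(axis=1)
        for k in range(len(sites)):
            in_cell = owner == k
            total = weights[in_cell].sum()
            if total > 0:  # a site with an empty cell stays where it is
                sites[k] = (points[in_cell] * weights[in_cell][:, None]).sum(axis=0) / total
    return sites

def focus_density(points, focus=(0.5, 0.5), sigma=0.2):
    """Hypothetical density: high near the region of interest, low but non-zero elsewhere."""
    d2 = ((points - np.asarray(focus)) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2)) + 0.05

rng = np.random.default_rng(0)
initial = rng.random((30, 2))          # 30 thumbnail sites in the unit square
relaxed = weighted_lloyd(initial, focus_density)
```

With a density that peaks at the focus, cells near the focus tend to become smaller and more numerous, which loosely mirrors the goal of giving more screen space to the region of interest while keeping the rest of the dataset visible.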

    Supervised Deep Learning for Content-Aware Image Retargeting with Fourier Convolutions

    Image retargeting aims to alter the size of an image with attention to its contents. One of the main obstacles to training deep learning models for image retargeting is the need for a vast labeled dataset. Labeled datasets are unavailable for training deep learning models in image retargeting tasks. As a result, we present a new supervised approach for training deep learning models. We use the original images as ground truth and create inputs for the model by resizing and cropping the original images. A second challenge is generating different image sizes at inference time, since regular convolutional neural networks cannot generate images whose size differs from that of the input image. To address this issue, we introduce a new method for supervised learning. In our approach, a mask is generated to show the desired size and location of the object. Then the mask and the input image are fed to the network. Comparing existing image retargeting methods with our proposed method demonstrates the model's ability to produce high-quality retargeted images. Afterward, we compute the image quality assessment score for each output image based on different techniques and illustrate the effectiveness of our approach. Comment: 18 pages, 5 figures
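The data-generation step described above (original image as ground truth, model input derived by cropping and resizing) could be sketched as follows. This is only one possible reading: the function names, the nearest-neighbour resize, and in particular the way the mask is defined (a binary map marking the cropped region in the original frame) are assumptions, not details taken from the paper.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize using integer index maps (no external libraries)."""
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return image[rows[:, None], cols]

def make_training_pair(original, crop_box, scale):
    """Build (model_input, mask, target) from one original image.

    crop_box is (top, left, height, width); scale shrinks or enlarges the crop.
    The mask layout is a hypothetical choice made for this sketch.
    """
    top, left, height, width = crop_box
    cropped = original[top:top + height, left:left + width]
    out_h = max(1, int(round(height * scale)))
    out_w = max(1, int(round(width * scale)))
    model_input = resize_nearest(cropped, out_h, out_w)
    # Binary mask in the original frame marking where the cropped content came from.
    mask = np.zeros(original.shape[:2], dtype=np.float32)
    mask[top:top + height, left:left + width] = 1.0
    return model_input, mask, original  # the original image serves as ground truth

original = np.arange(8 * 10 * 3).reshape(8, 10, 3)
model_input, mask, target = make_training_pair(original, (2, 3, 4, 6), 0.5)
```

Because the target is simply the untouched original, no manual labelling is needed, which is the point the abstract makes about avoiding a labeled dataset.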

    Motion capture based on RGBD data from multiple sensors for avatar animation

    With recent advances in technology and the emergence of affordable RGB-D sensors for a wider range of users, markerless motion capture has become an active field of research both in computer vision and computer graphics. In this thesis, we designed a POC (Proof of Concept) for a new tool that enables us to perform motion capture by using a variable number of commodity RGB-D sensors of different brands and technical specifications in unconstrained layout environments. The main goal of this work is to provide a tool with motion capture capabilities by using a handful of RGB-D sensors, without imposing strong requirements in terms of lighting, background or extension of the motion capture area. Of course, the number of RGB-D sensors needed is inversely proportional to their resolution, and directly proportional to the size of the area to track. Built on top of the OpenNI 2 library, this POC is compatible with most of the non-high-end RGB-D sensors currently available on the market. Due to the lack of resources on a single computer, supporting more than a couple of sensors working simultaneously requires a setup composed of multiple computers. In order to keep data coherency and synchronization across sensors and computers, our tool makes use of a semi-automatic calibration method and a message-oriented network protocol. From color and depth data given by a sensor, we can also obtain a 3D pointcloud representation of the environment. By combining pointclouds from multiple sensors, we can collect a complete and animated 3D pointcloud that can be visualized from any viewpoint. Given a 3D avatar model and its corresponding attached skeleton, we can use an iterative optimization method (e.g. Simplex) to find a fit between each pointcloud frame and a skeleton configuration, resulting in 3D avatar animation when using such skeleton configurations as key frames.
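To make the pointcloud steps concrete, here is a minimal sketch of back-projecting a depth image with a pinhole camera model and merging clouds from several sensors into a common frame. It does not use OpenNI 2 and is not the thesis's implementation; the intrinsics (`fx`, `fy`, `cx`, `cy`) and the per-sensor rotation and translation are assumed to come from a prior calibration, and the function names are hypothetical.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (in metres) to 3D points in the sensor frame."""
    v, u = np.indices(depth.shape)       # pixel row (v) and column (u) coordinates
    z = depth.astype(float)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]      # drop pixels with no depth reading

def merge_clouds(clouds, extrinsics):
    """Transform each cloud into the shared world frame and concatenate them.

    extrinsics holds one (rotation 3x3, translation 3) pair per sensor,
    assumed to be known from calibration.
    """
    merged = []
    for points, (rotation, translation) in zip(clouds, extrinsics):
        merged.append(points @ rotation.T + translation)
    return np.vstack(merged)

depth = np.full((2, 2), 2.0)
depth[0, 0] = 0.0                        # one invalid pixel
cloud = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
identity = (np.eye(3), np.zeros(3))
shifted = (np.eye(3), np.array([1.0, 0.0, 0.0]))
merged = merge_clouds([cloud, cloud], [identity, shifted])
```

The skeleton-fitting stage would then minimise some distance between the merged cloud and the posed avatar, for example with a derivative-free optimiser, but that cost function is not specified in the abstract and is left out here.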


    With the popular usage of personal image devices and the continued increase of computing power, casual users need to handle a large number of images on computers. Image management is challenging because in addition to searching and browsing textual metadata, we also need to address two additional challenges. First, thumbnails, which are representative forms of original images, require significant screen space to be represented meaningfully. Second, while image metadata is crucial for managing images, creating metadata for images is expensive. My research on these issues is composed of three components which address these problems. First, I explore a new way of browsing a large number of images. I redesign and implement a zoomable image browser, PhotoMesa, which is capable of showing thousands of images clustered by metadata. Combined with its simple navigation strategy, the zoomable image environment allows users to scale up the size of an image collection they can comfortably browse. Second, I examine tradeoffs of displaying thumbnails in limited screen space. While bigger thumbnails use more screen space, smaller thumbnails are hard to recognize. I introduce an automatic thumbnail cropping algorithm based on a computer vision saliency model. The cropped thumbnails keep the core informative part and remove the less informative periphery. My user study shows that users performed visual searches more than 18% faster with cropped thumbnails. Finally, I explore semi-automatic annotation techniques to help users make accurate annotations with low effort. Automatic metadata extraction is typically fast but inaccurate while manual annotation is slow but accurate. I investigate techniques to combine these two approaches. My semi-automatic annotation prototype, SAPHARI, generates image clusters which facilitate efficient bulk annotation. For automatic clustering, I present hierarchical event clustering and clothing based human recognition. 
    Experimental results demonstrate the effectiveness of the semi-automatic annotation when applied to personal photo collections. Users were able to make annotations 49% and 6% faster with the semi-automatic annotation interface on event and face tasks, respectively.
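The thumbnail-cropping idea (keep the informative core, drop the periphery) can be illustrated with a very small sketch that crops to the bounding box of the most salient pixels. The saliency map is assumed to be given by some external model, and the thresholding rule and the name `salient_crop_box` are illustrative assumptions rather than the algorithm used in the thesis.

```python
import numpy as np

def salient_crop_box(saliency, keep=0.5):
    """Return (top, bottom, left, right) bounding the pixels whose saliency
    is at least `keep` times the maximum value in the map."""
    threshold = keep * saliency.max()
    rows, cols = np.nonzero(saliency >= threshold)
    return int(rows.min()), int(rows.max()) + 1, int(cols.min()), int(cols.max()) + 1

def crop_thumbnail(image, saliency, keep=0.5):
    """Crop the image to the salient bounding box before it is scaled down."""
    top, bottom, left, right = salient_crop_box(saliency, keep)
    return image[top:bottom, left:right]

saliency = np.zeros((6, 8))
saliency[2:4, 3:6] = 1.0                 # a salient block in the middle
saliency[0, 0] = 0.2                     # weak response in a corner, below threshold
image = np.arange(6 * 8).reshape(6, 8)
box = salient_crop_box(saliency)
thumb = crop_thumbnail(image, saliency)
```

Cropping before downscaling means the same number of thumbnail pixels is spent on a smaller, more informative region, which is the tradeoff the abstract describes between thumbnail size and recognisability.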

    Automatic Mobile Video Remixing and Collaborative Watching Systems

    In the thesis, the implications of combining collaboration with automation for remix creation are analyzed. We first present a sensor-enhanced Automatic Video Remixing System (AVRS), which intelligently processes mobile videos in combination with mobile device sensor information. The sensor-enhanced AVRS involves certain architectural choices, which meet the key system requirements (leverage user generated content, use sensor information, reduce end user burden) and user experience requirements. Architecture adaptations are required to improve certain key performance parameters. In addition, certain operating parameters need to be constrained for real-world deployment feasibility. Subsequently, sensorless cloud-based AVRS and low-footprint sensorless AVRS approaches are presented. The three approaches exemplify the importance of operating parameter tradeoffs for system design. The approaches cover a wide spectrum, ranging from a multimodal multi-user client-server system (sensor-enhanced AVRS) to a mobile application which can automatically generate a multi-camera remix experience from a single video. Next, we present the findings from four user studies involving 77 users related to automatic mobile video remixing. The goal was to validate selected system design goals, provide insights for additional features and identify the challenges and bottlenecks. Topics studied include the role of automation, the value of a video remix as event memorabilia, the requirements for different types of events and the perceived user value of creating a multi-camera remix from a single video. System design implications derived from the user studies are presented. Subsequently, sport summarization, which is a specific form of remix creation, is analyzed. In particular, the role of content capture method is analyzed with two complementary approaches.
    The first approach performs saliency detection in casually captured mobile videos; in contrast, the second one creates multi-camera summaries from role-based captured content. Furthermore, a method for interactive customization of the summary is presented. Next, the discussion is extended to include the role of users’ situational context and the consumed content in facilitating a collaborative watching experience. Mobile-based collaborative watching architectures are described, which facilitate a common shared context between the participants. The concept of movable multimedia is introduced to highlight the multi-device environment of current-day users. The thesis presents results which have been derived from end-to-end system prototypes tested in real-world conditions and corroborated with extensive user impact evaluation.

    Rockscapes: A Study of Forms in the Natural Formations of Hyderabad

    Rock formations in the Deccan Plateau are very old; some of them are older than 2.5 million years. Geologically, rocks consist of various mineral compositions within the core, and these decide how they are shaped by weathering over many years. These beautifully weathered landscapes are affected by the recent rapid urbanization. Thus, by photographically studying the forms and divulging their inner souls, this project attempts to sensitize the viewer towards these rockscapes. Photographs are presented in square format to highlight form and texture. As per the psychology of shapes, the square is quite balanced, and that encourages the viewer to move around within the frame. It provides a clutter-free and simple composition. In addition, the images are printed in monochrome to eliminate the visual dominance of color, to emphasize form and texture, to convey timelessness and to amplify the use of negative space. By grouping the images, the subject matter is presented to the viewer with intended emphasis: singles, sky, plants, shadow and radials.

    Adaptive Layout for Interactive Documents

    This thesis presents a novel approach to creating automated layouts for rich illustrative material that can adapt to the screen size and contextual requirements. The adaptation not only considers the global layout but also deals with the content and layout adaptation of individual illustrations in the layout. A unique solution has been developed that integrates constraint-based and force-directed techniques to create adaptive grid-based and non-grid layouts. A set of annotation layouts is developed which adapt the annotated illustrations to match the contextual requirements over time.

    City-Scaled Digital Documentation: A Comparative Analysis of Digital Documentation Technologies for Recording Architectural Heritage

    The historic preservation field, enabled by advances in technology, has demonstrated an increased interest in digitizing cultural heritage sites and historic structures. Increases in software capabilities as well as greater affordability have fostered augmented use of digital documentation technologies for architectural heritage applications. Literature establishes four prominent categories of digital documentation tools for preservation: laser scanning, photogrammetry, multimedia geographic information systems (GIS) and three-dimensional modeling. Thoroughly explored through published case studies, the documentation techniques for recording heritage are most often integrated. Scholarly literature does not provide a parallel comparison of the four technologies. A comparative analysis of the four techniques, as presented in this thesis, makes it possible for cities to understand the most applicable technique for their preservation objectives. The thesis analyzes four case studies that employ applications of the technologies: New Orleans Laser Scanning, University of Maryland Photogrammetry, Historic Columbia Maps Project and the Virtual Historic Savannah Project. Following this, the thesis undertakes a trial of each documentation technology (laser scanning, photogrammetry, multimedia GIS and three-dimensional modeling) utilizing a block on Church Street between Queen and Chalmers streets within the Charleston Historic District. The apparent outcomes of each of the four techniques are analyzed according to a series of parameters including: audience, application, efficacy in recordation, refinement, expertise required, manageability of the product, labor intensity and necessary institutional capacity. A concluding matrix quantifies the capability of each of the technologies in terms of the parameters. This method furnishes a parallel comparison of the techniques and their efficacy in architectural heritage documentation within mid-sized cities.

    Entropy in Image Analysis II

    Image analysis is a fundamental task for any application where extracting information from images is required. The analysis requires highly sophisticated numerical and analytical methods, particularly for applications in medicine, security, and other fields where the results of the processing consist of data of vital importance. This fact is evident from all the articles composing the Special Issue "Entropy in Image Analysis II", in which the authors used widely tested methods to verify their results. In reading the present volume, the reader will appreciate the richness of their methods and applications, in particular for medical imaging and image security, and a remarkable cross-fertilization among the proposed research areas.

    Computational Media Aesthetics for Media Synthesis
