36 research outputs found

    Scene Reconstruction and Visualization From Community Photo Collections


    Multimodal Video Analysis and Modeling

    From recalling long-forgotten experiences triggered by a familiar scent or a piece of music, to lip-reading-aided conversation in noisy environments or travel sickness caused by a mismatch between the signals from vision and the vestibular system, human perception offers countless examples of the subtle and effortless joint use of the multiple senses provided to us by evolution. Emulating such multisensory (or multimodal, i.e., comprising multiple types of input modes or modalities) processing computationally offers tools for accomplishing many multimedia tasks more effectively, efficiently, or robustly using evidence from the multiple input modalities. Information from the modalities can also be analyzed for patterns and connections across them, opening up interesting applications not feasible with a single modality, such as predicting some aspects of one modality based on another. In this dissertation, multimodal analysis techniques are applied to selected video tasks with accompanying modalities. More specifically, all the tasks involve some type of analysis of videos recorded by non-professional videographers using mobile devices. Fusion of information from multiple modalities is applied to recording environment classification from video and audio, as well as to sport type classification from a set of multi-device videos, the corresponding audio, and recording device motion sensor data. The environment classification combines support vector machine (SVM) classifiers trained on various global low-level visual features with audio event histogram based environment classification using k-nearest neighbors (k-NN). Rule-based fusion schemes with genetic algorithm (GA)-optimized modality weights are compared to training an SVM classifier to perform the multimodal fusion. A comprehensive selection of fusion strategies is compared for the task of classifying the sport type of a set of recordings from a common event. These include fusion prior to, simultaneously with, and after classification; various approaches to using modality quality estimates; and fusing soft confidence scores as well as crisp single-class predictions. Additionally, different strategies are examined for aggregating the decisions of single videos into a collective prediction for the set of videos recorded concurrently with multiple devices. In both tasks, multimodal analysis shows a clear advantage over separate classification of the modalities. Another part of the work investigates cross-modal pattern analysis and audio-based video editing. This study examines the feasibility of automatically timing the shot cuts of multi-camera concert recordings according to music-related cutting patterns learnt from professional concert videos. Cut timing is a crucial part of the automated creation of multi-camera mashups, where shots from multiple recording devices at a common event are alternated with the aim of mimicking a professionally produced video. In the framework, separate statistical models are formed for typical patterns of beat-quantized cuts in short segments, differences in beats between consecutive cuts, and the relative deviation of cuts from exact beat times. Based on music meter and audio change point analysis of a new recording, the models can be used to synthesize cut times. In a user study, the proposed framework clearly outperforms a baseline automatic method with comparably advanced audio analysis and wins 48.2% of comparisons against hand-edited videos.
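
    The fusion strategies above combine per-modality confidence scores using tuned modality weights. The following minimal Python sketch, which is illustrative only and not code from the thesis, shows rule-based late fusion of this kind; the class scores and weights are invented, the weights standing in for values a genetic algorithm or validation search might produce.

```python
import numpy as np

def weighted_sum_fusion(scores_per_modality, weights):
    """Fuse per-class confidence scores from several modalities and
    return the index of the winning class."""
    fused = np.zeros_like(np.asarray(scores_per_modality[0], dtype=float))
    for w, scores in zip(weights, scores_per_modality):
        fused += w * np.asarray(scores, dtype=float)
    return int(np.argmax(fused))

# Toy example: visual SVM scores and audio k-NN scores for three environments.
visual_scores = [0.2, 0.5, 0.3]   # e.g. decision values from a visual SVM
audio_scores = [0.6, 0.3, 0.1]    # e.g. votes from k-NN on audio event histograms
# The weights are placeholders for values a GA or validation search might find.
print(weighted_sum_fusion([visual_scores, audio_scores], weights=[0.4, 0.6]))
```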

    Intelligent segmentation of lecture videos

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2002. Includes bibliographical references (leaves 58-60). By Bassam H. Chaptini. S.M.

    SSIM-Inspired Quality Assessment, Compression, and Processing for Visual Communications

    Objective Image and Video Quality Assessment (I/VQA) measures predict image/video quality as perceived by human beings, the ultimate consumers of visual data. Existing research in the area is mainly limited to benchmarking and monitoring of visual data. The use of I/VQA measures in the design and optimization of image/video processing algorithms and systems is more desirable, challenging, and fruitful, but has not been well explored. Among the recently proposed objective I/VQA approaches, the structural similarity (SSIM) index and its variants have emerged as promising measures that show superior performance compared to the widely used mean squared error (MSE) and are computationally simple compared with other state-of-the-art perceptual quality measures. In addition, SSIM has a number of desirable mathematical properties for optimization tasks. The goal of this research is to break the tradition of using MSE as the optimization criterion for image and video processing algorithms. We tackle several important problems in visual communication applications by exploiting SSIM-inspired design and optimization to achieve significantly better performance. Firstly, the original SSIM is a Full-Reference IQA (FR-IQA) measure that requires access to the original reference image, making it impractical in many visual communication applications. We propose a general-purpose Reduced-Reference IQA (RR-IQA) method that can estimate SSIM with high accuracy with the help of a small number of RR features extracted from the original image. Furthermore, we introduce and demonstrate the novel idea of partially repairing an image using RR features. Secondly, image processing algorithms such as image de-noising and image super-resolution are required at various stages of visual communication systems, from image acquisition to image display at the receiver. We incorporate SSIM into the framework of sparse signal representation and non-local means methods and demonstrate improved performance in image de-noising and super-resolution. Thirdly, we incorporate SSIM into the framework of perceptual video compression. We propose an SSIM-based rate-distortion optimization scheme and an SSIM-inspired divisive optimization method that transforms the DCT-domain frame residuals to a perceptually uniform space. Both approaches demonstrate the potential to largely improve the rate-distortion performance of state-of-the-art video codecs. Finally, in real-world visual communications, it is a common experience that end-users receive video with significantly time-varying quality due to variations in video content/complexity, codec configuration, and network conditions. How human visual quality of experience (QoE) changes with such time-varying video quality is not yet well understood. We propose a quality adaptation model that is asymmetrically tuned to increasing and decreasing quality. The model improves upon the direct SSIM approach in predicting the subjective perceptual experience of time-varying video quality.
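
    For orientation, the following sketch, an illustrative simplification rather than the thesis's actual method, computes a single-window SSIM over whole grayscale images and a generic rate-distortion cost of the familiar J = D + lambda * R form with 1 - SSIM as the distortion term.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Simplified single-window SSIM between two grayscale images;
    the standard index averages this statistic over local windows."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2          # stabilizing constants (K1 = 0.01, K2 = 0.03)
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def rd_cost(ssim_value, rate_bits, lam=0.05):
    """Generic rate-distortion cost J = D + lambda * R with 1 - SSIM as distortion."""
    return (1.0 - ssim_value) + lam * rate_bits

# Toy usage: a grayscale ramp and a slightly brightened copy.
ramp = np.tile(np.arange(64, dtype=float), (64, 1))
quality = ssim_global(ramp, ramp + 4.0)
print(quality, rd_cost(quality, rate_bits=120))
```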

    Soundtrack recommendation for images

    The drastic increase in the production of multimedia content has emphasized research concerning its organization and retrieval. In this thesis, we address the problem of music retrieval when a set of images is given as the input query, i.e., the problem of soundtrack recommendation for images. The task at hand is to recommend appropriate music to be played during the presentation of a given set of query images. To tackle this problem, we formulate the hypothesis that the knowledge appropriate for the task is contained in publicly available contemporary movies. Our approach, Picasso, employs similarity search techniques within the image and music domains, harvesting movies to form a link between the domains. To achieve a fair and unbiased comparison between different soundtrack recommendation approaches, we propose an evaluation benchmark. Evaluation results are reported for Picasso and the baseline approach using the proposed benchmark. We further address two efficiency aspects that arise from the Picasso approach. First, we investigate the problem of processing top-K queries with set-defined selections and propose an index structure that aims at minimizing the query answering latency. Second, we address the problem of similarity search in high-dimensional spaces and propose two enhancements to the Locality Sensitive Hashing (LSH) scheme. We also investigate the prospects of a distributed similarity search algorithm based on LSH using the MapReduce framework. Finally, we give an overview of PicasSound, a smartphone application based on the Picasso approach.
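
    As a rough illustration of the LSH idea mentioned above, the sketch below, whose names and parameters are invented for the example, builds one hash table from random hyperplanes (sign random projections) for cosine similarity; the thesis's proposed LSH enhancements and the MapReduce variant are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_index(vectors, num_bits=16):
    """Hash every vector into a bucket keyed by the signs of num_bits
    random projections (random-hyperplane LSH for cosine similarity)."""
    planes = rng.standard_normal((num_bits, vectors.shape[1]))
    bits = vectors @ planes.T > 0                 # one sign bit per hyperplane
    table = {}
    for i, key in enumerate(bits):
        table.setdefault(key.tobytes(), []).append(i)
    return planes, table

def query(q, planes, table):
    """Return ids that share the query's bucket (candidate near neighbours)."""
    key = (planes @ q > 0).tobytes()
    return table.get(key, [])

# Toy usage: index 1000 random 128-d feature vectors, then look up a
# slightly perturbed copy of one of them.
data = rng.standard_normal((1000, 128))
planes, table = build_index(data)
print(query(data[42] + 0.01 * rng.standard_normal(128), planes, table))
```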

    Video Abstracting at a Semantical Level

    One of the most common forms of video abstract is the movie trailer. Contemporary movie trailers share a common structure across genres, which allows for automatic generation and also reflects the corresponding movie's composition. In this thesis, a system for the automatic generation of trailers is presented. In addition to action trailers, the system is able to deal with further genres such as horror and comedy trailers, which were first manually analyzed in order to identify their basic structures. To simplify the modeling of trailers and the abstract generation itself, a new video abstracting application was developed. This application is capable of performing all steps of the abstract generation automatically and allows for previews and manual optimizations. Based on this system, new abstracting models for horror and comedy trailers were created, and the corresponding trailers were automatically generated using the new models. In an evaluation, the automatically generated trailers were compared to the original trailers and showed a similar structure. However, the automatically generated trailers still do not exhibit the full perfection of the Hollywood originals, as they lack intentional storylines across shots.

    Exploratory Browsing

    In recent years, digital media has influenced many areas of our lives. The transition from analogue to digital has substantially changed our ways of dealing with media collections. Today's interfaces for managing digital media mainly offer fixed linear models corresponding to the underlying technical concepts (folders, events, albums, etc.), or metaphors borrowed from their analogue counterparts (e.g., stacks, film rolls). However, people's mental interpretations of their media collections often go beyond the scope of a linear scan. Beyond explicit search with specific goals, current interfaces cannot sufficiently support this explorative and often non-linear behavior. This dissertation presents an exploration of interface design to enhance the browsing experience with media collections. The main outcome of this thesis is a new model of Exploratory Browsing to guide the design of interfaces that support the full range of browsing activities, especially Exploratory Browsing. We define Exploratory Browsing as the behavior when the user is uncertain about her or his targets and needs to discover areas of interest (exploratory), in which she or he can explore in detail and possibly find some acceptable items (browsing). According to the browsing objectives, we group browsing activities into three categories: Search Browsing, General Purpose Browsing, and Serendipitous Browsing. In the context of this thesis, Exploratory Browsing refers to the latter two activities, which go beyond explicit search with specific objectives. We systematically explore the design space of interfaces to support the Exploratory Browsing experience. Applying the methodology of User-Centered Design, we develop eight prototypes covering two main usage contexts: browsing personal collections and browsing in online communities. The main media types studied are photographs and music. The main contribution of this thesis lies in deepening the understanding of how people's exploratory behavior impacts interface design. This thesis contributes to the field of interface design for media collections in several respects. With the goal of informing interface design to support the Exploratory Browsing experience with media collections, we present a model of Exploratory Browsing covering the full range of exploratory activities around media collections. We investigate this model in different usage contexts and develop eight prototypes. The substantial insights gathered during the development and evaluation of these prototypes inform the further refinement of our model: we uncover the underlying transitional relations between browsing activities and identify several stimulators that encourage a fluid and effective transition between activities. Based on this model, we propose a catalogue of general interface characteristics and employ this catalogue as criteria to analyze the effectiveness of our prototypes. We also present several general suggestions for designing interfaces for media collections.

    Cruiser and PhoTable: Exploring Tabletop User Interface Software for Digital Photograph Sharing and Story Capture

    Digital photography has not only changed the nature of photography and the photographic process, but also the manner in which we share photographs and tell stories about them. Some traditional methods, such as the family photo album or passing around piles of recently developed snapshots, are lost to us unless the digital photos are printed. The current, purely digital methods of sharing do not provide the same experience as printed photographs, and they do not provide the effective face-to-face social interaction around photographs that is experienced during storytelling. Research has found that people are often dissatisfied with sharing photographs in digital form. The recent emergence of the tabletop interface as a viable multi-user, direct-touch, interactive large horizontal display has provided hardware with the potential to improve our collocated activities such as digital photograph sharing. However, while some software to communicate with various tabletop hardware technologies exists, the software aspects of tabletop user interfaces are still at an early stage and require careful consideration in order to provide an effective, multi-user immersive interface that arbitrates the social interaction between users, without the necessary computer-human interaction interfering with the social dialogue. This thesis presents PhoTable, a social interface allowing people to effectively share, and tell stories about, recently taken, unsorted digital photographs around an interactive tabletop. In addition, the computer-arbitrated digital interaction allows PhoTable to capture the stories told and associate them as audio metadata with the appropriate photographs. By leveraging the tabletop interface and providing highly usable and natural interaction, we can enable users to become immersed in their social interaction, telling stories about their photographs, and allow the computer interaction to occur as a side-effect of the social interaction. Correlating the computer interaction with the corresponding audio allows PhoTable to annotate an automatically created digital photo album with audible stories, which may then be archived. These stories remain useful for future sharing -- both collocated sharing and remote (e.g. via the Internet) -- and also provide a personal memento both of the event depicted in the photograph (e.g. as a reminder) and of the enjoyable photo-sharing experience at the tabletop. To provide the necessary software to realise an interface such as PhoTable, this thesis explored the development of Cruiser: an efficient, extensible and reusable software framework for developing tabletop applications. Cruiser contributes a set of programming libraries and the necessary application framework to facilitate the rapid and highly flexible development of new tabletop applications. It uses a plugin architecture that encourages code reuse, stability and easy experimentation, and leverages the dedicated graphics hardware and multi-core processors of modern consumer-level systems to provide a responsive and immersive interactive tabletop user interface that is agnostic to the tabletop hardware and operating platform, using efficient, native cross-platform code. Cruiser's flexibility has allowed a variety of novel interactive tabletop applications to be explored by other researchers using the framework, in addition to PhoTable. To evaluate Cruiser and PhoTable, this thesis follows recommended practices for systems evaluation.
The design rationale is framed within the above scenario and vision, which we explore further, and the resulting design is critically analysed based on user studies, heuristic evaluation, and a reflection on how it evolved over time. The effectiveness of Cruiser was evaluated in terms of its ability to realise PhoTable, its use by others to explore many new tabletop applications, and an analysis of performance and resource usage. The usability, learnability, and effectiveness of PhoTable were assessed on three levels: careful usability evaluations of elements of the interface; informal observations of usability when Cruiser was available to the public in several exhibitions and demonstrations; and a final evaluation of PhoTable in use for storytelling, where this had the side effect of creating a digital photo album consisting of the photographs users interacted with on the table and the associated audio annotations that PhoTable automatically extracted from the interaction. We conclude that our approach to design has resulted in an effective framework for creating new tabletop interfaces. The parallel goal of exploring the potential of tabletop interaction as a new way to share digital photographs was realised in PhoTable, which supports the envisaged goal of an effective interface for telling stories about one's photos. As a serendipitous side-effect, PhoTable was effective in the automatic capture of stories about individual photographs for future reminiscence and sharing. This work provides foundations for future work on new ways to interact at a tabletop and on capturing personal stories around digital photographs for sharing and long-term preservation.
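
    Purely as a hypothetical illustration of what a plugin-based framework of this kind might expose (none of the names below come from Cruiser, which is a native cross-platform framework rather than Python), a minimal plugin registry could look like this:

```python
from typing import Callable, Dict

class PluginRegistry:
    """Map plugin names to factory callables so an application can be
    assembled from independently developed, reusable components."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        self._factories[name] = factory          # later registrations may override

    def create(self, name: str) -> object:
        return self._factories[name]()           # instantiate the requested plugin

# Toy usage: register and instantiate a hypothetical photo-album plugin.
registry = PluginRegistry()
registry.register("photo_album", lambda: {"kind": "photo_album", "photos": []})
print(registry.create("photo_album"))
```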
