
    Automatic mashup generation of multiple-camera videos

    The amount of user-generated video content is growing enormously with the increasing availability and affordability of technologies for video capture (e.g. camcorders, mobile phones), storage (e.g. magnetic and optical devices, online storage services), and sharing (e.g. broadband internet, social networks). It has become a common sight at social occasions such as parties, concerts, weddings, and vacations that many people shoot videos at approximately the same time. Such concurrent recordings provide multiple views of the same event. In professional video production, the use of multiple cameras is very common: to compose a video that is interesting to watch, audio and video segments from different recordings are mixed into a single video stream. For non-professional recordings, however, mixing different camera recordings is uncommon, as the process is considered very time consuming and requires editing expertise. In this thesis, we investigate how to automatically combine multiple-camera recordings into a single video stream, called a mashup. Since non-professional recordings are generally characterized by low signal quality and a lack of artistic appeal, our objective is to use mashups to enrich the viewing experience of such recordings.

    To define a target application and collect requirements for a mashup, we conducted a study involving video-editing experts and general camera users by means of interviews and focus groups. Based on the results, we decided to work in the domain of concert videos and listed the requirements for concert-video mashups, such as image quality, diversity, and synchronization. Following these requirements, we proposed a solution approach for mashup generation and introduced a formal model consisting of pre-processing, mashup-composition, and post-processing steps. This thesis describes the pre-processing and mashup-composition steps, which result in the automatic generation of a mashup satisfying a set of the elicited requirements.

    In the pre-processing step, we synchronized the multiple-camera recordings onto a common timeline. We proposed and developed synchronization methods based on detecting and matching audio and video features extracted from the recorded content, with three realizations using different features: still-camera flashes in video, audio fingerprints, and audio onsets. The realizations are independent of the frame rate of the recordings and the number of cameras, and they estimate the synchronization offset with frame-level accuracy. Based on their performance on a common dataset, the audio-fingerprint and audio-onset methods were found to be the most suitable for generating mashups of concert videos.

    In the mashup-composition step, we proposed an optimization-based solution for composing a mashup from the synchronized recordings. The solution maximizes an objective function containing a number of parameters, which represent the requirements that influence mashup quality; the function is subject to a number of constraints, which represent the requirements that must be fulfilled in a mashup. Different audio-visual feature extraction and analysis techniques were employed to measure the degree to which the requirements represented in the objective function are fulfilled. We developed an algorithm, first-fit, to compose a mashup that satisfies the constraints and maximizes the objective function.
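
    As an illustration of the audio-onset idea, the following sketch (not the thesis implementation; function names and parameters are assumptions) aligns two recordings by cross-correlating their onset-strength envelopes, computed here with librosa, and converting the best-matching lag from frames to seconds.

        # Illustrative audio-onset synchronization of two recordings of the
        # same event; assumes mono waveforms at a shared sample rate.
        import numpy as np
        import librosa

        def estimate_offset(audio_a, audio_b, sr=22050, hop=512):
            """Offset (seconds) at which recording B best aligns with A."""
            env_a = librosa.onset.onset_strength(y=audio_a, sr=sr, hop_length=hop)
            env_b = librosa.onset.onset_strength(y=audio_b, sr=sr, hop_length=hop)
            # Peak of the full cross-correlation gives the best lag in frames.
            corr = np.correlate(env_a - env_a.mean(),
                                env_b - env_b.mean(), mode="full")
            lag = np.argmax(corr) - (len(env_b) - 1)
            return lag * hop / sr  # frame-level accuracy, as in the thesis

    Cross-correlating the continuous onset-strength envelopes, rather than discrete onset times, keeps the estimate robust when some onsets are missed in one of the recordings.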
    Finally, to validate our solution approach, we compared the mashups generated by the first-fit algorithm against those generated by two other methods. In the first method, naive, a mashup was generated by satisfying only the requirements given as constraints; in the second method, manual, a mashup was created by a professional. In the objective evaluation, the first-fit mashups scored higher than both the manual and the naive mashups. To assess end-user satisfaction, we also conducted a user study in which we measured user preferences for the mashups generated by the three methods on different aspects of mashup quality. On all aspects, the naive mashup scored significantly lower, while the manual and first-fit mashups scored similarly. We conclude that the perceived quality of a mashup generated by the naive method is lower than that of the first-fit and manual mashups, while the perceived quality of the mashups generated by the first-fit and manual methods is similar.
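
    To make the first-fit composition concrete, here is a hypothetical greedy sketch: the common timeline is walked segment by segment, hard constraints filter the candidate cameras, and a weighted objective (a quality score plus a diversity bonus for switching cameras) picks the shot. The scoring and feasibility callables and the diversity weight are placeholders, not the thesis's actual features.

        # Hypothetical first-fit mashup composition over a synchronized timeline.
        def first_fit_mashup(segments, cameras, score, feasible, w_div=0.3):
            timeline, prev = [], None
            for seg in segments:
                # Hard constraints: keep only cameras acceptable for this segment.
                candidates = [c for c in cameras if feasible(c, seg)]
                if not candidates:
                    candidates = list(cameras)  # fall back rather than leave a gap

                def objective(cam):
                    s = score(cam, seg)   # per-segment quality term
                    if cam != prev:
                        s += w_div        # diversity: reward camera changes
                    return s

                prev = max(candidates, key=objective)
                timeline.append((seg, prev))
            return timeline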

    Grand Challenges in Music Information Research

    This paper discusses some grand challenges through which music information research will impact our daily lives and our society in the future. Fundamental questions include how to provide the best music for each person, how to predict music trends, how to enrich human-music relationships, how to evolve new music, and how to address environmental and energy issues by using music technologies. Our goal is to increase both the attractiveness and the social impact of music information research in the future through such discussions and developments.

    Multimodal Video Analysis and Modeling

    From recalling long-forgotten experiences triggered by a familiar scent or a piece of music, to lip-reading-aided conversation in noisy environments, to travel sickness caused by a mismatch between the signals from vision and the vestibular system, human perception offers countless examples of the subtle and effortless joint use of the multiple senses provided to us by evolution. Emulating such multisensory (or multimodal, i.e., comprising multiple types of input modes or modalities) processing computationally offers tools for accomplishing many multimedia tasks more effectively, efficiently, or robustly by using evidence from multiple input modalities. Information from the modalities can also be analyzed for patterns and connections across them, opening up interesting applications not feasible with a single modality, such as predicting some aspects of one modality from another. In this dissertation, multimodal analysis techniques are applied to selected video tasks with accompanying modalities. More specifically, all the tasks involve some type of analysis of videos recorded by non-professional videographers using mobile devices.

    Fusion of information from multiple modalities is applied to recording-environment classification from video and audio, as well as to sport-type classification from a set of multi-device videos, the corresponding audio, and recording-device motion sensor data. The environment classification combines support vector machine (SVM) classifiers trained on various global visual low-level features with audio-event-histogram-based environment classification using k-nearest neighbors (k-NN). Rule-based fusion schemes with genetic algorithm (GA)-optimized modality weights are compared to training an SVM classifier to perform the multimodal fusion. A comprehensive selection of fusion strategies is compared for the task of classifying the sport type of a set of recordings from a common event. These include fusion prior to, simultaneously with, and after classification; various approaches to using modality quality estimates; and fusing soft confidence scores as well as crisp single-class predictions. Additionally, different strategies are examined for aggregating the decisions on single videos into a collective prediction for the set of videos recorded concurrently with multiple devices. In both tasks, multimodal analysis shows a clear advantage over separate classification of the modalities.

    Another part of the work investigates cross-modal pattern analysis and audio-based video editing. This study examines the feasibility of automatically timing the shot cuts of multi-camera concert recordings according to music-related cutting patterns learnt from professional concert videos. Cut timing is a crucial part of the automated creation of multi-camera mashups, where shots from multiple recording devices at a common event are alternated with the aim of mimicking a professionally produced video. In the framework, separate statistical models are formed for typical patterns of beat-quantized cuts in short segments, differences in beats between consecutive cuts, and the relative deviation of cuts from exact beat times. Based on music-meter and audio change-point analysis of a new recording, the models can be used to synthesize cut times. In a user study, the proposed framework clearly outperforms a baseline automatic method with comparably advanced audio analysis and wins 48.2% of comparisons against hand-edited videos.
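
    As a toy illustration of the score-level fusion described above, the sketch below combines per-class confidence scores from a visual and an audio classifier with fixed modality weights. In the dissertation such weights are optimized with a genetic algorithm on held-out data and the scores come from SVM and k-NN models; the weights and score vectors here are placeholders.

        # Weighted score-level (late) fusion of two modality classifiers.
        # The weights are illustrative stand-ins for GA-optimized modality weights.
        import numpy as np

        def fuse_scores(visual_scores, audio_scores, w_visual=0.6, w_audio=0.4):
            """Return the winning class index from weighted per-class scores."""
            fused = (w_visual * np.asarray(visual_scores)
                     + w_audio * np.asarray(audio_scores))
            return int(np.argmax(fused))

        # Visual evidence mildly favors class 1; audio strongly favors class 2.
        print(fuse_scores([0.2, 0.5, 0.3], [0.1, 0.2, 0.7]))  # -> 2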

    Mashing through the Conventions: Convergence of Popular and Classical Music in the Works of The Piano Guys

    This dissertation examines the symbiosis between popular music and Western classical music in classical/popular mashups, a new style within the classical crossover genre. The research features the works of The Piano Guys, a contemporary ensemble that combines classical crossover characteristics with techniques from modern sample-based styles to reconceptualize and reuse classical and popular works. This fusion demonstrates a new approach to presenting multi-genre works, forming a separate musical and cultural niche for this creative practice. The dissertation consists of three chapters. The first chapter is divided into two thematic discourses: genre and authorship. The research draws on Eric Drott’s (2013) position that contemporary genre definition is a heterogeneous product of technological and cultural shifts in the creation, production, and presentation of music. Following Thomas Johnson’s (2018) research on genre in post-millennial popular music, the first part of the chapter traces the chronological development of genre categorizations and attempts to place classical/popular mashups as a separate style within the contemporary genre framework. The second part investigates the transformations and current state of authorship attribution in popular music and illustrates how group creativity and consumer participation prompt multiple authorial distributions in classical/popular mashups. Applying topic theory, as established by Robert Hatten (1985) and Kofi Agawu (1992), and the concepts of intertextuality developed by Serge Lacasse (2000, 2018) to the works of The Piano Guys and other works in the same style, the second chapter presents a comparative analysis revealing a multi-layered structure of signification different from the intertextual and topical relationships found in works of other styles. In the third chapter, a detailed exploration of three works by The Piano Guys places these methodological theories in dialogue with formal analysis to draw out a series of quantifiable technical, musical, and interpretive characteristics that differentiate “classically originated” mashups from similar practices in other genres.

    Adapting Copyright for the Mashup Generation


    Copyright’s Twilight Zone: Digital Copyright Lessons from the Vampire Blogosphere

    Web 2.0 technologies, characterized by user-generated content, raise new challenges for copyright law. Online interactions involving reproductions of copyrighted works in blogs, online fan fiction, and online social networks do not comfortably fit existing copyright paradigms. It is unclear whether participants in Web 2.0 forums are creating derivative works, making legitimate fair uses of copyrighted works, or engaging in acts of digital copyright piracy and plagiarism. As online conduct becomes more interactive, copyright laws are less effective at creating clear signals about proscribed conduct. This article examines the application of copyright law to Web 2.0 technologies. It suggests that social norms must take on greater significance because of the community-oriented nature of much of today’s online conduct. Social norms are significant both as a form of social regulation and because they can guide law- and policy-makers toward appropriate new directions for copyright law reform. The article focuses on four case studies involving the popular Twilight book and movie franchise. These case studies illuminate the relationship between copyright norms and laws in the Web 2.0 context. The author draws lessons from the case studies that might inform future developments in copyright law and policy and better align laws with the expectations of Web 2.0 participants. Twilight is chosen as the focal point because of the complex online relationships that have developed in recent years between the various copyright stakeholders: the book author; the movie directors; the producers and distributors of the books and movies; the actors and production crews; and the fans.

    A Pedagogy for Original Synners

    Part of the volume Digital Youth, Innovation, and the Unexpected. This essay begins by speculating about the learning environment of the class of 2020: it takes place entirely in a virtual world, is populated by simulated avatars, and is managed through the pedagogy of gaming. Based on this projected version of a future now in formation, the authors consider the implications of the paradigm shift currently happening at the edges of institutions of higher education. From the development of programs in multimedia literacy to the focus on the creation of hybrid learning spaces (which combine virtual worlds, social networking applications, and classroom activities), the scene of learning as well as the subjects of education are changing. The figure of the Original Synner is a projection of the student of the future, whose foundational literacy is grounded in the ability to synthesize information from multiple information streams.

    The BG News December 12, 2014

    The BGSU campus student newspaper, December 12, 2014. Volume 94, Issue 47.