
    Balancing automation and user control in a home video editing system

    The context of this PhD project is the area of multimedia content management, in particular interaction with home videos. Nowadays, more and more home videos are produced, shared and edited. Home videos are captured by amateur users, mainly to document their lives. People frequently edit home videos to select and keep the best parts of their visual memories and to add a touch of personal creativity. However, most users find current video editing products time-consuming and sometimes too technical and difficult. One reason for the large amount of time required for editing is the slow accessibility caused by the temporal dimension of video: a video needs to be played back in order to be watched or edited. Another limitation of current video editing tools is that they are modelled too closely on professional video editing systems, including technical details such as frame-by-frame browsing. This thesis aims at making home video editing more efficient and easier for the non-technical, amateur user. To accomplish this goal, we followed two main guidelines: we designed a semi-automatic tool, and we adopted a user-centered approach. To gain insight into user behaviours and needs related to home video editing, we designed an Internet-based survey, which was answered by 180 home video users. The results of the survey revealed that video editing is done frequently and is seen as a very time-consuming activity. We also found that users with little PC experience often consider video editing programs too complex. Although nearly all commercial editing tools are designed for a PC, many of our respondents said they were interested in doing video editing on a TV. We created a novel concept, Edit While Watching, designed to be user-friendly: it requires only a TV set and a remote control, instead of a PC. The video that the user feeds into the system is automatically analyzed and structured into small video segments. The editing operations work on these video segments, so the user no longer has to deal with individual video frames. After the input video has been analyzed and structured, a first edited version is prepared automatically. Subsequently, Edit While Watching allows the user to modify and enrich the automatically edited video while watching it. When the user is satisfied, the video can be saved to a DVD or to another storage medium. We performed two iterations of system implementation and user testing to refine our concept. After the first iteration, we discovered that two requirements were insufficiently addressed: having an overview of the video and precisely controlling which video content to keep or discard. The second version of Edit While Watching was designed to address these points. It allows the user to visualize the video at three levels of detail: the chapters (or scenes) of the video, the shots inside one chapter, and the timeline representation of a single shot. The second version also allows the user to edit the video at different levels of automation. For example, the user can choose an event in the video (e.g. a child playing with a toy) and simply ask the system to automatically include more content related to it. Alternatively, if the user wants more control, he or she can precisely select which content to add to the video. We evaluated the second version of our tool by inviting nine users to edit their own home videos with it.
The users judged Edit While Watching to be an easy-to-use and fast application. However, some of them missed the possibility of enriching the video with transitions, music, text and pictures. Our test showed that the requirements of providing an overview of the video and control over the selection of the edited material were addressed better than in the first version. Moreover, the participants were able to select which video portions to keep or discard in a time close to the playback time of the video. The second version of Edit While Watching exploits different levels of automation. In some editing functions the user only gives an indication about editing a clip, and the system automatically decides the start and end points of the part of the video to be cut. However, there are also editing functions in which the user has complete control over the start and end points of a cut. We wanted to investigate how to balance automation and user control to optimize the perceived ease of use, the perceived control, the objective editing efficiency and the mental effort. To this aim, we implemented three types of editing functions, each representing a different balance between automation and user control. To compare these three levels, we invited 25 users to perform pre-defined tasks with the three function types. The results showed that the function type with the highest level of automation performed worse than the other two types, according to both subjective and objective measurements. The other two types of functions were equally liked; however, some users clearly preferred the functions that allowed faster editing, while others preferred the functions that gave full control and a more complete overview. In conclusion, on the basis of this research some design guidelines can be offered for building an easy and efficient video editing application. Such an application should automatically structure the video, hide the detail of single frames, support a scalable video overview, implement a rich set of editing functionalities, and should preferably be TV-based.
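
    The chapter/shot/segment hierarchy and the segment-level editing functions described in this abstract can be illustrated with a small data-model sketch. This is not the thesis' implementation; the class and function names below are hypothetical, and scene/shot detection is assumed to be provided by some other component.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Segment:
    """Smallest editable unit; the user never deals with single frames."""
    start_s: float          # segment start time in seconds
    end_s: float            # segment end time in seconds
    keep: bool = True       # included in the edited video?

@dataclass
class Shot:
    segments: List[Segment] = field(default_factory=list)

@dataclass
class Chapter:
    """A scene of the home video, shown at the coarsest overview level."""
    shots: List[Shot] = field(default_factory=list)

def toggle_segment(chapter: Chapter, shot_idx: int, seg_idx: int) -> None:
    """Fully manual function: the user picks the exact segment to keep or discard."""
    seg = chapter.shots[shot_idx].segments[seg_idx]
    seg.keep = not seg.keep

def include_more_like(chapters: List[Chapter],
                      is_similar: Callable[[Segment], bool]) -> None:
    """Highly automated function: the user only indicates an event of interest;
    the system decides which discarded segments to bring back."""
    for chapter in chapters:
        for shot in chapter.shots:
            for seg in shot.segments:
                if not seg.keep and is_similar(seg):
                    seg.keep = True
```

    A remote-control UI could then map the three overview levels (chapters, shots within a chapter, timeline of a single shot) directly onto this hierarchy.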

    Automatic Mobile Video Remixing and Collaborative Watching Systems

    In this thesis, the implications of combining collaboration with automation for remix creation are analyzed. We first present a sensor-enhanced Automatic Video Remixing System (AVRS), which intelligently processes mobile videos in combination with mobile device sensor information. The sensor-enhanced AVRS involves certain architectural choices that meet the key system requirements (leverage user-generated content, use sensor information, reduce end-user burden) as well as the user experience requirements. Architecture adaptations are required to improve certain key performance parameters, and certain operating parameters need to be constrained for real-world deployment feasibility. Subsequently, a sensor-less cloud-based AVRS and a low-footprint sensor-less AVRS approach are presented. The three approaches exemplify the importance of operating-parameter trade-offs for system design, and they cover a wide spectrum, ranging from a multimodal, multi-user client-server system (sensor-enhanced AVRS) to a mobile application that can automatically generate a multi-camera remix experience from a single video. Next, we present the findings from four user studies, involving 77 users, related to automatic mobile video remixing. The goal was to validate selected system design goals, provide insights for additional features, and identify the challenges and bottlenecks. Topics studied include the role of automation, the value of a video remix as event memorabilia, the requirements for different types of events, and the perceived user value of creating a multi-camera remix from a single video. System design implications derived from the user studies are presented. Subsequently, sport summarization, a specific form of remix creation, is analyzed. In particular, the role of the content capture method is analyzed with two complementary approaches: the first performs saliency detection in casually captured mobile videos, whereas the second creates multi-camera summaries from role-based captured content. Furthermore, a method for interactive customization of a summary is presented. Next, the discussion is extended to include the role of users' situational context and the consumed content in facilitating a collaborative watching experience. Mobile-based collaborative watching architectures are described, which facilitate a common shared context between the participants. The concept of movable multimedia is introduced to highlight the multi-device environment of present-day users. The thesis presents results derived from end-to-end system prototypes tested in real-world conditions and corroborated with extensive user impact evaluation.
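
    As a rough illustration of how device sensor information might feed a remix decision, the sketch below scores candidate segments by gyroscope variance (a crude proxy for camera shake) and drops the worst ones before composition. This is an assumption-based example only; it does not reproduce the AVRS algorithms, and the threshold and function names are hypothetical.

```python
import statistics
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    camera_id: str
    start_s: float
    end_s: float
    gyro_samples: List[float]   # angular-rate magnitudes recorded during the segment

def shake_score(seg: Segment) -> float:
    """Variance of the angular rate, used here as a rough proxy for camera shake."""
    return statistics.pvariance(seg.gyro_samples) if len(seg.gyro_samples) > 1 else 0.0

def filter_shaky_segments(segments: List[Segment], max_shake: float = 0.5) -> List[Segment]:
    """Drop segments whose sensor trace suggests heavy shaking before remix composition."""
    return [seg for seg in segments if shake_score(seg) <= max_shake]
```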

    iGeneration: The Social Cognitive Effects of Digital Technology on Teenagers

    In today’s world, digital technology changes so rapidly and integrates into our society at such an accelerated rate that it is hard to keep up with it, let alone reflect on the effects it has on our lives. Although Facebook, YouTube, and Twitter did not exist a mere decade ago, they are now ubiquitous forms of media and communication in our culture. Today’s generation of teenagers, born in the 1990s and aptly labeled the “iGeneration”, is the most connected generation ever. These iGen teens are digital natives growing up in an era of a massive influx of technology. They do not know of a world without the Internet and easy access to technology. Parents of iGen youth, however, are “digital immigrants”. As immigrants, they struggle with a learning curve and lack the innate knowledge of, and ease with, digital technology that their native offspring possess. There are few historical data or longitudinal studies of the social cognitive effects of digital media consumption to help inform and guide digital immigrants and natives alike in making choices about digital practices. Statistics change so quickly that it is an ongoing challenge to understand how to structure or regulate digital consumption. The intention of this research is to better understand how digital consumption affects teenagers’ cognitive abilities and socialization processes, with the goal of discovering best practices and guidelines for educators and parents to implement with regard to their teenagers’ digital consumption, as we spin faster and faster into this digital era.

    An Outlook into the Future of Egocentric Vision

    What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated into our everyday lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through examples the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate exploration so as to unlock our path to the future of always-on, personalised and life-enhancing egocentric vision. Comment: We invite comments, suggestions and corrections here: https://openreview.net/forum?id=V3974SUk1

    Automatic mashup generation of multiple-camera videos

    The amount of user-generated video content is growing enormously with the increasing availability and affordability of technologies for video capturing (e.g. camcorders, mobile phones), storing (e.g. magnetic and optical devices, online storage services), and sharing (e.g. broadband internet, social networks). It has become a common sight at social occasions like parties, concerts, weddings and vacations that many people shoot videos at approximately the same time. Such concurrent recordings provide multiple views of the same event. In professional video production, the use of multiple cameras is very common: in order to compose an interesting video to watch, audio and video segments from different recordings are mixed into a single video stream. In non-professional recordings, however, mixing different camera recordings is not common, as the process is considered very time-consuming and requires expertise. In this thesis, we research how to automatically combine multiple-camera recordings into a single video stream, called a mashup. Since non-professional recordings are, in general, characterized by low signal quality and a lack of artistic appeal, our objective is to use mashups to enrich the viewing experience of such recordings. In order to define a target application and collect requirements for a mashup, we conducted a study involving experts on video editing and general camera users by means of interviews and focus groups. Based on the study results, we decided to work in the domain of concert videos. We listed the requirements for concert video mashups, such as image quality, diversity, and synchronization. According to the requirements, we proposed a solution approach for mashup generation and introduced a formal model consisting of pre-processing, mashup-composition and post-processing steps. This thesis describes the pre-processing and mashup-composition steps, which result in the automatic generation of a mashup satisfying a set of the elicited requirements. In the pre-processing step, we synchronized the multiple-camera recordings so that they are represented on a common timeline. We proposed and developed synchronization methods based on detecting and matching audio and video features extracted from the recorded content. We developed three realizations of the approach using different features: still-camera flashes in video, audio fingerprints and audio onsets. The realizations are independent of the frame rate of the recordings and the number of cameras, and they provide synchronization offsets accurate at the frame level. Based on their performance on a common dataset, the audio-fingerprint and audio-onset methods were found to be the most suitable for generating mashups of concert videos. In the mashup-composition step, we proposed an optimization-based solution to compose a mashup from the synchronized recordings. The solution is based on maximizing an objective function containing a number of parameters, which represent the requirements that influence mashup quality. The function is subject to a number of constraints, which represent the requirements that must be fulfilled in a mashup. Different audio-visual feature extraction and analysis techniques were employed to measure the degree of fulfillment of the requirements represented in the objective function. We developed an algorithm, first-fit, to compose a mashup satisfying the constraints and maximizing the objective function.
Finally, to validate our solution approach, we compared the mashups generated by the first-fit algorithm with those generated by two other methods. In the first method, naive, a mashup was generated by satisfying only the requirements given as constraints; in the second method, manual, a mashup was created by a professional. In the objective evaluation, the first-fit mashups scored higher than both the manual and naive mashups. To assess end-user satisfaction, we also conducted a user study in which we measured user preferences for the mashups generated by the three methods on different aspects of mashup quality. In all aspects, the naive mashup scored significantly lower, while the manual and first-fit mashups scored similarly. We can conclude that the perceived quality of a mashup generated by the naive method is lower than that of the first-fit and manual mashups, while the perceived quality of the mashups generated by the first-fit and manual methods is similar.
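
    A minimal sketch of the mashup-composition idea described above follows: for each synchronized time slot, pick a camera segment that satisfies hard constraints (here, a maximum run length on one camera) and scores best on a weighted objective combining requirement scores such as image quality and diversity. The weights, feature names and the greedy per-slot loop are illustrative assumptions, not the first-fit algorithm from the thesis.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    camera_id: str
    quality: float      # e.g. sharpness/exposure score in [0, 1]

def objective(prev_camera: str, cand: Candidate,
              w_quality: float = 0.6, w_diversity: float = 0.4) -> float:
    """Weighted sum of requirement scores; switching cameras rewards diversity."""
    diversity = 1.0 if cand.camera_id != prev_camera else 0.0
    return w_quality * cand.quality + w_diversity * diversity

def compose_mashup(slots: List[List[Candidate]], max_same_camera: int = 3) -> List[str]:
    """Greedy pass: per time slot, pick the feasible candidate with the highest
    objective value, subject to a maximum run length on any single camera."""
    timeline: List[str] = []
    prev_camera, run_length = "", 0
    for candidates in slots:
        best, best_score = None, float("-inf")
        for cand in candidates:
            # Constraint: do not stay on the same camera for too many consecutive slots.
            if cand.camera_id == prev_camera and run_length >= max_same_camera:
                continue
            score = objective(prev_camera, cand)
            if score > best_score:
                best, best_score = cand, score
        if best is None:  # every candidate violates the constraint; relax it for this slot
            best = max(candidates, key=lambda c: objective(prev_camera, c))
        run_length = run_length + 1 if best.camera_id == prev_camera else 1
        prev_camera = best.camera_id
        timeline.append(best.camera_id)
    return timeline
```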

    Audeosynth: music-driven video montage

    We introduce music-driven video montage, a media format that offers a pleasant way to browse or summarize video clips collected from various occasions, including gatherings and adventures. In music-driven video montage, the music drives the composition of the video content. According to musical movement and beats, video clips are organized to form a montage that visually reflects the experiential properties of the music. Nonetheless, it takes enormous manual work and artistic expertise to create one. In this paper, we develop a framework for automatically generating music-driven video montages. The input is a set of video clips and a piece of background music. By analyzing the music and video content, our system extracts carefully designed temporal features from the input, casts the synthesis problem as an optimization, and solves for the parameters through Markov chain Monte Carlo sampling. The output is a video montage whose visual activities are cut and synchronized with the rhythm of the music, rendering a symphony of audio-visual resonance.
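
    The sampling-based optimization step can be illustrated with a toy Metropolis-style sampler that searches over assignments of clips to music segments, scoring how well each clip's visual activity matches the segment's beat strength. The scoring function, proposal move and acceptance rule below are simplified assumptions for illustration; they are not the paper's actual features or energy terms.

```python
import math
import random
from typing import List

def match_score(clip_activity: List[float], beat_strength: List[float],
                assignment: List[int]) -> float:
    """Total agreement between each music segment's beat strength and the
    visual activity of the clip assigned to it (higher is better)."""
    return sum(-abs(beat_strength[seg] - clip_activity[clip])
               for seg, clip in enumerate(assignment))

def mcmc_montage(clip_activity: List[float], beat_strength: List[float],
                 iterations: int = 5000, temperature: float = 0.1,
                 seed: int = 0) -> List[int]:
    """Metropolis sampling over clip-to-segment assignments."""
    rng = random.Random(seed)
    n_segments, n_clips = len(beat_strength), len(clip_activity)
    current = [rng.randrange(n_clips) for _ in range(n_segments)]
    current_score = match_score(clip_activity, beat_strength, current)
    for _ in range(iterations):
        proposal = current[:]
        proposal[rng.randrange(n_segments)] = rng.randrange(n_clips)  # local move
        proposal_score = match_score(clip_activity, beat_strength, proposal)
        # Always accept better proposals; accept worse ones with Boltzmann probability.
        if (proposal_score >= current_score or
                rng.random() < math.exp((proposal_score - current_score) / temperature)):
            current, current_score = proposal, proposal_score
    return current
```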

    Collaborative video editing

    This thesis addresses the following question: how can collaboration be supported in video editing? In many domains, such as writing and design, collaborative tools have become common and widespread. However, video-editing software is still predominantly designed for solo users. Nevertheless, video editing is a social activity that, in a professional setting, often involves various people working together. Based on interviews and design workshops, this thesis investigates the collaborative practices of video editors and explores the design space of collaborative video editing. In three studies, this thesis looks at video editing from three angles. First, it investigates the collaborative practices of video editors and identifies the strategies and social mechanisms they employ to reach agreements with the various parties involved in the video-production process. The first study identifies nine themes that characterise the ways video editors manage uncertainties and reach agreements, particularly through organisational mechanisms, documentation, and iconic referencing. The study also suggests three design paths to explore further.
Second, it examines video production from an organisational point of view, focusing on the recent shift towards remote work and its impact on video production. The second study delineates the short-term and long-term implications of adopting remote work in TV production organisations during the COVID-19 pandemic. Third, it approaches collaborative video editing as a design problem and offers design ideas to enhance collaboration. Additionally, it uncovers challenges that might impede the adoption of new collaborative video-editing tools. In synthesising the results of the three studies, as well as analysing previous research and existing video-editing tools, this thesis identifies three design approaches for supporting collaboration in video-editing software: holistic, tailored, and configurable. While discussed in the context of collaborative video editing, these approaches offer a broader analytical framework for considering the design of collaborative production tools.