1,006 research outputs found

    Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production

    Amateurs working on mini-films and short-form videos usually spend considerable time and effort on the complicated, multi-round process of setting and adjusting scenes, plots, and cameras to deliver satisfying shots. We present Virtual Dynamic Storyboard (VDS), which lets users storyboard shots in virtual environments, where filming staff can easily test shot settings before the actual filming. VDS runs in a "propose-simulate-discriminate" mode: given a formatted story script and a camera script as input, it generates several character-animation and camera-movement proposals following predefined story and cinematic rules, which an off-the-shelf simulation engine renders as videos. To select the top-quality dynamic storyboard from the candidates, we equip VDS with a shot-ranking discriminator based on shot-quality criteria learned from professionally created data. VDS is comprehensively validated via extensive experiments and user studies, demonstrating its efficiency, effectiveness, and great potential for assisting amateur video production. Project page: https://virtualfilmstudio.github.io
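    The "propose-simulate-discriminate" mode maps naturally onto a generate-render-rank pipeline. The following is a minimal Python sketch of that control flow, not the actual VDS code: the proposal vocabulary, the renderer, and the scoring function are all invented placeholders.

```python
import random
from dataclasses import dataclass

@dataclass
class Proposal:
    character_motion: str   # e.g. "walk_to_door"
    camera_move: str        # e.g. "dolly_in"

def propose(n: int) -> list[Proposal]:
    """Sample n candidate (animation, camera) pairs under predefined rules."""
    motions = ["walk_to_door", "sit_down", "turn_head"]
    cameras = ["dolly_in", "pan_left", "static_close_up"]
    return [Proposal(random.choice(motions), random.choice(cameras))
            for _ in range(n)]

def simulate(p: Proposal) -> str:
    """Render the proposal with an off-the-shelf engine; returns a video path."""
    return f"render_{p.character_motion}_{p.camera_move}.mp4"

def discriminate(video: str) -> float:
    """Score shot quality; in VDS this is learned from professional data."""
    return random.random()  # stand-in for the learned ranking model

candidates = propose(n=8)
best = max(candidates, key=lambda p: discriminate(simulate(p)))
print("selected storyboard shot:", best)
```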

    LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing

    Video creation has become increasingly popular, yet the expertise and effort required for editing often pose barriers to beginners. In this paper, we explore the integration of large language models (LLMs) into the video editing workflow to reduce these barriers. Our design vision is embodied in LAVE, a novel system that provides LLM-powered agent assistance and language-augmented editing features. LAVE automatically generates language descriptions for the user's footage, serving as the foundation that enables the LLM to process videos and assist in editing tasks. When the user provides editing objectives, the agent plans and executes relevant actions to fulfill them. Moreover, LAVE allows users to edit videos through either the agent or direct UI manipulation, providing flexibility and enabling manual refinement of agent actions. Our user study, which included eight participants ranging from novices to proficient editors, demonstrated LAVE's effectiveness. The results also shed light on user perceptions of the proposed LLM-assisted editing paradigm and its impact on users' creativity and sense of co-creation. Based on these findings, we propose design implications to inform the future development of agent-assisted content editing. Paper accepted to the ACM Conference on Intelligent User Interfaces (ACM IUI) 2024.
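    The core idea of grounding an LLM in footage via language descriptions can be sketched in a few lines. This is an illustrative Python sketch, not LAVE's implementation; `call_llm`, the clip catalog, and the action vocabulary are hypothetical.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("plug in an LLM provider here")

# Auto-generated clip descriptions act as the LLM's "eyes" on the video.
footage = {
    "clip_01.mp4": "a dog running on the beach at sunset",
    "clip_02.mp4": "close-up of waves hitting the rocks",
}

def plan_edit(objective: str) -> str:
    catalog = "\n".join(f"{name}: {desc}" for name, desc in footage.items())
    prompt = (
        f"Footage:\n{catalog}\n\n"
        f"Editing objective: {objective}\n"
        "Respond with an ordered list of actions such as "
        "TRIM(clip, start, end) and SEQUENCE(clip, position)."
    )
    return call_llm(prompt)

# Usage: the returned plan would be parsed and executed against the
# timeline, or surfaced in the UI for manual refinement by the user.
# plan = plan_edit("make a calm 15-second beach montage")
```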

    Visual Novel ‘Dustbin Dreaming’: A Bully Story

    People at the stage of adolescence may reject help and advice from their peers or elders in an effort to grow independent. In this era of technology, most teenagers come into frequent contact with electronic media such as smartphones and computers, and some may even see them as their only pillar of support. This project aims to support and advise school-going teenagers, aged thirteen to seventeen, on at least one issue that many of them will meet in their school life: bullying. The project also hopes to rekindle reading as an attractive hobby. To achieve these objectives, the project will create a visual novel titled ‘Dustbin Dreaming’. A visual novel is a work of interactive fiction that makes clever use of static graphics such as anime-style art, live-action stills, and sometimes video footage. The game will be built on the Ren’Py engine, and supporting tools such as Adobe Photoshop will be used to prepare the image and sound resources. The project is expected to be completed within six months, by the end of November 2012.
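    At its core, a visual novel is a branching graph of scenes, which engines like Ren’Py express in a dedicated scripting language. The toy sketch below shows the same structure in plain Python (not Ren’Py script); the scene names and text are invented for illustration.

```python
# Each scene has display text and choices that point at the next scene.
scenes = {
    "start": {
        "text": "You find your sketchbook in the dustbin again.",
        "choices": {"Confront the bullies": "confront",
                    "Tell a teacher": "teacher"},
    },
    "confront": {"text": "You stand your ground. THE END.", "choices": {}},
    "teacher": {"text": "The teacher promises to help. THE END.", "choices": {}},
}

node = "start"
while True:
    scene = scenes[node]
    print(scene["text"])
    if not scene["choices"]:
        break
    options = list(scene["choices"].items())
    for i, (label, _) in enumerate(options, 1):
        print(f"  {i}. {label}")
    node = options[int(input("> ")) - 1][1]
```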

    Engram, an application for recording and looking back on memories with your parents

    Adults can recall memories from the age of three. Our brains sort important things into long-term memory and more trivial things into short-term memory, and we pull up these pieces of memory when we need them. We can remember a lot, but we also keep forgetting details. Therefore, humans have always had the habit of recording. Today we record our lives by posting on social media, taking photos, and so on. Most modern methods record our lives actively: we need to do something to record a piece of memory. So we mostly record only the unique things and easily ignore the tiny things, like a common dinner or a usual TV night. But those little things are actually big parts of our life experience, especially when they relate to memories of our parents. As we grow up, we no longer have enough time to spend with our parents, and our interactions with them are often routine and go unrecorded: we might just sit in a room with them and talk. Such moments, and many other routine meetings, usually do not make it into our life recordings. The result is that our recordings do not include much about our parents, who are some of the most important people to us. Engram is an app designed to solve this problem of having records only of unique situations but not of tiny moments. With user permission, Engram can mark details about every second of an interaction, such as the weather at that time or the distance between users and their parents, without any proactive operation. When users go through their past recordings, these small memory hints recorded by the software help them recall memories.
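    The distinguishing feature is that capture is passive: context is sampled on a timer rather than triggered by user action. Below is a hypothetical Python sketch of such a "memory hint" record; the sensor functions are placeholders for platform location and weather APIs, not Engram's actual code.

```python
import time
from dataclasses import dataclass

@dataclass
class MemoryHint:
    timestamp: float
    weather: str          # e.g. "light rain, 14°C"
    distance_m: float     # distance between the user and a parent

def read_weather() -> str:
    return "light rain, 14°C"   # placeholder for a weather API call

def read_distance_to_parent() -> float:
    return 2.5                   # placeholder for paired-device location

def capture_hint() -> MemoryHint:
    # Called periodically with user permission; no user action required.
    return MemoryHint(time.time(), read_weather(), read_distance_to_parent())

journal = [capture_hint()]
print(journal[0])
```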

    Abstract visualization of large-scale time-varying data

    The explosion of large-scale time-varying datasets has created critical challenges for scientists to study and digest. One core problem for visualization is to develop effective approaches for studying data features and temporal relationships in large-scale time-varying datasets. In this dissertation, we first present two abstract visualization approaches for visualizing and analyzing time-varying datasets. The first approach visualizes time-varying datasets with succinct lines that represent temporal relationships within the data: a time line visualizes time steps as points and the temporal sequence as a line, and is generated by sampling the distributions of virtual words across time to study temporal features. The key idea of the time line is to encode various data properties with virtual words; we apply virtual words to characterize feature points and use their distribution statistics to measure temporal relationships. The second approach is ensemble visualization, which provides a highly abstract platform for visualizing an ensemble of datasets. Both approaches can be used for exploration, analysis, and demonstration purposes. The second component of this dissertation is an animated visualization approach for studying dramatic temporal changes. Animation has been widely used to show trends, dynamic features, and transitions in scientific simulations, whereas animated visualization is new. We present an automatic animation generation approach that simulates the composition and transition of storytelling techniques and synthesizes animations to describe various event features. We also extend the concept of animated visualization to a non-traditional kind of time-varying data, network protocols, for visualizing key information in abstract sequences. We have evaluated the effectiveness of our animated visualization with a formal user study and demonstrated the advantages of animated visualization for studying time-varying datasets.
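    One way to read the "time line" idea: summarize each time step as a histogram over virtual words, then lay the steps out so that distances between points reflect how much the word distributions change. The Python sketch below, on synthetic data, uses classical multidimensional scaling for the layout; it illustrates the general idea rather than the dissertation's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)
T, V = 20, 32                              # 20 time steps, 32 virtual words
hist = rng.random((T, V))
hist /= hist.sum(axis=1, keepdims=True)    # per-step word distribution

# Pairwise distances between time-step distributions.
D = np.linalg.norm(hist[:, None, :] - hist[None, :, :], axis=-1)

# Classical MDS embeds the time steps in 2D; connecting consecutive
# points then draws the "time line".
J = np.eye(T) - np.ones((T, T)) / T
B = -0.5 * J @ (D ** 2) @ J
vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
coords = vecs[:, -2:] * np.sqrt(np.maximum(vals[-2:], 0))
print(coords.shape)                        # (20, 2): one point per time step
```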

    Differentiator factors in the implementation of social network sites

    Internship carried out as a Business Analyst at Documento Crítico - Desenvolvimento de Software, S. A. (Cardmobili), supervised by Eng. Catarina Maia. Integrated master's thesis, Informatics and Computing Engineering, Faculty of Engineering, University of Porto. 200

    Personalised video retrieval: application of implicit feedback and semantic user profiles

    A challenging problem in the user profiling domain is to create profiles of users of retrieval systems. This problem is even more acute in the multimedia domain. Due to the Semantic Gap, the difference between the low-level data representation of videos and the higher-level concepts users associate with them, it is not trivial to understand the content of multimedia documents and to find other documents that users might be interested in. A promising approach to easing this problem is to set multimedia documents into their semantic contexts. The semantic context can lead to a better understanding of personal interests. Knowing the context of a video is useful for recommending videos that match a user's information need. By exploiting these contexts, videos can also be linked to other, contextually related videos. From a user-profiling point of view, these links are valuable for recommending semantically related videos, hence creating a semantic-based user profile. This thesis introduces a semantic user profiling approach for news video retrieval, which exploits a generic ontology to put news stories into their context.

    Major challenges that inhibit the creation of such semantic user profiles are the identification of users' long-term interests and the adaptation of retrieval results based on these personal interests. Most personalisation services rely on users explicitly specifying preferences, a common approach in the text retrieval domain. By giving explicit feedback, users are forced to articulate their need, which can be problematic when their information need is vague. Furthermore, users tend not to provide enough feedback on which to base an adaptive retrieval algorithm. Deviating from the method of explicitly asking the user to rate the relevance of retrieval results, implicit feedback techniques learn user interests unobtrusively. The main advantage is that users are relieved from providing feedback; a disadvantage is that information gathered using implicit techniques is less accurate than information based on explicit feedback.

    In this thesis, we focus on three main research questions. First, we study whether implicit relevance feedback, provided while interacting with a video retrieval system, can be employed to bridge the Semantic Gap. We first identify implicit indicators of relevance by analysing representative video retrieval interfaces. Studying whether these indicators can be exploited as implicit feedback within short retrieval sessions, we recommend video documents based on implicit actions performed by a community of users. Secondly, implicit relevance feedback is studied as a potential source for building user profiles and hence for identifying users' long-term interests in specific topics. This includes identifying different aspects of interest and storing these interests in dynamic user profiles. Finally, we study how this feedback can be exploited to adapt retrieval results or to recommend related videos that match the users' interests. We analyse our research questions through both simulation-based and user-centred evaluation studies. The results suggest that implicit relevance feedback can be employed in the video domain and that semantic-based user profiles have the potential to improve video exploration.
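    As an illustration of the implicit-feedback idea (a sketch under invented assumptions, not the thesis implementation), interaction events can be mapped to relevance weights that accumulate in a concept-based profile, which is then used to score candidate videos. The action weights and concept labels below are made up.

```python
from collections import defaultdict

# Heavier actions suggest stronger interest; these weights are illustrative.
ACTION_WEIGHT = {"click": 0.2, "play": 0.5, "play_to_end": 1.0, "skip": -0.3}

profile: dict[str, float] = defaultdict(float)

def observe(action: str, video_concepts: list[str]) -> None:
    """Update the long-term profile from one implicit feedback event."""
    for concept in video_concepts:
        profile[concept] += ACTION_WEIGHT[action]

def score(video_concepts: list[str]) -> float:
    """Match a candidate video against the accumulated interests."""
    return sum(profile[c] for c in video_concepts)

observe("play_to_end", ["politics", "election"])
observe("skip", ["sports"])
candidates = [["election", "politics"], ["politics"], ["sports"]]
print(sorted(candidates, key=score, reverse=True))
```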

    Multimedia resource discovery

    This chapter examines the challenges and opportunities of Multimedia Information Retrieval and corresponding search engine applications. Computer technology has changed our access to information tremendously: we used to search for authors or titles (which we had to know) in library card catalogues in order to locate relevant books; now we can issue keyword searches within the full text of whole book repositories in order to identify authors, titles, and locations of relevant books. What about the corresponding challenge of finding multimedia by fragments, examples, and excerpts? Rather than asking for a music piece by artist and title, can we hum its tune to find it? Can doctors submit scans of a patient to identify medically similar images of diagnosed cases in a database? Can your mobile phone take a picture of a statue, send it to a service, and tell you about the statue's artist and significance? In an attempt to answer some of these questions, we introduce basic concepts of multimedia resource discovery technologies for a number of different query and document types: piggy-back text search, i.e., reducing the multimedia to pseudo-text documents; automated annotation of visual components; content-based retrieval, where the query is an image; and fingerprinting to match near duplicates.

    Some of the research challenges are given by the semantic gap between the simple pixel properties computers can readily index and high-level human concepts; related to this is an inherent technological limitation of automated annotation of images from pixels alone. Other challenges are given by polysemy, i.e., the many meanings and interpretations inherent in visual material, and the correspondingly wide range of a user's information need. This chapter demonstrates how these challenges can be tackled by automated processing and machine learning, and by utilising the skills of the user, for example through browsing or through a process called relevance feedback, thus putting the user at centre stage. The latter is made easier by "added value" technologies, exemplified here by summaries of complex multimedia objects such as TV news, information visualisation techniques for document clusters, visual search by example, and methods to create browsable structures within the collection.
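    Of the query types listed, content-based retrieval by example is the easiest to sketch end to end: reduce each image to a fixed-length feature and match by distance. The Python sketch below uses a colour histogram over synthetic images; real systems use much richer features, but the pipeline shape is the same.

```python
import numpy as np

def colour_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """H x W x 3 uint8 image -> normalized joint RGB histogram (bins**3,)."""
    h, _ = np.histogramdd(image.reshape(-1, 3),
                          bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return (h / h.sum()).ravel()

rng = np.random.default_rng(1)
collection = {f"img_{i}": rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
              for i in range(5)}
index = {name: colour_histogram(img) for name, img in collection.items()}

query = colour_histogram(collection["img_3"])   # query by example
best = min(index, key=lambda name: np.linalg.norm(index[name] - query))
print("closest match:", best)                    # img_3 matches itself
```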