13 research outputs found
Recommended from our members
Teaching People and Machines to Enhance Images
Procedural tasks such as following a recipe or editing an image are very common. They require a person to execute a sequence of operations (e.g. chop onions, or sharpen the image) in order to achieve the goal of the task. People commonly use step-by-step tutorials to learn these tasks. We focus on software tutorials, more specifically photo manipulation tutorials, and present a set of tools and techniques to help people learn, compare and automate photo manipulation procedures. We describe three different systems that are each designed to help with a different stage in acquiring procedural knowledge.Today, people primarily rely on hand-crafted tutorials in books and on websites to learn photo manipulation procedures. However, putting together a high quality step-by-step tutorial is a time-consuming process. As a consequence, many online tutorials are poorly designed which can lead to confusion and slow down the learning process. We present a demonstration-based system for automatically generating succinct step-by-step visual tutorials of photo manipulations. An author first demonstrates the manipulation using an instrumented version of GIMP (GNU Image Manipulation Program) that records all changes in interface and application state. From the example recording, our system automatically generates tutorials that illustrate the manipulation using images, text, and annotations. It leverages automated image labeling (recognition of facial features and outdoor scene structures in our implementation) to generate more precise text descriptions of many of the steps in the tutorials. A user study finds that our tutorials are effective for learning the steps of a procedure; users are 20-44% faster and make 60-95% fewer errors when using our tutorials than when using screencapture video tutorials or hand-designed tutorials.We also demonstrate a new interface that allows learners to navigate, explore and compare large collections (i.e. thousands) of photo manipulation tutorials based on their command-level structure. Sites such as tutorialized.com or good-tutorials.com collect tens of thousands of photo manipulation tutorials. These collections typically contain many different tutorials for the same task. For example, there are many different tutorials that describe how to recolor the hair of a person in an image. Learners often want to compare these tutorials to understand the different ways a task can be done. They may also want to identify common strategies that are used across tutorials for a variety of tasks. However, the large number of tutorials in these collections and their inconsistent formats can make it difficult for users to systematically explore and compare them. Current tutorial collections do not exploit the underlying command-level structure of tutorials, and to explore the collection users have to either page through long lists of tutorial titles or perform keyword searches on the natural language tutorial text. We present a new browsing interface to help learners navigate, explore and compare collections of photo manipulation tutorials based on their command-level structure. Our browser indexes tutorials by their commands, identifies common strategies within the tutorial collection, and highlights the similarities and differences between sets of tutorials that execute the same task. User feedback suggests that our interface is easy to understand and use, and that users find command-level browsing to be useful for exploring large tutorial collections. They strongly preferred to explore tutorial collections with our browser over keyword search. Finally, we present a framework for generating content-adaptive macros (programs) that can transfer complex photo manipulation procedures to new target images. After learners master a photo manipulation procedure, they often repeatedly apply it to multiple images. For example, they might routinely apply the same vignetting effect to all their photographs. This process can be very tedious especially for procedures that involve many steps. While image manipulation programs provide basic macro authoring tools that allow users to record and then replay a sequence of operations, these macros are very brittle and cannot adapt to new images. We present a more comprehensive approach for generating content-adaptive macros that can automatically transfer operations to new target images. To create these macro, we make use of multiple training demonstrations. Specifically, we use automated image labeling and machine learning techniques to to adapt the parameters of each operation to the new image content. We show that our framework is able to learn a large class of the most commonly-used manipulations using as few as 20 training demonstrations. Our content-adaptive macros allow users to transfer photo manipulation procedures with a single button click and thereby significantly simplify repetitive procedures.Together these tools provide an automated system to create high quality step-by-step tutorials, allow users to explore and compare large collections of online tutorials, and facilitate repetitive procedures by automating the transfer of photo manipulations. As more and more instructional material appears online, we believe that providing such tools for learning, comparing and automating procedures will be essential to help people work efficiently with software tools
Visual transcripts: lecture notes from blackboard-style lecture videos
Blackboard-style lecture videos are popular, but learning using existing video player interfaces can be challenging. Viewers cannot consume the lecture material at their own pace, and the content is also difficult to search or skim. For these reasons, some people prefer lecture notes to videos. To address these limitations, we present Visual Transcripts, a readable representation of lecture videos that combines visual information with transcript text. To generate a Visual Transcript, we first segment the visual content of a lecture into discrete visual entities that correspond to equations, figures, or lines of text. Then, we analyze the temporal correspondence between the transcript and visuals to determine how sentences relate to visual entities. Finally, we arrange the text and visuals in a linear layout based on these relationships. We compare our result with a standard video player, and a state-of-the-art interface designed specifically for blackboard-style lecture videos. User evaluation suggests that users prefer our interface for learning and that our interface is effective in helping them browse or search through lecture videos.Quanta Computer (Firm)Royal Dutch-Shell GroupSamsung Scholarship Foundatio
Recommended from our members
Content-based tools for editing audio stories
Audio stories are an engaging form of communication that combine speech and music into compelling narratives. Existing audio editing tools force story producers to manipulate speech and music tracks via tedious, low-level waveform editing. In contrast, we present a set of tools that analyze the audio content of the speech and music and thereby allow producers to work at much higher level. Our tools address several challenges in creating audio stories, including (1) navigating and editing speech, (2) selecting appropriate music for the score, and (3) editing the music to complement the speech. Key features include a transcript-based speech editing tool that automatically propagates edits in the transcript text to the corresponding speech track; a music browser that supports searching based on emotion, tempo, key, or timbral similarity to other songs; and music retargeting tools that make it easy to combine sections of music with the speech. We have used our tools to create audio stories from a variety of raw speech sources, including scripted narratives, interviews and political speeches. Informal feedback from first-time users suggests that our tools are easy to learn and greatly facilitate the process of editing raw footage into a final story