512 research outputs found

Non-Parametric Probabilistic Image Segmentation

Author: Andreetto Marco
Perona Pietro
Zelnik-Manor Lihi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

We propose a simple probabilistic generative model for image segmentation. Like other probabilistic algorithms (such as EM on a Mixture of Gaussians), the proposed model is principled and provides both hard and probabilistic cluster assignments, as well as the ability to naturally incorporate prior knowledge. While previous probabilistic approaches are restricted to parametric models of clusters (e.g., Gaussians), we eliminate this limitation. The suggested approach does not make strong assumptions about the shape of the clusters and can thus handle complex structures. Our experiments show that the suggested approach outperforms previous work on a variety of image segmentation tasks.

CiteSeerX

Crossref

Caltech Authors

LEARNING TO RIG CHARACTERS

Author: Xu Zhan
Publication venue: ScholarWorks@UMass Amherst
Publication date: 08/08/2023

With the emergence of 3D virtual worlds, 3D social media, and massive online games, the need for diverse, high-quality, animation-ready characters and avatars is greater than ever. To animate characters, artists hand-craft articulation structures, such as animation skeletons and part deformers, which require a significant amount of manual and laborious interaction with 2D/3D modeling interfaces. This thesis presents deep learning methods that are able to significantly automate the process of character rigging. First, the thesis introduces RigNet, a method capable of predicting an animation skeleton for an input static 3D shape in the form of a polygon mesh. The predicted skeletons match animator expectations in joint placement and topology. RigNet also estimates surface skin weights, which determine how the mesh is animated given the different skeletal poses. In contrast to prior work that fits pre-defined skeletal templates with hand-tuned objectives, RigNet is able to automatically rig diverse characters, such as humanoids, quadrupeds, toys, and birds, with varying articulation structure and geometry. RigNet is based on a deep neural architecture that directly operates on the mesh representation. The architecture is trained on a diverse dataset of rigged models that we mined online and curated. The dataset includes 2.7K polygon meshes, along with their associated skeletons and corresponding skin weights. Second, the thesis introduces MoRig, a method that automatically rigs character meshes driven by single-view point cloud streams capturing the motion of performing characters. Compared to RigNet, MoRig's rigging is motion-aware: its neural network encodes motion cues from the point clouds into compact feature representations that are informative about the articulated parts of the performing character. These motion-aware features guide the inference of an appropriate skeletal rig for the input mesh.
Furthermore, MoRig is able to animate the rig according to the captured point cloud motion. MoRig can handle diverse characters with different morphologies (e.g., humanoids, quadrupeds, toy characters). It also accounts for occluded regions in the point clouds and mismatches in the part proportions between the input mesh and captured character. Third, the thesis introduces APES, a method that takes as input 2D raster images depicting a small set of poses of a character shown in a sprite sheet, and identifies articulated parts useful for rigging the character. APES uses a combination of neural network inference and integer linear programming to identify a compact set of articulated body parts, e.g. head, torso and limbs, that best reconstruct the input poses. Compared to MoRig and RigNet, which require a large collection of training models with associated skeletons and skinning weights, APES' neural architecture relies on less effortful supervision from (i) pixel correspondences readily available in existing large cartoon image datasets (e.g., Creative Flow) and (ii) a relatively small dataset of 57 cartoon characters segmented into moving parts. Finally, the thesis discusses future research directions related to combining neural rigging with 3D and 4D reconstruction of characters from point cloud data and 2D video, as well as automating the process of motion synthesis for 3D characters.

ScholarWorks@UMass Amherst

Outbreak: Lessons Learned from Developing a “History Game”

Author: Bachynski John
Kee Kevin
Publication venue: Canadian Game Studies Association
Publication date: 17/09/2009

This paper describes the production of Outbreak, a game focused on the 1885 smallpox epidemic in Montreal. It is a preliminary report on the manner in which, by both theorizing about and building a game, we are responding to some of the questions that have animated the literature on computer games for history. The article begins with a survey of publications by researchers who have studied the capacity of games to support learning, and outlined how these can be used in concert with books and other media. We next provide the context to our project, which was conceived to market a film to be broadcast on television, and support a book on which the film was based – a bestselling history of a preventable tragedy that resulted in the deaths of over 3,000 Montrealers. We outline how we built from the book, creating a game that asked the player to save as many people as possible from death, using tools that mimicked those available in the late nineteenth century. We conclude by reflecting on the lessons that we learned, and how we will apply these to our present and future projects.

Loading - The Journal of the Canadian Game Studies Association

Gaming Fluencies: Pathways into Participatory Culture in a Community Design Studio

Author: Kafai Yasmin B.
Peppler Kylie A.
Publication venue: ScholarlyCommons
Publication date: 01/01/2010

Many recent efforts to promote new literacies involve the promotion of creative media production as a way to foster youth’s literate engagement with digital media. Those interested in gaming literacies view game design as a way to engage youth in reflective and critical reading of the gaming culture. In this paper, we propose the concept of “gaming fluencies” to promote game design as a context in which youth not only learn to read but also to produce digital media in creative ways. Gaming fluencies also present the added benefit of addressing equity issues of participation in the new media literacy landscape. We report on an ethnographic study that documented urban youth producing digital games in a community technology center. Our analyses focus on an archive of 643 game designs collected over a 24-month period, selecting a random sample to identify evidence of creative and technical dimensions in game designs. In addition, we highlight three case studies of game designs to identify different pathways into the participatory culture. Our goal is to illustrate how gaming fluencies allow for a wide range of designs, provide low thresholds and high ceilings for complex projects, and make room for creative expression. In our discussion, we address how gaming fluencies represent a complementary pathway for learning and participation in today’s media culture

ScholarlyCommons@Penn

Automatic video segmentation employing object/camera modeling techniques

Author: Farin D.S.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2005

Practically established video compression and storage techniques still process video sequences as rectangular images without further semantic structure. However, humans watching a video sequence immediately recognize acting objects as semantic units. This semantic object separation is currently not reflected in the technical system, making it difficult to manipulate the video at the object level. The realization of object-based manipulation will introduce many new possibilities for working with videos, such as composing new scenes from pre-existing video objects or enabling user interaction with the scene. Moreover, object-based video compression, as defined in the MPEG-4 standard, can provide high compression ratios because the foreground objects can be sent independently from the background. In the case that the scene background is static, the background views can even be combined into a large panoramic sprite image, from which the current camera view is extracted. This results in a higher compression ratio since the sprite image for each scene only has to be sent once. A prerequisite for employing object-based video processing is automatic (or at least user-assisted semi-automatic) segmentation of the input video into semantic units, the video objects. This segmentation is a difficult problem because the computer does not have the vast amount of pre-knowledge that humans subconsciously use for object detection. Thus, even the simple definition of the desired output of a segmentation system is difficult. The subject of this thesis is to provide algorithms for segmentation that are applicable to common video material and that are computationally efficient. The thesis is conceptually separated into three parts. In Part I, an automatic segmentation system for general video content is described in detail. Part II introduces object models as a tool to incorporate user-defined knowledge about the objects to be extracted into the segmentation process.
Part III concentrates on the modeling of camera motion in order to relate the observed camera motion to real-world camera parameters. The segmentation system that is described in Part I is based on a background-subtraction technique. The pure background image that is required for this technique is synthesized from the input video itself. Sequences that contain rotational camera motion can also be processed since the camera motion is estimated and the input images are aligned into a panoramic scene-background. This approach is fully compatible with the MPEG-4 video-encoding framework, such that the segmentation system can be easily combined with an object-based MPEG-4 video codec. After an introduction to the theory of projective geometry in Chapter 2, which is required for the derivation of camera-motion models, the estimation of camera motion is discussed in Chapters 3 and 4. It is important that the camera-motion estimation is not influenced by foreground object motion. At the same time, the estimation should provide accurate motion parameters such that all input frames can be combined seamlessly into a background image. The core motion estimation is based on a feature-based approach where the motion parameters are determined with a robust-estimation algorithm (RANSAC) in order to distinguish the camera motion from simultaneously visible object motion. Our experiments showed that the robustness of the original RANSAC algorithm in practice does not reach the theoretically predicted performance. An analysis of the problem has revealed that this is caused by numerical instabilities that can be significantly reduced by a modification that we describe in Chapter 4. The synthetization of static-background images is discussed in Chapter 5. In particular, we present a new algorithm for the removal of the foreground objects from the background image such that a pure scene background remains.
The proposed algorithm is optimized to synthesize the background even for difficult scenes in which the background is only visible for short periods of time. The problem is solved by clustering the image content for each region over time, such that each cluster comprises static content. Furthermore, the algorithm exploits the fact that the times at which foreground objects appear in an image region are similar to the corresponding times of neighboring image areas. The reconstructed background could be used directly as the sprite image in an MPEG-4 video coder. However, we have discovered that the counterintuitive approach of splitting the background into several independent parts can reduce the overall amount of data. In the case of general camera motion, the construction of a single sprite image is even impossible. In Chapter 6, a multi-sprite partitioning algorithm is presented, which separates the video sequence into a number of segments, for which independent sprites are synthesized. The partitioning is computed in such a way that the total area of the resulting sprites is minimized, while simultaneously satisfying additional constraints. These include a limited sprite-buffer size at the decoder, and the restriction that the image resolution in the sprite should never fall below the input-image resolution. The described multi-sprite approach is fully compatible with the MPEG-4 standard, but provides three advantages. First, any arbitrary rotational camera motion can be processed. Second, the coding cost for transmitting the sprite images is lower, and finally, the quality of the decoded sprite images is better than in previously proposed sprite-generation algorithms. Segmentation masks for the foreground objects are computed with a change-detection algorithm that compares the pure background image with the input images. A special effect that occurs in the change detection is the problem of image misregistration.
Since the change detection compares co-located image pixels in the camera-motion compensated images, a small error in the motion estimation can introduce segmentation errors because non-corresponding pixels are compared. We approach this problem in Chapter 7 by integrating risk-maps into the segmentation algorithm that identify pixels for which misregistration would probably result in errors. For these image areas, the change-detection algorithm is modified to disregard the difference values for the pixels marked in the risk-map. This modification significantly reduces the number of false object detections in fine-textured image areas. The algorithmic building-blocks described above can be combined into a segmentation system in various ways, depending on whether camera motion has to be considered or whether real-time execution is required. These different systems and example applications are discussed in Chapter 8. Part II of the thesis extends the described segmentation system to consider object models in the analysis. Object models allow the user to specify which objects should be extracted from the video. In Chapters 9 and 10, a graph-based object model is presented in which the features of the main object regions are summarized in the graph nodes, and the spatial relations between these regions are expressed with the graph edges. The segmentation algorithm is extended by an object-detection algorithm that searches the input image for the user-defined object model. We provide two object-detection algorithms. The first one is specific for cartoon sequences and uses an efficient sub-graph matching algorithm, whereas the second processes natural video sequences. With the object-model extension, the segmentation system can be controlled to extract individual objects, even if the input sequence comprises many objects. Chapter 11 proposes an alternative approach to incorporate object models into a segmentation algorithm.
The chapter describes a semi-automatic segmentation algorithm, in which the user coarsely marks the object and the computer refines this to the exact object boundary. Afterwards, the object is tracked automatically through the sequence. In this algorithm, the object model is defined as the texture along the object contour. This texture is extracted in the first frame and then used during the object tracking to localize the original object. The core of the algorithm uses a graph representation of the image and a newly developed algorithm for computing shortest circular-paths in planar graphs. The proposed algorithm is faster than the currently known algorithms for this problem, and it can also be applied to many alternative problems like shape matching. Part III of the thesis elaborates on different techniques to derive information about the physical 3-D world from the camera motion. In the segmentation system, we employ camera-motion estimation, but the obtained parameters have no direct physical meaning. Chapter 12 discusses an extension to the camera-motion estimation to factorize the motion parameters into physically meaningful parameters (rotation angles, focal-length) using camera autocalibration techniques. The speciality of the algorithm is that it can process camera motion that spans several sprites by employing the above multi-sprite technique. Consequently, the algorithm can be applied to arbitrary rotational camera motion. For the analysis of video sequences, it is often required to determine and follow the position of the objects. Clearly, the object position in image coordinates provides little information if the viewing direction of the camera is not known. Chapter 13 provides a new algorithm to deduce the transformation between the image coordinates and the real-world coordinates for the special application of sport-video analysis. In sport videos, the camera view can be derived from markings on the playing field. 
For this reason, we employ a model of the playing field that describes the arrangement of lines. After detecting significant lines in the input image, a combinatorial search is carried out to establish correspondences between lines in the input image and lines in the model. The algorithm requires no information about the specific color of the playing field and it is very robust to occlusions or poor lighting conditions. Moreover, the algorithm is generic in the sense that it can be applied to any type of sport by simply exchanging the model of the playing field. In Chapter 14, we again consider panoramic background images and particularly focus on their visualization. Apart from the planar background sprites discussed previously, a frequently used visualization technique for panoramic images is projection onto a cylinder surface, which is unwrapped into a rectangular image. However, the disadvantage of this approach is that the viewer has no good orientation in the panoramic image because the view covers all directions at the same time. In order to provide a more intuitive presentation of wide-angle views, we have developed a visualization technique specialized for the case of indoor environments. We present an algorithm to determine the 3-D shape of the room in which the image was captured, or, more generally, to compute a complete floor plan if several panoramic images captured in each of the rooms are provided. Based on the obtained 3-D geometry, a graphical model of the rooms is constructed, where the walls are displayed with textures that are extracted from the panoramic images. This representation enables virtual walk-throughs in the reconstructed room and therefore provides a better orientation for the user. Summarizing, we can conclude that all segmentation techniques employ some definition of foreground objects.
These definitions are either explicit, using object models as in Part II of this thesis, or they are implicitly defined as in the background synthetization in Part I. The results of this thesis show that implicit descriptions, which extract their definition from video content, work well when the sequence is long enough to extract this information reliably. However, high-level semantics are difficult to integrate into the segmentation approaches that are based on implicit models. Instead, those semantics should be added as postprocessing steps. On the other hand, explicit object models apply semantic pre-knowledge at early stages of the segmentation. Moreover, they can be applied to short video sequences or even still pictures since no background model has to be extracted from the video. The definition of a general object-modeling technique that is widely applicable and that also enables an accurate segmentation remains an important yet challenging problem for further research.
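
The background-subtraction idea in the abstract above (synthesize a pure background from the video itself, then mark pixels that differ from it) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the thesis clusters image content over time and compensates for camera motion, whereas the sketch below swaps in a plain per-pixel temporal median and assumes a static camera. The function names `synthesize_background` and `foreground_mask` and the threshold value are hypothetical, chosen only for illustration.

```python
from statistics import median


def synthesize_background(frames):
    """Estimate a background image as the per-pixel temporal median.

    Assumption (not from the thesis): the camera is static and each pixel
    shows the background in more than half of the frames.
    `frames` is a list of equally sized grayscale images (lists of rows).
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame, background, threshold=30):
    """Simple change detection: True where a pixel differs from the background.

    `threshold` is a hypothetical tuning value, not taken from the source.
    """
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


# Three tiny 2x2 frames; a bright object covers the top-left pixel in frame 0 only.
frames = [
    [[200, 10], [10, 10]],
    [[10, 10], [10, 10]],
    [[10, 10], [10, 10]],
]
background = synthesize_background(frames)
mask = foreground_mask(frames[0], background)
```

A median is only a stand-in here; it fails when the background is visible for short periods, which is the case the abstract says its clustering approach is designed to handle.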

Repository TU/e

Pure OAI Repository

Doctor of Philosophy in Computing

Author: Jones Benjamin James
Publication venue: University of Utah
Publication date: 01/01/2015

Physics-based animation has proven to be a powerful tool for creating compelling animations for film and games. Most techniques in graphics are based on methods developed for predictive simulation for engineering applications; however, the goals for graphics applications are dramatically different than the goals of engineering applications. As a result, most physics-based animation tools are difficult for artists to work with, providing little direct control over simulation results. In this thesis, we describe tools for physics-based animation designed with artist needs and expertise in mind. Most materials can be modeled as elastoplastic: they recover from small deformations, but large deformations permanently alter their rest shape. Unfortunately, large plastic deformations, common in graphical applications, cause simulation instabilities if not addressed. Most elastoplastic simulation techniques in graphics rely on a finite-element approach where objects are discretized into a tetrahedral mesh. Using these approaches, maintaining simulation stability during large plastic flows requires remeshing, a complex and computationally expensive process. We introduce a new point-based approach that does not rely on an explicit mesh and avoids the expense of remeshing. Our approach produces comparable results with much lower implementation complexity. Points are a ubiquitous primitive for many effects, so our approach also integrates well with existing artist pipelines. Next, we introduce a new technique for animating stylized images which we call Dynamic Sprites. Artists can use our tool to create digital assets that interact in a natural, but stylized, way in virtual environments. In order to support the types of nonphysical, exaggerated motions often desired by artists, our approach relies on a heavily modified deformable body simulator, equipped with a set of new intuitive controls and an example-based deformation model.
Our approach allows artists to specify how the shape of the object should change as it moves and collides in interactive virtual environments. Finally, we introduce a new technique for animating destructive scenes. Our approach is built on the insight that the most important visual aspects of destruction are plastic deformation and fracture. As with Dynamic Sprites, we use an example-based model of deformation for intuitive artist control. Our simulator treats objects as rigid when computing dynamics but allows them to deform plastically and fracture in between timesteps based on interactions with the other objects. We demonstrate that our approach can efficiently animate the types of destructive scenes common in film and games. These animation techniques are designed to exploit artist expertise to ease creation of complex animations. By using artist-friendly primitives and allowing artists to provide characteristic deformations as input, our techniques enable artists to create more compelling animations, more easily.

The University of Utah: J. Willard Marriott Digital Library

Challenges for the Research in Virtual Humans (invited paper)

Author: Thalmann D
Publication date: 07/03/2007

Infoscience - École polytechnique fédérale de Lausanne

The Role of Virtual Humans in Virtual Environment Technology and Interfaces

Author: Thalmann D
Publication date: 20/03/2007

Infoscience - École polytechnique fédérale de Lausanne

I Am Error

Author: Altice Nathan
Publication venue: VCU Scholars Compass
Publication date: 03/08/2012

I Am Error is a platform study of the Nintendo Family Computer (or Famicom), a videogame console first released in Japan in July 1983 and later exported to the rest of the world as the Nintendo Entertainment System (or NES). The book investigates the underlying computational architecture of the console and its effects on the creative works (e.g. videogames) produced for the platform. I Am Error advances the concept of platform as a shifting configuration of hardware and software that extends even beyond its ‘native’ material construction. The book provides a deep technical understanding of how the platform was programmed and engineered, from code to silicon, including the design decisions that shaped both the expressive capabilities of the machine and the perception of videogames in general. The book also considers the platform beyond the console proper, including cartridges, controllers, peripherals, packaging, marketing, licensing, and play environments. Likewise, it analyzes the NES’s extension and afterlife in emulation and hacking, birthing new genres of creative expression such as ROM hacks and tool-assisted speed runs. I Am Error considers videogames and their platforms to be important objects of cultural expression, alongside cinema, dance, painting, theater and other media. It joins the discussion taking place in similar burgeoning disciplines—code studies, game studies, computational theory—that engage digital media with critical rigor and descriptive depth. But platform studies is not simply a technical discussion—it also keeps a keen eye on the cultural, social, and economic forces that influence videogames. No platform exists in a vacuum: circuits, code, and console alike are shaped by the currents of history, politics, economics, and culture—just as those currents are shaped in kind

VCU Scholars Compass