4 research outputs found

    Edge-Sensitive Human Cutout With Hierarchical Granularity and Loopy Matting Guidance


    Scale-Adaptive Video Understanding.

    The recent rise of large-scale, diverse video data has urged a new era of high-level video understanding. It is increasingly critical for intelligent systems to extract semantics from videos. In this dissertation, we explore the use of supervoxel hierarchies as a type of video representation for high-level video understanding. The supervoxel hierarchies contain rich multiscale decompositions of video content, where various structures can be found at various levels. However, no single level of scale contains all the desired structures we need. It is essential to adaptively choose the scales for subsequent video analysis. Thus, we present a set of tools to manipulate scales in supervoxel hierarchies including both scale generation and scale selection methods. In our scale generation work, we evaluate a set of seven supervoxel methods in the context of what we consider to be a good supervoxel for video representation. We address a key limitation that has traditionally prevented supervoxel scale generation on long videos. We do so by proposing an approximation framework for streaming hierarchical scale generation that is able to generate multiscale decompositions for arbitrarily-long videos using constant memory. Subsequently, we present two scale selection methods that are able to adaptively choose the scales according to application needs. The first method flattens the entire supervoxel hierarchy into a single segmentation that overcomes the limitation induced by trivial selection of a single scale. We show that the selection can be driven by various post hoc feature criteria. The second scale selection method combines the supervoxel hierarchy with a conditional random field for the task of labeling actors and actions in videos. We formulate the scale selection problem and the video labeling problem in a joint framework. 
Experiments on a novel large-scale video dataset demonstrate the effectiveness of the explicit consideration of scale selection in video understanding. Aside from the computational methods, we present a visual psychophysical study to quantify how well the actor and action semantics in high-level video understanding are retained in supervoxel hierarchies. The ultimate findings suggest that some semantics are well-retained in the supervoxel hierarchies and can be used for further video analysis.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133202/1/cliangxu_1.pd

    Efficient Representation Learning With Graph Neural Networks

    Graph neural networks (GNNs) have emerged as the dominant paradigm for graph representation learning, igniting widespread interest in utilizing sophisticated GNNs for diverse computer vision tasks in various domains, including visual SLAM, 3D object recognition and segmentation, as well as visual perception with event cameras. However, the applications of these GNNs often rely on cumbersome GNN architectures for favorable performance, posing challenges for real-time interaction, particularly in edge computing scenarios. This is particularly relevant in cases such as autonomous driving, where timely responses are crucial for handling complex traffic conditions. The objective of this thesis is to contribute to the advancement of learning efficient representations using lightweight GNNs, enabling their effective deployment in resource-constrained environments. To achieve this goal, the thesis explores various efficient learning schemes, focusing on four key aspects: the data side, the model side, the data-model side, and the application side. In terms of data-driven efficient learning, the thesis proposes an adaptive data modification scheme that allows a pre-trained model to be repurposed for multiple designated downstream tasks in a resource-efficient manner, without the need for re-training or fine-tuning. For model-centric efficiency, the thesis introduces a multi-talented and lightweight architecture, without accessing human annotations, that can integrate the expertise of the pre-trained complex GNNs specializing in different tasks. Furthermore, the thesis explores a dedicated binarization scheme on the data-model side that converts both input data and model parameters into 1-bit representations, resulting in lightweight 1-bit architectures. Finally, the thesis investigates an application-specific efficient learning scheme that models the style transfer process as message passing in GNNs, enabling efficient semi-parametric stylization