553 research outputs found

    FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference

    The main obstacle to weakly supervised semantic image segmentation is the difficulty of obtaining pixel-level information from coarse image-level annotations. Most methods based on image-level annotations use localization maps obtained from the classifier, but these only focus on the small discriminative parts of objects and do not capture precise boundaries. FickleNet explores diverse combinations of locations on feature maps created by generic deep neural networks. It selects hidden units randomly and then uses them to obtain activation scores for image classification. FickleNet implicitly learns the coherence of each location in the feature maps, resulting in a localization map which identifies both discriminative and other parts of objects. The ensemble effects are obtained from a single network by selecting random hidden unit pairs, which means that a variety of localization maps are generated from a single image. Our approach does not require any additional training steps and only adds a simple layer to a standard convolutional neural network; nevertheless it outperforms recent comparable techniques on the Pascal VOC 2012 benchmark in both weakly and semi-supervised settings. Comment: To appear in CVPR 201

    Improvement of 802.11 Protocol on Fully Programmable Wireless Radio

    The growing number of connected devices has rapidly increased data traffic on wireless networks, and the demand for high-speed, stable Internet access is becoming more prominent. However, current off-the-shelf wireless cards are not programmable or observable across layers of the standard protocol stack, which leads to poor practical performance. Thus, the Wireless Open Access Research Platform (WARP), a scalable wireless platform providing programmable functionality at every layer of the network stack, has been used for the real-time implementation and improvement of the 802.11 protocol.

    After the big wind stops I see gentle waves

    This thesis covers my reflections on the inspirations and motivations behind selected works, including my candidacy exhibition, Resonance, and my thesis exhibition, After the big wind stops I see gentle waves. It recounts my life throughout my MFA studies and the development of my art practice. Through a story-within-a-story method of narration and descriptions of my streams of thought, I attempt to explain the processes of my development and the discoveries I have made, the little things in my daily life, and the big turning points that inspired me. My work and this document have been strongly shaped by my poetic imagination and by the emotional events and experiences I have had.

    Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation

    When a deep neural network is trained on data with only image-level labeling, the regions activated in each image tend to identify only a small region of the target object. We propose a method of using videos automatically harvested from the web to identify a larger region of the target object by using temporal information, which is not present in the static image. The temporal variations in a video allow different regions of the target object to be activated. We obtain an activated region in each frame of a video, and then aggregate the regions from successive frames into a single image, using a warping technique based on optical flow. The resulting localization maps cover more of the target object, and can then be used as proxy ground-truth to train a segmentation network. This simple approach outperforms existing methods under the same level of supervision, and even approaches relying on extra annotations. Based on VGG-16 and ResNet 101 backbones, our method achieves mIoU of 65.0 and 67.4, respectively, on PASCAL VOC 2012 test images, which represents a new state of the art. Comment: ICCV 201
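
    The aggregation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes integer-valued optical flow, nearest-pixel forward warping, and an element-wise maximum as the aggregation rule, and the names `warp_activation` and `aggregate_frames` are hypothetical.

```python
def warp_activation(activation, flow):
    """Forward-warp a 2D activation map into the reference frame.

    flow[y][x] is an integer (dy, dx) displacement from this frame to the
    reference frame (an assumption made to keep the sketch simple).
    """
    height, width = len(activation), len(activation[0])
    warped = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dy, dx = flow[y][x]
            ty, tx = y + dy, x + dx
            # Drop activations that move outside the reference image.
            if 0 <= ty < height and 0 <= tx < width:
                warped[ty][tx] = max(warped[ty][tx], activation[y][x])
    return warped


def aggregate_frames(reference_activation, other_frames):
    """Merge warped activations from other frames into the reference map.

    other_frames is a list of (activation, flow) pairs. Taking the maximum
    per pixel is an assumed aggregation rule, chosen for illustration.
    """
    aggregated = [row[:] for row in reference_activation]
    for activation, flow in other_frames:
        warped = warp_activation(activation, flow)
        for y, row in enumerate(warped):
            for x, value in enumerate(row):
                aggregated[y][x] = max(aggregated[y][x], value)
    return aggregated


# Toy 1x4 example: the later frame activates a different object part,
# and the object has shifted one pixel to the right since the reference.
reference = [[0.9, 0.0, 0.0, 0.0]]
later_activation = [[0.0, 0.0, 0.8, 0.0]]
later_flow = [[(0, -1)] * 4]
result = aggregate_frames(reference, [(later_activation, later_flow)])
print(result)
```

    In this toy case the aggregated map covers two adjacent pixels instead of one, which mirrors the idea that activations from successive frames can cover more of the object once they are aligned.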

    Metal-organic framework based on hinged cube tessellation as transformable mechanical metamaterial

    Mechanical metamaterials exhibit unusual properties, such as negative Poisson's ratio, which are difficult to achieve in conventional materials. Rational design of mechanical metamaterials at the microscale is becoming popular partly because of the advance in three-dimensional printing technologies. However, incorporating movable building blocks inside solids, thereby enabling us to manipulate mechanical movement at the molecular scale, has been a difficult task. Here, we report a metal-organic framework, self-assembled from a porphyrin linker and a new type of Zn-based secondary building unit, serving as a joint in a hinged cube tessellation. Detailed structural analysis and theoretical calculation show that this material is a mechanical metamaterial exhibiting auxetic behavior. This work demonstrates that the topology of the framework and flexible hinges inside the structure are intimately related to the mechanical properties of the material, providing a guideline for the rational design of mechanically responsive metal-organic frameworks.

    Improving Visual Prompt Tuning for Self-supervised Vision Transformers

    Visual Prompt Tuning (VPT) is an effective tuning method for adapting pretrained Vision Transformers (ViTs) to downstream tasks. It leverages extra learnable tokens, known as prompts, which steer the frozen pretrained ViTs. Although VPT has demonstrated its applicability with supervised vision transformers, it often underperforms with self-supervised ones. Through empirical observations, we deduce that the effectiveness of VPT hinges largely on the ViT blocks with which the prompt tokens interact. Specifically, VPT shows improved performance on image classification tasks for MAE and MoCo v3 when the prompt tokens are inserted into later blocks rather than the first block. These observations suggest that there exists an optimal location of blocks for the insertion of prompt tokens. Unfortunately, identifying the optimal blocks for prompts within each self-supervised ViT for diverse future scenarios is a costly process. To mitigate this problem, we propose a simple yet effective method that learns a gate for each ViT block to adjust its intervention into the prompt tokens. With our method, prompt tokens are selectively influenced by blocks that require steering for task adaptation. Our method outperforms VPT variants in FGVC and VTAB image classification and ADE20K semantic segmentation. The code is available at https://github.com/ryongithub/GatedPromptTuning. Comment: International Conference on Machine Learning (ICML) 202
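
    One way to picture a per-block gate on prompt tokens is the sketch below. The abstract does not give the gate's form, so this is an assumption: a sigmoid of one learnable scalar per block, interpolating between the block's output for the prompt and the unchanged prompt. The names `gated_prompt_update`, `block_fn`, and `gate_logit` are hypothetical, and the linked repository should be consulted for the actual method.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_prompt_update(prompt, block_fn, gate_logit):
    """Blend a block's effect on a prompt token with the untouched prompt.

    prompt is a list of floats standing in for one prompt token, block_fn
    stands in for a frozen ViT block, and gate_logit is the (assumed)
    learnable scalar for that block.
    """
    gate = sigmoid(gate_logit)
    transformed = block_fn(prompt)
    # gate near 1: the block fully rewrites the prompt.
    # gate near 0: the prompt passes through this block almost unchanged.
    return [gate * t + (1.0 - gate) * p for t, p in zip(transformed, prompt)]


def toy_block(token):
    # Stand-in for a frozen block; doubling is purely illustrative.
    return [2.0 * value for value in token]


prompt = [1.0, 2.0]
half_open = gated_prompt_update(prompt, toy_block, gate_logit=0.0)
nearly_closed = gated_prompt_update(prompt, toy_block, gate_logit=-50.0)
print(half_open, nearly_closed)
```

    Under this assumed form, training the scalar per block would let each block's influence on the prompt be scaled up or down without searching over insertion positions by hand.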