FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference
The main obstacle to weakly supervised semantic image segmentation is the
difficulty of obtaining pixel-level information from coarse image-level
annotations. Most methods based on image-level annotations use localization
maps obtained from the classifier, but these only focus on the small
discriminative parts of objects and do not capture precise boundaries.
FickleNet explores diverse combinations of locations on feature maps created by
generic deep neural networks. It selects hidden units randomly and then uses
them to obtain activation scores for image classification. FickleNet implicitly
learns the coherence of each location in the feature maps, resulting in a
localization map which identifies both discriminative and other parts of
objects. The ensemble effects are obtained from a single network by selecting
random hidden unit pairs, which means that a variety of localization maps are
generated from a single image. Our approach does not require any additional
training steps and only adds a simple layer to a standard convolutional neural
network; nevertheless, it outperforms recent comparable techniques on the Pascal
VOC 2012 benchmark in both weakly and semi-supervised settings.
Comment: To appear in CVPR 201
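The idea of building many localization maps from one network by randomly selecting hidden units can be illustrated with the minimal NumPy sketch below. It is a simplification and does not reproduce FickleNet's actual selection scheme or localization method. It assumes a precomputed feature map, applies plain element-wise dropout as the random selection, and uses a CAM-style map (class-weighted channel sum, ReLU, max-normalization). The function names, `drop_rate`, `threshold`, and the union aggregation are illustrative assumptions.

```python
import numpy as np


def cam(features, class_weights):
    """CAM-style localization map for one class.

    features: (C, H, W) feature map; class_weights: (C,) classifier weights.
    Returns an (H, W) map, ReLU-ed and scaled so its peak is 1.
    """
    m = np.tensordot(class_weights, features, axes=1)  # weighted channel sum -> (H, W)
    m = np.maximum(m, 0.0)
    peak = m.max()
    return m / peak if peak > 0 else m


def stochastic_localization(features, class_weights, n_samples=50,
                            drop_rate=0.5, threshold=0.5, seed=0):
    """Union of thresholded maps computed from randomly masked feature maps.

    Each pass keeps a random subset of hidden units (plain element-wise
    dropout here, an assumption), so different passes can highlight
    different parts of the object.
    """
    rng = np.random.default_rng(seed)
    union = np.zeros(features.shape[1:], dtype=bool)
    for _ in range(n_samples):
        keep = rng.random(features.shape) >= drop_rate  # randomly selected units
        union |= cam(features * keep, class_weights) >= threshold
    return union
```

When the most discriminative unit is dropped in a pass, weaker locations become the peak of that pass's map. Taking the union across passes therefore tends to cover more of the object than the single deterministic map, which is the intuition the abstract describes.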
Improvement of 802.11 Protocol on Fully Programmable Wireless Radio
The growing number of connected devices has led to rapidly increasing data traffic on wireless networks, and the demand for high-speed, stable Internet access is becoming more prominent. However, current off-the-shelf wireless cards are neither programmable nor observable across the layers of the standard protocol stack, which leads to poor practical performance. The Wireless Open Access Research Platform (WARP), a scalable wireless platform that provides programmable functionality at every layer of the network stack, has therefore been used for the real-time implementation and improvement of the 802.11 protocol.
After the big wind stops I see gentle waves
This thesis covers my reflections on the inspirations and motivations behind selected works, including my candidacy exhibition, "Resonance," and my thesis exhibition, "after the big wind stops I see gentle waves." It traces my life throughout my MFA studies and the development of my art practice. Through a story-within-a-story method of narration and descriptions of my streams of thought, I attempt to explain the process of my development and the discoveries I have made: the little things in my daily life and the big turning points that inspired me. My work and this document have been strongly shaped by my poetic imagination and by the emotional events and experiences I have had.
Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation
When a deep neural network is trained on data with only image-level labeling,
the regions activated in each image tend to identify only a small region of the
target object. We propose a method of using videos automatically harvested from
the web to identify a larger region of the target object by using temporal
information, which is not present in the static image. The temporal variations
in a video allow different regions of the target object to be activated. We
obtain an activated region in each frame of a video, and then aggregate the
regions from successive frames into a single image, using a warping technique
based on optical flow. The resulting localization maps cover more of the target
object, and can then be used as proxy ground-truth to train a segmentation
network. This simple approach outperforms existing methods under the same level
of supervision, and even approaches relying on extra annotations. Based on
VGG-16 and ResNet-101 backbones, our method achieves mIoU scores of 65.0 and 67.4,
respectively, on the PASCAL VOC 2012 test images, which represents a new
state of the art.
Comment: ICCV 201
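The aggregation step (warping activated regions from successive frames into one frame via optical flow) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure. It assumes dense flow from the target frame to each neighboring frame is already available (for example, from an external flow estimator), uses nearest-neighbor backward warping, and combines maps with an element-wise maximum. The function names and the `flow` layout are hypothetical.

```python
import numpy as np


def warp_to_target(src_map, flow):
    """Nearest-neighbor backward warp of a source-frame map into the target frame.

    flow: (H, W, 2); flow[y, x] = (dx, dy) points from target pixel (y, x)
    to its matching location in the source frame. Pixels whose match falls
    outside the source frame receive 0.
    """
    h, w = src_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.rint(xs + flow[..., 0]).astype(int)
    sy = np.rint(ys + flow[..., 1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(src_map)
    out[valid] = src_map[sy[valid], sx[valid]]
    return out


def aggregate_frames(target_map, neighbor_maps, flows):
    """Element-wise max of the target map and all warped neighbor maps."""
    out = target_map.copy()
    for m, f in zip(neighbor_maps, flows):
        out = np.maximum(out, warp_to_target(m, f))
    return out
```

Taking the maximum keeps every region that was activated in any aligned frame. This matches the abstract's goal of covering more of the target object before the maps are used as proxy ground truth.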
Metal-organic framework based on hinged cube tessellation as transformable mechanical metamaterial
Mechanical metamaterials exhibit unusual properties, such as a negative Poisson's ratio, which are difficult to achieve in conventional materials. Rational design of mechanical metamaterials at the microscale is becoming popular, partly because of advances in three-dimensional printing technologies. However, incorporating movable building blocks inside solids, thereby enabling us to manipulate mechanical movement at the molecular scale, has been a difficult task. Here, we report a metal-organic framework, self-assembled from a porphyrin linker and a new type of Zn-based secondary building unit serving as a joint in a hinged cube tessellation. Detailed structural analysis and theoretical calculations show that this material is a mechanical metamaterial exhibiting auxetic behavior. This work demonstrates that the topology of the framework and the flexible hinges inside the structure are intimately related to the mechanical properties of the material, providing a guideline for the rational design of mechanically responsive metal-organic frameworks.
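"Auxetic" refers to a negative Poisson's ratio. The standard definition is given below with a worked example. The strain values are purely illustrative and are not taken from the paper.

```latex
\nu = -\frac{\varepsilon_{\text{trans}}}{\varepsilon_{\text{axial}}}, \qquad
\varepsilon_{\text{axial}} = 0.02,\quad \varepsilon_{\text{trans}} = +0.01
\;\Longrightarrow\; \nu = -\frac{0.01}{0.02} = -0.5 < 0 \quad (\text{auxetic})
```

A conventional material stretched along one axis contracts sideways ($\varepsilon_{\text{trans}} < 0$, so $\nu > 0$). An auxetic material expands sideways instead, which makes $\nu$ negative.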
Improving Visual Prompt Tuning for Self-supervised Vision Transformers
Visual Prompt Tuning (VPT) is an effective tuning method for adapting
pretrained Vision Transformers (ViTs) to downstream tasks. It leverages extra
learnable tokens, known as prompts, which steer the frozen pretrained ViTs.
Although VPT has demonstrated its applicability with supervised vision
transformers, it often underperforms with self-supervised ones. Through
empirical observations, we deduce that the effectiveness of VPT hinges largely
on the ViT blocks with which the prompt tokens interact. Specifically, VPT
shows improved performance on image classification tasks for MAE and MoCo v3
when the prompt tokens are inserted into later blocks rather than the first
block. These observations suggest that there exists an optimal location of
blocks for the insertion of prompt tokens. Unfortunately, identifying the
optimal blocks for prompts within each self-supervised ViT for diverse future
scenarios is a costly process. To mitigate this problem, we propose a simple
yet effective method that learns a gate for each ViT block to adjust its
intervention into the prompt tokens. With our method, prompt tokens are
selectively influenced by blocks that require steering for task adaptation. Our
method outperforms VPT variants in FGVC and VTAB image classification and
ADE20K semantic segmentation. The code is available at
https://github.com/ryongithub/GatedPromptTuning.
Comment: International Conference on Machine Learning (ICML) 202
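A learnable per-block gate on how much each block modifies the prompt tokens can be sketched in NumPy as below. The parameterization is an assumption for illustration and may differ from the authors' formulation in the linked repository. Each block gets one scalar logit, and its sigmoid `g` interpolates between leaving the prompts unchanged and taking the block's output. The toy `blocks` also act on the prompt tokens alone, whereas real ViT blocks attend jointly over prompt and image tokens. All names here are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_prompt_forward(prompts, blocks, gate_logits):
    """Propagate prompt tokens through frozen blocks, each scaled by a learnable gate.

    prompts: (num_prompts, dim) array.
    blocks: list of callables standing in for frozen ViT blocks (toys here).
    gate_logits: one learnable scalar per block (hypothetical parameterization).
    """
    p = prompts
    for block, logit in zip(blocks, gate_logits):
        g = sigmoid(logit)
        # g -> 0: this block leaves the prompts untouched; g -> 1: full block update.
        p = g * block(p) + (1.0 - g) * p
    return p
```

With this design, training only the gate logits (together with the prompts) lets the model learn which blocks should steer the prompts, instead of searching over insertion blocks by hand.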