55 research outputs found
Efficient Video Representation Learning via Masked Video Modeling with Motion-centric Token Selection
Self-supervised Video Representation Learning (VRL) aims to learn
transferrable representations from uncurated, unlabeled video streams that
could be utilized for diverse downstream tasks. With recent advances in Masked
Image Modeling (MIM), in which the model learns to predict randomly masked
regions in the images given only the visible patches, MIM-based VRL methods
have emerged and demonstrated their potential by significantly outperforming
previous VRL methods. However, they require an excessive amount of computations
due to the added temporal dimension. This is because existing MIM-based VRL
methods overlook spatial and temporal inequality of information density among
the patches in arriving videos by resorting to random masking strategies,
thereby wasting computations on predicting uninformative tokens/frames. To
tackle these limitations of Masked Video Modeling, we propose a new token
selection method that masks out the more important tokens according to the object's
motions in an online manner, which we refer to as Motion-centric Token
Selection. Further, we present a dynamic frame selection strategy that allows
the model to focus on informative and causal frames with minimal redundancy. We
validate our method over multiple benchmark and Ego4D datasets, showing that
the pre-trained model using our proposed method significantly outperforms
state-of-the-art VRL methods on downstream tasks, such as action recognition
and object state change classification while largely reducing memory
requirements during pre-training and fine-tuning. Comment: 15 pages
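The token selection idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how motion is measured, so this example assumes plain frame differencing as a stand-in motion signal; the function names, the patch size, and the mask ratio are all hypothetical and are not taken from the paper.

```python
def patch_motion_scores(prev_frame, next_frame, patch):
    """Mean absolute pixel difference per non-overlapping patch, in row-major order.

    Assumes both frames are equally sized 2D lists whose height and width
    are multiples of `patch`. Frame differencing is an assumed motion proxy.
    """
    height, width = len(prev_frame), len(prev_frame[0])
    scores = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            total = 0.0
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    total += abs(next_frame[y][x] - prev_frame[y][x])
            scores.append(total / (patch * patch))
    return scores


def select_masked_tokens(scores, mask_ratio):
    """Return the indices of the highest-motion patches, i.e. the tokens to mask."""
    num_masked = int(len(scores) * mask_ratio)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_masked])


# Toy 4x4 frames: only the top-right 2x2 patch changes between frames.
prev_frame = [[0] * 4 for _ in range(4)]
next_frame = [row[:] for row in prev_frame]
next_frame[0][2] = 8
next_frame[1][3] = 4

scores = patch_motion_scores(prev_frame, next_frame, patch=2)
masked = select_masked_tokens(scores, mask_ratio=0.25)
```

In this toy case only the patch that contains the changed pixels is selected for masking, whereas a random masking strategy would be just as likely to pick a static patch.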
Dynamic and Super-Personalized Media Ecosystem Driven by Generative AI: Unpredictable Plays Never Repeating The Same
This paper introduces a media service model that exploits artificial
intelligence (AI) video generators at the receiving end. This proposal deviates
from the traditional multimedia ecosystem, which relies completely on in-house
production, by shifting part of the content creation onto the receiver. We
bring a semantic process into the framework, allowing the distribution network
to provide service elements that prompt the content generator, rather than
distributing encoded data of fully finished programs. The service elements
include fine-tailored text descriptions, lightweight image data of some
objects, or application programming interfaces, comprehensively referred to as
semantic sources, and the user terminal translates the received semantic data
into video frames. Empowered by the random nature of generative AI, the users
could then experience super-personalized services accordingly. The proposed
idea incorporates the situations in which the user receives different service
providers' element packages; a sequence of packages over time, or multiple
packages at the same time. Given promised in-context coherence and content
integrity, the combinatory dynamics will amplify the service diversity,
allowing the users to always chance upon new experiences. This work
particularly targets short-form videos and advertisements, where users
easily tire of seeing the same frame sequence every time. In
those use cases, the content provider's role will be recast as scripting
semantic sources, transformed from a thorough producer. Overall, this work
explores a new form of media ecosystem facilitated by receiver-embedded
generative models, featuring both random content dynamics and enhanced delivery
efficiency simultaneously. Comment: 13 pages, 7 figures
Localization Uncertainty Estimation for Anchor-Free Object Detection
Since many safety-critical systems, such as surgical robots and autonomous
driving cars, are in unstable environments with sensor noise and incomplete
data, it is desirable for object detectors to take into account the confidence
of localization prediction. There are three limitations of the prior
uncertainty estimation methods for anchor-based object detection. 1) They model
the uncertainty based on object properties having different characteristics,
such as location (center point) and scale (width, height). 2) They model the box
offset as a Gaussian distribution and the ground truth as a Dirac delta
distribution, which leads to a model misspecification problem, because the Dirac
delta distribution cannot be represented exactly by a Gaussian for any choice of
its parameters. 3) Since anchor-based methods are sensitive to hyper-parameters of
anchor, the localization uncertainty modeling is also sensitive to these
parameters. Therefore, we propose a new localization uncertainty estimation
method called Gaussian-FCOS for anchor-free object detection. Our method
captures the uncertainty based on four directions of box offsets (left, right,
top, bottom) that have similar properties, which makes it possible to identify which
direction is uncertain and to provide a quantitative value in the range [0, 1]. To
this end, we design a new uncertainty loss, negative power log-likelihood loss,
to measure uncertainty by weighting IoU to the likelihood loss, which
alleviates the model misspecification problem. Experiments on COCO datasets
demonstrate that our Gaussian-FCOS reduces false positives and finds more
missed objects by mitigating over-confident scores with the estimated
uncertainty. We hope Gaussian-FCOS serves as a crucial component for
tasks that require reliability.
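The abstract above describes the uncertainty loss only as a likelihood loss weighted by IoU. The sketch below shows one plausible reading of that description: a per-direction Gaussian negative log-likelihood summed over the four box offsets and scaled by the IoU between the predicted and ground-truth boxes. The exact form used by Gaussian-FCOS may differ; the function names and the example numbers are hypothetical.

```python
import math


def gaussian_nll(target, mean, sigma):
    """Negative log-likelihood of `target` under a 1D Gaussian N(mean, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (target - mean) ** 2 / (2.0 * sigma ** 2)


def iou_weighted_nll(targets, means, sigmas, iou):
    """Sum the per-direction NLL over (left, right, top, bottom), weighted by IoU.

    `sigmas` holds one predicted uncertainty per direction; `iou` is the overlap
    between the predicted box and the ground-truth box (assumed precomputed).
    """
    return iou * sum(gaussian_nll(t, m, s) for t, m, s in zip(targets, means, sigmas))


# Hypothetical offsets: the "right" offset is predicted poorly.
targets = [0.50, 0.40, 0.30, 0.20]
means = [0.50, 0.10, 0.30, 0.20]
confident = iou_weighted_nll(targets, means, [0.1, 0.1, 0.1, 0.1], iou=0.8)
hedged = iou_weighted_nll(targets, means, [0.1, 0.3, 0.1, 0.1], iou=0.8)
```

Under this reading, raising the predicted uncertainty for the poorly localized direction lowers the loss, which is what lets the network report which side of the box it is unsure about.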
OF@TEIN: An OpenFlow-enabled SDN Testbed over International SmartX Rack Sites
In this paper, we discuss our ongoing effort on the OF@TEIN SDN (Software-Defined Networking) testbed, which currently spans Korea and five South-East Asian (SEA) collaborators with internationally deployed OpenFlow-enabled SmartX Racks.
Selective Growth and Contact Gap-Fill of Low Resistivity Si via Microwave Plasma-Enhanced CVD
Low resistivity polycrystalline Si could be selectively grown in the deep (~200 nm) and narrow patterns (~20 nm) of 20 nm pitch design rule DRAM (Dynamic Random Access Memory) by microwave plasma-enhanced chemical vapor deposition (MW-CVD). We were able to achieve the high phosphorus (CVD gap-fill in a large electrical contact area which does is affected by line pitch size) doping concentration (>2.5 × 10²¹ cm⁻³) and, thus, a low resistivity by adjusting source gas (SiH4, H2, PH3) decomposition through MW-CVD with a showerhead controlling the decomposition of source gases by using two different gas injection paths. In this study, a selective growth mechanism was applied by using the deposition/etch cyclic process to achieve the bottom-up process in the L-shaped contact, using H2 plasma that simultaneously promoted the deposition and the etch processes. Additionally, the cyclic selective growth technique was set up by controlling the SiH4 flow rate. The bottom-up process resulted in a uniform doping distribution, as well as an excellent filling capacity without seam and center void formation. Thus, low contact resistivity and higher transistor on-current could be achieved at a high and uniform phosphorus (P)-concentration. Compared to the conventional thermal, this method is expected to be a strong candidate for the complicated deep and narrow contact process.
- …