55 research outputs found

    Efficient Video Representation Learning via Masked Video Modeling with Motion-centric Token Selection

    Self-supervised Video Representation Learning (VRL) aims to learn transferable representations from uncurated, unlabeled video streams that can be used for diverse downstream tasks. With recent advances in Masked Image Modeling (MIM), in which the model learns to predict randomly masked regions of an image given only the visible patches, MIM-based VRL methods have emerged and demonstrated their potential by significantly outperforming previous VRL methods. However, they require an excessive amount of computation due to the added temporal dimension. This is because existing MIM-based VRL methods resort to random masking strategies and thus overlook the spatial and temporal inequality of information density among the patches in arriving videos, wasting computation on predicting uninformative tokens and frames. To tackle these limitations of Masked Video Modeling, we propose a new token selection method that masks out the more important tokens according to the object's motions in an online manner, which we refer to as Motion-centric Token Selection. Further, we present a dynamic frame selection strategy that allows the model to focus on informative and causal frames with minimal redundancy. We validate our method on multiple benchmarks and the Ego4D datasets, showing that the model pre-trained with our proposed method significantly outperforms state-of-the-art VRL methods on downstream tasks, such as action recognition and object state change classification, while largely reducing memory requirements during pre-training and fine-tuning. Comment: 15 pages

    Dynamic and Super-Personalized Media Ecosystem Driven by Generative AI: Unpredictable Plays Never Repeating The Same

    This paper introduces a media service model that exploits artificial intelligence (AI) video generators at the receiver end. This proposal deviates from the traditional multimedia ecosystem, which relies completely on in-house production, by shifting part of the content creation onto the receiver. We bring a semantic process into the framework, allowing the distribution network to provide service elements that prompt the content generator, rather than distributing encoded data of fully finished programs. The service elements include fine-tailored text descriptions, lightweight image data of some objects, or application programming interfaces, collectively referred to as semantic sources, and the user terminal translates the received semantic data into video frames. Empowered by the random nature of generative AI, users can then experience super-personalized services. The proposed idea covers situations in which the user receives element packages from different service providers, either as a sequence of packages over time or as multiple packages at the same time. Given promised in-context coherence and content integrity, the combinatory dynamics will amplify the service diversity, allowing users to always chance upon new experiences. This work particularly targets short-form videos and advertisements, where users easily become fatigued from seeing the same frame sequence every time. In those use cases, the content provider's role will be recast from a thorough producer to a scripter of semantic sources. Overall, this work explores a new form of media ecosystem facilitated by receiver-embedded generative models, featuring both random content dynamics and enhanced delivery efficiency. Comment: 13 pages, 7 figures

    Localization Uncertainty Estimation for Anchor-Free Object Detection

    Since many safety-critical systems, such as surgical robots and autonomous driving cars, operate in unstable environments with sensor noise and incomplete data, it is desirable for object detectors to take into account the confidence of localization prediction. Prior uncertainty estimation methods for anchor-based object detection have three limitations. 1) They model the uncertainty based on object properties that have different characteristics, such as location (center point) and scale (width, height). 2) They model a box offset and the ground truth as a Gaussian distribution and a Dirac delta distribution, respectively, which leads to a model misspecification problem, because the Dirac delta distribution cannot be exactly represented as a Gaussian for any μ and Σ. 3) Since anchor-based methods are sensitive to the hyper-parameters of the anchors, the localization uncertainty modeling is also sensitive to these parameters. Therefore, we propose a new localization uncertainty estimation method called Gaussian-FCOS for anchor-free object detection. Our method captures the uncertainty based on the four directions of box offsets (left, right, top, bottom), which have similar properties; this makes it possible to identify which direction is uncertain and to provide a quantitative value in the range [0, 1]. To this end, we design a new uncertainty loss, the negative power log-likelihood loss, which measures uncertainty by weighting the likelihood loss with IoU and alleviates the model misspecification problem. Experiments on COCO datasets demonstrate that our Gaussian-FCOS reduces false positives and finds more missing objects by mitigating over-confident scores with the estimated uncertainty. We hope Gaussian-FCOS serves as a crucial component for reliability-required tasks.

    OF@TEIN: An OpenFlow-enabled SDN Testbed over International SmartX Rack Sites

    In this paper, we discuss our ongoing effort on the OF@TEIN SDN (Software-Defined Networking) testbed, which currently spans Korea and five South-East Asian (SEA) collaborators with internationally deployed OpenFlow-enabled SmartX Racks.

    Selective Growth and Contact Gap-Fill of Low Resistivity Si via Microwave Plasma-Enhanced CVD

    Low-resistivity polycrystalline Si could be selectively grown in the deep (~200 nm) and narrow (~20 nm) patterns of 20 nm pitch design rule DRAM (Dynamic Random Access Memory) by microwave plasma-enhanced chemical vapor deposition (MW-CVD). We were able to achieve the high phosphorus (CVD gap-fill in a large electrical contact area which does is affected by line pitch size) doping concentration (>2.5 × 10^21 cm^−3) and, thus, a low resistivity by adjusting source gas (SiH4, H2, PH3) decomposition through MW-CVD, using a showerhead that controls the decomposition of the source gases via two different gas injection paths. In this study, a selective growth mechanism was applied by using a cyclic deposition/etch process to achieve bottom-up filling of the L-shaped contact, using H2 plasma that simultaneously promoted the deposition and the etch processes. Additionally, the cyclic selective growth technique was set up by controlling the SiH4 flow rate. The bottom-up process resulted in a uniform doping distribution, as well as an excellent filling capacity without seam or center void formation. Thus, low contact resistivity and higher transistor on-current could be achieved at a high and uniform phosphorus (P) concentration. Compared to conventional thermal CVD, this method is expected to be a strong candidate for the complicated deep and narrow contact process.