5,577 research outputs found

Cross Pixel Optical Flow Similarity for Self-Supervised Learning

Authors: A Dosovitskiy; A Mahendran; A Owens; D Todorovic; DJ Butler; I Misra; M Noroozi; N Cristianini; O Russakovsky; R Gao; R Zhang
Publication date: 15/07/2018

We propose a novel method for learning convolutional neural image representations without manual supervision. We use motion cues, in the form of optical flow, to supervise representations of static images. The obvious approach of training a network to predict flow from a single image can be needlessly difficult due to intrinsic ambiguities in this prediction task. We instead propose a much simpler learning goal: embed pixels such that the similarity between their embeddings matches that between their optical flow vectors. At test time, the learned deep network can be used without access to video or flow information and transferred to tasks such as image classification, detection, and segmentation. Our method, which significantly simplifies previous attempts at using motion for self-supervision, achieves state-of-the-art results in self-supervision using motion cues and competitive results for self-supervision in general, and is overall state of the art in self-supervised pretraining for semantic image segmentation, as demonstrated on standard benchmarks.
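The learning goal stated in the abstract (pixel embeddings whose pairwise similarity matches that of the corresponding optical flow vectors) can be illustrated with a small sketch. This is not the paper's loss: the abstract does not specify the similarity kernels or the objective, so the cosine similarity, the squared-difference penalty, and the names `cosine` and `similarity_matching_loss` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matching_loss(embeddings, flows):
    """Mean squared gap between embedding similarity and flow similarity.

    embeddings[i] and flows[i] belong to the same pixel. The loss is zero
    when every pair of pixels is exactly as similar in embedding space as
    it is in optical-flow space (assumed kernel: cosine similarity).
    """
    n = len(embeddings)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            gap = cosine(embeddings[i], embeddings[j]) - cosine(flows[i], flows[j])
            total += gap * gap
            pairs += 1
    return total / pairs if pairs else 0.0


# Two pixels that move together but have orthogonal embeddings are penalized.
loss = similarity_matching_loss([(1.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (1.0, 0.0)])
```

Note that only the relationship between pixels is supervised, which is why the network never has to predict an absolute flow vector from a single image.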

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Single-Stream Multi-Level Alignment for Vision-Language Pretraining

Authors: BG Vijay Kumar; Chandraker Manmohan; Fu Yun; Khan Zaid; Schulter Samuel; Yu Xiang
Publication date: 29/03/2022

Recent progress in large-scale vision-language pre-training has shown the importance of aligning the visual and text modalities for downstream vision-language tasks. Many methods use a dual-stream architecture that fuses visual tokens and language tokens after representation learning, which aligns only at a global level and cannot extract finer-scale semantics. In contrast, we propose a single-stream model that aligns the modalities at multiple levels: i) instance level, ii) fine-grained patch level, iii) conceptual semantic level. We achieve this using two novel tasks: symmetric cross-modality reconstruction and pseudo-labeled keyword prediction. In the former, we mask the input tokens from one of the modalities and use the cross-modal information to reconstruct the masked tokens, thus improving fine-grained alignment between the two modalities. In the latter, we parse the caption to select a few keywords and feed them together with the momentum encoder pseudo signal to self-supervise the visual encoder, forcing it to learn rich semantic concepts that are essential for grounding a textual token to an image region. We demonstrate top performance on a set of vision-language downstream tasks such as zero-shot/fine-tuned image/text retrieval, referring expression, and VQA. We also demonstrate how the proposed models can align the modalities at multiple levels.
Comment: 22 pages, 7 figure

arXiv.org e-Print Archive

An assessment of the load modifying potential of model predictive controlled dynamic facades within the California context

Authors: Blum DH; Gehbauer C; Lee ES; Wang T
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

California is making major strides towards meeting its greenhouse gas emission reduction goals with the transformation of its electrical grid to accommodate renewable generation, aggressive promotion of building energy efficiency, and increased emphasis on moving toward electrification of end uses (e.g., residential heating, etc.). As a result of this activity, the State is faced with significant challenges of systemwide resource adequacy, power quality and grid reliability that could be addressed in part with demand responsive (DR) load modifying strategies using controllable building technologies. Dynamic facades have the ability to potentially shift and shed loads at critical times of the day in combination with daylighting and HVAC controls. This study explores the technical potential of dynamic facades to support net load shape objectives. A model predictive controller (MPC) was designed based on reduced order thermal (Modelica) and window (Radiance) models. Using an automated workflow (involving JModelica.org and MPCPy), these models were converted and differentiated to formulate a non-linear optimization problem. A gradient-based, non-linear programming problem solver (IPOPT) was used to derive an optimal control strategy, then a post-optimization step was used to convert the solution to a discrete state for facade actuation. Continuous state modulation of the façade was also modeled. The performance of the MPC controller with and without activation of thermal mass was evaluated in a south-facing perimeter office zone with a three-zone electrochromic window for a clear sunny week during summer and winter periods in Oakland and Burbank, California. MPC strategies reduced total energy cost by 9–28% and critical coincident peak demand was reduced by up to 0.58 W/ft2-floor or 19–43% in the 4.6 m (15 ft) deep south zone on sunny summer days in Oakland compared to state-of-the-art heuristic control. 
Similar savings were achieved for the hotter Burbank climate in Southern California. This outcome supports the argument that MPC control of dynamic facades can provide significant electricity cost reductions and net load management capabilities of benefit to both the building owner and the evolving electrical grid.
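The abstract mentions a post-optimization step that converts the continuous optimizer solution into a discrete state for facade actuation. A minimal sketch of one way such a step could work is nearest-state rounding; the abstract does not say how the study actually performs the conversion, so the rounding rule, the function name `discretize_plan`, and the tint values below are hypothetical.

```python
def discretize_plan(continuous_plan, discrete_states):
    """Snap each continuous control value to the closest allowed state.

    continuous_plan: per-timestep values from the optimizer (assumed here to
                     be a window transmittance between 0 and 1).
    discrete_states: the states the actuator can physically take.
    """
    return [
        min(discrete_states, key=lambda state: abs(state - value))
        for value in continuous_plan
    ]


# Hypothetical electrochromic tint levels, expressed as visible transmittance.
TINT_STATES = [0.01, 0.06, 0.18, 0.60]

actuated = discretize_plan([0.05, 0.40, 0.90], TINT_STATES)
```

Rounding each timestep independently is the simplest choice; it ignores the cost of switching states, which a real controller might also need to account for.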

eScholarship - University of California

Formal Synthesis of Controllers for Safety-Critical Autonomous Systems: Developments and Challenges

Authors: Gao Bingzhao; Yin Xiang; Yu Xiao
Publication date: 20/02/2024

In recent years, formal methods have been extensively used in the design of autonomous systems. By employing mathematically rigorous techniques, formal methods can provide fully automated reasoning processes with provable safety guarantees for complex dynamic systems with intricate interactions between continuous dynamics and discrete logics. This paper provides a comprehensive review of formal controller synthesis techniques for safety-critical autonomous systems. Specifically, we categorize the formal control synthesis problem based on diverse system models, encompassing deterministic, non-deterministic, and stochastic models, and on various formal safety-critical specifications involving logic, real-time, and real-valued domains. The review covers fundamental formal control synthesis techniques, including abstraction-based approaches and abstraction-free methods. We explore the integration of data-driven synthesis approaches in formal control synthesis. Furthermore, we review formal techniques tailored for multi-agent systems (MAS), with a specific focus on various approaches to address the scalability challenges in large-scale systems. Finally, we discuss some recent trends and highlight research challenges in this area.
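To make the abstraction-based approach concrete, the sketch below synthesizes a safety controller on a small finite, non-deterministic abstraction using the standard fixed-point computation of the maximal controlled invariant set. It is an illustration only, not taken from the review; the function name `synthesize_safety_controller` and the toy transition system are assumptions.

```python
def synthesize_safety_controller(transitions, safe_states):
    """Compute the largest set of states from which safety can be enforced.

    transitions: dict mapping (state, action) -> set of possible successors
                 (non-deterministic: the environment picks the successor).
    safe_states: states that satisfy the safety specification.

    Returns (winning, controller), where controller maps each winning state
    to the actions whose every successor stays inside the winning set.
    """
    def allowed(state, region):
        # Actions that are guaranteed to keep the system inside `region`.
        return sorted(
            action
            for (source, action), successors in transitions.items()
            if source == state and successors and successors <= region
        )

    winning = set(safe_states)
    while True:
        # Drop states that have no action keeping all successors in the set.
        shrunk = {state for state in winning if allowed(state, winning)}
        if shrunk == winning:
            break
        winning = shrunk
    return winning, {state: allowed(state, winning) for state in winning}


# Toy abstraction: state 3 is unsafe; state 2 cannot avoid reaching it.
toy = {
    (0, "a"): {0, 1},
    (0, "b"): {3},
    (1, "a"): {2},
    (1, "b"): {0},
    (2, "a"): {3},
    (2, "b"): {2, 3},
}
winning, controller = synthesize_safety_controller(toy, {0, 1, 2})
```

The iteration removes state 2 first (every action may lead to the unsafe state 3), which in turn rules out action "a" in state 1. The state-space growth of such abstractions is one source of the scalability challenges the abstract refers to.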

arXiv.org e-Print Archive

Reinforcement Learning

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Brains rule the world, and brain-like computation is increasingly used in computers and electronic devices. Brain-like computation is about processing and interpreting data or directly putting forward and performing actions. Learning is a very important aspect. This book is on reinforcement learning, which involves performing actions to achieve a goal. The first 11 chapters of this book describe and extend the scope of reinforcement learning. The remaining 11 chapters show that there is already wide usage in numerous fields. Reinforcement learning can tackle control tasks that are too complex for traditional, hand-designed, non-learning controllers. As learning computers can deal with technical complexities, the remaining task of human operators is to specify goals on increasingly higher levels. This book shows that reinforcement learning is a very dynamic area in terms of theory and applications, and it is intended to stimulate and encourage new research in this field.

Directory of Open Access Books (DOAB)

Regularizing Deep Networks by Modeling and Predicting Label Structure

Authors: Maire Michael; Mostajabi Mohammadreza; Shakhnarovich Gregory
Publication date: 05/04/2018

We construct custom regularization functions for use in supervised training of deep neural networks. Our technique is applicable when the ground-truth labels themselves exhibit internal structure; we derive a regularizer by learning an autoencoder over the set of annotations. Training thereby becomes a two-phase procedure. The first phase models labels with an autoencoder. The second phase trains the actual network of interest by attaching an auxiliary branch that must predict output via a hidden layer of the autoencoder. After training, we discard this auxiliary branch. We experiment in the context of semantic segmentation, demonstrating that this regularization strategy leads to consistent accuracy boosts over baselines, both when training from scratch and in combination with ImageNet pretraining. Gains are also consistent over different choices of convolutional network architecture. As our regularizer is discarded after training, our method has zero cost at test time; the performance improvements are essentially free. We are simply able to learn better network weights by building an abstract model of the label space, and then training the network to understand this abstraction alongside the original task.
Comment: to appear at CVPR 201
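The second training phase described above can be pictured as a main task loss plus a weighted auxiliary term tied to the label autoencoder. The sketch below is one possible reading, not the authors' formulation: it assumes the auxiliary branch predicts the autoencoder's hidden code for the ground-truth label, uses squared error in place of a segmentation loss, and replaces the trained phase-one encoder with a fixed toy function. The names `combined_loss`, `toy_encode`, and `aux_weight` are hypothetical.

```python
def squared_error(prediction, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


def toy_encode(label):
    """Stand-in for the frozen phase-one encoder (assumption, not learned):
    averages adjacent label entries to form a half-length hidden code."""
    return [(label[i] + label[i + 1]) / 2 for i in range(0, len(label) - 1, 2)]


def combined_loss(main_prediction, aux_code_prediction, label, encode, aux_weight=0.5):
    """Phase-two objective: task loss plus the label-structure regularizer.

    main_prediction:     output of the network of interest.
    aux_code_prediction: output of the auxiliary branch (dropped after training).
    encode:              frozen encoder from the phase-one label autoencoder.
    """
    task_term = squared_error(main_prediction, label)
    structure_term = squared_error(aux_code_prediction, encode(label))
    return task_term + aux_weight * structure_term


label = [1.0, 1.0, 0.0, 0.0]
loss = combined_loss(label, [0.0, 0.0], label, toy_encode)
```

Because the auxiliary term only appears in the training objective, removing the branch afterwards leaves the main network's inference cost unchanged, which matches the zero test-time cost claimed in the abstract.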

arXiv.org e-Print Archive

Crossref