Cross Pixel Optical Flow Similarity for Self-Supervised Learning
We propose a novel method for learning convolutional neural image
representations without manual supervision. We use motion cues, in the form of
optical flow, to supervise representations of static images. The obvious
approach of training a network to predict flow from a single image can be
needlessly difficult due to intrinsic ambiguities in this prediction task. We
instead propose a much simpler learning goal: embed pixels such that the
similarity between their embeddings matches that between their optical flow
vectors. At test time, the learned deep network can be used without access to
video or flow information and transferred to tasks such as image
classification, detection, and segmentation. Our method, which significantly
simplifies previous attempts at using motion for self-supervision, achieves
state-of-the-art results in self-supervision using motion cues, competitive
results for self-supervision in general, and is overall state of the art in
self-supervised pretraining for semantic image segmentation, as demonstrated on
standard benchmarks.
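A minimal sketch of this objective, assuming a softmax cross-entropy between the two per-pixel similarity distributions (the exact loss form, names, and temperature here are illustrative, not the paper's implementation):

```python
import numpy as np

def pairwise_cosine(x):
    # Row-normalize, then take inner products: cosine similarity matrix.
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    return x @ x.T

def cross_pixel_flow_loss(embeddings, flows, temperature=1.0):
    """Encourage embedding similarities to match optical-flow similarities.

    embeddings: (N, D) per-pixel features; flows: (N, 2) flow vectors.
    Returns the cross-entropy between the two similarity distributions.
    """
    s_e = pairwise_cosine(embeddings) / temperature
    s_f = pairwise_cosine(flows) / temperature
    # Softmax turns each row of similarities into a distribution over pixels.
    p_f = np.exp(s_f) / np.exp(s_f).sum(axis=1, keepdims=True)
    log_p_e = s_e - np.log(np.exp(s_e).sum(axis=1, keepdims=True))
    return float(-(p_f * log_p_e).sum(axis=1).mean())
```

By Gibbs' inequality the loss is minimized exactly when the embedding similarity distribution matches the flow similarity distribution, which is the stated learning goal.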
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Recent progress in large-scale vision-language pre-training has shown the
importance of aligning the visual and text modalities for downstream
vision-language tasks. Many methods use a dual-stream architecture that fuses
visual tokens and language tokens after representation learning, which aligns
only at a global level and cannot extract finer-scale semantics. In contrast,
we propose a single-stream model that aligns the modalities at multiple levels:
i) instance level, ii) fine-grained patch level, iii) conceptual semantic
level. We achieve this using two novel tasks: symmetric cross-modality
reconstruction and a pseudo-labeled key word prediction. In the former part, we
mask the input tokens from one of the modalities and use the cross-modal
information to reconstruct the masked token, thus improving fine-grained
alignment between the two modalities. In the latter part, we parse the caption
to select a few key words and feed them, together with the momentum encoder's
pseudo signal, to self-supervise the visual encoder, forcing it to learn rich
semantic concepts that are essential for grounding a textual token to an image
region. We demonstrate top performance on a set of Vision-Language downstream
tasks such as zero-shot/fine-tuned image/text retrieval, referring expression,
and VQA. We also demonstrate how the proposed models can align the modalities
at multiple levels. Comment: 22 pages, 7 figures
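A small sketch of the masking step behind symmetric cross-modality reconstruction (the helper and its defaults are hypothetical, not the paper's implementation):

```python
import numpy as np

def mask_one_modality(tokens, mask_ratio=0.25, mask_id=0, rng=None):
    """Randomly mask tokens from one modality (hypothetical helper).

    In symmetric cross-modality reconstruction this would be applied to
    the text tokens on one pass and to the visual tokens on the other;
    the single-stream encoder then reconstructs each masked position
    from the opposite, unmasked modality.
    """
    rng = np.random.default_rng() if rng is None else rng
    tokens = tokens.copy()                     # leave the input untouched
    n_mask = max(1, int(len(tokens) * mask_ratio))
    idx = rng.choice(len(tokens), size=n_mask, replace=False)
    tokens[idx] = mask_id                      # replace with the MASK id
    return tokens, idx
```

The reconstruction loss would then be applied only at the returned `idx` positions, so gradients focus on cross-modal inference rather than copying.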
An assessment of the load modifying potential of model predictive controlled dynamic facades within the California context
California is making major strides towards meeting its greenhouse gas emission reduction goals through the transformation of its electrical grid to accommodate renewable generation, aggressive promotion of building energy efficiency, and an increased emphasis on electrifying end uses (e.g., residential heating). As a result, the State faces significant challenges of system-wide resource adequacy, power quality, and grid reliability that could be addressed in part with demand-responsive (DR) load-modifying strategies using controllable building technologies. Dynamic facades can potentially shift and shed loads at critical times of the day in combination with daylighting and HVAC controls. This study explores the technical potential of dynamic facades to support net load shape objectives. A model predictive controller (MPC) was designed based on reduced-order thermal (Modelica) and window (Radiance) models. Using an automated workflow (involving JModelica.org and MPCPy), these models were converted and differentiated to formulate a non-linear optimization problem. A gradient-based, non-linear programming solver (IPOPT) was used to derive an optimal control strategy, then a post-optimization step converted the solution to a discrete state for facade actuation. Continuous state modulation of the facade was also modeled. The performance of the MPC with and without activation of thermal mass was evaluated in a south-facing perimeter office zone with a three-zone electrochromic window for a clear sunny week during summer and winter periods in Oakland and Burbank, California. MPC strategies reduced total energy cost by 9–28%, and critical coincident peak demand was reduced by up to 0.58 W/ft2-floor, or 19–43%, in the 4.6 m (15 ft) deep south zone on sunny summer days in Oakland compared to state-of-the-art heuristic control.
Similar savings were achieved for the hotter Burbank climate in Southern California. This outcome supports the argument that MPC control of dynamic facades can provide significant electricity cost reductions and net load management capabilities that benefit both the building owner and the evolving electrical grid.
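As an illustration of the receding-horizon idea only (a toy first-order zone model and brute-force search stand in for the study's Modelica/Radiance models and IPOPT solver; every number below is illustrative, not from the study):

```python
import itertools

# Hypothetical visible/solar transmittances for three electrochromic states.
TINT_STATES = (0.6, 0.3, 0.1)

def zone_step(T, tint, solar, T_out, cool_setpoint=24.0):
    """One-step toy thermal update; returns next temperature and HVAC energy."""
    T_next = 0.9 * T + 0.1 * T_out + 0.05 * solar * tint
    hvac = max(0.0, T_next - cool_setpoint)   # cooling-energy proxy
    return T_next - hvac, hvac                # cooling pulls back to setpoint

def mpc_plan(T0, solar_forecast, T_out_forecast, price):
    """Pick the discrete tint sequence minimizing energy cost over the horizon."""
    best_cost, best_plan = float("inf"), None
    horizon = len(solar_forecast)
    for plan in itertools.product(TINT_STATES, repeat=horizon):
        T, cost = T0, 0.0
        for k, tint in enumerate(plan):
            T, hvac = zone_step(T, tint, solar_forecast[k], T_out_forecast[k])
            cost += price[k] * hvac           # time-varying electricity price
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_plan, best_cost
```

On a sunny, expensive afternoon this planner darkens the facade to cut solar gain, which mirrors the peak-shedding behavior reported in the study; a real MPC would replace the exhaustive search with gradient-based optimization over a calibrated model.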
Formal Synthesis of Controllers for Safety-Critical Autonomous Systems: Developments and Challenges
In recent years, formal methods have been extensively used in the design of
autonomous systems. By employing mathematically rigorous techniques, formal
methods can provide fully automated reasoning processes with provable safety
guarantees for complex dynamic systems with intricate interactions between
continuous dynamics and discrete logics. This paper provides a comprehensive
review of formal controller synthesis techniques for safety-critical autonomous
systems. Specifically, we categorize the formal control synthesis problem based
on diverse system models, encompassing deterministic, non-deterministic, and
stochastic dynamics, and on various formal safety-critical specifications involving logic,
real-time, and real-valued domains. The review covers fundamental formal
control synthesis techniques, including abstraction-based approaches and
abstraction-free methods. We explore the integration of data-driven synthesis
approaches in formal control synthesis. Furthermore, we review formal
techniques tailored for multi-agent systems (MAS), with a specific focus on
various approaches to address the scalability challenges in large-scale
systems. Finally, we discuss some recent trends and highlight research
challenges in this area.
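A core abstraction-based technique the review covers is safety-game synthesis over a finite abstraction: iterate a fixed point that discards states from which no action is guaranteed to stay safe. A minimal sketch over a hand-built, possibly non-deterministic transition relation (names and the toy system are illustrative):

```python
def synthesize_safety_controller(states, actions, trans, unsafe):
    """Largest controlled-invariant set plus controller for a finite abstraction.

    trans[(s, a)] is the set of possible successors (non-determinism).
    Repeatedly discard states where every action may leave the safe set;
    at the fixed point, any remaining action choice is provably safe.
    """
    safe = set(states) - set(unsafe)
    while True:
        controller = {}
        for s in safe:
            ok = [a for a in actions
                  if trans.get((s, a), set()) and trans[(s, a)] <= safe]
            if ok:
                controller[s] = ok            # safe actions available at s
        if set(controller) == safe:
            return safe, controller           # fixed point reached
        safe = set(controller)                # shrink and iterate
```

The same fixed-point scheme underlies abstraction-based tools; scaling it to large or multi-agent systems is exactly the challenge the surveyed literature addresses.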
Reinforcement Learning
Brains rule the world, and brain-like computation is increasingly used in computers and electronic devices. Brain-like computation is about processing and interpreting data, or about directly proposing and performing actions; learning is a very important aspect. This book is on reinforcement learning, which involves performing actions to achieve a goal. The first 11 chapters of this book describe and extend the scope of reinforcement learning. The remaining 11 chapters show that there is already wide usage in numerous fields. Reinforcement learning can tackle control tasks that are too complex for traditional, hand-designed, non-learning controllers. As learning computers can deal with technical complexities, the task of human operators remains to specify goals at increasingly higher levels. This book shows that reinforcement learning is a very dynamic area in terms of theory and applications, and it should stimulate and encourage new research in this field.
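As a minimal instance of the acting-to-achieve-a-goal setting the book covers, tabular Q-learning on a toy corridor (all parameters and the environment are illustrative):

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a corridor: start at state 0, move left or
    right, and collect a reward of 1 on reaching the last state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]   # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection (ties broken toward right).
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # One-step temporal-difference update.
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q
```

After training, the greedy policy at every state heads toward the rewarding end; no hand-designed controller was specified, only the goal, which is the point the blurb makes about learning controllers.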
Regularizing Deep Networks by Modeling and Predicting Label Structure
We construct custom regularization functions for use in supervised training
of deep neural networks. Our technique is applicable when the ground-truth
labels themselves exhibit internal structure; we derive a regularizer by
learning an autoencoder over the set of annotations. Training thereby becomes a
two-phase procedure. The first phase models labels with an autoencoder. The
second phase trains the actual network of interest by attaching an auxiliary
branch that must predict output via a hidden layer of the autoencoder. After
training, we discard this auxiliary branch.
We experiment in the context of semantic segmentation, demonstrating this
regularization strategy leads to consistent accuracy boosts over baselines,
both when training from scratch, or in combination with ImageNet pretraining.
Gains are also consistent over different choices of convolutional network
architecture. As our regularizer is discarded after training, our method has
zero cost at test time; the performance improvements are essentially free. We
are simply able to learn better network weights by building an abstract model
of the label space, and then training the network to understand this
abstraction alongside the original task. Comment: to appear at CVPR 201
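A toy, purely linear sketch of the two-phase procedure (the linear autoencoder and single-map auxiliary branch are simplifying assumptions for illustration, not the paper's architecture):

```python
import numpy as np

def train_label_autoencoder(labels, code_dim=4, steps=3000, lr=0.02):
    """Phase 1: fit a linear autoencoder over ground-truth label vectors.

    The learned encoder defines the target codes for phase 2; sizes,
    learning rate, and the linear form are illustrative choices.
    """
    n, d = labels.shape
    rng = np.random.default_rng(0)
    W_enc = rng.normal(scale=0.1, size=(d, code_dim))
    W_dec = rng.normal(scale=0.1, size=(code_dim, d))
    for _ in range(steps):
        code = labels @ W_enc                          # encode labels
        err = code @ W_dec - labels                    # reconstruction residual
        W_dec -= lr * code.T @ err / n                 # gradient step, decoder
        W_enc -= lr * labels.T @ (err @ W_dec.T) / n   # gradient step, encoder
    return W_enc, W_dec

def auxiliary_loss(features, W_aux, labels, W_enc):
    """Phase 2 regularizer: an auxiliary branch (here one linear map,
    W_aux) must predict the frozen label codes. The branch is discarded
    after training, so the regularizer adds zero cost at test time."""
    target = labels @ W_enc                            # frozen label codes
    return float(((features @ W_aux - target) ** 2).mean())
```

During phase 2 this auxiliary loss would be added to the main segmentation loss; because only the main branch survives, inference is unchanged, matching the abstract's "essentially free" claim.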