Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Machine learning techniques are now integral to the advancement of
intelligent urban services, playing a crucial role in elevating the efficiency,
sustainability, and livability of urban environments. The recent emergence of
foundation models such as ChatGPT marks a revolutionary shift in the fields of
machine learning and artificial intelligence. Their unparalleled capabilities
in contextual understanding, problem solving, and adaptability across a wide
range of tasks suggest that integrating these models into urban domains could
have a transformative impact on the development of smart cities. Despite
growing interest in Urban Foundation Models (UFMs), this burgeoning field faces
challenges such as a lack of clear definitions, systematic reviews, and
universalizable solutions. To this end, this paper first introduces the concept
of UFMs and discusses the unique challenges involved in building them. We then
propose a data-centric taxonomy that categorizes current UFM-related works,
based on urban data modalities and types. Furthermore, to foster advancement in
this field, we present a promising framework aimed at the prospective
realization of UFMs, designed to overcome the identified challenges.
Additionally, we explore the application landscape of UFMs, detailing their
potential impact in various urban contexts. Relevant papers and open-source
resources have been collated and are continuously updated at
https://github.com/usail-hkust/Awesome-Urban-Foundation-Models
Domain Randomization and Generative Models for Robotic Grasping
Deep learning-based robotic grasping has made significant progress thanks to
algorithmic improvements and increased data availability. However,
state-of-the-art models are often trained on as few as hundreds or thousands of
unique object instances, and as a result generalization can be a challenge.
In this work, we explore a novel data generation pipeline for training a deep
neural network to perform grasp planning that applies the idea of domain
randomization to object synthesis. We generate millions of unique, unrealistic
procedurally generated objects, and train a deep neural network to perform
grasp planning on these objects.
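To make the idea of domain randomization applied to object synthesis concrete, here is a minimal sketch of a procedural generator for random, unrealistic objects. It is not the authors' pipeline: the box-primitive representation, the `Box`, `random_object` and `generate_dataset` names, and all size parameters are hypothetical choices for illustration. Each object is a union of 1 to `max_parts` overlapping axis-aligned boxes, and seeding makes the object set reproducible.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """Hypothetical primitive: axis-aligned box given by center and half-extents (metres)."""
    center: tuple
    half_extents: tuple

def random_object(rng, max_parts=8, min_half=0.005, max_half=0.05):
    """Compose one random, deliberately unrealistic object from 1..max_parts overlapping boxes."""
    def random_half():
        return tuple(rng.uniform(min_half, max_half) for _ in range(3))

    parts = [Box((0.0, 0.0, 0.0), random_half())]
    for _ in range(rng.randint(0, max_parts - 1)):
        anchor = rng.choice(parts)
        # Center the new box inside an existing box so that all parts stay connected.
        center = tuple(c + rng.uniform(-h, h)
                       for c, h in zip(anchor.center, anchor.half_extents))
        parts.append(Box(center, random_half()))
    return parts

def generate_dataset(n_objects, seed=0):
    """Seeded generator: the same seed always yields the same set of objects."""
    rng = random.Random(seed)
    return [random_object(rng) for _ in range(n_objects)]

objects = generate_dataset(1000, seed=42)
```

Scaling `n_objects` into the millions only costs generation time; in a real pipeline each part list would still have to be converted to a mesh and loaded into a simulator, which this sketch leaves out.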
Since the distribution of successful grasps for a given object can be highly
multimodal, we propose an autoregressive grasp planning model that maps sensor
inputs of a scene to a probability distribution over possible grasps. This
model allows us to sample grasps efficiently at test time (or avoid sampling
entirely).
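The autoregressive idea can be illustrated with a toy model that factorizes a distribution over discretized grasps as p(g | obs) = ∏_d p(g_d | g_<d, obs). This is a sketch under stated assumptions, not the paper's network: the three-dimensional grasp parameterization, `N_BINS`, `N_DIMS`, the class name, and the random linear heads standing in for a trained model are all hypothetical. `decode` either samples one dimension at a time or, with no random generator, picks the most likely bin at each step, which is one way to avoid sampling.

```python
import numpy as np

N_BINS = 16  # hypothetical number of bins per grasp dimension
N_DIMS = 3   # hypothetical grasp parameterization, e.g. (x bin, y bin, angle bin)

class ToyAutoregressiveGraspModel:
    """p(g | obs) = prod_d p(g_d | g_<d, obs); each factor is a softmax over N_BINS bins."""

    def __init__(self, obs_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One random linear head per dimension, standing in for a trained network.
        # Head d sees the observation features plus one-hot codes of the d earlier choices.
        self.heads = [rng.normal(scale=0.5, size=(N_BINS, obs_dim + d * N_BINS))
                      for d in range(N_DIMS)]

    def conditional(self, obs, prefix):
        """Return p(g_d | g_<d = prefix, obs) as a probability vector over bins."""
        d = len(prefix)
        onehots = np.zeros(d * N_BINS)
        for i, b in enumerate(prefix):
            onehots[i * N_BINS + b] = 1.0
        logits = self.heads[d] @ np.concatenate([obs, onehots])
        p = np.exp(logits - logits.max())  # subtract the max for numerical stability
        return p / p.sum()

    def log_prob(self, obs, grasp):
        """Log-probability of a full grasp: sum of the per-dimension conditional log-probs."""
        return sum(float(np.log(self.conditional(obs, grasp[:d])[grasp[d]]))
                   for d in range(N_DIMS))

    def decode(self, obs, rng=None):
        """Sample one dimension at a time if rng is given; otherwise take the argmax at each step."""
        grasp = []
        for _ in range(N_DIMS):
            p = self.conditional(obs, grasp)
            b = int(rng.choice(N_BINS, p=p)) if rng is not None else int(np.argmax(p))
            grasp.append(b)
        return grasp

obs = np.random.default_rng(1).normal(size=8)  # hypothetical scene features
model = ToyAutoregressiveGraspModel(obs_dim=8)
greedy = model.decode(obs)
sampled = model.decode(obs, rng=np.random.default_rng(2))
```

Because each factor is normalized, the product defines a proper distribution over all N_BINS^N_DIMS grasps, so exact sampling needs only N_DIMS forward passes. Note that step-wise argmax is a cheap heuristic and is not guaranteed to find the single most likely joint grasp.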
We evaluate our model architecture and data generation pipeline in simulation
and the real world. We achieve a 90% success rate on previously
unseen realistic objects at test time in simulation despite having only been
trained on random objects. We also demonstrate an 80% success rate on
real-world grasp attempts despite having only been trained on random simulated
objects.
Comment: 8 pages, 11 figures. Submitted to 2018 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS 2018)
PowMix: A Versatile Regularizer for Multimodal Sentiment Analysis
Multimodal sentiment analysis (MSA) leverages heterogeneous data sources to
interpret the complex nature of human sentiments. Despite significant progress
in multimodal architecture design, the field lacks comprehensive regularization
methods. This paper introduces PowMix, a versatile embedding space regularizer
that builds upon the strengths of unimodal mixing-based regularization
approaches and introduces novel algorithmic components that are specifically
tailored to multimodal tasks. PowMix is integrated before the fusion stage of
multimodal architectures and facilitates intra-modal mixing, such as mixing
text with text, to act as a regularizer. PowMix consists of five components: 1)
a varying number of generated mixed examples, 2) mixing factor reweighting, 3)
anisotropic mixing, 4) dynamic mixing, and 5) cross-modal label mixing.
Extensive experiments across benchmark MSA datasets and a broad spectrum of
architectural designs demonstrate the efficacy of PowMix, as evidenced
by consistent performance improvements over baselines and existing mixing
methods. An in-depth ablation study highlights the critical contribution of
each PowMix component and how they synergistically enhance performance.
Furthermore, algorithmic analysis demonstrates how PowMix behaves in different
scenarios, particularly comparing early versus late fusion architectures.
Notably, PowMix enhances overall performance without sacrificing model
robustness or magnifying text dominance. It also retains its strong performance
in situations of limited data. Our findings position PowMix as a promising
versatile regularization strategy for MSA. Code will be made available.
Comment: Preprint
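For readers unfamiliar with intra-modal mixing before fusion, the sketch below shows a plain mixup-style baseline applied separately within each modality. It is not PowMix: it omits mixing factor reweighting, anisotropic mixing, dynamic mixing, and cross-modal label mixing, whose exact definitions the abstract does not give. The function name, the Beta(`alpha`, `alpha`) mixing factor, and the continuous sentiment labels are assumptions. `n_mixed` controls how many synthetic examples are generated per batch.

```python
import numpy as np

def intra_modal_mixup(embeddings, labels, n_mixed, alpha=0.4, rng=None):
    """Create n_mixed synthetic examples by mixing pairs of samples within each modality.

    embeddings: dict mapping modality name -> array of shape (batch, dim_m),
                the unimodal embeddings taken before the fusion stage.
    labels:     array of shape (batch,) with continuous sentiment scores.
    """
    if rng is None:
        rng = np.random.default_rng()
    batch = labels.shape[0]
    i = rng.integers(0, batch, size=n_mixed)    # first sample of each pair
    j = rng.integers(0, batch, size=n_mixed)    # second sample of each pair
    lam = rng.beta(alpha, alpha, size=n_mixed)  # one mixing factor per synthetic example
    # Text is mixed only with text, audio only with audio, etc.; the same (i, j, lam)
    # is reused across modalities so each mixed example keeps one coherent label.
    mixed = {m: lam[:, None] * e[i] + (1.0 - lam[:, None]) * e[j]
             for m, e in embeddings.items()}
    mixed_labels = lam * labels[i] + (1.0 - lam) * labels[j]
    return mixed, mixed_labels

rng = np.random.default_rng(0)
labels = rng.uniform(-3.0, 3.0, size=32)  # hypothetical sentiment scores
embeddings = {"text": rng.normal(size=(32, 64)), "audio": rng.normal(size=(32, 16))}
mixed, mixed_labels = intra_modal_mixup(embeddings, labels, n_mixed=48, rng=rng)
```

The mixed embeddings would then be passed through the model's fusion stage alongside the original batch, with `mixed_labels` as their regression targets.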