56 research outputs found
A Review on Objective-Driven Artificial Intelligence
While advancing rapidly, Artificial Intelligence still falls short of human
intelligence in several key aspects due to inherent limitations in current AI
technologies and our understanding of cognition. Humans have an innate ability
to understand context, nuances, and subtle cues in communication, which allows
us to comprehend jokes, sarcasm, and metaphors; machines struggle to interpret
such contextual information accurately. Humans also possess a vast repository of
common-sense knowledge that helps us make logical inferences and predictions
about the world, whereas machines lack this innate understanding and often
struggle to make sense of situations that humans find trivial. In this article,
we review prospective Machine Intelligence candidates, drawing on a review from
Prof. Yann LeCun and other work that can help close this gap between human and
machine intelligence. Specifically, we discuss what is lacking in current AI
techniques such as supervised learning, reinforcement learning, and
self-supervised learning. We then show how hierarchical planning-based
approaches can help close that gap, and dive deep into energy-based methods,
latent-variable methods, and Joint Embedding Predictive Architecture (JEPA) methods.
Comment: 5 pages, 5 figures, workshop submission
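As a rough illustration of the energy-based, joint-embedding idea mentioned above, here is a minimal sketch; all module sizes and names are assumptions for exposition, not the architectures reviewed. Two encoders map inputs to latents, a predictor maps one latent toward the other, and the energy of a pair is the distance between predicted and actual latents.

```python
# Minimal JEPA-style sketch (illustrative, not the reviewed architectures):
# low energy means the pair (x, y) is compatible under the learned predictor.
import torch
import torch.nn as nn

class TinyJEPA(nn.Module):
    def __init__(self, dim_in=128, dim_z=32):
        super().__init__()
        self.enc_x = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_z))
        self.enc_y = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_z))
        self.pred = nn.Linear(dim_z, dim_z)  # predicts y's latent from x's latent

    def energy(self, x, y):
        zx, zy = self.enc_x(x), self.enc_y(y)
        return ((self.pred(zx) - zy) ** 2).sum(dim=-1)  # squared latent distance

model = TinyJEPA()
x, y = torch.randn(4, 128), torch.randn(4, 128)
print(model.energy(x, y))  # one scalar energy per pair in the batch
```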
Training Strategies for Vision Transformers for Object Detection
Vision-based Transformers have shown wide application in the perception module
of autonomous driving, predicting accurate 3D bounding boxes owing to their
strong capability in modeling long-range dependencies between visual features.
However, Transformers, initially designed for language models, have mostly
focused on accuracy and not so much on the inference-time budget. For a
safety-critical system like autonomous driving, real-time inference on the
on-board compute is an absolute necessity, which keeps the object detection
algorithm under a very tight run-time budget. In this paper, we evaluate a
variety of strategies to optimize the inference time of vision-transformer-based
object detection methods while keeping a close watch on any performance
variations; our chosen metric for these strategies is joint accuracy-runtime
optimization. Moreover, for actual inference-time analysis we profile our
strategies at float32 and float16 precision with the TensorRT module, the
format most commonly used by industry to deploy machine learning networks on
edge devices. We show that our strategies improve inference time by 63% at the
cost of a mere 3% performance drop for the problem statement defined in the
evaluation section. These strategies bring the inference time of Vision
Transformer detectors below even that of traditional single-image CNN detectors
like FCOS. We recommend practitioners use these techniques to deploy hefty
Transformer-based multi-view networks on budget-constrained robotic platforms.
Comment: 9 pages, 2 figures, IEEE CVPR WAD'23 conference
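As a concrete illustration of the float32/float16 TensorRT profiling workflow described above, here is a hedged sketch; the model, file names, and shapes are placeholders rather than the paper's actual detector.

```python
# Hedged sketch: export a network to ONNX, then build and time TensorRT
# engines at float32 and float16. trtexec ships with TensorRT and reports
# per-engine latency, giving the accuracy-runtime trade-off measurement.
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()  # stand-in for a ViT detector
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "model.onnx", opset_version=13,
                  input_names=["input"], output_names=["logits"])

# Shell commands (run outside Python):
#   trtexec --onnx=model.onnx --saveEngine=model_fp32.engine          # float32 baseline
#   trtexec --onnx=model.onnx --fp16 --saveEngine=model_fp16.engine   # float16 variant
```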
Vision-RADAR fusion for Robotics BEV Detections: A Survey
Due to the growing need to build autonomous robotic perception systems, sensor
fusion has attracted a lot of attention amongst researchers and engineers
seeking to make the best use of cross-modality information. However, to build a
robotic platform at scale, we also need to consider the platform's bring-up
cost. Cameras and radars, which inherently capture complementary perception
information, have the potential to enable autonomous robotic platforms at
scale. However, there is limited work on radar fused with vision, compared to
work on LiDAR fused with vision. In this paper, we tackle this gap with a
survey of vision-radar fusion approaches for a BEV object detection system.
First, we go through the background information, viz. object detection tasks,
choice of sensors, sensor setup, benchmark datasets, and evaluation metrics for
a robotic perception system. We then cover per-modality (camera and RADAR) data
representations, and detail sensor fusion techniques grouped into early fusion,
deep fusion, and late fusion so the pros and cons of each method are easy to
understand. Finally, we propose possible future trends for vision-radar fusion
to inform future research. A regularly updated summary can be found at:
https://github.com/ApoorvRoboticist/Vision-RADAR-Fusion-BEV-Survey
Comment: 6 pages, 6 figures, 2 tables
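To make the early/deep/late distinction concrete, here is a minimal sketch; the tensors, shapes, and modules are illustrative assumptions, not drawn from any specific surveyed method.

```python
# Illustrative sketch of the three fusion levels surveyed above.
import torch
import torch.nn as nn

cam_feat = torch.randn(1, 64, 32, 32)   # camera BEV features (hypothetical)
rad_feat = torch.randn(1, 16, 32, 32)   # radar BEV features (hypothetical)

# Early fusion: concatenate modality inputs, then run one shared backbone.
early = nn.Conv2d(64 + 16, 128, 3, padding=1)(torch.cat([cam_feat, rad_feat], dim=1))

# Deep fusion: per-modality backbones, then fuse intermediate features.
cam_mid = nn.Conv2d(64, 128, 3, padding=1)(cam_feat)
rad_mid = nn.Conv2d(16, 128, 3, padding=1)(rad_feat)
deep = cam_mid + rad_mid  # e.g. sum, concat, or learned attention over modalities

# Late fusion: run detectors independently, then merge box lists (e.g. NMS).
cam_boxes = [(10.0, 5.0, 0.9)]          # (x, y, score) from the camera head
rad_boxes = [(10.2, 5.1, 0.7)]          # (x, y, score) from the radar head
late = sorted(cam_boxes + rad_boxes, key=lambda b: -b[-1])  # then deduplicate
print(early.shape, deep.shape, late)
```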
End-to-end Autonomous Driving using Deep Learning: A Systematic Review
End-to-end autonomous driving is a fully differentiable machine learning
system that takes raw sensor input data and other metadata as prior information
and directly outputs the ego vehicle's control signals or planned trajectories.
This paper attempts to systematically review all recent Machine Learning-based
techniques to perform this end-to-end task, including, but not limited to,
object detection, semantic scene understanding, object tracking, trajectory
predictions, trajectory planning, vehicle control, social behavior, and
communications. This paper focuses on recent fully differentiable end-to-end
reinforcement learning and deep learning-based techniques. Our paper also
builds taxonomies of the significant approaches by sub-grouping them and
showcasing their research trends. Finally, this survey highlights the open
challenges and points out possible future directions to enlighten further
research on the topic.
Comment: 11 pages, 6 figures, submitted to the WACV conference
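As a minimal illustration of what "fully differentiable, raw sensor input to control signals" means in code, consider the following sketch; the architecture and loss are assumptions for exposition, not any specific surveyed system.

```python
# Minimal end-to-end driving sketch: camera image in, controls out,
# trainable end to end with gradients flowing from control loss to pixels.
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # perception: image -> features
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32, 2)             # control: steering, throttle

    def forward(self, img):
        return self.head(self.encoder(img))

driver = EndToEndDriver()
controls = driver(torch.randn(1, 3, 128, 256))
loss = ((controls - torch.tensor([[0.1, 0.5]])) ** 2).mean()  # imitation-style loss
loss.backward()  # every step is differentiable, sensor input to control output
```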
Construction Sites Safety in India: An Assessment Through Eyes of Workers
The construction industry has been the backbone of a nation's development processes and economy. It is one of the most hazardous industries, not by severity ratio but by occurrence ratio. It is the largest employer of workers after the agriculture industry, making it more prone to accidents. Environment, Health & Safety (EHS) is an area that covers every profession, and it is an essential area of the industry. The right knowledge about it can lead to human lives being saved, which is more important than loss of property. EHS empowers a worker regarding his work, his conduct, and his motives; it allows a worker to be more aware, more cautious, and more productive. Psychological analysis can empower a worker to be more effective and productive, and it increases workers' will power: 'where there is a will, there is a way'. This thesis assesses construction sites in India through the eyes of a worker, which leads us to various revelations on the sites and thus portrays how workers' welfare is being taken care of in the industry. The study shows how a worker goes through the various induction processes and training modules that educate him about EHS. The report also identifies the voids that have been left untouched and which play a significant role in workers' safety; these voids are addressed in this report, and solutions are suggested for them. The study was conducted on an observational basis, leading to a psychological analysis of workers, their understanding of safety policies, their active participation in safety meetings, and what the training imparted to them means to them. The psychological study answers the 5 W's (why, where, whom, who, and whose) of accidents. The study recommends improving and organising worker-to-safety-engineer talks.
Surround-View Vision-based 3D Detection for Autonomous Driving: A Survey
The vision-based 3D detection task is a fundamental task for the perception of
an autonomous driving system, and it has piqued the interest of many
researchers and autonomous driving engineers. However, achieving rather good
3D BEV (Bird's Eye View) performance is not an easy task using 2D sensor input
data from cameras. In this paper we provide a literature survey of the existing
vision-based 3D detection methods, focused on autonomous driving. We provide a
detailed analysis of the surveyed papers leveraging vision BEV detection
approaches and highlight different sub-groups for a detailed understanding of
common trends. Moreover, we highlight how the literature and industry trends
have moved towards surround-view image-based methods and note down thoughts on
what special cases this line of methods addresses. In conclusion, we provoke
thoughts on 3D vision techniques for future research based on the shortcomings
of the current techniques, including the direction of collaborative perception.
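For readers new to vision BEV detection, the following small sketch shows the pinhole geometry that underlies relating camera pixels to the BEV plane; the intrinsics and extrinsics are made-up values, and surround-view methods apply this per camera.

```python
# Project a 3D point in the ego frame into one camera image using
# hypothetical intrinsics K and extrinsics [R|t] (ego -> camera).
import numpy as np

K = np.array([[800.0, 0, 640], [0, 800.0, 360], [0, 0, 1]])  # intrinsics (made up)
R, t = np.eye(3), np.array([0.0, 0.0, -1.5])                 # extrinsics (made up)

p_ego = np.array([2.0, 0.5, 12.0])   # a point 12 m ahead in the ego frame
p_cam = R @ p_ego + t                # transform into the camera frame
u, v, w = K @ p_cam                  # pinhole projection
print(u / w, v / w)                  # pixel coordinates of the 3D point
```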
3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection
3D visual perception tasks based on multi-camera images are essential for
autonomous driving systems. The latest work in this field performs 3D object
detection by taking multi-view images as input and iteratively enhancing
object queries (object proposals) by cross-attending multi-view features.
However, the individual backbone features are not updated with multi-view
information and remain a mere collection of the outputs of the single-image
backbone network. Therefore, we propose 3M3D, a Multi-view, Multi-path,
Multi-representation approach for 3D object detection in which we update both
multi-view features and query features to enhance the representation of the
scene in both a fine panoramic view and a coarse global view. Firstly, we
update multi-view features by multi-view-axis self-attention, which
incorporates panoramic information into the multi-view features and enhances
understanding of the global scene. Secondly, we update multi-view features by
self-attention over ROI (Region of Interest) windows, which encodes local finer
details in the features; this helps exchange information not only along the
multi-view axis but also along the other spatial dimensions. Lastly, we
leverage multiple representations of queries in different domains to further
boost performance: we use sparse floating queries along with dense BEV (Bird's
Eye View) queries, which are later post-processed to filter duplicate
detections. We show performance improvements on the nuScenes benchmark dataset
on top of our baselines.
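A hedged sketch of the multi-view-axis self-attention idea: treat the same spatial location across the N camera views as a sequence and self-attend along the view axis. The shapes and modules below are illustrative, not the paper's implementation.

```python
# Multi-view-axis self-attention sketch: per-location attention across views.
import torch
import torch.nn as nn

N, HW, C = 6, 32 * 32, 256              # 6 views, flattened spatial dim, channels
feats = torch.randn(N, HW, C)           # per-view backbone features

attn = nn.MultiheadAttention(embed_dim=C, num_heads=8, batch_first=True)
seq = feats.permute(1, 0, 2)            # (HW, N, C): one view-sequence per location
fused, _ = attn(seq, seq, seq)          # attend along the multi-view axis
feats = (seq + fused).permute(1, 0, 2)  # residual update back to (N, HW, C)
print(feats.shape)                      # views now carry panoramic context
```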
Regularized spectral methods for clustering signed networks
We study the problem of k-way clustering in signed graphs. Considerable
attention in recent years has been devoted to analyzing and modeling signed
graphs, where the affinity measure between nodes takes either positive or
negative values. Recently, Cucuringu et al. [CDGT 2019] proposed a spectral
method, namely SPONGE (Signed Positive over Negative Generalized Eigenproblem),
which casts the clustering task as a generalized eigenvalue problem optimizing
a suitably defined objective function. This approach is motivated by social
balance theory, where the clustering task aims to decompose a given network
into disjoint groups, such that individuals within the same group are connected
by as many positive edges as possible, while individuals from different groups
are mainly connected by negative edges. Through extensive numerical
simulations, SPONGE was shown to achieve state-of-the-art empirical
performance. On the theoretical front, [CDGT 2019] analyzed SPONGE and the
popular Signed Laplacian method under the setting of a Signed Stochastic Block
Model (SSBM), for equal-sized clusters, in the regime where the graph is
moderately dense.
In this work, we build on the results in [CDGT 2019] on two fronts for the
normalized versions of SPONGE and the Signed Laplacian. Firstly, for both
algorithms, we extend the theoretical analysis in [CDGT 2019] to the general
setting of unequal-sized clusters in the moderately dense regime.
Secondly, we introduce regularized versions of both methods to handle sparse
graphs -- a regime where standard spectral methods underperform -- and provide
theoretical guarantees under the same SSBM model. To the best of our knowledge,
regularized spectral methods have so far not been considered in the setting of
clustering signed graphs. We complement our theoretical results with an
extensive set of numerical experiments on synthetic data.
Comment: 55 pages, 5 figures
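To make the regularization idea concrete, here is an illustrative sketch of regularized spectral clustering on a signed graph, in the spirit of (but not identical to) the SPONGE and Signed Laplacian methods analyzed above; the parameter tau and the toy graph are assumptions for the demo.

```python
# Regularized signed spectral clustering sketch: tau-regularize the signed
# degrees, take the k smallest eigenvectors of the normalized signed
# Laplacian, then run k-means on the rows of the spectral embedding.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def regularized_signed_spectral(A, k, tau=1.0):
    # A: symmetric signed adjacency matrix with entries in {-1, 0, +1}
    d = np.abs(A).sum(axis=1) + tau        # tau-regularized signed degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized signed Laplacian
    _, vecs = eigh(L_sym)                  # eigenvalues in ascending order
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs[:, :k])

# Toy SSBM-like example: two clusters, positive within, negative across.
A = np.block([[np.ones((5, 5)), -np.ones((5, 5))],
              [-np.ones((5, 5)), np.ones((5, 5))]])
np.fill_diagonal(A, 0)
print(regularized_signed_spectral(A, k=2))  # recovers the two planted groups
```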