Towards Autonomous Selective Harvesting: A Review of Robot Perception, Robot Design, Motion Planning and Control
This paper provides an overview of the current state-of-the-art in selective
harvesting robots (SHRs) and their potential for addressing the challenges of
global food production. SHRs have the potential to increase productivity,
reduce labour costs, and minimise food waste by selectively harvesting only
ripe fruits and vegetables. The paper discusses the main components of SHRs,
including perception, grasping, cutting, motion planning, and control. It also
highlights the challenges in developing SHR technologies, particularly in the
areas of robot design, motion planning, and control. It also discusses the
potential benefits of integrating AI, soft robotics, and data-driven methods
to enhance the performance and robustness of SHR systems. Finally, the paper
identifies several open research questions in the field and highlights the
need for further research and development efforts to advance SHR technologies
to meet the challenges of global food production. Overall, this paper provides
a starting point for researchers and practitioners interested in developing SHRs.
Comment: Preprint; to appear in the Journal of Field Robotics
One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is
demonstrated to be one small step for generative AI (GAI), but one giant leap
for artificial general intelligence (AGI). Since its official release in
November 2022, ChatGPT has quickly attracted numerous users with extensive
media coverage. Such unprecedented attention has also motivated numerous
researchers to investigate ChatGPT from various aspects. According to Google
Scholar, there are more than 500 articles with ChatGPT in their titles or
mentioning it in their abstracts. Considering this, a review is urgently
needed, and our work fills this gap. Overall, this work is the first to survey
ChatGPT with a comprehensive review of its underlying technology, applications,
and challenges. Moreover, we present an outlook on how ChatGPT might evolve to
realize general-purpose AIGC (a.k.a. AI-generated content), which will be a
significant milestone for the development of AGI.
Comment: A survey on ChatGPT and GPT-4, 29 pages. Feedback is appreciated ([email protected])
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
Automatic speech recognition (ASR) has gained a remarkable success thanks to
recent advances of deep learning, but it usually degrades significantly under
real-world noisy conditions. Recent works introduce speech enhancement (SE) as a
front-end to improve speech quality, which has proved effective but may not be
optimal for downstream ASR due to the speech-distortion problem. Building on
this, the latest works combine SE and currently popular self-supervised learning
(SSL) to alleviate distortion and improve noise robustness. Despite their
effectiveness, the speech distortion caused by conventional SE still cannot be
completely eliminated. In this paper, we propose a self-supervised framework
named Wav2code to implement a generalized SE without distortions for
noise-robust ASR. First, in the pre-training stage, the clean speech
representations from the SSL model are used to look up a discrete codebook via
nearest-neighbor feature matching; the resulting code sequence is then used to
reconstruct the original clean representations, storing them in the codebook as
a prior.
Second, during fine-tuning, we propose a Transformer-based code predictor to
accurately predict clean codes by modeling the global dependency of input noisy
representations, which enables discovery and restoration of high-quality clean
representations without distortions. Furthermore, we propose an interactive
feature fusion network to combine original noisy and the restored clean
representations to consider both fidelity and quality, resulting in even more
informative features for downstream ASR. Finally, experiments on both synthetic
and real noisy datasets demonstrate that Wav2code mitigates speech distortion
and improves ASR performance under various noisy conditions, yielding stronger
robustness.
Comment: 12 pages, 7 figures. Submitted to IEEE/ACM TASL
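The pre-training step above is, in effect, a VQ-style nearest-neighbor quantization. A minimal sketch of the lookup, with a toy hand-written codebook and 2-D "frames" standing in for SSL representations (all names and numbers are illustrative, not the paper's):

```python
import math

def nearest_code(frame, codebook):
    """Return the index of the codebook entry closest to `frame` (Euclidean)."""
    best_i, best_d = 0, float("inf")
    for i, code in enumerate(codebook):
        d = math.dist(frame, code)
        if d < best_d:
            best_i, best_d = i, d
    return best_i

def quantize(features, codebook):
    """Map each feature frame to its nearest discrete code index."""
    return [nearest_code(f, codebook) for f in features]

# Toy codebook of clean "prototype" representations.
CODEBOOK = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
noisy_frames = [(0.1, -0.2), (0.9, 1.1)]
print(quantize(noisy_frames, CODEBOOK))  # -> [0, 1]
```

In the actual framework the codebook entries are learned so that decoding the code sequence reconstructs clean representations; this sketch only shows the matching step.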
Concept Graph Neural Networks for Surgical Video Understanding
We constantly integrate our knowledge and understanding of the world to
enhance our interpretation of what we see.
This ability is crucial in application domains which entail reasoning about
multiple entities and concepts, such as AI-augmented surgery. In this paper, we
propose a novel way of integrating conceptual knowledge into temporal analysis
tasks via temporal concept graph networks. In the proposed networks, a global
knowledge graph is incorporated into the temporal analysis of surgical
instances, learning the meaning of concepts and relations as they apply to the
data. We demonstrate our results in surgical video data for tasks such as
verification of the critical view of safety, as well as estimation of the
Parkland grading scale. The results show that our method improves recognition
and detection on complex benchmarks and enables other analytic applications of
interest.
Ausubel's meaningful learning re-visited
This review provides a critique of David Ausubel’s theory of meaningful learning and the use of advance organizers in teaching. It takes into account the developments in cognition and neuroscience which have taken place in the 50 or so years since he advanced his ideas, developments which challenge our understanding of cognitive structure and the recall of prior learning. These include (i) how effective questioning to ascertain previous knowledge necessitates in-depth Socratic dialogue; (ii) how many findings in cognition and neuroscience indicate that memory may be non-representational, thereby affecting our interpretation of student recollections; (iii) the now recognised dynamism of memory; (iv) usefully regarding concepts as abilities or simulators and skills; (v) acknowledging conscious and unconscious memory and imagery; (vi) how conceptual change involves conceptual coexistence and revision; (vii) noting linguistic and neural pathways as a result of experience and neural selection; and (viii) recommending that wider concepts of scaffolding should be adopted, particularly given the increasing focus on collaborative learning in a technological world.
Semantic Segmentation Enhanced Transformer Model for Human Attention Prediction
Saliency Prediction aims to predict the attention distribution of human eyes
given an RGB image. Most of the recent state-of-the-art methods are based on
deep image feature representations from traditional CNNs. However, traditional
convolutions cannot capture the global features of an image well due to their
small kernel size. Moreover, high-level factors that closely correlate with
human visual perception, e.g., objects, color, and light, are not considered.
Motivated by this, we propose a Transformer-based method with
semantic segmentation as another learning objective. More global cues of the
image can be captured by the Transformer. In addition, simultaneously learning
object segmentation simulates human visual perception, which we verify in our
investigation of human gaze control in cognitive science. We build an extra
decoder for the subtask, and the multiple tasks share the same Transformer
encoder, forcing it to learn from multiple feature spaces. We find in practice
that simply adding the subtask might confuse the main-task learning; hence, a
Multi-task Attention Module is proposed to handle the feature interaction
between the multiple learning targets. Our method achieves competitive
performance compared to other state-of-the-art methods.
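For context, the naive baseline that the Multi-task Attention Module improves on is a fixed weighted sum of the two objectives; a minimal sketch (the `aux_weight` knob is hypothetical, not a value from the paper):

```python
def multitask_loss(saliency_loss, seg_loss, aux_weight=0.5):
    """Naive multi-task objective: main (saliency) loss plus a fixed-weight
    auxiliary (segmentation) loss. The paper instead mediates the feature
    interaction with a learned Multi-task Attention Module."""
    return saliency_loss + aux_weight * seg_loss

print(multitask_loss(1.0, 0.4))  # -> 1.2
```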
Neural Architecture Search: Insights from 1000 Papers
In the past decade, advances in deep learning have resulted in breakthroughs
in a variety of areas, including computer vision, natural language
understanding, speech recognition, and reinforcement learning. Specialized,
high-performing neural architectures are crucial to the success of deep
learning in these areas. Neural architecture search (NAS), the process of
automating the design of neural architectures for a given task, is an
inevitable next step in automating machine learning and has already outpaced
the best human-designed architectures on many tasks. In the past few years,
research in NAS has been progressing rapidly, with over 1000 papers released
since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized
and comprehensive guide to neural architecture search. We give a taxonomy of
search spaces, algorithms, and speedup techniques, and we discuss resources
such as benchmarks, best practices, other surveys, and open-source libraries.
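The simplest NAS algorithm that such surveys use as a baseline is random search over a discrete search space; a minimal sketch (the search space and the proxy `evaluate` score are toy stand-ins, since a real run would train and validate each candidate):

```python
import random

# Toy architecture search space (illustrative, not from any benchmark).
SEARCH_SPACE = {
    "depth": [2, 4, 6],
    "width": [64, 128, 256],
    "activation": ["relu", "gelu"],
}

def sample():
    """Draw one candidate architecture uniformly from the search space."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(arch):
    """Placeholder proxy score; a real NAS run trains the candidate."""
    return arch["depth"] * arch["width"]

def random_search(trials=10, seed=0):
    """Return the best of `trials` uniformly sampled architectures."""
    random.seed(seed)
    return max((sample() for _ in range(trials)), key=evaluate)
```

More sophisticated NAS methods replace the uniform sampler with evolutionary, reinforcement-learning, or gradient-based strategies, and replace full training with speedup techniques such as weight sharing or performance prediction.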
Information-Theoretic GAN Compression with Variational Energy-based Model
We propose an information-theoretic knowledge distillation approach for the
compression of generative adversarial networks, which aims to maximize the
mutual information between teacher and student networks via a variational
optimization based on an energy-based model. Because the direct computation of
the mutual information in continuous domains is intractable, our approach
alternatively optimizes the student network by maximizing the variational lower
bound of the mutual information. To achieve a tight lower bound, we introduce
an energy-based model relying on a deep neural network to represent a flexible
variational distribution that deals with high-dimensional images and consider
spatial dependencies between pixels, effectively. Since the proposed method is
a generic optimization algorithm, it can be conveniently incorporated into
arbitrary generative adversarial networks and even dense prediction networks,
e.g., image enhancement models. We demonstrate that the proposed algorithm
achieves outstanding performance in model compression of generative adversarial
networks consistently when combined with several existing models.Comment: Accepted at Neurips202
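The variational lower bound referred to above is, in all likelihood, the standard Barber–Agakov bound. Writing $X_t$ and $X_s$ for the teacher and student representations and $q$ for the variational distribution (here represented by the energy-based model), non-negativity of the KL divergence between $p(x_t \mid x_s)$ and $q(x_t \mid x_s)$ gives:

```latex
I(X_t; X_s) = H(X_t) - H(X_t \mid X_s)
            \ge H(X_t) + \mathbb{E}_{p(x_t, x_s)}\!\left[\log q(x_t \mid x_s)\right]
```

Since $H(X_t)$ is fixed by the teacher, maximizing the expectation term over the student and $q$ tightens the bound, with equality when $q(x_t \mid x_s) = p(x_t \mid x_s)$.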
CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition
We present CrossLoc3D, a novel 3D place recognition method that solves a
large-scale point matching problem in a cross-source setting. Cross-source
point cloud data corresponds to point sets captured by depth sensors with
different accuracies or from different distances and perspectives. We address
the challenges in terms of developing 3D place recognition methods that account
for the representation gap between points captured by different sources. Our
method handles cross-source data by utilizing multi-grained features and
selecting convolution kernel sizes that correspond to the most prominent
features. Inspired by diffusion models, our method uses a novel iterative
refinement
process that gradually shifts the embedding spaces from different sources to a
single canonical space for better metric learning. In addition, we present
CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of
point cloud data from both aerial and ground LiDAR scans. The point clouds in
CS-Campus3D have representation gaps and other features like different views,
point densities, and noise patterns. We show that our CrossLoc3D algorithm can
achieve an improvement of 4.74% - 15.37% in terms of the top 1 average recall
on our CS-Campus3D benchmark and achieves performance comparable to
state-of-the-art 3D place recognition methods on the Oxford RobotCar dataset.
We will release the code and the CS-Campus3D benchmark.
Single Image Depth Prediction Made Better: A Multivariate Gaussian Take
Neural-network-based single image depth prediction (SIDP) is a challenging
task where the goal is to predict the scene's per-pixel depth at test time.
Since the problem, by definition, is ill-posed, the fundamental goal is to come
up with an approach that can reliably model the scene depth from a set of
training examples. In the pursuit of perfect depth estimation, most existing
state-of-the-art learning techniques predict a single scalar depth value
per-pixel. Yet, it is well-known that the trained model has accuracy limits and
can predict imprecise depth. Therefore, an SIDP approach must be mindful of the
expected depth variations in the model's prediction at test time. Accordingly,
we introduce an approach that performs continuous modeling of per-pixel depth,
where we can predict and reason about the per-pixel depth and its distribution.
To this end, we model per-pixel scene depth using a multivariate Gaussian
distribution. Moreover, contrary to existing uncertainty modeling methods in
the same spirit, where per-pixel depth is assumed to be independent, we
introduce per-pixel covariance modeling that encodes each pixel's depth
dependency w.r.t. all the scene points. Unfortunately, per-pixel depth
covariance modeling leads
to a computationally expensive continuous loss function, which we solve
efficiently using the learned low-rank approximation of the overall covariance
matrix. Notably, when tested on benchmark datasets such as KITTI, NYU, and
SUN-RGB-D, the SIDP model obtained by optimizing our loss function shows
state-of-the-art results. Our method (named MG) ranks among the top entries on
the KITTI depth-prediction benchmark leaderboard.
Comment: Accepted to IEEE/CVF CVPR 2023. Draft info: 17 pages, 13 figures, 9 tables
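The low-rank trick mentioned above can be made concrete with the matrix determinant lemma: for a covariance $\Sigma = D + uu^\top$ with $D$ diagonal and $u$ a single low-rank factor, the log-determinant needed by the Gaussian negative log-likelihood never requires forming the full $N \times N$ matrix. A minimal rank-1 sketch (toy numbers, not the paper's learned factors):

```python
import math

def lowrank_logdet(diag, u):
    """log det(D + u u^T) via the matrix determinant lemma:
    det(D + u u^T) = det(D) * (1 + u^T D^{-1} u),
    computed in O(N) without building the N x N covariance."""
    logdet_d = sum(math.log(d) for d in diag)
    correction = 1.0 + sum(ui * ui / di for ui, di in zip(u, diag))
    return logdet_d + math.log(correction)

# D = diag(2, 3), u = (1, 1)  =>  Sigma = [[3, 1], [1, 4]], det = 11.
print(math.exp(lowrank_logdet([2.0, 3.0], [1.0, 1.0])))
```

Higher-rank factors generalize this via the same lemma (and the Woodbury identity for the quadratic term), which is what makes the continuous covariance loss tractable.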