19,733 research outputs found
Towards Autonomous Selective Harvesting: A Review of Robot Perception, Robot Design, Motion Planning and Control
This paper provides an overview of the current state-of-the-art in selective
harvesting robots (SHRs) and their potential for addressing the challenges of
global food production. SHRs have the potential to increase productivity,
reduce labour costs, and minimise food waste by selectively harvesting only
ripe fruits and vegetables. The paper discusses the main components of SHRs,
including perception, grasping, cutting, motion planning, and control. It also
highlights the challenges in developing SHR technologies, particularly in the
areas of robot design, motion planning and control. The paper also discusses
the potential benefits of integrating AI and soft robots and data-driven
methods to enhance the performance and robustness of SHR systems. Finally, the
paper identifies several open research questions in the field and highlights
the need for further research and development efforts to advance SHR
technologies to meet the challenges of global food production. Overall, this
paper provides a starting point for researchers and practitioners interested in
developing SHRs and highlights the need for more research in this field.Comment: Preprint: to be appeared in Journal of Field Robotic
Security and Privacy Problems in Voice Assistant Applications: A Survey
Voice assistant applications have become omniscient nowadays. Two models that
provide the two most important functions for real-life applications (i.e.,
Google Home, Amazon Alexa, Siri, etc.) are Automatic Speech Recognition (ASR)
models and Speaker Identification (SI) models. According to recent studies,
security and privacy threats have also emerged with the rapid development of
the Internet of Things (IoT). The security issues researched include attack
techniques toward machine learning models and other hardware components widely
used in voice assistant applications. The privacy issues include technical-wise
information stealing and policy-wise privacy breaches. The voice assistant
application takes a steadily growing market share every year, but their privacy
and security issues never stopped causing huge economic losses and endangering
users' personal sensitive information. Thus, it is important to have a
comprehensive survey to outline the categorization of the current research
regarding the security and privacy problems of voice assistant applications.
This paper concludes and assesses five kinds of security attacks and three
types of privacy threats in the papers published in the top-tier conferences of
cyber security and voice domain.Comment: 5 figure
Fast Charging of Lithium-Ion Batteries Using Deep Bayesian Optimization with Recurrent Neural Network
Fast charging has attracted increasing attention from the battery community
for electrical vehicles (EVs) to alleviate range anxiety and reduce charging
time for EVs. However, inappropriate charging strategies would cause severe
degradation of batteries or even hazardous accidents. To optimize fast-charging
strategies under various constraints, particularly safety limits, we propose a
novel deep Bayesian optimization (BO) approach that utilizes Bayesian recurrent
neural network (BRNN) as the surrogate model, given its capability in handling
sequential data. In addition, a combined acquisition function of expected
improvement (EI) and upper confidence bound (UCB) is developed to better
balance the exploitation and exploration. The effectiveness of the proposed
approach is demonstrated on the PETLION, a porous electrode theory-based
battery simulator. Our method is also compared with the state-of-the-art BO
methods that use Gaussian process (GP) and non-recurrent network as surrogate
models. The results verify the superior performance of the proposed fast
charging approaches, which mainly results from that: (i) the BRNN-based
surrogate model provides a more precise prediction of battery lifetime than
that based on GP or non-recurrent network; and (ii) the combined acquisition
function outperforms traditional EI or UCB criteria in exploring the optimal
charging protocol that maintains the longest battery lifetime
ADS_UNet: A Nested UNet for Histopathology Image Segmentation
The UNet model consists of fully convolutional network (FCN) layers arranged
as contracting encoder and upsampling decoder maps. Nested arrangements of
these encoder and decoder maps give rise to extensions of the UNet model, such
as UNete and UNet++. Other refinements include constraining the outputs of the
convolutional layers to discriminate between segment labels when trained end to
end, a property called deep supervision. This reduces feature diversity in
these nested UNet models despite their large parameter space. Furthermore, for
texture segmentation, pixel correlations at multiple scales contribute to the
classification task; hence, explicit deep supervision of shallower layers is
likely to enhance performance. In this paper, we propose ADS UNet, a stage-wise
additive training algorithm that incorporates resource-efficient deep
supervision in shallower layers and takes performance-weighted combinations of
the sub-UNets to create the segmentation model. We provide empirical evidence
on three histopathology datasets to support the claim that the proposed ADS
UNet reduces correlations between constituent features and improves performance
while being more resource efficient. We demonstrate that ADS_UNet outperforms
state-of-the-art Transformer-based models by 1.08 and 0.6 points on CRAG and
BCSS datasets, and yet requires only 37% of GPU consumption and 34% of training
time as that required by Transformers.Comment: To be published in Expert Systems With Application
Enhancing Low-resolution Face Recognition with Feature Similarity Knowledge Distillation
In this study, we introduce a feature knowledge distillation framework to
improve low-resolution (LR) face recognition performance using knowledge
obtained from high-resolution (HR) images. The proposed framework transfers
informative features from an HR-trained network to an LR-trained network by
reducing the distance between them. A cosine similarity measure was employed as
a distance metric to effectively align the HR and LR features. This approach
differs from conventional knowledge distillation frameworks, which use the L_p
distance metrics and offer the advantage of converging well when reducing the
distance between features of different resolutions. Our framework achieved a 3%
improvement over the previous state-of-the-art method on the AgeDB-30 benchmark
without bells and whistles, while maintaining a strong performance on HR
images. The effectiveness of cosine similarity as a distance metric was
validated through statistical analysis, making our approach a promising
solution for real-world applications in which LR images are frequently
encountered. The code and pretrained models are publicly available on
https://github.com/gist-ailab/feature-similarity-KD
Quantifying and Explaining Machine Learning Uncertainty in Predictive Process Monitoring: An Operations Research Perspective
This paper introduces a comprehensive, multi-stage machine learning
methodology that effectively integrates information systems and artificial
intelligence to enhance decision-making processes within the domain of
operations research. The proposed framework adeptly addresses common
limitations of existing solutions, such as the neglect of data-driven
estimation for vital production parameters, exclusive generation of point
forecasts without considering model uncertainty, and lacking explanations
regarding the sources of such uncertainty. Our approach employs Quantile
Regression Forests for generating interval predictions, alongside both local
and global variants of SHapley Additive Explanations for the examined
predictive process monitoring problem. The practical applicability of the
proposed methodology is substantiated through a real-world production planning
case study, emphasizing the potential of prescriptive analytics in refining
decision-making procedures. This paper accentuates the imperative of addressing
these challenges to fully harness the extensive and rich data resources
accessible for well-informed decision-making
Sign Language Translation from Instructional Videos
The advances in automatic sign language translation (SLT) to spoken languages
have been mostly benchmarked with datasets of limited size and restricted
domains. Our work advances the state of the art by providing the first baseline
results on How2Sign, a large and broad dataset.
We train a Transformer over I3D video features, using the reduced BLEU as a
reference metric for validation, instead of the widely used BLEU score. We
report a result of 8.03 on the BLEU score, and publish the first open-source
implementation of its kind to promote further advances.Comment: Paper accepted at WiCV @CVPR2
Continual Learning of Hand Gestures for Human-Robot Interaction
In this paper, we present an efficient method to incrementally learn to
classify static hand gestures. This method allows users to teach a robot to
recognize new symbols in an incremental manner. Contrary to other works which
use special sensors or external devices such as color or data gloves, our
proposed approach makes use of a single RGB camera to perform static hand
gesture recognition from 2D images. Furthermore, our system is able to
incrementally learn up to 38 new symbols using only 5 samples for each old
class, achieving a final average accuracy of over 90\%. In addition to that,
the incremental training time can be reduced to a 10\% of the time required
when using all data available
UniverSeg: Universal Medical Image Segmentation
While deep learning models have become the predominant method for medical
image segmentation, they are typically not capable of generalizing to unseen
segmentation tasks involving new anatomies, image modalities, or labels. Given
a new segmentation task, researchers generally have to train or fine-tune
models, which is time-consuming and poses a substantial barrier for clinical
researchers, who often lack the resources and expertise to train neural
networks. We present UniverSeg, a method for solving unseen medical
segmentation tasks without additional training. Given a query image and example
set of image-label pairs that define a new segmentation task, UniverSeg employs
a new Cross-Block mechanism to produce accurate segmentation maps without the
need for additional training. To achieve generalization to new tasks, we have
gathered and standardized a collection of 53 open-access medical segmentation
datasets with over 22,000 scans, which we refer to as MegaMedical. We used this
collection to train UniverSeg on a diverse set of anatomies and imaging
modalities. We demonstrate that UniverSeg substantially outperforms several
related methods on unseen tasks, and thoroughly analyze and draw insights about
important aspects of the proposed system. The UniverSeg source code and model
weights are freely available at https://universeg.csail.mit.eduComment: Victor and Jose Javier contributed equally to this work. Project
Website: https://universeg.csail.mit.ed
Concept Graph Neural Networks for Surgical Video Understanding
We constantly integrate our knowledge and understanding of the world to
enhance our interpretation of what we see.
This ability is crucial in application domains which entail reasoning about
multiple entities and concepts, such as AI-augmented surgery. In this paper, we
propose a novel way of integrating conceptual knowledge into temporal analysis
tasks via temporal concept graph networks. In the proposed networks, a global
knowledge graph is incorporated into the temporal analysis of surgical
instances, learning the meaning of concepts and relations as they apply to the
data. We demonstrate our results in surgical video data for tasks such as
verification of critical view of safety, as well as estimation of Parkland
grading scale. The results show that our method improves the recognition and
detection of complex benchmarks as well as enables other analytic applications
of interest
- …