Towards Autonomous Selective Harvesting: A Review of Robot Perception, Robot Design, Motion Planning and Control
This paper provides an overview of the current state-of-the-art in selective
harvesting robots (SHRs) and their potential for addressing the challenges of
global food production. SHRs have the potential to increase productivity,
reduce labour costs, and minimise food waste by selectively harvesting only
ripe fruits and vegetables. The paper discusses the main components of SHRs,
including perception, grasping, cutting, motion planning, and control. It also
highlights the challenges in developing SHR technologies, particularly in the
areas of robot design, motion planning and control. The paper also discusses
the potential benefits of integrating AI and soft robots and data-driven
methods to enhance the performance and robustness of SHR systems. Finally, the
paper identifies several open research questions in the field and highlights
the need for further research and development efforts to advance SHR
technologies to meet the challenges of global food production. Overall, this
paper provides a starting point for researchers and practitioners interested in
developing SHRs and highlights the need for more research in this field. Comment: Preprint; to appear in Journal of Field Robotics
Concept Graph Neural Networks for Surgical Video Understanding
We constantly integrate our knowledge and understanding of the world to
enhance our interpretation of what we see.
This ability is crucial in application domains which entail reasoning about
multiple entities and concepts, such as AI-augmented surgery. In this paper, we
propose a novel way of integrating conceptual knowledge into temporal analysis
tasks via temporal concept graph networks. In the proposed networks, a global
knowledge graph is incorporated into the temporal analysis of surgical
instances, learning the meaning of concepts and relations as they apply to the
data. We demonstrate our results in surgical video data for tasks such as
verification of critical view of safety, as well as estimation of Parkland
grading scale. The results show that our method improves the recognition and
detection of complex benchmarks as well as enables other analytic applications
of interest.
TransFusionOdom: Interpretable Transformer-based LiDAR-Inertial Fusion Odometry Estimation
Multi-modal fusion of sensors is a commonly used approach to enhance the
performance of odometry estimation, which is also a fundamental module for
mobile robots. However, the question of how to perform fusion among
different modalities in a supervised sensor fusion odometry estimation task
remains a challenging issue. Simple operations, such as
element-wise summation and concatenation, are not capable of assigning adaptive
attentional weights to incorporate different modalities efficiently, which makes
it difficult to achieve competitive odometry results. Recently, the Transformer
architecture has shown potential for multi-modal fusion tasks, particularly in
the domains of vision with language. In this work, we propose an end-to-end
supervised Transformer-based LiDAR-Inertial fusion framework (namely
TransFusionOdom) for odometry estimation. The multi-attention fusion module
demonstrates different fusion approaches for homogeneous and heterogeneous
modalities to address the overfitting problem that can arise from blindly
increasing the complexity of the model. Additionally, to interpret the learning
process of the Transformer-based multi-modal interactions, a general
visualization approach is introduced to illustrate the interactions between
modalities. Moreover, exhaustive ablation studies evaluate different
multi-modal fusion strategies to verify the performance of the proposed fusion
strategy. A synthetic multi-modal dataset is made public to validate the
generalization ability of the proposed fusion strategy, which also works for
other combinations of different modalities. The quantitative and qualitative
odometry evaluations on the KITTI dataset verify that the proposed TransFusionOdom
achieves superior performance compared with other related works. Comment: Submitted to IEEE Sensors Journal with some modifications. This work
has been submitted to the IEEE for possible publication. Copyright may be
transferred without notice, after which this version may no longer be
accessible.
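The contrast this abstract draws between fixed fusion operations and adaptive attentional weighting can be illustrated with a minimal sketch. It is not the TransFusionOdom architecture: the function names, the toy feature vectors, and the single scaled dot-product query are all assumptions made for illustration.

```python
import math

def fuse_sum(lidar, imu):
    """Element-wise summation: both modalities always contribute with weight 1."""
    return [l + i for l, i in zip(lidar, imu)]

def fuse_concat(lidar, imu):
    """Concatenation: keeps every feature and leaves any weighting to later layers."""
    return list(lidar) + list(imu)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_attention(lidar, imu, query):
    """Weight each modality by its scaled dot-product score against a query vector.

    The query is a hypothetical stand-in for a learned parameter.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, feat)) / scale
              for feat in (lidar, imu)]
    weights = softmax(scores)
    fused = [weights[0] * l + weights[1] * i for l, i in zip(lidar, imu)]
    return fused, weights

# Toy features (assumed values, not taken from any dataset).
lidar_feat = [1.0, 0.0]
imu_feat = [0.0, 1.0]
fused, weights = fuse_attention(lidar_feat, imu_feat, query=[1.0, 0.0])
```

In this sketch the attention weights depend on the inputs, so the modality that matches the query more closely contributes more, whereas summation and concatenation treat both modalities identically regardless of content.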
Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds
This paper presents a framework for semantic segmentation on sparse
sequential point clouds of millimeter-wave radar. Compared with cameras and
lidars, millimeter-wave radars have the advantages of preserving privacy,
strong anti-interference ability, and long detection distance.
However, the sparsity of mmWave data and the difficulty of capturing its
temporal-topological features remain open problems. In particular, the
difficulty of capturing temporal-topological coupling features in the human
semantic segmentation task prevents previous advanced segmentation methods
(e.g., PointNet, PointCNN, Point Transformer) from
being well utilized in practical scenarios. To address the challenge caused by
the sparsity and temporal-topological feature of the data, we (i) introduce
graph structure and topological features to the point cloud, (ii) propose a
semantic segmentation framework including a global feature-extracting module
and a sequential feature-extracting module. In addition, we design an efficient
and more fitting loss function for a better training process and segmentation
results based on graph clustering. Experimentally, we deploy representative
semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset.
Experimental results indicate that our model achieves mean accuracy on the
custom dataset by and outperforms the state-of-the-art
algorithms. Moreover, to validate the model's robustness, we deploy our model
on the well-known S3DIS dataset. On the S3DIS dataset, our model achieves mean
accuracy by , outperforming baseline algorithms
Neural Architecture Search: Insights from 1000 Papers
In the past decade, advances in deep learning have resulted in breakthroughs
in a variety of areas, including computer vision, natural language
understanding, speech recognition, and reinforcement learning. Specialized,
high-performing neural architectures are crucial to the success of deep
learning in these areas. Neural architecture search (NAS), the process of
automating the design of neural architectures for a given task, is an
inevitable next step in automating machine learning and has already outpaced
the best human-designed architectures on many tasks. In the past few years,
research in NAS has been progressing rapidly, with over 1000 papers released
since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized
and comprehensive guide to neural architecture search. We give a taxonomy of
search spaces, algorithms, and speedup techniques, and we discuss resources
such as benchmarks, best practices, other surveys, and open-source libraries.
Context Inference Based on an Analysis of Meaning Relations in the Webtoon “Smile Brush: My Old Pictures”
This study analyzes and describes inferences about context, together with a comprehensive understanding of the other linguistic variables in the text and its discourse. The research data are lexical units and phrases that show synonymy and polysemy relations in the narrative text of the comic "Smile Brush: My Old Pictures" by Waroo, which can be accessed on the Webtoon platform. The data are processed using descriptive qualitative linguistic methods combined with ethnoscience analysis. The data were analyzed with the distributional method, using the BUL (direct element division) technique and coding. The results indicate that inference is a conclusion of cognition based on a context built from participants, awareness, and paradigmatic as well as syntagmatic relations. Inference depends on the association of meaning with other linguistic units in understanding the context from which the inference is drawn. Taken together, these factors and variables show a stimulative, systemic, and holistic linguistic correlation among metafunctions and the stratification of linguistic domains.
GETT-QA: Graph Embedding based T2T Transformer for Knowledge Graph Question Answering
In this work, we present an end-to-end Knowledge Graph Question Answering
(KGQA) system named GETT-QA. GETT-QA uses T5, a popular text-to-text
pre-trained language model. The model takes a question in natural language as
input and produces a simpler form of the intended SPARQL query. In the simpler
form, the model does not directly produce entity and relation IDs. Instead, it
produces corresponding entity and relation labels. The labels are grounded to
KG entity and relation IDs in a subsequent step. To further improve the
results, we instruct the model to produce a truncated version of the KG
embedding for each entity. The truncated KG embedding enables a finer search
for disambiguation purposes. We find that T5 is able to learn the truncated KG
embeddings without any change of loss function, improving KGQA performance. As
a result, we report strong results for LC-QuAD 2.0 and SimpleQuestions-Wikidata
datasets on end-to-end KGQA over Wikidata. Comment: 16 pages, single-column format, accepted at ESWC 2023 research track
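The grounding step this abstract describes, where a predicted label is mapped to a KG entity ID and a truncated embedding helps choose among candidates, might look roughly like the sketch below. The abstract does not specify the matching or distance procedure, so the exact-label lookup, the squared Euclidean distance, the entity IDs, and the embedding values are all hypothetical.

```python
def ground_entity(label, predicted_embedding, label_index, kg_embeddings):
    """Pick the candidate entity whose truncated KG embedding is closest
    to the embedding prefix produced by the model.

    label_index maps a lowercased label to candidate entity IDs.
    kg_embeddings maps an entity ID to its full embedding vector.
    Both structures are assumptions made for this illustration.
    """
    candidates = label_index.get(label.lower(), [])
    if not candidates:
        return None
    k = len(predicted_embedding)  # number of dimensions the model emitted

    def distance(entity_id):
        truncated = kg_embeddings[entity_id][:k]
        return sum((p - t) ** 2 for p, t in zip(predicted_embedding, truncated))

    return min(candidates, key=distance)

# Toy knowledge graph with made-up IDs and embeddings.
label_index = {"paris": ["Q_CITY", "Q_PERSON"]}
kg_embeddings = {
    "Q_CITY": [0.9, 0.1, 0.3, 0.7],
    "Q_PERSON": [-0.8, 0.4, 0.2, 0.1],
}
best = ground_entity("Paris", [0.8, 0.2], label_index, kg_embeddings)
```

Two entities share the label here, so the label alone is ambiguous. The two-dimensional predicted prefix is enough to separate them in this toy case.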
Hi4D: 4D Instance Segmentation of Close Human Interaction
We propose Hi4D, a method and dataset for the automatic analysis of
physically close human-human interaction under prolonged contact. Robustly
disentangling several in-contact subjects is a challenging task due to
occlusions and complex shapes. Hence, existing multi-view systems typically
fuse 3D surfaces of close subjects into a single, connected mesh. To address
this issue we leverage i) individually fitted neural implicit avatars; ii) an
alternating optimization scheme that refines pose and surface through periods
of close proximity; and iii) thus segment the fused raw scans into individual
instances. From these instances we compile the Hi4D dataset of 4D textured scans of
20 subject pairs, 100 sequences, and a total of more than 11K frames. Hi4D
contains rich interaction-centric annotations in 2D and 3D alongside accurately
registered parametric body models. We define varied human pose and shape
estimation tasks on this dataset and provide results from state-of-the-art
methods on these benchmarks. Comment: Project page: https://yifeiyin04.github.io/Hi4D
An Information Extraction Study: Take In Mind the Tokenization!
Current research on the advantages and trade-offs of using characters,
instead of tokenized text, as input for deep learning models, has evolved
substantially. New token-free models remove the traditional tokenization step;
however, their efficiency remains unclear. Moreover, the effect of tokenization
is relatively unexplored in sequence tagging tasks. To this end, we investigate
the impact of tokenization when extracting information from documents and
present a comparative study and analysis of subword-based and character-based
models. Specifically, we study Information Extraction (IE) from biomedical
texts. The main outcome is twofold: tokenization patterns can introduce
inductive bias that results in state-of-the-art performance, and the
character-based models produce promising results; thus, transitioning to
token-free IE models is feasible. Comment: presented at EUSFLAT 202
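To make the distinction between character-based and subword-based inputs concrete, the following minimal sketch segments the same word both ways. It does not reproduce the models or tokenizers studied in the paper: the greedy longest-match rule and the two-entry vocabulary are assumptions chosen for illustration.

```python
def char_tokens(text):
    """Character-level input: every character is its own token."""
    return list(text)

def subword_tokens(word, vocab):
    """Greedy longest-match-first segmentation against a subword vocabulary.

    Falls back to a single character when no longer piece is in the vocabulary.
    """
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is in the vocabulary
        # or only one character is left.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        tokens.append(word[start:end])
        start = end
    return tokens

# Hypothetical vocabulary; real subword vocabularies are learned from data.
toy_vocab = {"hepat", "itis"}
chars = char_tokens("hepatitis")
pieces = subword_tokens("hepatitis", toy_vocab)
```

The subword segmentation yields two pieces that align with meaningful units of the word, which is one way a tokenization pattern can act as an inductive bias. The character sequence carries no such grouping and leaves it to the model.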
Examples of works to practice staccato technique in clarinet instrument
The stages of strengthening staccato technique on the clarinet were applied through work on pieces.
Rhythm and nuance exercises that speed up staccato passages are included. The most important aim of
the study is not only staccato practice but also attention to the precision of simultaneous
finger-tongue coordination. To make staccato practice more efficient, etude work is also included
within the work on pieces. Careful attention to the exercises, together with the inspiring effect of
staccato practice, has added a new dimension to musical identity. Every stage of the eight original
pieces is explained. Each stage is designed to strengthen the performance and technique of the next.
The study reports the areas in which staccato technique is used and the results obtained. It plans
how the notes are shaped by finger-tongue coordination and the practice discipline within which this
takes place. Reed, note, diaphragm, finger, tongue, nuance, and discipline were found to form an
inseparable whole in staccato technique. A literature review surveyed existing studies on staccato.
The review found that there are few staccato pieces used in clarinet technique. A survey of method
books found that etudes are more common. Accordingly, exercises for speeding up and strengthening
staccato technique on the clarinet are presented. It was observed that interspersing pieces among
staccato etudes relaxes the mind and increases motivation. The choice of a suitable reed during
staccato practice is also addressed. A suitable reed was found to increase tonguing speed. Choosing
the right reed depends on how easily the reed produces sound. If the reed does not support tonguing
strength, a more suitable reed needs to be chosen. Performing a piece from beginning to end can be
difficult in staccato practice. In this respect, the study shows that following the given musical
nuances eases tonguing performance. Passing on acquired knowledge and experience to future
generations, in a way that fosters development, is encouraged. The study explains how pieces
encountered in the future can be worked out and how staccato technique can be mastered. The aim is
to resolve staccato technique in a shorter time. It is as important to commit the exercises to
memory as it is to teach the fingers their positions. The work that emerges as a result of the
determination and patience shown will raise success to even higher levels.