Revisiting Unsupervised Relation Extraction
Unsupervised relation extraction (URE) extracts relations between named
entities from raw text without manually labelled data or existing knowledge
bases (KBs). URE methods can be categorised into generative and discriminative
approaches, which rely either on hand-crafted features or surface form.
However, we demonstrate that by using only named entities to induce relation
types, we can outperform existing methods on two popular datasets. We compare
and evaluate our findings against other URE techniques to ascertain which
features are important in URE. We conclude that entity types provide a strong
inductive bias for URE.
Comment: 8 pages, 1 figure, 2 tables. Accepted in ACL 202
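The idea of inducing relation types from named entities alone can be illustrated with a deliberately simple sketch. This is not the authors' model: it assumes each instance already carries entity type labels and treats every (head type, tail type) pair as one induced relation cluster. The function name `induce_relations` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def induce_relations(instances):
    """Group entity pairs by their (head type, tail type) signature.

    Each instance is a tuple (head, head_type, tail, tail_type).
    Every distinct type pair is treated as one induced relation cluster,
    which is an illustrative assumption rather than the paper's method.
    """
    clusters = defaultdict(list)
    for head, head_type, tail, tail_type in instances:
        clusters[(head_type, tail_type)].append((head, tail))
    return dict(clusters)


if __name__ == "__main__":
    examples = [
        ("Obama", "PER", "Hawaii", "LOC"),
        ("Paris", "LOC", "France", "LOC"),
        ("Curie", "PER", "Warsaw", "LOC"),
    ]
    for type_pair, pairs in induce_relations(examples).items():
        print(type_pair, pairs)
```

A baseline of this kind uses no sentence context at all, which is what makes it a useful check on how much of the signal comes from entity types.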
Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning
Object-centric learning (OCL) aspires to a general and compositional understanding
of scenes by representing a scene as a collection of object-centric
representations. OCL has also been extended to multi-view image and video
datasets to apply various data-driven inductive biases by utilizing geometric
or temporal information in the multi-image data. Single-view images carry less
information about how to disentangle a given scene than videos or multi-view
images do. Hence, owing to the difficulty of applying inductive biases, OCL for
single-view images remains challenging, resulting in inconsistent learning of
object-centric representation. To this end, we introduce a novel OCL framework
for single-view images, SLot Attention via SHepherding (SLASH), which consists
of two simple-yet-effective modules on top of Slot Attention. The new modules,
Attention Refining Kernel (ARK) and Intermediate Point Predictor and Encoder
(IPPE), respectively, prevent slots from being distracted by the background
noise and indicate locations for slots to focus on to facilitate learning of
object-centric representation. We also propose a weak semi-supervision approach
for OCL, while the proposed framework can be used without any auxiliary
annotation during inference. Experiments show that our proposed method
enables consistent learning of object-centric representation and achieves
strong performance across four datasets. Code is available at
https://github.com/object-understanding/SLASH
Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos
This work proposes a self-supervised learning system for segmenting rigid
objects in RGB images. The proposed pipeline is trained on unlabeled RGB-D
videos of static objects, which can be captured with a camera carried by a
mobile robot. A key feature of the self-supervised training process is a
graph-matching algorithm that operates on the over-segmentation output of the
point cloud that is reconstructed from each video. The graph matching, along
with point cloud registration, is able to find reoccurring object patterns
across videos and combine them into 3D object pseudo labels, even under
occlusions or different viewing angles. Projected 2D object masks from 3D
pseudo labels are used to train a pixel-wise feature extractor through
contrastive learning. During online inference, a clustering method uses the
learned features to cluster foreground pixels into object segments. Experiments
highlight the method's effectiveness on both real and synthetic video datasets,
which include cluttered scenes of tabletop objects. The proposed method
outperforms existing unsupervised methods for object segmentation by a large
margin.
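The contrastive training step, in which pixels under the same projected pseudo-label mask are pulled together in feature space, can be sketched as follows. This is a generic supervised-contrastive loss written in NumPy under stated assumptions, not necessarily the loss used in the paper: the function name `pixel_contrastive_loss`, the temperature value, and the flat `(N, D)` feature layout are all hypothetical.

```python
import numpy as np


def pixel_contrastive_loss(features, labels, temperature=0.1):
    """Supervised-contrastive loss over N pixel features of dimension D.

    features: (N, D) array of pixel embeddings (N >= 2).
    labels:   (N,) array of pseudo-label ids, one per pixel.
    Pixels sharing a label are positives; all other pixels are negatives.
    """
    # Cosine similarity between all pixel pairs, scaled by the temperature.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T / temperature

    # Exclude each pixel's similarity with itself.
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    sim = np.where(not_self, sim, -np.inf)

    # Numerically stable log-softmax over the other pixels.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm

    # Average the log-probability of positives for each anchor pixel.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -(pos_log_prob[valid] / pos_count[valid])
    return per_anchor.mean()
```

The loss is small when features of the same pseudo-label already lie close together and grows when the labels cut across feature clusters, which is the signal a feature extractor would be trained on.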
Sub-token ViT Embedding via Stochastic Resonance Transformers
We discover the presence of quantization artifacts in Vision Transformers
(ViTs), which arise due to the image tokenization step inherent in these
architectures. These artifacts result in coarsely quantized features, which
negatively impact performance, especially on downstream dense prediction tasks.
We present a zero-shot method to improve how pre-trained ViTs handle spatial
quantization. In particular, we propose to ensemble the features obtained from
perturbing input images via sub-token spatial translations, inspired by
Stochastic Resonance, a method traditionally applied to climate dynamics and
signal processing. We term our method "Stochastic Resonance Transformer"
(SRT), which we show can effectively super-resolve features of pre-trained
ViTs, capturing more of the local fine-grained structures that might otherwise
be neglected as a result of tokenization. SRT can be applied at any layer, on
any task, and does not require any fine-tuning. The advantage of applying it at
intermediate layers is evident in monocular depth prediction, where we show
that ensembling model outputs is detrimental, while applying SRT to intermediate
ViT features outperforms the baseline models by an average of 4.7% and 14.9% on the
RMSE and RMSE-log metrics across three different architectures. When applied to
semi-supervised video object segmentation, SRT also improves over the baseline
models uniformly across all metrics, and by an average of 2.4% in F&J score. We
further show that these quantization artifacts can be attenuated to some extent
via self-distillation. On the unsupervised salient region segmentation, SRT
improves upon the base model by an average of 2.1% on the maxF metric. Finally,
despite operating purely on pixel-level features, SRT generalizes to non-dense
prediction tasks such as image retrieval and object discovery, yielding
consistent improvements of up to 2.6% and 1.0%, respectively.
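The ensembling over sub-token translations described in this abstract can be illustrated with a toy NumPy sketch. It is not the authors' implementation: `patch_features` is a hypothetical stand-in for a pre-trained ViT (it simply returns the mean of each patch), and the circular shift via `np.roll` and nearest-neighbour upsampling via `np.kron` are simplifying assumptions.

```python
import numpy as np


def patch_features(image, patch):
    """Toy stand-in for a ViT encoder: one feature (the mean) per patch."""
    h, w = image.shape
    blocks = image.reshape(h // patch, patch, w // patch, patch)
    return blocks.mean(axis=(1, 3))


def srt_ensemble(image, patch, shifts):
    """Average upsampled patch features over small spatial translations.

    image:  (H, W) array with H and W divisible by `patch`.
    shifts: iterable of (dy, dx) pixel offsets smaller than `patch`.
    Returns an (H, W) feature map at pixel resolution.
    """
    acc = np.zeros_like(image, dtype=float)
    for dy, dx in shifts:
        # Translate the input by a sub-token offset (circular, by assumption).
        shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
        coarse = patch_features(shifted, patch)
        # Nearest-neighbour upsampling back to pixel resolution.
        dense = np.kron(coarse, np.ones((patch, patch)))
        # Undo the translation so all feature maps are aligned.
        acc += np.roll(dense, shift=(-dy, -dx), axis=(0, 1))
    return acc / len(shifts)


if __name__ == "__main__":
    img = np.arange(64, dtype=float).reshape(8, 8)
    all_shifts = [(dy, dx) for dy in range(4) for dx in range(4)]
    print(srt_ensemble(img, patch=4, shifts=all_shifts).round(2))
```

With a single zero shift the result is just the blocky patch-level feature map. Averaging over many offsets produces values that vary within a patch, which is the sense in which the ensemble recovers finer spatial detail than one tokenization.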
Machine infelicity in a poignant visitor setting: Comparing human and AI’s ability to analyze discourse
This study compares the efficacy of computer and human analytics in a commemorative setting. Both deductive and inductive reasoning are compared using the same data across both methods. The data comprises 2490 non-repeated, non-dialogical social media comments from the popular tourism site Tripadvisor. Included in the analysis is participant observation at two Anzac commemorative sites, one in Western Australia and one in Northern France. The data is then processed using both Leximancer V4.51 and Dialectic Thematic Analysis. The findings demonstrate that artificial intelligence (AI) was incapable of insight beyond metric-driven content analysis. While fully deduced by human analysis, the metamodel was only partially deduced by AI. There was also a difference in the ability to induce themes, with AI producing anodyne, axiomatic concepts. In contrast, human analytics was capable of transcendent themes representing ampliative, phronetic knowledge. The implications of the study include (1) tempering the belief that the current iteration of AI can do more than organise, summarise, and visualise data; (2) advocating for the inclusion of preconception and context in thematic analysis; and (3) encouraging a discussion of the appropriateness of using AI in research.