RACE: Large-scale ReAding Comprehension Dataset From Examinations
We present RACE, a new dataset for benchmark evaluation of methods in the
reading comprehension task. Collected from English exams for Chinese middle and
high school students aged 12 to 18, RACE consists of nearly 28,000 passages and
nearly 100,000 questions generated by human experts (English instructors), and
covers a variety of topics carefully designed to evaluate the students' ability
in understanding and reasoning. In particular, the proportion of questions that
require reasoning is much larger in RACE than in other benchmark datasets for
reading comprehension,
and there is a significant gap between the performance of the state-of-the-art
models (43%) and the ceiling human performance (95%). We hope this new dataset
can serve as a valuable resource for research and evaluation in machine
comprehension. The dataset is freely available at
http://www.cs.cmu.edu/~glai1/data/race/ and the code is available at
https://github.com/qizhex/RACE_AR_baselines.
Comment: EMNLP 201
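RACE is a four-way multiple-choice reading comprehension benchmark, so evaluation reduces to accuracy over questions. Below is a minimal, hypothetical sketch of that evaluation loop; the field names (article, question, options, answer) are illustrative assumptions, not necessarily the exact schema of the released files.

# Minimal sketch of evaluating a model on RACE-style multiple-choice examples
# (passage, question, four options, one gold answer letter). The field names
# below are illustrative assumptions, not necessarily the released file schema.
from typing import Callable, Dict, List

Example = Dict[str, object]  # {"article": str, "question": str, "options": List[str], "answer": str}

def accuracy(examples: List[Example],
             predict: Callable[[str, str, List[str]], int]) -> float:
    """Fraction of questions where the model picks the gold option (A-D)."""
    correct = 0
    for ex in examples:
        pred_idx = predict(ex["article"], ex["question"], ex["options"])
        gold_idx = "ABCD".index(ex["answer"])
        correct += int(pred_idx == gold_idx)
    return correct / max(len(examples), 1)

if __name__ == "__main__":
    toy = [{"article": "Tom walked to school in the rain.",
            "question": "How did Tom get to school?",
            "options": ["By bus", "On foot", "By bike", "By car"],
            "answer": "B"}]
    # A trivial baseline that always picks the first option scores 0.0 here.
    print(accuracy(toy, lambda article, question, options: 0))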
Diagnosing Human-object Interaction Detectors
Although we have witnessed significant progress in human-object interaction
(HOI) detection with increasingly high mAP (mean Average Precision), a single
mAP score is too coarse to provide an informative summary of a model's
performance or to explain why one approach is better than another. In this
paper, we introduce a diagnosis toolbox for analyzing the error sources of the
existing HOI detection models. We first conduct a holistic investigation of the
HOI detection pipeline, which consists of human-object pair detection followed by
interaction classification. We define a set of errors and the oracles to fix
each of them. By measuring the mAP improvement obtained from fixing an error
using its oracle, we can have a detailed analysis of the significance of
different errors. We then examine human-object detection and interaction
classification separately to analyze the model's behavior. For the detection
task, we investigate both recall and precision, measuring the coverage of
ground-truth human-object pairs as well as the noise level in the detections.
For the classification task, we compute mAP for interaction classification
only, without considering the detection scores. We
also measure the performance of the models in differentiating human-object
pairs with and without actual interactions using the AP (Average Precision)
score. Our toolbox is applicable to different methods across different
datasets and is available at https://github.com/neu-vi/Diag-HOI.
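The oracle-based analysis described above can be illustrated with a toy sketch: apply an oracle that fixes one error type in the model's output, re-score, and report the gain. The snippet below uses a simplified exact-match score as a stand-in for HOI mAP, and the error and oracle names are illustrative rather than the Diag-HOI toolbox's actual API.

# Toy sketch of oracle-style error diagnosis: apply an oracle that fixes one
# error type in the predictions, re-score, and report the gain. A simplified
# exact-match score stands in for HOI mAP; names are illustrative, not the
# Diag-HOI toolbox API.
from typing import Callable, Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (human box id, object box id, verb) -- simplified

def toy_score(preds: List[Triplet], gts: List[Triplet]) -> float:
    """Stand-in for mAP: fraction of ground-truth triplets recovered exactly."""
    return len(set(gts) & set(preds)) / max(len(gts), 1)

def diagnose(preds: List[Triplet], gts: List[Triplet],
             oracles: Dict[str, Callable[[List[Triplet], List[Triplet]], List[Triplet]]]) -> Dict[str, float]:
    """Score gain per oracle; a larger gain means that error source costs more."""
    base = toy_score(preds, gts)
    return {name: toy_score(fix(preds, gts), gts) - base for name, fix in oracles.items()}

def fix_verb_errors(preds: List[Triplet], gts: List[Triplet]) -> List[Triplet]:
    """Oracle: if the human-object pair is correct, replace the verb with a correct one."""
    pair_to_verb = {(h, o): v for h, o, v in gts}
    return [(h, o, pair_to_verb.get((h, o), v)) for h, o, v in preds]

if __name__ == "__main__":
    gts = [("h1", "o1", "ride"), ("h2", "o2", "hold")]
    preds = [("h1", "o1", "sit_on"), ("h3", "o3", "hold")]
    print(diagnose(preds, gts, {"verb label": fix_verb_errors}))  # {'verb label': 0.5}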
ADAPTIVE TRANSMISSION POWER IN LOW-POWER AND LOSSY NETWORK
Techniques are provided herein for intelligent transmission power control under different transmission patterns in a connected grid mesh. The transmission patterns include asynchronous transmission, broadcast transmission, and unicast transmission. The techniques also provide a mechanism to help data packets compete against interference on specific channels and to give high-priority Quality of Service (QoS) packets a greater chance of being received when congestion occurs. This enables the connected grid mesh to achieve higher communication reliability with efficient power consumption.
Self-Sampling Meta SAM: Enhancing Few-shot Medical Image Segmentation with Meta-Learning
While the Segment Anything Model (SAM) excels in semantic segmentation for
general-purpose images, its performance deteriorates significantly when applied
to medical images, primarily because medical images are underrepresented in its
training data. However, gathering comprehensive datasets and training models
that are universally applicable is particularly challenging due to the long-tail
problem common in medical images. To address
this gap, here we present a Self-Sampling Meta SAM (SSM-SAM) framework for
few-shot medical image segmentation. Our innovation lies in the design of three
key modules: 1) an online fast gradient descent optimizer, further optimized by
a meta-learner, which ensures swift and robust adaptation to new tasks; 2) a
Self-Sampling module designed to provide well-aligned visual prompts for
improved attention allocation; and 3) a robust attention-based decoder
specifically designed for medical few-shot learning to capture relationships
between different slices. Extensive experiments on a popular abdominal CT
dataset and an MRI dataset demonstrate that the proposed method achieves
significant improvements over state-of-the-art methods in few-shot
segmentation, with average improvements of 10.21% and 1.80% in terms of DSC,
respectively. In conclusion, we present a novel approach for rapid online
adaptation in interactive image segmentation, adapting to a new organ in just
0.83 minutes. Code will be made publicly available on GitHub upon acceptance.
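As a rough illustration of the fast-adaptation idea (not the SSM-SAM implementation, whose meta-learned optimizer, Self-Sampling module, and attention-based decoder are not reproduced here), the sketch below runs a few plain gradient steps on a small support set of a new organ before predicting on query slices; the loss, tensor shapes, and optimizer choice are assumptions.

# Generic sketch of few-step online adaptation: fine-tune a copy of a
# segmentation model on a handful of annotated support slices of a new organ
# before predicting on query slices. The loss, shapes, and optimizer choice are
# assumptions; this is not the SSM-SAM meta-learned optimizer or decoder.
import copy
import torch
import torch.nn.functional as F

def adapt_to_new_task(model: torch.nn.Module,
                      support_images: torch.Tensor,   # (N, C, H, W)
                      support_masks: torch.Tensor,    # (N, 1, H, W), float binary masks
                      inner_lr: float = 1e-3,
                      inner_steps: int = 5) -> torch.nn.Module:
    """Return a copy of the model fine-tuned on one task's support set."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        logits = adapted(support_images)              # assumed to output (N, 1, H, W) logits
        loss = F.binary_cross_entropy_with_logits(logits, support_masks)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted  # then segment unseen slices with adapted(query_images)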
Direct Superpoints Matching for Fast and Robust Point Cloud Registration
Although deep neural networks endow the downsampled superpoints with
discriminative feature representations, directly matching them is rarely used
on its own in state-of-the-art methods, mainly for two reasons. First, the
correspondences are inevitably noisy, so RANSAC-like refinement is usually
adopted. Such ad hoc postprocessing, however, is slow and not differentiable,
so it cannot be jointly optimized with feature learning. Second, superpoints
are sparse, so more RANSAC iterations are needed. Existing approaches use a
coarse-to-fine strategy to propagate superpoint correspondences to the point
level, but the resulting point-level correspondences are not discriminative
enough and still necessitate postprocessing refinement. In this paper, we
present a simple yet effective
approach that extracts correspondences by directly matching superpoints with a
global softmax layer in an end-to-end manner; these correspondences are then
used to determine the rigid transformation between the source and target point
clouds. Compared with methods that directly predict corresponding points, by
leveraging the rich information from the superpoint matches, we obtain a more accurate
estimation of the transformation and effectively filter out outliers without
any postprocessing refinement. As a result, our approach is not only fast, but
also achieves state-of-the-art results on the challenging ModelNet and 3DMatch
benchmarks. Our code and model weights will be publicly released.
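A simplified sketch of the two ingredients named above, soft correspondences from a softmax over superpoint feature similarities followed by a closed-form weighted Kabsch/SVD solve for the rigid transform, is given below. It illustrates the general technique under assumed shapes and a simple confidence weighting, not the paper's released model.

# Simplified sketch: soft superpoint correspondences via a softmax over feature
# similarities, then a closed-form weighted Kabsch/SVD solve for the rigid
# transform -- no RANSAC. Shapes and the confidence weighting are assumptions;
# this is not the paper's released implementation.
import torch

def soft_match(feat_src: torch.Tensor, feat_tgt: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Row-wise softmax over similarities: (N, M) soft correspondence weights."""
    return torch.softmax(feat_src @ feat_tgt.T / temperature, dim=1)

def weighted_kabsch(src: torch.Tensor, tgt: torch.Tensor, weights: torch.Tensor):
    """Rigid (R, t) aligning src (N, 3) to its soft targets in tgt (M, 3)."""
    soft_tgt = weights @ tgt                          # expected match of each source point
    conf = weights.max(dim=1).values                  # confidence of each soft correspondence
    w = conf / conf.sum()
    c_src = (w[:, None] * src).sum(dim=0)             # weighted centroids
    c_tgt = (w[:, None] * soft_tgt).sum(dim=0)
    H = (w[:, None] * (src - c_src)).T @ (soft_tgt - c_tgt)   # weighted cross-covariance
    U, _, Vt = torch.linalg.svd(H)
    d = torch.linalg.det(Vt.T @ U.T).sign().item()    # guard against reflections
    D = torch.diag(torch.tensor([1.0, 1.0, d], dtype=H.dtype))
    R = Vt.T @ D @ U.T
    t = c_tgt - R @ c_src
    return R, t

if __name__ == "__main__":
    feat_src, feat_tgt = torch.randn(64, 32), torch.randn(80, 32)
    src, tgt = torch.randn(64, 3), torch.randn(80, 3)
    R, t = weighted_kabsch(src, tgt, soft_match(feat_src, feat_tgt))
    print(R.shape, t.shape)   # torch.Size([3, 3]) torch.Size([3])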
SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving
To advance research in learning-based defogging algorithms, various synthetic
fog datasets have been developed. However, existing datasets created using the
Atmospheric Scattering Model (ASM) or real-time rendering engines often
struggle to produce photo-realistic foggy images that accurately mimic the
actual imaging process. This limitation hinders the effective generalization of
models from synthetic to real data. In this paper, we introduce an end-to-end
simulation pipeline designed to generate photo-realistic foggy images. This
pipeline comprehensively considers the entire physically-based foggy scene
imaging process, closely aligning with real-world image capture methods. Based
on this pipeline, we present a new synthetic fog dataset named SynFog, which
features both sky light and active lighting conditions, as well as three levels
of fog density. Experimental results demonstrate that models trained on SynFog
exhibit superior performance in visual perception and detection accuracy
compared to those trained on other synthetic datasets when applied to real-world foggy images.
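For context, the Atmospheric Scattering Model that SynFog is contrasted with reduces to I(x) = J(x) * t(x) + A * (1 - t(x)) with transmission t(x) = exp(-beta * d(x)). The NumPy sketch below applies this ASM-style fog to a clear image given a depth map; the parameter values are illustrative, and this is not SynFog's end-to-end imaging pipeline.

# Minimal NumPy sketch of ASM-style fog synthesis, the baseline the paper
# improves upon: I(x) = J(x) * t(x) + A * (1 - t(x)), t(x) = exp(-beta * d(x)).
# Depth, beta, and airlight values are illustrative; this is not SynFog's
# physically-based end-to-end imaging pipeline.
import numpy as np

def asm_fog(clear: np.ndarray, depth: np.ndarray,
            beta: float = 0.05, airlight: float = 0.9) -> np.ndarray:
    """Fog a clear RGB image (H, W, 3) in [0, 1] given per-pixel depth (H, W) in meters."""
    t = np.exp(-beta * depth)[..., None]      # transmission: lower means denser fog
    return clear * t + airlight * (1.0 - t)   # blend toward the airlight color

if __name__ == "__main__":
    img = np.random.rand(4, 4, 3)             # stand-in for a clear image
    depth = np.full((4, 4), 50.0)             # 50 m everywhere
    print(asm_fog(img, depth).shape)          # (4, 4, 3)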
Unsupervised Deep Cross-Language Entity Alignment
Cross-lingual entity alignment is the task of finding semantically equivalent
entities across knowledge graphs in different languages. In this paper, we propose a
simple and novel unsupervised method for cross-language entity alignment. We
utilize a deep multilingual encoder combined with a machine translator to
encode knowledge graph text, which reduces the reliance on labeled data. Unlike
traditional methods that only emphasize global or local alignment,
our method simultaneously considers both alignment strategies. We first view
the alignment task as a bipartite matching problem and then adopt the
re-exchanging idea to accomplish alignment. Compared with the traditional
bipartite matching algorithm, which gives only one optimal solution, our
algorithm generates ranked matching results, which enables many potential
downstream tasks. Additionally, our method can accommodate two different types
of optimization (minimization and maximization) in the bipartite matching
process, which provides more flexibility. Our evaluation shows Hits@1 rates of
0.966, 0.990, and 0.996 on the DBP15K dataset for the Chinese-to-English,
Japanese-to-English, and French-to-English alignment tasks, respectively. We
outperform the state-of-the-art method in both the unsupervised and
semi-supervised categories. Compared with the state-of-the-art supervised
method, our method performs better by 2.6% and 0.4% on the Ja-En and Fr-En
alignment tasks, while being marginally lower by 0.2% on the Zh-En alignment task.
Comment: 17 pages, 5 figures, Accepted by ECML PKDD 2023 (Research Track)
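The bipartite-matching view of alignment can be sketched as follows: embed entity text from both knowledge graphs with a multilingual encoder, then solve a linear assignment over the similarity matrix. The Hungarian-style solver used here returns a single optimal matching and the cosine-similarity objective is an assumption; the paper's re-exchanging algorithm, which produces ranked matches, is not reproduced.

# Sketch of the bipartite-matching view of entity alignment: embed entity text
# from both knowledge graphs (multilingual encoder assumed, not shown), then
# solve a linear assignment over cosine similarities. This yields one optimal
# matching; the paper's re-exchanging algorithm for ranked matches is not
# reproduced here.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align(emb_src: np.ndarray, emb_tgt: np.ndarray):
    """One-to-one alignment maximizing cosine similarity between entity embeddings."""
    src = emb_src / np.linalg.norm(emb_src, axis=1, keepdims=True)
    tgt = emb_tgt / np.linalg.norm(emb_tgt, axis=1, keepdims=True)
    rows, cols = linear_sum_assignment(src @ tgt.T, maximize=True)
    return list(zip(rows.tolist(), cols.tolist()))    # matched (source, target) index pairs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.standard_normal((5, 16))
    tgt = src[[2, 0, 4, 1, 3]] + 0.01 * rng.standard_normal((5, 16))  # permuted + noise
    print(align(src, tgt))  # recovers the permutation: [(0, 1), (1, 3), (2, 0), (3, 4), (4, 2)]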