Examining Teaching Charisma and Its Relation to Student Engagement
This study focuses on teaching charisma, which comprises four key constructs: knowledge, character traits, teaching techniques, and humor. Participants were recruited from 17 regular education classrooms within 6 colleges or universities in central Taiwan. The results revealed that the Inventory of Teaching Charisma in the College Classroom (ITCCC) is a psychometrically valid instrument that can accurately assess students' perceptions of the quality of a teacher's teaching in a professional course. Furthermore, a strong positive relationship between teaching charisma and student engagement was found, and three of the teaching charisma factors jointly predicted student engagement in the professional subject. The importance of teaching charisma in enhancing student engagement is confirmed.
F3Net: Fusion, Feedback and Focus for Salient Object Detection
Most existing salient object detection models have achieved great progress by aggregating multi-level features extracted from convolutional neural networks. However, because different convolutional layers have different receptive fields, there are large differences between the features they generate. Common feature fusion strategies (addition or concatenation) ignore these differences and may lead to suboptimal solutions. In this paper, we propose F3Net to solve the above problem. It mainly consists of a cross feature module (CFM) and a cascaded feedback decoder (CFD), trained by minimizing a new pixel position aware loss (PPA). Specifically, the CFM aims to selectively aggregate multi-level features. Different from addition and concatenation, the CFM adaptively selects complementary components from the input features before fusion, which effectively avoids introducing redundant information that may corrupt the original features. Besides, the CFD adopts a multi-stage feedback mechanism, where features close to the supervision are fed back to the outputs of previous layers to supplement them and reduce the differences between features. These refined features go through multiple similar iterations before the final saliency maps are generated. Furthermore, unlike binary cross entropy, the proposed PPA loss does not treat all pixels equally; it synthesizes the local structure information around a pixel to guide the network to focus more on local details. Hard pixels from boundaries or error-prone parts are given more attention to emphasize their importance. F3Net is able to segment salient object regions accurately and provide clear local details. Comprehensive experiments on five benchmark datasets demonstrate that F3Net outperforms state-of-the-art approaches on six evaluation metrics.

Comment: Accepted by AAAI 2020. Code: https://github.com/weijun88/F3Net
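To make the loss design concrete, here is a minimal sketch of a pixel-position-aware style loss (weighted BCE plus weighted IoU), assuming PyTorch. The weight construction (local contrast from average-pooling the ground truth) follows the spirit of the abstract, but the kernel size and constants here are illustrative, not necessarily F3Net's exact values.

```python
# Sketch of a PPA-style loss: hard pixels (boundaries, error-prone regions)
# are up-weighted in both a BCE term and an IoU term. Illustrative only.
import torch
import torch.nn.functional as F

def ppa_style_loss(pred_logits, gt):
    """pred_logits, gt: tensors of shape (B, 1, H, W); gt in {0, 1}."""
    # Pixels whose local neighborhood disagrees with their own label
    # (i.e., boundary pixels) receive larger weights.
    local_mean = F.avg_pool2d(gt, kernel_size=31, stride=1, padding=15)
    weight = 1.0 + 5.0 * torch.abs(local_mean - gt)

    # Weighted binary cross entropy: hard pixels contribute more.
    bce = F.binary_cross_entropy_with_logits(pred_logits, gt, reduction='none')
    wbce = (weight * bce).sum(dim=(2, 3)) / weight.sum(dim=(2, 3))

    # Weighted IoU term on probabilities, emphasizing the same hard pixels.
    pred = torch.sigmoid(pred_logits)
    inter = (pred * gt * weight).sum(dim=(2, 3))
    union = ((pred + gt) * weight).sum(dim=(2, 3)) - inter
    wiou = 1.0 - (inter + 1.0) / (union + 1.0)

    return (wbce + wiou).mean()
```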
General Greedy De-bias Learning
Neural networks often make predictions by relying on spurious correlations in the datasets rather than the intrinsic properties of the task of interest, and thus suffer sharp degradation on out-of-distribution (OOD) test data. Existing de-bias learning frameworks try to capture specific dataset biases through annotations, but they fail to handle complicated OOD scenarios. Others implicitly identify the dataset bias through specially designed low-capability biased models or losses, but they degrade when the training and testing data come from the same distribution. In this paper, we propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model. The base model is encouraged to focus on examples that are hard to solve with the biased models, and thus remains robust against spurious correlations at test time. GGD largely improves models' OOD generalization ability on various tasks, but sometimes over-estimates the bias level and degrades on the in-distribution test. We further re-analyze the ensemble process of GGD and introduce Curriculum Regularization, inspired by curriculum learning, which achieves a good trade-off between in-distribution and out-of-distribution performance. Extensive experiments on image classification, adversarial question answering, and visual question answering demonstrate the effectiveness of our method. GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased models without prior knowledge.

Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
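The core mechanism (train a biased model greedily, then down-weight the base model's loss on examples the biased model already solves) can be sketched as below, assuming PyTorch. The reweighting rule shown is a focal-style simplification for illustration, not GGD's exact formulation.

```python
# Sketch of a greedy de-bias training step: the biased model chases easy
# shortcuts; the base model is steered toward bias-conflicting examples.
import torch
import torch.nn.functional as F

def debias_step(base_model, biased_model, x, y, optimizer):
    # 1. The biased model is trained greedily on the plain task loss,
    #    so it latches onto easy, spurious shortcuts first.
    biased_logits = biased_model(x)
    biased_loss = F.cross_entropy(biased_logits, y)

    # 2. The base model's loss is down-weighted on examples the biased
    #    model answers confidently, pushing it toward hard examples.
    with torch.no_grad():
        p_biased = F.softmax(biased_logits, dim=-1).gather(1, y.unsqueeze(1)).squeeze(1)
    weight = 1.0 - p_biased  # small when the biased model is confident

    base_logits = base_model(x)
    per_example = F.cross_entropy(base_logits, y, reduction='none')
    base_loss = (weight * per_example).mean()

    optimizer.zero_grad()
    (biased_loss + base_loss).backward()
    optimizer.step()
```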
ALID: Scalable Dominant Cluster Detection
Detecting dominant clusters is important in many analytic applications. The state-of-the-art methods find dense subgraphs on the affinity graph as the dominant clusters. However, the time and space complexities of those methods are dominated by the construction of the affinity graph, which is quadratic in the number of data points and thus impractical on large data sets. To tackle this challenge, we apply Evolutionary Game Theory (EGT) and develop a scalable algorithm, Approximate Localized Infection Immunization Dynamics (ALID). The major idea is to perform Localized Infection Immunization Dynamics (LID) to find dense subgraphs within a local range of the affinity graph. LID is further scaled up, with guaranteed high efficiency and detection quality, by an estimated Region of Interest (ROI) and a carefully designed Candidate Infective Vertex Search method (CIVS). ALID only constructs small local affinity graphs and has a time complexity of O(C(a* + δ)n) and a space complexity of O(a*(a* + δ)), where a* is the size of the largest dominant cluster and C ≪ n and δ ≪ n are small constants. We demonstrate by extensive experiments on both synthetic and real-world data that ALID achieves state-of-the-art detection quality with much lower time and space cost on a single machine. We also demonstrate the encouraging parallelization performance of ALID by implementing Parallel ALID (PALID) on Apache Spark. PALID processes 50 million SIFT data points in 2.29 hours, achieving a speedup ratio of 7.51 with 8 executors.
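For intuition on the evolutionary-game view of dominant cluster detection, here is a minimal NumPy sketch of plain replicator dynamics on a small affinity matrix. This is not ALID itself; it illustrates the game-theoretic core that infection immunization dynamics (and ALID's localized variant) solve far more efficiently and at scale.

```python
# Replicator dynamics on an affinity matrix: at convergence, the support
# of x corresponds to a dominant (dense) cluster. Illustrative sketch.
import numpy as np

def dominant_cluster(A, iters=1000, tol=1e-8):
    """A: (n, n) symmetric non-negative affinity matrix with zero diagonal.
    Returns a distribution x over the n points."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)          # start from the barycenter
    for _ in range(iters):
        Ax = A @ x
        x_new = x * Ax / (x @ Ax)    # replicator dynamics update
        if np.linalg.norm(x_new - x, 1) < tol:
            x = x_new
            break
        x = x_new
    return x

# Usage: points where x is clearly non-zero form the dominant cluster, e.g.
# x = dominant_cluster(A); cluster = np.where(x > 1e-4)[0]
```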
R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation
Recent text-to-image (T2I) diffusion models have achieved remarkable progress in generating high-quality images from text prompts. However, these models often fail to convey the spatial composition specified by a layout instruction. In this work, we probe zero-shot grounded T2I generation with diffusion models, that is, generating images corresponding to the input layout information without training auxiliary modules or fine-tuning the diffusion models. We propose a Region and Boundary (R&B) aware cross-attention guidance approach that gradually modulates the attention maps of the diffusion model during the generative process, helping the model synthesize images that (1) have high fidelity, (2) are highly compatible with the textual input, and (3) interpret the layout instructions accurately. Specifically, we leverage discrete sampling to bridge the gap between consecutive attention maps and discrete layout constraints, and design a region-aware loss to refine the generative layout during the diffusion process. We further propose a boundary-aware loss to strengthen object discriminability within the corresponding regions. Experimental results show that our method outperforms existing state-of-the-art zero-shot grounded T2I generation methods by a large margin, both qualitatively and quantitatively, on several benchmarks.

Comment: Preprint. Under review. Project page: https://sagileo.github.io/Region-and-Boundary
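As a rough illustration of attention-based layout guidance, the sketch below (assuming PyTorch) scores how much of a grounded token's cross-attention mass falls inside its target box. R&B's actual region- and boundary-aware losses are more elaborate; treat this as a simplified stand-in.

```python
# Sketch of a region-aware attention loss for layout-guided diffusion.
import torch

def region_aware_loss(attn_map, box_mask):
    """attn_map: (H, W) cross-attention map for one grounded text token.
    box_mask: (H, W) binary mask of that token's target region."""
    attn = attn_map / (attn_map.sum() + 1e-8)   # normalize to a distribution
    inside = (attn * box_mask).sum()            # attention mass inside the box
    return 1.0 - inside                         # zero when all mass is inside

# During sampling, a loss like this would be backpropagated to the noisy
# latents at each denoising step to steer the layout (guidance, not training).
```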
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Three-Dimensional (3D) dense captioning is an emerging vision-language bridging task that aims to generate multiple detailed and accurate descriptions for 3D scenes. It presents significant potential and challenges, owing to its closer representation of the real world than 2D visual captioning, as well as the complexities of collecting and processing 3D point cloud sources. Despite the popularity and success of existing methods, there is a lack of comprehensive surveys summarizing the advancements in this field, which hinders its progress. In this paper, we provide a comprehensive review of 3D dense captioning, covering task definition, architecture classification, dataset analysis, evaluation metrics, and an in-depth discussion of the field's prospects. Based on a synthesis of previous literature, we refine a standard pipeline that serves as a common paradigm for existing methods. We also introduce a clear taxonomy of existing models, summarize the technologies involved in different modules, and conduct detailed experimental analysis. Instead of introducing methods in chronological order, we categorize them into different classes to facilitate exploration and analysis of the differences and connections among existing techniques. We also provide a reading guideline to assist readers with different backgrounds and purposes in reading efficiently. Furthermore, we propose a series of promising future directions for 3D dense captioning by identifying challenges and aligning them with the development of related tasks, offering valuable insights and inspiring future research in this field. Our aim is to provide a comprehensive understanding of 3D dense captioning, foster further investigation, and contribute to the development of novel applications in multimedia and related domains.
Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video
Temporal Sentence Grounding in Video (TSGV) suffers from a dataset bias issue, caused by the uneven temporal distribution of target moments for samples with similar semantic components in the input videos or query texts. Existing methods resort to prior knowledge about the bias to artificially break this uneven distribution, which removes only a limited number of significant language biases. In this work, we propose the bias-conflict sample synthesis and adversarial removal debias strategy (BSSARD), which dynamically generates bias-conflict samples by explicitly leveraging potentially spurious correlations between single-modality features and the temporal positions of target moments. Through adversarial training, its bias generators continuously introduce biases and generate bias-conflict samples to deceive its grounding model. Meanwhile, the grounding model continuously eliminates the introduced biases, which requires it to model multi-modality alignment information. BSSARD covers most kinds of coupling relationships and disrupts language and visual biases simultaneously. Extensive experiments on Charades-CD and ActivityNet-CD demonstrate the promising debiasing capability of BSSARD. Source code is available at https://github.com/qzhb/BSSARD.

Comment: Accepted by AAAI 2024.
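The adversarial synthesize-and-remove loop can be sketched as alternating updates, assuming PyTorch. The interfaces and losses below are hypothetical stand-ins chosen for illustration; BSSARD's actual generators and objectives differ in detail.

```python
# Sketch of adversarial debiasing: a bias generator injects a single-modality
# shortcut toward a fake moment; the grounding model must ignore it.
import torch

def adversarial_debias_step(bias_gen, grounder, video_feats, query_feats,
                            gt_moment, fake_moment, opt_gen, opt_grd, loss_fn):
    # 1. Bias generator step: couple the query features alone with the fake
    #    moment, trying to fool the grounding model into predicting it.
    biased_query = bias_gen(query_feats, fake_moment)
    pred = grounder(video_feats, biased_query)
    gen_loss = loss_fn(pred, fake_moment)  # generator wants this small
    opt_gen.zero_grad()
    gen_loss.backward()
    opt_gen.step()

    # 2. Grounding model step: on both clean and biased inputs, still predict
    #    the true moment, forcing genuine cross-modal alignment.
    pred_clean = grounder(video_feats, query_feats)
    pred_biased = grounder(video_feats, bias_gen(query_feats, fake_moment).detach())
    grd_loss = loss_fn(pred_clean, gt_moment) + loss_fn(pred_biased, gt_moment)
    opt_grd.zero_grad()
    grd_loss.backward()
    opt_grd.step()
```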