303 research outputs found

    Examining Teaching Charisma and Its Relation to Student Engagement

    This study focuses on teaching charisma, which comprises four key constructs: knowledge, character traits, teaching techniques, and humor. Participants were recruited from 17 regular education classrooms within 6 colleges or universities in central Taiwan. The results revealed that the Inventory of Teaching Charisma in the College Classroom (ITCCC) is a psychometrically valid instrument that can accurately assess students’ perceptions of the quality of a teacher’s teaching in a professional course. Furthermore, a strong positive relationship between teaching charisma and student engagement was found, and three of the teaching charisma factors jointly predicted student engagement in the professional subject. The importance of the teacher’s charisma in enhancing student engagement is confirmed.
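
    The joint-prediction claim is, in essence, a multiple regression of engagement on charisma factors. Below is a minimal, hypothetical sketch of such an analysis; the simulated data, the choice of three factors (knowledge, teaching techniques, humor), and all coefficients are illustrative assumptions, not the study's results.

```python
# Hypothetical multiple-regression sketch of "three charisma factors jointly
# predict engagement". Data and effect sizes are simulated, not the study's.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200  # hypothetical number of student ratings

# Hypothetical factor scores on a 1-5 Likert-style scale.
knowledge = rng.uniform(1, 5, n)
techniques = rng.uniform(1, 5, n)
humor = rng.uniform(1, 5, n)

# Simulated engagement driven jointly by the three factors plus noise.
engagement = (0.4 * knowledge + 0.3 * techniques + 0.2 * humor
              + rng.normal(0, 0.5, n))

X = np.column_stack([knowledge, techniques, humor])
model = LinearRegression().fit(X, engagement)
print("coefficients:", model.coef_)        # per-factor contribution
print("R^2:", model.score(X, engagement))  # joint predictive strength
```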

    F3Net: Fusion, Feedback and Focus for Salient Object Detection

    Most existing salient object detection models have achieved great progress by aggregating multi-level features extracted from convolutional neural networks. However, because different convolutional layers have different receptive fields, there are large differences between the features they generate. Common feature fusion strategies (addition or concatenation) ignore these differences and may lead to suboptimal solutions. In this paper, we propose F3Net to solve the above problem; it mainly consists of a cross feature module (CFM) and a cascaded feedback decoder (CFD), trained by minimizing a new pixel position aware (PPA) loss. Specifically, CFM aims to selectively aggregate multi-level features. Unlike addition and concatenation, CFM adaptively selects complementary components from the input features before fusion, which effectively avoids introducing redundant information that may corrupt the original features. Besides, CFD adopts a multi-stage feedback mechanism, in which features close to the supervision are fed back to the outputs of previous layers to supplement them and eliminate the differences between features. These refined features go through multiple similar iterations before the final saliency maps are generated. Furthermore, unlike binary cross entropy, the proposed PPA loss does not treat all pixels equally; it synthesizes the local structure information of a pixel to guide the network to focus more on local details. Hard pixels from boundaries or error-prone parts are given more attention to emphasize their importance. F3Net is able to segment salient object regions accurately and provide clear local details. Comprehensive experiments on five benchmark datasets demonstrate that F3Net outperforms state-of-the-art approaches on six evaluation metrics. Comment: Accepted by AAAI 2020, https://github.com/weijun88/F3Net
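
    The PPA idea (weighting pixels by how boundary-like their neighborhood is) is compact enough to sketch. The version below is modeled on the public F3Net code, but treat the kernel size and the weighting factor of 5 as assumptions rather than guaranteed matches.

```python
# A sketch of a pixel position aware (PPA) style loss: pixels whose local
# neighborhood disagrees with the ground-truth mask (i.e. boundary or
# error-prone pixels) receive larger weights in both a weighted BCE term
# and a weighted IoU term.
import torch
import torch.nn.functional as F

def ppa_style_loss(logits, mask):
    # logits, mask: (B, 1, H, W); mask is binary ground truth.
    # Boundary-sensitive weights: large where a 31x31 neighborhood average
    # deviates from the center pixel's label.
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)

    wbce = F.binary_cross_entropy_with_logits(logits, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    pred = torch.sigmoid(logits)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()
```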

    General Greedy De-bias Learning

    Neural networks often make predictions by relying on spurious correlations in the dataset rather than on the intrinsic properties of the task of interest, and so suffer sharp degradation on out-of-distribution (OOD) test data. Existing de-bias learning frameworks try to capture specific dataset biases through annotations, but they fail to handle complicated OOD scenarios. Others implicitly identify dataset bias with specially designed low-capability biased models or losses, but they degrade when the training and test data come from the same distribution. In this paper, we propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model. The base model is encouraged to focus on examples that are hard to solve with the biased models, and thus remains robust against spurious correlations at test time. GGD largely improves models' OOD generalization on various tasks, but it sometimes over-estimates the bias level and degrades on in-distribution tests. We further re-analyze the ensemble process of GGD and introduce Curriculum Regularization, inspired by curriculum learning, which achieves a good trade-off between in-distribution and out-of-distribution performance. Extensive experiments on image classification, adversarial question answering, and visual question answering demonstrate the effectiveness of our method. GGD can learn a more robust base model both with task-specific biased models built from prior knowledge and with a self-ensemble biased model that requires no prior knowledge. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
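
    One common way to realize "focus on examples that are hard for the biased model" is per-example loss reweighting. The sketch below is an illustrative reduction of that idea, not GGD's exact objective; the weighting scheme is an assumption.

```python
# Minimal de-bias sketch: the biased model trains normally, while the base
# model's per-example loss is down-weighted wherever the (detached) biased
# model already solves the example, so bias-conflicting examples dominate.
import torch
import torch.nn.functional as F

def debias_losses(base_logits, biased_logits, labels):
    # Standard loss keeps the biased model chasing dataset shortcuts.
    biased_loss = F.cross_entropy(biased_logits, labels)

    with torch.no_grad():
        # Probability the biased model assigns to the true label.
        p_biased = F.softmax(biased_logits, dim=-1).gather(
            1, labels.unsqueeze(1)).squeeze(1)

    # Easy-for-the-bias examples (p_biased high) contribute little to the
    # base model; hard, bias-conflicting examples carry most of the loss.
    per_example = F.cross_entropy(base_logits, labels, reduction='none')
    base_loss = ((1 - p_biased) * per_example).mean()
    return base_loss, biased_loss
```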

    ALID: Scalable Dominant Cluster Detection

    Detecting dominant clusters is important in many analytic applications. State-of-the-art methods find dense subgraphs on the affinity graph as the dominant clusters. However, the time and space complexity of those methods are dominated by the construction of the affinity graph, which is quadratic in the number of data points and thus impractical on large data sets. To tackle this challenge, in this paper we apply Evolutionary Game Theory (EGT) and develop a scalable algorithm, Approximate Localized Infection Immunization Dynamics (ALID). The major idea is to perform Localized Infection Immunization Dynamics (LID) to find dense subgraphs within local ranges of the affinity graph. LID is further scaled up, with guaranteed high efficiency and detection quality, by an estimated Region of Interest (ROI) and a carefully designed Candidate Infective Vertex Search (CIVS) method. ALID only constructs small local affinity graphs and has a time complexity of O(C(a* + δ)n) and a space complexity of O(a*(a* + δ)), where a* is the size of the largest dominant cluster and C ≪ n and δ ≪ n are small constants. We demonstrate by extensive experiments on both synthetic and real-world data that ALID achieves state-of-the-art detection quality at much lower time and space cost on a single machine. We also demonstrate the encouraging parallelization performance of ALID by implementing Parallel ALID (PALID) on Apache Spark. PALID processes 50 million SIFT data points in 2.29 hours, achieving a speedup ratio of 7.51 with 8 executors.
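
    The key saving is that only small local affinity matrices are ever materialized. The toy sketch below shows the EGT building block on one such local matrix: replicator dynamics converging to an equilibrium whose support is a dominant cluster. ALID's ROI estimation and CIVS search are omitted; the Gaussian affinity and thresholds are simplifying assumptions.

```python
# Toy sketch: dominant-cluster detection on a small *local* affinity matrix
# via replicator dynamics (the game-theoretic core behind LID-style methods).
import numpy as np

def local_dominant_cluster(points, iters=200, tol=1e-8):
    # Local affinity matrix (Gaussian kernel) -- O(m^2) for m local points,
    # cheap because only local subgraphs are ever built.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2)
    np.fill_diagonal(A, 0.0)

    x = np.full(len(points), 1.0 / len(points))  # uniform start
    for _ in range(iters):
        x_new = x * (A @ x) / (x @ A @ x)        # replicator update
        if np.abs(x_new - x).sum() < tol:
            break
        x = x_new
    return np.flatnonzero(x > 1e-4)              # support = dominant cluster

# Two blobs: the dynamics concentrate on the denser, more cohesive one.
pts = np.vstack([np.random.randn(30, 2), np.random.randn(10, 2) + 8])
print(local_dominant_cluster(pts))
```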

    R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation

    Recent text-to-image (T2I) diffusion models have achieved remarkable progress in generating high-quality images from text prompts. However, these models fail to convey the spatial composition specified by a layout instruction. In this work, we probe zero-shot grounded T2I generation with diffusion models, that is, generating images that correspond to the input layout information without training auxiliary modules or fine-tuning the diffusion model. We propose a Region and Boundary (R&B) aware cross-attention guidance approach that gradually modulates the attention maps of the diffusion model during the generative process, helping the model synthesize images that (1) have high fidelity, (2) are highly compatible with the textual input, and (3) interpret layout instructions accurately. Specifically, we leverage discrete sampling to bridge the gap between continuous attention maps and discrete layout constraints, and design a region-aware loss to refine the generative layout during the diffusion process. We further propose a boundary-aware loss to strengthen object discriminability within the corresponding regions. Experimental results show that our method outperforms existing state-of-the-art zero-shot grounded T2I generation methods by a large margin, both qualitatively and quantitatively, on several benchmarks. Comment: Preprint. Under review. Project page: https://sagileo.github.io/Region-and-Boundary
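
    Cross-attention guidance of this kind typically means: score how much of a token's attention mass falls inside its layout box, then nudge the latents by the gradient of that score. The sketch below illustrates the mechanism only; `attn_fn` and the loss form are placeholders, not R&B's actual implementation.

```python
# Schematic region-aware guidance: penalize attention mass that falls
# outside the target box and take one gradient step on the latents,
# leaving the diffusion model's weights untouched.
import torch

def region_aware_loss(attn_map, box_mask):
    # attn_map: (H, W) cross-attention for one grounded text token.
    # box_mask: (H, W) binary mask of that token's layout box.
    inside = (attn_map * box_mask).sum()
    total = attn_map.sum() + 1e-8
    return 1.0 - inside / total  # 0 when all attention lies in the box

def guide_latents(latents, attn_fn, box_mask, step_size=0.1):
    # attn_fn recomputes the token's attention map from the latents
    # (a stand-in for one U-Net forward pass with attention hooks).
    latents = latents.detach().requires_grad_(True)
    loss = region_aware_loss(attn_fn(latents), box_mask)
    loss.backward()
    return (latents - step_size * latents.grad).detach()
```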

    A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes

    Three-dimensional (3D) dense captioning is an emerging vision-language bridging task that aims to generate multiple detailed and accurate descriptions for 3D scenes. It presents significant potential and challenges because it represents the real world more closely than 2D visual captioning and because collecting and processing 3D point cloud data is complex. Despite the popularity and success of existing methods, there is no comprehensive survey summarizing the advancements in this field, which hinders its progress. In this paper, we provide a comprehensive review of 3D dense captioning, covering task definition, architecture classification, dataset analysis, evaluation metrics, and an in-depth discussion of the field's development. Based on a synthesis of previous literature, we distill a standard pipeline that serves as a common paradigm for existing methods. We also introduce a clear taxonomy of existing models, summarize the technologies involved in the different modules, and conduct detailed experimental analysis. Instead of introducing methods in chronological order, we categorize them into classes to facilitate exploration and analysis of the differences and connections among existing techniques. We also provide a reading guideline to help readers with different backgrounds and purposes read efficiently. Furthermore, we propose a series of promising future directions for 3D dense captioning by identifying challenges and aligning them with the development of related tasks, offering valuable insights and inspiring future research in this field. Our aim is to provide a comprehensive understanding of 3D dense captioning, foster further investigations, and contribute to the development of novel applications in multimedia and related domains.

    Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video

    Temporal Sentence Grounding in Video (TSGV) suffers from a dataset bias issue caused by the uneven temporal distribution of target moments for samples with similar semantic components in the input videos or query texts. Existing methods resort to prior knowledge about the bias to artificially break this uneven distribution, which removes only a limited amount of significant language bias. In this work, we propose the Bias-conflict Sample Synthesis and Adversarial Removal Debias strategy (BSSARD), which dynamically generates bias-conflict samples by explicitly exploiting potentially spurious correlations between single-modality features and the temporal position of the target moments. Through adversarial training, its bias generators continuously introduce biases and generate bias-conflict samples to deceive its grounding model. Meanwhile, the grounding model continuously eliminates the introduced biases, which requires it to model multi-modality alignment information. BSSARD covers most kinds of coupling relationships and disrupts language and visual biases simultaneously. Extensive experiments on Charades-CD and ActivityNet-CD demonstrate the promising debiasing capability of BSSARD. Source code is available at https://github.com/qzhb/BSSARD. Comment: accepted by AAAI 2024
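
    In spirit, this is a two-player loop: a bias generator learns to predict the target moment from a single modality alone (a shortcut), while the grounding model is trained to fit the true moments yet disagree with that shortcut. The min-max sketch below is an illustrative reading of the abstract, not BSSARD itself; all module names, losses, and the 0.1 weight are assumptions.

```python
# Illustrative adversarial debias step: a query-only bias generator vs. a
# cross-modal grounding model. Moments are regressed as (start, end) pairs.
import torch
import torch.nn.functional as F

def adversarial_step(bias_gen, grounder, opt_b, opt_g,
                     query_feat, video_feat, target):
    # 1) Bias generator: predict the moment from the query alone. If this
    #    succeeds, a language bias (spurious correlation) exists to exploit.
    opt_b.zero_grad()
    loss_bias = F.mse_loss(bias_gen(query_feat), target)
    loss_bias.backward()
    opt_b.step()

    # 2) Grounding model: fit the true moments while being pushed away from
    #    the (detached) query-only shortcut, forcing it to rely on
    #    multi-modality alignment rather than the bias.
    opt_g.zero_grad()
    pred = grounder(query_feat, video_feat)
    shortcut = bias_gen(query_feat).detach()
    loss_fit = F.mse_loss(pred, target)
    loss_adv = -F.mse_loss(pred, shortcut)   # disagree with the shortcut
    (loss_fit + 0.1 * loss_adv).backward()
    opt_g.step()
    return loss_fit.item()
```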