Die Rolle von Lernaufgaben zur Förderung sprachlicher Interaktion im Anfängerunterricht Chinesisch als neu einsetzende Fremdsprache
This thesis examines competence-oriented learning tasks, linguistic interaction, and Chinese as a foreign language (ChaF). The research centers on a teaching unit carried out in three different courses of the introductory phase (Einführungsphase, EF) of upper secondary school. Each course was taught for 15 lessons using one complex learning task. The thesis investigates what role learning tasks play in promoting linguistic interaction in ChaF instruction. To answer this question, a project built around a learning task was carried out first, followed by a written survey administered in all groups. The results show that learning tasks bring new aspects to conventional textbook-oriented foreign language instruction, including ChaF instruction, and that they can also complement the textbooks in this context.
IMPACT OF PARTICIPATION IN THE EDUCATIONAL ROBOTICS COMPETITION FROM THE PARENT'S VIEWPOINT: A MIXED METHOD
Go Beyond Point Pairs: A General and Accurate Sim2Real Object Pose Voting Method with Efficient Online Synthetic Training
Object pose estimation is an important topic in 3D vision. Though most
current state-of-the-art methods that train on real-world pose annotations
achieve good results, the cost of such real-world training data is too high. In
this paper, we propose a novel method for sim-to-real pose estimation, which is
effective on both instance-level and category-level settings. The proposed
method is based on the point-pair voting scheme from CPPF to vote for object
centers, orientations, and scales. Unlike naive point pairs, to enrich the
context provided by each voting unit, we introduce N-point tuples to fuse
features from more than two points. Besides, a novel vote selection module is
leveraged to discard 'bad' votes. Experiments show that our
proposed method greatly advances the performance on both instance-level and
category-level scenarios. Our method further narrows the gap between
sim-to-real and real-training methods by generating synthetic training data
online efficiently, while all previous sim-to-real methods need to generate
data offline, because of their complex background synthesizing or
photo-realistic rendering. Code repository:
https://github.com/qq456cvb/BeyondPPF
DiffNAS: Bootstrapping Diffusion Models by Prompting for Better Architectures
Diffusion models have recently exhibited remarkable performance on synthetic
data. After a diffusion path is selected, a base model, such as UNet, operates
as a denoising autoencoder, primarily predicting noises that need to be
eliminated step by step. Consequently, it is crucial to employ a model that
aligns with the expected budgets to facilitate superior synthetic performance.
In this paper, we meticulously analyze the diffusion model and engineer a base
model search approach, denoted "DiffNAS". Specifically, we leverage GPT-4 as a
supernet to expedite the search, supplemented with a search memory to enhance
the results. Moreover, we employ RFID as a proxy to promptly rank the
experimental outcomes produced by GPT-4. We also adopt a rapid-convergence
training strategy to boost search efficiency. Rigorous experimentation
corroborates that our algorithm can augment the search efficiency by 2 times
under GPT-based scenarios, while also attaining a performance of 2.82 with 0.37
improvement in FID on CIFAR10 relative to the benchmark IDDPM algorithm.
GATOR: Graph-Aware Transformer with Motion-Disentangled Regression for Human Mesh Recovery from a 2D Pose
3D human mesh recovery from a 2D pose plays an important role in various
applications. However, it is hard for existing methods to simultaneously
capture the multiple relations during the evolution from skeleton to mesh,
including joint-joint, joint-vertex and vertex-vertex relations, which often
leads to implausible results. To address this issue, we propose a novel
solution, called GATOR, that contains an encoder of Graph-Aware Transformer
(GAT) and a decoder with Motion-Disentangled Regression (MDR) to explore these
multiple relations. Specifically, GAT combines a GCN and a graph-aware
self-attention in parallel to capture physical and hidden joint-joint
relations. Furthermore, MDR models joint-vertex and vertex-vertex interactions
to explore joint and vertex relations. Based on the clustering characteristics
of vertex offset fields, MDR regresses the vertices by composing the predicted
base motions. Extensive experiments show that GATOR achieves state-of-the-art
performance on two challenging benchmarks.
Comment: Accepted by ICASSP 202
Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video
Despite significant progress in single image-based 3D human mesh recovery,
accurately and smoothly recovering 3D human motion from a video remains
challenging. Existing video-based methods generally recover human mesh by
estimating the complex pose and shape parameters from coupled image features,
whose high complexity and low representation ability often result in
inconsistent pose motion and limited shape patterns. To alleviate this issue,
we introduce 3D pose as the intermediary and propose a Pose and Mesh
Co-Evolution network (PMCE) that decouples this task into two parts: 1)
video-based 3D human pose estimation and 2) mesh vertices regression from the
estimated 3D pose and temporal image feature. Specifically, we propose a
two-stream encoder that estimates mid-frame 3D pose and extracts a temporal
image feature from the input image sequence. In addition, we design a
co-evolution decoder that performs pose and mesh interactions with the
image-guided Adaptive Layer Normalization (AdaLN) to make pose and mesh fit the
human body shape. Extensive experiments demonstrate that the proposed PMCE
outperforms previous state-of-the-art methods in terms of both per-frame
accuracy and temporal consistency on three benchmark datasets: 3DPW, Human3.6M,
and MPI-INF-3DHP. Our code is available at https://github.com/kasvii/PMCE.
Comment: Accepted by ICCV 2023. Project page: https://kasvii.github.io/PMC
Self-supervised Guided Hypergraph Feature Propagation for Semi-supervised Classification with Missing Node Features
Graph neural networks (GNNs) with missing node features have recently
received increasing interest. Such missing node features seriously hurt the
performance of the existing GNNs. Some recent methods have been proposed to
reconstruct the missing node features by the information propagation among
nodes with known and unknown attributes. Although these methods have achieved
superior performance, how to exactly exploit the complex data correlations
among nodes to reconstruct missing node features is still a great challenge. To
solve the above problem, we propose a self-supervised guided hypergraph feature
propagation (SGHFP). Specifically, the feature hypergraph is first generated
according to the node features with missing information. And then, the
reconstructed node features produced by the previous iteration are fed to a
two-layer GNN to construct a pseudo-label hypergraph. Before each iteration,
the constructed feature hypergraph and pseudo-label hypergraph are fused
effectively, which can better preserve the higher-order data correlations among
nodes. After that, we apply the fused hypergraph to the feature propagation for
reconstructing missing features. Finally, the reconstructed node features by
multi-iteration optimization are applied to the downstream semi-supervised
classification task. Extensive experiments demonstrate that the proposed SGHFP
outperforms existing methods for semi-supervised classification with missing
node features.
Comment: Accepted by 48th IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP 2023)
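The feature-propagation idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain undirected graph and simple neighbor averaging; it is not the SGHFP method itself, and it omits the hypergraph construction, pseudo-label fusion, and downstream classifier. The function name `propagate_features` and the toy graph are hypothetical, chosen only for illustration.

```python
def propagate_features(adjacency, features, known, num_iters=50):
    """Fill in missing node features by repeated neighbor averaging.

    adjacency: dict mapping node -> list of neighbor nodes (undirected graph)
    features:  dict mapping node -> list of floats (placeholder for unknown nodes)
    known:     set of nodes whose features are observed
    """
    dim = len(next(iter(features.values())))
    # Unknown nodes start at zero; known nodes keep their observed values.
    x = {n: list(features[n]) if n in known else [0.0] * dim for n in adjacency}
    for _ in range(num_iters):
        new_x = {}
        for node, neighbors in adjacency.items():
            if node in known or not neighbors:
                # Clamp observed features (isolated nodes stay unchanged).
                new_x[node] = x[node]
            else:
                # Replace an unknown feature with the mean of its neighbors.
                new_x[node] = [
                    sum(x[nb][d] for nb in neighbors) / len(neighbors)
                    for d in range(dim)
                ]
        x = new_x
    return x


# Toy path graph 0 - 1 - 2 - 3 with features observed only at both ends.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = {0: [0.0], 1: [0.0], 2: [0.0], 3: [3.0]}
filled = propagate_features(adjacency, features, known={0, 3}, num_iters=200)
```

On this toy graph the two unknown nodes settle to values interpolated between the observed endpoints, which is the smoothing behavior that propagation-based reconstruction relies on.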
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense
Visual commonsense understanding requires Vision Language (VL) models to not
only understand image and text but also cross-reference in-between to fully
integrate and achieve comprehension of the visual scene described. Recently,
various approaches have been developed and have achieved high performance on
visual commonsense benchmarks. However, it is unclear whether the models really
understand the visual scene and underlying commonsense knowledge due to limited
evaluation data resources. To provide an in-depth analysis, we present a
Multimodal Evaluation (ME) pipeline to automatically generate question-answer
pairs to test models' understanding of the visual scene, text, and related
knowledge. We then take a step further to show that training with the ME data
boosts the model's performance in standard VCR evaluation. Lastly, our in-depth
analysis and comparison reveal interesting findings: (1) semantically low-level
information can assist the learning of high-level information but not the
opposite; (2) visual information is generally underutilized compared with
text.
Comment: Accepted to EMNLP 2022 Long Paper
Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models
Phishing attacks pose a significant threat to Internet users, with
cybercriminals elaborately replicating the visual appearance of legitimate
websites to deceive victims. Visual similarity-based detection systems have
emerged as an effective countermeasure, but their effectiveness and robustness
in real-world scenarios have remained unexplored. In this paper, we comprehensively
scrutinize and evaluate state-of-the-art visual similarity-based anti-phishing
models using a large-scale dataset of 450K real-world phishing websites. Our
analysis reveals that while certain models maintain high accuracy, others
exhibit notably lower performance than results on curated datasets,
highlighting the importance of real-world evaluation. In addition, we observe
the real-world tactic of manipulating visual components that phishing attackers
employ to circumvent the detection systems. To assess the resilience and
robustness of existing models against adversarial attacks, we apply visible
and perturbation-based manipulations to website logos, which adversaries
typically target. We then evaluate the models' robustness in handling these
adversarial samples. Our findings reveal vulnerabilities in several models,
emphasizing the need for more robust visual similarity techniques capable of
withstanding sophisticated evasion attempts. We provide actionable insights for
enhancing the security of phishing defense systems, encouraging proactive
actions. To the best of our knowledge, this work represents the first
large-scale, systematic evaluation of visual similarity-based models for
phishing detection in real-world settings, necessitating the development of
more effective and robust defenses.
Comment: 12 pages