113 research outputs found
Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection
Human-Object Interaction (HOI) detection is a core task for high-level image
understanding. Recently, Detection Transformer (DETR)-based HOI detectors have
become popular due to their superior performance and efficient structure.
However, these approaches typically adopt fixed HOI queries for all test
images, making them vulnerable to changes in object locations within a
specific image. Accordingly, in this paper, we propose to enhance DETR's
mining hard-positive queries, which are forced to make correct predictions
using partial visual cues. First, we explicitly compose hard-positive queries
according to the ground-truth (GT) position of labeled human-object pairs for
each training image. Specifically, we shift the GT bounding boxes of each
labeled human-object pair so that the shifted boxes cover only a certain
portion of the GT ones. We encode the coordinates of the shifted boxes for each
labeled human-object pair into an HOI query. Second, we implicitly construct
another set of hard-positive queries by masking the top scores in
cross-attention maps of the decoder layers. The masked attention maps then only
cover partial important cues for HOI predictions. Finally, an alternate
strategy is proposed that efficiently combines both types of hard queries. In
each iteration, both DETR's learnable queries and one selected type of
hard-positive queries are adopted for loss computation. Experimental results
show that our proposed approach can be widely applied to existing DETR-based
HOI detectors. Moreover, we consistently achieve state-of-the-art performance
on three benchmarks: HICO-DET, V-COCO, and HOI-A. Code is available at
https://github.com/MuchHair/HQM.
Comment: Accepted by ECCV202
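The explicit hard-positive query construction described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the shift direction and the `keep_ratio` parameter are assumptions, and the paper's actual shift distribution may differ.

```python
import random

def shift_box(box, keep_ratio=0.5, rng=random.Random(0)):
    """Shift a GT box (x1, y1, x2, y2) so the shifted box overlaps only a
    portion of the original along each axis. Hypothetical sketch."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # Shift by (1 - keep_ratio) of each side length in a random direction.
    dx = (1 - keep_ratio) * w * rng.choice([-1, 1])
    dy = (1 - keep_ratio) * h * rng.choice([-1, 1])
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)

def overlap_ratio(a, b):
    """Intersection area of b with a, divided by the area of a."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return ix * iy / ((a[2] - a[0]) * (a[3] - a[1]))

gt = (10.0, 10.0, 50.0, 30.0)
shifted = shift_box(gt, keep_ratio=0.5)
# The shifted box retains only keep_ratio of the GT box per axis,
# so it covers keep_ratio**2 of the GT area.
print(round(overlap_ratio(gt, shifted), 2))  # → 0.25
```

In the paper's scheme, the coordinates of such shifted human and object boxes would then be encoded into an HOI query, forcing the model to predict the full interaction from partial visual cues.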
Fluid Transformers and Creative Analogies: Exploring Large Language Models' Capacity for Augmenting Cross-Domain Analogical Creativity
Cross-domain analogical reasoning is a core creative ability that can be
challenging for humans. Recent work has shown proofs-of-concept of Large
Language Models' (LLMs) ability to generate cross-domain analogies. However,
the reliability and potential usefulness of this capacity for augmenting human
creative work has received little systematic exploration. In this paper, we
systematically explore LLMs' capacity to augment cross-domain analogical
reasoning. Across three studies, we found: 1) LLM-generated cross-domain
analogies were frequently judged as helpful in the context of a problem
reformulation task (median 4 out of 5 helpfulness rating), and frequently (~80%
of cases) led to observable changes in problem formulations, and 2) there was
an upper bound of 25% of outputs being rated as potentially harmful, with a
majority due to potentially upsetting content, rather than biased or toxic
content. These results demonstrate the potential utility -- and risks -- of
LLMs for augmenting cross-domain analogical creativity.
AI Chatbots as Multi-Role Pedagogical Agents: Transforming Engagement in CS Education
This study investigates the use of Artificial Intelligence (AI)-powered,
multi-role chatbots as a means to enhance learning experiences and foster
engagement in computer science education. Leveraging a design-based research
approach, we develop, implement, and evaluate a novel learning environment
enriched with four distinct chatbot roles: Instructor Bot, Peer Bot, Career
Advising Bot, and Emotional Supporter Bot. These roles, designed around the
tenets of Self-Determination Theory, cater to the three innate psychological
needs of learners - competence, autonomy, and relatedness. Additionally, the
system embraces an inquiry-based learning paradigm, encouraging students to ask
questions, seek solutions, and explore their curiosities.
We test this system in a higher education context over a period of one month
with 200 participating students, comparing outcomes with conditions involving a
human tutor and a single chatbot. Our research utilizes a mixed-methods
approach, encompassing quantitative measures such as chat log sequence
analysis, and qualitative methods including surveys and focus group interviews.
By integrating cutting-edge Natural Language Processing techniques such as
topic modelling and sentiment analysis, we offer an in-depth understanding of
the system's impact on learner engagement, motivation, and inquiry-based
learning.
This study, through its rigorous design and innovative approach, provides
significant insights into the potential of AI-empowered, multi-role chatbots in
reshaping the landscape of computer science education and fostering an
engaging, supportive, and motivating learning environment.
Targeted Online Password Guessing: An Underestimated Threat
While trawling online/offline password guessing has been intensively studied, only a few studies have examined targeted online guessing, where an attacker guesses a specific victim's password for a service by exploiting the victim's personal information, such as a sister password leaked from one of her other accounts and some personally identifiable information (PII). A key challenge for targeted online guessing is to choose the most effective password candidates, since the number of guess attempts allowed by a server's lockout or throttling mechanisms is typically very small. We propose TarGuess, a framework that systematically characterizes typical targeted guessing scenarios with seven sound mathematical models, each based on the kinds of data available to an attacker. These models allow us to design novel and efficient guessing algorithms. Extensive experiments on 10 large real-world password datasets show the effectiveness of TarGuess. In particular, TarGuess-I~IV capture the four most representative scenarios, and within 100 guesses: (1) TarGuess-I outperforms its foremost counterpart by 142% against security-savvy users and by 46% against normal users; (2) TarGuess-II outperforms its foremost counterpart by 169% against security-savvy users and by 72% against normal users; and (3) both TarGuess-III and IV achieve success rates over 73% against normal users and over 32% against security-savvy users. TarGuess-III and IV address, for the first time, the issue of cross-site online guessing when given one of the victim's sister passwords and some PII.
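The core idea of ranking candidates under a tight guess budget can be illustrated with a toy sketch. This is not TarGuess itself, which trains probabilistic models on leaked datasets; the fixed pattern list and PII fields below are hypothetical stand-ins.

```python
def pii_candidates(pii, limit=100):
    """Rank password guesses built from a victim's PII, most likely
    pattern first, and truncate to the server's guess budget.
    Illustrative only; real rankings come from trained models."""
    name, birth = pii["name"].lower(), pii["birth"]  # birth as 'YYYYMMDD'
    year, mmdd = birth[:4], birth[4:]
    patterns = [
        name + year,   # e.g. alice1990, an empirically common pattern
        name + mmdd,
        name + "123",
        year + name,
        name + birth,
    ]
    # Deduplicate while preserving rank order, then apply the budget.
    seen, ranked = set(), []
    for p in patterns:
        if p not in seen:
            seen.add(p)
            ranked.append(p)
    return ranked[:limit]

guesses = pii_candidates({"name": "Alice", "birth": "19900102"})
print(guesses[0])  # → alice1990
```

The point is that with only ~100 attempts before lockout, the ordering of candidates matters far more than the size of the candidate pool.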
Monad: Towards Cost-effective Specialization for Chiplet-based Spatial Accelerators
Advanced packaging offers a new design paradigm in the post-Moore era, where
many small chiplets can be assembled into a large system. Based on
heterogeneous integration, a chiplet-based accelerator can be highly
specialized for a specific workload, demonstrating extreme efficiency and cost
reduction. To fully leverage this potential, it is critical to explore both the
architectural design space for individual chiplets and different integration
options to assemble these chiplets, which have yet to be fully exploited by
existing proposals. This paper proposes Monad, a cost-aware specialization
approach for chiplet-based spatial accelerators that explores the tradeoffs
between PPA and fabrication costs. To evaluate a specialized system, we
introduce a modeling framework considering the non-uniformity in dataflow,
pipelining, and communications when executing multiple tensor workloads on
different chiplets. We propose to combine the architecture and integration
design space by uniformly encoding the design aspects for both spaces and
exploring them with a systematic ML-based approach. The experiments demonstrate
that Monad can achieve an average of 16% and 30% EDP reduction compared with
the state-of-the-art chiplet-based accelerators, Simba and NN-Baton,
respectively.
Comment: To be published in ICCAD 202
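The joint exploration of architecture and integration knobs can be sketched as follows. The analytical cost model and the knob values are invented for illustration, and random search stands in for Monad's ML-based optimizer; none of this is the paper's actual model.

```python
import random

def edp_and_cost(point):
    """Toy analytical model: a hypothetical stand-in for Monad's
    evaluation framework (which models dataflow non-uniformity,
    pipelining, and inter-chiplet communication)."""
    chiplets, pe_per_chiplet, link_bw = point
    compute = 1e4 / (chiplets * pe_per_chiplet)  # latency shrinks with PEs
    comm = chiplets / link_bw                    # integration overhead
    energy = chiplets * pe_per_chiplet * 0.1
    edp = (compute + comm) * energy
    fab_cost = chiplets * (pe_per_chiplet ** 1.5) * 0.01
    return edp, fab_cost

def explore(budget=200, max_cost=30.0, rng=random.Random(7)):
    """Uniformly encode architecture (PEs per chiplet) and integration
    (chiplet count, link bandwidth) knobs and search them jointly,
    keeping the best design under the fabrication-cost cap."""
    best, best_edp = None, float("inf")
    for _ in range(budget):
        point = (rng.choice([1, 2, 4, 8]),    # chiplets
                 rng.choice([16, 32, 64]),    # PEs per chiplet
                 rng.choice([8, 16, 32]))     # inter-chiplet link BW
        edp, cost = edp_and_cost(point)
        if cost <= max_cost and edp < best_edp:
            best, best_edp = point, edp
    return best, best_edp

best, edp = explore()
print(best, round(edp, 2))
```

Encoding both design spaces in one point vector is what lets a single optimizer trade PPA against fabrication cost, rather than fixing the integration plan first and tuning chiplets afterward.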
STDA-Meta: A Meta-Learning Framework for Few-Shot Traffic Prediction
With the development of cities, traffic congestion has become an
increasingly pressing issue, and traffic prediction is a classic method to
relieve it. Traffic prediction is one specific application of
spatio-temporal prediction learning, alongside taxi scheduling, weather
prediction, and ship trajectory prediction. Classical spatio-temporal
prediction learning methods, including deep learning, require large amounts
of training data. In reality, newly developed cities with insufficient
sensors do not satisfy this assumption, and data scarcity degrades
predictive performance. Learning from insufficient data is known as
few-shot learning (FSL), and few-shot traffic prediction remains
challenging. On the one hand, the irregularity and dynamic nature of graph
structures undermine the performance of spatio-temporal learning methods.
On the other hand, conventional domain adaptation methods do not work well
with insufficient training data when transferring knowledge from different
domains to the intended target domain. To address these challenges, we
propose a novel spatio-temporal domain adaptation (STDA) method that learns
transferable spatio-temporal meta-knowledge from data-sufficient cities in
an adversarial manner. This learned meta-knowledge can improve the
prediction performance of data-scarce cities. Specifically, we train the
STDA model using a Model-Agnostic Meta-Learning (MAML)-based episodic
learning process, which enables the model to solve new learning tasks using
only a small number of training samples. We conduct extensive experiments
on four traffic prediction datasets, and the results show that our model
improves prediction performance by 7% over baseline models on the two
metrics of MAE and RMSE.
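The MAML-style episodic training loop can be sketched on a toy scalar regression problem. This is not the STDA model: the scalar model y = w·x, the task distribution, and the learning rates are all invented to show the inner-adapt/outer-update structure, with each "task" standing in for a city.

```python
import random

def grad(w, data):
    """Gradient of mean squared error for the scalar model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def maml_episode(meta_w, tasks, inner_lr=0.01, outer_lr=0.01, k=5,
                 rng=random.Random(0)):
    """One first-order MAML episode: adapt to each task on a k-shot
    support set, then update the meta-parameter with the gradient
    evaluated after adaptation on a held-out query set."""
    meta_grad = 0.0
    for true_w in tasks:
        support = [(x, true_w * x) for x in (rng.uniform(-1, 1) for _ in range(k))]
        query = [(x, true_w * x) for x in (rng.uniform(-1, 1) for _ in range(k))]
        adapted = meta_w - inner_lr * grad(meta_w, support)  # inner step
        meta_grad += grad(adapted, query)                    # outer signal
    return meta_w - outer_lr * meta_grad / len(tasks)

w = 0.0
tasks = [1.5, 2.0, 2.5]  # "cities" differing in their true dynamics
for _ in range(2000):
    w = maml_episode(w, tasks)
print(round(w, 1))  # → 2.0, an initialization that adapts quickly to any task
```

The meta-parameter converges to a point from which a single few-shot gradient step reaches any task well, which is exactly the property a data-scarce target city needs.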
Harnessing the Power of LLMs: Evaluating Human-AI Text Co-Creation through the Lens of News Headline Generation
To explore how humans can best leverage LLMs for writing and how interacting
with these models affects feelings of ownership and trust in the writing
process, we compared common human-AI interaction types (e.g., guiding system,
selecting from system outputs, post-editing outputs) in the context of
LLM-assisted news headline generation. While LLMs alone can generate
satisfactory news headlines, on average, human control is needed to fix
undesirable model outputs. Of the interaction methods, guiding and selecting
model output added the most benefit with the lowest cost (in time and effort).
Further, AI assistance did not harm participants' perception of control
compared to freeform editing.
UniSparse: An Intermediate Language for General Sparse Format Customization
The ongoing trend of hardware specialization has led to a growing use of
custom data formats when processing sparse workloads, which are typically
memory-bound. These formats facilitate optimized software/hardware
implementations by utilizing sparsity pattern- or target-aware data structures
and layouts to enhance memory access latency and bandwidth utilization.
However, existing sparse tensor programming models and compilers offer little
or no support for productively customizing the sparse formats. Additionally,
because these frameworks represent formats using a limited set of per-dimension
attributes, they lack the flexibility to accommodate numerous new variations of
custom sparse data structures and layouts. To overcome this deficiency, we
propose UniSparse, an intermediate language that provides a unified abstraction
for representing and customizing sparse formats. Unlike the existing
attribute-based frameworks, UniSparse decouples the logical representation of
the sparse tensor (i.e., the data structure) from its low-level memory layout,
enabling the customization of both. As a result, a rich set of format
customizations can be succinctly expressed in a small set of well-defined
query, mutation, and layout primitives. We also develop a compiler leveraging
the MLIR infrastructure, which supports adaptive customization of formats, and
automatic code generation of format conversion and compute operations for
heterogeneous architectures. We demonstrate the efficacy of our approach
through experiments running commonly-used sparse linear algebra operations with
specialized formats on multiple different hardware targets, including an Intel
CPU, an NVIDIA GPU, an AMD Xilinx FPGA, and a simulated processing-in-memory
(PIM) device.
Comment: to be published in OOPSLA'2
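The decoupling of a sparse tensor's logical structure from its layout can be illustrated with a tiny mutation primitive. This sketch is not UniSparse, which is an MLIR-based intermediate language; the `compress` function and its signature are hypothetical, showing only the flavor of a format-mutation primitive (here, turning COO row indices into a CSR-style pointer array).

```python
def compress(coords, vals, dim, size):
    """A 'compress'-style mutation primitive: replace explicit indices
    along `dim` with a pointer array. Applied to the row dimension of a
    2-D COO tensor, this yields a CSR-like structure. Hypothetical."""
    # Sort entries lexicographically so runs along `dim` are contiguous.
    order = sorted(range(len(vals)), key=lambda i: coords[i])
    coords = [coords[i] for i in order]
    vals = [vals[i] for i in order]
    # Count entries per index along `dim`, then prefix-sum into pointers.
    ptr = [0] * (size + 1)
    for c in coords:
        ptr[c[dim] + 1] += 1
    for r in range(size):
        ptr[r + 1] += ptr[r]
    # Remaining coordinates keep every dimension except the compressed one.
    rest = [tuple(c[d] for d in range(len(c)) if d != dim) for c in coords]
    return ptr, rest, vals

# A 3x3 sparse matrix in COO form.
coo = [(0, 0), (0, 2), (2, 1)]
vals = [1.0, 2.0, 3.0]
ptr, cols, v = compress(coo, vals, dim=0, size=3)
print(ptr, cols)  # → [0, 2, 2, 3] [(0,), (2,), (1,)]
```

Because the primitive operates on the logical structure alone, the same call composes with separate layout choices (e.g., reordering or padding the value array), which is the flexibility per-dimension attribute schemes lack.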