GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection
With the rapid development of deep generative models (such as Generative
Adversarial Networks and Diffusion models), AI-synthesized images are now of
such high quality that humans can hardly distinguish them from pristine ones.
Although existing detection methods have shown high performance in specific
evaluation settings, e.g., on images from seen models or on images without
real-world post-processing, they tend to suffer serious performance degradation
in real-world scenarios where testing images can be generated by more powerful
generation models or combined with various post-processing operations. To
address this issue, we propose a Global and Local Feature Fusion (GLFF)
framework to learn rich and discriminative representations by combining
multi-scale global features from the whole image with refined local features
from informative patches for AI-synthesized image detection. GLFF fuses
information from two branches: the global branch to extract multi-scale
semantic features and the local branch to select informative patches for
extraction of detailed local artifacts. Due to the lack of a synthesized image
dataset simulating real-world applications for evaluation, we further create a
challenging fake image dataset, named DeepFakeFaceForensics (DF^3), which
contains 6 state-of-the-art generation models and a variety of post-processing
techniques to approximate real-world scenarios. Experimental results
demonstrate the superiority of our method over the state-of-the-art methods on
the proposed DF^3 dataset and three other open-source datasets.
Comment: 13 pages, 6 figures, 8 tables
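The abstract describes the architecture only at a high level. As an illustration of the two-branch idea, the sketch below combines pooled global features from the whole image with embeddings of a few high-scoring patches; the backbone, the patch scorer, the patch size, top-k, and the simple concatenation fusion are all illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of global/local feature fusion
# for synthetic-image detection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalFusionDetector(nn.Module):
    def __init__(self, feat_dim=128, patch_size=32, top_k=4):
        super().__init__()
        self.patch_size, self.top_k = patch_size, top_k
        # Global branch: a small CNN; its pooled output stands in for
        # multi-scale semantic features of the whole image.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Local branch: score non-overlapping patches, embed the top-k.
        self.patch_scorer = nn.Conv2d(3, 1, patch_size, stride=patch_size)
        self.patch_embed = nn.Linear(3 * patch_size * patch_size, feat_dim)
        self.classifier = nn.Linear(feat_dim * 2, 2)  # real vs. fake

    def forward(self, x):
        p, k = self.patch_size, self.top_k
        g = self.backbone(x).mean(dim=(2, 3))             # (B, feat_dim)
        scores = self.patch_scorer(x).flatten(1)          # (B, num_patches)
        idx = scores.topk(k, dim=1).indices               # (B, k)
        patches = F.unfold(x, p, stride=p)                # (B, 3*p*p, num_patches)
        picked = torch.gather(
            patches, 2, idx.unsqueeze(1).expand(-1, patches.size(1), -1))
        l = self.patch_embed(picked.permute(0, 2, 1)).mean(dim=1)  # (B, feat_dim)
        # Fuse global semantics with local artifact features.
        return self.classifier(torch.cat([g, l], dim=1))

logits = GlobalLocalFusionDetector()(torch.randn(2, 3, 224, 224))  # (2, 2)
```

The rationale for the local branch, per the abstract, is that subtle generation artifacts often survive only in a few informative patches, so selecting those patches before embedding keeps the detector sensitive to them.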
UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking
In recent years, numerous effective multi-object tracking (MOT) methods have
been developed for a wide range of applications. Existing performance
evaluations of MOT methods usually separate the object tracking step from the
object detection step by using the same fixed object detection results for
comparisons. In this work, we perform a comprehensive quantitative study on the
effects of object detection accuracy on the overall MOT performance, using the
new large-scale University at Albany DETection and tRACking (UA-DETRAC)
benchmark dataset. The UA-DETRAC benchmark dataset consists of 100 challenging
video sequences captured from real-world traffic scenes (over 140,000 frames
with rich annotations, including occlusion, weather, vehicle category,
truncation, and vehicle bounding boxes) for object detection, object tracking
and MOT system evaluation. We evaluate complete MOT systems constructed from combinations
of state-of-the-art object detection and object tracking methods. Our analysis
shows the complex effects of object detection accuracy on MOT system
performance. Based on these observations, we propose new evaluation tools and
metrics for MOT systems that consider both object detection and object tracking
for comprehensive analysis.
Comment: 18 pages, 11 figures, accepted by CVIU
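As a rough illustration of the kind of detection-aware tracking metric the benchmark proposes (its PR-based scores integrate a tracking measure such as MOTA along the detector's precision-recall curve), here is a simplified sketch; the arc-length integration and the toy operating points are assumptions, not the benchmark's exact definition.

```python
# A hedged sketch of a PR-integrated tracking score: sample a tracking
# metric at several detector confidence thresholds and integrate it along
# the resulting precision-recall curve.
import numpy as np

def pr_integrated_score(precisions, recalls, tracking_scores):
    """Each index i is one detection-confidence threshold, giving the
    detector's (precision_i, recall_i) and the tracker's score (e.g.,
    MOTA_i) when run on those detections."""
    order = np.argsort(recalls)
    p = np.asarray(precisions, dtype=float)[order]
    r = np.asarray(recalls, dtype=float)[order]
    s = np.asarray(tracking_scores, dtype=float)[order]
    seg = np.hypot(np.diff(r), np.diff(p))   # length of each PR-curve segment
    mid = (s[:-1] + s[1:]) / 2.0             # mean score on each segment
    return float(np.sum(mid * seg))

# Toy operating points: sweeping the detection threshold trades precision
# for recall and changes the downstream tracking quality.
print(pr_integrated_score(
    precisions=[0.95, 0.85, 0.70],
    recalls=[0.40, 0.60, 0.80],
    tracking_scores=[0.35, 0.45, 0.40],
))
```

The design intent such a score captures is exactly the paper's observation: a tracker's quality cannot be stated for one fixed set of detections, because detection accuracy and tracking performance interact across the detector's whole operating range.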
A Mixture Model for Random Responding Behavior in Forced-Choice Noncognitive Assessment: Implication and Application in Organizational Research
For various reasons, respondents to forced-choice assessments (typically used for noncognitive psychological constructs) may respond randomly, whether to individual items out of indecision or globally out of disengagement. Random responding is thus a complex source of measurement bias and threatens the reliability of forced-choice assessments, which are essential in high-stakes organizational testing scenarios such as hiring decisions. Traditional measurement models rely heavily on nonrandom, construct-relevant responses to yield accurate parameter estimates; when survey data contain many random responses, fitting traditional models may deliver biased results and attenuate measurement reliability. This study presents a new mixture item response theory model for forced-choice measures (called M-TCIR) that simultaneously models normal and random responses (distinguishing completely from incompletely random responding). The feasibility of the M-TCIR was investigated via two Monte Carlo simulation studies, and one empirical dataset was analyzed to illustrate its applicability in practice. The results revealed that most model parameters were adequately recovered and that the M-TCIR is a viable alternative for modeling both aberrant and normal responses with high efficiency.
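The abstract does not give the model's functional form. As a generic sketch of the mixture idea (hypothetical notation, not the exact M-TCIR specification), a response can be written as coming from the normal forced-choice IRT process with some engagement probability, and from a random process otherwise:

```latex
% Generic two-class mixture sketch (hypothetical notation, not the exact
% M-TCIR specification).
P(Y_{ij} = y \mid \theta_i)
  = \pi_i \, P_{\mathrm{IRT}}(Y_{ij} = y \mid \theta_i)
  + (1 - \pi_i) \, \frac{1}{K_j}
```

Here \pi_i is person i's probability of responding normally, P_IRT is the normal-response model for forced-choice block j, and K_j is the number of response options in that block. Distinguishing completely from incompletely random responding could then correspond to letting the random component act globally, across the whole test, versus locally, on individual items.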
Electrolyte influence on sorption behaviours of Direct Blue 71 dye on ramie fibre
Ramie loose fibre was dyed with Direct Blue 71 dye at 70, 80, 90 and 100°C, with and without NaCl electrolyte, to investigate differences in dye sorption behaviour. The results show that adding NaCl increases dye exhaustion and shortens the equilibrium dyeing time. The dye adsorption process followed pseudo-second-order kinetics both without and with NaCl, but the sorption rate constant was larger in the presence of NaCl.
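For reference, the pseudo-second-order kinetic model invoked here is the standard Ho-McKay form; its rate constant k_2 is the quantity reported as larger in the presence of NaCl:

```latex
% Standard pseudo-second-order sorption kinetics (Ho--McKay form):
\frac{dq_t}{dt} = k_2 \,(q_e - q_t)^2
\qquad\Longrightarrow\qquad
\frac{t}{q_t} = \frac{1}{k_2 q_e^{2}} + \frac{t}{q_e}
```

where q_t and q_e are the dye uptake at time t and at equilibrium. The linearized form on the right is what is typically fitted to dyeing data to estimate k_2 and q_e.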
MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis
Aspect-based sentiment analysis is a long-standing research interest in the
field of opinion mining, and in recent years, researchers have gradually
shifted their focus from simple ABSA subtasks to end-to-end multi-element ABSA
tasks. However, the datasets currently used in this research are limited to
individual elements of specific tasks, usually focus on in-domain settings,
ignore implicit aspects and opinions, and are small in scale. To address
these issues, we propose a large-scale Multi-Element Multi-Domain dataset
(MEMD) that covers all four ABSA elements (aspect, category, opinion, and
sentiment polarity) across five domains, including nearly
20,000 review sentences and 30,000 quadruples annotated with explicit and
implicit aspects and opinions for ABSA research. Meanwhile, we evaluate
generative and non-generative baselines on multiple ABSA subtasks under the
open domain setting, and the results show that open domain ABSA as well as
mining implicit aspects and opinions remain ongoing challenges to be addressed.
The datasets are publicly released at https://github.com/NUSTM/MEMD-ABSA.
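To make the annotation unit concrete, here is a minimal sketch of a sentiment quadruple as used in this line of work; the field names, the example sentence, and the "NULL" convention for implicit elements are illustrative assumptions, not the dataset's exact schema.

```python
# A toy sketch of the annotation unit a multi-element ABSA dataset provides.
from dataclasses import dataclass

@dataclass
class SentimentQuad:
    aspect: str     # aspect term, or "NULL" if implicit
    category: str   # predefined aspect category
    opinion: str    # opinion term, or "NULL" if implicit
    polarity: str   # "positive" | "negative" | "neutral"

sentence = "The pasta was great but we waited forever."
quads = [
    SentimentQuad("pasta", "food#quality", "great", "positive"),
    SentimentQuad("NULL", "service#general", "waited forever", "negative"),
]
# An end-to-end multi-element model maps: sentence -> list of SentimentQuad;
# the second quad shows why implicit aspects make the task harder.
```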
A Unified Framework for Modality-Agnostic Deepfakes Detection
As AI-generated content (AIGC) thrives, deepfakes have expanded from
single-modality falsification to cross-modal fake content creation, where
either audio or visual components can be manipulated. While two unimodal
detectors used together can detect audio-visual deepfakes, they may overlook
cross-modal forgery clues. Existing multimodal deepfake detection methods typically establish
correspondence between the audio and visual modalities for binary real/fake
classification, and require the co-occurrence of both modalities. However, in
real-world multi-modal applications, missing modality scenarios may occur where
either modality is unavailable. In such cases, audio-visual detection methods
are less practical than two independent unimodal methods. Moreover, a
detector cannot always know beforehand how many or which modalities have been
manipulated, necessitating a fake-modality-agnostic audio-visual detector. In
this work, we introduce a comprehensive framework that is agnostic to fake
modalities, which facilitates the identification of multimodal deepfakes and
handles situations with missing modalities, regardless of the manipulations
embedded in audio, video, or even cross-modal forms. To enhance the modeling of
cross-modal forgery clues, we employ audio-visual speech recognition (AVSR) as
a preliminary task. This efficiently extracts speech correlations across
modalities, a feature challenging for deepfakes to replicate. Additionally, we
propose a dual-label detection approach that follows the structure of AVSR to
support the independent detection of each modality. Extensive experiments on
three audio-visual datasets show that our scheme outperforms state-of-the-art
detection methods with promising performance on modality-agnostic audio/video
deepfakes.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
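As a rough sketch of the dual-label idea (one real/fake decision per modality, with learned placeholders so either modality may be missing), here is a hypothetical module; the encoder stand-ins, feature dimensions, and fusion layer are assumptions, and the AVSR pretraining stage is omitted entirely.

```python
# A hedged sketch (not the authors' architecture) of dual-label,
# modality-agnostic detection: shared fusion over whichever modalities are
# present, with one real/fake head per modality.
import torch
import torch.nn as nn

class DualLabelDetector(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.audio_enc = nn.Linear(128, dim)   # stand-in for an audio encoder
        self.video_enc = nn.Linear(512, dim)   # stand-in for a visual encoder
        # Learned placeholder tokens used when a modality is missing.
        self.audio_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.video_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.fusion = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.audio_head = nn.Linear(dim, 2)    # real/fake label for audio
        self.video_head = nn.Linear(dim, 2)    # real/fake label for video

    def forward(self, audio=None, video=None):
        b = (audio if audio is not None else video).size(0)
        a = self.audio_enc(audio).unsqueeze(1) if audio is not None \
            else self.audio_token.expand(b, -1, -1)
        v = self.video_enc(video).unsqueeze(1) if video is not None \
            else self.video_token.expand(b, -1, -1)
        fused = self.fusion(torch.cat([a, v], dim=1))      # (B, 2, dim)
        # Dual labels: each modality is judged independently, so audio-only,
        # video-only, and both-fake manipulations can all be expressed.
        return self.audio_head(fused[:, 0]), self.video_head(fused[:, 1])

model = DualLabelDetector()
audio_logits, video_logits = model(audio=torch.randn(2, 128))  # video missing
```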
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for
grounding a variety of entities, such as object instances, agents, and regions,
with free-form text-based queries. Unlike conventional semantic-based object
localization approaches, our system facilitates context-aware entity
localization, allowing for queries such as "pick up a cup on a kitchen table"
or "navigate to a sofa on which someone is sitting". In contrast to existing
research on 3D scene graphs, OVSG supports free-form text input and
open-vocabulary querying. Through a series of comparative experiments using the
ScanNet dataset and a self-collected dataset, we demonstrate that our proposed
approach significantly surpasses the performance of previous semantic-based
localization techniques. Moreover, we highlight the practical application of
OVSG in real-world robot navigation and manipulation experiments.
Comment: The code and dataset used for evaluation can be found at
https://github.com/changhaonan/OVSG. This paper has been accepted by CoRL 2023.
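To illustrate what context-aware grounding over a scene graph means, here is a toy sketch; OVSG matches free-form queries against the graph with open-vocabulary embeddings, whereas this stand-in uses simple substring matching over a hypothetical graph schema.

```python
# A toy sketch of context-aware grounding over a scene graph: an entity is
# selected not by its label alone but by its relation to a context entity.
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    label: str                                     # e.g., "cup", "sofa"
    relations: list = field(default_factory=list)  # (predicate, target_id)

def ground(nodes, target, predicate, context):
    """Return nodes matching `target` that bear `predicate` to a `context` node."""
    by_id = {n.id: n for n in nodes}
    return [
        n for n in nodes
        if target in n.label
        and any(p == predicate and context in by_id[t].label
                for p, t in n.relations)
    ]

scene = [
    Node(0, "kitchen table"),
    Node(1, "cup", relations=[("on", 0)]),
    Node(2, "cup", relations=[("on", 3)]),
    Node(3, "desk"),
]
# Grounds "a cup on a kitchen table" to node 1 only, not the cup on the desk.
print(ground(scene, "cup", "on", "kitchen table"))
```

The context relation is what plain semantic localization lacks: a label-only matcher would return both cups, while the query clearly refers to just one.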