REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums
How can we extract useful information from a security forum? We focus on
identifying threads of interest to a security professional: (a) alerts of
worrisome events, such as attacks, (b) offering of malicious services and
products, (c) hacking information to perform malicious acts, and (d) useful
security-related experiences. The analysis of security forums is in its infancy
despite several promising recent works. Novel approaches are needed to address
the challenges in this domain: (a) the difficulty in specifying the "topics" of
interest efficiently, and (b) the unstructured and informal nature of the text.
We propose REST, a systematic methodology to (a) identify threads of interest
based on a possibly incomplete bag of words, and (b) classify them into one
of the four classes above. The key novelty of the work is a multi-step weighted
embedding approach: we project words, threads and classes into appropriate
embedding spaces and establish relevance and similarity there. We evaluate our
method with real data from three security forums with a total of 164K posts and
21K threads. First, REST is robust to the initial keyword selection: it can
extend the user-provided keyword set and thus recover from missing keywords.
Second, REST categorizes the threads into the classes of interest with higher
accuracy than five other methods, achieving an accuracy between 63.3% and
76.9%. We see our approach as a first step toward harnessing the wealth of
information in online forums in a user-friendly way, since the user can loosely
specify her keywords of interest.
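The abstract does not spell out REST's weighting scheme or embedding model, so the sketch below only illustrates the general idea of placing words, threads and classes in one embedding space and comparing them there. Everything in it is hypothetical: the toy vectors in `EMB`, the helpers `embed`, `expand_keywords`, `is_relevant` and `classify`, the nearest-neighbour expansion rule and the 0.8 relevance threshold are assumptions for illustration, not the authors' method.

```python
import math

# Toy 3-d word vectors invented for illustration. A real system would learn
# embeddings from forum text; these values are NOT from REST.
EMB = {
    "ddos":    [0.9, 0.1, 0.0],
    "attack":  [0.8, 0.2, 0.1],
    "botnet":  [0.85, 0.15, 0.05],
    "selling": [0.1, 0.9, 0.1],
    "price":   [0.15, 0.85, 0.0],
    "hello":   [0.0, 0.1, 0.9],
}
DIM = len(next(iter(EMB.values())))


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def embed(words, weights=None):
    """Weighted mean of the vectors of known words (unknown words are skipped).

    `weights` is a hypothetical map word -> positive weight; default 1.0.
    """
    known = [w for w in words if w in EMB]
    if not known:
        return [0.0] * DIM
    ws = [(weights or {}).get(w, 1.0) for w in known]
    total = sum(ws)
    return [sum(wt * EMB[w][i] for w, wt in zip(known, ws)) / total
            for i in range(DIM)]


def expand_keywords(keywords, k=1):
    """Add each keyword's k nearest neighbours in the embedding space."""
    expanded = set(keywords)
    for kw in keywords:
        if kw not in EMB:
            continue
        others = [w for w in EMB if w not in keywords]
        others.sort(key=lambda w: cosine(EMB[kw], EMB[w]), reverse=True)
        expanded.update(others[:k])
    return expanded


def is_relevant(thread, keywords, threshold=0.8):
    """A thread is relevant if it lies close to the keyword-set embedding."""
    return cosine(embed(thread), embed(keywords)) >= threshold


def classify(thread, class_seeds):
    """Pick the class whose seed-word embedding is most similar to the thread."""
    t = embed(thread)
    return max(class_seeds, key=lambda c: cosine(t, embed(class_seeds[c])))


if __name__ == "__main__":
    kws = expand_keywords({"ddos"})
    print(sorted(kws))  # ['botnet', 'ddos']
    print(is_relevant(["botnet", "attack"], kws))  # True
    print(is_relevant(["hello"], kws))  # False
    seeds = {"alert": ["ddos", "attack"], "service": ["selling", "price"]}
    print(classify(["selling", "botnet", "price"], seeds))  # service
```

Expanding the keyword set with embedding-space neighbours is one simple way a missing keyword (here "botnet") can be recovered from a related one the user did supply.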
Deep Memory Networks for Attitude Identification
We consider the task of identifying attitudes towards a given set of entities
from text. Conventionally, this task is decomposed into two separate subtasks:
target detection that identifies whether each entity is mentioned in the text,
either explicitly or implicitly, and polarity classification that classifies
the exact sentiment towards an identified entity (the target) into positive,
negative, or neutral.
Instead, we show that attitude identification can be solved with an
end-to-end machine learning architecture, in which the two subtasks are
interleaved by a deep memory network. In this way, signals produced in target
detection provide clues for polarity classification and, conversely, the
predicted polarity provides feedback to the identification of targets.
Moreover, the treatments of the different targets also influence each other:
the learned representations may share the same semantics for some targets but
vary for others. The proposed deep memory network, AttNet, outperforms
methods that do not consider the interactions between the subtasks or those
among the targets, including conventional machine learning methods and
state-of-the-art deep learning models.
Comment: Accepted to WSDM'1
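The abstract gives no equations for AttNet, so the sketch below shows only a generic multi-hop memory-network read in which a target-detection head and a polarity head share the same attended representation of one target. It does not reproduce AttNet's actual interleaving of the two subtasks. The functions `memory_hops` and `predict`, the additive query update, and all vectors and weights are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def memory_hops(query, memory, hops=2):
    """Generic multi-hop memory read.

    Each hop attends over the memory slots (one vector per word) with the
    current query, then adds the attention-weighted read vector to the query.
    """
    q = list(query)
    attn = []
    for _ in range(hops):
        attn = softmax([dot(q, m) for m in memory])
        read = [sum(p * m[i] for p, m in zip(attn, memory)) for i in range(len(q))]
        q = [qi + ri for qi, ri in zip(q, read)]
    return q, attn


def predict(target_vec, memory, w_target, w_polarity,
            labels=("positive", "negative", "neutral")):
    """Both heads read the same memory-derived state for one target entity."""
    state, attn = memory_hops(target_vec, memory)
    p_mentioned = 1.0 / (1.0 + math.exp(-dot(w_target, state)))  # target detection
    probs = softmax([dot(w, state) for w in w_polarity])           # polarity
    label = labels[max(range(len(labels)), key=lambda i: probs[i])]
    return p_mentioned, label, attn


if __name__ == "__main__":
    memory = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]   # toy word vectors
    target = [1.0, 0.0]                              # toy target embedding
    w_t = [1.0, 1.0]                                 # toy detection weights
    w_p = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.0]]      # one weight row per label
    print(predict(target, memory, w_t, w_p))
```

Sharing one state between the heads is the simplest way to let evidence gathered for one subtask shape the other; a model that truly interleaves them would also feed each head's output back into the next hop.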
CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval
Text-based Person Retrieval (TPR) aims to retrieve images of a target person
given a textual query. The primary challenge lies in bridging the substantial
gap between the vision and language modalities, especially when large-scale
training data are limited. In this paper, we introduce a CLIP-based
Synergistic Knowledge Transfer (CSKT) approach for TPR. Specifically, to
explore CLIP's knowledge on the input side, we first propose a Bidirectional
Prompts Transferring (BPT) module built from text-to-image and
image-to-text bidirectional prompts and coupling projections. Second, Dual
Adapters Transferring (DAT) is designed to transfer knowledge on the output side
of Multi-Head Attention (MHA) in both vision and language. This synergistic
two-way collaborative mechanism promotes early-stage feature fusion and efficiently
exploits the existing knowledge of CLIP. CSKT outperforms state-of-the-art
approaches on three benchmark datasets while training only 7.4% of the
model's parameters, demonstrating its efficiency, effectiveness and
generalization.
Comment: ICASSP 2024 (accepted). Minor typo revisions compared to version 1 in
arxi
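The abstract reports that CSKT trains only 7.4% of the model's parameters but does not give DAT's internal form. A common way to keep the trainable share that small is to freeze the pretrained backbone and train small residual adapters on the outputs of its blocks. The sketch below shows a generic bottleneck adapter of that kind; whether DAT uses this exact form is an assumption, and the class name, sizes, seed and the frozen-parameter count of 968 are hypothetical.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class BottleneckAdapter:
    """Residual bottleneck adapter: h + W_up @ relu(W_down @ h).

    Applied to the (frozen) output of a block such as multi-head attention;
    only W_down and W_up would be trained.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Zero-initialised up-projection: the adapter starts as the identity map,
        # so inserting it does not perturb the pretrained model at step 0.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h):
        z = [max(0.0, v) for v in matvec(self.down, h)]  # ReLU bottleneck
        delta = matvec(self.up, z)
        return [hi + di for hi, di in zip(h, delta)]

    def num_params(self):
        return sum(map(len, self.down)) + sum(map(len, self.up))


def trainable_fraction(trainable, frozen):
    """Share of all parameters that are updated during fine-tuning."""
    return trainable / (trainable + frozen)


if __name__ == "__main__":
    adapter = BottleneckAdapter(dim=8, bottleneck=2)
    h = [0.5] * 8
    print(adapter(h) == h)           # True at initialisation
    print(adapter.num_params())      # 2 * 8 * 2 = 32
    print(trainable_fraction(adapter.num_params(), frozen=968))  # 0.032
```

The adapter adds 2 × dim × bottleneck weights per insertion point, so the trainable share stays small as long as the bottleneck is narrow relative to the frozen backbone.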
- …