36,806 research outputs found
REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums
How can we extract useful information from a security forum? We focus on
identifying threads of interest to a security professional: (a) alerts of
worrisome events, such as attacks, (b) offering of malicious services and
products, (c) hacking information to perform malicious acts, and (d) useful
security-related experiences. The analysis of security forums is in its infancy
despite several promising recent works. Novel approaches are needed to address
the challenges in this domain: (a) the difficulty in specifying the "topics" of
interest efficiently, and (b) the unstructured and informal nature of the text.
We propose, REST, a systematic methodology to: (a) identify threads of interest
based on a, possibly incomplete, bag of words, and (b) classify them into one
of the four classes above. The key novelty of the work is a multi-step weighted
embedding approach: we project words, threads and classes in appropriate
embedding spaces and establish relevance and similarity there. We evaluate our
method with real data from three security forums with a total of 164k posts and
21K threads. First, REST robustness to initial keyword selection can extend the
user-provided keyword set and thus, it can recover from missing keywords.
Second, REST categorizes the threads into the classes of interest with superior
accuracy compared to five other methods: REST exhibits an accuracy between
63.3-76.9%. We see our approach as a first step for harnessing the wealth of
information of online forums in a user-friendly way, since the user can loosely
specify her keywords of interest
Recommended from our members
REST: A thread embedding approach for identifying and classifying user-specified information in security forums
Finding Relevant Answers in Software Forums
AbstractāOnline software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use traditional information retrieval approach to extract webpages containin
- ā¦