Answer Selection Using Word Alignment Based on Part of Speech Tagging in Community Question Answering
Community Question Answering (CQA) is a type of online forum in which users can ask questions and other users can reply to or answer them. Because CQA places no restrictions on how questions or answers are posted, some comments are not relevant to the question. To address this, previous research combined lexical and semantic features. However, that approach is better suited to similarity tasks than to question answering, and it can be improved in several respects. First, its vector representation counts only exactly matched words, so it does not capture related words across the two texts being paired. Second, noun overlap alone cannot establish that two paired words are similar; the paired words must also be shown to have the same or a related meaning under their POS tags. In this study, we employ an unsupervised lexical and semantic similarity method that differs from the previous method in how it handles verbatim and contextual similarity. The data come from the SemEval 2017 competition, which focuses on the Question-Answer Similarity task. In the experiments, the Mean Average Precision (MAP) score shows a significant improvement from 0.6742 to 0.6845, a gain of 0.0103 (1.03 percentage points) over previous research in CQA. The improvement comes from lexical similarity that draws on verb patterns as well as noun patterns. Furthermore, semantic similarity plays an important role in determining which words share the same pattern and meaning, and thus in establishing the relevance between them.
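For readers unfamiliar with the metric quoted in this abstract, the following is a minimal sketch of how Mean Average Precision is commonly computed over ranked answer lists. The function names and the toy relevance lists are assumptions made for illustration; the official SemEval scorer may differ in details such as ranking cutoffs, so this is not a reproduction of the paper's evaluation.

```python
from typing import List


def average_precision(relevance: List[int]) -> float:
    """Average precision for one ranked list.

    relevance[i] is 1 if the answer at rank i + 1 is relevant, else 0.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[List[int]]) -> float:
    """Mean of the per-question average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical relevance judgments for two questions (not from the paper).
toy_rankings = [
    [1, 0, 1, 0],  # AP = (1/1 + 2/3) / 2
    [0, 1, 0, 0],  # AP = 1/2
]
map_score = mean_average_precision(toy_rankings)

# Absolute difference between the two MAP scores quoted in the abstract.
absolute_gain = round(0.6845 - 0.6742, 4)
```

On the quoted figures, the absolute difference is 0.0103, which corresponds to 1.03 points on a 0 to 100 scale.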
REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums
How can we extract useful information from a security forum? We focus on
identifying threads of interest to a security professional: (a) alerts of
worrisome events, such as attacks, (b) offering of malicious services and
products, (c) hacking information to perform malicious acts, and (d) useful
security-related experiences. The analysis of security forums is in its infancy
despite several promising recent works. Novel approaches are needed to address
the challenges in this domain: (a) the difficulty in specifying the "topics" of
interest efficiently, and (b) the unstructured and informal nature of the text.
We propose REST, a systematic methodology to: (a) identify threads of interest
based on a, possibly incomplete, bag of words, and (b) classify them into one
of the four classes above. The key novelty of the work is a multi-step weighted
embedding approach: we project words, threads and classes in appropriate
embedding spaces and establish relevance and similarity there. We evaluate our
method with real data from three security forums with a total of 164K posts and
21K threads. First, REST is robust to the initial keyword selection: it can
extend the user-provided keyword set and thus recover from missing keywords.
Second, REST categorizes the threads into the classes of interest with superior
accuracy compared to five other methods: REST exhibits an accuracy between
63.3% and 76.9%. We see our approach as a first step for harnessing the wealth of
information of online forums in a user-friendly way, since the user can loosely
specify her keywords of interest.
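The idea of extending an incomplete keyword set through an embedding space can be illustrated with a small sketch. This is not the REST algorithm itself, whose multi-step weighting is not described in the abstract; it only shows one plausible way to add words whose vectors lie close to a seed keyword. The function names, the similarity threshold, and the three-dimensional toy vectors are all assumptions.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_keywords(
    seed: List[str],
    embeddings: Dict[str, List[float]],
    threshold: float = 0.9,
) -> List[str]:
    """Add every vocabulary word that is close to at least one seed word."""
    known_seeds = [s for s in seed if s in embeddings]
    expanded = set(seed)
    for word, vec in embeddings.items():
        if word in expanded:
            continue
        if any(cosine(vec, embeddings[s]) >= threshold for s in known_seeds):
            expanded.add(word)
    return sorted(expanded)


# Hand-made toy vectors (hypothetical, not trained on forum data).
toy_embeddings = {
    "ddos": [0.9, 0.1, 0.0],
    "botnet": [0.85, 0.2, 0.05],
    "attack": [0.8, 0.3, 0.1],
    "recipe": [0.0, 0.1, 0.95],
}
expanded = expand_keywords(["ddos"], toy_embeddings, threshold=0.9)
```

With these toy vectors, a user who supplies only "ddos" would also pick up "botnet" and "attack", while the unrelated "recipe" stays out. In practice the vectors would come from a model trained on the forum text, and the threshold would need tuning.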
Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval
Semantic similarity based retrieval is playing an increasingly important role
in many IR systems such as modern web search, question-answering, similar
document retrieval etc. Improvements in retrieval of semantically similar
content are very significant to applications like Quora, Stack Overflow, Siri
etc. We propose a novel unsupervised model for semantic similarity based
content retrieval, where we construct semantic flow graphs for each query, and
introduce the concept of "soft seeding" in graph based semi-supervised learning
(SSL) to convert this into an unsupervised model.
We demonstrate the effectiveness of our model on an equivalent question
retrieval problem on the Stack Exchange QA dataset, where our unsupervised
approach significantly outperforms the state-of-the-art unsupervised models,
and produces comparable results to the best supervised models. Our research
provides a method to tackle semantic similarity based retrieval without any
training data, and allows seamless extension to different domain QA
communities, as well as to other semantic equivalence tasks.
Comment: Published in Proceedings of the 2017 ACM Conference on Information
and Knowledge Management (CIKM '17).
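The abstract does not spell out how "soft seeding" works, so the following is only a generic sketch of graph-based score propagation in which seed values are real numbers rather than hard labels. It should not be read as the paper's model. The update rule (a weighted mix of neighbor scores and the original seed score), the `alpha` parameter, and the three-node toy graph are assumptions.

```python
from typing import List


def propagate(
    weights: List[List[float]],
    seeds: List[float],
    alpha: float = 0.5,
    iterations: int = 50,
) -> List[float]:
    """Iterate scores = alpha * (row-normalized W) @ scores + (1 - alpha) * seeds."""
    n = len(seeds)
    # Row-normalize the edge weights so each node averages over its neighbors.
    norm = []
    for row in weights:
        total = sum(row)
        norm.append([w / total if total else 0.0 for w in row])
    scores = list(seeds)
    for _ in range(iterations):
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * seeds[i]
            for i in range(n)
        ]
    return scores


# Toy chain graph: node 0 (query) - node 1 - node 2.
toy_weights = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
# Soft seed values in [0, 1] instead of hard labels (hypothetical numbers).
toy_seeds = [1.0, 0.2, 0.0]
scores = propagate(toy_weights, toy_seeds)
```

In this toy setup the scores decay with distance from the query node, so candidates could be ranked by their final score without any labeled training pairs.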
Talking to the crowd: What do people react to in online discussions?
This paper addresses the question of how language use affects community
reaction to comments in online discussion forums, and the relative importance
of the message vs. the messenger. A new comment ranking task is proposed based
on community annotated karma in Reddit discussions, which controls for topic
and timing of comments. Experimental work with discussion threads from six
subreddits shows that the importance of different types of language features
varies with the community of interest.