
    ANSWER SELECTION USING WORD ALIGNMENT BASED ON PART OF SPEECH TAGGING IN COMMUNITY QUESTION ANSWERING

    Community Question Answering (CQA) is a type of online forum in which users can ask questions and other users can reply to or answer them. Because CQA places no restrictions on how questions or answers are posted, some comments are not relevant to the question asked. To address this, previous research combined lexical and semantic features. However, that approach is better suited to similarity tasks than to question answering, and it leaves several points that can be improved. First, the vector representation counts only exactly matched words, so it does not cover other words that are related across a pair of words. Second, noun overlap as a similarity measure cannot by itself establish that two paired words are similar; it must also be established that words paired by POS tag share the same meaning or are related. This study applies an unsupervised lexical and semantic similarity method that differs from the previous method in how it handles verbatim and contextual similarities. The data come from the SemEval 2017 competition, which focuses on the Question-Answer Similarity task. In the experiments, the Mean Average Precision (MAP) score improves from 0.6742 to 0.6845, a gain of 1.03 percentage points over previous research in CQA. This improvement comes from lexical similarity, which draws not only on noun patterns but also on verb patterns. Furthermore, semantic similarity plays an important role in determining which words share the same pattern and meaning, and therefore in defining the relevance between them.
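The abstract above reports its result as Mean Average Precision. As a minimal sketch of how that metric is commonly computed for ranked answer lists, the following assumes every relevant answer appears somewhere in the ranked list for its question. The function names are illustrative, and this is not the official SemEval scorer.

```python
from typing import List, Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one ranked list.

    relevance[i] is True when the answer at rank i + 1 is relevant.
    Assumes all relevant answers are present in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[Sequence[bool]]) -> float:
    """Mean of the per-question average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Two hypothetical questions with ranked candidate answers.
    example = [[True, False, True], [False, True]]
    print(round(mean_average_precision(example), 4))
```

Because the score averages precision only at the ranks where a relevant answer appears, moving a relevant answer higher in any list raises the score.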

    REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums

    How can we extract useful information from a security forum? We focus on identifying threads of interest to a security professional: (a) alerts of worrisome events, such as attacks, (b) offerings of malicious services and products, (c) hacking information to perform malicious acts, and (d) useful security-related experiences. The analysis of security forums is in its infancy despite several promising recent works. Novel approaches are needed to address the challenges in this domain: (a) the difficulty in specifying the "topics" of interest efficiently, and (b) the unstructured and informal nature of the text. We propose REST, a systematic methodology to: (a) identify threads of interest based on a, possibly incomplete, bag of words, and (b) classify them into one of the four classes above. The key novelty of the work is a multi-step weighted embedding approach: we project words, threads and classes in appropriate embedding spaces and establish relevance and similarity there. We evaluate our method with real data from three security forums with a total of 164K posts and 21K threads. First, REST is robust to the initial keyword selection: it can extend the user-provided keyword set and thus recover from missing keywords. Second, REST categorizes the threads into the classes of interest with superior accuracy compared to five other methods: REST exhibits an accuracy between 63.3% and 76.9%. We see our approach as a first step for harnessing the wealth of information of online forums in a user-friendly way, since the user can loosely specify her keywords of interest.
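The abstract above describes projecting words, threads and classes into embedding spaces and comparing them there, but it does not give the details. The following is only a generic sketch of that idea, not the REST method itself: a thread is represented as a weighted average of word vectors and assigned to the class whose keyword vector is closest by cosine similarity. The toy vectors, the weights, the class labels and all function names are assumptions made for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def weighted_embedding(words: List[str],
                       vectors: Dict[str, Vector],
                       weights: Dict[str, float]) -> Vector:
    """Weighted average of the vectors of the known words (weight defaults to 1.0)."""
    if not vectors:
        return []
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in words:
        if word not in vectors:
            continue  # Skip out-of-vocabulary words.
        w = weights.get(word, 1.0)
        for i, x in enumerate(vectors[word]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(thread_words: List[str],
             class_keywords: Dict[str, List[str]],
             vectors: Dict[str, Vector],
             weights: Dict[str, float]) -> str:
    """Return the class label whose keyword embedding is most similar to the thread."""
    thread_vec = weighted_embedding(thread_words, vectors, weights)
    scores = {
        label: cosine(thread_vec, weighted_embedding(keywords, vectors, weights))
        for label, keywords in class_keywords.items()
    }
    return max(scores, key=scores.get)


# Hypothetical 2-dimensional word vectors and down-weighting of a common word.
TOY_VECTORS = {"ddos": [1.0, 0.0], "attack": [0.9, 0.1],
               "sell": [0.0, 1.0], "botnet": [0.2, 0.8], "the": [0.5, 0.5]}
TOY_WEIGHTS = {"the": 0.1}
TOY_CLASSES = {"alert": ["attack"], "service": ["sell"]}

if __name__ == "__main__":
    print(classify(["the", "ddos", "attack"], TOY_CLASSES, TOY_VECTORS, TOY_WEIGHTS))
    print(classify(["sell", "botnet"], TOY_CLASSES, TOY_VECTORS, TOY_WEIGHTS))
```

In practice the word vectors would come from a model trained on forum text, and the weights would reflect how informative each word is; both are left as plain dictionaries here to keep the sketch self-contained.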

    Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval

    Semantic similarity based retrieval is playing an increasingly important role in many IR systems such as modern web search, question answering, and similar document retrieval. Improvements in retrieval of semantically similar content are very significant to applications such as Quora, Stack Overflow, and Siri. We propose a novel unsupervised model for semantic similarity based content retrieval, where we construct semantic flow graphs for each query, and introduce the concept of "soft seeding" in graph based semi-supervised learning (SSL) to convert this into an unsupervised model. We demonstrate the effectiveness of our model on an equivalent question retrieval problem on the Stack Exchange QA dataset, where our unsupervised approach significantly outperforms the state-of-the-art unsupervised models, and produces comparable results to the best supervised models. Our research provides a method to tackle semantic similarity based retrieval without any training data, and allows seamless extension to different domain QA communities, as well as to other semantic equivalence tasks. Comment: Published in Proceedings of the 2017 ACM Conference on Information and Knowledge Management (CIKM '17).

    Talking to the crowd: What do people react to in online discussions?

    This paper addresses the question of how language use affects community reaction to comments in online discussion forums, and the relative importance of the message vs. the messenger. A new comment ranking task is proposed based on community-annotated karma in Reddit discussions, which controls for topic and timing of comments. Experimental work with discussion threads from six subreddits shows that the importance of different types of language features varies with the community of interest.