12 research outputs found

    A Large-Scale Community Questions Classification Accounting for Category Similarity: An Exploratory Study

    The paper reports on a large-scale topical categorization of questions from the Russian community question answering (CQA) service [email protected]. We used a data set containing all the questions (more than 11 million) asked by the service's users in 2012. This is the first study on question categorization dealing with non-English data of this size. The study focuses on adjusting the category structure in order to get more robust classification results. We investigate several approaches to measuring similarity between categories: the share of identical questions, language models, and user activity. The results show that the proposed approach is promising. (Supported by the Russian Foundation for Basic Research, RFBR grant 14-07-00589.)
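
    The abstract does not give exact formulas, but two of the three similarity signals can be illustrated directly. Below is a minimal Python sketch, assuming each category is a list of raw question strings: overlap of identical questions, plus a symmetric KL divergence between smoothed unigram language models. The normalization choices are assumptions, not the paper's.

```python
from collections import Counter
from math import log

def shared_question_similarity(questions_a, questions_b):
    """Share of identical questions between two categories,
    normalized by the smaller category (an assumed normalization)."""
    set_a, set_b = set(questions_a), set(questions_b)
    return len(set_a & set_b) / min(len(set_a), len(set_b))

def lm_divergence(questions_a, questions_b):
    """Symmetric KL divergence between add-one-smoothed unigram
    language models of two categories (lower = more similar)."""
    counts_a = Counter(w for q in questions_a for w in q.lower().split())
    counts_b = Counter(w for q in questions_b for w in q.lower().split())
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + len(vocab)
    total_b = sum(counts_b.values()) + len(vocab)
    div = 0.0
    for w in vocab:
        p = (counts_a[w] + 1) / total_a
        q = (counts_b[w] + 1) / total_b
        div += p * log(p / q) + q * log(q / p)
    return div

# Toy categories with one identical question.
cars = ["how do i change a tire", "what oil should i buy"]
autos = ["how do i change a tire", "which motor oil is best"]
print(shared_question_similarity(cars, autos), lm_divergence(cars, autos))
```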

    What Users Ask a Search Engine: Analyzing One Billion Russian Question Queries

    We analyze the question queries submitted to a large commercial web search engine to get insights into what people ask, and to better tailor the search results to users’ needs. Based on a dataset of about one billion question queries submitted during the year 2012, we investigate askers’ querying behavior with the support of automatic query categorization. While the importance of question queries is likely to increase, at present they make up only 3–4% of the total search traffic. Since questions are such a small part of the query stream and are more likely to be unique than shorter queries, clickthrough information is typically rather sparse. Thus, query categorization methods based on the categories of clicked web documents do not work well for questions. As an alternative, we propose a robust question query classification method that uses the labeled questions from a large community question answering (CQA) platform as a training set. The resulting classifier is then transferred to the web search questions. Even though questions on CQA platforms tend to be different from web search questions, our categorization method proves competitive with strong baselines with respect to classification accuracy. To show the scalability of our proposed method, we apply the classifiers to about one billion question queries and discuss the trade-offs between performance and accuracy that different classification models offer. Our findings reveal what people ask a search engine and also how this contrasts with behavior on a CQA platform.
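
    A minimal sketch of the transfer setup the abstract describes: train a classifier on labeled CQA questions, then apply it to web search question queries. The model choice and toy data below are assumptions; the paper compares several models and weighs throughput against accuracy at billion-query scale, which is why a sparse linear model is a natural stand-in here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: (question, category) pairs from a CQA platform.
cqa_questions = [
    "how do i fix a flat bicycle tire",
    "what is the capital of france",
    "why does my laptop overheat",
    "who wrote war and peace",
]
cqa_labels = ["auto & diy", "education", "computers", "arts"]

# A linear model over word n-grams keeps per-query scoring cheap enough
# to label ~10^9 queries; heavier models trade throughput for accuracy.
classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
classifier.fit(cqa_questions, cqa_labels)

# Transfer step: apply the CQA-trained model to web search question queries.
web_queries = ["how to replace a bike tube", "who painted the mona lisa"]
print(classifier.predict(web_queries))
```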

    Automatic Identification of Ineffective Online Student Questions in Computing Education

    This Research Full Paper explores automatic identification of ineffective learning questions in the context of large-scale computer science classes. The immediate and accurate identification of ineffective learning questions opens the door to possible automated facilitation on a large scale, such as alerting learners to revise questions and providing adaptive question revision suggestions. To achieve this, 983 questions were collected from a question & answer platform used by an introductory programming course over three semesters at a large research university in the Southeastern United States. Questions were first manually classified into three hierarchical categories: 1) learning-irrelevant questions, 2) effective learning-relevant questions, and 3) ineffective learning-relevant questions. The inter-rater reliability of the manual classification (Cohen's Kappa) was .88. Four different machine learning algorithms were then used to automatically classify the questions: Multinomial Naive Bayes, Logistic Regression, Support Vector Machines, and Boosted Decision Trees. Both flat and single-path strategies were explored, and the most effective algorithms under both strategies were identified and discussed. This study contributes to the automatic determination of learning question quality in computer science, and provides evidence for the feasibility of automated facilitation of online question & answer in large-scale computer science classes.
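
    The flat versus single-path distinction can be made concrete with a small sketch. In the flat strategy, one classifier predicts over all three leaf categories at once; in the single-path strategy, one classifier decides relevance first and a second decides effectiveness only for relevant questions. The toy data and model below are illustrative assumptions, not the paper's setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def train(texts, labels):
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    return model

# Hypothetical toy questions mirroring the three categories.
questions = [
    "what time is the exam on friday",
    "why does my loop never terminate when n is zero",
    "my code is broken please fix it",
]
relevance = ["irrelevant", "relevant", "relevant"]
effectiveness = ["effective", "ineffective"]  # labels for the two relevant ones

# The flat strategy would instead train one classifier over all three
# leaf labels; the single-path strategy chains two classifiers.
level1 = train(questions, relevance)
level2 = train(
    [q for q, r in zip(questions, relevance) if r == "relevant"],
    effectiveness,
)

def classify_single_path(question):
    """Single-path strategy: decide relevance first, then effectiveness."""
    if level1.predict([question])[0] == "irrelevant":
        return "learning-irrelevant"
    return level2.predict([question])[0] + " learning-relevant"

print(classify_single_path("my program crashes and i do not know why"))
```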

    Hierarchical Text Classification with Reinforced Label Assignment

    While existing hierarchical text classification (HTC) methods attempt to capture label hierarchies for model training, they either make local decisions regarding each label or completely ignore the hierarchy information during inference. To solve the mismatch between training and inference, as well as to model label dependencies in a more principled way, we formulate HTC as a Markov decision process and propose to learn a Label Assignment Policy via deep reinforcement learning to determine where to place an object and when to stop the assignment process. The proposed method, HiLAP, explores the hierarchy during both training and inference in a consistent manner and makes inter-dependent decisions. As a general framework, HiLAP can incorporate different neural encoders as base models for end-to-end training. Experiments on five public datasets and four base models show that HiLAP yields an average improvement of 33.4% in Macro-F1 over flat classifiers and outperforms state-of-the-art HTC methods by a large margin. Data and code can be found at https://github.com/morningmoni/HiLAP. (EMNLP 2019)
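
    The core idea, reduced to a toy Python sketch: treat label assignment as a sequence of actions over the hierarchy, descending from the root and choosing at each step whether to place the document under a child label or stop. The hierarchy and scoring function below are stand-ins; in HiLAP the action scores come from a learned policy network trained with reinforcement learning (see the linked repository).

```python
import random

# Hypothetical toy label hierarchy; HiLAP operates on real taxonomies
# such as RCV1's topic tree.
hierarchy = {
    "ROOT": ["science", "sports"],
    "science": ["physics", "biology"],
    "physics": [], "biology": [], "sports": [],
}

def toy_score(doc, action):
    """Stand-in for the learned policy's action score."""
    random.seed(hash((doc, action)) % 2**32)
    return random.random()

def assign_labels(doc, score, max_labels=3):
    """Greedy roll-out of a label assignment policy: at each visited
    node, either descend to the best-scoring child or take STOP."""
    assigned, frontier = [], ["ROOT"]
    while frontier and len(assigned) < max_labels:
        node = frontier.pop()
        best = max(hierarchy[node] + ["STOP"], key=lambda a: score(doc, a))
        if best != "STOP":
            assigned.append(best)
            frontier.append(best)
    return assigned

print(assign_labels("a paper about quantum entanglement", toy_score))
```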

    So fast so good: An analysis of answer quality and answer speed in community question answering sites

    The authors investigate the interplay between answer quality and answer speed across question types in community question-answering sites (CQAs). The research questions addressed are the following: (a) How do answer quality and answer speed vary across question types? (b) How do the relationships between answer quality and answer speed vary across question types? (c) How do the best-quality answers and the fastest answers differ in terms of answer quality and answer speed across question types? (d) How do trends in answer quality vary over time across question types? From the posting of 3,000 questions in six CQAs, 5,356 answers were harvested and analyzed. There was a significant difference in answer quality and answer speed across question types, and there were generally no significant relationships between answer quality and answer speed. The best-quality answers had better overall answer quality than the fastest answers but generally took longer to arrive. In addition, although the trend in answer quality had been mostly random across all question types, the quality of answers appeared to improve gradually when given time. By highlighting the subtle nuances in answer quality and answer speed across question types, this study attempts to explore a territory of CQA research that has hitherto been relatively uncharted.
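
    The per-question-type quality/speed comparison at the heart of the study is easy to reproduce on one's own data. A minimal sketch with made-up numbers and a hand-rolled Pearson correlation follows; the paper's actual quality rubric and statistical tests are richer than this.

```python
from statistics import mean

# Hypothetical harvested rows: (question type, quality score, minutes to arrive).
answers = [
    ("factual", 4.2, 12), ("factual", 3.1, 3), ("factual", 3.9, 30),
    ("opinion", 3.8, 45), ("opinion", 4.5, 60), ("opinion", 2.9, 5),
]

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Relationship between quality and speed, computed per question type.
for qtype in ("factual", "opinion"):
    quality, speed = zip(*[(q, s) for t, q, s in answers if t == qtype])
    print(qtype, round(pearson(quality, speed), 3))
```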

    Question Answering System: A Review on Question Analysis, Document Processing, and Answer Extraction Techniques

    A question answering system automatically provides an answer to a question posed by a human in natural language. Such a system consists of question analysis, document processing, and answer extraction modules. The question analysis module translates the query into a form that can be processed by the document processing module. Document processing identifies candidate documents that contain answers relevant to the user query. The answer extraction module then receives the set of passages from the document processing module and determines the best answers to return to the user. The challenge in optimizing a question answering framework is to increase the performance of all of its modules; modules whose performance has not been optimized lead to less accurate answers from the system. Based on these issues, the objective of this study is to review the current state of question analysis, document processing, and answer extraction techniques. Results from this study reveal potential research issues, namely morphology analysis, question classification, and term weighting algorithms for question classification.
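
    The three-module structure reviewed here maps naturally onto a toy pipeline. The sketch below is illustrative only: it uses simple keyword overlap where real systems would apply the question classification, term weighting, and morphology analysis the review surveys.

```python
# A minimal sketch of the three-module QA pipeline described above.

def analyze_question(question):
    """Question analysis: derive keywords and a crude expected answer type."""
    tokens = question.lower().rstrip("?").split()
    answer_type = "person" if tokens[0] == "who" else "generic"
    return {"keywords": tokens, "answer_type": answer_type}

def process_documents(query, corpus):
    """Document processing: rank candidate documents by keyword overlap."""
    def score(doc):
        return sum(doc.lower().count(k) for k in query["keywords"])
    return sorted(corpus, key=score, reverse=True)[:3]

def extract_answer(query, passages):
    """Answer extraction: pick the best-matching sentence from candidates."""
    sentences = [s for p in passages for s in p.split(".") if s.strip()]
    return max(sentences,
               key=lambda s: sum(k in s.lower() for k in query["keywords"]))

corpus = ["Alan Turing proposed the Turing test in 1950. "
          "He worked at Bletchley Park."]
q = analyze_question("Who proposed the Turing test?")
print(extract_answer(q, process_documents(q, corpus)))
```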

    Facilitating Efficient Information Seeking in Social Media

    Online social media is popular due to its real-time nature, extensive connectivity, and a large user base. This motivates users to employ social media for seeking information by reaching out to their large number of social connections. Information seeking can manifest in the form of requests for personal and time-critical information or gathering perspectives on important issues. Social media platforms are not designed for resource seeking and experience large volumes of messages, leading to requests not being fulfilled satisfactorily. Designing frameworks to facilitate efficient information seeking in social media will help users obtain appropriate assistance for their needs and help platforms increase user satisfaction. Several challenges exist in the way of facilitating information seeking in social media. First, the characteristics affecting a user's response time for a question are not known, making it hard to identify prompt responders. Second, the social context in which the user has asked the question has to be determined to find personalized responders. Third, users employ rhetorical requests, which are statements having the syntax of questions, and systems assisting information seeking might be hindered from focusing on genuine questions. Fourth, social media advocates of political campaigns employ nuanced strategies to prevent users from obtaining balanced perspectives on issues of public importance. Sociological and linguistic studies on user behavior while making or responding to information-seeking requests provide concepts from which we can address these challenges. We propose methods to estimate the response time of a user for a given question to identify prompt responders. We compute the question-specific social context an asker shares with his social connections to identify personalized responders. We draw from theories of political mobilization to model the behaviors arising from the strategies of people trying to skew perspectives. We identify rhetorical questions by modeling user motivations to post them. (Doctoral dissertation, Electrical Engineering.)
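
    As one concrete example, the first of these methods (estimating a user's response time in order to rank prompt responders) can be framed as supervised regression. Everything below, features included, is a hypothetical illustration rather than the dissertation's actual model.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical features per (question, candidate responder) pair:
# [responder's posts per day, topical overlap with question, hour of day]
X = [[12.0, 0.8, 14], [0.5, 0.2, 3], [6.0, 0.5, 20], [2.0, 0.9, 9]]
y = [10, 300, 45, 25]  # observed minutes until the responder replied

model = GradientBoostingRegressor().fit(X, y)

# Rank candidate responders by predicted promptness for a new question.
candidates = {"alice": [8.0, 0.7, 14], "bob": [1.0, 0.3, 14]}
ranked = sorted(candidates, key=lambda u: model.predict([candidates[u]])[0])
print(ranked)  # fastest predicted responder first
```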