6 research outputs found

ExpFinder: An Ensemble Expert Finding Model Integrating $N$-gram Vector Space Model and $\mu$CO-HITS

Author: Aryani Amir; Du Hung; Forkan Abdur Rahim Mohammad; Jayaraman Prem Prakash; Kang Yong-Bin; Sellis Timos
Publication date: 17/01/2021

Finding an expert plays a crucial role in driving successful collaborations and speeding up high-quality research development and innovations. However, the rapid growth of scientific publications and digital expertise data makes identifying the right experts a challenging problem. Existing approaches for finding experts given a topic can be categorised into information retrieval techniques based on vector space models, document language models, and graph-based models. In this paper, we propose ExpFinder, a new ensemble model for expert finding that integrates a novel $N$-gram vector space model, denoted as $n$VSM, and a graph-based model, denoted as $\mu$CO-HITS, which is a proposed variation of the CO-HITS algorithm. The key idea of $n$VSM is to exploit a recent inverse document frequency weighting method for $N$-gram words, and ExpFinder incorporates $n$VSM into $\mu$CO-HITS to achieve expert finding. We comprehensively evaluate ExpFinder on four different datasets from academic domains in comparison with six different expert finding models. The evaluation results show that ExpFinder is a highly effective model for expert finding, substantially outperforming all the compared models by 19% to 160.2%.

Comment: 15 pages, 18 figures; for source code on GitHub, see https://github.com/Yongbinkang/ExpFinder; submitted to IEEE Transactions on Knowledge and Data Engineering
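To make the graph-based half of the abstract more concrete, here is a minimal sketch of a generic CO-HITS-style score propagation on an expert-document bipartite graph. It is not the paper's $\mu$CO-HITS variation, whose details the abstract does not give. The function names, the mixing parameter `lam`, the edge weights, and the idea of seeding document scores with topic relevance (for example from a vector space model) are all assumptions made for illustration.

```python
def normalise(scores):
    """Scale scores so they sum to 1 (all zeros stay zero)."""
    total = sum(scores.values())
    return {k: (v / total if total else 0.0) for k, v in scores.items()}


def propagate(edges, expert_prior, doc_prior, lam=0.5, iterations=20):
    """Generic CO-HITS-style iteration (illustrative, not the paper's variant).

    edges: {(expert, doc): weight} for each authorship link (hypothetical input).
    expert_prior / doc_prior: initial scores, e.g. topic relevance of documents.
    lam: how much weight goes to propagated scores versus the priors.
    """
    expert_prior = normalise(expert_prior)
    doc_prior = normalise(doc_prior)
    expert_score = dict(expert_prior)
    doc_score = dict(doc_prior)
    for _ in range(iterations):
        # Documents gather score from the experts linked to them.
        gathered = {d: 0.0 for d in doc_prior}
        for (e, d), w in edges.items():
            gathered[d] += w * expert_score[e]
        gathered = normalise(gathered)
        doc_score = {d: (1 - lam) * doc_prior[d] + lam * gathered[d] for d in doc_prior}
        # Experts gather score from the documents linked to them.
        gathered = {e: 0.0 for e in expert_prior}
        for (e, d), w in edges.items():
            gathered[e] += w * doc_score[d]
        gathered = normalise(gathered)
        expert_score = {e: (1 - lam) * expert_prior[e] + lam * gathered[e] for e in expert_prior}
    return expert_score, doc_score
```

In this sketch, an expert linked to documents with high topic relevance ends up with a higher score than an expert linked to less relevant documents, which is the intuition behind combining a text-based model with a graph-based one.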

arXiv.org e-Print Archive

Comparing retrieval effectiveness for questionnaires: QSMatching vs. vector model vs. fuzzy model

Author: Friedrich Dorneles Carina; Henrique de Souza Richard
Publication venue: 'Universidade Federal do Estado do Rio de Janeiro UNIRIO'
Publication date: 17/04/2019

Designing a useful questionnaire is an important task in descriptive research. Poorly worded questions can lead to answers whose interpretations are meaningless, subtle, or naive. It can therefore be worthwhile to reuse, partially or fully, questionnaires already created for the same purpose. In this work, we compare QSMatching with the vector and fuzzy models for computing the similarity between questionnaires and, consequently, for ranking questionnaires according to the user's query. To verify effectiveness, an experiment was carried out comparing the QSMatching, vector model, and fuzzy approaches. The analysis of the experimental results shows that QSMatching is more effective than the other models for questionnaire retrieval.
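As an illustration of the vector-model baseline mentioned in this abstract, the following minimal sketch ranks questionnaires by cosine similarity between raw term-frequency vectors. It does not implement QSMatching or the fuzzy model, and the whitespace tokenisation, function names, and sample data are assumptions chosen for brevity.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (naive whitespace tokenisation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query, questionnaires):
    """Return questionnaire names ordered from most to least similar to the query."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(text)), name) for name, text in questionnaires.items()]
    return [name for _, name in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

A call such as `rank("exercise habits each week", {"health": "...", "travel": "..."})` returns the names in ranked order; a production system would typically add stemming, stop-word removal, and IDF weighting.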

Universidade Federal do Estado do Rio de Janeiro: Portal de Revistas da UNIRIO

Analysis of community question‐answering issues via machine learning and deep learning: State‐of‐the‐art review

Author: Banerjee Snehasish; Gutub Adnan; Roy Pradeep Kumar; Saumya Sunil; Singh Jyoti Prakash
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 04/05/2022

Over the last couple of decades, community question-answering sites (CQAs) have been a topic of much academic interest. Scholars have often leveraged traditional machine learning (ML) and deep learning (DL) to explore the ever-growing volume of content that CQAs engender. To clarify the current state of the CQA literature that has used ML and DL, this paper reports a systematic literature review. The goal is to summarise and synthesise the major themes of CQA research related to (i) questions, (ii) answers and (iii) users. The final review included 133 articles. Dominant research themes include question quality, answer quality, and expert identification. In terms of datasets, the most widely studied platforms include Yahoo! Answers, Stack Exchange and Stack Overflow. The scope of most articles was confined to just one platform, with few cross-platform investigations. Articles using ML outnumber those using DL. Nonetheless, the use of DL in CQA research is on an upward trajectory. A number of research directions are proposed.

White Rose Research Online

Understanding and exploiting user intent in community question answering

Author: Chen Long

A number of Community Question Answering (CQA) services have emerged and proliferated in the last decade. Typical examples include Yahoo! Answers, WikiAnswers, and domain-specific forums such as StackOverflow. These services help users obtain information from a community: a user can post questions, which may then be answered by other users. Such a paradigm of information seeking is particularly appealing when the question cannot be answered directly by Web search engines due to the unavailability of relevant online content. However, questions submitted to a CQA service are often colloquial and ambiguous. An accurate understanding of the intent behind a question is important for satisfying the user's information need more effectively and efficiently. In this thesis, we analyse the intent of each question in CQA by classifying it along five dimensions, namely: subjectivity, locality, navigationality, procedurality, and causality. By making use of advanced machine learning techniques, such as Co-Training and PU-Learning, we are able to attain consistent and significant classification improvements over the state-of-the-art in this area. In addition to the textual features, a variety of metadata features (such as the category in which the question was posted) are used to model a user's intent, which in turn helps the CQA service to perform better in finding similar questions, identifying relevant answers, and recommending the most relevant answerers. We validate the usefulness of user intent in two different CQA tasks. Our first application is question retrieval, where we present a hybrid approach which blends several language modelling techniques, namely, the classic (query-likelihood) language model, the state-of-the-art translation-based language model, and our proposed intent-based language model. Our second application is answer validation, where we present a two-stage model which first ranks similar questions by using our proposed hybrid approach, and then validates whether the answer of the top candidate can serve as an answer to a new question by leveraging sentiment analysis, query quality assessment, and search lists validation.
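The question-retrieval approach described above blends several language models. The abstract does not say how they are combined, so the sketch below assumes a simple weighted mixture of term-probability functions and instantiates it with only two components: a query-likelihood model over each archived question and a collection model for smoothing. The translation-based and intent-based components are not implemented; in this sketch they would be further `(weight, prob_fn)` pairs. All names and the weight `doc_weight` are hypothetical.

```python
import math
from collections import Counter


def unigram_model(terms):
    """Maximum-likelihood unigram model: returns a function term -> probability."""
    counts = Counter(terms)
    total = len(terms)
    return lambda term: counts[term] / total if total else 0.0


def mixture_log_likelihood(query_terms, components):
    """Log-likelihood of the query under a weighted mixture of models.

    components: list of (weight, prob_fn); weights are assumed to sum to 1.
    """
    log_p = 0.0
    for term in query_terms:
        p = sum(weight * prob(term) for weight, prob in components)
        if p == 0.0:
            return float("-inf")  # no component can generate this term
        log_p += math.log(p)
    return log_p


def rank_questions(query, archive, doc_weight=0.8):
    """Rank archived questions by mixture likelihood of generating the query."""
    query_terms = query.lower().split()
    tokenised = {qid: text.lower().split() for qid, text in archive.items()}
    collection = unigram_model([t for terms in tokenised.values() for t in terms])
    scores = {
        qid: mixture_log_likelihood(
            query_terms,
            [(doc_weight, unigram_model(terms)), (1 - doc_weight, collection)],
        )
        for qid, terms in tokenised.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

With only these two components, the mixture reduces to query likelihood with Jelinek-Mercer smoothing. Passing the components as a list keeps the scoring function unchanged if additional models are added later.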