20 research outputs found

    Analysis of community question‐answering issues via machine learning and deep learning: State‐of‐the‐art review

    Over the last couple of decades, community question-answering sites (CQAs) have been a topic of much academic interest. Scholars have often leveraged traditional machine learning (ML) and deep learning (DL) to explore the ever-growing volume of content that CQAs engender. To clarify the current state of the CQA literature that has used ML and DL, this paper reports a systematic literature review. The goal is to summarise and synthesise the major themes of CQA research related to (i) questions, (ii) answers and (iii) users. The final review included 133 articles. Dominant research themes include question quality, answer quality, and expert identification. In terms of datasets, some of the most widely studied platforms include Yahoo! Answers, Stack Exchange and Stack Overflow. The scope of most articles was confined to just one platform, with few cross-platform investigations. Articles with ML outnumber those with DL. Nonetheless, the use of DL in CQA research is on an upward trajectory. A number of research directions are proposed.

    Hierarchical Expert Recommendation on Community Question Answering Platforms

    Community question answering (CQA) platforms, such as Stack Overflow, have become the primary source of answers to questions on a wide range of topics. CQA platforms offer an opportunity for sharing and acquiring knowledge at a low cost, where users, many of whom are experts in a specific topic, can potentially provide high-quality solutions to a given question. Many recommendation methods have been proposed to match questions to potentially good answerers. However, most existing methods have focused on modelling the user-question interaction (a user might answer multiple questions and a question might be answered by multiple users) using simple collaborative filtering approaches, overlooking the rich information in the question's title and body when modelling users' expertise. This project fills that research gap by thoroughly examining machine learning and deep learning approaches that can be applied to the expert recommendation problem. It proposes a Hierarchical Expert Recommendation (HER) model, a deep learning recommender system that recommends experts to answer a given question on a CQA platform. Although choosing a deep learning over a machine learning solution can be justified by the complexity of the available datasets, we assess the performance of each family of methods and evaluate the trade-offs between them to pick the best fit for our problem. We analyzed various machine learning algorithms to determine their performance on the expert recommendation problem, which narrows down the potential ways of tackling this problem with traditional recommendation methods. Furthermore, we investigate recommendation models based on matrix factorization to establish the baselines for our proposed model and shed light on the weaknesses and strengths of matrix-based solutions, which shape our final deep learning model.
In the last section, we introduce the Hierarchical Expert Recommendation System (HER), which utilizes hierarchical attention-based neural networks to better represent questions and ultimately model users' expertise through user-question interactions. We conducted extensive experiments on a large real-world Stack Overflow dataset and benchmarked HER against the state-of-the-art baselines. The results from our extensive experiments show that HER outperforms the state-of-the-art baselines in recommending experts to answer questions on Stack Overflow.
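The hierarchical attention idea described above can be sketched minimally: pool word vectors into sentence vectors, pool sentence vectors into a question embedding, then score users against it. The function names, the plain dot-product attention, and the mean-of-history user profile are illustrative assumptions, not the trained HER architecture:

```python
import numpy as np

def attention_pool(vectors, query):
    """Softmax-attention pooling of row vectors against a (learned) query vector."""
    scores = vectors @ query
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    return weights @ vectors

def embed_question(word_vecs, sentence_spans, word_query, sent_query):
    """Word-level attention inside each sentence, then sentence-level attention."""
    sent_vecs = np.array([attention_pool(word_vecs[s:e], word_query)
                          for s, e in sentence_spans])
    return attention_pool(sent_vecs, sent_query)

def recommend_expert(question_emb, user_history):
    """Score each user by similarity between the question embedding and the mean
    embedding of questions the user has previously answered."""
    return max(user_history,
               key=lambda u: question_emb @ np.mean(user_history[u], axis=0))
```

In the real model the query vectors and embeddings would be learned end-to-end from the user-question interaction data rather than fixed.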

    SOCIALQ&A: A NOVEL APPROACH TO NOTIFYING THE CORRECT USERS IN QUESTION AND ANSWERING SYSTEMS

    Question and Answering (Q&A) systems are currently in use by a large number of Internet users. Q&A systems play a vital role in our daily life as an important platform for information and knowledge sharing. Hence, much research has been devoted to improving the performance of Q&A systems, with a focus on improving the quality of answers provided by users, reducing the wait time for users who ask questions, using a knowledge base to provide answers via text mining, and directing questions to appropriate users. Due to the growing popularity of Q&A systems, the number of questions in the system can become very large; thus, it is unlikely for an answer provider to simply stumble upon a question that he/she can answer properly. The primary objective of this research is to improve the quality of answers and to decrease wait times by forwarding questions to users who exhibit an interest or expertise in the area to which the question belongs. To that end, this research studies how to leverage social networks to enhance the performance of Q&A systems. We have proposed SocialQ&A, a social network based Q&A system that identifies and notifies the users who are most likely to answer a question. SocialQ&A incorporates three major components: User Interest Analyzer, Question Categorizer, and Question-User Mapper. The User Interest Analyzer associates each user with a vector of interest categories. The Question Categorizer algorithm associates a vector of interest categories to each question. Then, based on user interest and user social connectedness, the Question-User Mapper identifies a list of potential answer providers for each question. We have also implemented a real-world prototype for SocialQ&A and analyzed the data from questions/answers obtained from the prototype. Results suggest that social networks can be leveraged to improve the quality of answers and reduce the wait time for answers.
Thus, this research provides a promising direction to improve the performance of Q&A systems.
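The three-component pipeline above can be sketched as a simple scoring rule over interest vectors. The `alpha` weighting, the cosine interest match, and the precomputed `closeness` scores are assumptions for illustration; the actual SocialQ&A components derive these quantities from social-network and interaction data:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two interest-category vectors."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

def rank_answerers(question_vec, user_interests, closeness, alpha=0.7):
    """Rank candidate answerers by a blend of interest match (Question
    Categorizer vs. User Interest Analyzer output) and social closeness
    to the asker (Question-User Mapper input)."""
    scores = {u: alpha * cosine(question_vec, v)
                 + (1 - alpha) * closeness.get(u, 0.0)
              for u, v in user_interests.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

The returned list corresponds to the notification order: the highest-scoring users are the ones the system would notify about the new question.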

    Activity archetypes in question-and-answer (Q&A) websites—A study of 50 Stack Exchange instances

    Millions of users on the Internet discuss a variety of topics on Question-and-Answer (Q&A) instances. However, not all instances and topics receive the same amount of attention, as some thrive and achieve self-sustaining levels of activity, while others fail to attract users and either never grow beyond being a small niche community or become inactive. Hence, it is imperative to not only better understand but also to distill deciding factors and rules that define and govern sustainable Q&A instances. We aim to empower community managers with quantitative methods for them to better understand, control and foster their communities, and thus contribute to making the Web a more efficient place to exchange information. To that end, we extract, model and cluster user activity-based time series from 50 randomly selected Q&A instances from the Stack Exchange network to characterize user behavior. We find four distinct types of user activity temporal patterns, which vary primarily according to the users' activity frequency. Finally, by breaking down total activity in our 50 Q&A instances by the previously identified user activity profiles, we classify those 50 Q&A instances into three different activity profiles. Our parsimonious categorization of Q&A instances aligns with the stage of development and maturity of the underlying communities, and can potentially help operators of such instances: we not only quantitatively assess progress of Q&A instances, but we also derive practical implications for optimizing Q&A community building efforts, as we, for example, recommend which user types to focus on at different developmental stages of a Q&A community.
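Clustering users by their activity time series, as described above, can be sketched with plain k-means over z-normalised series. The farthest-point initialisation and the normalisation choice are assumptions made for a deterministic sketch; the study's own modelling pipeline is more involved:

```python
import numpy as np

def cluster_activity(series, k, iters=50):
    """Cluster per-user activity time series (rows) into k archetypes with
    plain k-means, using deterministic farthest-point initialisation."""
    X = np.asarray(series, dtype=float)
    # z-normalise each series so clusters reflect shape, not raw volume
    X = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-9)
    # farthest-point init: start from series 0, then repeatedly add the
    # series farthest from all chosen centers
    idx = [0]
    for _ in range(k - 1):
        d = np.min(((X[:, None] - X[idx][None]) ** 2).sum(-1), axis=1)
        idx.append(int(np.argmax(d)))
    centers = X[idx].copy()
    # standard Lloyd iterations
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

Once users are labelled, an instance's activity profile can be summarised by the share of total activity contributed by each user archetype, which is how the 50 instances are grouped into three profiles.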

    Justification for Class 3 Permit Modification, Corrective Action Complete with Controls, Solid Waste Management Unit 76, Mixed Waste Landfill, Sandia National Laboratories/New Mexico, EPA ID Number NM5890110518 Volumes I through VIII

    The Department of Energy/National Nuclear Security Administration (DOE) and Sandia Corporation (Sandia) are submitting a request for a Class 3 Modification to Module IV of Hazardous Waste Permit NM5890110518-1 (the Permit). DOE and Sandia are requesting that the New Mexico Environment Department (NMED) designate solid waste management unit (SWMU) 76 as approved for Corrective Action Complete status. NMED made a preliminary determination in October 2014 that corrective action is complete at this SWMU. SWMU 76, known as the Mixed Waste Landfill (MWL), is a 2.6-acre site at Sandia National Laboratories, located on Kirtland Air Force Base immediately southeast of Albuquerque, New Mexico. Radioactive wastes and mixed wastes (radioactive wastes that are also hazardous wastes) were disposed of in the MWL from March 1959 through December 1988. The maximum depth of burial is approximately 25 feet below the ground surface. Groundwater occurs approximately 500 feet below the ground surface at the MWL. DOE and Sandia have implemented corrective measures at SWMU 76 in accordance with the requirements of the Permit; an April 2004 Compliance Order on Consent between NMED, DOE, and Sandia; and the plans approved by NMED. On January 8, 2014, NMED approved a long-term monitoring and maintenance plan (LTMMP) for SWMU 76. DOE and Sandia have implemented the approved LTMMP, maintaining the controls established through the corrective measures. The permit modification request consists of a letter with two enclosures: (1) a brief history of corrective action at SWMU 76, and (2) an index of the supporting documents that comprise the justification for the permit modification request. The supporting documents are included in an 8-volume set: Justification for Class 3 Permit Modification for Corrective Action Complete With Controls, Solid Waste Management Unit 76, Mixed Waste Landfill. Volume/pages: I/858, II/420, III/556, IV/1128, V/848, VI/1110, VII/914, VIII/866.

    Understanding patient experience from online medium

    Improving patient experience at hospitals leads to better health outcomes. To improve this, we must first understand and interpret patients' written feedback. Patient-generated texts such as patient reviews found on RateMD, or online health forums found on WebMD, are venues where patients post about their experiences. Due to the massive amounts of patient-generated texts that exist online, an automated approach to identifying the topics from a patient experience taxonomy is the only realistic option for analyzing these texts. However, not only is there a lack of annotated taxonomy for these media, but word usage is also colloquial, making it challenging to apply standardized NLP techniques to identify the topics present in the patient-generated texts. Furthermore, patients may describe multiple topics in their texts, which drastically increases the complexity of the task. In this thesis, we address the challenges in comprehensively and automatically understanding the patient experience from patient-generated texts. We first built a set of rich semantic features to represent the corpus, which helps capture meanings that may not typically be captured by the bag-of-words (BOW) model. Unlike the BOW model, semantic feature representation captures the context and in-depth meaning behind each word in the corpus. To the best of our knowledge, no existing work in understanding patient experience from patient-generated texts delves into which semantic features help capture the characteristics of the corpus. Furthermore, patients generally talk about multiple topics when they write patient-generated texts, and these topics are frequently interdependent. There are two types of topic interdependencies: those that are semantically similar, and those that are not.
We built a constraint-based deep neural network classifier to capture the two types of topic interdependencies and empirically show the classification performance improvement over the baseline approaches. Past research has also indicated that patient experiences differ depending on patient segments [1-4]. The segments can be based on demographics, for instance, by race, gender, or geographical location. Similarly, the segments can be based on health status, for example, whether or not the patient is taking medication, whether or not the patient has a particular disease, or whether or not the patient is readmitted to the hospital. To better understand patient experiences, we built an automated approach to identify patient segments with a focus on whether the person has stopped taking the medication or not. The technique used to identify the patient segment is general enough that we envision the approach to be applicable to other types of patient segments. With a comprehensive understanding of patient experiences, we envision an application system where clinicians can directly read the most relevant patient-generated texts that pertain to their interest. The system can capture topics from the patient experience taxonomy that are of interest to each clinician or designated expert, and we believe the system is one of many approaches that can ultimately help improve the patient experience.
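A minimal sketch of injecting topic-interdependency constraints at prediction time, assuming a base classifier that already outputs independent per-topic probabilities. The pairwise boost rule and its parameters are illustrative stand-ins, not the thesis's constraint-based neural network:

```python
import numpy as np

def constrained_predict(probs, related, boost=0.15, threshold=0.5):
    """Multi-label prediction with a pairwise interdependency constraint.

    probs:   per-topic probabilities from any base classifier, shape (n_topics,)
    related: list of (i, j) index pairs of interdependent topics
    If one topic of a related pair is confidently present, nudge its
    partner's probability upward before thresholding.
    """
    adjusted = probs.copy()
    for i, j in related:
        if probs[i] >= threshold:
            adjusted[j] = min(1.0, adjusted[j] + boost * probs[i])
        if probs[j] >= threshold:
            adjusted[i] = min(1.0, adjusted[i] + boost * probs[j])
    return (adjusted >= threshold).astype(int)
```

The effect is that a borderline topic can be pulled above threshold by a strongly predicted related topic, which is the intuition behind modelling interdependencies rather than treating each label independently.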

    Knowledge aggregation in people recommender systems : matching skills to tasks

    People recommender systems (PRS) are a special type of recommender system (RS). They are often adopted to identify people capable of performing a task. Recommending people poses several challenges not exhibited in traditional RS. Elements such as availability, overload, unresponsiveness, and bad recommendations can have adverse effects. This thesis explores how people's preferences can be elicited for single-event matchmaking under uncertainty and how to align them with appropriate tasks. Different methodologies are introduced to profile people, each based on the nature of the information from which it was obtained. These methodologies are developed into three use cases to illustrate the challenges of PRS and the steps taken to address them. Each one emphasizes the priorities of the matching process and the constraints under which these recommendations are made. First, multi-criteria profiles are derived completely from heterogeneous sources in an implicit manner, characterizing users from multiple perspectives and multi-dimensional points of view without influence from the user. The profiles are applied to the conference reviewer assignment problem. Attention is given to distributing people across items in order to reduce potential overloading of a person and neglect or rejection of a task. Second, people's areas of interest are inferred from their resumes and expressed in terms of their uncertainty, avoiding explicit elicitation from an individual or outsider. The profile is applied to a personnel selection problem where emphasis is placed on the preferences of the candidate, leading to an asymmetric matching process. Third, profiles are created by integrating implicit information and explicitly stated attributes. A model is developed to classify citizens according to their lifestyles which maintains the original information in the data set throughout the cluster formation. These use cases serve as pilot tests for generalization to real-life implementations.
Areas for future application are discussed from new perspectives.
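The distribution concern in the reviewer-assignment use case can be sketched as a greedy matching with a per-reviewer load cap, which directly prevents overloading any one person. The scoring input and the cap parameters are assumptions for illustration, not the thesis's actual matching model:

```python
def assign_reviewers(scores, per_reviewer_cap, reviewers_per_paper=1):
    """Greedy paper-reviewer matching with a load cap per reviewer.

    scores: dict mapping (paper, reviewer) -> suitability score.
    Pairs are considered in descending score order; a pair is accepted
    only if the paper still needs reviewers and the reviewer is under cap.
    """
    load = {}      # reviewer -> number of papers already assigned
    assigned = {}  # paper -> list of reviewers
    for (paper, reviewer), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if len(assigned.get(paper, [])) >= reviewers_per_paper:
            continue
        if load.get(reviewer, 0) >= per_reviewer_cap:
            continue
        assigned.setdefault(paper, []).append(reviewer)
        load[reviewer] = load.get(reviewer, 0) + 1
    return assigned
```

Even when one reviewer scores highest for every paper, the cap forces the remaining papers onto other reviewers, trading a little suitability for coverage, which mirrors the overload-versus-neglect trade-off the thesis highlights.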

    Pretrained Transformers for Text Ranking: BERT and Beyond

    The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
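The multi-stage reranking architecture described above can be sketched as a pipeline shape: a first-stage retriever supplies candidate documents, and a pluggable scorer reorders them. Here `overlap_score` is a toy lexical stand-in for a trained cross-encoder such as monoBERT; in practice the scorer would run the (query, document) pair through a pretrained transformer:

```python
def rerank(query, docs, score_fn, top_k=10):
    """Second-stage reranking: reorder candidates from a first-stage
    retriever using score_fn(query, doc), keeping the top_k results."""
    scored = sorted(docs, key=lambda d: score_fn(query, d), reverse=True)
    return scored[:top_k]

def overlap_score(query, doc):
    """Toy lexical relevance: fraction of query terms appearing in the doc.
    A stand-in for a trained relevance model, not a real cross-encoder."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / (len(q) or 1)
```

The effectiveness/efficiency trade-off the survey discusses lives in `score_fn`: a cross-encoder is accurate but must run once per candidate at query time, whereas dense retrieval precomputes document vectors so ranking reduces to a nearest-neighbour search.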

    Materials experiment carrier concepts definition study. Volume 2: Technical report, part 2

    This study defines a materials experiment carrier (MEC) that provides effective accommodation of the given baseline materials processing in space (MPS) payloads and demonstrates the MPS platform concept for high-priority materials processing science, multidiscipline MPS investigations, hosting of commercial MPS payloads, and economical orbital operations. The study flow of task work is shown. Study tasks featured analyses and trades to identify the MEC system concept options.