815 research outputs found

    Bots, Seeds and People: Web Archives as Infrastructure

    The field of web archiving provides a unique mix of human and automated agents collaborating to preserve the web. Centuries-old theories of archival appraisal are being transplanted into the sociotechnical environment of the World Wide Web with varying degrees of success. The work of archivists and bots in contact with the material of the web presents a distinctive and understudied CSCW-shaped problem. To investigate this space, we conducted semi-structured interviews with archivists and technologists who were directly involved in selecting web content for archives. These interviews identified thematic areas that inform the appraisal process in web archives, some of which are encoded in heuristics and algorithms. Making the infrastructure of web archives legible to the archivist, the automated agents, and the future researcher is presented as a challenge to the CSCW and archival communities.

    On the Helpfulness of Answering Developer Questions on Discord with Similar Conversations and Posts from the Past

    A large part of software developers’ time is spent finding answers to their coding-task-related questions. To answer their questions, developers usually perform web searches, ask questions on Q&A websites, or, more recently, in chat communities. Yet many of these questions have already been answered in previous chat conversations or other online communities. Automatically identifying and suggesting these previous answers to the askers could thus save time and effort. In an empirical analysis, we first explored the frequency of repeating questions on the Discord chat platform and assessed our approach to identifying them automatically. The approach was then evaluated with real-world developers in a field experiment, through which we received 142 ratings on the helpfulness of the suggestions we provided to help answer 277 questions that developers posted in four Discord communities. We further collected qualitative feedback through 53 surveys and 10 follow-up interviews. We found that the suggestions were considered helpful in 40% of the cases, that suggesting Stack Overflow posts is more often considered helpful than suggesting past Discord conversations, and that developers have difficulties describing their problems as search queries and thus prefer describing them as natural-language questions in online communities.
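As a toy illustration of the kind of similar-question retrieval this paper evaluates, a minimal TF-IDF cosine-similarity baseline could look like the sketch below. This is not the authors' actual system, and the question data is made up; a real deployment would index whole Discord conversations and Stack Overflow posts.

```python
import math
from collections import Counter

# Hypothetical archive of past questions (tokenized).
past = ["how do I parse json in python".split(),
        "best way to center a div with css".split(),
        "python json loads throws ValueError".split()]

n = len(past)
# Document frequency of each term across the archive.
df = Counter(t for doc in past for t in set(doc))

def vectorize(tokens):
    """TF-IDF vector over the vocabulary of the past questions."""
    tf = Counter(t for t in tokens if t in df)
    return {t: (1 + math.log(c)) * math.log(n / df[t]) for t, c in tf.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = [vectorize(doc) for doc in past]

def suggest(query, k=1):
    """Return the indices of the k most similar past questions."""
    qv = vectorize(query.split())
    order = sorted(range(n), key=lambda i: cosine(qv, vecs[i]), reverse=True)
    return order[:k]
```

A new question like "parse json with python" would then surface the first archived question as the top suggestion.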

    Software expert discovery via knowledge domain embeddings in a collaborative network

    © 2018 Elsevier B.V. Community Question Answering (CQA) websites are among the most important venues for knowledge sharing, and currently one of the most effective ways of exchanging knowledge. With massive numbers of users participating online and generating huge amounts of data, managing this knowledge systematically is challenging. Expert recommendation is one of the major challenges: it highlights users in CQA with potential expertise, which may help match unresolved questions with existing high-quality answers, and may also serve external services such as human-resource systems as an additional reference for evaluating candidates. In this work, we propose an approach to discovering experts on CQA websites. We take advantage of recent distributed word representation techniques to summarize text chunks and, from a semantic view, exploit the relationships between natural-language phrases to extract latent knowledge domains. Given these domains, each user's expertise is determined from their historical performance, and a ranking can be computed to give recommendations accordingly. Stack Overflow is chosen as our dataset to test and evaluate the approach, where extensive experiments show its effectiveness.
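The general idea of ranking experts by comparing a question's embedding to users' answer-history embeddings can be sketched as follows. The three-dimensional "embeddings", user histories, and the log-weighted quality signal are all invented for illustration; the paper's actual method learns distributed word representations and knowledge domains from the corpus.

```python
import math

# Toy 3-d vectors standing in for learned word embeddings (assumption:
# a real system would use embeddings trained on the Q&A corpus).
EMB = {
    "python": (0.9, 0.1, 0.0), "pandas": (0.8, 0.2, 0.1),
    "css":    (0.0, 0.9, 0.1), "html":   (0.1, 0.8, 0.2),
    "sql":    (0.1, 0.1, 0.9),
}

def embed(words):
    """Average the vectors of the known words in a text."""
    vs = [EMB[w] for w in words if w in EMB]
    if not vs:
        return (0.0, 0.0, 0.0)
    return tuple(sum(c) / len(vs) for c in zip(*vs))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Each user: (words from their past answers, accepted-answer count).
users = {
    "alice": (["python", "pandas", "python"], 30),
    "bob":   (["css", "html"], 12),
    "carol": (["sql", "python"], 5),
}

def rank_experts(question_words):
    """Score users by semantic match to the question, damped by a crude
    historical-quality signal, and return names best-first."""
    q = embed(question_words)
    scored = {u: cosine(q, embed(hist)) * math.log(1 + accepted)
              for u, (hist, accepted) in users.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

For a pandas question, "alice" would rank first because her history embedding sits closest to the question's embedding.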

    Understanding and exploiting user intent in community question answering

    A number of Community Question Answering (CQA) services have emerged and proliferated in the last decade. Typical examples include Yahoo! Answers, WikiAnswers, and also domain-specific forums like StackOverflow. These services help users obtain information from a community - a user can post his or her questions which may then be answered by other users. Such a paradigm of information seeking is particularly appealing when the question cannot be answered directly by Web search engines due to the unavailability of relevant online content. However, questions submitted to a CQA service are often colloquial and ambiguous. An accurate understanding of the intent behind a question is important for satisfying the user's information need more effectively and efficiently. In this thesis, we analyse the intent of each question in CQA by classifying it into five dimensions, namely: subjectivity, locality, navigationality, procedurality, and causality. By making use of advanced machine learning techniques, such as Co-Training and PU-Learning, we are able to attain consistent and significant classification improvements over the state-of-the-art in this area. In addition to the textual features, a variety of metadata features (such as the category where the question was posted to) are used to model a user's intent, which in turn help the CQA service to perform better in finding similar questions, identifying relevant answers, and recommending the most relevant answerers. We validate the usefulness of user intent in two different CQA tasks. Our first application is question retrieval, where we present a hybrid approach which blends several language modelling techniques, namely, the classic (query-likelihood) language model, the state-of-the-art translation-based language model, and our proposed intent-based language model.
    Our second application is answer validation, where we present a two-stage model which first ranks similar questions by using our proposed hybrid approach, and then validates whether the answer of the top candidate can serve as an answer to a new question by leveraging sentiment analysis, query quality assessment, and search list validation.
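The classic query-likelihood model that the hybrid approach builds on can be sketched in a few lines. This shows only the baseline component with Jelinek-Mercer smoothing, not the thesis's translation-based or intent-based models, and the documents are invented examples.

```python
import math
from collections import Counter

def ql_score(query, doc, collection, lam=0.5):
    """log P(q|d) under a Jelinek-Mercer smoothed unigram model:
    P(w|d) = (1 - lam) * tf(w,d)/|d| + lam * cf(w)/|C|."""
    tf = Counter(doc)
    cf = Counter(collection)
    clen = len(collection)
    score = 0.0
    for w in query:
        p = (1 - lam) * tf[w] / len(doc) + lam * cf[w] / clen
        if p == 0.0:          # word unseen even in the whole collection
            return float("-inf")
        score += math.log(p)
    return score

# Hypothetical archived questions.
docs = ["how to sort a list in python".split(),
        "undo last commit in git".split()]
collection = [w for d in docs for w in d]

def best_match(query):
    """Index of the archived question most likely to have generated the query."""
    q = query.split()
    return max(range(len(docs)), key=lambda i: ql_score(q, docs[i], collection))
```

Smoothing with collection statistics keeps unseen query words from zeroing out a document's score, which is why the second document still gets a finite score for a Python query.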

    Connecting Researchers with Companies for University-Industry Collaboration

    Nowadays, companies are spending more time and money to enhance their innovation ability in response to increasing market competition. The pressure makes companies seek help from external knowledge, especially from academia. Unfortunately, there is a gap between knowledge seekers (companies) and suppliers (researchers) due to scattered and asymmetric information. To facilitate the sharing economy, various platforms have been designed to connect the two parties. In this context, we design a researcher recommendation system to promote their collaboration (e.g. patent licensing, collaborative research, contract research and consultancy) based on a research social network with complete information about both researchers and companies. In the recommendation system, we evaluate researchers along three dimensions: expertise relevance, quality and trustworthiness. Experimental results show that our system performs well in recommending suitable researchers for companies. The recommendation system has been implemented on an innovation platform, InnoCity.
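A minimal way to combine the three evaluation dimensions into one ranking is a weighted sum, sketched below. The weights, researcher names, and scores are all illustrative assumptions, not the paper's calibrated model.

```python
# Hypothetical normalized scores in [0, 1] for each researcher:
# name -> (quality, trustworthiness).
researchers = {
    "dr_wu":    (0.9, 0.8),
    "dr_lee":   (0.6, 0.9),
    "dr_silva": (0.7, 0.4),
}
# Relevance of each researcher's expertise to a company's query.
relevance = {"dr_wu": 0.4, "dr_lee": 0.9, "dr_silva": 0.5}

def recommend(researchers, query_relevance, weights=(0.5, 0.3, 0.2)):
    """Rank researchers by a weighted sum of relevance, quality and
    trustworthiness; weights are illustrative, not tuned."""
    wr, wq, wt = weights
    scored = {name: wr * query_relevance[name] + wq * quality + wt * trust
              for name, (quality, trust) in researchers.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

With these numbers, high query relevance lets "dr_lee" outrank the higher-quality but less relevant "dr_wu".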

    Temporal and semantic analysis of typed social networks from user-generated website content on the Web

    We propose an approach to detect topics, overlapping communities of interest, expertise, trends and activities in user-generated content sites, and in particular in question-answering forums such as StackOverflow. We first describe QASM (Question & Answer Social Media), a system based on social network analysis to manage the two main resources in question-answering sites: users and contents. We also introduce the QASM vocabulary used to formalize both the level of interest and the expertise of users on topics. We then propose an efficient approach to detect communities of interest. It relies on another method to enrich questions with a more general tag when needed. We compared three detection methods on a dataset extracted from the popular Q&A site StackOverflow. Our method, based on topic modeling and user membership assignment, is shown to be much simpler and faster while preserving the quality of the detection. We then propose an additional method to automatically generate a label for a detected topic by analyzing the meaning and links of its bag of words. We conduct a user study to compare different algorithms for choosing the label. Finally, we extend our probabilistic graphical model to jointly model topics, expertise, activities and trends. We performed experiments with real-world data to confirm the effectiveness of our joint model, studying the users' behaviors and topic dynamics.
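One simple heuristic for labeling a detected topic is to pick the tag that co-occurs most with the topic's other tags, sketched below. This co-occurrence count is a stand-in for the thesis's semantic labeling method, and the tagged questions are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagged questions from one detected community of interest.
questions = [
    {"python", "pandas", "dataframe"},
    {"python", "numpy"},
    {"python", "pandas", "csv"},
    {"numpy", "pandas", "python"},
]

def label_topic(tagged_questions):
    """Label the topic with the tag that participates in the most
    within-question tag pairs (a crude centrality measure)."""
    co = Counter()
    for tags in tagged_questions:
        for a, b in combinations(sorted(tags), 2):
            co[a] += 1
            co[b] += 1
    return co.most_common(1)[0][0]
```

Here "python" pairs with other tags most often, so it becomes the community's label.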