815 research outputs found

    Bots, Seeds and People: Web Archives as Infrastructure

    The field of web archiving provides a unique mix of human and automated agents collaborating to preserve the web. Centuries-old theories of archival appraisal are being transplanted into the sociotechnical environment of the World Wide Web with varying degrees of success. The work of archivists and bots in contact with the material of the web presents a distinctive and understudied CSCW-shaped problem. To investigate this space, we conducted semi-structured interviews with archivists and technologists who were directly involved in selecting web content for archives. These interviews identified thematic areas that inform the appraisal process in web archives, some of which are encoded in heuristics and algorithms. Making the infrastructure of web archives legible to the archivist, the automated agents, and the future researcher is presented as a challenge to the CSCW and archival communities.

    On the Helpfulness of Answering Developer Questions on Discord with Similar Conversations and Posts from the Past

    A large part of software developers’ time is spent finding answers to their coding-task-related questions. To answer their questions, developers usually perform web searches, ask questions on Q&A websites, or, more recently, in chat communities. Yet many of these questions have already been answered in previous chat conversations or other online communities. Automatically identifying and suggesting these previous answers to the askers could thus save time and effort. In an empirical analysis, we first explored the frequency of repeating questions on the Discord chat platform and assessed our approach to identifying them automatically. The approach was then evaluated with real-world developers in a field experiment, through which we received 142 ratings on the helpfulness of the suggestions we provided to help answer 277 questions that developers posted in four Discord communities. We further collected qualitative feedback through 53 surveys and 10 follow-up interviews. We found that the suggestions were considered helpful in 40% of the cases, that suggesting Stack Overflow posts is more often considered helpful than suggesting past Discord conversations, and that developers have difficulties describing their problems as search queries and thus prefer describing them as natural-language questions in online communities.
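As a toy illustration of the kind of similar-question retrieval this paper evaluates, a minimal TF-IDF cosine-similarity baseline could look like the sketch below. This is not the authors' actual system, and the question data is made up; a real deployment would index whole Discord conversations and Stack Overflow posts.

```python
import math
from collections import Counter

# Hypothetical archive of past questions (tokenized).
past = ["how do I parse json in python".split(),
        "best way to center a div with css".split(),
        "python json loads throws ValueError".split()]

n = len(past)
# Document frequency of each term across the archive.
df = Counter(t for doc in past for t in set(doc))

def vectorize(tokens):
    """TF-IDF vector over the vocabulary of the past questions."""
    tf = Counter(t for t in tokens if t in df)
    return {t: (1 + math.log(c)) * math.log(n / df[t]) for t, c in tf.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = [vectorize(doc) for doc in past]

def suggest(query, k=1):
    """Return the indices of the k most similar past questions."""
    qv = vectorize(query.split())
    order = sorted(range(n), key=lambda i: cosine(qv, vecs[i]), reverse=True)
    return order[:k]
```

A new question like "parse json with python" would then surface the first archived question as the top suggestion.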

    Software expert discovery via knowledge domain embeddings in a collaborative network

    © 2018 Elsevier B.V. Community Question Answering (CQA) websites are among the most important venues for knowledge sharing, and currently one of the most effective ways of exchanging knowledge. With massive numbers of users participating online and generating huge amounts of data, managing this knowledge systematically is challenging. Expert recommendation is one of the major challenges: it highlights users in CQA with potential expertise, which may help match unresolved questions with existing high-quality answers, and may also serve external services such as human-resource systems as an additional reference for evaluating candidates. In this work, we propose an approach to discovering experts on CQA websites. We take advantage of recent distributed word representation techniques to summarize text chunks and, from a semantic view, exploit the relationships between natural-language phrases to extract latent knowledge domains. Given these domains, each user's expertise is determined from their historical performance, and a ranking can be computed to give recommendations accordingly. Stack Overflow is chosen as our dataset to test and evaluate the approach, where extensive experiments show its effectiveness.
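The general idea of ranking experts by comparing a question's embedding to users' answer-history embeddings can be sketched as follows. The three-dimensional "embeddings", user histories, and the log-weighted quality signal are all invented for illustration; the paper's actual method learns distributed word representations and knowledge domains from the corpus.

```python
import math

# Toy 3-d vectors standing in for learned word embeddings (assumption:
# a real system would use embeddings trained on the Q&A corpus).
EMB = {
    "python": (0.9, 0.1, 0.0), "pandas": (0.8, 0.2, 0.1),
    "css":    (0.0, 0.9, 0.1), "html":   (0.1, 0.8, 0.2),
    "sql":    (0.1, 0.1, 0.9),
}

def embed(words):
    """Average the vectors of the known words in a text."""
    vs = [EMB[w] for w in words if w in EMB]
    if not vs:
        return (0.0, 0.0, 0.0)
    return tuple(sum(c) / len(vs) for c in zip(*vs))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Each user: (words from their past answers, accepted-answer count).
users = {
    "alice": (["python", "pandas", "python"], 30),
    "bob":   (["css", "html"], 12),
    "carol": (["sql", "python"], 5),
}

def rank_experts(question_words):
    """Score users by semantic match to the question, damped by a crude
    historical-quality signal, and return names best-first."""
    q = embed(question_words)
    scored = {u: cosine(q, embed(hist)) * math.log(1 + accepted)
              for u, (hist, accepted) in users.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

For a pandas question, "alice" would rank first because her history embedding sits closest to the question's embedding.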

    Understanding and exploiting user intent in community question answering

    A number of Community Question Answering (CQA) services have emerged and proliferated in the last decade. Typical examples include Yahoo! Answers, WikiAnswers, and also domain-specific forums like StackOverflow. These services help users obtain information from a community - a user can post his or her questions which may then be answered by other users. Such a paradigm of information seeking is particularly appealing when the question cannot be answered directly by Web search engines due to the unavailability of relevant online content. However, questions submitted to a CQA service are often colloquial and ambiguous. An accurate understanding of the intent behind a question is important for satisfying the user's information need more effectively and efficiently. In this thesis, we analyse the intent of each question in CQA by classifying it into five dimensions, namely: subjectivity, locality, navigationality, procedurality, and causality. By making use of advanced machine learning techniques, such as Co-Training and PU-Learning, we are able to attain consistent and significant classification improvements over the state-of-the-art in this area. In addition to the textual features, a variety of metadata features (such as the category where the question was posted to) are used to model a user's intent, which in turn help the CQA service to perform better in finding similar questions, identifying relevant answers, and recommending the most relevant answerers. We validate the usefulness of user intent in two different CQA tasks. Our first application is question retrieval, where we present a hybrid approach which blends several language modelling techniques, namely, the classic (query-likelihood) language model, the state-of-the-art translation-based language model, and our proposed intent-based language model.
    Our second application is answer validation, where we present a two-stage model which first ranks similar questions by using our proposed hybrid approach, and then validates whether the answer of the top candidate can serve as an answer to a new question by leveraging sentiment analysis, query quality assessment, and search list validation.
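The classic query-likelihood model that the hybrid approach builds on can be sketched in a few lines. This shows only the baseline component with Jelinek-Mercer smoothing, not the thesis's translation-based or intent-based models, and the documents are invented examples.

```python
import math
from collections import Counter

def ql_score(query, doc, collection, lam=0.5):
    """log P(q|d) under a Jelinek-Mercer smoothed unigram model:
    P(w|d) = (1 - lam) * tf(w,d)/|d| + lam * cf(w)/|C|."""
    tf = Counter(doc)
    cf = Counter(collection)
    clen = len(collection)
    score = 0.0
    for w in query:
        p = (1 - lam) * tf[w] / len(doc) + lam * cf[w] / clen
        if p == 0.0:          # word unseen even in the whole collection
            return float("-inf")
        score += math.log(p)
    return score

# Hypothetical archived questions.
docs = ["how to sort a list in python".split(),
        "undo last commit in git".split()]
collection = [w for d in docs for w in d]

def best_match(query):
    """Index of the archived question most likely to have generated the query."""
    q = query.split()
    return max(range(len(docs)), key=lambda i: ql_score(q, docs[i], collection))
```

Smoothing with collection statistics keeps unseen query words from zeroing out a document's score, which is why the second document still gets a finite score for a Python query.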

    Connecting Researchers with Companies for University-Industry Collaboration

    Nowadays, companies are spending more time and money to enhance their innovation ability in response to increasing market competition. The pressure makes companies seek help from external knowledge, especially from academia. Unfortunately, there is a gap between knowledge seekers (companies) and suppliers (researchers) due to scattered and asymmetric information. To facilitate the sharing economy, various platforms have been designed to connect the two parties. In this context, we design a researcher recommendation system to promote their collaboration (e.g. patent licensing, collaborative research, contract research and consultancy) based on a research social network with complete information about both researchers and companies. In the recommendation system, we evaluate researchers along three dimensions: expertise relevance, quality and trustworthiness. Experimental results show that our system performs well in recommending suitable researchers for companies. The recommendation system has been implemented on an innovation platform, InnoCity.
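A minimal way to combine the three evaluation dimensions into one ranking is a weighted sum, sketched below. The weights, researcher names, and scores are all illustrative assumptions, not the paper's calibrated model.

```python
# Hypothetical normalized scores in [0, 1] for each researcher:
# name -> (quality, trustworthiness).
researchers = {
    "dr_wu":    (0.9, 0.8),
    "dr_lee":   (0.6, 0.9),
    "dr_silva": (0.7, 0.4),
}
# Relevance of each researcher's expertise to a company's query.
relevance = {"dr_wu": 0.4, "dr_lee": 0.9, "dr_silva": 0.5}

def recommend(researchers, query_relevance, weights=(0.5, 0.3, 0.2)):
    """Rank researchers by a weighted sum of relevance, quality and
    trustworthiness; weights are illustrative, not tuned."""
    wr, wq, wt = weights
    scored = {name: wr * query_relevance[name] + wq * quality + wt * trust
              for name, (quality, trust) in researchers.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

With these numbers, high query relevance lets "dr_lee" outrank the higher-quality but less relevant "dr_wu".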

    Temporal and semantic analysis of typed social networks from user-generated website content on the Web

    We propose an approach to detect topics, overlapping communities of interest, expertise, trends and activities in user-generated content sites, and in particular in question-answering forums such as StackOverflow. We first describe QASM (Question & Answer Social Media), a system based on social network analysis to manage the two main resources in question-answering sites: users and contents. We also introduce the QASM vocabulary used to formalize both the level of interest and the expertise of users on topics. We then propose an efficient approach to detect communities of interest. It relies on another method to enrich questions with a more general tag when needed. We compared three detection methods on a dataset extracted from the popular Q&A site StackOverflow. Our method, based on topic modeling and user membership assignment, is shown to be much simpler and faster while preserving the quality of the detection. We then propose an additional method to automatically generate a label for a detected topic by analyzing the meaning and links of its bag of words. We conduct a user study to compare different algorithms for choosing the label. Finally, we extend our probabilistic graphical model to jointly model topics, expertise, activities and trends. We performed experiments with real-world data to confirm the effectiveness of our joint model, studying the users' behaviors and topic dynamics.
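One simple heuristic for labeling a detected topic is to pick the tag that co-occurs most with the topic's other tags, sketched below. This co-occurrence count is a stand-in for the thesis's semantic labeling method, and the tagged questions are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagged questions from one detected community of interest.
questions = [
    {"python", "pandas", "dataframe"},
    {"python", "numpy"},
    {"python", "pandas", "csv"},
    {"numpy", "pandas", "python"},
]

def label_topic(tagged_questions):
    """Label the topic with the tag that participates in the most
    within-question tag pairs (a crude centrality measure)."""
    co = Counter()
    for tags in tagged_questions:
        for a, b in combinations(sorted(tags), 2):
            co[a] += 1
            co[b] += 1
    return co.most_common(1)[0][0]
```

Here "python" pairs with other tags most often, so it becomes the community's label.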