
    Towards a corpus for credibility assessment in software practitioner blog articles

    Blogs are a source of grey literature widely adopted by software practitioners for disseminating opinion and experience. Analysing such articles can provide useful insights into the state of practice for software engineering research. However, identifying higher-quality content among the large quantity of available articles is challenging. Credibility assessment can help identify quality content, though existing corpora are lacking. Credibility is typically measured through a series of conceptual criteria, with 'argumentation' and 'evidence' being two important criteria. We create a corpus labelled for argumentation and evidence that can aid the credibility community. The corpus consists of articles from the blog of a single software practitioner and is publicly available. Three annotators label the corpus with a series of conceptual credibility criteria, reaching an agreement of 0.82 (Fleiss' kappa). We present a preliminary analysis of the corpus by using it to investigate the identification of claim sentences (one of our ten labels). We train four systems (BERT, k-NN, decision tree, and SVM) using three feature sets (Bag of Words, topic modelling, and InferSent), achieving an F1 score of 0.64 with InferSent and a linear SVM. These preliminary results are promising, indicating that the corpus can support future studies on detecting the credibility of grey literature. Future research will investigate the degree to which the sentence-level annotations can infer the credibility of the overall document.
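    To make the classification setup concrete, the following is a minimal sketch of one of the simpler configurations evaluated above (Bag of Words features with a linear SVM) using scikit-learn. The sentences and labels are hypothetical stand-ins for the annotated corpus, and the paper's best-performing run used InferSent embeddings rather than Bag of Words.

```python
# Minimal sketch: claim-sentence detection with Bag of Words + linear SVM.
# The sentences and labels below are hypothetical stand-ins for the corpus;
# the paper's best configuration fed InferSent embeddings to a linear SVM.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical annotated sentences (1 = claim, 0 = non-claim).
sentences = [
    "Microservices halved our deployment time.",
    "Static typing prevents most production bugs.",
    "We then installed the build server.",
    "The next section describes our test setup.",
]
labels = [1, 1, 0, 0]

# Bag of Words features feeding a linear SVM, one of the paper's
# evaluated feature/classifier combinations.
clf = make_pipeline(CountVectorizer(), LinearSVC())
clf.fit(sentences, labels)

print(clf.predict(["Code reviews catch more defects than tests."]))
```

    Swapping CountVectorizer for precomputed sentence embeddings while keeping the same LinearSVC would mirror the shape of the best-performing pipeline reported in the abstract.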

    Semi-Automatic Mapping Technique Using Snowballing to Support Massive Literature Searches in Software Engineering

    Systematic literature reviews are an important methodology in Evidence-Based Software Engineering. To define the methodological route in this type of study, in which quantitative and qualitative aspects of primary studies are reviewed to summarize the existing information on a particular topic, researchers use protocols that guide the construction of knowledge from research questions. This article presents a process that uses forward snowballing, which identifies the articles that cite the paper under study and uses their number of citations as an inclusion criterion, to complement systematic literature reviews. A tool-supported process was designed to apply the snowballing strategy and to identify the most cited works and those that cite them. To validate the process, a review identified in the literature was used. Comparing the results surfaced new works that had not been taken into account but contributed to the subject of study. The citation index represents the number of times a publication has been referenced in other documents and is used as a mechanism to analyze, measure, or quantitatively assess that publication's impact on the scientific community. The present study showed how applying snowballing along with other strategies surfaces works that may be relevant to an investigation given their citation rates. That is, implementing this proposal will allow systematic literature studies to be updated or expanded with the newly identified works.
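    As an illustration of the process, here is a minimal sketch of one forward-snowballing pass with a citation-count inclusion threshold. The in-memory citation index, paper identifiers, and threshold are hypothetical stand-ins; a real run would query a citation database (e.g. Scopus or Google Scholar) through the supporting tools described above.

```python
# Minimal sketch of forward snowballing with a citation-count threshold.
# CITED_BY is a hypothetical in-memory citation index; real runs would
# query a citation database instead.
from collections import deque

# Hypothetical index: paper id -> ids of papers that cite it.
CITED_BY = {
    "seed-review": ["p1", "p2"],
    "p1": ["p3"],
    "p2": [],
    "p3": [],
}

MIN_CITATIONS = 1  # hypothetical inclusion threshold on citation count


def forward_snowball(seeds):
    """Breadth-first forward snowballing: follow citing papers,
    including only those that meet the citation threshold."""
    included, queue = set(), deque(seeds)
    while queue:
        paper = queue.popleft()
        for citing in CITED_BY.get(paper, []):
            n_citations = len(CITED_BY.get(citing, []))
            if citing not in included and n_citations >= MIN_CITATIONS:
                included.add(citing)
                queue.append(citing)
    return included


print(forward_snowball(["seed-review"]))  # {'p1'} in this toy index
```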