7 research outputs found

    Mining and linking crowd-based software engineering how-to screencasts

    Get PDF
    In recent years, crowd-based content in the form of screencast videos has gained in popularity among software engineers. Screencasts are viewed and created for different purposes, such as a learning aid, being part of a software project’s documentation, or as a general knowledge sharing resource. For organizations to remain competitive in attracting and retaining their workforce, they must adapt to these technological and social changes in software engineering practices. In this thesis, we propose a novel methodology for mining and integrating crowd-based multi- media content in existing workflows to help provide software engineers of different levels of experience and roles access to a documentation they are familiar with or prefer. As a result, we first aim to gain insights on how a user’s background and the task to be performed influence the use of certain documentation media. We focus on tutorial screencasts to identify their important information sources and provide insights on their usage, advantages, and disadvantages from a practitioner’s perspective. To that end, we conduct a survey of software engineers. We discuss how software engineers benefit from screencasts as well as challenges they face in using screencasts as project documentation. Our survey results revealed that screencasts and question and answers sites are among the most popular crowd-based information sources used by software engineers. Also, the level of experience and the role or reason for resorting to a documentation source affects the types of documentation used by software engineers. The results of our survey support our motivation in this thesis and show that for screencasts, high quality content and a narrator are very important components for users. Unfortunately, the binary format of videos makes analyzing video content difficult. As a result, dissecting and filtering multimedia information based on its relevance to a given project is an inherently difficult task. Therefore, it is necessary to provide automated approaches for mining and linking this crowd-based multimedia documentation to their relevant software artifacts. In this thesis, we apply LDA-based (Latent Dirichlet Allocation) mining approaches that take as input a set of screencast artifacts, such as GUI (Graphical User Interface) text (labels) and spoken words, to perform information extraction and, therefore, increase the availability of both textual and multimedia documentation for various stakeholders of a software product. For example, this allows screencasts to be linked to other software artifacts such as source code to help software developers/maintainers have access to the implementation details of an application feature. We also present applications of our proposed methodology that include: 1) an LDA-based mining approach that extracts use case scenarios in text format from screencasts, 2) an LDA-based approach that links screencasts to their relevant artifacts (e.g., source code), and 3) a Semantic Web-based approach to establish direct links between vulnerability exploitation screencasts and their relevant vulnerability descriptions in the National Vulnerability Database (NVD) and indirectly link screencasts to their relevant Maven dependencies. To evaluate the applicability of the proposed approach, we report on empirical case studies conducted on existing screencasts that describe different use case scenarios of the WordPress and Firefox open source applications or vulnerability exploitation scenarios

    Defect Prediction on the Hardware Repository - A Case Study on the OpenRISC1000 Project

    Get PDF
    Software defect prediction is one of the most active research topics in the area of mining software engineering data. The software engineering data sources like the code repositories and the bug databases contain rich information about software development history. Mining these data can guide software developers for future development activities and help managers to improve the development process. Nowadays, the computer-engineering field has rapidly evolved from 1972 until present times to the modern chip design, which looks superficially and very much like software design. Hence, the main objective of this thesis is to check whether it would be possible to apply software defect prediction techniques on hardware repositories. In this thesis, we have applied various data mining methods (e.g., linear regression, logistic regression, random forests, and entropy) to predict the post-release bugs of OpenRISC 1000 projects. We have conducted two types of studies: classification (predicting buggy and non-buggy files) and ranking (predicting the buggiest files). In particular, the classification studies show promising results with an average precision and recall of up to 74% and 70% for projects written in Verilog and close to 100% for projects written in C

    Automatic Recall of Software Lessons Learned for Software Project Managers

    Get PDF
    Context: Lessons learned (LL) records constitute the software organization memory of successes and failures. LL are recorded within the organization repository for future reference to optimize planning, gain experience, and elevate market competitiveness. However, manually searching this repository is a daunting task, so it is often disregarded. This can lead to the repetition of previous mistakes or even missing potential opportunities. This, in turn, can negatively affect the organization’s profitability and competitiveness. Objective: We aim to present a novel solution that provides an automatic process to recall relevant LL and to push those LL to project managers. This will dramatically save the time and effort of manually searching the unstructured LL repositories and thus encourage the LL exploitation. Method: We exploit existing project artifacts to build the LL search queries on-the-fly in order to bypass the tedious manual searching. An empirical case study is conducted to build the automatic LL recall solution and evaluate its effectiveness. The study employs three of the most popular information retrieval models to construct the solution. Furthermore, a real-world dataset of 212 LL records from 30 different software projects is used for validation. Top-k and MAP well-known accuracy metrics are used as well. Results: Our case study results confirm the effectiveness of the automatic LL recall solution. Also, the results prove the success of using existing project artifacts to dynamically build the search query string. This is supported by a discerning accuracy of about 70% achieved in the case of top-k. Conclusion: The automatic LL recall solution is valid with high accuracy. It will eliminate the effort needed to manually search the LL repository. Therefore, this will positively encourage project managers to reuse the available LL knowledge – which will avoid old pitfalls and unleash hidden business opportunities

    Classificação de issues obtidas de repositórios de software: uma abordagem baseada em características textuais

    Get PDF
    The classification of issues in software maintenance repositories is currently done by software developers. However, this classification is conducted manually and is not free of errors, which cause problems in the distribution of issues to the maintenance teams. This happen because the developers, which usually are the proponents of the issues, have the bad habit of classifying them as bugs. This erroneous rating generates rework and other disadvantages to the teams. Therefore, the main objective of this study is to improve this classification, using an issue classification approach conducted in an automated manner. In turn, this approach was implemented based on machine learning tecniques. These tecniques show that keywords discriminant of issues types can be used as attributes of automatic classifiers for prediction of these issues. The approach was evaluated on five open source projects extracted from two widely used issue trackers, Jira and Bugzilla. Because they are old issue trackers, the chosen projects provided good number of issues for this study. These issues, about 7.000, were classified by human experts at work [Herzig, Just e Zeller 2013], producing a feedback which was used for this study. This present work produced an automatic issues classifier, with 81% of accuracy, able to predict them in types of bug, request for enhancement and improvement. The result of accuracy obtained by this classifier suggests that it can be used in delivery systems to treatment teams with the purpose of reducing rework that occurs in these teams because of the poor issues rating.Dissertação (Mestrado)A classificação das issues ou questões nos repositórios de manutenção de software é realizada atualmente pelos desenvolvedores de software. Entretanto, essa classificação manual não é livre de erros, os quais geram problemas na distribuição das issues para as equipes de tratamento. Isso acontece porque os desenvolvedores, geralmente os propositores das issues, possuem o mal hábito de classificá-las como bugs. Essas classificações errôneas produzem a distribuição de issues para uma equipe de tratamento de outro tipo de issue, gerando retrabalho para as equipes entre outras desvantagens. Por isso, o principal objetivo almejado com o estudo é a melhoria dessa classificação, utilizando de uma abordagem de classificação das issues realizada de maneira automatizada. Essa abordagem foi implementada com técnicas de Aprendizado de Máquina. Estas técnicas mostram que as palavras-chave discriminantes dos tipos de issues podem ser utilizadas como atributos de classificadores automáticos para a predição dessas issues. A abordagem foi avaliada sobre 5 projetos open source extraídos de 2 issue trackers conhecidos, Jira e Bugzilla. Por se tratarem de issue trackers de longa data, os projetos escolhidos forneceram boa quantidade de issues para este estudo. Essas issues, cerca de 7000, foram classificadas por especialistas humanos no trabalho [Herzig, Just e Zeller 2013], produzindo um gabarito utilizado para a realização deste estudo. Este trabalho produziu um classificador automático de issues, com acurácia de 81%, capaz de discriminá-las nos tipos bug, request for enhancement e improvement. O bom resultado de acurácia sugere que o classificador concebido possa ser utilizado em sistemas de encaminhamento de issues para as equipes de tratamento, com a Ąnalidade de diminuir retrabalho dessas equipes que ocorre em virtude da má classificação

    Mining Unstructured Software Repositories

    No full text
    corecore