1,782 research outputs found

    Software Entity Recognition with Noise-Robust Learning

    Full text link
    Recognizing software entities such as library names from free-form text is essential to enable many software engineering (SE) technologies, such as traceability link recovery, automated documentation, and API recommendation. While many approaches have been proposed to address this problem, they suffer from small entity vocabularies or noisy training data, hindering their ability to recognize software entities mentioned in sophisticated narratives. To address this challenge, we leverage the Wikipedia taxonomy to develop a comprehensive entity lexicon with 79K unique software entities in 12 fine-grained types, as well as a large labeled dataset of over 1.7M sentences. Then, we propose self-regularization, a noise-robust learning approach, to the training of our software entity recognition (SER) model by accounting for many dropouts. Results show that models trained with self-regularization outperform both their vanilla counterparts and state-of-the-art approaches on our Wikipedia benchmark and two Stack Overflow benchmarks. We release our models, data, and code for future research.Comment: ASE 202

    Mining Fix Patterns for FindBugs Violations

    Get PDF
    In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal that there are discrepancies in the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread and categories, which can provide insights into prioritizing violations. To automatically identify patterns in violations and their fixes, we propose an approach that utilizes convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers will accept and merge a majority (69/116) of fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in the Defects4J major benchmark for software testing and automated repair.Comment: Accepted for IEEE Transactions on Software Engineerin

    On the use of Machine Learning and Deep Learning for Text Similarity and Categorization and its Application to Troubleshooting Automation

    Get PDF
    Troubleshooting is a labor-intensive task that includes repetitive solutions to similar problems. This task can be partially or fully automated using text-similarity matching to find previous solutions, lowering the workload of technicians. We develop a systematic literature review to identify the best approaches to solve the problem of troubleshooting automation and classify incidents effectively. We identify promising approaches and point in the direction of a comprehensive set of solutions that could be employed in solving the troubleshooting automation problem

    Comparative study on Judgment Text Classification for Transformer Based Models

    Full text link
    This work involves the usage of various NLP models to predict the winner of a particular judgment by the means of text extraction and summarization from a judgment document. These documents are useful when it comes to legal proceedings. One such advantage is that these can be used for citations and precedence reference in Lawsuits and cases which makes a strong argument for their case by the ones using it. When it comes to precedence, it is necessary to refer to an ample number of documents in order to collect legal points with respect to the case. However, reviewing these documents takes a long time to analyze due to the complex word structure and the size of the document. This work involves the comparative study of 6 different self-attention-based transformer models and how they perform when they are being tweaked in 4 different activation functions. These models which are trained with 200 judgement contexts and their results are being judged based on different benchmark parameters. These models finally have a confidence level up to 99% while predicting the judgment. This can be used to get a particular judgment document without spending too much time searching relevant cases and reading them completely.Comment: 28 pages with 9 figure

    Automatically responding to customers

    Get PDF

    Fairness Testing: A Comprehensive Survey and Analysis of Trends

    Full text link
    Unfair behaviors of Machine Learning (ML) software have garnered increasing attention and concern among software engineers. To tackle this issue, extensive research has been dedicated to conducting fairness testing of ML software, and this paper offers a comprehensive survey of existing studies in this field. We collect 100 papers and organize them based on the testing workflow (i.e., how to test) and testing components (i.e., what to test). Furthermore, we analyze the research focus, trends, and promising directions in the realm of fairness testing. We also identify widely-adopted datasets and open-source tools for fairness testing
    corecore