Software Entity Recognition with Noise-Robust Learning
Recognizing software entities such as library names from free-form text is
essential to enable many software engineering (SE) technologies, such as
traceability link recovery, automated documentation, and API recommendation.
While many approaches have been proposed to address this problem, they suffer
from small entity vocabularies or noisy training data, hindering their ability
to recognize software entities mentioned in sophisticated narratives. To
address this challenge, we leverage the Wikipedia taxonomy to develop a
comprehensive entity lexicon with 79K unique software entities in 12
fine-grained types, as well as a large labeled dataset of over 1.7M sentences.
Then, we propose self-regularization, a noise-robust learning approach, for
training our software entity recognition (SER) model: it regularizes
predictions to be consistent across multiple dropout passes, mitigating the
effect of label noise. Results show that models trained with self-regularization outperform
both their vanilla counterparts and state-of-the-art approaches on our
Wikipedia benchmark and two Stack Overflow benchmarks. We release our models,
data, and code for future research.
Comment: ASE 202
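The dropout-consistency idea behind self-regularization can be illustrated with a minimal pure-Python sketch. This is an assumption-laden toy, not the paper's implementation: a single linear classifier stands in for the SER model, and the loss combines cross-entropy on one stochastic forward pass with a symmetric KL term that penalizes disagreement between two independent dropout passes.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def dropout(features, p, rng):
    # Zero each feature with probability p; scale survivors by 1/(1-p).
    return [0.0 if rng.random() < p else f / (1 - p) for f in features]

def forward(features, weights):
    # One linear layer standing in for the full SER model.
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def kl(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def self_regularization_loss(features, weights, label, p=0.3, seed=0):
    rng = random.Random(seed)
    # Two stochastic forward passes with independent dropout masks.
    probs1 = softmax(forward(dropout(features, p, rng), weights))
    probs2 = softmax(forward(dropout(features, p, rng), weights))
    # Cross-entropy on the first pass plus a symmetric KL consistency term:
    # noisy labels hurt the CE term, but the consistency term only depends
    # on the model agreeing with itself, which is what makes it noise-robust.
    ce = -math.log(probs1[label] + 1e-12)
    consistency = 0.5 * (kl(probs1, probs2) + kl(probs2, probs1))
    return ce + consistency
```

All function names here (`self_regularization_loss`, `forward`) are hypothetical; the paper's actual formulation and weighting of the consistency term may differ.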
Mining Fix Patterns for FindBugs Violations
In this paper, we first collect and track a large number of fixed and unfixed
violations across revisions of software.
The empirical analyses reveal that there are discrepancies in the
distributions of violations that are detected and those that are fixed, in
terms of occurrences, spread and categories, which can provide insights into
prioritizing violations.
To automatically identify patterns in violations and their fixes, we propose
an approach that utilizes convolutional neural networks to learn features and
clustering to regroup similar instances. We then evaluate the usefulness of the
identified fix patterns by applying them to unfixed violations.
The results show that developers will accept and merge a majority (69/116) of
fixes generated from the inferred fix patterns. It is also noteworthy that the
yielded patterns are applicable to four real bugs in the Defects4J benchmark,
which is widely used in software testing and automated program repair research.
Comment: Accepted for IEEE Transactions on Software Engineering
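The feature-and-cluster pipeline above can be sketched in a few lines of stdlib Python. Everything here is a simplification: a bag-of-tokens embedding stands in for the paper's learned CNN features, and a greedy cosine-similarity grouping stands in for its clustering step; the names (`cluster_fixes`, the 0.6 threshold) are illustrative assumptions, not the authors' method.

```python
from collections import Counter

def embed(diff_tokens, vocab):
    # Toy bag-of-tokens vector standing in for learned CNN features.
    counts = Counter(diff_tokens)
    return [counts[t] for t in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def cluster_fixes(fixes, vocab, threshold=0.6):
    # Greedy grouping: assign each fix to the first cluster whose
    # representative vector is similar enough, else start a new cluster.
    clusters = []
    for diff in fixes:
        vec = embed(diff, vocab)
        for rep_vec, members in clusters:
            if cosine(vec, rep_vec) >= threshold:
                members.append(diff)
                break
        else:
            clusters.append((vec, [diff]))
    return [members for _, members in clusters]
```

A frequent cluster of similar fixes then suggests a fix pattern that can be tried on unfixed violations of the same category.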
On the use of Machine Learning and Deep Learning for Text Similarity and Categorization and its Application to Troubleshooting Automation
Troubleshooting is a labor-intensive task that includes repetitive solutions to similar problems. This task can be partially or fully automated using text-similarity matching to find previous solutions, lowering the workload of technicians. We conduct a systematic literature review to identify the best approaches for automating troubleshooting and classifying incidents effectively. We identify promising approaches and point in the direction of a comprehensive set of solutions that could be employed in solving the troubleshooting automation problem.
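A minimal sketch of the text-similarity retrieval the abstract describes, assuming a classic TF-IDF plus cosine-similarity setup (one common baseline in this literature, not the specific method of any surveyed paper): given a new incident description, return the index of the most similar past ticket so its solution can be reused.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # docs: list of token lists. Returns one {term: tf-idf weight} dict per doc,
    # using smoothed idf = log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in tf})
    return vecs

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(query_tokens, past_tickets):
    # Vectorize past tickets and the query together, then rank by cosine.
    vecs = tfidf_vectors(past_tickets + [query_tokens])
    qvec = vecs[-1]
    scores = [cosine(qvec, v) for v in vecs[:-1]]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, a query like "vpn login fails" would be routed to the historical ticket mentioning "vpn" and "login" rather than to printer or disk incidents.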
Comparative study on Judgment Text Classification for Transformer Based Models
This work uses several NLP models to predict the winner of a particular
judgment by means of text extraction and summarization from a judgment
document. Such documents are valuable in legal proceedings: they can be cited
as precedent in lawsuits and cases, strengthening the argument of the party
that invokes them. Establishing precedence requires referring to an ample
number of documents in order to collect legal points relevant to the case.
However, reviewing these documents takes a long time due to their complex
wording and length. This work presents a comparative study of 6 different
self-attention-based transformer models and how they perform when tweaked
with 4 different activation functions. The models are trained on 200
judgment contexts, and their results are evaluated against several benchmark
parameters. The best models reach a confidence level of up to 99% when
predicting the judgment. This can be used to retrieve a relevant judgment
document without spending excessive time searching for and reading related
cases in full.
Comment: 28 pages with 9 figures
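How "tweaking the activation function" changes a classifier head can be shown with a small stdlib sketch. This is a hypothetical stand-in: a two-layer scorer replaces the transformer, and the four activations (ReLU, tanh, sigmoid, GELU) are common choices, not necessarily the four the paper compares.

```python
import math

ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    # Exact GELU via the Gaussian error function.
    "gelu": lambda x: 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))),
}

def mlp_score(features, w_hidden, w_out, activation="relu"):
    # Tiny classifier head: hidden layer with a swappable activation,
    # then a sigmoid output read as P(first party wins the judgment).
    act = ACTIVATIONS[activation]
    hidden = [act(sum(w * x for w, x in zip(row, features))) for row in w_hidden]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))

def compare_activations(features, w_hidden, w_out):
    # Score the same input under each activation, mirroring the kind of
    # side-by-side comparison the study performs at much larger scale.
    return {name: mlp_score(features, w_hidden, w_out, name) for name in ACTIVATIONS}
```

In the real study the comparison would be over trained models and held-out judgments; here the same fixed weights are scored under each activation just to show the mechanism.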
Fairness Testing: A Comprehensive Survey and Analysis of Trends
Unfair behaviors of Machine Learning (ML) software have garnered increasing
attention and concern among software engineers. To tackle this issue, extensive
research has been dedicated to conducting fairness testing of ML software, and
this paper offers a comprehensive survey of existing studies in this field. We
collect 100 papers and organize them based on the testing workflow (i.e., how
to test) and testing components (i.e., what to test). Furthermore, we analyze
the research focus, trends, and promising directions in the realm of fairness
testing. We also identify widely-adopted datasets and open-source tools for
fairness testing.
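As a concrete example of "what to test", one widely used group-fairness check is statistical parity: compare positive-prediction rates across demographic groups and flag the model if the gap exceeds a tolerance. The function names and the 0.1 threshold below are illustrative assumptions, not drawn from the survey.

```python
def statistical_parity_difference(preds, groups, positive=1,
                                  group_a="A", group_b="B"):
    # P(pred = positive | group A) - P(pred = positive | group B)
    def rate(g):
        member_preds = [p for p, grp in zip(preds, groups) if grp == g]
        return sum(1 for p in member_preds if p == positive) / len(member_preds)
    return rate(group_a) - rate(group_b)

def fairness_test(preds, groups, threshold=0.1):
    # Pass the test only if the parity gap stays within the tolerance.
    return abs(statistical_parity_difference(preds, groups)) <= threshold
```

A full fairness-testing workflow would pair such a metric with test-input generation, but the oracle itself can be this simple.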