Issues with SZZ: An empirical assessment of the state of practice of defect prediction data collection
Defect prediction research relies heavily on published data sets that are
shared between researchers. The SZZ algorithm is the de facto standard for
collecting defect labels for this kind of data and is used by most public data
sets. Thus, problems with the SZZ algorithm may indirectly affect almost the
entire state of the art of defect prediction. Recent research has uncovered
potential problems in different parts of the SZZ algorithm. In this article,
we provide an extensive empirical analysis of the defect labels created with
the SZZ algorithm. Using a combination of manual validation and adopted or
improved heuristics for collecting defect data, we established ground truth
for bug-fixing commits and improved the heuristics for identifying
bug-inducing changes and for assigning bugs to releases. We conducted an
empirical study on 398 releases of 38 Apache projects and found that only
half of the bug-fixing commits determined by SZZ actually fix bugs. Moreover,
if a six-month time frame is used in combination with SZZ to determine which
bugs affect a release, one file is incorrectly labeled as defective for every
file that is correctly labeled as defective. In addition, two defective files
are missed for every correctly labeled file. We also explored the impact of
the relatively small set of features available in most defect prediction data
sets, since multiple publications indicate that features such as churn-related
metrics are important for defect prediction. We found that using more
features makes a negligible difference.
Comment: Submitted and under review. The first three authors contributed equally.
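The release-labeling setup evaluated above can be illustrated with a minimal sketch. It is not the authors' pipeline, and every name in it (ISSUE_KEY, Commit, is_bug_fix, defective_files) is hypothetical. It assumes that a commit counts as bug fixing when its message mentions an issue key known to be a bug, approximates "six months" as 182 days, and omits the SZZ step that blames the fixed lines to find bug-inducing commits. A file is therefore labeled defective for a release whenever a bug fix touching it lands inside the window after the release date.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical pattern for Jira-style issue keys such as "PROJ-123".
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


@dataclass
class Commit:
    message: str
    day: date
    files: list[str]


def is_bug_fix(commit: Commit, bug_issue_keys: set[str]) -> bool:
    """Assumed heuristic: the message references an issue tracked as a bug."""
    return any(key in bug_issue_keys for key in ISSUE_KEY.findall(commit.message))


def defective_files(release_day: date, commits: list[Commit],
                    bug_issue_keys: set[str], window_days: int = 182) -> set[str]:
    """Label files touched by bug fixes made after the release, within the window."""
    end = release_day + timedelta(days=window_days)
    labeled: set[str] = set()
    for c in commits:
        # Fixes on the release day itself are excluded; the window end is inclusive.
        if release_day < c.day <= end and is_bug_fix(c, bug_issue_keys):
            labeled.update(c.files)
    return labeled


commits = [
    Commit("PROJ-7: fix NPE in parser", date(2020, 2, 1), ["Parser.java"]),
    Commit("PROJ-8: add feature", date(2020, 3, 1), ["Cli.java"]),
    Commit("PROJ-9: fix overflow", date(2020, 12, 1), ["Math.java"]),
]
print(defective_files(date(2020, 1, 1), commits, {"PROJ-7", "PROJ-9"}))
```

Only Parser.java is labeled here: PROJ-8 is not a bug, and the PROJ-9 fix falls outside the window. Because the labels depend entirely on heuristics like these, any error in matching commits to bugs or bugs to releases flows directly into the data set, which is the effect the study quantifies.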
A Novel Self-Supervised Learning-Based Anomaly Node Detection Method Based on an Autoencoder in Wireless Sensor Networks
Because existing wireless sensor network (WSN)-based anomaly detection methods
consider and analyze only temporal features, this paper designs a
self-supervised learning-based anomaly node detection method based on an
autoencoder. The method integrates temporal WSN data flow feature extraction,
spatial position feature extraction, and intermodal WSN correlation feature
extraction into the design of the autoencoder to make full use of the spatial
and temporal information of the WSN for anomaly detection. First, a fully
connected network extracts the temporal features of nodes by considering a
single mode from a local spatial perspective. Second, a graph neural network
(GNN) introduces the WSN topology from a global spatial perspective and
extracts the spatial and temporal features of the data flows of nodes and
their neighbors, again considering a single mode. Then, an adaptive fusion
method based on weighted summation extracts the relevant features across
different modes. In addition, the method uses a gated recurrent unit (GRU) to
capture long-term dependencies in the time dimension. Finally, the
reconstructed output of the decoder and the hidden-layer representation of the
autoencoder are fed into a fully connected network to calculate the anomaly
probability of the current system. Because the spatial feature extraction step
is performed up front, the method can be applied to large-scale network
anomaly detection by adding a clustering operation. Experiments show that the
method outperforms the baselines: its F1 score reaches 90.6%, which is 5.2%
higher than that of existing anomaly detection methods based on unsupervised
reconstruction and prediction. Code and model are available at
https://github.com/GuetYe/anomaly_detection/GLS
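The adaptive fusion step described above can be illustrated with a minimal sketch. The abstract says only "weighted summation", so the softmax normalization over one score per branch is an assumption, and softmax, adaptive_fusion, and the example vectors are hypothetical names. In a trained model the scores would be learned parameters rather than the fixed numbers used here.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: turns raw scores into weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(features: list[list[float]], scores: list[float]) -> list[float]:
    """Weighted sum of equal-length feature vectors, one vector per branch/mode."""
    if len(features) != len(scores):
        raise ValueError("need one score per feature vector")
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


temporal = [1.0, 0.0, 2.0]  # stand-in for the fully connected temporal branch output
spatial = [3.0, 4.0, 0.0]   # stand-in for the GNN spatial branch output
print(adaptive_fusion([temporal, spatial], [0.0, 0.0]))  # equal weights of 0.5 each
```

With equal scores each branch gets weight 0.5, giving [2.0, 2.0, 1.0]. Normalizing the weights to sum to 1 keeps the fused vector on the same scale as the branch outputs, so raising one branch's score shifts emphasis toward that branch without rescaling the result.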
Representation Learning for Program Analysis, Testing, and Repair
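Degree type: Doctoral degree earned through a doctoral program (課程博士). Examination committee: (Chair) Project Associate Professor Yutaka Matsuo (松尾 豊), University of Tokyo; Professor Kiyoshi Izumi (和泉 潔), University of Tokyo; Associate Professor Rikiya Abe (阿部 力也), University of Tokyo; Associate Professor Junichiro Mori (森 純一郎), University of Tokyo; Professor Ichiro Hasuo (蓮尾 一郎), National Institute of Informatics. University of Tokyo (東京大学)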
- …