20 research outputs found
Algorithmic bias amplification via temporal effects: The case of PageRank in evolving networks
Biases impair the effectiveness of algorithms. For example, the age bias of the widely-used PageRank algorithm impairs its ability to effectively rank nodes in growing networks. PageRank’s temporal bias cannot be fully explained by existing analytic results that predict a linear relation between the expected PageRank score and the indegree of a given node. We show that in evolving networks, under a mean-field approximation, the expected PageRank score of a node can be expressed as the product of the node’s indegree and a previously-neglected age factor which can “amplify” the indegree’s age bias. We use two well-known empirical networks to show that our analytic results explain PageRank’s observed age bias and, when there is an age bias amplification, they enable estimates of the node PageRank score that are more accurate than estimates based solely on local structural information. Accuracy gains are larger in degree-degree correlated networks, as revealed by a growing directed network model with tunable assortativity. Our approach can be used to analytically study other kinds of ranking bias.
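The age effect described in this abstract can be illustrated with a minimal sketch, assuming a standard power-iteration PageRank with damping 0.85 and a hypothetical five-node growing network in which each new node links only to older nodes. The function name `pagerank` and the toy graph are illustrative assumptions; this is not the paper's mean-field derivation or its network model.

```python
def pagerank(out_links, damping=0.85, iters=100):
    """Power-iteration PageRank on a dict {node: [targets]}."""
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Score held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(score[v] for v in nodes if not out_links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v, [])
            for t in targets:
                new[t] += damping * score[v] / len(targets)
        score = new
    return score


# Hypothetical toy network: node k arrives at time k and links to older nodes only.
out_links = {0: [], 1: [0], 2: [0, 1], 3: [1, 2], 4: [2, 3]}
scores = pagerank(out_links)

# Indegree of each node, counted from the same edge list.
indegree = {v: 0 for v in scores}
for targets in out_links.values():
    for t in targets:
        indegree[t] += 1

for v in sorted(scores):
    print(v, indegree[v], round(scores[v], 4))
```

In this toy graph, nodes 0, 1 and 2 all have indegree 2, yet the older a node is, the higher its score, which is the kind of gap that indegree alone does not account for.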
Illegal Community Detection in Bitcoin Transaction Networks
Community detection is widely used in social networks to uncover groups of related vertices (nodes). In cryptocurrency transaction networks, community detection can help identify users that are most related to known illegal users. However, there are challenges in applying community detection in cryptocurrency transaction networks: (1) the use of pseudonymous addresses that are not directly linked to personal information makes it difficult to interpret the detected communities; (2) on Bitcoin, a user usually owns multiple Bitcoin addresses, and nodes in transaction networks do not always represent users. Existing works on cluster analysis of Bitcoin transaction networks focus on addressing the latter, using different heuristics to cluster addresses that are controlled by the same user. This research focuses on detecting illegal communities, that is, communities containing one or more illegal Bitcoin addresses. We first investigate the structure of Bitcoin transaction networks and suitable community detection methods, then collect a set of illegal addresses and use them to label the detected communities. The results show that 0.06% of communities from daily transaction networks contain one or more illegal addresses when 2,313,344 illegal addresses are used to label the communities. The results also show that distance-based clustering methods and other methods depending on them, such as network representation learning, are not suitable for Bitcoin transaction networks, while community quality optimization and label-propagation-based methods are the most suitable.
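The labeling step mentioned in this abstract (marking a detected community as illegal when it contains at least one known illegal address) might look like the sketch below. The function name `illegal_community_share`, the address strings, and the community assignment are all hypothetical; the community detection itself is assumed to have been run already by some other method.

```python
from collections import defaultdict


def illegal_community_share(membership, illegal_addresses):
    """Return the flagged community ids and their share of all communities.

    membership: dict mapping address -> community id (output of any
    community detection method, assumed to be given).
    illegal_addresses: set of addresses known to be illegal.
    """
    communities = defaultdict(set)
    for address, community in membership.items():
        communities[community].add(address)
    if not communities:
        return set(), 0.0
    # A community is flagged when it shares at least one address with the set.
    flagged = {c for c, members in communities.items() if members & illegal_addresses}
    return flagged, len(flagged) / len(communities)


# Hypothetical example: three communities, one known illegal address present.
membership = {"addr_a": 0, "addr_b": 0, "addr_c": 1, "addr_d": 2, "addr_e": 2}
illegal = {"addr_b", "addr_x"}  # "addr_x" does not occur in this network
flagged, share = illegal_community_share(membership, illegal)
print(flagged, share)
```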
Semi-Supervised Semantic Segmentation of Remote Sensing Images Based on Dual Cross-Entropy Consistency
Semantic segmentation is a growing topic in high-resolution remote sensing image processing. The information in remote sensing images is complex, and the effectiveness of most remote sensing image semantic segmentation methods depends on the number of labels; however, labeling images requires significant time and labor costs. To solve these problems, we propose a semi-supervised semantic segmentation method based on dual cross-entropy consistency and a teacher–student structure. First, we add a channel attention mechanism to the encoding network of the teacher model to reduce the predictive entropy of the pseudo label. Second, the two student networks share a common encoding network to ensure consistent input information entropy, and a sharpening function is used to reduce the information entropy of unsupervised predictions for both student networks. Finally, we train the models alternately via two entropy-consistent tasks: (1) semi-supervising student prediction results via pseudo-labels generated from the teacher model, and (2) cross-supervision between student models. Experimental results on publicly available datasets indicate that the proposed model can exploit the hidden information in unlabeled images and reduce the information entropy in prediction, as well as reduce the number of required labeled images with guaranteed accuracy. This allows the new method to outperform the related semi-supervised semantic segmentation algorithm at half the proportion of labeled images.
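The abstract does not say which sharpening function is used. As a sketch only, the example below assumes the temperature sharpening common in semi-supervised learning (raise each class probability to the power 1/T and renormalize) and shows that it lowers the entropy of a prediction. The names `sharpen` and `entropy` and the temperature value 0.5 are assumptions, not details from the paper.

```python
import math


def sharpen(probs, temperature=0.5):
    """Temperature sharpening: p_i ** (1 / T), renormalized to sum to 1.

    With T < 1 the distribution becomes more peaked (lower entropy).
    """
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical per-pixel class probabilities from a student network.
p = [0.6, 0.3, 0.1]
q = sharpen(p)
print(entropy(p), entropy(q))
```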
A Mobile Malware Detection Method Based on Malicious Subgraphs Mining
As mobile phones are widely used for social network communication, they attract numerous malicious attacks, which seriously threaten users’ personal privacy and data security. To improve resilience to attack technologies, structural information analysis has been widely applied in mobile malware detection. However, the rapid improvement of mobile applications has brought an impressive growth of their internal structure in scale and attack technologies. This makes the timely analysis of structural information and malicious feature generation a heavy burden. In this paper, we propose a new Android malware identification approach based on malicious subgraph mining to improve the detection performance of large-scale graph structure analysis. First, function call graphs (FCGs), sensitive permissions, and application programming interfaces (APIs) are generated from the decompiled files of malware. Second, two kinds of malicious subgraphs are generated from malware’s decompiled files and put into the feature set. Finally, the safety of test applications can be automatically determined, and malicious applications classified into malware families, by matching their FCGs with malicious structural features. To evaluate our approach, a dataset of 11,520 malware and benign applications is established. Experimental results indicate that our approach has better performance than three previous works and Androguard.
CM-Unet: A Novel Remote Sensing Image Segmentation Method Based on Improved U-Net
Semantic segmentation is an active research area for high-resolution (HR) remote sensing image processing. Most existing algorithms segment distinct features well. However, for complex scenes, many algorithms have insufficient segmentation accuracy. In this study, we propose a new method, CM-Unet, based on the U-Net framework to address the problems of holes, omissions, and fuzzy edge segmentation. First, we add the channel attention mechanism in the encoding network and the residual module to transmit information. Second, a multi-feature fusion mechanism is proposed in the decoding network, and an improved sub-pixel convolution method replaces the traditional upsampling operation. We conducted simulation experiments on the Potsdam, Vaihingen and GID datasets. The experimental results show that the segmentation time required by the proposed CM-Unet is approximately 62 ms/piece, the MIoU is 90.4% and the floating point operation count (FLOPs) is 20.95 MFLOPs. Compared with U-Net, CM-Unet only slightly increased the total number of parameters and floating point operations, but achieved the best segmentation effect compared with the other models. CM-Unet can segment remote sensing images efficiently and accurately owing to its lower time consumption and space requirements; the precision of the segmentation results is better than that of other methods.
Research of Software Defect Prediction Model Based on Complex Network and Graph Neural Network
The goal of software defect prediction is to make predictions by mining the historical data using models. Current software defect prediction models mainly focus on the code features of software modules. However, they ignore the connection between software modules. This paper proposes a software defect prediction framework based on a graph neural network from a complex network perspective. First, we consider the software as a graph, where nodes represent the classes, and edges represent the dependencies between the classes. Second, we divide the graph into multiple subgraphs using the community detection algorithm. Third, the representation vectors of the nodes are learned through the improved graph neural network model. Lastly, we use the representation vectors of the nodes to classify the software defects. The proposed model is tested on the PROMISE dataset, using two graph convolution methods, based on the spectral domain and spatial domain in the graph neural network. The investigation indicated that both convolution methods showed an improvement in various metrics, such as accuracy, F-measure, and MCC (Matthews correlation coefficient) by 86.6%, 85.8%, and 73.5%, and 87.5%, 85.9%, and 75.5%, respectively. The average improvement of various metrics was noted as 9.0%, 10.5%, and 17.5%, and 6.3%, 7.0%, and 12.1%, respectively, compared with the benchmark models.
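For readers unfamiliar with the MCC metric named in this abstract, a minimal sketch of the standard Matthews correlation coefficient for binary classification follows. The function name `mcc` and the confusion-matrix counts are hypothetical and are not taken from the paper's experiments.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    returning 0.0 by convention when the denominator is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a defect classifier (defective = positive class).
value = mcc(tp=40, fp=10, fn=5, tn=45)
print(round(value, 4))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported for imbalanced data such as defect datasets.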
Depositional system constrained by the high-precision sequence framework and the source to sink systems: A case study from the First Member of the Liushagang Formation in the Weixinan Sag
Objective: The exploration direction of the Weixinan Sag in the Beibuwan Basin has shifted from structural traps to lithological traps, and the key problem in searching for lithological traps is to clarify the distribution of sandstones. Methods: In this study, the high-frequency sequence division and depositional system of the First Member of the Liushagang Formation in the Weixinan Sag were analyzed using zircon dating, logging, core and seismic data. The types and distributions of depositional facies in the First Member of the Liushagang Formation in the Weixinan Sag were then clarified. Results: The First Member of the Liushagang Formation was deposited as a third-order sequence, which can be divided into three system tracts and eight parasequence sets. Based on the analysis of the source to sink systems and sedimentary facies, the sediments in the Weixinan Sag were mainly sourced from the Wanshan provenance in the northwest, the Qixi provenance in the east, the Wexinan provenance in the southeast, and the Xinan provenance in the southwest. The First Member of the Liushagang Formation in the Weixinan Sag mainly contains three sedimentary facies types: delta, lacustrine and sublacustrine fan. The lowstand system tract is dominated by mid-deep lake and sublacustrine fan deposits, including turbidity channels, natural levees, and sheet lobes. The expanding system tract mainly contains the sedimentary microfacies of deep lacustrine mud. The highstand system tract consists of front-delta deposition, among which the subaqueous distributary channel and subaqueous distributary interchannel are widely developed, and sedimentary microfacies such as mouth bar and sheet sand are less developed.
Conclusion: Three types of sublacustrine fan are mainly developed in the B subsag: the western delta progradation slump type, the southern near source fault slope belt type, and the eastern far source gentle slope type. Among them, the southern provenance system, with its large scale and good reservoir-forming conditions, is the most promising target for further exploration.