25 research outputs found

    Uncertainty-wise software anti-patterns detection: A possibilistic evolutionary machine learning approach

    Get PDF
    Context: Code smells (a.k.a. anti-patterns) are manifestations of poor design solutions that can deteriorate software maintainability and evolution. Research gap: Existing works did not take into account the issue of uncertain class labels, which is an important inherent characteristic of the smells detection problem. More precisely, two human experts may have different degrees of uncertainty about the smelliness of a particular software class not only for the smell detection task but also for the smell type identification one. Unluckily, existing approaches usually reject and/or ignore uncertain data that correspond to software classes (i.e. dataset instances) with uncertain labels. Throwing away and/or disregarding the uncertainty factor could considerably degrade the detection/identification process effectiveness. From a solution approach viewpoint, there is no work in the literature that proposed a method that is able to detect and/or identify code smells while preserving the uncertainty aspect. Objective: The main goal of our research work is to handle the uncertainty factor, issued from human experts, in detecting and/or identifying code smells by proposing an evolutionary approach that is able to deal with anti-patterns classification with uncertain labels. Method: We suggest Bi-ADIPOK, as an effective search-based tool that is capable to tackle the previously mentioned challenge for both detection and identification cases. The proposed method corresponds to an EA (Evolutionary Algorithm) that optimizes a set of detectors encoded as PK-NNs (Possibilistic K-nearest neighbors) based on a bi-level hierarchy, in which the upper level role consists on finding the optimal PK-NNs parameters, while the lower level one is to generate the PK-NNs. A newly fitness function has been proposed fitness function PomAURPC-OVA_dist (Possibilistic modified Area Under Recall Precision Curve One-Versus-All_distance, abbreviated PAURPC_d in this paper). Bi-ADIPOK is able to deal with label uncertainty using some concepts stemming from the Possibility Theory. Furthermore, the PomAURPC-OVA_dist is capable to process the uncertainty issue even with imbalanced data. We notice that Bi-ADIPOK is first built and then validated using a possibilistic base of smell examples that simulates and mimics the subjectivity of software engineers opinions. Results: The statistical analysis of the obtained results on a set of comparative experiments with respect to four relevant state-of-the-art methods shows the merits of our proposal. The obtained detection results demonstrate that, for the uncertain environment, the PomAURPC-OVA_dist of Bi-ADIPOK ranges between 0.902 and 0.932 and its IAC lies between 0.9108 and 0.9407, while for the certain environment, the PomAURPC-OVA_dist lies between 0.928 and 0.955 and the IAC ranges between 0.9477 and 0.9622. Similarly, the identification results, for the uncertain environment, indicate that the PomAURPC-OVA_dist of Bi-ADIPOK varies between 0.8576 and 0.9273 and its IAC is between 0.8693 and 0.9318. For the certain environment, the PomAURPC-OVA_dist lies between 0.8613 and 0.9351 and the IAC values are between 0.8672 and 0.9476. With uncertain data, Bi-ADIPOK can find 35% more code smells than the second best approach (i.e., BLOP). Furthermore, Bi-ADIPOK has succeeded to reduce the number of false alarms (i.e., misclassified smelly instances) by 12%. In addition, our proposed approach can identify 43% more smell types than BLOP and reduces the number of false alarms by 32%. The same results have been obtained for the certain environment, demonstrating Bi-ADIPOK's ability to deal with such environment

    An indicator-based chemical reaction optimization algorithm for multi-objective search

    No full text

    Android malware detection as a Bi-level problem

    No full text
    International audienceMalware detection is still a very challenging topic in the cybersecurity field. This is mainly due to the use of obfuscation techniques. To solve this issue, researchers proposed to extract frequent API (Application Programming Interface) call sequences and then use them as behavior indicators. Several methods aiming at generating malware detection rules have been proposed with the goal to come up with a set of rules that is able to accurately detect malicious code patterns. However, the rules generation process heavily depends on the training database content which will affect the detection rate of the model when confronted to new variants of malicious patterns. In order to assess a rule's detection accuracy, we need to execute the rule on the whole malware database which makes the detection rule quality evaluation very sensitive to the database content. To solve this issue, we suggest in this paper to consider the detection rules generation process as a BLOP (Bi-Level Optimization Problem), where a lower-level optimization task is embedded within the upper-level one. The goal of the upper-level is to generate a set of detection rules in the form of: trees of combined patterns. Those rules are able to detect not only the real patterns from the base of examples but also the artificial patterns generated by the lower-level. The lower-level aims to generate a set of artificial malicious patterns that escape the rules of the upper-level. An efficient co-evolutionary algorithm is adopted as a search engine to ensure optimization at both levels. Such an automated competition between the two levels makes our new method BMD (Bi-level Malware Detection) able to produce effective detection rules that are capable of detecting new predictable malicious behaviors in addition to existing ones. Based on the statistical analysis of the experimental results, our BMD method has shown its merits when compared to several relevant state-of-the-art malware detection techniques on different Android malware datasets

    Code smell detection and identification in imbalanced environments: .

    No full text
    International audienceContext: Code smells are sub-optimal design choices that could lower software maintainability.Objective: Previous literature did not consider an important characteristic of the smell detection problem,namely data imbalance. When considering a high number of code smell types, the number of smelly classesis likely to largely exceed the number of non-smelly ones, and vice versa. Moreover, most studies did address the smell identification problem, which is more likely to present a higher imbalance as the number of smellyclasses is relatively much less than the number of non-smelly ones. Furthermore, an additional research gap in the literature consists in the fact that the number of smell type identification methods is very small comparedto the detection ones

    Code smell detection and identification in imbalanced environments: .

    No full text
    International audienceContext: Code smells are sub-optimal design choices that could lower software maintainability.Objective: Previous literature did not consider an important characteristic of the smell detection problem,namely data imbalance. When considering a high number of code smell types, the number of smelly classesis likely to largely exceed the number of non-smelly ones, and vice versa. Moreover, most studies did address the smell identification problem, which is more likely to present a higher imbalance as the number of smellyclasses is relatively much less than the number of non-smelly ones. Furthermore, an additional research gap in the literature consists in the fact that the number of smell type identification methods is very small comparedto the detection ones

    Automatic Mining of Numerical Classification Rules with Parliamentary Optimization Algorithm

    No full text
    In recent years, classification rules mining has been one of the most important data mining tasks. In this study, one of the newest social-based metaheuristic methods, Parliamentary Optimization Algorithm (POA), is firstly used for automatically mining of comprehensible and accurate classification rules within datasets which have numerical attributes. Four different numerical datasets have been selected from UCI data warehouse and classification rules of high quality have been obtained. Furthermore, the results obtained from designed POA have been compared with the results obtained from four different popular classification rules mining algorithms used in WEKA. Although POA is very new and no applications in complex data mining problems have been performed, the results seem promising. The used objective function is very flexible and many different objectives can easily be added to. The intervals of the numerical attributes in the rules have been automatically found without any a priori process, as done in other classification rules mining algorithms, which causes the modification of datasets
    corecore