
    Towards Automated Classification of Code Review Feedback to Support Analytics

    Background: As improving code review (CR) effectiveness is a priority for many software development organizations, projects have deployed CR analytics platforms to identify potential improvement areas. The number of issues identified, a crucial metric of CR effectiveness, can be misleading if all issues are placed in the same bin. A finer-grained classification of issues identified during CRs can therefore provide actionable insights to improve CR effectiveness. Although recent work by Fregnan et al. proposed automated models to classify CR-induced changes, we have noticed two potential improvement areas: i) classifying comments that do not induce changes and ii) using deep neural networks (DNN) in conjunction with code context to improve performance. Aims: This study aims to develop an automated CR comment classifier that leverages DNN models to achieve more reliable performance than Fregnan et al. Method: Using a manually labeled dataset of 1,828 CR comments, we trained and evaluated supervised learning-based DNN models leveraging code context, comment text, and a set of code metrics to classify CR comments into one of the five high-level categories proposed by Turzo and Bosu. Results: Based on our 10-fold cross-validation-based evaluations of multiple combinations of tokenization approaches, we found that a model using CodeBERT achieved the best accuracy, 59.3%. Our approach outperforms Fregnan et al.'s by achieving 18.7% higher accuracy. Conclusion: Besides facilitating improved CR analytics, our proposed model can be useful for developers in prioritizing code review feedback and selecting reviewers.
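    A minimal sketch, assuming the HuggingFace transformers library, of the kind of model the abstract describes: CodeBERT with a five-way classification head over review comments paired with their code context. The label set, example inputs, and variable names are ours, not the study's.

        import torch
        from transformers import AutoTokenizer, AutoModelForSequenceClassification

        # CodeBERT backbone with a fresh 5-way classification head
        # (one output per high-level CR comment category).
        tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
        model = AutoModelForSequenceClassification.from_pretrained(
            "microsoft/codebert-base", num_labels=5)

        # Pair the comment text with its code context, since the study
        # leverages both; the two segments here are illustrative.
        comment = "Consider extracting this block into a helper method."
        code_context = "public void process() { /* ... */ }"
        inputs = tokenizer(comment, code_context, truncation=True,
                           return_tensors="pt")

        model.eval()
        with torch.no_grad():
            logits = model(**inputs).logits
        predicted_category = logits.argmax(dim=-1).item()  # index of the predicted class

    In the actual study the head would be fine-tuned on the 1,828 labeled comments before inference; the snippet only shows the model wiring.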

    Enriching the exploration of the mUED model with event shape variables at the CERN LHC

    We propose a new search strategy based on event shape variables for new physics models in which the separations among the masses of the particles in the spectrum are small. Collider signatures of these models, characterised by low-$p_T$ leptons/jets and low missing $p_T$, are known to be difficult to search for, and conventional search strategies involving hard cuts may not work in such situations. As a case study, we have investigated the hitherto neglected jets + missing $E_T$ signature, known to be a challenging one, arising from the pair production and decay of $n = 1$ KK-excitations of gluons and quarks in the minimal Universal Extra Dimension (mUED) model. Judicious use of the event shape variables enables us to reduce the Standard Model backgrounds to a negligible level. We have shown that in mUED, $R^{-1}$ up to $850~\mathrm{GeV}$ can be explored or ruled out with $12~\mathrm{fb}^{-1}$ of integrated luminosity at the 7 TeV run of the LHC. We also discuss the prospects of employing these variables to search for other beyond-Standard-Model physics with compressed or partially compressed spectra. Comment: 16 pages, 3 figures
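    The abstract does not define the event shape variables it uses, so the following choice is an assumption on our part: one standard example at hadron colliders is the transverse thrust, built from the transverse momenta of the reconstructed objects.

        % Illustrative event shape variable (the paper may use a different set):
        % transverse thrust, maximized over unit vectors \hat{n}_T in the
        % transverse plane. T_perp -> 1 for pencil-like (back-to-back dijet)
        % events, T_perp -> 2/pi for events isotropic in the transverse plane.
        \[
          T_\perp \;=\; \max_{\hat{n}_T}\,
          \frac{\sum_i \lvert \vec{p}_{T,i} \cdot \hat{n}_T \rvert}
               {\sum_i \lvert \vec{p}_{T,i} \rvert}
        \]

    Variables of this kind distinguish relatively isotropic multi-object events from pencil-like QCD dijet backgrounds even when every object is soft, which is the handle the abstract exploits in place of hard cuts.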

    Efficient Substring Discovery Using Suffix, LCP Array and Algorithm-Architecture Interaction

    Preprocessing of the database is essential to extract information from large databases such as biological sequences of genes or proteins. Pattern discovery becomes very time-efficient when we preprocess the database into a suffix array. Owing to the inherent organization of data in the suffix array and its secondary data structure, the longest common prefix (LCP) array (Manber and Myers 1990), only a limited portion of the database is accessed during a search, which yields a large amount of information in very little time relative to the size of the database. Unlike exact pattern matching, here we preprocess the database instead of the pattern. We found the suffix and LCP arrays to be a perfect tool for computing N-grams (substrings) along various dimensions. Over the past couple of decades there has been significant research on the construction of suffix and LCP arrays; by comparison, research on properly utilizing these data structures to retrieve substring information from various perspectives has remained largely unexplored. Our main focus in this work was to develop a number of algorithms for computing the present and missing N-grams in a text in linear time and presenting them non-redundantly for large databases. Finding the present and missing N-grams, together with a time-efficient non-redundant representation of them in large genome sequences, may lead to new discoveries in biology. We have implemented and applied all our algorithms to various genome and proteome sequences and found interesting results. They were also tested for performance and other hardware parameter measurements on various platforms in order to suggest an appropriate architecture for this kind of application.
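    A minimal sketch, not the paper's implementation, of the core idea: in suffix array order, the N-gram starting at suffix sa[i] has already appeared at an earlier suffix exactly when lcp[i] >= n, so each distinct N-gram is reported once. The naive construction below is for illustration only; linear-time suffix array and LCP constructions exist.

        from itertools import product

        def build_sa_lcp(text: str):
            # Naive O(n^2 log n) construction, for illustration only.
            sa = sorted(range(len(text)), key=lambda i: text[i:])
            lcp = [0] * len(sa)
            for i in range(1, len(sa)):
                a, b = text[sa[i - 1]:], text[sa[i]:]
                while lcp[i] < min(len(a), len(b)) and a[lcp[i]] == b[lcp[i]]:
                    lcp[i] += 1
            return sa, lcp

        def present_ngrams(text: str, n: int) -> set:
            # A suffix contributes a new N-gram iff it is at least n long and
            # shares fewer than n leading characters with its lexicographic
            # predecessor; one pass over (sa, lcp) is therefore enough.
            sa, lcp = build_sa_lcp(text)
            return {text[s:s + n] for i, s in enumerate(sa)
                    if len(text) - s >= n and lcp[i] < n}

        # Missing N-grams follow by complementing against the alphabet.
        present = present_ngrams("ACGTACGGA", 2)
        missing = {"".join(p) for p in product("ACGT", repeat=2)} - present

    Replacing build_sa_lcp with a linear-time construction makes the whole enumeration linear in the text length, matching the complexity the abstract claims.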

    A FAST Pattern Matching Algorithm

    The advent of digital computers has made the routine use of pattern matching possible in various applications. This has also stimulated the development of many algorithms. In this paper, we propose a new algorithm that offers improved performance compared to those reported in the literature so far. The new algorithm was evolved after analyzing the well-known algorithms such as Boyer-Moore, Quick-search, Raita, and Horspool. The overall performance of the proposed algorithm has been improved by using the shift provided by the Quick-search bad character and by defining a fixed order of comparison, which together reduce the character-comparison effort at each attempt. The best- and worst-case time complexities are also presented in this paper. Most importantly, the proposed method has been compared with the other widely used algorithms. It is interesting to note that the new algorithm works consistently better for any alphabet size.
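    For context, a minimal sketch of the Quick-search bad-character shift the abstract builds on (our illustration, not the proposed algorithm itself): on a mismatch the window is shifted according to the text character immediately to the right of the current window, so a character absent from the pattern gives the maximal shift of m + 1.

        def quick_search(text: str, pattern: str) -> int:
            # Return the index of the first occurrence of pattern, or -1.
            m, n = len(pattern), len(text)
            if m == 0 or m > n:
                return 0 if m == 0 else -1
            # Rightmost position of each character within the pattern.
            rightmost = {c: i for i, c in enumerate(pattern)}
            i = 0
            while i <= n - m:
                if text[i:i + m] == pattern:  # the comparison order is free
                    return i
                if i + m == n:                # no character past the window
                    break
                i += m - rightmost.get(text[i + m], -1)
            return -1

    Per the abstract, the proposed method combines this shift with a fixed order of comparing the window characters, which is where its reduction in per-attempt comparison effort comes from.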

    The Nation State and the Process of Globalization

    Section I. International Relations, Foreign Policy, and Diplomacy of the Republic of Belarus