Towards Automated Classification of Code Review Feedback to Support Analytics
Background: As improving code review (CR) effectiveness is a priority for
many software development organizations, projects have deployed CR analytics
platforms to identify potential improvement areas. The number of issues
identified, which is a crucial metric to measure CR effectiveness, can be
misleading if all issues are placed in the same bin. Therefore, a finer-grained
classification of issues identified during CRs can provide actionable insights
to improve CR effectiveness. Although a recent work by Fregnan et al. proposed
automated models to classify CR-induced changes, we have noticed two potential
improvement areas -- i) classifying comments that do not induce changes and ii)
using deep neural networks (DNNs) in conjunction with code context to improve
performance. Aims: This study aims to develop an automated CR comment
classifier that leverages DNN models to achieve a more reliable performance
than Fregnan et al. Method: Using a manually labeled dataset of 1,828 CR
comments, we trained and evaluated supervised learning-based DNN models
leveraging code context, comment text, and a set of code metrics to classify CR
comments into one of the five high-level categories proposed by Turzo and Bosu.
Results: Based on our 10-fold cross-validation-based evaluations of multiple
combinations of tokenization approaches, we found that a model using CodeBERT
achieved the best accuracy of 59.3%, outperforming Fregnan et al.'s approach by
18.7%. Conclusion: Besides facilitating improved CR analytics, our proposed
model can be useful for developers in prioritizing code review feedback and
selecting reviewers.
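The evaluation setup described above (10-fold cross-validation over labeled CR comments) can be sketched as follows. This is a minimal illustration, not the paper's method: it substitutes a tiny bag-of-words naive Bayes baseline for the DNN/CodeBERT models, and the category labels and comments are hypothetical stand-ins, not the Turzo-and-Bosu taxonomy or the 1,828-comment dataset.

```python
# Sketch of a 10-fold cross-validation harness for CR-comment classification.
# A multinomial naive Bayes baseline stands in for the paper's DNN models.
import math
import random
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train_nb(samples):
    """Multinomial naive Bayes with add-one smoothing."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def predict(model, text):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, -math.inf
    for label in label_counts:
        lp = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

def cross_validate(samples, k=10, seed=0):
    """Shuffle once, split into k folds, and report mean accuracy."""
    data = samples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i in range(k):
        test = folds[i]
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        model = train_nb(train)
        correct += sum(predict(model, t) == y for t, y in test)
    return correct / len(data)

# Toy stand-in for labeled CR comments (labels are hypothetical).
samples = [
    ("please rename this variable", "naming"),
    ("rename the method for clarity", "naming"),
    ("this loop has an off by one bug", "defect"),
    ("possible null pointer bug here", "defect"),
] * 5  # repeat so every fold sees both classes

accuracy = cross_validate(samples, k=10)
print(f"10-fold accuracy: {accuracy:.2f}")
```

In the paper's setting, `train_nb`/`predict` would be replaced by fine-tuning and inference with a CodeBERT-style model over comment text plus code context; the fold logic stays the same.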
Enriching the exploration of the mUED model with event shape variables at the CERN LHC
We propose a new search strategy based on event shape variables for new
physics models where the separations among the masses of the particles in the
spectrum are small. Collider signatures of these models, characterised by
low-p_T leptons/jets and low missing transverse energy, are known to be
difficult to search for, and the conventional search strategies involving hard
cuts may not work in such situations. As a case study, we have investigated
the hitherto neglected jets + missing transverse energy signature - known to
be a challenging one - arising from the pair production and decays of KK
excitations of gluons and quarks in the minimal Universal Extra Dimension
(mUED) model. Judicious use of the event shape variables enables us to reduce
the Standard Model backgrounds to a negligible level. We have shown that a
sizeable region of the mUED parameter space can be explored or ruled out with
12 fb^-1 of integrated luminosity at the 7 TeV run of the LHC. We also discuss
the prospects of employing these variables in searching for other
beyond-Standard-Model physics with compressed or partially compressed spectra.
Comment: 16 pages, 3 figures
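The abstract does not name the specific event shape variables it uses; a representative example (an assumption here, not necessarily the paper's choice) is the transverse thrust, which distinguishes pencil-like QCD dijet events from more isotropic signal events:

```latex
T_\perp \;=\; \max_{\hat n_T}\,
  \frac{\sum_i \lvert \vec p_{T,i} \cdot \hat n_T \rvert}
       {\sum_i \lvert \vec p_{T,i} \rvert},
```

where the sum runs over the transverse momenta \(\vec p_{T,i}\) of the final-state objects and \(\hat n_T\) is a unit vector in the transverse plane. Back-to-back QCD events give \(T_\perp \to 1\), while more spherical events give lower values, so a cut on \(1 - T_\perp\) can suppress backgrounds without the hard p_T cuts that compressed spectra cannot survive.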
Efficient Substring Discovery Using Suffix, LCP Array and Algorithm-Architecture Interaction
Preprocessing a database is essential for extracting information from large databases such as biological sequences of genes or proteins. Pattern discovery becomes very time-efficient when we preprocess the database into a suffix array. Owing to the inherent organization of data in the suffix array and its secondary data structure, the longest common prefix (LCP) array (Manber and Myers 1990), only a limited portion of the database is accessed during a search, which yields a great deal of information in very little time, depending on the size of the database. Unlike exact pattern matching, here we preprocess the database instead of the pattern. We found the suffix and LCP arrays to be ideal tools for computing N-grams (substrings) along various dimensions. Over the past couple of decades there has been significant research on the construction of suffix and LCP arrays. In comparison, research on properly utilizing these promising data structures to retrieve substring information from various perspectives has remained largely neglected. Our main focus in this work was to develop a number of algorithms for computing the present and missing N-grams in a text in linear time and presenting them non-redundantly for large databases. Finding the present and missing N-grams, and representing them non-redundantly and time-efficiently, in large genome sequences can lead to new discoveries in biology. We have implemented and applied all our algorithms to various genome and proteome sequences and found interesting results. The algorithms were also tested for performance and other hardware parameter measurements on various platforms in order to suggest an appropriate architecture for this kind of application.
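The suffix-and-LCP machinery the abstract relies on can be sketched as below. This is an illustrative implementation, not the paper's exact algorithms: it builds a suffix array by rank-pair sorting, computes the LCP array with Kasai's algorithm, and then counts the distinct substrings ("present N-grams") of a text using the classic identity n(n+1)/2 minus the sum of LCP values.

```python
# Suffix array + LCP array (Kasai) + distinct-substring count.
def suffix_array(s):
    """Simple O(n log^2 n) construction by sorting (rank, next-rank) pairs."""
    n = len(s)
    rank = [ord(c) for c in s]
    sa = list(range(n))
    k = 1
    while True:
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:   # all ranks distinct: done
            return sa
        k *= 2

def lcp_array(s, sa):
    """Kasai's algorithm: lcp[j] = common prefix of suffixes sa[j-1], sa[j]."""
    n = len(s)
    rank = [0] * n
    for j, i in enumerate(sa):
        rank[i] = j
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1
    return lcp

def count_distinct_substrings(s):
    # Each suffix contributes its length minus the prefix it shares with the
    # previous suffix in sorted order; the LCP array holds exactly that overlap.
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    n = len(s)
    return n * (n + 1) // 2 - sum(lcp)

print(count_distinct_substrings("banana"))  # 15 distinct substrings
```

The same arrays support the other queries the abstract mentions: enumerating all present N-grams of a fixed length is a scan over suffixes with the LCP array marking where new substrings begin, and missing N-grams are the complement over the alphabet.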
A FAST Pattern Matching Algorithm
The advent of digital computers has made the routine use of pattern matching possible in various applications. This has also stimulated the development of many algorithms. In this paper, we propose a new algorithm that offers improved performance compared to those reported in the literature so far. The new algorithm was evolved after analyzing well-known algorithms such as Boyer-Moore, Quick-search, Raita, and Horspool. The overall performance of the proposed algorithm has been improved using the shift provided by the Quick-search bad-character rule and by defining a fixed order of comparison. These result in a reduction of the character-comparison effort at each attempt. The best- and worst-case time complexities are also presented in this paper. Most importantly, the proposed method has been compared with other widely used algorithms. It is interesting to note that the new algorithm works consistently better for any alphabet size.
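The two ingredients the abstract names can be illustrated as follows. This is a hedged sketch, not the paper's exact algorithm: it combines the Quick-search bad-character shift (indexed by the text character just past the current window) with one possible fixed comparison order (last pattern character first, then left to right).

```python
# Quick-search-style matcher with a fixed order of comparison.
def search(text, pattern):
    """Return all start indices where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Quick-search bad-character shift: distance from the rightmost
    # occurrence of each pattern character to just past the pattern's end.
    shift = {c: m - i for i, c in enumerate(pattern)}
    hits = []
    i = 0
    while i <= n - m:
        # Fixed comparison order: last character first, then left to right.
        if text[i + m - 1] == pattern[m - 1] and text[i:i + m - 1] == pattern[:m - 1]:
            hits.append(i)
        if i + m < n:
            # Shift by the table entry for the character just past the
            # window; characters absent from the pattern allow a jump of m+1.
            i += shift.get(text[i + m], m + 1)
        else:
            break
    return hits

print(search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))  # [5]
```

Checking the rightmost character first tends to fail fast on random text, and the m+1 maximum shift is what gives Quick-search-family algorithms their good average-case behavior on large alphabets.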
The Nation-State and the Process of Globalization
Section I. International Relations, Foreign Policy, and Diplomacy of the Republic of Belarus