Towards Automated Classification of Code Review Feedback to Support Analytics
Background: As improving code review (CR) effectiveness is a priority for
many software development organizations, projects have deployed CR analytics
platforms to identify potential improvement areas. The number of issues
identified, which is a crucial metric to measure CR effectiveness, can be
misleading if all issues are placed in the same bin. Therefore, a finer-grained
classification of issues identified during CRs can provide actionable insights
to improve CR effectiveness. Although a recent work by Fregnan et al. proposed
automated models to classify CR-induced changes, we have noticed two potential
improvement areas -- i) classifying comments that do not induce changes and ii)
using deep neural networks (DNNs) in conjunction with code context to improve
performance. Aims: This study aims to develop an automated CR comment
classifier that leverages DNN models to achieve a more reliable performance
than Fregnan et al. Method: Using a manually labeled dataset of 1,828 CR
comments, we trained and evaluated supervised learning-based DNN models
leveraging code context, comment text, and a set of code metrics to classify CR
comments into one of the five high-level categories proposed by Turzo and Bosu.
Results: Based on our 10-fold cross-validation-based evaluations of multiple
combinations of tokenization approaches, we found that a model using CodeBERT
achieved the best accuracy of 59.3%, outperforming Fregnan et al.'s approach by
18.7%. Conclusion: Besides facilitating improved CR analytics, our proposed
model can be useful for developers in prioritizing code review feedback and
selecting reviewers.
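The evaluation setup described above (10-fold cross-validation over labeled CR comments) can be sketched as follows. This is a minimal illustration, not the paper's method: it substitutes a tiny bag-of-words naive Bayes baseline for the DNN/CodeBERT models, and the category labels and comments are hypothetical stand-ins, not the Turzo-and-Bosu taxonomy or the 1,828-comment dataset.

```python
# Sketch of a 10-fold cross-validation harness for CR-comment classification.
# A multinomial naive Bayes baseline stands in for the paper's DNN models.
import math
import random
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train_nb(samples):
    """Multinomial naive Bayes with add-one smoothing."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def predict(model, text):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, -math.inf
    for label in label_counts:
        lp = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

def cross_validate(samples, k=10, seed=0):
    """Shuffle once, split into k folds, and report mean accuracy."""
    data = samples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i in range(k):
        test = folds[i]
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        model = train_nb(train)
        correct += sum(predict(model, t) == y for t, y in test)
    return correct / len(data)

# Toy stand-in for labeled CR comments (labels are hypothetical).
samples = [
    ("please rename this variable", "naming"),
    ("rename the method for clarity", "naming"),
    ("this loop has an off by one bug", "defect"),
    ("possible null pointer bug here", "defect"),
] * 5  # repeat so every fold sees both classes

accuracy = cross_validate(samples, k=10)
print(f"10-fold accuracy: {accuracy:.2f}")
```

In the paper's setting, `train_nb`/`predict` would be replaced by fine-tuning and inference with a CodeBERT-style model over comment text plus code context; the fold logic stays the same.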
Enriching the exploration of the mUED model with event shape variables at the CERN LHC
We propose a new search strategy based on event shape variables for new
physics models where the separations among the masses of the particles in the
spectrum are small. Collider signatures of these models, characterised by
low-p_T leptons/jets and low missing transverse energy, are known to be
difficult to search for, and the conventional search strategies involving hard
cuts may not work in such situations. As a case study, we have investigated
the hitherto neglected jets + missing transverse energy signature - known to
be a challenging one - arising from the pair production and decays of KK
excitations of gluons and quarks in the minimal Universal Extra Dimension
(mUED) model. Judicious use of the event shape variables enables us to reduce
the Standard Model backgrounds to a negligible level. We have shown that a
sizeable region of the mUED parameter space can be explored or ruled out with
12 fb^-1 of integrated luminosity at the 7 TeV run of the LHC. We also discuss
the prospects of employing these variables in searching for other
beyond-Standard-Model physics with compressed or partially compressed spectra.
Comment: 16 pages, 3 figures
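The abstract does not name the specific event shape variables it uses; a representative example (an assumption here, not necessarily the paper's choice) is the transverse thrust, which distinguishes pencil-like QCD dijet events from more isotropic signal events:

```latex
T_\perp \;=\; \max_{\hat n_T}\,
  \frac{\sum_i \lvert \vec p_{T,i} \cdot \hat n_T \rvert}
       {\sum_i \lvert \vec p_{T,i} \rvert},
```

where the sum runs over the transverse momenta \(\vec p_{T,i}\) of the final-state objects and \(\hat n_T\) is a unit vector in the transverse plane. Back-to-back QCD events give \(T_\perp \to 1\), while more spherical events give lower values, so a cut on \(1 - T_\perp\) can suppress backgrounds without the hard p_T cuts that compressed spectra cannot survive.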
Efficient Substring Discovery Using Suffix, LCP Array and Algorithm-Architecture Interaction
Preprocessing a database is essential for extracting information from large databases such as biological sequences of genes or proteins. Pattern discovery becomes very time-efficient when we preprocess the database into a suffix array. Owing to the inherent organization of data in the suffix array and its secondary data structure, the longest common prefix (LCP) array (Manber and Myers 1990), only a limited portion of the database is accessed during a search, which yields a great deal of information in very little time, depending on the size of the database. Unlike exact pattern matching, here we preprocess the database instead of the pattern. We found the suffix and LCP arrays to be ideal tools for computing N-grams (substrings) along various dimensions. Over the past couple of decades there has been significant research on the construction of suffix and LCP arrays. In comparison, research on properly utilizing these promising data structures to retrieve substring information from various perspectives has remained largely neglected. Our main focus in this work was to develop a number of algorithms for computing the present and missing N-grams in a text in linear time and presenting them non-redundantly for large databases. Finding the present and missing N-grams, and representing them non-redundantly and time-efficiently, in large genome sequences can lead to new discoveries in biology. We have implemented and applied all our algorithms to various genome and proteome sequences and found interesting results. The algorithms were also tested for performance and other hardware parameter measurements on various platforms in order to suggest an appropriate architecture for this kind of application.
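The suffix-and-LCP machinery the abstract relies on can be sketched as below. This is an illustrative implementation, not the paper's exact algorithms: it builds a suffix array by rank-pair sorting, computes the LCP array with Kasai's algorithm, and then counts the distinct substrings ("present N-grams") of a text using the classic identity n(n+1)/2 minus the sum of LCP values.

```python
# Suffix array + LCP array (Kasai) + distinct-substring count.
def suffix_array(s):
    """Simple O(n log^2 n) construction by sorting (rank, next-rank) pairs."""
    n = len(s)
    rank = [ord(c) for c in s]
    sa = list(range(n))
    k = 1
    while True:
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:   # all ranks distinct: done
            return sa
        k *= 2

def lcp_array(s, sa):
    """Kasai's algorithm: lcp[j] = common prefix of suffixes sa[j-1], sa[j]."""
    n = len(s)
    rank = [0] * n
    for j, i in enumerate(sa):
        rank[i] = j
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1
    return lcp

def count_distinct_substrings(s):
    # Each suffix contributes its length minus the prefix it shares with the
    # previous suffix in sorted order; the LCP array holds exactly that overlap.
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    n = len(s)
    return n * (n + 1) // 2 - sum(lcp)

print(count_distinct_substrings("banana"))  # 15 distinct substrings
```

The same arrays support the other queries the abstract mentions: enumerating all present N-grams of a fixed length is a scan over suffixes with the LCP array marking where new substrings begin, and missing N-grams are the complement over the alphabet.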
A FAST Pattern Matching Algorithm
The advent of digital computers has made the routine use of pattern matching possible in various applications. This has also stimulated the development of many algorithms. In this paper, we propose a new algorithm that offers improved performance compared to those reported in the literature so far. The new algorithm was evolved after analyzing well-known algorithms such as Boyer-Moore, Quick-search, Raita, and Horspool. The overall performance of the proposed algorithm has been improved using the shift provided by the Quick-search bad-character rule and by defining a fixed order of comparison. These result in a reduction of the character-comparison effort at each attempt. The best- and worst-case time complexities are also presented in this paper. Most importantly, the proposed method has been compared with other widely used algorithms. It is interesting to note that the new algorithm works consistently better for any alphabet size.
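The two ingredients the abstract names can be illustrated as follows. This is a hedged sketch, not the paper's exact algorithm: it combines the Quick-search bad-character shift (indexed by the text character just past the current window) with one possible fixed comparison order (last pattern character first, then left to right).

```python
# Quick-search-style matcher with a fixed order of comparison.
def search(text, pattern):
    """Return all start indices where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Quick-search bad-character shift: distance from the rightmost
    # occurrence of each pattern character to just past the pattern's end.
    shift = {c: m - i for i, c in enumerate(pattern)}
    hits = []
    i = 0
    while i <= n - m:
        # Fixed comparison order: last character first, then left to right.
        if text[i + m - 1] == pattern[m - 1] and text[i:i + m - 1] == pattern[:m - 1]:
            hits.append(i)
        if i + m < n:
            # Shift by the table entry for the character just past the
            # window; characters absent from the pattern allow a jump of m+1.
            i += shift.get(text[i + m], m + 1)
        else:
            break
    return hits

print(search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))  # [5]
```

Checking the rightmost character first tends to fail fast on random text, and the m+1 maximum shift is what gives Quick-search-family algorithms their good average-case behavior on large alphabets.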
The Nation-State and the Process of Globalization
Section I. International Relations, Foreign Policy, and Diplomacy of the Republic of Belarus