Stack Overflow in Github: Any Snippets There?

Author: Lopes Cristina
Martins Pedro
Saini Vaibhav
Yang Di
Publication date: 02/05/2017

When programmers look for how to achieve certain programming tasks, Stack Overflow is a popular destination in search engine results. Over the years, Stack Overflow has accumulated an impressive knowledge base of snippets of code that are amply documented. We are interested in studying how programmers use these snippets of code in their projects. Can we find Stack Overflow snippets in real projects? When snippets are used, is this copy literal or does it suffer adaptations? And are these adaptations specializations required by the idiosyncrasies of the target artifact, or are they motivated by specific requirements of the programmer? The large-scale study presented in this paper analyzes 909k non-fork Python projects hosted on GitHub, which contain 290M function definitions, and 1.9M Python snippets captured in Stack Overflow. Results are presented as quantitative analysis of block-level code cloning intra and inter Stack Overflow and GitHub, and as an analysis of programming behaviors through the qualitative analysis of our findings.

Comment: 14th International Conference on Mining Software Repositories, 11 pages

arXiv.org e-Print Archive

Crossref

SourcererCC: Scaling Code Clone Detection to Big Code

Author: Lopes Cristina V.
Roy Chanchal K.
Saini Vaibhav
Sajnani Hitesh
Svajlenko Jeffrey
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/12/2015

Despite a decade of active research, there is a marked lack of clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types, and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted-index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of required token-comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available and state-of-the-art tools. To measure recall, we use two recent benchmarks: (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250MLOC) using a standard workstation.

Comment: Accepted for publication at ICSE'16 (preprint, unrevised)

arXiv.org e-Print Archive

Crossref

Towards Automating Precision Studies of Clone Detectors

Author: Baldi Pierre
Farmahinifarahani Farima
Lopes Cristina
Lu Yadong
Martins Pedro
Saini Vaibhav
Sajnani Hitesh
Yang Di
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/12/2018

Current research in clone detection suffers from poor ecosystems for evaluating precision of clone detection tools. Corpora of labeled clones are scarce and incomplete, making evaluation labor intensive and idiosyncratic, and limiting inter-tool comparison. Precision-assessment tools are simply lacking. We present a semi-automated approach to facilitate precision studies of clone detection tools. The approach merges automatic mechanisms of clone classification with manual validation of clone pairs. We demonstrate that the proposed automatic approach has a very high precision and that it significantly reduces the number of clone pairs that need human validation during precision experiments. Moreover, we aggregate the individual effort of multiple teams into a single evolving dataset of labeled clone pairs, creating an important asset for software clone research.

Comment: Accepted to be published in the 41st ACM/IEEE International Conference on Software Engineering

arXiv.org e-Print Archive

Crossref

A Systematic Review and Comparative Meta-analysis of Non-destructive Fruit Maturity Detection Techniques

Author: Bamel Kiran
Bhatt Vaibhav
Garg Savita
Mishra Shashvat Kumar
Parmar Saloni
Rani Neetu
Saini Nitesh
Sharma Sourabh
Publication venue: Horizon e-Publishing Group
Publication date: 14/01/2024

The global fruit industry is growing rapidly due to increased awareness of the health benefits associated with fruit consumption. Fruit maturity detection plays a crucial role in fruit logistics and maintenance, enabling farmers and fruit industries to grade fruits and develop sustainable policies for enhanced profitability and service quality. Non-destructive fruit maturity detection methods have gained significant attention, especially with advancements in machine vision and spectroscopic techniques. This systematic review provides a concise overview of the techniques and algorithms used in fruit quality grading by farmers and industries. The study reviewed 63 full-text articles published between 2012 and 2023 along with their bibliometric analysis. Qualitative analysis revealed that researchers from various disciplines contributed to this field, with techniques falling into 3 categories: machine vision (mathematical modelling or deep learning), spectroscopy and other miscellaneous approaches. There was a high level of diversity among these categories, as indicated by an I-square value of 88.37% in the heterogeneity analysis. Meta-analysis, using odds ratios as the effect measure, established the relationship between techniques and their accuracy. Machine vision showed a positive correlation with accuracy across different categories. Additionally, Egger's and Begg's tests were used to assess publication bias and no strong evidence of its occurrence was found. This study offers valuable insights into the advantages and limitations of various fruit maturity detection techniques. For employing statistical and meta-analytical methods, key factors such as accuracy and sample size have been considered. These findings will aid in the development of effective strategies for fruit quality assessment

Horizon e-Publishing Group (HePG): E-Journals

Plasmodium falciparum PhIL1-associated complex plays an essential role in merozoite reorientation and invasion of host erythrocytes.

Author: Agrawal Prakhar
Kaur Inderjeet
Malhotra Pawan
Mohmmed Asif
Saini Ekta
Sharma Vaibhav
Sheokand Pradeep Kumar
Singh Shailja
Publication venue: PLoS Pathog
Publication date: 01/07/2021

The human malaria parasite, Plasmodium falciparum possesses unique gliding machinery referred to as the glideosome that powers its entry into the insect and vertebrate hosts. Several parasite proteins including Photosensitized INA-labelled protein 1 (PhIL1) have been shown to associate with glideosome machinery. Here we describe a novel PhIL1 associated protein complex that co-exists with the glideosome motor complex in the inner membrane complex of the merozoite. Using an experimental genetics approach, we characterized the role(s) of three proteins associated with PhIL1: a glideosome associated protein- PfGAPM2, an IMC structural protein- PfALV5, and an uncharacterized protein-referred here as PfPhIP (PhIL1 Interacting Protein). Parasites lacking PfPhIP or PfGAPM2 were unable to invade host RBCs. Additionally, the downregulation of PfPhIP resulted in significant defects in merozoite segmentation. Furthermore, the PfPhIP and PfGAPM2 depleted parasites showed abrogation of reorientation/gliding. However, initial attachment with host RBCs was not affected in these parasites. Together, the data presented here show that proteins of the PhIL1-associated complex play an important role in the orientation of P. falciparum merozoites following initial attachment, which is crucial for the formation of a tight junction and hence invasion of host erythrocytes

Enlighten

Apollo (Cambridge)

Comparative Study of RDBMS, NOSQL and Graph Databases

Author: Pandey Dr. Mrinal
Sachdeva Mr. Vaibhav
Saini Mr. Prince
Sharma Mr. Sahil
Publication venue: Journal of Computer Science Engineering and Software Testing (e-ISSN: 2581-6969)
Publication date: 03/10/2018

The paper aims at the analysis and comparison of various forms of databases, particularly Relational Database Management Systems (RDBMS), Not Only SQL (NOSQL) databases, and graph databases. Structured Query Language (SQL), a semi-declarative language, is used by applications to access relational database systems, whereas NOSQL databases are based on key-value pairs. Graph databases use graph structures for resolving queries and to represent and store data.

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

MAT Journals

Inter-Relationship between Economic Growth, Savings and Inflation in Asia

Author: The World Bank
A S Deaton
Archana Dholakia
B B Aghevli
B Heer
B Motley
Brajesh Kumar
D Romer
G Mallik
J A Fabayo
J H Haslag
J Krieckhaus
J M Page
K G Saini
M Gillman
M Cardenas
M S Khan
M Sidrauski
M W Rosegrant
N Roubini
N Loayza
P Honohan
P C Athukorala
R Barro
R Grier
R H Dholakia
Ravindra H. Dholakia
S Jacobzone
S B Kaplan
S Edwards
S Fischer
S Paul
S Tuljapurkar
T Gylfason
V V Chari
Vaibhav Chaturvedi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008

Crossref

Towards Accurate and Scalable Clone Detection using Software Metrics

Author: Saini Vaibhav Pratap Singh
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Code clone detection tools find exact or similar pieces of code, known as code clones. Code clones are categorized into four types of increasing difficulty of detection, ranging from purely textual (Type I) to purely semantic (Type IV). Most clone detectors reported in the literature work well up to Type III, which accounts for syntactic differences. In between Type III and Type IV, however, there lies a spectrum of clones that, although still exhibiting some syntactic similarities, are extremely hard to detect: the Twilight Zone. Besides correctness, scalability has become a must-have requirement for modern clone detection tools. The increase in the amount of source code in web-hosted open source repository services has presented opportunities to improve the state of the art in various modern use cases of clone detection such as detecting similar mobile applications, license violation detection, mining library candidates, code repair, and code search, among others. Though these opportunities are exciting, scaling to such vast corpora poses a critical challenge. Over the years, many clone detection techniques and tools have been developed. One class of these techniques is based on software metrics. Metrics-based clone detection has the potential to identify clones in the Twilight Zone. For various reasons, however, metrics-based techniques are hard to scale to large datasets. My work highlights issues which prevent metrics-based clone detection techniques from scaling to large datasets while maintaining high levels of correctness. The identification of these issues allowed me to rethink how metrics could be used for clone detection. This dissertation starts by presenting an empirical study using software metrics to understand if metrics can be used to identify differences in cloned and non-cloned code. The study is followed by another large-scale study to explore the extent of cloning in GitHub.
Here, the dissertation highlights scalability challenges in clone detection and how they were addressed. The above two studies provided a strong base to use software metrics for clone detection in a scalable manner. To this end, the dissertation presents Oreo, a novel approach capable of detecting harder-to-detect clones in the Twilight Zone. Oreo is built using a combination of machine learning, information retrieval, and software metrics. This dissertation evaluates the recall of Oreo on BigCloneBench, a benchmark of real-world code clones. In experiments to compare the detection performance of Oreo with five other state-of-the-art clone detectors, we found that Oreo has both high recall and precision. More importantly, it pushes the boundary in detection of clones with moderate to weak syntactic similarity, in a scalable manner. Further, to address the issues identified in precision evaluations, the dissertation presents InspectorClone, a semi-automated approach to facilitate precision studies of clone detection tools. InspectorClone makes use of some of the concepts introduced in the design of Oreo to automatically resolve different types of clone pairs. Experiments demonstrate that InspectorClone has a very high precision and that it significantly reduces the number of clone pairs that need human validation during precision experiments. Moreover, InspectorClone aggregates the individual effort of multiple teams into a single evolving dataset of labeled clone pairs, creating an important asset for software clone research. Finally, the dissertation concludes with a discussion of the lessons learned during the design and development of Oreo and lists a few areas for future work in code clone detection.

eScholarship - University of California

Towards Accurate and Scalable Clone Detection Using Software Metrics

Author: Saini Vaibhav Pratap Singh
Publication venue: 'University of California, Irvine'
Publication date: 01/01/2018

eScholarship - University of California

ProQuest OAI Repository