Stack Overflow in Github: Any Snippets There?
When programmers look for how to achieve certain programming tasks, Stack
Overflow is a popular destination in search engine results. Over the years,
Stack Overflow has accumulated an impressive knowledge base of snippets of code
that are amply documented. We are interested in studying how programmers use
these snippets of code in their projects. Can we find Stack Overflow snippets
in real projects? When snippets are used, is this copy literal or does it
suffer adaptations? And are these adaptations specializations required by the
idiosyncrasies of the target artifact, or are they motivated by specific
requirements of the programmer? The large-scale study presented in this paper
analyzes 909k non-fork Python projects hosted on GitHub, which contain 290M
function definitions, and 1.9M Python snippets captured in Stack Overflow.
Results are presented as quantitative analysis of block-level code cloning
intra and inter Stack Overflow and GitHub, and as an analysis of programming
behaviors through the qualitative analysis of our findings. Comment: 14th International Conference on Mining Software Repositories, 11
page
SourcererCC: Scaling Code Clone Detection to Big Code
Despite a decade of active research, there is a marked lack in clone
detectors that scale to very large repositories of source code, in particular
for detecting near-miss clones where significant editing activities may take
place in the cloned code. We present SourcererCC, a token-based clone detector
that targets three clone types, and exploits an index to achieve scalability to
large inter-project repositories using a standard workstation. SourcererCC uses
an optimized inverted-index to quickly query the potential clones of a given
code block. Filtering heuristics based on token ordering are used to
significantly reduce the size of the index, the number of code-block
comparisons needed to detect the clones, as well as the number of required
token-comparisons needed to judge a potential clone.
We evaluate the scalability, execution time, recall and precision of
SourcererCC, and compare it to four publicly available and state-of-the-art
tools. To measure recall, we use two recent benchmarks, (1) a large benchmark
of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of
thousands of fine-grained artificial clones. We find SourcererCC has both high
recall and precision, and is able to scale to a large inter-project repository
(250MLOC) using a standard workstation. Comment: Accepted for publication at ICSE'16 (preprint, unrevised
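The index-based idea described in the abstract above can be illustrated with a minimal sketch. This is not SourcererCC's implementation: the tokenizer, the `find_clones` function, the input shape, and the default threshold are all assumptions made for illustration, and the token-ordering filtering heuristics mentioned in the abstract are omitted. The sketch only shows how an inverted index from tokens to code blocks limits comparisons to blocks that share at least one token.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(code):
    """Split source text into identifier, number, and single-symbol tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def overlap(a, b):
    """Size of the multiset intersection of two token bags."""
    return sum((a & b).values())


def find_clones(blocks, threshold=0.8):
    """Return pairs of block ids whose token overlap meets the threshold.

    blocks: dict mapping a block id to its source text (hypothetical shape).
    """
    bags = {bid: Counter(tokenize(src)) for bid, src in blocks.items()}
    index = defaultdict(set)  # token -> ids of already indexed blocks
    clones = []
    for bid in sorted(bags):
        bag = bags[bid]
        size = sum(bag.values())
        # Candidates are earlier blocks that share at least one token.
        candidates = set()
        for token in bag:
            candidates |= index[token]
        for other in sorted(candidates):
            other_size = sum(bags[other].values())
            # Required shared tokens, relative to the larger block.
            needed = math.ceil(threshold * max(size, other_size))
            if overlap(bag, bags[other]) >= needed:
                clones.append((other, bid))
        # Index the current block only after querying, so pairs appear once.
        for token in bag:
            index[token].add(bid)
    return clones
```

Calling `find_clones` on two functions that differ only in one renamed variable, with a threshold of 0.7, reports them as a pair, while an unrelated block that shares only punctuation tokens is compared but rejected.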
Towards Automating Precision Studies of Clone Detectors
Current research in clone detection suffers from poor ecosystems for
evaluating precision of clone detection tools. Corpora of labeled clones are
scarce and incomplete, making evaluation labor-intensive and idiosyncratic, and
limiting inter-tool comparison. Precision-assessment tools are simply lacking.
We present a semi-automated approach to facilitate precision studies of clone
detection tools. The approach merges automatic mechanisms of clone
classification with manual validation of clone pairs. We demonstrate that the
proposed automatic approach has a very high precision and it significantly
reduces the number of clone pairs that need human validation during precision
experiments. Moreover, we aggregate the individual effort of multiple teams
into a single evolving dataset of labeled clone pairs, creating an important
asset for software clone research. Comment: Accepted to be published in the 41st ACM/IEEE International
Conference on Software Engineerin
A Systematic Review and Comparative Meta-analysis of Non-destructive Fruit Maturity Detection Techniques
The global fruit industry is growing rapidly due to increased awareness of the health benefits associated with fruit consumption. Fruit maturity detection plays a crucial role in fruit logistics and maintenance, enabling farmers and fruit industries to grade fruits and develop sustainable policies for enhanced profitability and service quality. Non-destructive fruit maturity detection methods have gained significant attention, especially with advancements in machine vision and spectroscopic techniques. This systematic review provides a concise overview of the techniques and algorithms used in fruit quality grading by farmers and industries. The study reviewed 63 full-text articles published between 2012 and 2023, along with their bibliometric analysis. Qualitative analysis revealed that researchers from various disciplines contributed to this field, with techniques falling into three categories: machine vision (mathematical modelling or deep learning), spectroscopy, and other miscellaneous approaches. There was a high level of diversity among these categories, as indicated by an I-square value of 88.37% in the heterogeneity analysis. Meta-analysis, using odds ratios as the effect measure, established the relationship between techniques and their accuracy. Machine vision showed a positive correlation with accuracy across different categories. Additionally, Egger's and Begg's tests were used to assess publication bias, and no strong evidence of its occurrence was found. This study offers valuable insights into the advantages and limitations of various fruit maturity detection techniques. In applying the statistical and meta-analytical methods, key factors such as accuracy and sample size were considered. These findings will aid in the development of effective strategies for fruit quality assessment.
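For readers unfamiliar with the I-square statistic quoted in the abstract above, the following sketch shows the standard way it is derived from Cochran's Q under inverse-variance weighting. It is an illustration only and does not reproduce the review's data or analysis; the function name `i_squared` and its inputs are hypothetical, and with odds ratios as the effect measure the per-study effects would typically be log odds ratios.

```python
def i_squared(effects, variances):
    """Return (Q, I^2 in percent) for per-study effects and their variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is the share of Q in excess of its degrees of freedom, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2
```

A value near 0% suggests that the observed variation is consistent with sampling error alone, while values closer to 100% indicate that most of the variation reflects differences between studies.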
Plasmodium falciparum PhIL1-associated complex plays an essential role in merozoite reorientation and invasion of host erythrocytes.
The human malaria parasite, Plasmodium falciparum, possesses unique gliding machinery, referred to as the glideosome, that powers its entry into the insect and vertebrate hosts. Several parasite proteins, including Photosensitized INA-labelled protein 1 (PhIL1), have been shown to associate with glideosome machinery. Here we describe a novel PhIL1-associated protein complex that co-exists with the glideosome motor complex in the inner membrane complex of the merozoite. Using an experimental genetics approach, we characterized the role(s) of three proteins associated with PhIL1: a glideosome-associated protein, PfGAPM2; an IMC structural protein, PfALV5; and an uncharacterized protein, referred to here as PfPhIP (PhIL1 Interacting Protein). Parasites lacking PfPhIP or PfGAPM2 were unable to invade host RBCs. Additionally, the downregulation of PfPhIP resulted in significant defects in merozoite segmentation. Furthermore, the PfPhIP- and PfGAPM2-depleted parasites showed abrogation of reorientation/gliding. However, initial attachment with host RBCs was not affected in these parasites. Together, the data presented here show that proteins of the PhIL1-associated complex play an important role in the orientation of P. falciparum merozoites following initial attachment, which is crucial for the formation of a tight junction and hence invasion of host erythrocytes.
Comparative Study of RDBMS, NOSQL and Graph Databases
The paper aims at analysis and comparison of various forms of databases, particularly Relational Database Management Systems (RDBMS), Not Only SQL (NoSQL) databases, and graph databases. Structured Query Language (SQL), a semi-declarative language, is employed by applications to access relational database systems containing information, whereas NoSQL databases are based on key-value pairs. Graph databases use graph structures for resolving queries and to represent and store data.
Towards Accurate and Scalable Clone Detection using Software Metrics
Code clone detection tools find exact or similar pieces of code, known as code clones. Code clones are categorized into four types of increasing difficulty of detection, ranging from purely textual (Type I) to purely semantic (Type IV). Most clone detectors reported in the literature work well up to Type III, which accounts for syntactic differences. In between Type III and Type IV, however, there lies a spectrum of clones that, although still exhibiting some syntactic similarities, are extremely hard to detect: the Twilight Zone. Besides correctness, scalability has become a must-have requirement for modern clone detection tools. The increase in the amount of source code in web-hosted open source repository services has presented opportunities to improve the state of the art in various modern use cases of clone detection, such as detecting similar mobile applications, license violation detection, mining library candidates, code repair, and code search, among others. Though these opportunities are exciting, scaling to such vast corpora poses critical challenges. Over the years, many clone detection techniques and tools have been developed. One class of these techniques is based on software metrics. Metrics-based clone detection has the potential to identify clones in the Twilight Zone. For various reasons, however, metrics-based techniques are hard to scale to large datasets. My work highlights issues which prevent metrics-based clone detection techniques from scaling to large datasets while maintaining high levels of correctness. The identification of these issues allowed me to rethink how metrics could be used for clone detection. This dissertation starts by presenting an empirical study using software metrics to understand if metrics can be used to identify differences in cloned and non-cloned code. The study is followed by another large-scale study to explore the extent of cloning in GitHub.
Here, the dissertation highlights scalability challenges in clone detection and how they were addressed. The above two studies provided a strong base to use software metrics for clone detection in a scalable manner. To this end, the dissertation presents Oreo, a novel approach capable of detecting harder-to-detect clones in the Twilight Zone. Oreo is built using a combination of machine learning, information retrieval, and software metrics. This dissertation evaluates the recall of Oreo on BigCloneBench, a benchmark of real-world code clones. In experiments comparing the detection performance of Oreo with five other state-of-the-art clone detectors, we found that Oreo has both high recall and precision. More importantly, it pushes the boundary in detection of clones with moderate to weak syntactic similarity, in a scalable manner. Further, to address the issues identified in precision evaluations, the dissertation presents InspectorClone, a semi-automated approach to facilitate precision studies of clone detection tools. InspectorClone makes use of some of the concepts introduced in the design of Oreo to automatically resolve different types of clone pairs. Experiments demonstrate that InspectorClone has a very high precision and that it significantly reduces the number of clone pairs that need human validation during precision experiments. Moreover, InspectorClone aggregates the individual effort of multiple teams into a single evolving dataset of labeled clone pairs, creating an important asset for software clone research. Finally, the dissertation concludes with a discussion of the lessons learned during the design and development of Oreo and lists a few areas for future work in code clone detection.