165 research outputs found

    SourcererCC: Scaling Code Clone Detection to Big Code

    Full text link
    Despite a decade of active research, there is a marked lack of clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, and the number of token comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall, and precision of SourcererCC, and compare it to four publicly available, state-of-the-art tools. To measure recall, we use two recent benchmarks: (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find that SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250 MLOC) using a standard workstation. Comment: Accepted for publication at ICSE'16 (preprint, unrevised)
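
    The index-and-filter idea can be pictured with a minimal sketch (illustrative only, not SourcererCC's actual implementation; the function names and the theta threshold are hypothetical): each code block is indexed only under a small prefix of its consistently ordered tokens, and candidates retrieved through that prefix are then verified against a token-overlap threshold. For simplicity, blocks are treated here as token sets.

    # Minimal sketch of inverted-index lookup with a prefix filter for
    # clone-candidate retrieval; all names are illustrative, not SourcererCC's API.
    from collections import defaultdict

    def tokenize(code_block):
        """Split a code block into a consistently ordered token list."""
        # A real tool would order tokens by global rarity; alphabetical order
        # is a simple stand-in that still yields a consistent prefix.
        return sorted(set(code_block.split()))

    def build_index(blocks, theta=0.8):
        """Index each block only by its prefix tokens; theta is the similarity threshold."""
        index = defaultdict(set)
        token_sets = {}
        for block_id, code in blocks.items():
            tokens = tokenize(code)
            token_sets[block_id] = set(tokens)
            prefix_len = len(tokens) - int(theta * len(tokens)) + 1
            for tok in tokens[:prefix_len]:
                index[tok].add(block_id)
        return index, token_sets

    def query(code, index, token_sets, theta=0.8):
        """Return ids of indexed blocks whose token overlap meets the threshold."""
        tokens = tokenize(code)
        prefix_len = len(tokens) - int(theta * len(tokens)) + 1
        candidates = set()
        for tok in tokens[:prefix_len]:
            candidates |= index.get(tok, set())
        query_set = set(tokens)
        return [cid for cid in candidates
                if len(query_set & token_sets[cid])
                >= theta * max(len(query_set), len(token_sets[cid]))]

    index, token_sets = build_index({"b1": "int sum = a + b ;"})
    print(query("int total = a + b ;", index, token_sets))  # -> ['b1']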

    CapillaryX: A Software Design Pattern for Analyzing Medical Images in Real-time using Deep Learning

    Get PDF
    Recent advances in digital imaging, e.g., an increased number of captured pixels, have meant that the volume of data to be processed and analyzed from these images has also increased. Deep learning algorithms are the state of the art for analyzing such images, given their high accuracy when trained with a large volume of data. Nevertheless, such analysis requires considerable computational power, making these algorithms time- and resource-demanding. Such high demands can be met by using third-party cloud service providers. However, analyzing medical images using such services raises several legal and privacy challenges and does not necessarily provide real-time results. This paper provides a computing architecture that can analyze medical images locally, in parallel, and in real time using deep learning, thus avoiding the legal and privacy challenges stemming from uploading data to a third-party cloud provider. To make local image processing efficient on modern multi-core processors, we utilize parallel execution to offset the resource-intensive demands of deep neural networks. We focus on a specific medical-industrial case study, namely quantifying blood vessels in microcirculation images, for which we have developed a working system. It is currently used in an industrial, clinical research setting as part of an e-health application. Our results show that our system is approximately 78% faster than its serial counterpart and 12% faster than a master-slave parallel system architecture.
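
    A rough illustration of the local, parallel processing idea follows (a minimal sketch assuming one independent inference per image; the function names and directory path are hypothetical, not the CapillaryX API): frames are distributed across local worker processes rather than uploaded to a cloud service.

    # Minimal sketch of local, parallel per-image analysis on a multi-core machine.
    from concurrent.futures import ProcessPoolExecutor
    from pathlib import Path

    def analyze_capillaries(image_path):
        """Placeholder for per-image deep-learning inference (e.g., vessel segmentation)."""
        # A real implementation would load the frame and run the trained model here.
        return {"image": image_path.name, "vessel_count": 0}

    def analyze_batch(image_dir, workers=4):
        """Spread frames across local worker processes instead of a cloud service."""
        images = sorted(Path(image_dir).glob("*.png"))
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(analyze_capillaries, images))

    if __name__ == "__main__":
        results = analyze_batch("microcirculation_frames", workers=4)
        print(f"Analyzed {len(results)} frames locally")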

    Preserving the Quality of Architectural Tactics in Source Code

    Get PDF
    In any complex software system, strong interdependencies exist between requirements and software architecture. Requirements drive architectural choices while also being constrained by the existing architecture and by what is economically feasible. This makes it advisable to concurrently specify the requirements, to devise and compare alternative architectural design solutions, and ultimately to make a series of design decisions in order to satisfy each of the quality concerns. Unfortunately, anecdotal evidence has shown that architectural knowledge tends to be tacit in nature, stored in the heads of people, and lost over time. Therefore, developers often lack comprehensive knowledge of underlying architectural design decisions and inadvertently degrade the quality of the architecture while performing maintenance activities. In practice, this problem can be addressed by preserving the relationships between the requirements, the architectural design decisions, and their implementations in the source code, and then using this information to keep developers aware of critical architectural aspects of the code. This dissertation presents a novel approach that utilizes machine learning techniques to recover and preserve the relationships between architecturally significant requirements, architectural decisions, and their realizations in the implemented code. Our approach for recovering architectural decisions includes two primary stages: training and classification. In the first stage, the classifier is trained using code snippets of different architectural decisions collected from various software systems. During this phase, the classifier learns the terms that developers typically use to implement each architectural decision. These "indicator terms" represent method names, variable names, comments, or the development APIs that developers inevitably use to implement various architectural decisions. A probabilistic weight is then computed for each potential indicator term with respect to each type of architectural decision. The weight estimates how strongly an indicator term represents a specific architectural tactic or decision. For example, a term such as "pulse" is highly representative of the heartbeat tactic but occurs infrequently in the authentication tactic. After learning the indicator terms, the classifier can compute the likelihood that any given source file implements a specific architectural decision. The classifier was evaluated through several different experiments, including classical cross-validation over code snippets of 50 open source projects and on the entire source code of a large-scale software system. Results showed that the classifier can reliably recognize a wide range of architectural decisions. The technique introduced in this dissertation is used to develop the Archie tool suite. Archie is a plug-in for Eclipse and is designed to detect a wide range of architectural design decisions in the code and to protect them from potential degradation during maintenance activities. It has several features for performing change impact analysis of architectural concerns at both the code and design level and for proactively keeping developers informed of underlying architectural decisions during maintenance activities. Archie is at the stage of technology transfer at the US Department of Homeland Security, where it is used purely to detect and monitor security choices. Furthermore, this outcome is integrated into the Department of Homeland Security's Software Assurance Market Place (SWAMP) to advance research and development of secure software systems.
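
    The indicator-term idea can be sketched as a simple probabilistic text classifier (a minimal, hypothetical illustration in the spirit of the description above, not the actual Archie implementation; all names and the toy training data are invented): per-tactic term weights are learned from labeled snippets and summed to score how likely a source file implements each tactic.

    # Minimal sketch of indicator-term weighting for tactic classification.
    import math
    from collections import Counter

    def train(snippets_by_tactic):
        """snippets_by_tactic: {tactic: [code snippets as strings]}.
        Returns per-tactic term weights (log-probabilities with add-one smoothing)."""
        term_counts = {t: Counter() for t in snippets_by_tactic}
        vocab = set()
        for tactic, snippets in snippets_by_tactic.items():
            for snippet in snippets:
                toks = snippet.lower().split()
                term_counts[tactic].update(toks)
                vocab.update(toks)
        weights = {}
        for tactic, counts in term_counts.items():
            total = sum(counts.values()) + len(vocab)
            weights[tactic] = {term: math.log((counts[term] + 1) / total) for term in vocab}
            weights[tactic]["<unk>"] = math.log(1 / total)  # weight for unseen terms
        return weights

    def classify(source_file_text, weights):
        """Score how likely the file implements each tactic; higher is more likely."""
        toks = source_file_text.lower().split()
        scores = {tactic: sum(w.get(t, w["<unk>"]) for t in toks)
                  for tactic, w in weights.items()}
        return max(scores, key=scores.get), scores

    # Toy example: 'pulse' indicates the heartbeat tactic far more than authentication.
    weights = train({
        "heartbeat": ["send pulse every interval", "pulse monitor thread"],
        "authentication": ["verify password hash", "login token check"],
    })
    best, _ = classify("start pulse sender", weights)
    print(best)  # -> heartbeat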

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Full text link
    Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation and the detection of duplicate code, plagiarism, malware, and code smells. This paper presents a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate code, of which only eight were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance phase. Comment: 49 pages, 10 figures, 6 tables
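
    As a deliberately simple illustration of what a token-based code similarity measure looks like (not tied to any particular tool or technique from the review), consider Jaccard similarity over normalized token sets:

    # Jaccard similarity over token sets: illustrative only.
    def jaccard_similarity(code_a, code_b):
        tokens_a = set(code_a.lower().split())
        tokens_b = set(code_b.lower().split())
        if not tokens_a and not tokens_b:
            return 1.0
        return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

    print(jaccard_similarity("int sum = a + b ;", "int total = a + b ;"))  # -> 0.75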

    Model analytics and management

    Get PDF
