217 research outputs found

    CroLSSim: Cross‐language software similarity detector using hybrid approach of LSA‐based AST‐MDrep features and CNN‐LSTM model

    Get PDF
    Software similarity in different programming codes is a rapidly evolving field because of its numerous applications in software development, software cloning, software plagiarism, and software forensics. Currently, software researchers and developers search cross-language open-source repositories for similar applications for a variety of reasons, such as reusing programming code, analyzing different implementations, and looking for a better application. However, it is a challenging task because each programming language has a unique syntax and semantic structure. In this paper, a novel tool called Cross-Language Software Similarity (CroLSSim) is designed to detect similar software applications written in different programming codes. First, the Abstract Syntax Tree (AST) features are collected from different programming codes. These are high-quality features that can show the abstract view of each program. Then, Methods Description (MDrep) in combination with AST is used to examine the relationship among different method calls. Second, the Term Frequency Inverse Document Frequency approach is used to retrieve the local and global weights from AST-MDrep features. Third, the Latent Semantic Analysis-based features extraction and selection method is proposed to extract the semantic anchors in reduced dimensional space. Fourth, the Convolution Neural Network (CNN)-based features extraction method is proposed to mine the deep features. Finally, a hybrid deep learning model of CNN-Long-Short-Term Memory is designed to detect semantically similar software applications from these latent variables. The data set contains approximately 9.5K Java, 8.8K C#, and 7.4K C++ software applications obtained from GitHub. The proposed approach outperforms as compared with the state-of-the-art methods

    Latent Semantic Indexing (LSI) Based Distributed System and Search On Encrypted Data

    Get PDF
    Latent semantic indexing (LSI) was initially introduced to overcome the issues of synonymy and polysemy of the traditional vector space model (VSM). LSI, however, has challenges of its own, mainly scalability. Despite being introduced in 1990, there are few attempts that provide an efficient solution for LSI, most of the literature is focuses on LSI’s applications rather than improving the original algorithm. In this work we analyze the first framework to provide scalable implementation of LSI and report its performance on the distributed environment of RAAD. The possibility of adopting LSI in the field of searching over encrypted data is also investigated. The importance of that field is stemmed from the need for cloud computing as an effective computing paradigm that provides an affordable access to high computational power. Encryption is usually applied to prevent unauthorized access to the data (the host is assumed to be curious), however this limits accessibility to the data given that search over encryption is yet to catch with the latest techniques adopted by the Information Retrieval (IR) community. In this work we propose a system that uses LSI for indexing and free-query text for retrieving. The results show that the available LSI framework does scale on large datasets, however it had some limitations with respect to factors like dictionary size and memory limit. When replicating the exact settings of the baseline on RAAD, it performed relatively slower. This could be resulted by the fact that RAAD uses a distributed file system or because of network latency. The results also show that the proposed system for applying LSI on encrypted data retrieved documents in the same order as the baseline (unencrypted data)

    Toward an Effective Automated Tracing Process

    Get PDF
    Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both a forwards and backwards direction, throughout the multiple phases of the project’s life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). The research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use is still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhaustive, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process. Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry

    Requirements Traceability: Recovering and Visualizing Traceability Links Between Requirements and Source Code of Object-oriented Software Systems

    Full text link
    Requirements traceability is an important activity to reach an effective requirements management method in the requirements engineering. Requirement-to-Code Traceability Links (RtC-TLs) shape the relations between requirement and source code artifacts. RtC-TLs can assist engineers to know which parts of software code implement a specific requirement. In addition, these links can assist engineers to keep a correct mental model of software, and decreasing the risk of code quality degradation when requirements change with time mainly in large sized and complex software. However, manually recovering and preserving of these TLs puts an additional burden on engineers and is error-prone, tedious, and costly task. This paper introduces YamenTrace, an automatic approach and implementation to recover and visualize RtC-TLs in Object-Oriented software based on Latent Semantic Indexing (LSI) and Formal Concept Analysis (FCA). The originality of YamenTrace is that it exploits all code identifier names, comments, and relations in TLs recovery process. YamenTrace uses LSI to find textual similarity across software code and requirements. While FCA employs to cluster similar code and requirements together. Furthermore, YamenTrace gives a visualization of recovered TLs. To validate YamenTrace, it applied on three case studies. The findings of this evaluation prove the importance and performance of YamenTrace proposal as most of RtC-TLs were correctly recovered and visualized.Comment: 17 pages, 14 figure

    Model analytics and management

    Get PDF

    Model analytics and management

    Get PDF

    Tool-supported identification of functional concerns in object-oriented code

    Get PDF
    Concern identification aims to find the implementation of a functional concern in existing source code. In this work, concerns are described, using the Hierarchic Concern Model, as gray-boxes containing subconcerns, inputs, and outputs. The inputs and outputs are used as concern seeds to identify data-oriented abstractions of concern implementations, called concern skeletons. The identification approach is based on context free language reachability and supported by a tool, called CoDEx

    Definition of Descriptive and Diagnostic Measurements for Model Fragment Retrieval

    Full text link
    Tesis por compendio[ES] Hoy en día, el software existe en casi todo. Las empresas a menudo desarrollan y mantienen colecciones de sistemas de software personalizados que comparten algunas características entre ellos, pero que también tienen otras características particulares. Conforme el número de características y el número de variantes de un producto crece, el mantenimiento del software se vuelve cada vez más complejo. Para hacer frente a esta situación la Comunidad de Ingeniería del Software basada en Modelos está abordando una actividad clave: la Localización de Fragmentos de Modelo. Esta actividad consiste en la identificación de elementos del modelo que son relevantes para un requisito, una característica o un bug. Durante los últimos años se han propuesto muchos enfoques para abordar la identificación de los elementos del modelo que corresponden a una funcionalidad en particular. Sin embargo, existe una carencia a la hora de cómo se reportan las medidas del espacio de búsqueda, así como las medidas de la solución a encontrar. El objetivo de nuestra tesis radica en proporcionar a la comunidad dedicada a la actividad de localización de fragmentos de modelo una serie de medidas (tamaño, volumen, densidad, multiplicidad y dispersión) para reportar los problemas de localización de fragmentos de modelo. El uso de estas novedosas medidas ayuda a los investigadores durante la creación de nuevos enfoques, así como la mejora de aquellos enfoques ya existentes. Mediante el uso de dos casos de estudio reales e industriales, esta tesis pone en valor la importancia de estas medidas para comparar resultados de diferentes enfoques de una manera precisa. Los resultados de este trabajo han sido redactados y publicados en foros, conferencias y revistas especializadas en los temas y contexto de la investigación. Esta tesis se presenta como un compendio de artículos acorde a la regulación de la Universitat Politècnica de València. Este documento de tesis presenta los temas, el contexto y los objetivos de la investigación. Presenta las publicaciones académicas que se han publicado como resultado del trabajo y luego analiza los resultados de la investigación.[CA] Hui en dia, el programari existix en quasi tot. Les empreses sovint desenrotllen i mantenen col·leccions de sistemes de programari personalitzats que compartixen algunes característiques entre ells, però que també tenen altres característiques particulars. Conforme el nombre de característiques i el nombre de variants d'un producte creix, el manteniment del programari es torna cada vegada més complex. Per a fer front a esta situació la Comunitat d'Enginyeria del Programari basada en Models està abordant una activitat clau: la Localització de Fragments de Model. Esta activitat consistix en la identificació d'elements del model que són rellevants per a un requisit, una característica o un bug. Durant els últims anys s'han proposat molts enfocaments per a abordar la identificació dels elements del model que corresponen a una funcionalitat en particular. No obstant això, hi ha una carència a l'hora de com es reporten les mesures de l'espai de busca, així com les mesures de la solució a trobar. L'objectiu de la nostra tesi radica a proporcionar a la comunitat dedicada a l'activitat de localització de fragments de model una sèrie de mesures (grandària, volum, densitat, multiplicitat i dispersió) per a reportar els problemes de localització de fragments de model. L'ús d'estes noves mesures ajuda als investigadors durant la creació de nous enfocaments, així com la millora d'aquells enfocaments ja existents. Per mitjà de l'ús de dos casos d'estudi reals i industrials, esta tesi posa en valor la importància d'estes mesures per a comparar resultats de diferents enfocaments d'una manera precisa. Els resultats d'este treball han sigut redactats i publicats en fòrums, conferències i revistes especialitzades en els temes i context de la investigació. Esta tesi es presenta com un compendi d'articles d'acord amb la regulació de la Universitat Politècnica de València. Este document de tesi presenta els temes, el context i els objectius de la investigació. Presenta les publicacions acadèmiques que s'han publicat com resultat del treball i després analitza els resultats de la investigació.[EN] Nowadays, software exists in almost everything. Companies often develop and maintain a collection of custom-tailored software systems that share some common features but also support customer-specific ones. As the number of features and the number of product variants grows, software maintenance is becoming more and more complex. To keep pace with this situation, Model-Based Software Engineering Community is addressing a key-activity: Model Fragment Location (MFL). MFL aims at identifying model elements that are relevant to a requirement, feature, or bug. Many MFL approaches have been introduced in the last few years to address the identification of the model elements that correspond to a specific functionality. However, there is a lack of detail when the measurements about the search space (models) and the measurements about the solution to be found (model fragment) are reported. The goal of this thesis is to provide insights to MFL Research Community of how to improve the report of location problems. We propose using five measurements (size, volume, density, multiplicity, and dispersion) to report the location problems during MFL. The usage of these novel measurements support researchers during the creation of new MFL approaches and during the improvement of those existing ones. Using two different case studies, both real and industrial, we emphasize the importance of these measurements in order to compare results in a deeply way. The results of the research have been redacted and published in forums, conferences, and journals specialized in the topics and context of the research. This thesis is presented as compendium of articles according the regulations in Universitat Politècnica de València. This thesis document introduces the topics, context, and objectives of the research, presents the academic publications that have been published as a result of the work, and then discusses the outcomes of the investigation.Ballarin Naya, M. (2021). Definition of Descriptive and Diagnostic Measurements for Model Fragment Retrieval [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/171604TESISCompendi

    Software Evolution Understanding: Automatic Extraction of Software Identifiers Map for Object-Oriented Software Systems

    Get PDF
    Software companies usually develop a set of product variants within the same family that share certain functions and differ in others. Variations across software variants occur to meet different customer requirements. Thus, software product variants evolve overtime to cope with new requirements. A software engineer who deals with this family may find it difficult to understand the evolution scenarios that have taken place over time. In addition, software identifier names are important resources to understand the evolution scenarios in this family. This paper introduces an automatic approach called Juana’s approach to detect the evolution scenario across two product variants at the source code level and identifies the common and unique software identifier names across software variants source code. Juana’s approach refers to common and unique identifier names as a software identifiers map and computes it by comparing software variants to each other. Juana considers all software identifier names such as package, class, attribute, and method. The novelty of this approach is that it exploits common and unique identifier names across the source code of software variants, to understand the evolution scenarios across software family in an efficient way. For validity, Juana was applied on ArgoUML and Mobile Media software variants. The results of this evaluation validate the relevance and the performance of the approach as all evolution scenarios were correctly detected via a software identifiers map
    corecore