6 research outputs found

    Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification

    Algorithm classification aims to automatically identify the class of a program based on the algorithm(s) and/or data structure(s) it implements. It can be useful for various tasks, such as code reuse, code theft detection, and malware detection. Code similarity metrics based on features extracted from syntax and semantics have been used to classify programs. Such features, however, often require manual selection and are specific to individual programming languages, which limits the resulting classifiers to programs written in a single language. To recognise the similarities and differences among algorithms implemented in different languages, this paper describes a framework of Bilateral Neural Networks (Bi-NN) that builds a neural network on top of two underlying sub-networks, each of which encodes the syntax and semantics of code in one language. A whole Bi-NN can be trained with bilateral programs that implement the same algorithms and/or data structures in different languages and then applied to recognise algorithm classes across languages. We have instantiated the framework with several kinds of token-, tree- and graph-based neural networks that encode and learn various kinds of information in code. We have applied the instances of the framework to a code corpus collected from GitHub containing thousands of Java and C++ programs implementing 50 different algorithms and data structures. Our evaluation results show that Bi-NN produces promising algorithm classification results both within one language and across languages, and that encoding dependencies from code into the underlying neural networks further improves classification accuracy. In particular, our custom-built dependency trees combined with tree-based convolutional neural networks achieve the highest classification accuracy among the instances of the framework that we evaluated.
Our study points to a possible future research direction: tailoring bilateral and multilateral neural networks that encode more relevant semantics for code learning, mining, and analysis tasks.
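The bilateral structure described in this abstract (two language-specific sub-networks feeding one shared top network) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the simplest token-based instance, where each hypothetical `TokenEncoder` averages per-language token embeddings, and a hypothetical `BilateralNet` concatenates the two encodings and outputs, through one ReLU hidden layer and a sigmoid, a score that the two programs implement the same algorithm. The pairwise same/different output, the layer sizes, and all names are illustrative assumptions, and training on cross-language program pairs is omitted, so the random weights give meaningless scores.

```python
import math
import random


class TokenEncoder:
    """Hypothetical per-language sub-network: the mean of learned token embeddings."""

    def __init__(self, vocab, dim, rng):
        self.dim = dim
        # One small random embedding per token in this language's vocabulary.
        self.emb = {tok: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for tok in vocab}
        self.unk = [0.0] * dim  # out-of-vocabulary tokens contribute a zero vector

    def encode(self, tokens):
        if not tokens:
            return [0.0] * self.dim
        vecs = [self.emb.get(t, self.unk) for t in tokens]
        return [sum(col) / len(vecs) for col in zip(*vecs)]


class BilateralNet:
    """Hypothetical top network built over one sub-network per language."""

    def __init__(self, left, right, hidden, rng):
        self.left, self.right = left, right
        in_dim = left.dim + right.dim
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, left_tokens, right_tokens):
        # Concatenate the two language-specific encodings into one feature vector.
        x = self.left.encode(left_tokens) + self.right.encode(right_tokens)
        # Hidden layer with ReLU activation.
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        # Sigmoid: score that both programs implement the same algorithm.
        return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
java_enc = TokenEncoder(["for", "int", "swap", "while", "return"], dim=8, rng=rng)
cpp_enc = TokenEncoder(["for", "int", "std::swap", "while", "return"], dim=8, rng=rng)
net = BilateralNet(java_enc, cpp_enc, hidden=16, rng=rng)
score = net.forward(["for", "int", "swap"], ["for", "std::swap", "while"])
print(f"same-algorithm score (untrained): {score:.3f}")
```

Keeping the encoders separate from the top network means a tree- or graph-based encoder could replace `TokenEncoder` without changing `BilateralNet`, which mirrors how the abstract describes instantiating the framework with different kinds of sub-networks.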

    SAR: Learning Cross-Language API Mappings with Little Knowledge

    To save effort, developers often translate programs from one programming language to another instead of implementing them from scratch. Translating the application programming interfaces (APIs) used in one language into functionally equivalent ones available in another is an important aspect of program translation. Existing approaches facilitate this translation by automatically identifying API mappings across programming languages. However, they still require large parallel corpora, ranging from pairs of functionally equivalent APIs or code fragments to similar code comments. To minimize the need for parallel corpora, this paper proposes an automated approach that can map APIs across languages with much less a priori knowledge than other approaches. Our approach realizes the notion of domain adaptation, combined with code embedding, to better align two vector spaces. Taking large sets of programs as input, it first generates numeric vector representations of the programs (including the APIs used in each language) and then adapts generative adversarial networks (GANs) to align the vectors in the different spaces of the two languages. For better alignment, we initialize the GAN with parameters derived from API mapping seeds that can be identified accurately with a simple, automatic, signature-based matching heuristic. The cross-language API mappings can then be identified via nearest-neighbor queries in the aligned vector spaces. We have implemented the approach (SAR, named after its three main technical components: Seeding, Adversarial training, and Refinement) in a prototype for mapping APIs across Java and C# programs.
Our evaluation on about 2 million Java files and 1 million C# files shows that the approach achieves 48% and 78% mapping accuracy in its top-1 and top-10 API mapping results, respectively, with only 174 automatically identified seeds, which is more accurate than other approaches that use the same number of or many more mapping seeds.
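The last step of this approach (nearest-neighbor queries in the aligned space) and the top-k accuracy used to report the results can be sketched in Python as below. This is a sketch under stated assumptions: the vectors are toy values standing in for embeddings that the GAN has already aligned into one space, the API names and the `gold` mapping are hypothetical, and cosine similarity is assumed as the nearness measure since the abstract does not name one. Seeding, adversarial training, and refinement are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_mappings(query_vec, target_space, k):
    """Rank target-language APIs by similarity to query_vec and keep the best k."""
    ranked = sorted(target_space,
                    key=lambda api: cosine(query_vec, target_space[api]),
                    reverse=True)
    return ranked[:k]


def top_k_accuracy(source_space, target_space, gold, k):
    """Fraction of source APIs whose expected mapping is among their k nearest neighbors."""
    hits = sum(1 for api, expected in gold.items()
               if expected in top_k_mappings(source_space[api], target_space, k))
    return hits / len(gold)


# Toy, hypothetical vectors assumed to be already aligned into one space.
java_apis = {
    "String.length": [1.0, 0.1, 0.0],
    "List.add": [0.0, 1.0, 0.1],
    "Map.put": [0.1, 0.9, 0.4],
}
csharp_apis = {
    "String.Length": [0.9, 0.2, 0.0],
    "List.Add": [0.1, 0.9, 0.3],
    "Dictionary.Add": [0.0, 0.8, 0.9],
}
gold = {"String.length": "String.Length", "List.add": "List.Add", "Map.put": "Dictionary.Add"}

acc1 = top_k_accuracy(java_apis, csharp_apis, gold, k=1)
acc2 = top_k_accuracy(java_apis, csharp_apis, gold, k=2)
print(f"top-1: {acc1:.2f}, top-2: {acc2:.2f}")  # prints "top-1: 0.67, top-2: 1.00"
```

In this toy data the nearest neighbor of `Map.put` is `List.Add`, so it misses at top-1 but its correct mapping appears at top-2; the same effect explains why a top-10 accuracy is higher than a top-1 accuracy.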

    A framework to manage sensitive information during its migration between software platforms

    Software migrations are mostly performed by organisations using migration teams. Such teams need to be aware of how sensitive information ought to be handled and protected during the implementation of migration projects, and there is a need to ensure that sensitive information is identified, classified and protected throughout the migration process. Using the migration from proprietary software to open source software as its setting, this thesis develops a management framework that suggests how organisations can handle and protect sensitive information during such migrations. A rudimentary management framework on information sensitivity during software migrations and a model of the security challenges during open source migrations are used to propose a preliminary management framework through a sequential explanatory mixed methods case study. The preliminary framework resulting from the quantitative data analysis is enhanced and validated during the qualitative data analysis, yielding the final management framework on information sensitivity during software migrations. The final framework is validated and found to be significant, valid and reliable using statistical techniques such as Exploratory Factor Analysis, reliability analysis and multivariate analysis, as well as a qualitative coding process.
    Information Science, D. Litt. et Phil. (Information Systems)

    Statistical learning of API mappings for language migration
