
6 research outputs found

Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification

Author: Bui Nghi
Jiang Lingxiao
Yu Yijun
Publication venue: IEEE Computer Society
Publication date: 01/01/2019

Algorithm classification aims to automatically identify the classes of a program based on the algorithm(s) and/or data structure(s) implemented in the program. It can be useful for various tasks, such as code reuse, code theft detection, and malware detection. Code similarity metrics, on the basis of features extracted from syntax and semantics, have been used to classify programs. Such features, however, often need manual selection effort and are specific to individual programming languages, limiting the classifiers to programs in the same language. To recognise the similarities and differences among algorithms implemented in different languages, this paper describes a framework of Bilateral Neural Networks (Bi-NN) that builds a neural network on top of two underlying sub-networks, each of which encodes syntax and semantics of code in one language. A whole Bi-NN can be trained with bilateral programs that implement the same algorithms and/or data structures in different languages and then be applied to recognise algorithm classes across languages. We have instantiated the framework with several kinds of token-, tree- and graph-based neural networks that encode and learn various kinds of information in code. We have applied the instances of the framework to a code corpus collected from GitHub containing thousands of Java and C++ programs implementing 50 different algorithms and data structures. Our evaluation results show that the use of Bi-NN indeed produces promising algorithm classification results both within one language and across languages, and the encoding of dependencies from code into the underlying neural networks helps improve algorithm classification accuracy further. In particular, our custom-built dependency trees with tree-based convolutional neural networks achieve the highest classification accuracy among the different instances of the framework that we have evaluated. Our study points to a possible future research direction to tailor bilateral and multilateral neural networks that encode more relevant semantics for code learning, mining and analysis tasks.

Crossref

Open Research Online (The Open University)

SAR: Learning Cross-Language API Mappings with Little Knowledge

Author: Ganin Yaroslav
Nguyen Anh Tuan
Sutskever Ilya
Wang Yaxing
Wang Zhiguo
Xing Chao
Yan S.
Řehůřek Radim
Publication venue: Association for Computing Machinery (ACM)
Publication date: 10/06/2019

To save effort, developers often translate programs from one programming language to another, instead of implementing them from scratch. Translating application program interfaces (APIs) used in one language to functionally equivalent ones available in another language is an important aspect of program translation. Existing approaches facilitate the translation by automatically identifying the API mappings across programming languages. However, these approaches still require a large number of parallel corpora, ranging from pairs of APIs or code fragments that are functionally equivalent, to similar code comments. To minimize the need for parallel corpora, this paper aims at an automated approach that can map APIs across languages with much less a priori knowledge than other approaches. Our approach is based on a realization of the notion of domain adaptation, combined with code embedding, to better align two vector spaces. Taking as input large sets of programs, our approach first generates numeric vector representations of the programs (including the APIs used in each language), and it adapts generative adversarial networks (GAN) to align the vectors in different spaces of two languages. For better alignment, we initialize the GAN with parameters derived from API mapping seeds that can be identified accurately with a simple automatic signature-based matching heuristic. Then the cross-language API mappings can be identified via nearest-neighbor queries in the aligned vector spaces. We have implemented the approach (SAR, named after three main technical components, Seeding, Adversarial training, and Refinement) in a prototype for mapping APIs across Java and C# programs. Our evaluation on about 2 million Java files and 1 million C# files shows that the approach can achieve 48% and 78% mapping accuracy in its top-1 and top-10 API mapping results respectively, with only 174 automatically identified seeds, which is more accurate than other approaches using the same or many more mapping seeds.

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

Open Research Online (The Open University)

Bilateral dependency neural networks for cross-language algorithm classification

Author: BUI Duy Quoc Nghi
JIANG Lingxiao
YU Yijun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/02/2019

Institutional Knowledge at Singapore Management University

Novel deep learning methods combined with static analysis for source code processing

Author: BUI Duy Quoc Nghi
Publication venue: Singapore Management University
Publication date: 01/08/2020

Institutional Knowledge at Singapore Management University

A framework to manage sensitive information during its migration between software platforms

Author: Ajigini Olusegun Ademolu
Publication date: 01/06/2016

Software migrations are mostly performed by organisations using migration teams. Such migration teams need to be aware of how sensitive information ought to be handled and protected during the implementation of the migration projects. There is a need to ensure that sensitive information is identified, classified and protected during the migration process. This thesis suggests how sensitive information in organisations can be handled and protected during migrations by using the migration from proprietary software to open source software to develop a management framework that can be used to manage such a migration process. A rudimentary management framework on information sensitivity during software migrations and a model on the security challenges during open source migrations are utilised to propose a preliminary management framework using a sequential explanatory mixed methods case study. The preliminary management framework resulting from the quantitative data analysis is enhanced and validated to conceptualise the final management framework on information sensitivity during software migrations at the end of the qualitative data analysis. The final management framework is validated and found to be significant, valid and reliable by using statistical techniques such as Exploratory Factor Analysis, reliability analysis and multivariate analysis, as well as a qualitative coding process. Information Science. D. Litt. et Phil. (Information Systems)

Unisa Institutional Repository

Statistical learning of API mappings for language migration

Author: Chow K.
Koehn P.
Meng S.
Publication venue: Association for Computing Machinery (ACM)

Crossref