Cross-Language Learning for Program Classification using Bilateral Tree-Based Convolutional Neural Networks
Towards the vision of translating code that implements an algorithm from one programming language into another, this paper proposes an approach for automated program classification using bilateral tree-based convolutional neural networks (BiTBCNNs). It is layered on top of two tree-based convolutional neural networks (TBCNNs), each of which recognizes the algorithm of code written in an individual programming language. The combination layer of the networks recognizes the similarities and differences among code in different programming languages. The BiTBCNNs are trained using source code in different languages that is known to implement the same algorithms and/or functionalities. For a preliminary evaluation, we use 3591 Java and 3534 C++ code snippets covering 6 algorithms, crawled systematically from GitHub. We obtained over 90% accuracy in the cross-language binary classification task of telling whether two given code snippets implement the same algorithm. Also, for the algorithm classification task, i.e., predicting which of the six algorithm labels is implemented by an arbitrary C++ code snippet, we achieved over 80% precision.
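The bilateral structure described in the abstract (two per-language encoders plus a combination layer that compares their outputs) can be sketched at toy scale. This is a hypothetical stand-in, not the trained model: the `encode` function replaces each learned TBCNN branch with a simple bag-of-tokens count vector, and the combination layer is reduced to a cosine-similarity threshold.

```python
import math

def build_vocab(*token_lists):
    """Shared token vocabulary; a trained BiTBCNN would instead learn
    embeddings over AST nodes of each language (toy setup)."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab):
    # Toy per-language encoder standing in for one TBCNN branch:
    # a bag-of-tokens count vector.
    vec = [0.0] * len(vocab)
    for tok in tokens:
        vec[vocab[tok]] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def same_algorithm(tokens_a, tokens_b, threshold=0.5):
    # Combination layer: compare the two branch encodings.
    vocab = build_vocab(tokens_a, tokens_b)
    return cosine(encode(tokens_a, vocab), encode(tokens_b, vocab)) >= threshold

java_bubble = ["for", "i", "for", "j", "if", "swap", "arr"]
cpp_bubble  = ["for", "i", "for", "j", "if", "swap", "arr"]
cpp_search  = ["while", "lo", "hi", "mid", "return"]
print(same_algorithm(java_bubble, cpp_bubble))  # True: identical token bags
print(same_algorithm(java_bubble, cpp_search))  # False: disjoint token bags
```

The point of the sketch is only the data flow: each snippet is encoded independently in its own language's branch, and only the fixed-size encodings meet in the combination layer.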
AutoFocus: Interpreting Attention-based Neural Networks by Code Perturbation
Despite being adopted in software engineering tasks, deep neural networks are treated mostly as black boxes due to the difficulty of interpreting how the networks infer outputs from inputs. To address this problem, we propose AutoFocus, an automated approach for rating and visualizing the importance of input elements based on their effects on the outputs of the networks. The approach is built on our hypotheses that (1) attention mechanisms incorporated into neural networks can generate discriminative scores for various input elements and (2) the discriminative scores reflect the effects of input elements on the outputs of the networks. This paper verifies the hypotheses by applying AutoFocus to the task of algorithm classification (i.e., given program source code as input, determine the algorithm implemented by the program). AutoFocus systematically identifies and perturbs code elements in a program and quantifies the effects of the perturbed elements on the network's classification results. Based on evaluation on more than 1000 programs for 10 different sorting algorithms, we observe that the attention scores are highly correlated with the effects of the perturbed code elements. Such a correlation provides a strong basis for the use of attention scores to interpret the relations between code elements and the algorithm classification results of a neural network, and we believe that visualizing code elements in an input program ranked according to their attention scores can facilitate faster program comprehension with a reduced amount of code to read.
DocChecker: Bootstrapping Code Large Language Model for Detecting and Resolving Code-Comment Inconsistencies
Comments within source code are essential for developers to comprehend the code's purpose and ensure its correct usage. However, as codebases evolve, maintaining an accurate alignment between comments and code becomes increasingly challenging. Despite growing interest in automated solutions for detecting and correcting discrepancies between code and its accompanying comments, current methods rely primarily on heuristic rules. In contrast, this paper presents DocChecker, a tool powered by deep learning. DocChecker is adept at identifying inconsistencies between code and comments, and it can also generate synthetic comments. This capability enables the tool to detect and correct instances where comments do not accurately reflect their corresponding code segments. We demonstrate the effectiveness of DocChecker on the Just-In-Time and CodeXGlue datasets in different settings. In particular, DocChecker achieves a new state-of-the-art result of 72.3% accuracy on the Inconsistency Code-Comment Detection (ICCD) task and 33.64 BLEU-4 on the code summarization task against other Large Language Models (LLMs), even surpassing GPT-3.5 and CodeLlama.
DocChecker is accessible for use and evaluation. It can be found on our GitHub at https://github.com/FSoft-AI4Code/DocChecker and as an online tool at http://4.193.50.237:5000/. For a more comprehensive understanding of its functionality, a demonstration video is available on YouTube: https://youtu.be/FqnPmd531xw
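To make the task concrete, here is a sketch of the kind of heuristic-rule baseline the abstract contrasts DocChecker against, not DocChecker itself: flag a comment as inconsistent when too few of its words appear among the code's identifiers. The function names and the 0.3 threshold are illustrative assumptions.

```python
import re

def comment_code_overlap(comment, code):
    """Fraction of comment words that also appear in code identifiers
    (after splitting snake_case / camelCase)."""
    comment_words = set(re.findall(r"[a-z]+", comment.lower()))
    code_words = set()
    for ident in re.findall(r"[A-Za-z_]\w*", code):
        for part in re.split(r"_+|(?<=[a-z])(?=[A-Z])", ident):
            if part:
                code_words.add(part.lower())
    if not comment_words:
        return 0.0
    return len(comment_words & code_words) / len(comment_words)

def is_inconsistent(comment, code, threshold=0.3):
    # Heuristic rule: low lexical overlap suggests a stale comment.
    return comment_code_overlap(comment, code) < threshold

code = "def compute_sum(items):\n    return sum(items)"
print(is_inconsistent("compute sum of items", code))      # False: matches
print(is_inconsistent("sort the array ascending", code))  # True: stale comment
```

A lexical rule like this misses paraphrases and semantic drift, which is precisely the gap a learned model such as DocChecker is meant to close.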
Biological activities of in vitro liverwort Marchantia polymorpha L. extracts
To overcome problems in collecting liverworts in the wild, such as their small size and easy admixture with other species, we successfully cultivated Marchantia polymorpha L. under in vitro conditions in a previous study. The aim of this study is to evaluate the biological activities of this in vitro biomass, as confirmation that the cultivation protocol for this species is sufficient. Cultured biomass was dried at 45–50 °C to constant weight and ground into a fine powder. The powder was extracted with organic solvents of increasing polarity, namely n-hexane, chloroform, ethyl acetate, and ethanol, using the maceration technique. The four extracts were investigated for antioxidant activity (iron-reducing power, DPPH), antibacterial activity (agar diffusion), tyrosinase inhibitory activity, and anti-proliferative activity against MCF-7 cells. Additionally, the natural metabolite groups present in the extracts were detected using specific reagents. For antioxidant activity, the ethyl acetate extract had the highest iron-reducing power and DPPH free radical scavenging ability, with IC50 = 439.31 µg ml-1. The n-hexane, chloroform, and ethyl acetate extracts all showed activity against the bacterial strains tested. At a concentration of 2 mg ml-1, the n-hexane and chloroform extracts had the highest percentages of tyrosinase inhibition (69.54% and 69.10%, respectively). The n-hexane extract potently inhibited the proliferation of MCF-7 cells, with the lowest IC50 of 38.15 µg ml-1. A preliminary chemical composition survey showed that the cultured liverwort biomass contains many bioactive compounds, particularly in the non- and less-polar fractions.
Preparation and Foliar Application of Oligochitosan - Nanosilica on the Enhancement of Soybean Seed Yield
Oligochitosan with a weight-average molecular weight (Mw) of 5000 g/mol was prepared by gamma Co-60 radiation degradation of a 4% chitosan solution containing 0.5% H2O2 at 21 kGy. Nanosilica with a size of 10–30 nm was synthesized by calcination of acid-treated rice husk at 700 °C for 2 h. A mixture of 2% oligochitosan and 2% nanosilica was prepared by dispersing nanosilica in the oligochitosan solution. The oligochitosan, nanosilica, and their mixture were characterized by gel permeation chromatography (GPC), transmission electron microscopy (TEM), X-ray diffraction (XRD), energy-dispersive X-ray spectroscopy (EDX), ultraviolet-visible spectroscopy (UV-Vis), and Fourier transform infrared spectroscopy (FT-IR). The effect of foliar application of oligochitosan and oligochitosan-nanosilica on soybean seed yield was studied in an experimental field. Results indicated that soybean seed yield increased by 10.5% and 17.0% for oligochitosan and oligochitosan-nanosilica, respectively, relative to the control. Radiation-degraded oligochitosan and its mixture with nanosilica can potentially be used in soybean cultivation to enhance seed yield.
Zero-shot Object-Level OOD Detection with Context-Aware Inpainting
Machine learning algorithms are increasingly provided as black-box cloud services or pre-trained models, without access to their training data. This motivates the problem of zero-shot out-of-distribution (OOD) detection. Concretely, we aim to detect OOD objects that do not belong to the classifier's label set but are erroneously classified as in-distribution (ID) objects. Our approach, RONIN, uses an off-the-shelf diffusion model to replace detected objects via inpainting. RONIN conditions the inpainting process on the predicted ID label, drawing the input object closer to the in-distribution domain. As a result, the reconstructed object is very close to the original in the ID cases and far from it in the OOD cases, allowing RONIN to effectively distinguish ID and OOD samples. Through extensive experiments, we demonstrate that RONIN achieves competitive results compared to previous approaches across several datasets, in both zero-shot and non-zero-shot settings.
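The scoring logic the abstract describes (reconstruct the object conditioned on its predicted label, then compare the reconstruction to the original) can be sketched with feature vectors standing in for image regions. Everything below is a hypothetical toy: `inpaint_conditioned` replaces the diffusion model with a linear pull toward a label prototype, and the prototype, threshold, and feature vectors are made up for illustration.

```python
import math

def inpaint_conditioned(region, label_prototype, strength=1.0):
    """Toy stand-in for diffusion inpainting conditioned on the predicted
    ID label: pull the object's features toward the label prototype."""
    return [p + strength * (q - p) for p, q in zip(region, label_prototype)]

def ood_score(region, label_prototype):
    recon = inpaint_conditioned(region, label_prototype)
    # Distance between original object and label-conditioned reconstruction:
    # small for true ID objects, large for OOD objects.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(region, recon)))

def is_ood(region, label_prototype, threshold=1.0):
    return ood_score(region, label_prototype) > threshold

dog_prototype = [1.0, 0.0, 0.0]   # hypothetical "dog" feature prototype
id_object     = [0.9, 0.1, 0.0]   # genuinely close to the predicted label
ood_object    = [0.0, 0.0, 2.0]   # mislabeled as "dog", far from it
print(is_ood(id_object, dog_prototype))   # False
print(is_ood(ood_object, dog_prototype))  # True
```

The design choice the sketch captures is that no OOD training data is needed: the score is derived entirely from the classifier's own predicted label and a generative reconstruction.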
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
Code intelligence plays a key role in transforming modern software engineering. Recently, deep learning-based models, especially Transformer-based large language models (LLMs), have demonstrated remarkable potential in tackling these tasks by leveraging massive open-source code data and programming language features. However, the development and deployment of such models often require expertise in both machine learning and software engineering, creating a barrier to model adoption. In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence. Following the principles of modular design and an extensible framework, we design CodeTF with a unified interface to enable rapid access and development across different types of models, datasets, and tasks. Our library supports a collection of pretrained Code LLM models and popular code benchmarks, including a standardized interface to train and serve code LLMs efficiently, and data features such as language-specific parsers and utility functions for extracting code attributes. In this paper, we describe the design principles, the architecture, key modules and components, and compare CodeTF with other related library tools. Finally, we hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering, providing a comprehensive open-source solution for developers, researchers, and practitioners.
Comment: Ongoing work - Draft Preview
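The "unified interface" design principle the abstract emphasizes is commonly realized as a registry that maps model names to loader functions. The sketch below illustrates that pattern in miniature; the class, names, and decorator API are hypothetical and are not CodeTF's actual interface.

```python
class ModelRegistry:
    """Hypothetical registry in the spirit of a unified model interface;
    names and signatures are illustrative, not CodeTF's real API."""

    def __init__(self):
        self._loaders = {}

    def register(self, name):
        # Decorator that records a loader under a model name.
        def decorator(fn):
            self._loaders[name] = fn
            return fn
        return decorator

    def load(self, name, **kwargs):
        if name not in self._loaders:
            raise KeyError(f"unknown model: {name}")
        return self._loaders[name](**kwargs)

registry = ModelRegistry()

@registry.register("toy-summarizer")
def load_toy_summarizer():
    # Stand-in for loading a pretrained Code LLM checkpoint: "summarize"
    # a snippet by returning its first line, truncated to 40 characters.
    return lambda code: code.strip().splitlines()[0][:40]

model = registry.load("toy-summarizer")
print(model("def add(a, b):\n    return a + b"))  # def add(a, b):
```

Callers interact only with `registry.load(name)`, so new models, datasets, or tasks can be added behind the same entry point without changing client code, which is the adoption barrier the library aims to lower.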
- …