Cross-Language Learning for Program Classification using Bilateral Tree-Based Convolutional Neural Networks
Towards the vision of translating code that implements an algorithm from one programming language into another, this paper proposes an approach for automated program classification using bilateral tree-based convolutional neural networks (BiTBCNNs). It is layered on top of two tree-based convolutional neural networks (TBCNNs), each of which recognizes the algorithm of code written in an individual programming language. The combination layer of the networks recognizes the similarities and differences among code in different programming languages. The BiTBCNNs are trained using source code in different languages that is known to implement the same algorithms and/or functionalities. For a preliminary evaluation, we use 3591 Java and 3534 C++ code snippets covering 6 algorithms, crawled systematically from GitHub. We obtained over 90% accuracy in the cross-language binary classification task of telling whether two given code snippets implement the same algorithm. Also, for the algorithm classification task, i.e., predicting which of the six algorithm labels is implemented by an arbitrary C++ code snippet, we achieved over 80% precision.
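The bilateral structure described in the abstract (two per-language encoders plus a combination layer that compares their outputs) can be sketched at toy scale. This is a hypothetical stand-in, not the trained model: the `encode` function replaces each learned TBCNN branch with a simple bag-of-tokens count vector, and the combination layer is reduced to a cosine-similarity threshold.

```python
import math

def build_vocab(*token_lists):
    """Shared token vocabulary; a trained BiTBCNN would instead learn
    embeddings over AST nodes of each language (toy setup)."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab):
    # Toy per-language encoder standing in for one TBCNN branch:
    # a bag-of-tokens count vector.
    vec = [0.0] * len(vocab)
    for tok in tokens:
        vec[vocab[tok]] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def same_algorithm(tokens_a, tokens_b, threshold=0.5):
    # Combination layer: compare the two branch encodings.
    vocab = build_vocab(tokens_a, tokens_b)
    return cosine(encode(tokens_a, vocab), encode(tokens_b, vocab)) >= threshold

java_bubble = ["for", "i", "for", "j", "if", "swap", "arr"]
cpp_bubble  = ["for", "i", "for", "j", "if", "swap", "arr"]
cpp_search  = ["while", "lo", "hi", "mid", "return"]
print(same_algorithm(java_bubble, cpp_bubble))  # True: identical token bags
print(same_algorithm(java_bubble, cpp_search))  # False: disjoint token bags
```

The point of the sketch is only the data flow: each snippet is encoded independently in its own language's branch, and only the fixed-size encodings meet in the combination layer.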
AutoFocus: Interpreting Attention-based Neural Networks by Code Perturbation
Despite being adopted in software engineering tasks, deep neural networks are treated mostly as black boxes due to the difficulty of interpreting how the networks infer outputs from inputs. To address this problem, we propose AutoFocus, an automated approach for rating and visualizing the importance of input elements based on their effects on the outputs of the networks. The approach is built on our hypotheses that (1) attention mechanisms incorporated into neural networks can generate discriminative scores for various input elements and (2) the discriminative scores reflect the effects of input elements on the outputs of the networks. This paper verifies the hypotheses by applying AutoFocus to the task of algorithm classification (i.e., given program source code as input, determine the algorithm implemented by the program). AutoFocus systematically identifies and perturbs code elements in a program and quantifies the effects of the perturbed elements on the network's classification results. Based on evaluation on more than 1000 programs for 10 different sorting algorithms, we observe that the attention scores are highly correlated with the effects of the perturbed code elements. Such a correlation provides a strong basis for the use of attention scores to interpret the relations between code elements and the algorithm classification results of a neural network, and we believe that visualizing code elements in an input program ranked according to their attention scores can facilitate faster program comprehension with a reduced amount of code to read.
DocChecker: Bootstrapping Code Large Language Model for Detecting and Resolving Code-Comment Inconsistencies
Comments within source code are essential for developers to comprehend the code's purpose and ensure its correct usage. However, as codebases evolve, maintaining an accurate alignment between comments and code becomes increasingly challenging. Despite growing interest in automated solutions for detecting and correcting discrepancies between code and its accompanying comments, current methods rely primarily on heuristic rules. In contrast, this paper presents DocChecker, a tool powered by deep learning. DocChecker is adept at identifying inconsistencies between code and comments, and it can also generate synthetic comments. This capability enables the tool to detect and correct instances where comments do not accurately reflect their corresponding code segments. We demonstrate the effectiveness of DocChecker on the Just-In-Time and CodeXGlue datasets in different settings. In particular, DocChecker achieves a new state-of-the-art result of 72.3% accuracy on the Inconsistency Code-Comment Detection (ICCD) task and 33.64 BLEU-4 on the code summarization task against other Large Language Models (LLMs), even surpassing GPT-3.5 and CodeLlama.
DocChecker is accessible for use and evaluation. It can be found on our GitHub at https://github.com/FSoft-AI4Code/DocChecker and as an online tool at http://4.193.50.237:5000/. For a more comprehensive understanding of its functionality, a demonstration video is available on YouTube: https://youtu.be/FqnPmd531xw
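To make the task concrete, here is a sketch of the kind of heuristic-rule baseline the abstract contrasts DocChecker against, not DocChecker itself: flag a comment as inconsistent when too few of its words appear among the code's identifiers. The function names and the 0.3 threshold are illustrative assumptions.

```python
import re

def comment_code_overlap(comment, code):
    """Fraction of comment words that also appear in code identifiers
    (after splitting snake_case / camelCase)."""
    comment_words = set(re.findall(r"[a-z]+", comment.lower()))
    code_words = set()
    for ident in re.findall(r"[A-Za-z_]\w*", code):
        for part in re.split(r"_+|(?<=[a-z])(?=[A-Z])", ident):
            if part:
                code_words.add(part.lower())
    if not comment_words:
        return 0.0
    return len(comment_words & code_words) / len(comment_words)

def is_inconsistent(comment, code, threshold=0.3):
    # Heuristic rule: low lexical overlap suggests a stale comment.
    return comment_code_overlap(comment, code) < threshold

code = "def compute_sum(items):\n    return sum(items)"
print(is_inconsistent("compute sum of items", code))      # False: matches
print(is_inconsistent("sort the array ascending", code))  # True: stale comment
```

A lexical rule like this misses paraphrases and semantic drift, which is precisely the gap a learned model such as DocChecker is meant to close.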
Biological activities of in vitro liverwort Marchantia polymorpha L. extracts
To overcome problems in collecting liverworts in the wild, such as their small size and easy admixture with other species, we successfully cultivated Marchantia polymorpha L. under in vitro conditions in a previous study. The aim of this study is to evaluate the biological activities of this in vitro biomass, as confirmation that the cultivation protocol for this species is sufficient. Cultured biomass was dried at 45–50 °C to constant weight and ground into a fine powder. The powder was extracted with organic solvents of increasing polarity, namely n-hexane, chloroform, ethyl acetate, and ethanol, using the maceration technique. The four extracts were investigated for antioxidant activity (iron-reducing power, DPPH), antibacterial activity (agar diffusion), tyrosinase inhibitory activity, and anti-proliferative activity against MCF-7 cells. Additionally, the natural metabolite groups present in the extracts were detected using specific reagents. For antioxidant activity, the ethyl acetate extract had the highest iron-reducing power and DPPH free radical scavenging ability, with IC50 = 439.31 µg ml-1. The n-hexane, chloroform, and ethyl acetate extracts all showed activity against the bacterial strains tested. At a concentration of 2 mg ml-1, the n-hexane and chloroform extracts had the highest percentages of tyrosinase inhibition (69.54% and 69.10%, respectively). The n-hexane extract potently inhibited the proliferation of MCF-7 cells, with the lowest IC50 of 38.15 µg ml-1. A preliminary chemical composition survey showed that the cultured liverwort biomass contains many bioactive compounds, particularly in the non- and less-polar fractions.
Preparation and Foliar Application of Oligochitosan - Nanosilica on the Enhancement of Soybean Seed Yield
Oligochitosan with a weight-average molecular weight (Mw) of 5000 g/mol was prepared by gamma Co-60 radiation degradation of a 4% chitosan solution containing 0.5% H2O2 at 21 kGy. Nanosilica with a size of 10–30 nm was synthesized by calcination of acid-treated rice husk at 700 °C for 2 h. A mixture of 2% oligochitosan and 2% nanosilica was prepared by dispersing nanosilica in the oligochitosan solution. The oligochitosan, nanosilica, and their mixture were characterized by gel permeation chromatography (GPC), transmission electron microscopy (TEM), X-ray diffraction (XRD), energy-dispersive X-ray spectroscopy (EDX), ultraviolet-visible spectroscopy (UV-Vis), and Fourier transform infrared spectroscopy (FT-IR). The effect of foliar application of oligochitosan and oligochitosan-nanosilica on soybean seed yield was studied in an experimental field. Results indicated that soybean seed yield increased by 10.5% and 17.0% for oligochitosan and oligochitosan-nanosilica, respectively, relative to the control. Radiation-degraded oligochitosan and its mixture with nanosilica can potentially be used in soybean cultivation to enhance seed yield.
Zero-shot Object-Level OOD Detection with Context-Aware Inpainting
Machine learning algorithms are increasingly provided as black-box cloud services or pre-trained models, without access to their training data. This motivates the problem of zero-shot out-of-distribution (OOD) detection. Concretely, we aim to detect OOD objects that do not belong to the classifier's label set but are erroneously classified as in-distribution (ID) objects. Our approach, RONIN, uses an off-the-shelf diffusion model to replace detected objects via inpainting. RONIN conditions the inpainting process on the predicted ID label, drawing the input object closer to the in-distribution domain. As a result, the reconstructed object is very close to the original in the ID cases and far from it in the OOD cases, allowing RONIN to effectively distinguish ID and OOD samples. Through extensive experiments, we demonstrate that RONIN achieves competitive results compared to previous approaches across several datasets, in both zero-shot and non-zero-shot settings.
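The scoring logic the abstract describes (reconstruct the object conditioned on its predicted label, then compare the reconstruction to the original) can be sketched with feature vectors standing in for image regions. Everything below is a hypothetical toy: `inpaint_conditioned` replaces the diffusion model with a linear pull toward a label prototype, and the prototype, threshold, and feature vectors are made up for illustration.

```python
import math

def inpaint_conditioned(region, label_prototype, strength=1.0):
    """Toy stand-in for diffusion inpainting conditioned on the predicted
    ID label: pull the object's features toward the label prototype."""
    return [p + strength * (q - p) for p, q in zip(region, label_prototype)]

def ood_score(region, label_prototype):
    recon = inpaint_conditioned(region, label_prototype)
    # Distance between original object and label-conditioned reconstruction:
    # small for true ID objects, large for OOD objects.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(region, recon)))

def is_ood(region, label_prototype, threshold=1.0):
    return ood_score(region, label_prototype) > threshold

dog_prototype = [1.0, 0.0, 0.0]   # hypothetical "dog" feature prototype
id_object     = [0.9, 0.1, 0.0]   # genuinely close to the predicted label
ood_object    = [0.0, 0.0, 2.0]   # mislabeled as "dog", far from it
print(is_ood(id_object, dog_prototype))   # False
print(is_ood(ood_object, dog_prototype))  # True
```

The design choice the sketch captures is that no OOD training data is needed: the score is derived entirely from the classifier's own predicted label and a generative reconstruction.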
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
Code intelligence plays a key role in transforming modern software engineering. Recently, deep learning-based models, especially Transformer-based large language models (LLMs), have demonstrated remarkable potential in tackling these tasks by leveraging massive open-source code data and programming language features. However, the development and deployment of such models often require expertise in both machine learning and software engineering, creating a barrier to model adoption. In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence. Following the principles of modular design and an extensible framework, we design CodeTF with a unified interface to enable rapid access and development across different types of models, datasets, and tasks. Our library supports a collection of pretrained Code LLM models and popular code benchmarks, including a standardized interface to train and serve code LLMs efficiently, and data features such as language-specific parsers and utility functions for extracting code attributes. In this paper, we describe the design principles, the architecture, key modules and components, and compare CodeTF with other related library tools. Finally, we hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering, providing a comprehensive open-source solution for developers, researchers, and practitioners.
Comment: Ongoing work - Draft Preview
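The "unified interface" design principle the abstract emphasizes is commonly realized as a registry that maps model names to loader functions. The sketch below illustrates that pattern in miniature; the class, names, and decorator API are hypothetical and are not CodeTF's actual interface.

```python
class ModelRegistry:
    """Hypothetical registry in the spirit of a unified model interface;
    names and signatures are illustrative, not CodeTF's real API."""

    def __init__(self):
        self._loaders = {}

    def register(self, name):
        # Decorator that records a loader under a model name.
        def decorator(fn):
            self._loaders[name] = fn
            return fn
        return decorator

    def load(self, name, **kwargs):
        if name not in self._loaders:
            raise KeyError(f"unknown model: {name}")
        return self._loaders[name](**kwargs)

registry = ModelRegistry()

@registry.register("toy-summarizer")
def load_toy_summarizer():
    # Stand-in for loading a pretrained Code LLM checkpoint: "summarize"
    # a snippet by returning its first line, truncated to 40 characters.
    return lambda code: code.strip().splitlines()[0][:40]

model = registry.load("toy-summarizer")
print(model("def add(a, b):\n    return a + b"))  # def add(a, b):
```

Callers interact only with `registry.load(name)`, so new models, datasets, or tasks can be added behind the same entry point without changing client code, which is the adoption barrier the library aims to lower.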
- …