
    Cross-Language Learning for Program Classification using Bilateral Tree-Based Convolutional Neural Networks

    Towards the vision of translating code that implements an algorithm from one programming language into another, this paper proposes an approach for automated program classification using bilateral tree-based convolutional neural networks (BiTBCNNs). It is layered on top of two tree-based convolutional neural networks (TBCNNs), each of which recognizes the algorithm implemented by code written in a particular programming language. The combination layer of the networks captures the similarities and differences among code in different programming languages. The BiTBCNNs are trained on source code written in different languages that is known to implement the same algorithms and/or functionalities. For a preliminary evaluation, we use 3591 Java and 3534 C++ code snippets covering 6 algorithms, crawled systematically from GitHub. We obtained over 90% accuracy in the cross-language binary classification task of telling whether two given code snippets implement the same algorithm. For the algorithm classification task, i.e., predicting which of the six algorithm labels is implemented by an arbitrary C++ code snippet, we achieved over 80% precision.
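    The bilateral design described above amounts to two language-specific tree encoders whose outputs are fused by a combination layer that scores cross-language similarity. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the TBCNN encoders are stubbed as pooling-plus-projection modules, and all dimensions and layer names are assumptions made for illustration.

```python
# Illustrative sketch of a bilateral architecture (not the paper's code).
# Each "TBCNN" encoder is stubbed as pooling + projection over AST-node embeddings;
# the combination layer scores whether two snippets implement the same algorithm.
import torch
import torch.nn as nn

class TreeEncoder(nn.Module):
    """Placeholder for a tree-based convolutional encoder (one per language)."""
    def __init__(self, node_dim=128, code_dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(node_dim, code_dim), nn.ReLU())

    def forward(self, node_embeddings):              # (batch, nodes, node_dim)
        pooled = node_embeddings.max(dim=1).values   # dynamic pooling over AST nodes
        return self.proj(pooled)                     # (batch, code_dim)

class BiTBCNN(nn.Module):
    """Two language-specific encoders plus a combination layer."""
    def __init__(self, node_dim=128, code_dim=256):
        super().__init__()
        self.enc_java = TreeEncoder(node_dim, code_dim)
        self.enc_cpp = TreeEncoder(node_dim, code_dim)
        self.combine = nn.Sequential(
            nn.Linear(2 * code_dim, 128), nn.ReLU(),
            nn.Linear(128, 1)                        # logit: same algorithm or not
        )

    def forward(self, java_nodes, cpp_nodes):
        j = self.enc_java(java_nodes)
        c = self.enc_cpp(cpp_nodes)
        return self.combine(torch.cat([j, c], dim=-1)).squeeze(-1)

# Usage with random stand-in node features:
model = BiTBCNN()
logits = model(torch.randn(4, 50, 128), torch.randn(4, 60, 128))
print(torch.sigmoid(logits))  # probability each pair implements the same algorithm
```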

    DocChecker: Bootstrapping Code Large Language Model for Detecting and Resolving Code-Comment Inconsistencies

    Comments within source code are essential for developers to comprehend the code's purpose and ensure its correct usage. However, as codebases evolve, keeping the comments accurately aligned with the code becomes increasingly challenging. Despite growing interest in automated solutions for detecting and correcting inconsistencies between code and its accompanying comments, current methods rely primarily on heuristic rules. In contrast, this paper presents DocChecker, a tool powered by deep learning. DocChecker identifies inconsistencies between code and comments and can also generate synthetic comments, which enables it to detect and correct instances where comments do not accurately reflect their corresponding code segments. We demonstrate the effectiveness of DocChecker on the Just-In-Time and CodeXGlue datasets in different settings. In particular, DocChecker achieves a new state-of-the-art result of 72.3% accuracy on the Inconsistency Code-Comment Detection (ICCD) task and 33.64 BLEU-4 on the code summarization task against other Large Language Models (LLMs), surpassing even GPT-3.5 and CodeLlama. DocChecker is available for use and evaluation on our GitHub https://github.com/FSoft-AI4Code/DocChecker and as an online tool http://4.193.50.237:5000/. A demonstration video is available on YouTube https://youtu.be/FqnPmd531xw
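    The inconsistency-detection task (ICCD) described above can be framed as sequence-pair classification over (comment, code) inputs. The sketch below illustrates that framing only; it is not DocChecker's API, and the CodeBERT checkpoint, label mapping, and untrained classification head are assumptions used purely for demonstration.

```python
# Illustrative framing of code-comment inconsistency detection as pair classification.
# NOT DocChecker's API: the checkpoint and label mapping below are assumptions,
# and the classification head is randomly initialized (fine-tuning is required).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2  # assumed: 0 = consistent, 1 = inconsistent
)

comment = "Returns the list sorted in descending order."
code = "def sort_items(items):\n    return sorted(items)"  # actually ascending

# Encode the (comment, code) pair and score it with the classifier head.
inputs = tokenizer(comment, code, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(-1))  # pair-level probabilities; meaningful only after fine-tuning
```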

    Biological activities of in vitro liverwort Marchantia polymorpha L. extracts

    To overcome problems in collecting liverworts from the wild, such as their small size and easy mixing with other species, we successfully cultivated Marchantia polymorpha L. under in vitro conditions in a previous study. The aim of this study is to evaluate the biological activities of this in vitro biomass, thereby confirming that the cultivation protocol for this species is adequate. Cultured biomass was dried at 45-50 °C to constant weight and ground into a powder, which was extracted by maceration with organic solvents of increasing polarity: n-hexane, chloroform, ethyl acetate, and ethanol. The four extracts were investigated for antioxidant activity (iron-reducing power, DPPH), antibacterial activity (agar diffusion), tyrosinase inhibitory activity, and anti-proliferative activity against MCF-7 cells. Additionally, the natural metabolite groups present in the extracts were detected using specific reagents. For antioxidant activity, the ethyl acetate extract had the highest iron-reducing power and DPPH free-radical scavenging ability, with IC50 = 439.31 µg/ml. The n-hexane, chloroform, and ethyl acetate extracts all showed activity against the bacterial strain tested. At a concentration of 2 mg/ml, the n-hexane and chloroform extracts had the highest percentages of tyrosinase inhibition (69.54% and 69.10%, respectively). The n-hexane extract most potently inhibited the proliferation of MCF-7 cells, with the lowest IC50 of 38.15 µg/ml. A preliminary survey of the chemical composition showed that the cultured liverwort biomass contains many bioactive compounds, particularly in the non-polar and less polar fractions.

    Preparation and Foliar Application of Oligochitosan - Nanosilica on the Enhancement of Soybean Seed Yield

    Oligochitosan with a weight-average molecular weight (Mw) of 5000 g/mol was prepared by gamma Co-60 radiation degradation of a 4% chitosan solution containing 0.5% H2O2 at 21 kGy. Nanosilica with a size of 10-30 nm was synthesized by calcination of acid-treated rice husk at 700 °C for 2 h. A mixture of 2% oligochitosan-2% nanosilica was prepared by dispersing nanosilica in the oligochitosan solution. Oligochitosan, nanosilica, and their mixture were characterized by gel permeation chromatography (GPC), transmission electron microscopy (TEM), X-ray diffraction (XRD), energy dispersive X-ray spectroscopy (EDX), ultraviolet-visible spectroscopy (UV-Vis), and Fourier transform infrared spectroscopy (FT-IR). The effect of foliar application of oligochitosan and oligochitosan-nanosilica on soybean seed yield was evaluated in a field experiment. Results indicated that soybean seed yield increased by 10.5% and 17.0% for oligochitosan and oligochitosan-nanosilica, respectively, compared to the control. Radiation-degraded oligochitosan and its mixture with nanosilica can potentially be used in soybean cultivation to enhance seed yield.

    Zero-shot Object-Level OOD Detection with Context-Aware Inpainting

    Machine learning algorithms are increasingly provided as black-box cloud services or pre-trained models, without access to their training data. This motivates the problem of zero-shot out-of-distribution (OOD) detection. Concretely, we aim to detect OOD objects that do not belong to the classifier's label set but are erroneously classified as in-distribution (ID) objects. Our approach, RONIN, uses an off-the-shelf diffusion model to replace detected objects via inpainting. RONIN conditions the inpainting process on the predicted ID label, drawing the input object closer to the in-distribution domain. As a result, the reconstructed object is very close to the original in ID cases and far from it in OOD cases, allowing RONIN to effectively distinguish ID and OOD samples. Through extensive experiments, we demonstrate that RONIN achieves competitive results compared to previous approaches across several datasets, in both zero-shot and non-zero-shot settings.
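    The scoring idea, as described, is to mask the detected object, inpaint it conditioned on the classifier's predicted ID label, and measure how far the reconstruction drifts from the original. The following is a hedged sketch of that pipeline, not the authors' code; the specific inpainting checkpoint, the pixel-level L1 similarity in the masked region, and the decision threshold are all assumptions.

```python
# Hedged sketch of inpainting-based OOD scoring (not the authors' implementation).
# The checkpoint, similarity metric, and threshold are assumptions for illustration.
import torch
from diffusers import StableDiffusionInpaintPipeline
import torchvision.transforms.functional as TF

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

def ood_score(image, object_mask, predicted_label):
    """image, object_mask: PIL images. Higher score -> more likely OOD."""
    # Inpaint the detected object, conditioned on the classifier's ID label.
    recon = pipe(prompt=f"a photo of a {predicted_label}",
                 image=image, mask_image=object_mask).images[0]
    # Compare original and reconstruction inside the object region.
    x = TF.to_tensor(image)
    r = TF.to_tensor(recon.resize(image.size))
    m = TF.to_tensor(object_mask)
    return ((x - r).abs() * m).sum().item() / (m.sum().item() + 1e-8)

# is_ood = ood_score(img, mask, "dog") > threshold  # threshold tuned on validation data
```

    In practice a feature-space similarity (e.g., from a pretrained vision encoder) could replace the pixel-level distance, and the threshold would be chosen on held-out ID data.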

    CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

    Code intelligence plays a key role in transforming modern software engineering. Recently, deep learning-based models, especially Transformer-based large language models (LLMs), have demonstrated remarkable potential in tackling code intelligence tasks by leveraging massive open-source code data and programming language features. However, developing and deploying such models often requires expertise in both machine learning and software engineering, creating a barrier to model adoption. In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence. Following the principles of modular design and an extensible framework, we design CodeTF with a unified interface to enable rapid access and development across different types of models, datasets, and tasks. Our library supports a collection of pretrained Code LLM models and popular code benchmarks, including a standardized interface to train and serve code LLMs efficiently, as well as data features such as language-specific parsers and utility functions for extracting code attributes. We describe the design principles, the architecture, and the key modules and components, and compare CodeTF with other related libraries. Finally, we hope CodeTF can bridge the gap between machine learning/generative AI and software engineering, providing a comprehensive open-source solution for developers, researchers, and practitioners.
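    The abstract's central design claim is a unified interface that routes different models, tasks, and datasets through a single entry point. The sketch below illustrates that general pattern with a registry-backed loader; it is deliberately hypothetical, and none of the names correspond to CodeTF's actual API.

```python
# Hypothetical sketch of a "unified interface" design for a code-LLM library.
# None of these names are CodeTF's real API; they only illustrate routing
# model/task choices through one registry-backed entry point.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ModelSpec:
    name: str
    task: str          # e.g. "summarization", "generation", "defect-detection"
    loader: Callable[[], object]

_REGISTRY: Dict[str, ModelSpec] = {}

def register(spec: ModelSpec) -> None:
    """Backends register themselves once under a (name, task) key."""
    _REGISTRY[f"{spec.name}/{spec.task}"] = spec

def load_model(name: str, task: str):
    """Single entry point: look up and construct any registered model."""
    return _REGISTRY[f"{name}/{task}"].loader()

# A toy backend: "summarizes" code by returning its first line.
register(ModelSpec("toy-codelm", "summarization",
                   loader=lambda: (lambda code: code.split("\n")[0])))

summarizer = load_model("toy-codelm", "summarization")
print(summarizer("def add(a, b):\n    return a + b"))
```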