Cross-Language Learning for Program Classification using Bilateral Tree-Based Convolutional Neural Networks
Towards the vision of translating code that implements an algorithm from one programming language into another, this paper proposes an approach for automated program classification using bilateral tree-based convolutional neural networks (BiTBCNNs). It is layered on top of two tree-based convolutional neural networks (TBCNNs), each of which recognizes the algorithm of code written in an individual programming language. The combination layer of the networks recognizes the similarities and differences among code in different programming languages. The BiTBCNNs are trained using source code written in different languages but known to implement the same algorithms and/or functionalities. For a preliminary evaluation, we use 3591 Java and 3534 C++ code snippets from 6 algorithms that we crawled systematically from GitHub. We obtained over 90% accuracy in the cross-language binary classification task of telling whether any two given code snippets implement the same algorithm. Also, for the algorithm classification task, i.e., predicting which one of the six algorithm labels is implemented by an arbitrary C++ code snippet, we achieved over 80% precision.
AutoFocus: Interpreting Attention-based Neural Networks by Code Perturbation
Despite being adopted in software engineering tasks, deep neural networks are treated mostly as a black box due to the difficulty in interpreting how the networks infer the outputs from the inputs. To address this problem, we propose AutoFocus, an automated approach for rating and visualizing the importance of input elements based on their effects on the outputs of the networks. The approach is built on our hypotheses that (1) attention mechanisms incorporated into neural networks can generate discriminative scores for various input elements and (2) the discriminative scores reflect the effects of input elements on the outputs of the networks. This paper verifies the hypotheses by applying AutoFocus to the task of algorithm classification (i.e., given a program source code as input, determine the algorithm implemented by the program). AutoFocus identifies and perturbs code elements in a program systematically, and quantifies the effects of the perturbed elements on the network’s classification results. Based on an evaluation of more than 1000 programs for 10 different sorting algorithms, we observe that the attention scores are highly correlated with the effects of the perturbed code elements. Such a correlation provides a strong basis for the use of attention scores to interpret the relations between code elements and the algorithm classification results of a neural network, and we believe that visualizing code elements in an input program ranked according to their attention scores can facilitate faster program comprehension with reduced code.
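The perturb-and-compare idea in this abstract can be illustrated with a minimal Python sketch. It is not the AutoFocus implementation: the `toy_predict` scorer, its `CUE_WEIGHTS`, the statement list, and the `attention` values are all hypothetical stand-ins for a trained network and its attention layer, and deletion is only one possible perturbation. The sketch removes each code element in turn, records the drop in the model's score, and computes a Spearman rank correlation between those drops and the attention scores.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained classifier: each statement contributes
# a fixed amount to the "bubble sort" score. A real network would go here.
CUE_WEIGHTS = {
    "for i in range(n):": 0.1,
    "for j in range(n - i - 1):": 0.2,
    "if a[j] > a[j + 1]:": 0.3,
    "a[j], a[j + 1] = a[j + 1], a[j]": 0.4,
    "print(a)": 0.0,
}


def toy_predict(elements: Sequence[str]) -> float:
    """Return the toy model's score for the target class."""
    return sum(CUE_WEIGHTS.get(e, 0.0) for e in elements)


def perturbation_effects(
    elements: Sequence[str],
    predict: Callable[[Sequence[str]], float],
) -> List[float]:
    """Delete each element in turn and record the drop in the model's score."""
    baseline = predict(elements)
    effects = []
    for i in range(len(elements)):
        perturbed = list(elements[:i]) + list(elements[i + 1:])
        effects.append(baseline - predict(perturbed))
    return effects


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


program = list(CUE_WEIGHTS)                  # statements of the toy program
attention = [0.15, 0.20, 0.25, 0.35, 0.05]   # hypothetical attention scores
effects = perturbation_effects(program, toy_predict)
rho = spearman(attention, effects)
print(f"Spearman correlation between attention and effect: {rho:.2f}")
```

A rank correlation is used here because only the ordering of elements matters when ranking them for visualization. The abstract does not say which correlation measure the authors used.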
Preparation and Foliar Application of Oligochitosan - Nanosilica on the Enhancement of Soybean Seed Yield
Oligochitosan with a weight-average molecular weight (Mw) of 5000 g/mol was prepared by gamma Co-60 radiation degradation of a 4% chitosan solution containing 0.5% H2O2 at 21 kGy. Nanosilica with a size of 10–30 nm was synthesized by calcination of acid-treated rice husk at 700 °C for 2 h. The mixture of 2% oligochitosan-2% nanosilica was prepared by dispersing nanosilica in the oligochitosan solution. Oligochitosan, nanosilica and their mixture were characterized by gel permeation chromatography (GPC), transmission electron microscopy (TEM), X-ray diffraction (XRD), energy-dispersive X-ray spectroscopy (EDX), ultraviolet-visible spectroscopy (UV-Vis), and Fourier transform infrared spectroscopy (FT-IR). The effect of foliar application of oligochitosan and oligochitosan-nanosilica on soybean seed yield was evaluated in an experimental field. Results indicated that soybean seed yield increased by 10.5% and 17.0% for oligochitosan and oligochitosan-nanosilica, respectively, compared with the control. Radiation-degraded oligochitosan and its mixture with nanosilica can potentially be used for cultivation of soybean with enhanced seed yield.
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
Code intelligence plays a key role in transforming modern software
engineering. Recently, deep learning-based models, especially Transformer-based
large language models (LLMs), have demonstrated remarkable potential in
tackling these tasks by leveraging massive open-source code data and
programming language features. However, the development and deployment of such
models often require expertise in both machine learning and software
engineering, creating a barrier to model adoption. In this paper, we
present CodeTF, an open-source Transformer-based library for state-of-the-art
Code LLMs and code intelligence. Following the principles of modular design and
extensible framework, we design CodeTF with a unified interface to enable rapid
access and development across different types of models, datasets and tasks.
Our library supports a collection of pretrained Code LLM models and popular
code benchmarks, including a standardized interface to train and serve code
LLMs efficiently, and data features such as language-specific parsers and
utility functions for extracting code attributes. In this paper, we describe
the design principles, the architecture, key modules and components, and
compare CodeTF with other related library tools. Finally, we hope CodeTF is able to
bridge the gap between machine learning/generative AI and software engineering,
providing a comprehensive open-source solution for developers, researchers, and
practitioners. Comment: Ongoing work - Draft Preview
Pathogenicity of an H5N1 avian influenza virus isolated in Vietnam in 2012 and reliability of conjunctival samples for diagnosis of infection
The continued spread of highly pathogenic avian influenza virus (HPAIV) subtype H5N1 among poultry in Vietnam poses a potential threat to animals and public health. To evaluate the pathogenicity of a 2012 H5N1 HPAIV isolate and to assess the utility of conjunctival swabs for viral detection and isolation in surveillance, an experimental infection with HPAIV subtype H5N1 was carried out in domestic ducks. Ducks were infected with 10^7.2 TCID_50 of A/duck/Vietnam/QB1207/2012 (H5N1), which was isolated from a moribund domestic duck. In the infected ducks, clinical signs of disease, including neurological disorder, were observed. Ducks started to die at 3 days post-infection (dpi), and the study mortality reached 67%. Viruses were recovered from oropharyngeal and conjunctival swabs until 7 dpi and from cloacal swabs until 4 dpi. In the ducks that died or were sacrificed on 3, 5, or 6 dpi, viruses were recovered from lung, brain, heart, pancreas and intestine, among which the highest virus titers were in the lung, brain or heart. Results of virus titration were confirmed by real-time RT-PCR. Genetic and phylogenetic analysis of the HA gene revealed that the isolate belongs to clade 2.3.2.1, like the other H5N1 viruses isolated in Vietnam in 2012. The present study demonstrated that this recent HPAI H5N1 virus of clade 2.3.2.1 could replicate efficiently in the systemic organs, including the brain, and cause severe disease with neurological symptoms in domestic ducks. Therefore, this HPAI H5N1 virus seems to retain the neurotropic feature and has further developed properties of shedding virus from the oropharynx and conjunctiva in addition to the cloaca, potentially posing a higher risk of virus spread through cross-contact and/or environmental transmission.
Continued surveillance and diagnostic programs using conjunctival swabs in the field would further verify the apparent reliability of conjunctival samples for the detection of AIV. Funding: Japan Society for the Promotion of Science (Grant-in-Aid for Bilateral Joint Projects); Heiwa Nakajima Foundation; National Institute of Allergy and Infectious Diseases (U.S.) (Contract HHSN2662007000010C)
Methane emission factors from vietnamese rice production: Pooling data of 36 field sites for meta-analysis
Rice production is a significant source of greenhouse gas (GHG) emissions in the national budget of many Asian countries, but the extent of emissions varies strongly across agro-environmental zones. It is important to understand these differences in order to improve the national GHG inventory and effectively target mitigation options. This study presents a meta-analysis of a database of CH4 emission factors (EFs) from 36 field sites across the rice-growing areas of Vietnam, covering 73 cropping seasons. The EFs were developed from field measurements using the closed chamber technique. The analysis for calculating baseline EFs in North, Central and South Vietnam, in line with the Intergovernmental Panel on Climate Change (IPCC) Tier 2 methodology, was specified for the three cropping seasons: early- (E), mid- (M) and late-year (L). Calculated average CH4 EFs are given in kg ha^-1 d^-1 and reflect the distinct seasons in North (E: 2.21; L: 3.89), Central (E: 2.84; M+L: 3.13) and South Vietnam (E: 1.72; M: 2.80; L: 3.58). Based on the available data for the edapho-hydrological zones of the Mekong River Delta, season-based EFs are more useful than zone-based EFs. Overall, these average EFs indicate an enormous variability of GHG emissions in Vietnamese rice production and represent much higher values than the IPCC default. Seasonal EFs from Vietnam exceeded the IPCC defaults given for Southeast Asia, corresponding to 160% (E), 240% (M) and 290% (L) of the medium value, respectively.
Class based Influence Functions for Error Detection
Influence functions (IFs) are a powerful tool for detecting anomalous
examples in large scale datasets. However, they are unstable when applied to
deep networks. In this paper, we provide an explanation for the instability of
IFs and develop a solution to this problem. We show that IFs are unreliable
when the two data points belong to two different classes. Our solution
leverages class information to improve the stability of IFs. Extensive
experiments show that our modification significantly improves the performance
and stability of IFs while incurring no additional computational cost. Comment: Thang Nguyen-Duc, Hoang Thanh-Tung, and Quan Hung Tran are co-first
authors of this paper. 12 pages, 12 figures. Accepted to ACL 202
Direct frequency comb measurement of OD + CO → DOCO kinetics
The kinetics of the hydroxyl radical (OH) + carbon monoxide (CO) reaction, which is fundamental to both atmospheric and combustion chemistry, are complex because of the formation of the hydrocarboxyl radical (HOCO) intermediate. Despite extensive studies of this reaction, HOCO has not been observed under thermal reaction conditions. Exploiting the sensitive, broadband, and high-resolution capabilities of time-resolved cavity-enhanced direct frequency comb spectroscopy, we observed deuteroxyl radical (OD) + CO reaction kinetics and detected stabilized trans-DOCO, the deuterated analog of trans-HOCO. By simultaneously measuring the time-dependent concentrations of the trans-DOCO and OD species, we observed unambiguous low-pressure termolecular dependence of the reaction rate coefficients for N_2 and CO bath gases. These results confirm the HOCO formation mechanism and quantify its yield