Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification

Authors: Bui Nghi; Jiang Lingxiao; Yu Yijun
Publication venue: IEEE Computer Society
Publication date: 01/01/2019

Algorithm classification is to automatically identify the classes of a program based on the algorithm(s) and/or data structure(s) implemented in the program. It can be useful for various tasks, such as code reuse, code theft detection, and malware detection. Code similarity metrics, on the basis of features extracted from syntax and semantics, have been used to classify programs. Such features, however, often need manual selection effort and are specific to individual programming languages, limiting the classifiers to programs in the same language. To recognise the similarities and differences among algorithms implemented in different languages, this paper describes a framework of Bilateral Neural Networks (Bi-NN) that builds a neural network on top of two underlying sub-networks, each of which encodes syntax and semantics of code in one language. A whole Bi-NN can be trained with bilateral programs that implement the same algorithms and/or data structures in different languages and then be applied to recognise algorithm classes across languages. We have instantiated the framework with several kinds of token-, tree- and graph-based neural networks that encode and learn various kinds of information in code. We have applied the instances of the framework to a code corpus collected from GitHub containing thousands of Java and C++ programs implementing 50 different algorithms and data structures. Our evaluation results show that the use of Bi-NN indeed produces promising algorithm classification results both within one language and across languages, and the encoding of dependencies from code into the underlying neural networks helps improve algorithm classification accuracy further. In particular, our custom-built dependency trees with tree-based convolutional neural networks achieve the highest classification accuracy among the different instances of the framework that we have evaluated. 
Our study points to a possible future research direction to tailor bilateral and multilateral neural networks that encode more relevant semantics for code learning, mining and analysis tasks.
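The bilateral structure described in the abstract above (two sub-networks, one per language, with a network built on top of them) can be illustrated with a minimal forward-pass sketch. Everything named here is hypothetical and is not the authors' implementation: `TokenEncoder` is a toy stand-in for the token-, tree- and graph-based encoders the paper evaluates, `BilateralNet` is an assumed top network, and the pairwise "same algorithm" output is an assumption, since the abstract does not specify the output layer. Weights are random and untrained.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TokenEncoder:
    """Hypothetical sub-network: averages token embeddings for ONE language."""

    def __init__(self, vocab, dim, rng):
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.emb = make_matrix(len(vocab), dim, rng)
        self.dim = dim

    def encode(self, tokens):
        vecs = [self.emb[self.index[t]] for t in tokens if t in self.index]
        if not vecs:
            return [0.0] * self.dim  # no known tokens: zero vector
        return [sum(col) / len(vecs) for col in zip(*vecs)]


class BilateralNet:
    """Assumed top network over two language-specific encoders."""

    def __init__(self, left, right, hidden, rng):
        self.left, self.right = left, right
        self.w_hidden = make_matrix(hidden, left.dim + right.dim, rng)
        self.w_out = make_matrix(1, hidden, rng)

    def same_algorithm_probability(self, left_tokens, right_tokens):
        # Concatenate the two language-specific code vectors.
        joint = self.left.encode(left_tokens) + self.right.encode(right_tokens)
        hidden = [math.tanh(h) for h in matvec(self.w_hidden, joint)]
        logit = matvec(self.w_out, hidden)[0]
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


rng = random.Random(0)
java = TokenEncoder(["for", "if", "swap", "return"], 4, rng)
cpp = TokenEncoder(["for", "if", "std::swap", "return"], 4, rng)
net = BilateralNet(java, cpp, 8, rng)
p = net.same_algorithm_probability(["for", "if", "swap"], ["for", "if", "std::swap"])
print(f"untrained pair score: {p:.3f}")
```

Keeping one encoder per language means each side can use its own vocabulary or parser, while only the top layers see both representations. Training (for example, on pairs labeled as implementing the same or different algorithms) is omitted from this sketch.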

Crossref

Open Research Online (The Open University)

Bilateral dependency neural networks for cross-language algorithm classification

Authors: BUI Duy Quoc Nghi; JIANG Lingxiao; YU Yijun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/02/2019

Institutional Knowledge at Singapore Management University

Novel deep learning methods combined with static analysis for source code processing

Author: BUI Duy Quoc Nghi
Publication venue: Singapore Management University
Publication date: 01/08/2020

Institutional Knowledge at Singapore Management University

Text Classification: A Review, Empirical, and Experimental Evaluation

Authors: Taha Aya; Taha Kamal; Yeun Chan; Yoo Paul D.
Publication date: 11/01/2024

The explosive and widespread growth of data necessitates the use of text classification to extract crucial information from vast amounts of data. Consequently, there has been a surge of research in both classical and deep learning text classification methods. Despite the numerous methods proposed in the literature, there is still a pressing need for a comprehensive and up-to-date survey. Existing survey papers categorize algorithms for text classification into broad classes, which can lead to the misclassification of unrelated algorithms and incorrect assessments of their qualities and behaviors using the same metrics. To address these limitations, our paper introduces a novel methodological taxonomy that classifies algorithms hierarchically into fine-grained classes and specific techniques. The taxonomy includes methodology categories, methodology techniques, and methodology sub-techniques. Our study is the first survey to utilize this methodological taxonomy for classifying algorithms for text classification. Furthermore, our study also conducts empirical evaluation and experimental comparisons and rankings of different algorithms that employ the same specific sub-technique, different sub-techniques within the same technique, different techniques within the same category, and categorie

arXiv.org e-Print Archive

CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison

Authors: Ball Robyn; Chute Chris; Ciurea-Ilcus Silviana; Haghgoo Behzad; Halabi Safwan S.; Irvin Jeremy; Jones Ricky; Ko Michael; Langlotz Curtis P.; Larson David B.; Lungren Matthew P.; Marklund Henrik; Mong David A.; Ng Andrew Y.; Patel Bhavik N.; Rajpurkar Pranav; Sandberg Jesse K.; Seekins Jayne; Shpanskaya Katie; Yu Yifan
Publication date: 21/01/2019

Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models. The dataset is freely available at https://stanfordmlgroup.github.io/competitions/chexpert. Comment: Published in AAAI 201
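As a rough illustration of what "using the uncertainty labels" for training can mean, the sketch below resolves an uncertain label under three simple policies before computing a binary cross-entropy loss. The abstract does not list the approaches the authors compared, so the policy names (`zeros`, `ones`, `ignore`), the `-1` encoding for "uncertain", and both function names are assumptions made for illustration, not the paper's method.

```python
import math

UNCERTAIN = -1  # assumed encoding of an uncertain label (hypothetical)


def resolve_label(label, policy):
    """Return (target, weight) for one observation under a policy."""
    if label != UNCERTAIN:
        return float(label), 1.0
    if policy == "zeros":   # treat uncertain as negative
        return 0.0, 1.0
    if policy == "ones":    # treat uncertain as positive
        return 1.0, 1.0
    if policy == "ignore":  # weight 0 masks the term out of the loss
        return 0.0, 0.0
    raise ValueError(f"unknown policy: {policy}")


def masked_bce(probs, labels, policy):
    """Mean binary cross-entropy over the observations that carry weight."""
    total, count = 0.0, 0.0
    for p, y in zip(probs, labels):
        target, weight = resolve_label(y, policy)
        p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
        loss = -(target * math.log(p) + (1 - target) * math.log(1 - p))
        total += weight * loss
        count += weight
    return total / count if count else 0.0


# One certain positive and one uncertain observation.
probs, labels = [0.9, 0.5], [1, UNCERTAIN]
for policy in ("zeros", "ones", "ignore"):
    print(policy, round(masked_bce(probs, labels, policy), 4))
```

Under `ignore`, the uncertain observation contributes nothing, so the loss depends only on the certain label; under `zeros` or `ones`, it is trained on as if it were a confident negative or positive.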

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

TreeCaps: Tree-Structured Capsule Networks for program source code processing

Authors: BUI Duy Quoc Nghi; JAYASUNDARA Vinoj; JIANG Lingxiao; LO David
Publication date: 01/12/2019

National Research Foundation (NRF) Singapore under its AI Singapore Programm

Institutional Knowledge at Singapore Management University