16,227 research outputs found

Building Program Vector Representations for Deep Learning

Author: Jin Zhi
Li Ge
Liu Yuxuan
Mou Lili
Peng Hao
Xu Yan
Zhang Lu
Publication date: 11/09/2014

Deep learning has made significant breakthroughs in various fields of artificial intelligence. Advantages of deep learning include the ability to capture highly complicated features, weak involvement of human engineering, etc. However, it is still virtually impossible to use deep learning to analyze programs since deep architectures cannot be trained effectively with pure back propagation. In this pioneering paper, we propose the "coding criterion" to build program vector representations, which are the premise of deep learning for program analysis. Our representation learning approach directly makes deep learning a reality in this new field. We evaluate the learned vector representations both qualitatively and quantitatively. We conclude, based on the experiments, the coding criterion is successful in building program representations. To evaluate whether deep learning is beneficial for program analysis, we feed the representations to deep neural networks, and achieve higher accuracy in the program classification task than "shallow" methods, such as logistic regression and the support vector machine. This result confirms the feasibility of deep learning to analyze programs. It also gives primary evidence of its success in this new field. We believe deep learning will become an outstanding technique for program analysis in the near future. Comment: This paper was submitted to ICSE'1

arXiv.org e-Print Archive

CiteSeerX

Crossref

Clones in Graphs

Author: A Gély
B Ganter
D Borchmann
David Lusseau
OL Mangasarian
P Gleiser
R Medina
R Wille
RR Faulkner
S Wasserman
T Opsahl
Publication venue: Springer Science and Business Media LLC
Publication date: 30/07/2018

Finding structural similarities in graph data, like social networks, is a far-ranging task in data mining and knowledge discovery. A (conceptually) simple reduction would be to compute the automorphism group of a graph. However, this approach is ineffective in data mining since real world data does not exhibit enough structural regularity. Here we step in with a novel approach based on mappings that preserve the maximal cliques. For this we exploit the well known correspondence between bipartite graphs and the data structure formal context (G, M, I) from Formal Concept Analysis. From there we utilize the notion of clone items. The investigation of these is still an open problem to which we add new insights with this work. Furthermore, we produce a substantial experimental investigation of real world data. We conclude with demonstrating the generalization of clone items to permutations. Comment: 11 pages, 2 figures, 1 tabl

arXiv.org e-Print Archive

Crossref

International conference on software engineering and knowledge engineering: Session chair

Author: Bosu Michael Franklin
Publication date: 01/07/2018

The Thirtieth International Conference on Software Engineering and Knowledge Engineering (SEKE 2018) will be held at the Hotel Pullman, San Francisco Bay, USA, from July 1 to July 3, 2018. SEKE 2018 will also be dedicated to the memory of Professor Lotfi Zadeh, a great scholar, pioneer and leader in fuzzy set theory and soft computing. The conference aims to bring together experts in software engineering and knowledge engineering to discuss relevant results in either software engineering or knowledge engineering or both. Special emphasis will be put on the transfer of methods between the two domains. The theme this year is soft computing in software engineering & knowledge engineering. Submissions of papers and demos are both welcome

Wintec Research Archive

Clone Detection and Elimination for Haskell

Author: Brown Christopher
Thompson Simon
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2010

Duplicated code is a well known problem in software maintenance and refactoring. Code clones tend to increase program size and several studies have shown that duplicated code makes maintenance and code understanding more complex and time consuming. This paper presents a new technique for the detection and removal of duplicated Haskell code. The system is implemented within the refactoring framework of the Haskell Refactorer (HaRe), and uses an Abstract Syntax Tree (AST) based approach. Detection of duplicate code is automatic, while elimination is semi-automatic, with the user managing the clone removal. After presenting the system, an example is given to show how it works in practice
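As a rough illustration of what AST-based duplicate detection means, the sketch below groups structurally identical expression subtrees. It is not the HaRe implementation: it works on Python source through the standard `ast` module rather than on Haskell, it only finds exact structural matches, and the names `find_clones`, `min_nodes`, and `SAMPLE` are assumptions made for this example.

```python
import ast
from collections import defaultdict

# Hypothetical sample input: two functions sharing one expression.
SAMPLE = (
    "def f(x, y):\n"
    "    return (x + y) * (x - y)\n"
    "\n"
    "def g(x, y):\n"
    "    z = (x + y) * (x - y)\n"
    "    return z + 1\n"
)


def find_clones(source, min_nodes=5):
    """Map each repeated expression subtree to the lines where it occurs.

    Two subtrees count as clones when their AST dumps are identical,
    i.e. same node types, same identifiers, same constants.
    """
    tree = ast.parse(source)
    groups = defaultdict(list)
    for node in ast.walk(tree):
        if not isinstance(node, ast.expr):
            continue  # only expressions are considered in this sketch
        size = sum(1 for _ in ast.walk(node))
        if size < min_nodes:
            continue  # skip trivially small subtrees
        # ast.dump leaves out line/column attributes by default, so the
        # key depends only on the structure of the subtree.
        groups[ast.dump(node)].append(node.lineno)
    return {key: sorted(lines) for key, lines in groups.items() if len(lines) > 1}


if __name__ == "__main__":
    for key, lines in find_clones(SAMPLE, min_nodes=10).items():
        print(lines, key[:60])
```

Detection here is automatic, as in the abstract; removing a clone (for instance by extracting a shared function) would still need a decision from the user and is not attempted.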

Kent Academic Repository

University of St. Andrews - Pure

RNA Viral Community in Human Feces: Prevalence of Plant Pathogenic Viruses

Author: Altschul
Altschul
Andrews
Backhed
Banyai
Breitbart
Breitbart
Castello
Chandra
Chia Lin Wei
Clark
Craggs
Dolin
Dolin
Edison T Liu
Ewing
Ewing
Forest Rohwer
Glass
Gorski
Griffin
Hogenhout
Hwang
Jagadish
Jeffrey Dangl
Jin-Quan Run
Lewandowski
Macpherson
Martin L Hibberd
Middleton
Murray
Mya Breitbart
Nicholson
Noble
Osawa
Pereira
Peyrefitte
Power
Purvis
Rogers
Rohwer
Rosen
Rubinstein
Shirlena Wee Ling Soh
Tao Zhang
Tringe
Tyson
Venter
Wah Heng Lee
Weinbauer
Wetter
Wilhelmi
Xi
Xu
Yijun Ruan
Yusibov
Publication venue: Public Library of Science
Publication date: 01/12/2005

The human gut is known to be a reservoir of a wide variety of microbes, including viruses. Many RNA viruses are known to be associated with gastroenteritis; however, the enteric RNA viral community present in healthy humans has not been described. Here, we present a comparative metagenomic analysis of the RNA viruses found in three fecal samples from two healthy human individuals. For this study, uncultured viruses were concentrated by tangential flow filtration, and viral RNA was extracted and cloned into shotgun viral cDNA libraries for sequencing analysis. The vast majority of the 36,769 viral sequences obtained were similar to plant pathogenic RNA viruses. The most abundant fecal virus in this study was pepper mild mottle virus (PMMV), which was found in high concentrations, up to 10^9 virions per gram of dry weight fecal matter. PMMV was also detected in 12 (66.7%) of 18 fecal samples collected from healthy individuals on two continents, indicating that this plant virus is prevalent in the human population. A number of pepper-based foods tested positive for PMMV, suggesting dietary origins for this virus. Intriguingly, the fecal PMMV was infectious to host plants, suggesting that humans might act as a vehicle for the dissemination of certain plant viruses.

Public Library of Science (PLOS)

Crossref

USFSP Digital Archive

LSHTM Research Online

Directory of Open Access Journals

PubMed Central

Scholar Commons - University of South Florida

ScholarBank@NUS

SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering

Author: Dou Wensheng
Gao Chushu
Huang Tao
Wang Jie
Wei Jun
Xu Liang
Zhong Hua
Publication date: 27/04/2017

Version information plays an important role in spreadsheet understanding, maintenance, and quality improvement. However, end users rarely use version control tools to document spreadsheet version information. Thus, the spreadsheet version information is missing, and different versions of a spreadsheet coexist as individual and similar spreadsheets. Existing approaches try to recover spreadsheet version information by clustering these similar spreadsheets based on spreadsheet filenames or related email conversations. However, the applicability and accuracy of existing clustering approaches are limited because the necessary information (e.g., filenames and email conversations) is usually missing. We inspected the versioned spreadsheets in VEnron, which is extracted from the Enron Corporation. In VEnron, the different versions of a spreadsheet are clustered into an evolution group. We observed that the versioned spreadsheets in each evolution group exhibit certain common features (e.g., similar table headers and worksheet names). Based on this observation, we proposed an automatic clustering algorithm, SpreadCluster. SpreadCluster learns the criteria of features from the versioned spreadsheets in VEnron, and then automatically clusters spreadsheets with similar features into the same evolution group. We applied SpreadCluster to all spreadsheets in the Enron corpus. The evaluation result shows that SpreadCluster could cluster spreadsheets with higher precision and recall than the filename-based approach used by VEnron. Based on the clustering result of SpreadCluster, we further created a new versioned spreadsheet corpus, VEnron2, which is much bigger than VEnron. We also applied SpreadCluster to two other spreadsheet corpora, FUSE and EUSES. The results show that SpreadCluster can cluster the versioned spreadsheets in these two corpora with high precision. Comment: 12 pages, MSR 201
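The idea of grouping spreadsheets by shared features can be sketched as follows. This is not the SpreadCluster algorithm: the abstract says the criteria are learned from VEnron, whereas this sketch assumes a fixed Jaccard-similarity threshold and single-link grouping via union-find. The function names, the `threshold` parameter, and the feature strings are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_spreadsheets(features, threshold=0.5):
    """Group spreadsheets whose feature sets are pairwise similar.

    features: dict mapping a spreadsheet name to a set of feature strings
    (for example table headers and worksheet names).
    Returns a sorted list of clusters, each a sorted list of names.
    """
    parent = {name: name for name in features}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair whose similarity reaches the (assumed) threshold.
    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Made-up example data, not taken from any corpus.
    example = {
        "budget_v1.xls": {"sheet:Summary", "hdr:Month", "hdr:Cost"},
        "budget_v2.xls": {"sheet:Summary", "hdr:Month", "hdr:Cost", "hdr:Total"},
        "trades.xls": {"sheet:Deals", "hdr:Counterparty", "hdr:Volume"},
    }
    print(cluster_spreadsheets(example))
```

Because the grouping uses only content-derived features, it does not depend on filenames or email threads, which is the limitation of earlier approaches that the abstract points out.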

arXiv.org e-Print Archive

Crossref

Approaches to overcome flow cytometry limitations in the analysis of cells from veterinary relevant species

Author: Debes Gudrun F.
Hunka Julia
Riley John T
Publication venue: Jefferson Digital Commons
Publication date: 06/03/2020

BACKGROUND: Flow cytometry is a powerful tool for the multiparameter analysis of leukocyte subsets on the single cell level. Recent advances have greatly increased the number of fluorochrome-labeled antibodies in flow cytometry. In particular, an increase in available fluorochromes with distinct excitation and emission spectra combined with novel multicolor flow cytometers with several lasers have enhanced the generation of multidimensional expression data for leukocytes and other cell types. However, these advances have mainly benefited the analysis of human or mouse cell samples given the lack of reagents for most animal species. The flow cytometric analysis of important veterinary, agricultural, wildlife, and other animal species is still hampered by several technical limitations, even though animal species other than the mouse can serve as more accurate models of specific human physiology and diseases. RESULTS: Here we present time-tested approaches that our laboratory regularly uses in the multiparameter flow cytometric analysis of ovine leukocytes. The discussed approaches will be applicable to the analysis of cells from most animal species and include direct modification of antibodies by covalent conjugation or Fc-directed labeling (Zenon™ technology), labeled secondary antibodies and other second step reagents, labeled receptor ligands, and antibodies with species cross-reactivity. CONCLUSIONS: Using refined technical approaches, the number of parameters analyzed by flow cytometry per cell sample can be greatly increased, enabling multidimensional analysis of rare samples and giving critical insight into veterinary and other less commonly analyzed species. By maximizing information from each cell sample, multicolor flow cytometry can reduce the required number of animals used in a study

Jefferson Digital Commons