Classifying Web Exploits with Topic Modeling
This short empirical paper investigates how well topic modeling and database
metadata characteristics can classify web and other proof-of-concept (PoC)
exploits for publicly disclosed software vulnerabilities. Using a dataset of
over 36 thousand PoC exploits, an accuracy rate of nearly 0.9 is obtained in
the empirical experiment. Text mining and topic modeling contribute
substantially to this classification performance. In addition to these
empirical results, the paper contributes to the research tradition of
enhancing software vulnerability information with text mining, and provides a
few scholarly observations about the potential for semi-automatic
classification of exploits in existing tracking infrastructures.
Comment: Proceedings of the 2017 28th International Workshop on Database and
Expert Systems Applications (DEXA).
http://ieeexplore.ieee.org/abstract/document/8049693
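The pipeline the abstract outlines, topic-model features feeding a classifier,
can be sketched with scikit-learn. The snippet below is a minimal illustration,
not the paper's implementation: the toy exploit descriptions, the
web-versus-other label scheme, and all hyperparameters are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for PoC exploit descriptions and their labels (assumptions).
descriptions = [
    "sql injection in login form",
    "cross-site scripting in search page",
    "remote buffer overflow in ftp daemon",
    "local privilege escalation via kernel driver",
]
labels = ["web", "web", "other", "other"]

# Bag-of-words counts -> LDA topic mixtures -> logistic-regression classifier.
clf = make_pipeline(
    CountVectorizer(stop_words="english"),
    LatentDirichletAllocation(n_components=5, random_state=0),
    LogisticRegression(max_iter=1000),
)
clf.fit(descriptions, labels)
print(clf.predict(["directory traversal in web application"]))
```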
The Python user interface of the elsA cfd software: a coupling framework for external steering layers
The Python-elsA user interface of the elsA CFD (Computational Fluid
Dynamics) software has been developed to allow users to specify simulations
with confidence, through a global context of description objects grouped
inside scripts. The software's main features are generated documentation,
context checking and completion, and helpful error management. Further
developments have used this foundation as a coupling framework: thanks to the
descriptive approach, external algorithms can be coupled with the CFD solver
in a simple and abstract way, leading to more success in complex simulations.
Along with describing the technical part of the interface, we gather the
salient points from the psychological viewpoint of user experience (UX). We
point out the differences between user interfaces and pure data management
systems such as CGNS.
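To illustrate the descriptive approach, the hypothetical sketch below shows
declarative description objects whose attributes are checked against a schema,
so context checking and error management happen at script level. All class and
attribute names are invented for illustration; they are not the actual
Python-elsA API.

```python
class Description:
    """Declarative object: users state what to simulate, not how."""
    allowed = {}  # per-subclass schema: attribute name -> set of legal values

    def set(self, name, value):
        # Context checking: unknown attributes or illegal values fail early,
        # at script level, with a helpful message.
        if name not in self.allowed:
            raise AttributeError(
                f"unknown attribute {name!r}; expected one of {sorted(self.allowed)}"
            )
        if value not in self.allowed[name]:
            raise ValueError(
                f"{name}={value!r}; legal values are {sorted(self.allowed[name])}"
            )
        setattr(self, name, value)

class Model(Description):
    # Invented schema, purely for illustration.
    allowed = {"turbulence": {"laminar", "spalart"}, "gas": {"perfect"}}

model = Model()
model.set("turbulence", "spalart")      # accepted
try:
    model.set("turbulence", "k-omega")  # rejected by context checking
except ValueError as err:
    print(err)
```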
Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code
Statistical language modeling techniques have successfully been applied to
large source code corpora, yielding a variety of new software development
tools, such as tools for code suggestion, improving readability, and API
migration. A major issue with these techniques is that code introduces new
vocabulary at a far higher rate than natural language, as new identifier names
proliferate. Both large vocabularies and out-of-vocabulary issues severely
affect Neural Language Models (NLMs) of source code, degrading their
performance and rendering them unable to scale.
In this paper, we address this issue by: 1) studying how various modelling
choices impact the resulting vocabulary on a large-scale corpus of 13,362
projects; 2) presenting an open vocabulary source code NLM that can scale to
such a corpus, 100 times larger than in previous work; and 3) showing that such
models outperform the state of the art on three distinct code corpora (Java, C,
Python). To our knowledge, these are the largest NLMs for code that have been
reported.
All datasets, code, and trained models used in this work are publicly
available.
Comment: 13 pages; to appear in Proceedings of ICSE 2020
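Open-vocabulary language models for source code commonly tame identifier
proliferation by segmenting tokens into subwords, for instance with byte-pair
encoding (BPE). The sketch below illustrates that general mechanism on a toy
identifier corpus; it is an assumption about the technique, not the code
released with the paper.

```python
from collections import Counter

def learn_merges(words, num_merges):
    """Learn frequent adjacent-symbol merges (classic BPE) from identifiers."""
    vocab = Counter(tuple(w) for w in words)   # each word as a tuple of symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)       # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Split a (possibly unseen) identifier using the learned merges."""
    symbols = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

merges = learn_merges(["getValue", "getName", "setValue"], num_merges=8)
print(segment("getConfig", merges))  # unseen name decomposes into known subwords
```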