25 research outputs found

    Classifying Web Exploits with Topic Modeling

    This short empirical paper investigates how well topic modeling and database metadata characteristics can classify web and other proof-of-concept (PoC) exploits for publicly disclosed software vulnerabilities. Using a dataset of over 36 thousand PoC exploits, an accuracy of nearly 0.9 is obtained in the empirical experiment; text mining and topic modeling contribute significantly to this classification performance. Beyond these empirical results, the paper adds to the research tradition of enhancing software vulnerability information with text mining and offers a few scholarly observations on the potential for semi-automatic classification of exploits within existing tracking infrastructures. Comment: Proceedings of the 2017 28th International Workshop on Database and Expert Systems Applications (DEXA). http://ieeexplore.ieee.org/abstract/document/8049693
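
    As a rough illustration of the approach described above, the sketch below combines topic proportions learned from exploit descriptions with one-hot-encoded metadata and feeds both into a classifier. It is a minimal sketch using scikit-learn and pandas; the toy records, the field names (description, platform, is_web), and the choice of LDA plus a random forest are assumptions for illustration, not the paper's actual dataset or pipeline.

```python
# Minimal sketch (not the paper's exact pipeline): classify exploit records as
# "web" vs. "other" by combining LDA topic proportions derived from the textual
# descriptions with simple database metadata features. All field names and
# records below are hypothetical placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical exploit records; the real dataset holds ~36k PoC entries.
df = pd.DataFrame({
    "description": ["SQL injection in login form of ExampleCMS",
                    "Local buffer overflow in media player",
                    "Stored XSS via comment field",
                    "Kernel privilege escalation through ioctl"],
    "platform": ["php", "windows", "php", "linux"],
    "is_web": [1, 0, 1, 0],
})

# Topic features from the free-text descriptions; metadata one-hot encoded.
features = ColumnTransformer([
    ("topics", Pipeline([
        ("bow", CountVectorizer(stop_words="english")),
        ("lda", LatentDirichletAllocation(n_components=2, random_state=0)),
    ]), "description"),
    ("meta", OneHotEncoder(handle_unknown="ignore"), ["platform"]),
])

clf = Pipeline([("features", features),
                ("model", RandomForestClassifier(random_state=0))])

X_train, X_test, y_train, y_test = train_test_split(
    df[["description", "platform"]], df["is_web"],
    test_size=0.5, random_state=0, stratify=df["is_web"])
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```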

    An eye-tracking study on the role of scan time in finding source code defects


    The Python user interface of the elsA cfd software: a coupling framework for external steering layers

    The Python--elsA user interface of the elsA CFD (Computational Fluid Dynamics) software has been developed to let users specify simulations with confidence, through a global context of description objects grouped inside scripts. The software's main features are generated documentation, context checking and completion, and helpful error management. Further developments have used this foundation as a coupling framework, allowing (thanks to the descriptive approach) external algorithms to be coupled with the CFD solver in a simple and abstract way, leading to greater success in complex simulations. Along with the description of the technical part of the interface, we try to gather the salient points pertaining to the psychological viewpoint of user experience (UX). We point out the differences between user interfaces and pure data management systems such as CGNS.
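
    The descriptive, checked-context style summarized above can be illustrated with a small description-object pattern: the user declares the case through objects, the framework validates them early with helpful errors, and external algorithms are coupled as callbacks over the validated description. All class and method names below (Simulation, BoundaryCondition, couple, run) are hypothetical stand-ins, not the actual Python--elsA API.

```python
# Illustrative sketch of a description-object interface in the spirit described
# above; names are hypothetical and do not reflect the real Python--elsA API.
from dataclasses import dataclass, field

ALLOWED_BC_KINDS = {"wall", "inflow", "outflow", "farfield"}

@dataclass
class BoundaryCondition:
    name: str
    kind: str

    def check(self):
        # Context checking: reject unknown boundary-condition kinds early,
        # with a helpful error, before any solver work starts.
        if self.kind not in ALLOWED_BC_KINDS:
            raise ValueError(
                f"boundary '{self.name}': unknown kind '{self.kind}' "
                f"(expected one of {sorted(ALLOWED_BC_KINDS)})")

@dataclass
class Simulation:
    name: str
    boundaries: list = field(default_factory=list)
    hooks: list = field(default_factory=list)   # external steering callbacks

    def describe(self, bc: BoundaryCondition):
        bc.check()
        self.boundaries.append(bc)

    def couple(self, hook):
        # Coupling framework: external algorithms plug in as callables that
        # receive the validated description at each (mock) iteration.
        self.hooks.append(hook)

    def run(self, iterations=3):
        for it in range(iterations):
            for hook in self.hooks:
                hook(self, it)

# Usage: declare the case descriptively, couple an external monitor, run.
sim = Simulation("wing_case")
sim.describe(BoundaryCondition("upstream", "inflow"))
sim.describe(BoundaryCondition("surface", "wall"))
sim.couple(lambda s, it: print(f"iteration {it}: {len(s.boundaries)} boundaries OK"))
sim.run()
```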

    Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code

    Statistical language modeling techniques have successfully been applied to large source code corpora, yielding a variety of new software development tools, such as tools for code suggestion, improving readability, and API migration. A major issue with these techniques is that code introduces new vocabulary at a far higher rate than natural language, as new identifier names proliferate. Both large vocabularies and out-of-vocabulary issues severely affect Neural Language Models (NLMs) of source code, degrading their performance and rendering them unable to scale. In this paper, we address this issue by: 1) studying how various modelling choices impact the resulting vocabulary on a large-scale corpus of 13,362 projects; 2) presenting an open vocabulary source code NLM that can scale to such a corpus, 100 times larger than in previous work; and 3) showing that such models outperform the state of the art on three distinct code corpora (Java, C, Python). To our knowledge, these are the largest NLMs for code that have been reported. All datasets, code, and trained models used in this work are publicly available. Comment: 13 pages; to appear in Proceedings of ICSE 2020
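
    A common way to obtain an open vocabulary for source code is subword segmentation, for example byte-pair encoding (BPE), which lets unseen identifiers be expressed as sequences of learned subword units instead of out-of-vocabulary tokens. The sketch below is a self-contained toy BPE learner and segmenter for illustration only; the abstract does not spell out the paper's exact mechanism, and the identifier corpus here is invented.

```python
# Minimal sketch of why subword segmentation keeps the vocabulary open:
# learn byte-pair-encoding merges from a tiny identifier corpus, then segment
# an unseen identifier into known subword units instead of emitting <UNK>.
# This illustrates the general BPE idea, not the paper's code.
from collections import Counter

def learn_bpe(words, num_merges=10):
    # Represent each word as a tuple of characters plus an end-of-word marker.
    corpus = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in corpus.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = {}
        for word, freq in corpus.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = freq
        corpus = merged
    return merges

def segment(word, merges):
    # Apply the learned merges in order to split a (possibly unseen) word.
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

# Identifiers seen in "training"; the unseen name below reuses their pieces,
# so it is segmented into known subwords rather than mapped to <UNK>.
train_identifiers = ["get_user_name", "get_user_id", "set_user_name", "user_count"]
merges = learn_bpe(train_identifiers, num_merges=30)
print(segment("get_user_count", merges))
```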