Classifying Web Exploits with Topic Modeling

Author: Ruohonen Jukka
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/10/2017

This short empirical paper investigates how well topic modeling and database meta-data characteristics can classify web and other proof-of-concept (PoC) exploits for publicly disclosed software vulnerabilities. Using a dataset of over 36 thousand PoC exploits, the empirical experiment obtains an accuracy rate of nearly 0.9. Text mining and topic modeling contribute significantly to this classification performance. In addition to these empirical results, the paper contributes to the research tradition of enhancing software vulnerability information with text mining, and it offers a few scholarly observations about the potential for semi-automatic classification of exploits in the existing tracking infrastructures. Comment: Proceedings of the 2017 28th International Workshop on Database and Expert Systems Applications (DEXA). http://ieeexplore.ieee.org/abstract/document/8049693

arXiv.org e-Print Archive

Crossref

Testing Market Response to Auditor Change Filings: a comparison of machine learning classifiers

Authors: Holowczak Richard, Louton David, Saraoglu Hakan
Publication venue: Bryant Digital Repository
Publication date: 23/08/2018

The use of textual information contained in company filings with the Securities and Exchange Commission (SEC), including annual reports on Form 10-K, quarterly reports on Form 10-Q, and current reports on Form 8-K, has gained increasing attention from finance and accounting researchers. In this paper we use a set of machine learning methods to predict the market response to changes in a firm's auditor as reported in public filings. We vectorize the text of 8-K filings to test whether the resulting feature matrix can explain the sign of the market response to the filing. Specifically, using classification algorithms and a sample consisting of the Item 4.01 text of 8-K documents, which provides information on changes in auditors of companies that are registered with the SEC, we predict the sign of the cumulative abnormal return (CAR) around 8-K filing dates. We report the correct classification performance and time efficiency of the classification algorithms. Our results show some improvement over the naïve classification method.

DigitalCommons@Bryant University

Mapping Subsets of Scholarly Information

Authors: Ginsparg Paul, Houle Paul, Joachims Thorsten, Sul Jae-Hoon
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2003

We illustrate the use of machine learning techniques to analyze, structure, maintain, and evolve a large online corpus of academic literature. An emerging field of research can be identified as part of an existing corpus, permitting the implementation of a more coherent community structure for its practitioners. Comment: 10 pages, 4 figures, presented at Arthur M. Sackler Colloquium on "Mapping Knowledge Domains", 9-11 May 2003, Beckman Center, Irvine, CA, proceedings to appear in PNA

arXiv.org e-Print Archive

CiteSeerX

Crossref

PubMed Central