414 research outputs found
Word-level human interpretable scoring mechanism for novel text detection using Tsetlin Machines
Recent research in novelty detection focuses mainly on document-level classification, employing deep neural networks (DNN). However, the black-box nature of DNNs makes it difficult to extract an exact explanation of why a document is considered novel. In addition, dealing with novelty at the word level is crucial to provide a more fine-grained analysis than what is available at the document level. In this work, we propose a Tsetlin Machine (TM)-based architecture for scoring individual words according to their contribution to novelty. Our approach encodes a description of the novel documents using the linguistic patterns captured by TM clauses. We then adapt this description to measure how much a word contributes to making documents novel. Our experimental results demonstrate how our approach breaks down novelty into interpretable phrases, successfully measuring novelty.publishedVersionPaid Open Acces
NLP Methods in Host-based Intrusion Detection Systems: A Systematic Review and Future Directions
Host based Intrusion Detection System (HIDS) is an effective last line of
defense for defending against cyber security attacks after perimeter defenses
(e.g., Network based Intrusion Detection System and Firewall) have failed or
been bypassed. HIDS is widely adopted in the industry as HIDS is ranked among
the top two most used security tools by Security Operation Centers (SOC) of
organizations. Although effective and efficient HIDS is highly desirable for
industrial organizations, the evolution of increasingly complex attack patterns
causes several challenges resulting in performance degradation of HIDS (e.g.,
high false alert rate creating alert fatigue for SOC staff). Since Natural
Language Processing (NLP) methods are better suited for identifying complex
attack patterns, an increasing number of HIDS are leveraging the advances in
NLP that have shown effective and efficient performance in precisely detecting
low footprint, zero day attacks and predicting the next steps of attackers.
This active research trend of using NLP in HIDS demands a synthesized and
comprehensive body of knowledge of NLP based HIDS. Thus, we conducted a
systematic review of the literature on the end to end pipeline of the use of
NLP in HIDS development. For the end to end NLP based HIDS development
pipeline, we identify, taxonomically categorize and systematically compare the
state of the art of NLP methods usage in HIDS, attacks detected by these NLP
methods, datasets and evaluation metrics which are used to evaluate the NLP
based HIDS. We highlight the relevant prevalent practices, considerations,
advantages and limitations to support the HIDS developers. We also outline the
future research directions for the NLP based HIDS development
- …