'Columbia University Libraries/Information Services'
Doi
Abstract
We discuss the KDD process in "data-flow" environments, where unstructured and time dependent data can be processed into various levels of structured and semantically-rich forms for analysis tasks. Using network intrusion detection as a concrete application example, we describe how to construct models that are both accurate in describing the underlying concepts, and efficient when used to analyze data in real-time. We present procedures for analyzing frequent patterns from lower level data and constructing appropriate features to formulate higher level data. The features generated from various levels of data have different computational costs (in time and space). We show that in order to minimize the time required in using the classification models in a real-time environment, we can exploit the "necessary conditions" associated with the low-cost features to determine whether some high-cost features need to be computed and the corresponding classification rules need to be checked. We have applied our tools to the problem of building network intrusion detection models. We report our experiments using the network data provided as part of the 1998 DARPA Intrusion Detection Evaluation program. We also discuss our experience in using the mined models in NFR, a real-time network intrusion detection system