76 research outputs found

    On the Qualitative Behavior of Impurity-Based Splitting Rules I: The Minima-Free Property

    We show that all strictly convex ∩ impurity measures lead to splits at boundary points, and furthermore show that certain rational splitting rules, notably the information gain ratio, also have this property. A slightly weaker result is shown to hold for impurity measures that are only convex ∩, such as Inaccuracy
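The boundary-point claim can be illustrated numerically. The sketch below is not the paper's proof: it assumes that "boundary point" means a cut between adjacent examples of different classes (after sorting by feature value), and it uses the Gini index as one familiar strictly concave impurity measure. The function names and the label sequence are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurities(labels):
    """Weighted Gini impurity for every cut i, splitting labels[:i] from labels[i:]."""
    n = len(labels)
    return {
        i: (i / n) * gini(labels[:i]) + ((n - i) / n) * gini(labels[i:])
        for i in range(1, n)
    }

def boundary_cuts(labels):
    """Cuts that fall between two adjacent examples of different classes."""
    return {i for i in range(1, len(labels)) if labels[i - 1] != labels[i]}

# Hypothetical class labels, already sorted by a feature with distinct values.
labels = list("aaabbbbaab")
scores = split_impurities(labels)
best_cut = min(scores, key=scores.get)
print(best_cut, best_cut in boundary_cuts(labels))
```

On this sequence the lowest weighted impurity is reached at a cut that separates an `a` from a `b`, which is what the result predicts for a strictly concave measure.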

    Finding Anomalous Periodic Time Series: An Application to Catalogs of Periodic Variable Stars

    Catalogs of periodic variable stars contain large numbers of periodic light-curves (photometric time series data from the astrophysics domain). Separating anomalous objects from well-known classes is an important step towards the discovery of new classes of astronomical objects. Most anomaly detection methods for time series data assume either a single continuous time series or a set of time series whose periods are aligned. Light-curve data precludes the use of these methods as the periods of any given pair of light-curves may be out of sync. One may use an existing anomaly detection method if, prior to similarity calculation, one performs the costly act of aligning two light-curves, an operation that scales poorly to massive data sets. This paper presents PCAD, an unsupervised anomaly detection method for large sets of unsynchronized periodic time-series data, that outputs a ranked list of both global and local anomalies. It calculates its anomaly score for each light-curve in relation to a set of centroids produced by a modified k-means clustering algorithm. Our method is able to scale to large data sets through the use of sampling. We validate our method on both light-curve data and other time series data sets. We demonstrate its effectiveness at finding known anomalies, and discuss the effect of sample size and number of centroids on our results. We compare our method to naive solutions and existing time series anomaly detection methods for unphased data, and show that PCAD's reported anomalies are comparable to or better than all other methods. Finally, astrophysicists on our team have verified that PCAD finds true anomalies that might be indicative of novel astrophysical phenomena
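The idea of scoring unsynchronized periodic series against a set of centroids can be sketched as follows. This is a minimal illustration, not PCAD itself: it assumes equally sampled, already folded light-curves, takes the score to be the distance to the nearest centroid, and handles phase by brute force over all circular shifts (the costly alignment the abstract refers to). The modified k-means step and the sampling scheme are not reproduced, and all names are hypothetical.

```python
import numpy as np

def shift_invariant_distance(x, c):
    """Smallest Euclidean distance between c and any circular shift of x."""
    return min(np.linalg.norm(np.roll(x, s) - c) for s in range(len(x)))

def anomaly_scores(curves, centroids):
    """Distance from each curve to its nearest centroid (assumed scoring rule)."""
    return np.array([
        min(shift_invariant_distance(x, c) for c in centroids)
        for x in curves
    ])

# Toy data: one sinusoidal centroid, one phase-shifted copy, one square wave.
t = np.linspace(0.0, 1.0, 32, endpoint=False)
sine = np.sin(2 * np.pi * t)
centroids = [sine]
curves = [np.roll(sine, 5), np.sign(sine)]

scores = anomaly_scores(curves, centroids)
ranking = np.argsort(-scores)  # most anomalous first
print(scores, ranking)
```

The phase-shifted sinusoid scores zero because some shift aligns it with the centroid exactly, while the square wave keeps a positive score under every shift and is ranked first.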

    Detecting the Abnormal: Machine Learning in Computer Security

    Two problems of importance in computer security are to 1) detect the presence of an intruder masquerading as the valid user and 2) detect the perpetration of abusive actions on the part of an otherwise innocuous user. In this paper we present a machine learning approach to anomaly detection, designed to handle these two problems. Our system learns a user profile for each user account and subsequently employs it to detect anomalous behavior in that account. Based on sequences of actions (UNIX commands) of the current user's input stream, the system compares each fixed-length input sequence with a historical library of the account's command sequences using a similarity measure. The system must learn to classify current behavior as consistent or anomalous with past behavior using only positive examples of the account's valid user. Our empirical results demonstrate that in most cases it is possible to distinguish the legitimate user from an intruder and, furthermore, that an instance selection technique based on a memory page-replacement algorithm is capable of drastically reducing library size without hindering detection accuracy
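The comparison of fixed-length command sequences against a historical library can be sketched as below. The abstract does not specify the similarity measure, so this example assumes a simple one (the number of positions at which two sequences agree) and scores a window by its best match in the library. The window length, command history, and function names are hypothetical.

```python
def windows(commands, length):
    """All contiguous fixed-length command sequences in a command stream."""
    return [tuple(commands[i:i + length]) for i in range(len(commands) - length + 1)]

def match_count(a, b):
    """Assumed similarity: number of positions where the two sequences agree."""
    return sum(x == y for x, y in zip(a, b))

def similarity_to_library(window, library):
    """Similarity of a window to its most similar library entry."""
    return max(match_count(window, entry) for entry in library)

# Hypothetical history for one account; only this valid user's data is used.
history = ["ls", "cd", "ls", "vi", "make", "ls", "cd", "vi"]
library = set(windows(history, 3))

normal = ("cd", "ls", "vi")   # appears in the history
odd = ("nc", "chmod", "rm")   # shares no command with the history

print(similarity_to_library(normal, library))
print(similarity_to_library(odd, library))
```

A detector built on this would flag windows whose similarity falls below some threshold learned from the valid user's own data; how that threshold is chosen is left open here.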

    The Need for Diagnostics for Classification Algorithms

    Many machine learning researchers view the task of inductive generalization as beginning after the data is collected, assuming that the useful features have been identified and that representative data has been collected. This assumption has led researchers to focus, with considerable success, on algorithm development. As a result, little attention has been paid to applying machine learning algorithms. One problem that arises is that when classification performance does not meet expectations, inexperienced practitioners can find little guidance in the available literature to help them. This talk addresses this gap between research and applied machine learning and suggests areas of research that can help bridge this gap. 1 THE APPLICATION DEVELOPMENT PROCESS The first step in the application development process is to analyze the factors relevant to the application domain. Application factors include the overall objectives of the project, the amount of domain knowledge available, and th..

    Dynamic Automatic Model Selection

    The problem of how to learn from examples has been studied throughout the history of machine learning, and many successful learning algorithms have been developed. A problem that has received less attention is how to select which algorithm to use for a given learning task. The ability of a chosen algorithm to induce a good generalization depends on how appropriate the model class underlying the algorithm is for the given task. We define an algorithm's model class to be the representation language it uses to express a generalization of the examples. Supervised learning algorithms differ in their underlying model class and in how they search for a good generalization. Given this characterization, it is not surprising that some algorithms find better generalizations for some, but not all tasks. Therefore, in order to find the best generalization for each task, an automated learning system must search for the appropriate model class in addition to searching for the best generalization wit..

    An Application of Machine Learning to Anomaly Detection

    The anomaly detection problem has been widely studied in the computer security literature. In this paper we present a machine learning approach to anomaly detection. Our system builds user profiles based on command sequences and compares current input sequences to the profile using a similarity measure. The system must learn to classify current behavior as consistent or anomalous with past behavior using only positive examples of the account's valid user. Our empirical results demonstrate that this is a promising approach to distinguishing the legitimate user from an intruder