4 research outputs found

    Enabling the Discovery of Recurring Anomalies in Aerospace System Problem Reports using High-Dimensional Clustering Techniques

    This paper describes the results of a significant research and development effort conducted at NASA Ames Research Center to develop new text mining techniques to discover anomalies in free-text reports regarding system health and safety of two aerospace systems. We discuss two problems of significant importance in the aviation industry. The first problem is that of automatic anomaly discovery about an aerospace system through the analysis of tens of thousands of free-text problem reports that are written about the system. The second problem that we address is that of automatic discovery of recurring anomalies, i.e., anomalies that may be described in different ways by different authors, at varying times and under varying conditions, but that are truly about the same part of the system. The intent of recurring anomaly identification is to determine project or system weakness or high-risk issues. The discovery of recurring anomalies is a key goal in building safe, reliable, and cost-effective aerospace systems. We address the anomaly discovery problem on thousands of free-text reports using two strategies: (1) as an unsupervised learning problem where an algorithm takes free-text reports as input and automatically groups them into different bins, where each bin corresponds to a different unknown anomaly category; and (2) as a supervised learning problem where the algorithm classifies the free-text reports into one of a number of known anomaly categories. We then discuss the application of these methods to the problem of discovering recurring anomalies. In fact, the special nature of recurring anomalies (very small cluster sizes) requires incorporating new methods and measures to enhance the original approach for anomaly detection.
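The unsupervised strategy in the abstract above (grouping free-text reports into bins) can be illustrated with a minimal sketch. The abstract does not specify the clustering method, so this example swaps in a simple stand-in: TF-IDF weighting, cosine similarity, and single-link grouping above a threshold. The function names, the sample reports, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: weight} TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: in how many reports each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_reports(docs, threshold):
    """Single-link grouping: reports whose similarity reaches the
    (assumed) threshold end up in the same group."""
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical problem reports, invented for illustration only.
reports = [
    "valve leak detected near fuel pump",
    "fuel pump valve leaking again",
    "cockpit display flickers during startup",
    "display in cockpit flickers at startup",
]
groups = group_reports(reports, threshold=0.3)
```

Small groups of this kind are where recurring anomalies would show up, which is consistent with the abstract's remark that very small cluster sizes need dedicated methods and measures.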

    Efficient algorithms for distortion and blocking techniques in association rule hiding

    No full text
    Data mining provides the opportunity to extract useful information from large databases. Various techniques have been proposed in this context in order to extract this information in the most efficient way. However, efficiency is not our only concern in this study. The security and privacy issues over the extracted knowledge must be seriously considered as well. By taking this into consideration, we study the procedure of hiding sensitive association rules in binary data sets by blocking some data values and we present an algorithm for solving this problem. We also provide a fuzzification of the support and the confidence of an association rule in order to accommodate the existence of blocked/unknown values. In addition, we quantitatively compare the proposed algorithm with other already published algorithms by running experiments on binary data sets, and we also qualitatively compare the efficiency of the proposed algorithm in hiding association rules. We utilize the notion of border rules, by assigning a weight to each rule, and we use effective data structures for the representation of the rules so as (a) to minimize the side effects created by the hiding process and (b) to speed up the selection of the victim transactions. Finally, we study the overall security of the modified database, using the C4.5 decision tree algorithm of the WEKA data mining tool, and we discuss the advantages and the limitations of blocking.
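To make the effect of blocked values concrete, the sketch below shows one common way to treat support and confidence once some cells are unknown: each becomes an interval rather than a single number. This is an assumption for illustration, not the paper's fuzzification, whose exact definition the abstract does not give. The `UNKNOWN` marker, the function names, and the sample data are hypothetical, and the confidence bounds are deliberately loose.

```python
UNKNOWN = "?"  # hypothetical marker for a blocked cell


def support_interval(transactions, items):
    """Return (min_support, max_support) for the given column indices.

    min counts rows where every item is definitely 1;
    max also counts rows where blocked cells could turn out to be 1.
    """
    n = len(transactions)
    certain = sum(all(row[i] == 1 for i in items) for row in transactions)
    possible = sum(all(row[i] in (1, UNKNOWN) for i in items) for row in transactions)
    return certain / n, possible / n


def confidence_interval(transactions, antecedent, consequent):
    """Loose bounds on the confidence of antecedent -> consequent."""
    min_ab, max_ab = support_interval(transactions, antecedent + consequent)
    min_a, max_a = support_interval(transactions, antecedent)
    low = min_ab / max_a if max_a else 0.0
    high = min(1.0, max_ab / min_a) if min_a else 1.0
    return low, high


# Hypothetical binary data set with two items (columns 0 and 1).
db = [
    [1, 1],
    [1, UNKNOWN],
    [1, 0],
    [UNKNOWN, 1],
]
sup = support_interval(db, [0, 1])
conf = confidence_interval(db, [0], [1])
```

Under this reading, a rule is hidden when its lower bounds fall below the mining thresholds, while an adversary still cannot rule out that the true values lie higher in the interval.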

    1

    No full text
    We present automated, real-time models built with machine learning algorithms which use videotapes of subjects' faces in conjunction with physiological measurements to predict rated emotion (trained coders' second-by-second assessments of sadness or amusement). Input consisted of videotapes of 41 subjects watching emotionally evocative films along with measures of their cardiovascular activity, somatic activity, and electrodermal responding. We built algorithms based on extracted points from the subjects' faces as well as their physiological responses. Strengths of the current approach are 1) we are assessing real behavior of subjects watching emotional videos instead of actors making facial poses, 2) the training data allow us to predict both emotion type (amusement versus sadness) as well as the intensity level of each emotion, 3) we provide a direct comparison between person-specific, gender-specific, and general models. Results demonstrated good fits for the models overall, with better performance for emotion categories than for emotion intensity, for amusement ratings than sadness ratings, for a full model using both physiological measures and facial tracking than for either cue alone, and for person-specific models than for gender-specific or general models.