99,950 research outputs found

    Mining frequent sequential patterns in data streams using SSM-algorithm.

    Get PDF
    Frequent sequential mining is the process of discovering frequent sequential patterns in data sequences as found in applications like web log access sequences. In data stream applications, data arrive at high speed rates in a continuous flow. Data stream mining is an online process different from traditional mining. Traditional mining algorithms work on an entire static dataset in order to obtain results while data stream mining algorithms work with continuously arriving data streams. With rapid change in technology, there are many applications that take data as continuous streams. Examples include stock tickers, network traffic measurements, click stream data, data feeds from sensor networks, and telecom call records. Mining frequent sequential patterns on data stream applications contend with many challenges such as limited memory for unlimited data, inability of algorithms to scan infinitely flowing original dataset more than once and to deliver current and accurate result on demand. This thesis proposes SSM-Algorithm (sequential stream mining-algorithm) that delivers frequent sequential patterns in data streams. The concept of this work came from FP-Stream algorithm that delivers time sensitive frequent patterns. Proposed SSM-Algorithm outperforms FP-Stream algorithm by the use of a hash based and two efficient tree based data structures. All incoming streams are handled dynamically to improve memory usage. SSM-Algorithm maintains frequent sequences incrementally and delivers most current result on demand. The introduced algorithm can be deployed to analyze e-commerce data where the primary source of the data is click stream data. (Abstract shortened by UMI.)Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .M668. Source: Masters Abstracts International, Volume: 44-03, page: 1409. Thesis (M.Sc.)--University of Windsor (Canada), 2005

    Mining Frequent Item Sets Data Streams using "ÉclatAlgorithm"

    Get PDF
    Frequent pattern mining is the process of mining data in a set of items or some patterns from a largedatabase. The resulted frequent set data supports the minimum support threshold. A frequentpattern is a pattern that occurs frequently in a dataset. Association rule mining is defined as to findout association rules that satisfy the predefined minimum support and confidence from a given database. If an item set is said to be frequent, that item set supports the minimum support andconfidence. A Frequent item set should appear in all the transaction of that data base. Discoveringfrequent item sets play a very important role in mining association rules, sequence rules, web logmining and many other interesting patterns among complex data. Data stream is a real timecontinuous, ordered sequence of items. It is an uninterrupted flow of a long sequence of data. Somereal time examples of data stream data are sensor network data, telecommunication data,transactional data and scientific surveillances systems. These data produced trillions of updatesevery day. So it is very difficult to store the entire data. In that time some mining process is required.Data mining is the non-trivial process of identifying valid, original, potentially useful and ultimatelyunderstandable patterns in data. It is an extraction of the hidden predictive information from largedata base. There are lots of algorithms used to find out the frequent item set. In that Apriorialgorithm is the very first classical algorithm used to find the frequent item set. Apart from Apriori,lots of algorithms generated but they are similar to Apriori. They are based on prune and candidategeneration. It takes more memory and time to find out the frequent item set. In this paper, we havestudied about how the éclat algorithm is used in data streams to find out the frequent item sets.Éclat algorithm need not required candidate generation

    Mining Brain Networks using Multiple Side Views for Neurological Disorder Identification

    Full text link
    Mining discriminative subgraph patterns from graph data has attracted great interest in recent years. It has a wide variety of applications in disease diagnosis, neuroimaging, etc. Most research on subgraph mining focuses on the graph representation alone. However, in many real-world applications, the side information is available along with the graph data. For example, for neurological disorder identification, in addition to the brain networks derived from neuroimaging data, hundreds of clinical, immunologic, serologic and cognitive measures may also be documented for each subject. These measures compose multiple side views encoding a tremendous amount of supplemental information for diagnostic purposes, yet are often ignored. In this paper, we study the problem of discriminative subgraph selection using multiple side views and propose a novel solution to find an optimal set of subgraph features for graph classification by exploring a plurality of side views. We derive a feature evaluation criterion, named gSide, to estimate the usefulness of subgraph patterns based upon side views. Then we develop a branch-and-bound algorithm, called gMSV, to efficiently search for optimal subgraph features by integrating the subgraph mining process and the procedure of discriminative feature selection. Empirical studies on graph classification tasks for neurological disorders using brain networks demonstrate that subgraph patterns selected by the multi-side-view guided subgraph selection approach can effectively boost graph classification performances and are relevant to disease diagnosis.Comment: in Proceedings of IEEE International Conference on Data Mining (ICDM) 201

    Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow

    Get PDF
    Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach by considering the distribution of the flows in a given time interval. The flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to the coming flow distribution probabilities, the inliers are stored to enrich the FDP databases, while the outliers are excluded from the FDP databases. Moreover, a k-nearest neighbor for distance-based outlier detection is investigated and adopted for FDP outlier detection. To validate the proposed framework, real data from Odense traffic flow case are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. Another experiment has been carried out on Beijing data, the results show that our approach outperforms the baseline algorithms for high-urban traffic flow

    Cyber security situational awareness

    Get PDF

    A log mining approach for process monitoring in SCADA

    Get PDF
    SCADA (Supervisory Control and Data Acquisition) systems are used for controlling and monitoring industrial processes. We propose a methodology to systematically identify potential process-related threats in SCADA. Process-related threats take place when an attacker gains user access rights and performs actions, which look legitimate, but which are intended to disrupt the SCADA process. To detect such threats, we propose a semi-automated approach of log processing. We conduct experiments on a real-life water treatment facility. A preliminary case study suggests that our approach is effective in detecting anomalous events that might alter the regular process workflow
    corecore