13 research outputs found

    A Novel Approach for Scalability – Two Way Sequential Pattern Mining using UDDAG

    Traditional pattern growth-based approaches for sequential pattern mining recursively derive length-(k + 1) patterns from the projected databases of length-k patterns. At each level of recursion, they grow detected patterns unidirectionally, by one item along the suffix, so finding a length-k pattern takes k levels of recursion. This paper introduces a novel data structure, the UpDown Directed Acyclic Graph (UDDAG), for efficient sequential pattern mining. UDDAG allows bidirectional pattern growth along both ends of detected patterns. Thus, a length-k pattern can be detected in $\lfloor \log_2 k + 1 \rfloor$ levels of recursion at best, which results in fewer levels of recursion and faster pattern growth. When minSup is so large that the average pattern length is close to 1, UDDAG and PrefixSpan perform similarly because the problem degrades into a frequent-item counting problem. However, UDDAG scales up much better: it often outperforms PrefixSpan by almost one order of magnitude in scalability tests. UDDAG is also considerably faster than Spade and LapinSpam. Except in extreme cases, UDDAG uses memory comparable to PrefixSpan and less memory than Spade and LapinSpam. Additionally, the special feature of UDDAG enables its extension toward applications involving searching in large spaces.
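The unidirectional, suffix-only growth that this abstract attributes to traditional approaches can be illustrated with a minimal PrefixSpan-style sketch. It is not the UDDAG algorithm and not the authors' code; it assumes each sequence element is a single item, and the function name `prefix_span` and the toy database are hypothetical.

```python
from collections import Counter


def prefix_span(sequences, min_sup):
    """Return {pattern: support} for sequences whose elements are single items."""
    results = {}

    def grow(prefix, projected):
        # Count every item at most once per projected suffix, so that
        # support is the number of sequences containing the extension.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, support in sorted(counts.items()):
            if support < min_sup:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Projected database: what follows the first occurrence of item.
            next_db = [s[s.index(item) + 1:] for s in projected if item in s]
            # One recursion level per added item: a length-k pattern
            # is reached only after k nested calls.
            grow(pattern, next_db)

    grow((), [tuple(s) for s in sequences])
    return results


db = ["abcd", "acd", "abd", "bcd"]  # hypothetical toy database
patterns = prefix_span(db, min_sup=3)
```

Each recursive call extends the pattern by exactly one item, which is where the k levels of recursion come from. For comparison, reading the abstract's best-case bound as floor(log2 k + 1), a length-8 pattern would need 4 levels instead of 8.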

    OSSM: Ordered Sequence set mining for maximal length frequent sequences

    Finding sequential rules is an indispensable part of frequent sequence mining. Sequence mining algorithms generally use a methodology such as a bottom-up approach to build large sequences from small patterns. This paper proposes an algorithm that uses a hybrid two-way (bottom-up and top-down) approach for mining maximal-length sequences. The proposed model adopts a bottom-up approach called “Concurrent Edge Prevision and Rear Edge Pruning (CE

    A Method for Extracting Characteristic Chord Progressions by Extending Sequential Pattern Mining

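    In recent years, composition environments that let users without instrumental performance skills enjoy composing have attracted attention. However, it is not easy for beginners to learn music theory and construct appropriate chord progressions. Experienced composers, too, may emphasize particular progressions such as cadences or the development of a piece, or may compose with a particular composer or genre in mind. We therefore consider that a composition support environment is needed in which characteristic chord progressions used in actual music can be constructed intuitively and appropriately. A characteristic chord progression is not merely one that is frequently used; it is one that is regarded as important in music theory, or one that is used in connection with piece information (such as composer or genre) or piece structure (such as bar lines or the beginning and end of a piece). The author believes that actively using such chord progressions in composition reduces the burden of constructing harmony and makes music production easier. In this study, we therefore proposed a method that quantitatively discovers and surfaces characteristic chord progressions from the chord progressions of actual pieces by extending sequential pattern mining, a kind of frequent pattern mining. Specifically, we introduced parent-child relationships among items to make it easier to extract chord progressions that include inversions, and we added pseudo-items such as composer and bar lines to control the mining so that patterns reflecting piece information and piece structure can be extracted. In an experiment applying the proposed method to chord progression data from 150 pieces, the introduction of parent-child relationships made it possible to present, at a high minimum support value, patterns that have low support and are hard to extract as well as patterns that existing methods cannot find. In addition, by adding items representing piece metadata, bar lines, and the beginning and end of a piece as pseudo-items, we were able to reveal chord usage tendencies by composer and genre and to reflect characteristics of occurrence position and chord duration in the extracted patterns. University of Electro-Communications 201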

    A STUDY ON EDUCATIONAL DATA MINING THROUGH QUESTIONNAIRE SURVEY

    Educational Data Mining (EDM) is a recent area with many fields still to be researched. Applying data mining techniques to educational data helps us deal with issues that would be hard to address otherwise. With these techniques and methods, we try to discover the behaviours and strategies of both students and teachers. This information helps identify which strategies should be avoided and which teaching strategies can be adapted to each kind of student, and helps anticipate which students will fail so that they can be helped from an early stage. In this paper, we conducted a questionnaire survey of 500 software engineers to understand the present state of EDM.

    A Novel Approach for Scalability – Two Way Sequential Pattern Mining using UDDAG

    Traditional pattern growth-based approaches for sequential pattern mining recursively derive length-(k + 1) patterns from the projected databases of length-k patterns. At each level of recursion, they grow detected patterns unidirectionally, by one item along the suffix, so finding a length-k pattern takes k levels of recursion. This paper introduces a novel data structure, the UpDown Directed Acyclic Graph (UDDAG), for efficient sequential pattern mining. UDDAG allows bidirectional pattern growth along both ends of detected patterns. Thus, a length-k pattern can be detected in $\lfloor \log_2 k + 1 \rfloor$ levels of recursion at best, which results in fewer levels of recursion and faster pattern growth. When minSup is so large that the average pattern length is close to 1, UDDAG and PrefixSpan perform similarly because the problem degrades into a frequent-item counting problem. However, UDDAG scales up much better: it often outperforms PrefixSpan by almost one order of magnitude in scalability tests. UDDAG is also considerably faster than Spade and LapinSpam. Except in extreme cases, UDDAG uses memory comparable to PrefixSpan and less memory than Spade and LapinSpam. Additionally, the special feature of UDDAG enables its extension toward applications involving searching in large spaces.

    Maximal frequent sequences applied to drug-drug interaction extraction

    A drug-drug interaction (DDI) occurs when the effects of a drug are modified by the presence of other drugs. DDIs can decrease the therapeutic benefit or efficacy of treatments, which can have very harmful consequences for the patient's health and can even cause the patient's death. Knowing the interactions between prescribed drugs is of great clinical importance, so it is very important to keep databases up to date with respect to new DDIs. In this thesis we aim to build a system that helps healthcare professionals stay up to date on published drug-drug interactions. The goal of this thesis is to study a method based on maximal frequent sequences (MFS) and machine learning techniques to automatically detect interactions between drugs in pharmacological and medical literature. With these methods, the IT community can help the healthcare community update its drug interaction databases in a fast and semi-automatic way. In a first solution, we classify pharmacological sentences according to whether or not they describe a drug-drug interaction, which makes it possible to automatically find sentences containing drug-drug interactions. This solution is based entirely on maximal frequent sequences (MFS) extracted from a set of test documents. In a second solution, based on machine learning, we go further and perform DDI extraction, determining whether two specific drugs appearing in a sentence interact. This can be used as an assisting tool to populate databases with drug-drug interactions. The machine learning classifier is trained with several features: bag of words, word categories, MFS, token- and character-level features, and drug-level features. The classifier we used was a Random Forest. This system was submitted to the DDIExtraction 2011 competition and reached 6th position. Finally, we introduce Maximal Frequent Discriminative Sequences (MFDS), a novel method of sequential pattern discovery that extends the concept of MFS to adapt it to classification tasks.
    García Blasco, S. (2012). Maximal frequent sequences applied to drug-drug interaction extraction. http://hdl.handle.net/10251/15342 Archivo delegad
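As a rough illustration of the "maximal" part of maximal frequent sequences, the sketch below keeps only those frequent sequences that are not subsequences of another frequent sequence. This is the general notion, assumed here for illustration; the thesis may define MFS with additional details (for example, limits on gaps between words). The helper names and the example word sequences are hypothetical.

```python
def is_subsequence(short, long):
    """True if the items of `short` appear in `long` in the same order."""
    it = iter(long)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in short)


def maximal_only(frequent):
    """Drop every frequent sequence contained in a different frequent sequence."""
    return [
        p for p in frequent
        if not any(p != q and is_subsequence(p, q) for q in frequent)
    ]


# Hypothetical frequent word sequences, assumed to have been mined already.
frequent = [
    ("drug",),
    ("drug", "interacts"),
    ("drug", "interacts", "with"),
    ("may", "increase"),
]
maximal = maximal_only(frequent)
```

Here the two shorter "drug" sequences are absorbed by the three-word sequence, so only two maximal sequences remain.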

    Generalization of Pattern-Growth Methods for Sequential Pattern Mining with Gap Constraints


    Generalization of Pattern-growth Methods for Sequential Pattern Mining with Gap Constraints

    The problem of sequential pattern mining is one of several issues that have received particular attention within the general problem of data mining. Despite important developments in recent years, the best algorithm in the area (PrefixSpan) does not deal with gap constraints and consequently does not allow background knowledge to be introduced into the process. In this paper we present a generalization of the PrefixSpan algorithm that deals with gap constraints, using a new method to generate projected databases. Performance and scalability studies were conducted on synthetic and real-life datasets, and the respective results are presented.
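To make the idea of a gap constraint concrete, here is a small sketch that counts the support of a pattern when at most `max_gap` items may be skipped between consecutive matched items. It only checks the definition by direct search; it is not the paper's projected-database method, and the function names, the `max_gap` parameter, and the toy data are assumptions made for illustration.

```python
def occurs_with_max_gap(sequence, pattern, max_gap):
    """True if `pattern` occurs in `sequence` with at most `max_gap`
    skipped items between consecutive matched items."""
    if not pattern:
        return True

    def match_from(pos, k):
        # pos: index where pattern[k - 1] was matched; k: next pattern index.
        if k == len(pattern):
            return True
        stop = min(len(sequence), pos + max_gap + 2)
        # Try every allowed position: the first match is not always the
        # one that lets the rest of the pattern fit, so backtrack.
        return any(
            sequence[j] == pattern[k] and match_from(j, k + 1)
            for j in range(pos + 1, stop)
        )

    return any(
        sequence[i] == pattern[0] and match_from(i, 1)
        for i in range(len(sequence))
    )


def support(sequences, pattern, max_gap):
    """Number of sequences containing `pattern` under the gap constraint."""
    return sum(occurs_with_max_gap(s, pattern, max_gap) for s in sequences)


db = ["abc", "axbc", "axxbc"]  # hypothetical toy database
supports = [support(db, "ab", g) for g in (0, 1, 2)]
```

With this toy database, the support of "ab" grows as the allowed gap widens, which shows how a gap constraint changes which patterns count as frequent.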