109,804 research outputs found

    Implementing Graph Pattern Mining for Big Data in the Cloud

    With the growing popularity of social networking sites, the data associated with them has grown explosively, so mining big data has become an important problem in graph pattern mining research. Graph mining helps to explore patterns in networks or databases. Various graph mining techniques already exist for mining frequent patterns from graph databases that contain relatively small graphs. With the rapid arrival of the big data era, however, traditional graph mining approaches can no longer meet the needs of large-scale data analysis. In this context, this paper proposes an adaptation of the big graph data mining approach, especially for social networks. The proposed approach is based on the Hadoop platform and improves efficiency by processing big data in a distributed fashion. It can also be adapted to a cloud environment, which offers load balancing, scalability, and efficiency. Experiments were conducted on a real Facebook data set. The approach can also be adapted to data sets larger than the one used in the experiments. DOI: 10.17762/ijritcc2321-8169.150514

    Discovering High Utility Itemsets using Hybrid Approach

    Mining high utility itemsets, especially from big transactional databases, is a time-consuming task. Multiple methods are available for mining high utility itemsets from large transactional datasets, but they have consequential limitations. In terms of performance, these methods need to be scrutinized on low-memory systems, and further measures need to be addressed. The proposed algorithm combines High Utility Pattern Mining and Incremental Frequent Pattern Mining. The two algorithms used are Apriori and the existing Parallel UP Growth for mining high utility itemsets from transactional databases. Information about high utility itemsets is maintained in a data structure called the UP tree. These algorithms not only scan the incremental database but also collect the support counts of newly generated frequent itemsets. The approach executes quickly because it adds new itemsets to the utility pattern tree and removes rare itemsets from it, which reduces cost and time. Based on experimental analysis and results, this hybrid approach combining the existing Apriori and UP-Growth is proposed with the aim of improving performance.

    Visualizing big network traffic data using frequent pattern mining and hypergraphs

    Visualizing communication logs, like NetFlow records, is extremely useful for numerous tasks that need to analyze network traffic traces, like network planning, performance monitoring, and troubleshooting. Communication logs, however, can be massive, which necessitates designing effective visualization techniques for large data sets. To address this problem, we introduce a novel network traffic visualization scheme based on the key ideas of (1) exploiting frequent itemset mining (FIM) to visualize a succinct set of interesting traffic patterns extracted from large traces of communication logs; and (2) visualizing extracted patterns as hypergraphs that clearly display multi-attribute associations. We demonstrate case studies that support the utility of our visualization scheme and show that it enables the visualization of substantially larger data sets than existing network traffic visualization schemes based on parallel-coordinate plots or graphs. For example, we show that our scheme can easily visualize the patterns of more than 41 million NetFlow records. Previous research has explored using parallel-coordinate plots for visualizing network traffic flows. However, such plots do not scale to data sets with thousands or even millions of flows.
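The two ideas in this abstract (mine frequent attribute combinations, then treat each multi-attribute pattern as a hyperedge) can be illustrated with a minimal sketch. This is not the authors' implementation: the flow records, the field names, and the helper functions `frequent_itemsets` and `to_hyperedges` are hypothetical, and the brute-force enumeration below is only practical for records with a handful of attributes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical flow records; field names and values are made up for illustration.
flows = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "dport": 80, "proto": "TCP"},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "dport": 80, "proto": "TCP"},
    {"src": "10.0.0.3", "dst": "10.0.0.9", "dport": 80, "proto": "TCP"},
    {"src": "10.0.0.1", "dst": "10.0.0.7", "dport": 53, "proto": "UDP"},
]


def frequent_itemsets(records, min_support):
    """Count every attribute=value combination and keep those seen in
    at least min_support records (brute force, exponential in the
    number of attributes per record)."""
    counts = Counter()
    for rec in records:
        items = sorted(f"{key}={value}" for key, value in rec.items())
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def to_hyperedges(itemsets):
    """Keep itemsets with two or more items; each one is a hyperedge
    joining its attribute=value nodes, paired with its support count."""
    return sorted((sorted(s), n) for s, n in itemsets.items() if len(s) >= 2)


patterns = frequent_itemsets(flows, min_support=3)
for nodes, support in to_hyperedges(patterns):
    print(support, nodes)
```

A real trace with millions of records would need a proper FIM algorithm (for example Apriori or FP-growth) instead of full enumeration; the sketch only shows how a frequent itemset maps onto a hyperedge.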

    STUDI DAN IMPLEMENTASI ALGORITMA H MINE UNTUK MENEMUKAN FREQUENT PATTERNS

    ABSTRACT: Large databases generally have the potential to hide a great deal of highly valuable information. This information can be obtained by paying attention to repeatedly occurring patterns. The frequent pattern is one of the important types of recurring pattern in data mining. Extracting frequent patterns from large databases is costly, so an efficient algorithm is necessary. H-Mine is a frequent pattern mining algorithm with good performance. It is efficient because it does not require generating candidate frequent patterns. H-Mine uses a data structure called H-Struct to generate frequent patterns. In this final project, the writer studies the complexity of the H-Mine algorithm. To analyze the performance of H-Mine, software implementing the algorithm was built. However, the software can only generate patterns with a maximum length of three, because the writer could not implement the recursive function of the H-Mine algorithm. Keywords: frequent pattern, minimum support, H-Mine, H-Struct

    FIBS: A Generic Framework for Classifying Interval-based Temporal Sequences

    We study the problem of classifying interval-based temporal sequences (IBTSs). Since common classification algorithms cannot be directly applied to IBTSs, the main challenge is to define a set of features that effectively represents the data such that classifiers can be applied. Most prior work utilizes frequent pattern mining to define a feature set based on discovered patterns. However, frequent pattern mining is computationally expensive and often discovers many irrelevant patterns. To address this shortcoming, we propose the FIBS framework for classifying IBTSs. FIBS extracts features relevant to classification from IBTSs based on relative frequency and temporal relations. To avoid selecting irrelevant features, a filter-based selection strategy is incorporated into FIBS. Our empirical evaluation on eight real-world datasets demonstrates the effectiveness of our methods in practice. The results provide evidence that FIBS effectively represents IBTSs for classification algorithms, which contributes to similar or significantly better accuracy compared to state-of-the-art competitors. It also suggests that the feature selection strategy is beneficial to FIBS's performance. Comment: In: Big Data Analytics and Knowledge Discovery. DaWaK 2020. Springer, Cha

    ANALISIS IMPLEMENTASI ALGORITMA PATTERN DECOMPOSITION UNTUK MENCARI POLA ASOSIASI PADA DATA MINING / ANALYSIS OF PATTERN DECOMPOSITION ALGORITHM IMPLEMENTATION FOR SEARCHING ASSOCIATION RULES IN DATA MINING

    ABSTRACT: Data mining is a rapidly growing field because of the great demand for added value from the large-scale databases that accumulate as information technology grows. Implementing data mining can make an important contribution in the business world. The association patterns it yields can be used as input to decision making in a company. Association is one of the functionalities or techniques of data mining, used to find association rules among combinations of items. Various algorithms have been developed to obtain association patterns with effectiveness and efficiency in mind. This final project presents a data mining analysis for finding association patterns in transaction data, using an application that implements the Pattern Decomposition algorithm. The analysis is based on experimental results. The experiments show that the larger the minimum support, the fewer frequent itemsets can be generated. The relationship between minimum support and frequent itemsets, which the authors describe as linear, is given by the formula: f(x) = a*1/x^2 + c. The dataset used in the experiments is very influential. A dataset with a large number of records needs a long time for the generation process, all the more so when the minimum support is small. Keywords: data mining, association, pattern decomposition, minimum support, frequent itemsets
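The formula quoted in this abstract, f(x) = a*1/x^2 + c, can be evaluated directly to see the trend it describes: the value falls as the minimum support x grows, and it is linear in 1/x^2 rather than in x itself. The sketch below is only an illustration; the coefficients `A` and `C` and the support values are hypothetical and do not come from the thesis.

```python
def frequent_itemset_count(min_support, a, c):
    """Evaluate f(x) = a * 1/x^2 + c for a positive minimum support x."""
    if min_support <= 0:
        raise ValueError("minimum support must be positive")
    return a / min_support ** 2 + c


# Hypothetical coefficients and support values, chosen only for illustration.
A, C = 100.0, 5.0
supports = [1, 2, 5, 10]
values = [frequent_itemset_count(x, A, C) for x in supports]

for x, fx in zip(supports, values):
    print(f"min_support={x:>2}  f(x)={fx}")
```

With these assumed coefficients the output falls steeply at first and then flattens toward c, which matches the abstract's observation that a larger minimum support yields fewer frequent itemsets.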

    ANALISIS PERBANDINGAN ALGORITMA FP-GROWTH DAN ALGORITMA TREE PROJECTION DALAM PEMBANGKITAN FREQUENT PATTERN / COMPARISON ANALYSIS OF FP-GROWTH AND TREE PROJECTION ALGORITHMS IN FREQUENT PATTERN GENERATING

    ABSTRACT: A large database usually has the potential to hold a great deal of important information. This information can be obtained from patterns that occur repeatedly. One type of recurring pattern in data mining is the frequent pattern, or frequent itemset. Extracting frequent patterns from a large database is costly, so an efficient algorithm is needed. FP-growth and Tree Projection are frequent pattern mining algorithms with good performance. FP-growth is efficient because it does not require generating candidate frequent patterns, whereas Tree Projection is a candidate-generating algorithm that nevertheless performs well. To understand the capabilities, advantages, and disadvantages of each algorithm, a comparative study of their complexity and performance is needed. This final project carries out such a study for FP-growth and Tree Projection. For the performance analysis, software implementing both algorithms was built. Based on the experimental results, the complexity of FP-growth and Tree Projection is affected by the number of frequent items. The performance of both algorithms is inversely related to the minimum support threshold. FP-growth performs better than Tree Projection as the minimum support gets smaller, while Tree Projection approaches and even exceeds FP-growth as the minimum support threshold gets higher. Keywords: data mining, association analysis, frequent pattern, minimum support, FP-growth, Tree Projection

    Framework for cost-effective analytical modelling for sensory data over cloud environment

    To offer sensory data as a service over the cloud, it is necessary to execute cost-effective yet precise data analytics logic within the sensing units. However, this is questionable, because such analytical operations are resource-intensive and cannot be supported by resource-constrained sensory units. Therefore, this paper introduces a novel, cost-effective data analytics method for extracting knowledge from big data over the cloud. The proposed study uses a novel concept of the frequent pattern along with a tree-based approach to develop an analytical model for carrying out mining operations in large-scale sensor deployments over the cloud environment. Using a simulation-based approach over the mathematical model, the proposed model exhibits reduced mining duration, controlled energy dissipation, and highly optimized memory demands for all resource-constrained nodes.