1,274 research outputs found

HybridMiner: Mining Maximal Frequent Itemsets Using Hybrid Database Representation Approach

Author: Baig Abdul Rauf
Bashir Shariq
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/04/2009

In this paper we present a novel hybrid database representation approach (array-based layout combined with vertical bitmap layout) for mining the complete set of Maximal Frequent Itemsets (MFI) on sparse and large datasets. Our work is novel in terms of scalability, item search order, and two projection techniques (horizontal and vertical). We also present a maximal algorithm using this hybrid database representation approach. Experimental results on real and sparse benchmark datasets show that our approach is better than previous state-of-the-art maximal algorithms. Comment: 8 pages. In the proceedings of 9th IEEE-INMIC 2005, Karachi, Pakistan, 200

arXiv.org e-Print Archive

Crossref
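
Illustrative sketch for the entry above: the abstract mentions a vertical bitmap layout, in which each item is stored as a bit vector over transaction IDs, so the support of an itemset is the number of set bits in the AND of its items' vectors. The Python below is a minimal sketch of that idea only. It is not the HybridMiner algorithm: the function names, the toy data, and the brute-force enumeration of maximal itemsets are all assumptions made for illustration.

```python
from itertools import combinations


def build_vertical_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Count transactions containing every item: popcount of the ANDed bitmaps."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def maximal_frequent_itemsets(transactions, min_support):
    """Brute force (hypothetical helper): keep frequent itemsets that have
    no frequent proper superset. Exponential in the number of items."""
    bitmaps = build_vertical_bitmaps(transactions)
    n = len(transactions)
    items = sorted(bitmaps)
    frequent = [
        frozenset(combo)
        for k in range(1, len(items) + 1)
        for combo in combinations(items, k)
        if support(combo, bitmaps, n) >= min_support
    ]
    return [s for s in frequent if not any(s < t for t in frequent)]


# Toy data, invented for this example.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
mfi = maximal_frequent_itemsets(transactions, min_support=2)
```

A practical miner would prune the search space rather than enumerate every combination; the sketch only shows why bitwise AND makes support counting cheap in a vertical layout.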

Using a combination of methodologies for improving medical information retrieval performance

Author: Forghani Raissi Hoda
Publication date: 01/09/2013

This thesis presents three approaches to improve the current state of Medical Information Retrieval. At the time of this writing, the health industry is experiencing a massive change as technology is introduced into all aspects of health delivery. The work in this thesis involves adapting established concepts from the field of Information Retrieval to the field of Medical Information Retrieval. In particular, we apply subtype filtering, ICD-9 codes, query expansion, and re-ranking methods in order to improve retrieval on medical texts. The first method applies association rule mining and cosine similarity measures. The second method applies subtype filtering and the Apriori algorithm. The third method uses ICD-9 codes in order to improve retrieval accuracy. Overall, we show that the current state of medical information retrieval has substantial room for improvement. Our first two methods do not show significant improvements, while our third approach shows an improvement of up to 20%

YorkSpace

A Study Of Data Informatics: Data Analysis And Knowledge Discovery Via A Novel Data Mining Algorithm

Author: Balan Shilpa
Publication venue: eGrove
Publication date: 01/01/2014

Frequent pattern mining (fpm) has become extremely popular among data mining researchers because it provides interesting and valuable patterns from large datasets. The decreasing cost of storage devices and the increasing availability of processing power make it possible for researchers to build and analyze gigantic datasets in various scientific and business domains. A filtering process is needed, however, to generate patterns that are relevant. This dissertation contributes to addressing this need. An experimental system named fpmies (frequent pattern mining information extraction system) was built to extract information from electronic documents automatically. Collocation analysis was used to analyze the relationship of words. Template mining was used to build the experimental system which is the foundation of fpmies. With the rising need for improved environmental performance, a dataset based on green supply chain practices of three companies was used to test fpmies. The new system was also tested by users, resulting in a recall of 83.4%. The new algorithm's combination of semantic relationships with template mining significantly improves the recall of fpmies. The study's results also show that fpmies is much more efficient than manually trying to extract information. Finally, the performance of the fpmies system was compared with the most popular fpm algorithm, Apriori, yielding a significantly improved recall and precision for fpmies (76.7% and 74.6% respectively) compared to that of Apriori (30% recall and 24.6% precision)

eGrove (Univ. of Mississippi)

A new strategy for case-based reasoning retrieval using classification based on association

Author: Aljuboori AS
Meziane F
Parsons DJ
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

This paper proposes a novel strategy, Case-Based Reasoning Using Association Rules (CBRAR), to improve the performance of Similarity-Based Retrieval (SBR), using the classed frequent pattern trees (FP-CAR) algorithm, in order to disambiguate wrongly retrieved cases in Case-Based Reasoning (CBR). CBRAR uses class association rules (CARs) to generate an optimum FP-tree which holds a value for each node. The possible advantage offered is that more efficient results can be gained when SBR returns uncertain answers. We compare the CBR query, as a pattern, with FP-CAR patterns to identify the longest length of the voted class. If the patterns are matched, the proposed strategy can select not just the most similar case but the correct one. Our experimental evaluation on real data from the UCI repository indicates that the proposed CBRAR is a better approach when compared to the accuracy of the CBR systems used in our experiments

University of Salford Institutional Repository

Crossref

UDORA - University of Derby Online Research Archive

High Performance Frequent Subgraph Mining on Transactional Datasets

Author: Jena Bismita
Publication venue: ScholarWorks @ Georgia State University
Publication date: 06/05/2019

Graph data mining has been a crucial as well as inevitable area of research. Large amounts of graph data are produced in many areas, such as bioinformatics, cheminformatics, social networks, and the Web. Scalable graph data mining methods are becoming increasingly popular and necessary due to increased graph complexity. Frequent subgraph mining is one such area, where the task is to find frequently recurring patterns/subgraphs. To tackle this problem, many main memory-based methods were proposed, which proved to be inefficient as the data size grew exponentially over time. In the past few years several research groups have attempted to handle the frequent subgraph mining (FSM) problem in multiple ways. Many authors have tried to achieve better performance using Graphics Processing Units (GPUs), which offer a multi-fold improvement over in-memory methods when dealing with large datasets. Later, Google's MapReduce model with the Hadoop framework proved to be a major breakthrough in high-performance large batch processing. Although MapReduce came with many benefits, its disk I/O and non-iterative model could not help much in the FSM domain, since subgraph mining is an iterative process. In recent years, Spark has emerged as the de facto industry standard with its distributed in-memory computing capability. It is also a good fit for the iterative style of programming. In this work, we cover how high-performance computing has helped improve performance tremendously on transactional directed and undirected graphs, and we compare the performance of various FSM techniques based on experimental results

ScholarWorks @ Georgia State University

Mining Privacy-Preserving Association Rules based on Parallel Processing in Cloud Computing

Author: B Murugeshwari
D Arul Kumar
D Dhinakaran
D Selvaraj
M Joe Prathap P.
Publication venue: 'Seventh Sense Research Group Journals'
Publication date: 21/04/2023

With the onset of the Information Era and the rapid growth of information technology, ample space for processing and extracting data has opened up. However, privacy concerns may stifle expansion in this area. This study addresses the challenge of reliable mining when transactions are dispersed across sources. It examines the prospect of creating a new set of three algorithms that can achieve maximum privacy, data utility, and time savings. This paper proposes a unique double encryption and Transaction Splitter approach that alters the database in the preparation phase to optimize the tradeoff between data utility and confidentiality. For the mining process, it presents a customized Apriori approach that does not examine the entire database to estimate the support for each attribute. Existing distributed data solutions have a high encryption complexity and an insufficient specification of many participants' properties. The proposed solutions provide increased privacy protection against a variety of attack models. Furthermore, in terms of communication cycles and processing complexity, the approach is much simpler and quicker. Tests of the proposed work on a real-world transaction database demonstrate that the aim of the proposed method is realistic

arXiv.org e-Print Archive

BCS SGAI SMA 2013: the BCS SGAI workshop on social media analysis

Publication venue: M. Jeusfeld
Publication date: 01/01/2013

Portsmouth University Research Portal (Pure)

Exploring Pattern Mining Algorithms for Hashtag Retrieval Problem

Author: Belhadi Asma
Cano Alberto
Djenouri Youcef
Lin Jerry Chun-Wei
Zhang Chongsheng
Publication venue: VCU Scholars Compass
Publication date: 01/01/2020

The hashtag is an iconic feature for retrieving the hot topics of discussion on Twitter and other social networks. This paper incorporates pattern mining approaches to improve the accuracy of retrieving relevant information and to speed up search. A novel algorithm called PM-HR (Pattern Mining for Hashtag Retrieval) is designed to first transform the set of tweets into a transactional database by considering two different strategies (trivial and temporal). After that, the set of relevant patterns is discovered and then used as a knowledge-based system for finding the relevant tweets based on users' queries through a similarity search process. Extensive experiments are carried out on large and varied tweet collections, and the proposed PM-HR outperforms the baseline hashtag retrieval approaches in terms of runtime, and it is very competitive in terms of accuracy

SINTEF Open

VCU Scholars Compass

NORA - Norwegian Open Research Archives