
High Performance Frequent Subgraph Mining on Transactional Datasets

Author: Jena Bismita
Publication venue: ScholarWorks @ Georgia State University
Publication date: 06/05/2019

Graph data mining has been a crucial and unavoidable area of research. Large amounts of graph data are produced in many areas, such as bioinformatics, cheminformatics, social networks, and the Web. Scalable graph data mining methods are becoming increasingly popular and necessary as graph complexity increases. Frequent subgraph mining (FSM) is one such area, where the task is to find frequently recurring patterns (subgraphs). To tackle this problem, many main-memory-based methods were proposed, which proved inefficient as data sizes grew exponentially over time. In the past few years, several research groups have attempted to handle the FSM problem in multiple ways. Many authors have tried to achieve better performance using Graphics Processing Units (GPUs), which offer multi-fold improvement over in-memory methods when dealing with large datasets. Later, Google's MapReduce model with the Hadoop framework proved to be a major breakthrough in high-performance large-batch processing. Although MapReduce came with many benefits, its disk I/O and non-iterative model could not help much in the FSM domain, since subgraph mining is an iterative process. In recent years, Spark has emerged as the de facto industry standard with its distributed in-memory computing capability, which also makes it a good fit for iterative programming. In this work, we cover how high-performance computing has helped improve performance tremendously for transactional directed and undirected graphs, and we compare the performance of various FSM techniques based on experimental results.

Prognosis of Disease that may Occur with Growing Age using Confabulation Based Algorithm

Authors: Gautam Jyoti, Srivastava Neha
Publication venue: Defence Scientific Information and Documentation Centre
Publication date: 10/11/2017

Long-term examination of patients' medical records can be useful for determining the causes responsible for a particular disease, so that early preventive measures can be taken to curtail the risk of diseases that may occur with growing age. Consequently, this can improve life expectancy. Here, a new algorithm, CMARM, is proposed for analysing symptoms in order to find the diseases that may occur frequently and rarely with growing age. It uses a MapReduce paradigm inspired by cognitive learning, which is concerned with the acquisition of problem-solving skills, intelligence, and conscious thought, and uses prevailing knowledge to generate new rules. It has been evaluated over synthetic data sets collected from the health data repository. Because CMARM requires only one-time file access, it is consistently faster and consumes less memory than the FP-tree-based algorithm.

Defence Life Science Journal

Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks

Author: Kambatla Karthik Shashank
Publication venue: Purdue University (bepress)
Publication date: 01/01/2016

The success of modern applications depends on the insights they collect from their data repositories. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size, as they collect data from varied sources: web applications, mobile phones, sensors, and other connected devices. Distributed storage and data-centric compute frameworks have been invented to store and analyze these large datasets. This dissertation focuses on extending the applicability and improving the efficiency of distributed data-centric compute frameworks.