4 research outputs found

    A Survey on Data Mining and Analysis in Hadoop and MongoDb

    Data mining is the process of generating patterns and rules from various kinds of data marts and data warehouses. The process involves several steps, including data cleaning and anomaly detection, after which the cleaned data is mined with various approaches. In this research we discuss data mining on large datasets (Big Data), where the major issues are scalability and security. Hadoop is the tool used to mine the data, and MongoDB, which follows a key-value paradigm, provides its input and is used for parsing the data. Other approaches and their data-storage capabilities are also discussed in this report. MapReduce is a method that can be used to reduce the dataset, cutting query processing time and improving system throughput. In the proposed system we mine big data with Hadoop and MongoDB, attempting to mine the data with sorted or double-sorted key-value pairs, and analyze the outcome of the system. Keywords: Data Mining, Hadoop, MapReduce, HDFS, MongoDB
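    The map, sort/shuffle, and reduce flow the abstract refers to, with output arriving as sorted key-value pairs, can be sketched in plain Python. This is a toy illustration of the paradigm, not Hadoop itself, and the event names are invented for the example:

    ```python
    from itertools import groupby
    from operator import itemgetter

    def map_phase(records):
        """Mapper: emit a (key, 1) pair for each raw record."""
        for record in records:
            yield (record, 1)

    def shuffle(pairs):
        """Sort pairs by key so each reducer sees one key's values
        contiguously, mirroring Hadoop's sort-and-shuffle step."""
        return sorted(pairs, key=itemgetter(0))

    def reduce_phase(sorted_pairs):
        """Reducer: sum the counts per key; because the input is
        sorted, the output arrives in sorted key order."""
        for key, group in groupby(sorted_pairs, key=itemgetter(0)):
            yield (key, sum(count for _, count in group))

    # Hypothetical input records (e.g. one event type per log line).
    events = ["login", "search", "login", "purchase", "search", "login"]
    result = dict(reduce_phase(shuffle(map_phase(events))))
    ```

    In a real Hadoop job the shuffle is performed by the framework between the map and reduce tasks; the sorted-key guarantee it provides is what the proposed sorted key-value-pair mining would build on.
    
    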

    A Survey of Model Used for Web User’s Browsing Behavior Prediction

    The motivation behind this work is that predicting a web user's browsing behavior while surfing the Internet reduces browsing access time and avoids visits to unnecessary pages, easing network traffic. Various models such as fuzzy inference models, support vector machines (SVMs), artificial neural networks (ANNs), association rule mining (ARM), k-nearest neighbor (kNN), the Markov model, the Kth-order Markov model, the all-Kth Markov model and the modified Markov model have been proposed to handle the web page prediction problem. Often, a combination of two or more models is used to achieve higher prediction accuracy. This research work introduces support vector machines for web page prediction. The advantage of using support vector machines is that they offer highly robust and accurate classification, owing to their generalization properties, solid theoretical foundation and proven effectiveness. The Web contains an enormous amount of data that grows exponentially, but the training time for a support vector machine is very long: SVMs suffer from a widely recognized scalability problem in both memory requirement and computation time when the input dataset is too large. To address this, I aim to train the support vector machine model in the MapReduce programming model of the Hadoop framework, since MapReduce can rapidly process large amounts of data in parallel. MapReduce works in tandem with the Hadoop Distributed File System (HDFS). The proposed approach will thus address the scalability problem of the present SVM algorithm. Keywords: Web Page Prediction, Support Vector Machines, Hadoop, MapReduce, HDFS
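    The abstract does not specify how SVM training is distributed across MapReduce. One common scheme is to let each mapper fit a local model on its HDFS input split and have the reducer average the parameters. A minimal sketch under that assumption, using plain-Python sub-gradient descent on the hinge loss for a linear SVM (an illustration, not the author's actual algorithm; the data is made up):

    ```python
    def train_linear_svm(data, epochs=200, lam=0.01, lr=0.1):
        """Sub-gradient descent on the regularized hinge loss.
        data: list of (features, label) pairs with label in {-1, +1}."""
        w = [0.0] * len(data[0][0])
        b = 0.0
        for _ in range(epochs):
            for x, y in data:
                margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
                if margin < 1:  # point inside margin: hinge gradient applies
                    w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                    b += lr * y
                else:           # point correctly classified: only regularize
                    w = [wi - lr * lam * wi for wi in w]
        return w, b

    def map_train(partition):
        """Mapper: fit a local SVM on one input split."""
        return train_linear_svm(partition)

    def reduce_average(models):
        """Reducer: average the per-partition weights (parameter averaging)."""
        n = len(models)
        dim = len(models[0][0])
        w = [sum(m[0][i] for m in models) / n for i in range(dim)]
        b = sum(m[1] for m in models) / n
        return w, b

    def predict(model, x):
        w, b = model
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

    # Two hypothetical partitions of a linearly separable dataset.
    p1 = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((-2.0, -2.0), -1), ((-3.0, -1.0), -1)]
    p2 = [((1.0, 3.0), 1), ((2.0, 3.0), 1), ((-1.0, -3.0), -1), ((-2.0, -3.0), -1)]
    model = reduce_average([map_train(p1), map_train(p2)])
    ```

    Parameter averaging is only one way to combine the partial models; cascade approaches, which pass the support vectors of each local model to a second training round, are another common choice when the decision boundary must match a single global SVM more closely.
    
    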