Binary Journal of Data Mining & Networking
    35 research outputs found

    FP-Growth Tree Based Algorithms Analysis: CP-Tree and K Map

    We propose a novel frequent-pattern tree (FP-tree) structure; our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported frequent-pattern mining methods. The FP-tree method is an efficient association-mining algorithm for mining frequent patterns, whether those patterns are long or short. By using a compact tree structure together with a partitioning-based, divide-and-conquer search method, it substantially reduces search costs. Adding CPUs or memory can also ease the problem, and such parallelism can apparently decrease the costs of exchanging and combining control information while greatly reducing the algorithm's complexity. However, a multi-CPU approach mainly raises the hardware requirements, and the achievable performance improvement remains limited. This raises the question of whether there is another way to reduce these costs in the FP-tree construction itself, given that hardware alone offers only limited gains.
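
    To make the divide-and-conquer idea concrete, here is a minimal FP-growth sketch in plain Python. Transactions are lists of hashable items; the names (FPNode, build_tree, fp_growth), the min_support count, and the toy data are illustrative assumptions, not taken from the paper or its CP-tree and K-map variants.

        from collections import defaultdict

        class FPNode:
            def __init__(self, item, parent):
                self.item, self.parent, self.count = item, parent, 0
                self.children = {}

        def build_tree(transactions, min_support):
            # Pass 1: count supports and keep only frequent items.
            counts = defaultdict(int)
            for t in transactions:
                for item in set(t):
                    counts[item] += 1
            frequent = {i: c for i, c in counts.items() if c >= min_support}
            # Pass 2: insert transactions ordered by descending frequency so
            # shared prefixes collapse into shared branches (the compact structure).
            root, header = FPNode(None, None), defaultdict(list)
            for t in transactions:
                node = root
                for item in sorted((i for i in set(t) if i in frequent),
                                   key=lambda i: (-frequent[i], i)):
                    if item not in node.children:
                        node.children[item] = FPNode(item, node)
                        header[item].append(node.children[item])
                    node = node.children[item]
                    node.count += 1
            return header, frequent

        def fp_growth(transactions, min_support, suffix=()):
            header, frequent = build_tree(transactions, min_support)
            for item in sorted(frequent, key=lambda i: frequent[i]):
                pattern = suffix + (item,)
                yield pattern, frequent[item]
                # Divide and conquer: collect the conditional pattern base for
                # `item` (its prefix paths) and recurse on that smaller database.
                conditional = []
                for node in header[item]:
                    path, parent = [], node.parent
                    while parent.item is not None:
                        path.append(parent.item)
                        parent = parent.parent
                    conditional.extend([path] * node.count)
                yield from fp_growth(conditional, min_support, pattern)

        transactions = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
                        ["a", "d", "e"], ["a", "b", "c"]]
        for pattern, count in fp_growth(transactions, 2):
            print(pattern, count)

    Ordering each transaction by descending item frequency is what lets shared prefixes collapse into shared branches, which is where the memory and search savings discussed above come from.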

    Data Mining Based on Association Rule Privacy Preserving

    The security of large databases that contain crucial information becomes a serious issue when data are shared over a network and exposed to unauthorized access. Privacy-preserving data mining is a new research trend in data mining and statistical databases. Association analysis is a powerful tool for discovering relationships hidden in large databases, and association-rule hiding algorithms offer strong, efficient performance for protecting confidential and crucial data. Data modification combined with rule hiding is one of the most important approaches to securing data. The objective of the proposed association-rule hiding algorithm for privacy-preserving data mining is to hide certain information so that it cannot be discovered through association-rule mining. The main approach of association-rule hiding algorithms is to hide some of the generated rules by increasing or decreasing their support or confidence, so that sensitive items, whether on the Left Hand Side (LHS) or the Right Hand Side (RHS) of a rule, cannot be deduced through association-rule mining. The Increase Support of Left Hand Side (ISL) algorithm decreases the confidence of a rule by increasing the support of its LHS; it does not work on both sides of a rule, modifying only the LHS. The Decrease Support of Right Hand Side (DSR) algorithm decreases the confidence of a rule by decreasing the support of its RHS, modifying only the RHS. We propose a new algorithm that overcomes both limitations: it can increase the support of the LHS and decrease the support of the RHS of a rule as appropriate, hiding more rules with fewer modifications. The efficiency of the proposed algorithm is compared with the ISL and DSR algorithms on real databases, on the basis of the number of rules hidden, CPU time, and the number of modified entries, and it achieves better results.
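
    As a rough illustration of the arithmetic that the ISL and DSR strategies rely on, the sketch below computes support and confidence over a toy binary database. The victim-transaction selection heuristics of the actual algorithms are not reproduced; the helper names and data are illustrative.

        def support(db, itemset):
            # Fraction of transactions containing every item of `itemset`.
            return sum(all(t.get(i, 0) for i in itemset) for t in db) / len(db)

        def confidence(db, lhs, rhs):
            # conf(LHS -> RHS) = sup(LHS u RHS) / sup(LHS): raising sup(LHS) or
            # lowering sup(LHS u RHS) pushes confidence below the mining threshold.
            return support(db, lhs | rhs) / support(db, lhs)

        db = [{"a": 1, "b": 1}, {"a": 1}, {"a": 1, "b": 1}, {"c": 1}]
        lhs, rhs = {"a"}, {"b"}
        print(confidence(db, lhs, rhs))  # 0.667: rule a -> b before any hiding

        db[3]["a"] = 1                   # ISL-style step: add an LHS item to a
        print(confidence(db, lhs, rhs))  # transaction lacking the RHS -> 0.5

        db[0]["b"] = 0                   # DSR-style step: drop an RHS item from a
        print(confidence(db, lhs, rhs))  # supporting transaction -> 0.25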

    A Hybrid Recommendation System Based on Clustering and Association

    Recommendation systems play an important role in filtering and customizing the desired information. Recommender systems are divided into three categories: collaborative filtering, content-based filtering, and hybrid filtering, the most widely adopted techniques in recommender systems. The main aim of this paper is to recommend the most suitable items to the user. The approach taken here is to cluster the data and then apply association mining over the clusters. The paper describes different hybridization methods, discusses various limitations of current recommendation methods, such as the cold-start problem, the gray-sheep problem, and how to find the similarity between users and items, and discusses possible extensions that could improve recommendation capabilities in a range of applications, such as improved understanding of users and items, incorporation of contextual information into the recommendation process, and support for multi-criteria ratings.
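
    A minimal sketch of the cluster-then-associate pipeline described above, assuming each user is represented by the set of items they consumed. The naive Jaccard-threshold clustering and the toy profiles are stand-ins, not the paper's actual method.

        from collections import defaultdict
        from itertools import combinations

        def jaccard(a, b):
            return len(a & b) / len(a | b) if a | b else 0.0

        def cluster_users(profiles, threshold=0.3):
            # Greedy one-pass clustering: join the first cluster whose
            # representative set is similar enough, else start a new cluster.
            clusters = []
            for user, items in profiles.items():
                for rep, members in clusters:
                    if jaccard(items, rep) >= threshold:
                        members.append(user)
                        break
                else:
                    clusters.append((items, [user]))
            return clusters

        def frequent_pairs(profiles, users, min_count=2):
            # Association step, run per cluster: count co-occurring item pairs.
            counts = defaultdict(int)
            for u in users:
                for pair in combinations(sorted(profiles[u]), 2):
                    counts[pair] += 1
            return {p: c for p, c in counts.items() if c >= min_count}

        profiles = {"u1": {"milk", "bread"}, "u2": {"milk", "bread", "eggs"},
                    "u3": {"milk", "eggs"}, "u4": {"guitar", "amp"}}
        for rep, users in cluster_users(profiles):
            print(users, frequent_pairs(profiles, users))

    Restricting the association step to one cluster at a time is what lets the pairs reflect the tastes of similar users rather than the whole population.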

    Analysis and Implementation of K-Means and K-Medoids Algorithms for Large Datasets to Increase Scalability and Efficiency

    The experiments are pursued on both synthetic and real data sets. The synthetic data sets used in our experiments were generated using a standard procedure, and we refer readers to it for details on the generation of large data sets. We report experimental results on two synthetic data sets. In the first, the average transaction size and the average maximal potentially frequent itemset size are fixed while the number of transactions in the dataset is large; it is a sparse dataset whose frequent itemsets are short but numerous. In the second synthetic dataset, the average transaction size and the average maximal potentially frequent itemset size are set to 30 and 32, respectively. This dataset contains exponentially many frequent itemsets as the support threshold goes down, with pretty long frequent itemsets as well as a large number of short ones: an abundant mixture of short and long frequent itemsets.
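
    For reference, the sketch below contrasts the update steps of the two algorithms in the paper's title on a handful of 1-D points with an outlier; the large synthetic transaction data described above are not reproduced here, and all names and data are illustrative.

        def assign(points, centers):
            # Nearest-center assignment, shared by both algorithms.
            return [min(range(len(centers)), key=lambda j: abs(p - centers[j]))
                    for p in points]

        def kmeans_update(points, labels, k):
            # k-means: each center becomes the MEAN of its cluster.
            return [sum(p for p, l in zip(points, labels) if l == j)
                    / max(1, sum(l == j for l in labels)) for j in range(k)]

        def kmedoids_update(points, labels, k):
            # k-medoids: each center must be an actual point (the medoid),
            # which makes it far less sensitive to outliers than the mean.
            centers = []
            for j in range(k):
                members = [p for p, l in zip(points, labels) if l == j]
                centers.append(min(members,
                                   key=lambda c: sum(abs(c - q) for q in members)))
            return centers

        points = [1.0, 1.2, 0.9, 8.0, 8.2, 50.0]   # 50.0 is an outlier
        for name, update in [("k-means", kmeans_update),
                             ("k-medoids", kmedoids_update)]:
            centers = [1.0, 8.0]
            for _ in range(5):
                centers = update(points, assign(points, centers), 2)
            print(name, centers)

    On this toy data the outlier captures one k-means center and the two genuine clusters get merged under the other, while k-medoids keeps both centers at real points near the genuine clusters.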

    Data mining intelligent system for decision making based on ERP

    As Enterprise Resource Planning (ERP) implementation has become more popular and suitable for every business organization, it has become an essential factor in the success of a business. This paper presents an integration of ERP with Customer Relationship Management (CRM). Data mining underpins the integration in this model by supporting the application of a suitable algorithm to produce successful results. The model has three major parts: an outer view (CRM), an inner view (ERP), and a knowledge-discovery view. The CRM component collects customers' queries, the ERP component analyzes and integrates the data, and the knowledge-discovery component gives predictions and advice for the betterment of the organization. For the practical implementation of the presented model, we use MADAR data and apply the Apriori algorithm to it. The new rules and patterns are then suggested to the organization, helping it solve customers' problems in future correspondence.
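
    Since the model applies the Apriori algorithm, here is a minimal Apriori sketch; the MADAR data are not available here, so a toy set of CRM-style items stands in, and min_support is an illustrative count rather than a value from the paper.

        from itertools import combinations

        def apriori(transactions, min_support):
            # Level 1: frequent single items.
            items = {frozenset([i]) for t in transactions for i in t}
            k_sets = {s for s in items
                      if sum(s <= t for t in transactions) >= min_support}
            frequent = {}
            while k_sets:
                for s in k_sets:
                    frequent[s] = sum(s <= t for t in transactions)
                # Join step: merge k-sets differing in one item, then prune any
                # candidate with an infrequent subset (the Apriori property).
                candidates = {a | b for a in k_sets for b in k_sets
                              if len(a | b) == len(a) + 1}
                candidates = {c for c in candidates
                              if all(frozenset(sub) in frequent
                                     for sub in combinations(c, len(c) - 1))}
                k_sets = {c for c in candidates
                          if sum(c <= t for t in transactions) >= min_support}
            return frequent

        transactions = [frozenset(t) for t in
                        [{"invoice", "delay"}, {"invoice", "delay", "refund"},
                         {"invoice", "refund"}, {"delay", "refund"}]]
        for itemset, count in sorted(apriori(transactions, 2).items(),
                                     key=lambda kv: len(kv[0])):
            print(set(itemset), count)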

    A Roadmap: Designing and Construction of Data Warehouse

    Data warehousing is not about the tools. Rather, it is about creating a strategy to plan, design, and construct a data store capable of answering business questions. A good strategy is a process that is never really finished. A defined data warehouse development process provides a foundation for reliability and reduction of risk; this process is defined through methodology. Reliability is pivotal in reducing the costs of maintenance and support, and because data warehouse development enjoys high visibility, many firms have concentrated on reducing these costs. Standardization and reuse of the development artifacts and the deliverables of the process can reduce the time and cost of a data warehouse's creation. In today's business world, data warehouses are increasingly being used to help companies make strategic business decisions. To understand how a warehouse can benefit you and what is required to manage one, you must first understand how a data warehouse is constructed and established.

    Time-Series Data Mining: A Review

    Data mining refers to the extraction of knowledge by analyzing data from different perspectives and accumulating the results into useful information that helps decision makers take appropriate decisions. Classification and clustering have been the two broad areas in data mining. Classification is a supervised learning approach, whereas clustering is an unsupervised learning approach and hence can be performed without the supervision of domain experts. The basic concept is to group objects so that similar objects are closer to each other. Time-series data are observations of data over a period of time. Parameter estimation, outlier detection, and transformation of the data are some of the basic issues in handling time-series data. An approach is given for clustering the data based on the membership values assigned to each data point, suppressing the effect of outliers or noise present in the data. Possibilistic Fuzzy C-Means (PFCM) with Error Prediction (EP) is used for clustering and noise identification in the time-series data.
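
    The sketch below shows the soft-membership idea that PFCM builds on, using plain fuzzy c-means on 1-D values; the possibilistic term and the error-prediction step of PFCM+EP are omitted, so this is only the fuzzy-assignment core, with illustrative names and data.

        def fcm(points, k, m=2.0, iters=20):
            step = max(1, (len(points) - 1) // max(1, k - 1))
            centers = points[::step][:k]           # naive spread initialization
            for _ in range(iters):
                # Soft membership: u[i][j] = 1 / sum_l (d_ij / d_il)^(2/(m-1)),
                # so an outlier gets diffuse memberships instead of a hard label.
                memberships = []
                for p in points:
                    dists = [abs(p - c) + 1e-9 for c in centers]
                    memberships.append(
                        [1.0 / sum((d / e) ** (2 / (m - 1)) for e in dists)
                         for d in dists])
                # Center update: membership-weighted mean of the points.
                centers = [sum((u[j] ** m) * p
                               for u, p in zip(memberships, points))
                           / sum(u[j] ** m for u in memberships)
                           for j in range(k)]
            return centers, memberships

        points = [1.0, 1.1, 0.9, 5.0, 5.2, 12.0]   # 12.0 acts like noise
        centers, memberships = fcm(points, k=2)
        print([round(c, 2) for c in centers])
        print([round(u[0], 2) for u in memberships])  # membership in cluster 0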

    Data Visualization and Techniques

    Data visualization is the graphical representation of information. Bar charts, scatter graphs, and maps are examples of simple data visualizations that have been used for decades. Information technology combines the principles of visualization with powerful applications and large data sets to create sophisticated images and animations. A tag cloud, for instance, uses text size to indicate the relative frequency of use of a set of terms. In many cases, the data that feed a tag cloud come from thousands of Web pages, representing perhaps millions of users. All of this information is contained in a simple image that you can understand quickly and easily. More complex visualizations sometimes generate animations that demonstrate how data change over time. In an application called Gapminder, bubbles represent the countries of the world, with each nation's population reflected in the size of its bubble. You can set the x and y axes to compare life expectancy with per capita income, for example, and the tool will show how each nation's bubble moves on the graph over time. You can see that higher income generally correlates with longer life expectancy, but the visualization also clearly shows that China doesn't follow this trend: in 1975, the country had one of the lowest per capita incomes but one of the longer life expectancies. The animation also shows the steep drop in life expectancy in many sub-Saharan African countries starting in the early 1990s (corresponding to the AIDS epidemic in that part of the world) and the plummeting of life expectancy in Rwanda at the time of that nation's genocide.
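
    The tag-cloud principle mentioned above amounts to mapping each term's count onto a font size; a minimal sketch, with an arbitrary 12-36 px range chosen for illustration:

        from collections import Counter

        words = "data mining data visualization data chart chart map".split()
        counts = Counter(words)
        lo, hi = min(counts.values()), max(counts.values())

        def font_size(count, min_px=12, max_px=36):
            # Linear interpolation between the rarest and most frequent term.
            if hi == lo:
                return max_px
            return min_px + (count - lo) * (max_px - min_px) / (hi - lo)

        for term, count in counts.most_common():
            print(f"{term}: {count} uses -> {font_size(count):.0f}px")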

    Data Mining Techniques in Cancer Research Area

    In this paper we present an analysis of the prediction of the survivability rate of breast cancer patients on different attributes using data mining techniques. The data used are real data. The preprocessed data set has all twelve fields available from the database. We have investigated several data mining techniques.

    Mining Frequent Item Sets in Data Streams Using the "Éclat Algorithm"

    Frequent pattern mining is the process of mining a set of items or patterns from a large database, where the resulting frequent sets satisfy a minimum support threshold. A frequent pattern is a pattern that occurs frequently in a dataset. Association rule mining is defined as finding the association rules that satisfy a predefined minimum support and confidence in a given database: if an itemset is said to be frequent, that itemset satisfies the minimum support and confidence, appearing in at least the minimum required number of the transactions of that database. Discovering frequent itemsets plays a very important role in mining association rules, sequence rules, web log mining, and many other interesting patterns among complex data. A data stream is a real-time, continuous, ordered sequence of items: an uninterrupted flow of a long sequence of data. Some real-world examples of data streams are sensor network data, telecommunication data, transactional data, and scientific surveillance systems. These sources produce trillions of updates every day, so it is very difficult to store the entire data, and a suitable mining process is required. Data mining is the non-trivial process of identifying valid, original, potentially useful, and ultimately understandable patterns in data; it is the extraction of hidden predictive information from large databases. There are many algorithms for finding frequent itemsets, among which the Apriori algorithm is the first classical one. Apart from Apriori, many algorithms have been developed, but they are similar to Apriori, being based on pruning and candidate generation, and they take more memory and time to find the frequent itemsets. In this paper, we study how the Éclat algorithm is used on data streams to find frequent itemsets. The Éclat algorithm does not require candidate generation.
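
    A minimal Éclat sketch using the vertical (item-to-tidset) layout the algorithm is known for: supports come from tidset intersections rather than Apriori-style candidate generation over the horizontal database. The function names and toy transactions are illustrative, and the streaming setting of the paper is not reproduced.

        def eclat(prefix, items, min_support, out):
            # items: (item, tidset) pairs; depth-first search over the prefix tree,
            # so memory stays proportional to a single branch.
            while items:
                item, tids = items.pop()
                if len(tids) >= min_support:
                    out[prefix + (item,)] = len(tids)
                    # Extend the prefix by intersecting tidsets with the remaining
                    # items: support counting is pure set intersection.
                    suffix = []
                    for other, other_tids in items:
                        common = tids & other_tids
                        if len(common) >= min_support:
                            suffix.append((other, common))
                    eclat(prefix + (item,), suffix, min_support, out)

        transactions = [{"a", "b", "d"}, {"b", "c"}, {"a", "b", "c"}, {"a", "c"}]
        vertical = {}
        for tid, t in enumerate(transactions):       # build item -> tidset
            for item in t:
                vertical.setdefault(item, set()).add(tid)

        frequent = {}
        eclat((), sorted(vertical.items()), 2, frequent)
        for itemset, support in sorted(frequent.items()):
            print(itemset, support)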

    24 full texts
    35 metadata records
    Updated in last 30 days.