
    An Association of Efficient Mining by Compressed Database

    Data mining can be viewed as a result of the natural evolution of information technology. The spread of computing has led to an explosion in the volume of data stored on hard disks and sent over the Internet. This growth has created a need for data compression, that is, the ability to reduce the storage space or Internet bandwidth required to handle the data. This paper analyzes various data mining approaches that compress the original database into a smaller one and then perform mining on the compressed transactions, such as M2TQT, the Pincer-Search algorithm, the Apriori and ID3 algorithms, the TM algorithm, AIS and SETM, the CT-Apriori algorithm, CBMine, the CTITL algorithm and the FIUT-Tree. Among these techniques, M2TQT uses the relationships between transactions to merge related transactions and builds a quantification table that prunes candidate itemsets which cannot become frequent, improving the performance of association rule mining. M2TQT is observed to perform better than the existing approaches.
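
    As a simplified illustration of mining on a compressed database, the sketch below merges only identical transactions into (itemset, multiplicity) pairs and then counts itemset support in a single pass over the smaller database. It is not M2TQT: M2TQT merges related (not only identical) transactions and prunes candidates with a quantification table, neither of which is reproduced here. The function names, the support threshold and the sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def compress(transactions):
    """Merge identical transactions into a map of itemset -> multiplicity."""
    return Counter(frozenset(t) for t in transactions)


def support_counts(compressed, k):
    """Count the support of every k-itemset with one scan of the compressed DB.

    Each distinct transaction is visited once and contributes its multiplicity,
    instead of being re-read once per duplicate.
    """
    counts = Counter()
    for itemset, multiplicity in compressed.items():
        for combo in combinations(sorted(itemset), k):
            counts[combo] += multiplicity
    return counts


# Hypothetical sample data and threshold.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"bread", "milk"},
]
min_support = 2

db = compress(transactions)
print(len(db))  # 3 distinct transactions instead of 5
pairs = support_counts(db, 2)
frequent = {s: c for s, c in pairs.items() if c >= min_support}
print(frequent)
```

    The saving grows with the number of duplicate transactions; techniques such as M2TQT go further by also merging transactions that are merely similar.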

    Survey performance Improvement FP-Tree Based Algorithms Analysis

    Constructing a compact FP-tree ensures that subsequent mining can be performed on a compact data structure. For large databases, research on improving mining performance and precision is necessary, so much of today's work on association rule mining focuses on new mining theories, new algorithms and improvements to existing methods. Association rule mining is a function of the data mining research domain and has drawn many researchers to design highly efficient algorithms for mining association rules from transaction databases. Discovering all frequent itemsets in the database generally accounts for the larger share of the cost of association rule mining. FP-tree-based algorithms are considered efficient because of their compact structure and because they generate fewer candidate itemsets than Apriori, although this efficiency comes at a price of its own. This paper introduces the FP-growth algorithm as an improvement on the Apriori algorithm.
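
    The FP-tree construction described above can be sketched as follows, assuming the textbook two-scan procedure: first count item frequencies, then insert each transaction's frequent items in descending frequency order so that transactions with a common prefix share a path. The class and function names and the sample data are illustrative assumptions, and the FP-growth mining step over conditional pattern bases is omitted.

```python
from collections import Counter, defaultdict


class FPNode:
    """One node of the prefix tree: an item, its count, and parent/child links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count every item (once per transaction) and keep the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    freq = {item: c for item, c in counts.items() if c >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node that holds that item

    # Scan 2: insert each transaction's frequent items in descending frequency
    # order (ties broken by name), so shared prefixes share a single path.
    for t in transactions:
        items = sorted((i for i in set(t) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, freq


# Hypothetical sample transactions.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"], ["a", "d", "e"], ["a", "b", "c"]]
root, header, freq = build_fp_tree(transactions, min_support=2)
print(root.children["a"].count)  # 4: four transactions share the prefix "a"
print(len(header["d"]))  # 3: item "d" sits on three different paths
```

    Because every node carrying an item is listed in `header`, summing those nodes' counts recovers the item's support, which is the starting point for building the conditional pattern bases that FP-growth mines.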

    Betul Districts Primary School Performance Prediction Model Using Data Mining

    Because academic performance is influenced by many factors, it is essential to develop a predictive data mining model of students' performance in order to identify slow learners and study the influence of the dominant factors on their academic performance. In the present investigation, a survey-cum-experimental methodology was adopted to generate a database from primary and secondary sources. The primary data was collected from regular and irregular students, and the secondary data was gathered from the schools. In total, 1000 records of students in classes 3, 4 and 5 for the year 2014 were collected from five schools in three districts of Betul, Madhya Pradesh. The raw data was preprocessed by filling in missing values, transforming values from one form into another and selecting relevant attributes/variables. This yielded 700 student records, which were used to construct the primary school prediction model. A set of prediction rules was extracted from the model, and the efficiency of the generated student prediction model was evaluated. The accuracy of the model was compared with that of other models and found to be satisfactory.

    Machine Learning Approach for Cancer Entities Association and Classification

    According to the World Health Organization (WHO), cancer is the second leading cause of death globally. Scientific research on different types of cancer is growing at an ever-increasing rate, with large volumes of research articles published every year. Knowledge about the drugs, diagnostics, risks, symptoms, treatments, etc., related to genes is a significant factor in exploring and advancing cancer research. Manually screening such a large volume of articles to formulate a hypothesis is very laborious and time-consuming. The study uses two non-trivial Natural Language Processing (NLP) functions, Named Entity Recognition (NER) and text classification, to discover knowledge from biomedical literature. NER recognizes and extracts predefined cancer-related entities from unstructured text with the support of a user-friendly interface and built-in dictionaries. Text classification helps to explore insights in the text and simplifies data categorization, querying and article screening. Machine learning classifiers are also used to build the classification model, and Structured Query Language (SQL) is used to identify hidden relations that may lead to significant predictions.
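
    Dictionary-based entity recognition of the kind described above can be sketched as a case-insensitive match against term lists, trying longer terms first so multi-word entries are preferred. The dictionary contents, entity labels and function names below are hypothetical placeholders, not the study's built-in dictionaries, and the sketch leaves out the classification and SQL stages.

```python
import re

# Hypothetical mini-dictionary; a real system would load curated vocabularies.
CANCER_DICTIONARY = {
    "GENE": ["BRCA1", "BRCA2", "TP53"],
    "DRUG": ["tamoxifen", "cisplatin"],
    "CANCER_TYPE": ["breast cancer", "ovarian cancer"],
}


def build_pattern(dictionary):
    """Compile one regex over all terms and map each lowercased term to its label."""
    lookup = {term.lower(): label for label, terms in dictionary.items() for term in terms}
    # Longer alternatives first, so "breast cancer" is tried before any shorter term.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def extract_entities(text, dictionary=CANCER_DICTIONARY):
    """Return (surface text, entity label, start offset) for each dictionary hit."""
    pattern, lookup = build_pattern(dictionary)
    return [(m.group(0), lookup[m.group(0).lower()], m.start()) for m in pattern.finditer(text)]


text = "Mutations in BRCA1 raise the risk of breast cancer; tamoxifen is a common treatment."
entities = extract_entities(text)
for entity in entities:
    print(entity)
```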

    An Efficient Itemset Representation for Mining Frequent Patterns in Transactional Databases

    In this paper we propose a very efficient itemset representation for frequent itemset mining from transactional databases. The combinatorial number system is used to represent each frequent k-itemset uniquely with a single integer value, for any k ≥ 2. Experiments show that memory requirements can be reduced by up to 300%, especially for very low minimum support thresholds. Further, we exploit the combinatorial number system to represent candidate itemsets during the iterative join-based approach. The novel algorithm maintains a one-dimensional array, rank, starting from the k = 2 iteration. At index r of the array, the proposed algorithm stores the unique integer representation of the r-th candidate in lexicographic order. The rank array makes joining two candidate k-itemsets an O(1) rather than an O(k) operation. Additionally, the rank array speeds up determining which candidates are contained in a given transaction during the support counting and testing phase. Finally, we believe that itemset ranking by the combinatorial number system can be effectively integrated into pattern-growth algorithms, which are the state of the art in frequent itemset mining, to further improve their performance.
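
    The single-integer representation can be illustrated with the standard combinatorial number system, assuming items have already been relabelled as integers 0, 1, 2, ...: a k-itemset c_1 < c_2 < ... < c_k maps to C(c_1, 1) + C(c_2, 2) + ... + C(c_k, k). This standard form enumerates itemsets in colexicographic order, which may differ from the exact ordering the paper uses; the function names are hypothetical and the paper's rank-array join is not reproduced.

```python
from itertools import combinations
from math import comb


def rank(itemset):
    """Map a k-itemset of distinct non-negative item ids to one integer.

    With the items sorted as c_1 < ... < c_k, the rank is sum(C(c_i, i)).
    """
    return sum(comb(c, i) for i, c in enumerate(sorted(itemset), start=1))


def unrank(n, k):
    """Invert rank(): for i = k down to 1, pick the largest c with C(c, i) <= n."""
    items = []
    for i in range(k, 0, -1):
        c = i - 1  # C(i - 1, i) == 0, so this start value always satisfies the bound
        while comb(c + 1, i) <= n:
            c += 1
        items.append(c)
        n -= comb(c, i)
    return tuple(sorted(items))


print(rank({2, 5, 7}))  # 2 + 10 + 35 = 47
print(unrank(47, 3))  # (2, 5, 7)
# All 2-itemsets over items 0..3 receive the distinct ranks 0..5 (colex order).
print(sorted(rank(s) for s in combinations(range(4), 2)))  # [0, 1, 2, 3, 4, 5]
```

    Storing one integer instead of k item ids is where the memory saving comes from; the mapping is a bijection between k-itemsets over m items and the integers 0 to C(m, k) - 1.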

    Machine-Learning Techniques for Customer Recommendations

    Today, there is a demand in the customer relationship management market for automated procedures that predict future customers using recommendation engines. Functions are already commonly available for finding "twins", i.e., potential customers that are similar to existing customers, and for browsing lists of customers partitioned into categories such as location or line of business. Current recommendation engines are typically built using machine-learning algorithms. It is therefore of interest to determine which machine-learning algorithms are best suited to building a recommendation engine for customer prediction. This thesis investigates the prerequisites for determining suitability and evaluates various off-the-shelf machine-learning algorithms. Supervised learning models are shown to be promising as a direct method of identifying new potential customers. A classifier can be trained on a set containing existing customers and then applied to a large set of other companies to classify suitable prospects, provided there is a sufficiently large number of existing customers.
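
    As one concrete example of the supervised approach described above, the sketch below uses a k-nearest-neighbour classifier, chosen here only for brevity; the thesis evaluates several off-the-shelf algorithms, and this is not necessarily one of them. The two features, their values, the labels and the company names are all hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(training, query, k=3):
    """Label `query` by majority vote among its k nearest training examples (Euclidean)."""
    neighbours = sorted(training, key=lambda example: dist(example[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical, already-normalised features: (company size, IT spend share).
training = [
    ((0.9, 0.8), "customer"),
    ((0.8, 0.7), "customer"),
    ((0.7, 0.9), "customer"),
    ((0.1, 0.2), "non-customer"),
    ((0.2, 0.1), "non-customer"),
    ((0.3, 0.2), "non-customer"),
]
# Hypothetical prospects to classify.
prospects = {"Acme": (0.85, 0.75), "Globex": (0.15, 0.25)}
for name, features in prospects.items():
    print(name, knn_predict(training, features))
```

    In practice the training set needs enough existing customers, as the abstract stresses, and the features must be normalised so that no single attribute dominates the distance.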