5,020 research outputs found

    Highly Relevant Routing Recommendation Systems for Handling Few Data Using MDL Principle and Embedded Relevance Boosting Factors

    Get PDF
    A route recommendation system can provide better recommendation if it also takes collected user reviews into account, e.g. places that generally get positive reviews may be preferred. However, to classify sentiment, many classification algorithms existing today suffer in handling small data items such as short written reviews. In this paper we propose a model for a strongly relevant route recommendation system that is based on an MDL-based (Minimum Description Length) sentiment classification and show that such a system is capable of handling small data items (short user reviews). Another highlight of the model is the inclusion of a set of boosting factors in the relevance calculation to improve the relevance in any recommendation system that implements the model.Comment: ACM SIGIR 2018 Workshop on Learning from Limited or Noisy Data for Information Retrieval (LND4IR'18), July 12, 2018, Ann Arbor, Michigan, USA, 8 pages, 9 figure

    Computing Multi-Relational Sufficient Statistics for Large Databases

    Full text link
    Databases contain information about which relationships do and do not hold among entities. To make this information accessible for statistical analysis requires computing sufficient statistics that combine information from different database tables. Such statistics may involve any number of {\em positive and negative} relationships. With a naive enumeration approach, computing sufficient statistics for negative relationships is feasible only for small databases. We solve this problem with a new dynamic programming algorithm that performs a virtual join, where the requisite counts are computed without materializing join tables. Contingency table algebra is a new extension of relational algebra, that facilitates the efficient implementation of this M\"obius virtual join operation. The M\"obius Join scales to large datasets (over 1M tuples) with complex schemas. Empirical evaluation with seven benchmark datasets showed that information about the presence and absence of links can be exploited in feature selection, association rule mining, and Bayesian network learning.Comment: 11pages, 8 figures, 8 tables, CIKM'14,November 3--7, 2014, Shanghai, Chin

    Parallel Algorithm for Frequent Itemset Mining on Intel Many-core Systems

    Get PDF
    Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional databases. Apriori is a classical frequent itemset mining algorithm, which employs iterative passes over database combining with generation of candidate itemsets based on frequent itemsets found at the previous iteration, and pruning of clearly infrequent itemsets. The Dynamic Itemset Counting (DIC) algorithm is a variation of Apriori, which tries to reduce the number of passes made over a transactional database while keeping the number of itemsets counted in a pass relatively low. In this paper, we address the problem of accelerating DIC on the Intel Xeon Phi many-core system for the case when the transactional database fits in main memory. Intel Xeon Phi provides a large number of small compute cores with vector processing units. The paper presents a parallel implementation of DIC based on OpenMP technology and thread-level parallelism. We exploit the bit-based internal layout for transactions and itemsets. This technique reduces the memory space for storing the transactional database, simplifies the support count via logical bitwise operation, and allows for vectorization of such a step. Experimental evaluation on the platforms of the Intel Xeon CPU and the Intel Xeon Phi coprocessor with large synthetic and real databases showed good performance and scalability of the proposed algorithm.Comment: Accepted for publication in Journal of Computing and Information Technology (http://cit.fer.hr

    A STUDY ON EFFICIENT DATA MINING APPROACH ON COMPRESSED TRANSACTION

    Get PDF
    Data mining can be viewed as a result of the natural evolution of information technology. The spread of computing has led to an explosion in the volume of data to be stored on hard disks and sent over the Internet. This growth has led to a need for data compression, that is, the ability to reduce the amount of storage or Internet bandwidth required to handle the data. This paper analysis the various data mining approaches which is used to compress the original database into a smaller one and perform the data mining process for compressed transaction such as M2TQT,PINCER-SEARCH algorithm, APRIOR
    • ā€¦
    corecore