7,821 research outputs found

    Optimal Prefix Codes with Fewer Distinct Codeword Lengths are Faster to Construct

    A new method for constructing minimum-redundancy binary prefix codes is described. Our method does not explicitly build a Huffman tree; instead it uses a property of optimal prefix codes to compute the codeword lengths corresponding to the input weights. Let $n$ be the number of weights and $k$ be the number of distinct codeword lengths as produced by the algorithm for the optimum codes. The running time of our algorithm is $O(k \cdot n)$. Following our previous work in \cite{be}, no algorithm can possibly construct optimal prefix codes in $o(k \cdot n)$ time. When the given weights are presorted our algorithm performs $O(9^k \cdot \log^{2k}{n})$ comparisons. Comment: 23 pages, a preliminary version appeared in STACS 200
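To make the quantities $n$ and $k$ in this abstract concrete, here is a minimal sketch that computes optimal codeword lengths and counts the distinct lengths. It uses the textbook heap-based Huffman construction, not the $O(k \cdot n)$ method the abstract describes; the function names `codeword_lengths` and `distinct_lengths` are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import count


def codeword_lengths(weights):
    """Return optimal prefix-code codeword lengths for `weights`.

    Assumption: this is the standard Huffman merge procedure, used here
    only to illustrate the output; it is not the paper's algorithm.
    """
    n = len(weights)
    if n == 0:
        return []
    if n == 1:
        return [1]  # a single symbol still needs one bit

    tie = count()  # tiebreaker so the heap never compares the leaf lists
    # Each heap entry: (subtree weight, tiebreaker, leaf indices in the subtree)
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)

    lengths = [0] * n
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)
        w2, _, leaves2 = heapq.heappop(heap)
        merged = leaves1 + leaves2
        # Every leaf under the merged node moves one level deeper.
        for i in merged:
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return lengths


def distinct_lengths(lengths):
    """Number of distinct codeword lengths (the k of the abstract)."""
    return len(set(lengths))
```

For the weights `[1, 1, 2, 4]` this yields lengths `[3, 3, 2, 1]`, so $n = 4$ and $k = 3$; four equal weights give a single distinct length, $k = 1$.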

    X-TREPAN: a multi class regression and adapted extraction of comprehensible decision tree in artificial neural networks

    In this work, the TREPAN algorithm is enhanced and extended for extracting decision trees from neural networks. We empirically evaluated the performance of the algorithm on a set of databases from real-world events. This benchmark enhancement was achieved by adapting the Single-test TREPAN and C4.5 decision tree induction algorithms to analyze the datasets. The models are then compared with X-TREPAN for comprehensibility and classification accuracy. Furthermore, we validate the experiments by applying statistical methods. Finally, the modified algorithm is extended to work with multi-class regression problems, and the ability to comprehend generalized feed-forward networks is achieved. Comment: 17 Pages, 8 Tables, 8 Figures, 6 Equation

    Approximate k-nearest neighbour based spatial clustering using k-d tree

    Spatial objects that vary in their characteristics, in fields such as molecular biology and geography, are represented in spatial areas. Methods to organize, manage, and maintain those objects in a structured manner are required. Data mining has produced different techniques to meet these requirements. Data mining has many major tasks, but the most commonly used is clustering. Data points within the same cluster share common features that give each cluster its characteristics. In this paper, an implementation of an approximate kNN-based spatial clustering algorithm using the k-d tree is proposed. The major contribution of this research is the use of the k-d tree data structure for spatial clustering and a comparison of its performance with the brute-force approach. The results revealed better performance using the k-d tree than with the traditional brute-force approach.
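The comparison between a k-d tree and brute-force search in this abstract can be illustrated with a small sketch. It assumes points are equal-length tuples and performs exact k-nearest-neighbour search with branch pruning; the approximate variant and the clustering step from the paper are not reproduced, and all function names are hypothetical.

```python
import heapq


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a balanced k-d tree by splitting on the median, cycling axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def _search(node, query, k, heap):
    """Collect the k best candidates in `heap` as (-distance, point) pairs."""
    if node is None:
        return
    d = sq_dist(node["point"], query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, node["point"]))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, node["point"]))
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    _search(near, query, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th best distance (or fewer than k points were found so far).
    if len(heap) < k or diff * diff < -heap[0][0]:
        _search(far, query, k, heap)


def knn_kdtree(tree, query, k):
    """Return the k nearest points to `query`, closest first."""
    heap = []
    _search(tree, query, k, heap)
    return sorted((p for _, p in heap), key=lambda p: sq_dist(p, query))


def knn_brute(points, query, k):
    """Reference brute-force search: sort all points by distance."""
    return sorted(points, key=lambda p: sq_dist(p, query))[:k]
```

Both functions return the same neighbour distances; the k-d tree version avoids scanning every point by skipping subtrees that cannot contain a closer neighbour, which is where a speed-up of the kind the abstract reports would come from.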

    Tree models for difference and change detection in a complex environment

    A new family of tree models is proposed, which we call "differential trees." A differential tree model is constructed from multiple data sets and aims to detect distributional differences between them. The new methodology differs from the existing difference and change detection techniques in its nonparametric nature, model construction from multiple data sets, and applicability to high-dimensional data. Through a detailed study of an arson case in New Zealand, where an individual is known to have been laying vegetation fires within a certain time period, we illustrate how these models can help detect changes in the frequencies of event occurrences and uncover unusual clusters of events in a complex environment. Comment: Published in at http://dx.doi.org/10.1214/12-AOAS548 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    On Relationship between Primal-Dual Method of Multipliers and Kalman Filter

    Recently the primal-dual method of multipliers (PDMM), a novel distributed optimization method, was proposed for solving a general class of decomposable convex optimizations over graphical models. In this work, we first study the convergence properties of PDMM for decomposable quadratic optimizations over tree-structured graphs. We show that with proper parameter selection, PDMM converges to its optimal solution in a finite number of iterations. We then apply PDMM to the causal estimation problem over a statistical linear state-space model. We show that PDMM and the Kalman filter have the same update expressions, where PDMM can be interpreted as solving a sequence of quadratic optimizations over a growing chain graph. Comment: 11 page

    Mining Education Data to Predict Student's Retention: A comparative Study

    The main objective of higher education is to provide quality education to students. One way to achieve the highest level of quality in a higher education system is by discovering knowledge for prediction regarding the enrolment of students in a course. This paper presents a data mining project to generate predictive models for student retention management. Given new records of incoming students, these predictive models can produce short, accurate prediction lists identifying the students who tend to need the support of the student retention program most. This paper examines the quality of the predictive models generated by the machine learning algorithms. The results show that some of the machine learning algorithms are able to establish effective predictive models from the existing student retention data. Comment: 5 pages. arXiv admin note: substantial text overlap with arXiv:1202.481

    Forest Floor Visualizations of Random Forests

    We propose a novel methodology, forest floor, to visualize and interpret random forest (RF) models. RF is a popular and useful tool for non-linear multi-variate classification and regression, which yields a good trade-off between robustness (low variance) and adaptiveness (low bias). Direct interpretation of an RF model is difficult, as the explicit ensemble model of hundreds of deep trees is complex. Nonetheless, it is possible to visualize an RF model fit by its mapping from feature space to prediction space. The user is first presented with the overall geometrical shape of the model structure and, when needed, can zoom in on local details. Dimensional reduction by projection is used to visualize high-dimensional shapes. The traditional method to visualize RF model structure, partial dependence plots, achieves this by averaging multiple parallel projections. We suggest first using feature contributions, a method to decompose trees by splitting features, and then performing projections. The advantage of forest floor over partial dependence plots is that interactions are not masked by averaging. As a consequence, it is possible to locate interactions that are not visualized in a given projection. Furthermore, we introduce a goodness-of-visualization measure, the use of colour gradients to identify interactions, and an out-of-bag cross-validated variant of feature contributions. Comment: 25 pages, 12 figures, supplementary materials. v2->v3: minor proofing, moderated comments on ICE-plots, replaced \psi-operator with the subset named H in equation 13 and 14 to improve simplicit

    Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification

    Nowadays the amount of data stored in educational databases is increasing rapidly. These databases contain hidden information for improving students' performance. Educational data mining is used to study the data available in the educational field and bring out the hidden knowledge in it. Classification methods such as decision trees and Bayesian networks can be applied to educational data to predict students' performance in examinations. This prediction helps to identify weak students and help them score better marks. The C4.5, ID3 and CART decision tree algorithms are applied to engineering students' data to predict their performance in the final exam. The outcome of the decision tree predicted the number of students who are likely to pass, fail, or be promoted to the next year. The results provide steps to improve the performance of the students who were predicted to fail or be promoted. After the declaration of the results of the final examination, the marks obtained by the students are fed into the system and the results are analyzed for the next session. The comparative analysis of the results indicates that the prediction helped the weaker students to improve and led to better overall results. Comment: 6 pages, 3 Figures. arXiv admin note: substantial text overlap with arXiv:1202.481

    Transfer Learning, Soft Distance-Based Bias, and the Hierarchical BOA

    An automated technique has recently been proposed to transfer learning in the hierarchical Bayesian optimization algorithm (hBOA) based on distance-based statistics. The technique enables practitioners to improve hBOA efficiency by collecting statistics from probabilistic models obtained in previous hBOA runs and using the obtained statistics to bias future hBOA runs on similar problems. The purpose of this paper is threefold: (1) test the technique on several classes of NP-complete problems, including MAXSAT, spin glasses and minimum vertex cover; (2) demonstrate that the technique is effective even when previous runs were done on problems of different size; (3) provide empirical evidence that combining transfer learning with other efficiency enhancement techniques can often yield nearly multiplicative speedups. Comment: Accepted at Parallel Problem Solving from Nature (PPSN XII), 10 pages. arXiv admin note: substantial text overlap with arXiv:1201.224

    Steering plasmodium with light: Dynamical programming of Physarum machine

    A plasmodium of Physarum polycephalum is a very large cell visible to the unaided eye. The plasmodium is capable of distributed sensing, parallel information processing, and decentralized optimization. It is an ideal substrate for future and emerging bio-computing devices. We study the space-time dynamics of the plasmodium's reaction to localised illumination, and provide analogies between propagating plasmodium and travelling wave-fragments in excitable media. We show how plasmodium-based computing devices can be precisely controlled and shaped by planar domains of illumination. Comment: Accepted for publication in New Mathematics and Natural Computation Journal (April, 2009