
    Using machine learning techniques to automate sky survey catalog generation

    We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10^7 galaxies and 10^8 stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results which indicate that our approach is well-suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems given automatically cataloged data.
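The abstract above says decision trees are learned from measured object features. As a rough illustration only, the sketch below picks one binary split by information gain in the style of ID3. It is not GID3* or O-BTree, and the feature names (`ellipticity`, `area`) and all values are made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, feature, threshold):
    """Entropy reduction from splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_split(rows, labels):
    """Return (gain, feature, threshold) for the best single binary split."""
    best = (0.0, None, None)
    for feature in rows[0]:
        values = sorted({x[feature] for x in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            gain = information_gain(rows, labels, feature, threshold)
            if gain > best[0]:
                best = (gain, feature, threshold)
    return best


# Hypothetical measurements: feature names and values are invented.
rows = [
    {"ellipticity": 0.05, "area": 12.0},
    {"ellipticity": 0.08, "area": 30.0},
    {"ellipticity": 0.40, "area": 14.0},
    {"ellipticity": 0.55, "area": 33.0},
]
labels = ["star", "star", "galaxy", "galaxy"]
gain, feature, threshold = best_split(rows, labels)
```

On this toy data the chosen split is on `ellipticity`, since a threshold between 0.08 and 0.40 separates the two classes completely. A full tree learner would apply the same step recursively to each side of the split.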

    RESEARCH ISSUES CONCERNING ALGORITHMS USED FOR OPTIMIZING THE DATA MINING PROCESS

    In this paper, we describe some of the most widely used data mining algorithms, which are highly useful and influential in the research community. A data mining algorithm can be regarded as a tool that creates a data mining model. After analyzing a set of data, an algorithm searches for specific trends and patterns, then defines the parameters of the mining model based on the results of this analysis. These parameters play a significant role in identifying and extracting actionable patterns and detailed statistics. The most important algorithms within this research cover topics such as clustering, classification, association analysis, statistical learning, and link mining. After a brief description of each algorithm, we analyze its application potential and the research issues concerning the optimization of the data mining process. We then describe the most important data mining algorithms included in Microsoft and Oracle software products, offer suggestions and criteria for choosing the most suitable algorithm for a given task, and outline the advantages offered by these software products.
    Keywords: data mining optimization, data mining algorithms, software solutions

    Cost-Sensitive Decision Tree with Multiple Resource Constraints

    Resource constraints are commonly found in classification tasks. For example, there could be a budget limit on implementation and a deadline for finishing the classification task. Applying the top-down approach for tree induction in this situation may have significant drawbacks. In particular, it is difficult, especially in an early stage of tree induction, to assess an attribute’s contribution to improving the total implementation cost and its impact on attribute selection in later stages because of the deadline constraint. To address this problem, we propose an innovative algorithm, namely, the Cost-Sensitive Associative Tree (CAT) algorithm. Essentially, the algorithm first extracts and retains association classification rules from the training data which satisfy resource constraints, and then uses the rules to construct the final decision tree. The approach has advantages over the traditional top-down approach, first because only feasible classification rules are considered in the tree induction and, second, because their costs and resource use are known. In contrast, in the top-down approach, this information is not available when selecting splitting attributes. The experimental results show that the CAT algorithm significantly outperforms the top-down approach and adapts very well to available resources.
    Keywords: cost-sensitive learning, mining methods and algorithms, decision trees
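The first stage described above, retaining only rules that satisfy the resource constraints, can be sketched as a simple filter. This is an illustration under stated assumptions, not the CAT algorithm itself: the `Rule` type, the attribute names, the per-attribute costs and times, and the choice to sum test times sequentially are all hypothetical, and the tree-construction stage is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    conditions: tuple  # ((attribute, value), ...)
    label: str
    confidence: float


def resource_use(rule, cost, time):
    """Total test cost and time needed to evaluate all attributes in a rule."""
    attrs = {attribute for attribute, _ in rule.conditions}
    return sum(cost[a] for a in attrs), sum(time[a] for a in attrs)


def feasible_rules(rules, cost, time, budget, deadline):
    """Keep rules whose attribute tests fit within both resource limits."""
    kept = []
    for rule in rules:
        total_cost, total_time = resource_use(rule, cost, time)
        if total_cost <= budget and total_time <= deadline:
            kept.append(rule)
    # Most confident rules first.
    return sorted(kept, key=lambda r: r.confidence, reverse=True)


# Hypothetical attribute costs (currency units) and times (hours).
cost = {"questionnaire": 5, "blood_test": 50, "x_ray": 120}
time = {"questionnaire": 0.5, "blood_test": 2.0, "x_ray": 1.0}

rules = [
    Rule((("questionnaire", "high"),), "positive", 0.70),
    Rule((("blood_test", "abnormal"), ("questionnaire", "high")), "positive", 0.85),
    Rule((("x_ray", "abnormal"), ("blood_test", "abnormal")), "positive", 0.95),
]

selected = feasible_rules(rules, cost, time, budget=100, deadline=3.0)
```

With a budget of 100 and a deadline of 3 hours, the rule that needs the x-ray is dropped on cost even though it has the highest confidence. Because every retained rule has a known cost and time, a later tree-building stage never has to guess whether a path will stay within the limits.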

    Computational Complexity for Physicists

    These lecture notes are an informal introduction to the theory of computational complexity and its links to quantum computing and statistical mechanics. Comment: references updated, reprint available from http://itp.nat.uni-magdeburg.de/~mertens/papers/complexity.shtm

    Merged Tree-CAT: A fast method for building precise computerized adaptive tests based on decision trees

    Over the last few years, there has been an increasing interest in the creation of Computerized Adaptive Tests (CATs) based on Decision Trees (DTs). Among the available methods, the Tree-CAT method has been able to demonstrate a mathematical equivalence between both techniques. However, this method has the inconvenience of requiring a high performance cluster while taking a few days to perform its computations. This article presents the Merged Tree-CAT method, which extends the Tree-CAT technique, to create CATs based on DTs in just a few seconds on a personal computer. In order to do so, the Merged Tree-CAT method controls the growth of the tree by merging those branches in which both the distribution and the estimation of the latent level are similar. The performed experiments show that the proposed method obtains estimations of the latent level which are comparable to those obtained by the state-of-the-art techniques, while drastically reducing the computational time.
    Numerical experiments were conducted in Uranus, a supercomputer cluster located at Universidad Carlos III de Madrid and jointly funded by EU-FEDER funds and by the Spanish Government via the National Projects nos. UNC313-4E-2361, ENE2009-12213-C03-03, ENE2012-33219, ENE2012-31753 and ENE2015-68265-P. This article was also funded by the Spanish National Project no. RTI2018-101857-B-I00.

    Combining decision trees and stochastic curtailment for assessment length reduction of test batteries used for classification.

    For classification problems in psychology (e.g., clinical diagnosis), batteries of tests are often administered. However, not every test or item may be necessary for accurate classification. In the current article, a combination of classification and regression trees (CART) and stochastic curtailment (SC) is introduced to reduce assessment length of questionnaire batteries. First, the CART algorithm provides relevant subscales and cutoffs needed for accurate classification, in the form of a decision tree. Second, for every subscale and cutoff appearing in the decision tree, SC reduces the number of items needed for accurate classification. This procedure is illustrated by post hoc simulation on a data set of 3,579 patients, to whom the Mood and Anxiety Symptoms Questionnaire (MASQ) was administered. Subscales of the MASQ are used for predicting diagnoses of depression. Results show that CART-SC provided an assessment length reduction of 56%, without loss of accuracy, compared with the more traditional prediction method of performing linear discriminant analysis on subscale scores. CART-SC appears to be an efficient and accurate algorithm for shortening test batteries. © The Author(s) 2013
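The item-reduction idea above can be illustrated with the simpler deterministic form of curtailment: testing on a subscale stops as soon as the remaining items can no longer change which side of the cutoff the total falls on. Stochastic curtailment, as used in the article, instead stops when a change of outcome is merely improbable, which requires a probability model that is not sketched here. The function name, the 0 to 4 item scoring, and the cutoffs below are assumptions for illustration.

```python
def curtailed_classification(responses, cutoff, max_item_score):
    """Administer items in order; stop once 'total >= cutoff' is decided.

    Assumes item scores range from 0 to max_item_score.
    Returns (at_or_above_cutoff, items_used).
    """
    total = 0
    n_items = len(responses)
    for used, score in enumerate(responses, start=1):
        total += score
        remaining = n_items - used
        if total >= cutoff:
            # Scores are non-negative, so the total cannot drop below the cutoff.
            return True, used
        if total + remaining * max_item_score < cutoff:
            # Even maximum scores on all remaining items cannot reach the cutoff.
            return False, used
    return total >= cutoff, n_items


# Hypothetical responses to an 8-item subscale scored 0 to 4.
high_scorer = curtailed_classification([4, 4, 4, 1, 0, 2, 3, 1], cutoff=12, max_item_score=4)
low_scorer = curtailed_classification([0, 0, 1, 0, 0, 0, 0, 0], cutoff=20, max_item_score=4)
```

In the first call the cutoff is reached after three items; in the second, the cutoff becomes unreachable after four items. In both cases the remaining items are skipped without changing the classification that the full subscale would have produced.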