    Data exploration systems for databases

    Data exploration systems apply machine learning techniques, multivariate statistical methods, information theory, and database theory to databases to identify significant relationships among the data and summarize information. The result of applying data exploration systems should be a better understanding of the structure of the data and a perspective on the data that enables an analyst to form hypotheses for interpreting the data. This paper argues that data exploration systems need a minimum amount of domain knowledge to guide both the statistical strategy and the interpretation of the resulting patterns discovered by these systems.

    Infinite games with finite knowledge gaps

    Infinite games where several players seek to coordinate under imperfect information are deemed to be undecidable, unless the information is hierarchically ordered among the players. We identify a class of games for which joint winning strategies can be constructed effectively without restricting the direction of information flow. Instead, our condition requires that the players attain common knowledge about the actual state of the game over and over again along every play. We show that it is decidable whether a given game satisfies the condition, and prove tight complexity bounds for the strategy synthesis problem under ω-regular winning conditions given by parity automata. Comment: 39 pages; 2nd revision; submitted to Information and Computatio

    Introduction to IND and recursive partitioning, version 1.0

    This manual describes the IND package for learning tree classifiers from data. The package is an integrated C and C shell re-implementation of tree learning routines such as CART, C4, and various MDL and Bayesian variations. The package includes routines for experiment control, interactive operation, and analysis of tree building. The manual introduces the system and its many options, gives a basic review of tree learning, contains a guide to the literature and a glossary, lists the manual pages for the routines, and gives instructions on installation.

    Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011

    DATA MINING: A SEGMENTATION ANALYSIS OF U.S. GROCERY SHOPPERS

    Consumers make choices about where to shop based on their preferences for a shopping environment and experience as well as the selection of products at a particular store. This study illustrates how retail firms and marketing analysts can utilize data mining techniques to better understand customer profiles and behavior. Among the key areas where data mining can produce new knowledge is the segmentation of customer databases according to demographics, buying patterns, geographics, attitudes, and other variables. This paper builds profiles of grocery shoppers based on their preferences for 33 retail grocery store characteristics. The data are from a representative, nationwide sample of 900 supermarket shoppers collected in 1999. Six customer profiles are found to exist, including (1) "Time Pressed Meat Eaters", (2) "Back to Nature Shoppers", (3) "Discriminating Leisure Shoppers", (4) "No Nonsense Shoppers", (5) "The One Stop Socialites", and (6) "Middle of the Road Shoppers". Each of the customer profiles is described with respect to the underlying demographics and income. Consumer shopping segments cut across most demographic groups but are somewhat correlated with income. Hierarchical lists of preferences reveal that low price is not among the top five most important store characteristics. Experience with and preferences for internet shopping show that of the 44% who have access to the internet, only 3% had used it to order food. Consumer/Household Economics, Food Consumption/Nutrition/Food Safety

    Introduction in IND and recursive partitioning

    This manual describes the IND package for learning tree classifiers from data. The package is an integrated C and C shell re-implementation of tree learning routines such as CART, C4, and various MDL and Bayesian variations. The package includes routines for experiment control, interactive operation, and analysis of tree building. The manual introduces the system and its many options, gives a basic review of tree learning, contains a guide to the literature and a glossary, and lists the manual pages for the routines along with instructions on installation.

    Small-scale intraspecific life history variation in herbivorous spider mites (Tetranychus pacificus) is associated with host plant cultivar.

    Life history variation is a general feature of arthropod systems, but is rarely included in models of field or laboratory data. Most studies assume that local processes occur identically across individuals, ignoring any genetic or phenotypic variation in life history traits. In this study, we tested whether field populations of Pacific spider mites (Tetranychus pacificus) on grapevines (Vitis vinifera) display significant intraspecific life history variation associated with host plant cultivar. To address this question we collected individuals from sympatric vineyard populations where either Zinfandel or Chardonnay was grown. We then conducted a "common garden experiment" of mites on bean plants (Phaseolus lunatus) in the laboratory. Assay populations were sampled non-destructively with digital photography to quantify development times, survival, and reproductive rates. Two classes of models were fit to the data: standard generalized linear mixed models (GLMMs) and a time-to-event model, common in survival analysis, that allowed for interval-censored data and hierarchical random effects. We found a significant effect of cultivar on development time in both GLMM and time-to-event analyses, a slight cultivar effect on juvenile survival, and no effect on reproductive rate. There were shorter development times and a trend towards higher juvenile survival in populations from Zinfandel vineyards compared to those from Chardonnay vineyards. Lines of the same species, originating from field populations on different host plant cultivars, expressed different development times and slightly different survival rates when reared on a common host plant in a common environment.

    A Preliminary Investigation Of Decision Tree Models For Classification Accuracy Rates And Extracting Interpretable Rules In The Credit Scoring Task: A Case Of The German Data Set

    For many years lenders have been using traditional statistical techniques such as logistic regression and discriminant analysis to more precisely distinguish between creditworthy customers who are granted loans and non-creditworthy customers who are denied loans. More recently, new machine learning techniques such as neural networks, decision trees, and support vector machines have been successfully employed to classify loan applicants into those who are likely to pay off a loan and those who are likely to default on it. Accurate classification is beneficial to lenders in terms of increased financial profits or reduced losses and to loan applicants who can avoid overcommitment. This paper examines a historical data set from consumer loans issued by a German bank to individuals whom the bank considered to be qualified customers. The data set consists of the financial attributes of each customer and includes a mixture of loans that the customers paid off or defaulted upon. The paper examines and compares the classification accuracy rates of three decision tree techniques and analyzes their ability to generate easy-to-understand rules.