Data exploration systems for databases

Author: Greene Richard J.
Hield Christopher

Data exploration systems apply machine learning techniques, multivariate statistical methods, information theory, and database theory to databases to identify significant relationships among the data and summarize information. The result of applying data exploration systems should be a better understanding of the structure of the data and a perspective of the data enabling an analyst to form hypotheses for interpreting the data. This paper argues that data exploration systems need a minimum amount of domain knowledge to guide both the statistical strategy and the interpretation of the resulting patterns discovered by these systems.

NASA Technical Reports Server

Infinite games with finite knowledge gaps

Author: Berwanger Dietmar
Mathew Anup Basil
Publication date: 28/07/2015

Infinite games where several players seek to coordinate under imperfect information are deemed to be undecidable, unless the information is hierarchically ordered among the players. We identify a class of games for which joint winning strategies can be constructed effectively without restricting the direction of information flow. Instead, our condition requires that the players attain common knowledge about the actual state of the game over and over again along every play. We show that it is decidable whether a given game satisfies the condition, and prove tight complexity bounds for the strategy synthesis problem under $\omega$-regular winning conditions given by parity automata. Comment: 39 pages; 2nd revision; submitted to Information and Computation

arXiv.org e-Print Archive

CiteSeerX

Introduction to IND and recursive partitioning, version 1.0

Author: Buntine Wray
Caruana Rich

This manual describes the IND package for learning tree classifiers from data. The package is an integrated C and C shell re-implementation of tree learning routines such as CART, C4, and various MDL and Bayesian variations. The package includes routines for experiment control, interactive operation, and analysis of tree building. The manual introduces the system and its many options, gives a basic review of tree learning, contains a guide to the literature and a glossary, lists the manual pages for the routines, and gives instructions on installation.

NASA Technical Reports Server

Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011

Publication venue: 'Sabanci University Information Center'
Publication date: 01/01/2011

Sabanci University Research Database

DATA MINING: A SEGMENTATION ANALYSIS OF U.S. GROCERY SHOPPERS

Author: Katsaras Nikolaos
Kinsey Jean D.
Senauer Benjamin
Wolfson Paul J.

Consumers make choices about where to shop based on their preferences for a shopping environment and experience as well as the selection of products at a particular store. This study illustrates how retail firms and marketing analysts can utilize data mining techniques to better understand customer profiles and behavior. Among the key areas where data mining can produce new knowledge is the segmentation of customer databases according to demographics, buying patterns, geographics, attitudes, and other variables. This paper builds profiles of grocery shoppers based on their preferences for 33 retail grocery store characteristics. The data are from a representative, nationwide sample of 900 supermarket shoppers collected in 1999. Six customer profiles are found to exist, including (1) "Time Pressed Meat Eaters", (2) "Back to Nature Shoppers", (3) "Discriminating Leisure Shoppers", (4) "No Nonsense Shoppers", (5) "The One Stop Socialites", and (6) "Middle of the Road Shoppers". Each of the customer profiles is described with respect to the underlying demographics and income. Consumer shopping segments cut across most demographic groups but are somewhat correlated with income. Hierarchical lists of preferences reveal that low price is not among the top five most important store characteristics. Experience and preferences for internet shopping show that of the 44% who have access to the internet, only 3% had used it to order food. Consumer/Household Economics, Food Consumption/Nutrition/Food Safety,

Research Papers in Economics

Introduction in IND and recursive partitioning

Author: Buntine Wray
Caruana Rich

This manual describes the IND package for learning tree classifiers from data. The package is an integrated C and C shell re-implementation of tree learning routines such as CART, C4, and various MDL and Bayesian variations. The package includes routines for experiment control, interactive operation, and analysis of tree building. The manual introduces the system and its many options, gives a basic review of tree learning, contains a guide to the literature and a glossary, and lists the manual pages for the routines and instructions on installation.

NASA Technical Reports Server

Small-scale intraspecific life history variation in herbivorous spider mites (Tetranychus pacificus) is associated with host plant cultivar.

Author: de Valpine Perry
Mills Nicholas J
Scranton Katherine
Stavrinides Menelaos
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

Life history variation is a general feature of arthropod systems, but is rarely included in models of field or laboratory data. Most studies assume that local processes occur identically across individuals, ignoring any genetic or phenotypic variation in life history traits. In this study, we tested whether field populations of Pacific spider mites (Tetranychus pacificus) on grapevines (Vitis vinifera) display significant intraspecific life history variation associated with host plant cultivar. To address this question we collected individuals from sympatric vineyard populations where either Zinfandel or Chardonnay was grown. We then conducted a "common garden experiment" of mites on bean plants (Phaseolus lunatus) in the laboratory. Assay populations were sampled non-destructively with digital photography to quantify development times, survival, and reproductive rates. Two classes of models were fit to the data: standard generalized linear mixed models and a time-to-event model, common in survival analysis, that allowed for interval-censored data and hierarchical random effects. We found a significant effect of cultivar on development time in both GLMM and time-to-event analyses, a slight cultivar effect on juvenile survival, and no effect on reproductive rate. There were shorter development times and a trend towards higher juvenile survival in populations from Zinfandel vineyards compared to those from Chardonnay vineyards. Lines of the same species, originating from field populations on different host plant cultivars, expressed different development times and slightly different survival rates when reared on a common host plant in a common environment.

Directory of Open Access Journals

Ktisis

PubMed Central

eScholarship - University of California

A Preliminary Investigation Of Decision Tree Models For Classification Accuracy Rates And Extracting Interpretable Rules In The Credit Scoring Task: A Case Of The German Data Set

Author: Lam Peng C.
Zurada Jozef
Publication venue: 'Clute Institute'
Publication date: 01/07/2008

For many years lenders have been using traditional statistical techniques such as logistic regression and discriminant analysis to more precisely distinguish between creditworthy customers who are granted loans and non-creditworthy customers who are denied loans. More recently new machine learning techniques such as neural networks, decision trees, and support vector machines have been successfully employed to classify loan applicants into those who are likely to pay a loan off or default upon a loan. Accurate classification is beneficial to lenders in terms of increased financial profits or reduced losses and to loan applicants who can avoid overcommitment. This paper examines a historical data set from consumer loans issued by a German bank to individuals whom the bank considered to be qualified customers. The data set consists of the financial attributes of each customer and includes a mixture of loans that the customers paid off or defaulted upon. The paper examines and compares the classification accuracy rates of three decision tree techniques and analyzes their ability to generate easy-to-understand rules.

Clute Institute: Journals