11 research outputs found

    COMPUTER PROGRAMMING AND AGRICULTURAL ECONOMISTS - BRIDGING THE GAP

    Research and Development/Tech Change/Emerging Technologies

    Data Quality in Very Large, Multiple-Source, Secondary Datasets for Data Mining Applications

    The data mining research community is increasingly addressing data quality issues, including the problem of dirty data. Hand, Blunt, Kelly and Adams (2000) identified high-level and low-level quality issues in data mining. Kim, Choi, Hong, Kim and Lee (2003) compiled a useful, comprehensive taxonomy of dirty data that provides a starting point for research into effective techniques and fast algorithms for preprocessing data, and into ways of approaching the problems of dirty data. In this study we create a classification scheme for data errors by adapting that general taxonomy to very large, multiple-source, secondary datasets. Such datasets are increasingly being compiled by organizations for use in their data mining applications. We contribute this classification scheme to the body of research addressing quality issues in the very large, multiple-source, secondary datasets that today's global organizations are building through massive data collection from the Internet.
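    A classification scheme of this kind can be applied mechanically when screening incoming records. The sketch below is only illustrative: the error categories (missing, wrong type, out of range, duplicate) and the schema format are assumptions for the example, not the taxonomy from Kim et al. (2003).

```python
def classify_errors(record, schema, seen):
    """Assign a record's fields to hypothetical dirty-data categories.
    schema maps field name -> (expected type, min, max); `seen` is a
    set of previously observed records used to flag duplicates."""
    errors = []
    for field, (ftype, lo, hi) in schema.items():
        value = record.get(field)
        if value is None:
            errors.append((field, "missing"))
        elif not isinstance(value, ftype):
            errors.append((field, "wrong type"))
        elif not (lo <= value <= hi):
            errors.append((field, "out of range"))
    key = tuple(sorted(record.items()))
    if key in seen:
        errors.append(("_record", "duplicate"))
    seen.add(key)
    return errors

schema = {"age": (int, 0, 120)}
seen = set()
print(classify_errors({"age": 150}, schema, seen))  # -> [('age', 'out of range')]
```

    In a multiple-source setting, a checker like this would run per source before records are merged, so that each error can be traced back to its origin.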

    Oklahoma Agricultural Experiment Station, Bulletin no. 777, October 1985: Efficiencies in central coordination of fluid milk supplies through cooperative mergers - A case study of the Southern Region of Associated Milk Producers, Inc.

    The Oklahoma Agricultural Experiment Station periodically issues revisions to its publications, and the most current edition is made available here. For access to an earlier edition, if available for this title, please contact the Oklahoma State University Library Archives by email at [email protected] or by phone at 405-744-6311.

    A Decision Support Software on Bidding for Job Interviews in College Placement Offices

    Many university placement offices employ a bidding system to allocate on-campus recruiter interview slots to students. Typically, a student is given (say) 700 points each week to bid on the firms visiting that week. Interview slots for each firm are assigned beginning with the highest bidder until all slots are filled. This paper describes the mathematical modeling behind a decision support system that helps students bid in such a system. The system has three components. The first elicits a student's utilities for getting an interview with the various firms. The second estimates the probability of getting an interview with a particular firm for a given bid amount. The third treats the bidding problem as the maximization of the student's expected utility, which can be formulated as a nonlinear integer programming (IP) problem. It is shown that this IP problem can be transformed into a number of nonlinear programming problems without integer requirements, which can then be solved very rapidly to give on-line bidding recommendations to a large number of students.
    Keywords: decision support system, bidding, mixed integer programming
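    The structure of the problem can be sketched in a few lines. The abstract's method transforms the nonlinear IP into continuous nonlinear programs; the greedy heuristic below is NOT that method, only a minimal illustration of the objective (expected utility = utility × win probability, subject to the 700-point budget). The logistic win-probability curve, its `cutoff` and `scale` parameters, and the firm utilities are all assumed for the example.

```python
import math

def win_prob(bid, cutoff, scale=25.0):
    """Hypothetical logistic estimate of winning an interview slot;
    `cutoff` stands in for the historical clearing bid for the firm."""
    return 1.0 / (1.0 + math.exp(-(bid - cutoff) / scale))

def allocate_bids(utilities, cutoffs, budget=700, step=10):
    """Greedy sketch: repeatedly give `step` points to the firm with
    the largest marginal gain in expected utility, until the weekly
    budget is spent or no bid increase helps."""
    bids = [0] * len(utilities)
    points_left = budget
    while points_left >= step:
        gains = [
            u * (win_prob(b + step, c) - win_prob(b, c))
            for u, b, c in zip(utilities, bids, cutoffs)
        ]
        best = max(range(len(gains)), key=gains.__getitem__)
        if gains[best] <= 0:
            break
        bids[best] += step
        points_left -= step
    return bids

# Three firms: utilities reflect the student's preferences,
# cutoffs the assumed competitiveness of each firm's slots.
print(allocate_bids(utilities=[10, 8, 3], cutoffs=[300, 200, 100]))
```

    A greedy allocation like this can get stuck at a local optimum, which is precisely why the paper reformulates the problem for exact solution.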

    A Scalable Classification Algorithm for Very Large Datasets

    Today's organisations are collecting and storing massive amounts of data from their customer transactions and e-commerce/e-business applications. Many classification algorithms do not scale to work effectively and efficiently with these very large datasets. This study constructs a new scalable classification algorithm (referred to in this manuscript as the Iterative Refinement Algorithm, or IRA for short) that builds domain knowledge from very large datasets using an iterative inductive learning mechanism. Unlike existing algorithms that build the complete domain knowledge from a dataset all at once, IRA builds the initial domain knowledge from a subset of the available data and then iteratively improves, sharpens and polishes it using chunks from the remaining data. Performance testing of IRA on two datasets (one with approximately five million records for a binary classification problem and another with approximately 600 K records for a seven-class classification problem) resulted in more accurate domain knowledge than other prediction methods, including logistic regression, discriminant analysis, neural networks, C5, CART and CHAID. Unlike other classification algorithms, whose performance and accuracy deteriorate as data size increases, the efficacy of IRA improves as datasets become significantly larger.
    Keywords: massive datasets, data mining, rule induction, classification, knowledge bases, refinement techniques
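    The chunk-wise learning pattern the abstract describes, building an initial model from a subset and then refining it with the remaining data one chunk at a time, can be sketched with a toy incremental learner. This is not IRA itself (the paper's rule-induction details are not given in the abstract); it is only a minimal illustration of refining a model incrementally instead of training on all records at once.

```python
from collections import Counter, defaultdict

class ChunkedRuleLearner:
    """Toy sketch of chunk-wise incremental learning: the "domain
    knowledge" is a table of per-feature-tuple class counts, refined
    one chunk at a time rather than built from the full dataset."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def refine(self, chunk):
        """Update the model with one chunk of (features, label) pairs."""
        for features, label in chunk:
            self.counts[features][label] += 1

    def predict(self, features, default=None):
        """Predict the most frequent class seen for this feature tuple."""
        c = self.counts.get(features)
        return c.most_common(1)[0][0] if c else default

data = [((1, 0), "buy"), ((1, 0), "buy"), ((0, 1), "skip")]
learner = ChunkedRuleLearner()
# Initial subset first, then the remaining chunk -- mirroring the
# build-then-refine loop described in the abstract.
for chunk in (data[:2], data[2:]):
    learner.refine(chunk)
print(learner.predict((1, 0)))  # -> buy
```

    Because each chunk only adds counts, memory and per-pass cost stay bounded by the chunk size, which is the property that lets this style of learner scale to datasets far larger than main memory.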