    Study of LANDSAT-D thematic mapper performance as applied to hydrocarbon exploration

    Analysis of TM data yielded improved delineation of known oil and gas fields in southern Ontario and an exceptionally large amount of structural information on the Owl Creek, Wyoming scene. The use of hue, saturation, and value image processing techniques on a Death Valley, California scene permitted direct comparison of TM processed imagery with existing 1:250,000 scale geological maps of the area and revealed small outcrops of Tertiary volcanic material overlying Paleozoic sections. Analysis of TM data over Lawton, Oklahoma suggests that the reducing chemical environment associated with hydrocarbon seepage changes ferric iron to soluble ferrous iron, allowing it to be leached. Results of the band selection algorithm show a surprising consistency, with the 1,4,5 combination selected as optimal in most cases.

    CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks

    Data quality affects machine learning (ML) model performance, and data scientists spend a considerable amount of time on data cleaning before model training. However, to date, there is no rigorous study of how exactly cleaning affects ML: the ML community usually focuses on developing algorithms that are robust to particular noise types of certain distributions, while the database (DB) community has mostly studied data cleaning in isolation, without considering how the data is consumed by downstream ML analytics. We propose a CleanML study that systematically investigates the impact of data cleaning on ML classification tasks. The open-source and extensible CleanML study currently includes 14 real-world datasets with real errors, five common error types, seven different ML models, and multiple cleaning algorithms for each error type (including both algorithms commonly used in practice and state-of-the-art solutions from the academic literature). We control the randomness in ML experiments using statistical hypothesis testing, and we control the false discovery rate in our experiments using the Benjamini-Yekutieli (BY) procedure. We analyze the results systematically to derive many interesting and nontrivial observations. We also put forward multiple research directions for researchers. Comment: published in ICDE 202
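The Benjamini-Yekutieli step mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the CleanML code base: the function name `benjamini_yekutieli` and the example p-values are assumptions made for illustration. The procedure sorts the p-values, compares the k-th smallest against k * alpha / (m * c(m)) with c(m) = 1 + 1/2 + ... + 1/m, and rejects every hypothesis up to the largest rank that passes.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Illustrative implementation of the BY step-up procedure; the name and
    interface are hypothetical, not from the CleanML study.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction factor c(m) that makes the procedure valid
    # under arbitrary dependence between the tests.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is below its threshold.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Made-up p-values, purely for demonstration.
    print(benjamini_yekutieli([0.001, 0.5, 0.01, 0.04], alpha=0.05))
```

Because of the c(m) factor, BY is more conservative than the plain Benjamini-Hochberg procedure, which is the price paid for not assuming independence among the tests.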

    Applying Thompson Sampling to Online Hypothesis Testing

    Online hypothesis testing occurs in many branches of science. It is most useful when there are too many hypotheses to test with traditional multiple hypothesis testing or when the hypotheses are created one by one. When hypotheses are tested one by one, the order in which they are tested often has a great influence on the power of the procedure. In this thesis we investigate the applicability of reinforcement learning tools to the exploration-exploitation problem that often arises in online hypothesis testing. We show that a common reinforcement learning tool, Thompson sampling, can be used to gain a modest amount of power with an online hypothesis testing method called alpha-investing. Finally, we examine the size of this effect using both synthetic data and a practical case involving simulated data on urban pollution. We found that choosing the order of the tested hypotheses with Thompson sampling improves the power of alpha-investing. The level of improvement depends on the assumptions that the experimenter is willing to make and on their validity. In a practical situation, the presented procedure rejected up to 6.8 percentage points more hypotheses than testing the hypotheses in a random order.
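The interaction between alpha-investing and Thompson sampling described in this abstract can be sketched roughly as follows. This is not the thesis's algorithm, whose details are not given here. The sketch assumes hypotheses are grouped into "arms", that each arm carries a Beta(1, 1) prior on its rejection rate, and that half of the current alpha-wealth is risked on every test; all names (`alpha_investing_step`, `thompson_alpha_investing`, `spend_fraction`, `payout`) are hypothetical. The wealth update follows the commonly cited form of alpha-investing: a rejection earns a fixed payout, a non-rejection costs alpha_j / (1 - alpha_j).

```python
import random


def alpha_investing_step(wealth, p_value, spend_fraction=0.5, payout=0.05):
    """Run one alpha-investing test and return (rejected, new_wealth)."""
    # Pick alpha_j so that the cost of a non-rejection, alpha_j / (1 - alpha_j),
    # equals spend_fraction * wealth and therefore never exceeds the wealth.
    cost = spend_fraction * wealth
    alpha_j = cost / (1.0 + cost)
    if p_value <= alpha_j:
        return True, wealth + payout
    return False, wealth - cost


def thompson_alpha_investing(arms, initial_wealth=0.05, payout=0.05, seed=0):
    """Order tests with Thompson sampling (assumed arm structure).

    arms: dict mapping an arm name to a list of p-values, tested in list order.
    Returns (list of (arm, p_value) rejections, remaining wealth).
    """
    rng = random.Random(seed)
    queues = {name: list(ps) for name, ps in arms.items()}
    # Beta posterior parameters per arm: [rejections + 1, non-rejections + 1].
    posterior = {name: [1, 1] for name in arms}
    wealth = initial_wealth
    rejections = []
    while wealth > 1e-12 and any(queues.values()):
        candidates = [name for name, queue in queues.items() if queue]
        # Sample a plausible rejection rate per arm and test the best-looking one.
        chosen = max(candidates, key=lambda n: rng.betavariate(*posterior[n]))
        p_value = queues[chosen].pop(0)
        rejected, wealth = alpha_investing_step(wealth, p_value, payout=payout)
        posterior[chosen][0 if rejected else 1] += 1
        if rejected:
            rejections.append((chosen, p_value))
    return rejections, wealth


if __name__ == "__main__":
    # Made-up p-values: arm "a" holds strong signals, arm "b" holds nulls.
    demo_arms = {"a": [0.0001, 0.0002], "b": [0.9, 0.8]}
    print(thompson_alpha_investing(demo_arms))
```

The intuition is that arms whose earlier tests were rejected are sampled more often, so wealth tends to be spent where rejections, and hence payouts, are more likely.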

    Implications of information from LANDSAT-4 for private industry

    The broader spectral coverage and higher resolution of LANDSAT-4 Thematic Mapper (TM) data open the door to the identification from space of spectral phenomena associated with mineralization and microseepage of hydrocarbons. Digitally enhanced image products generated from TM data allow the mapping of many major and minor structural features that mark or influence the emplacement of mineralization and the accumulation of hydrocarbons. These improvements in capability over multispectral scanner data should accelerate the acceptance and integration of satellite data as a routine exploration tool that allows rapid examination of large areas in considerable detail. Imagery of southern Ontario, Canada, as well as of Cement, Oklahoma and Death Valley, California, is discussed.