17,629 research outputs found

    Type Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing

    Randomization does not help much, comparability does

    Following Fisher, it is widely believed that randomization "relieves the experimenter from the anxiety of considering innumerable causes by which the data may be disturbed." In particular, it is said to control for known and unknown nuisance factors that may considerably challenge the validity of a result. Looking for quantitative advice, we study a number of straightforward, mathematically simple models. However, they all demonstrate that the optimism with respect to randomization is wishful thinking rather than based on fact. In small to medium-sized samples, random allocation of units to treatments typically yields a considerable imbalance between the groups, i.e., confounding due to randomization is the rule rather than the exception. In the second part of this contribution, we extend the reasoning to a number of traditional arguments for and against randomization. This discussion is rather non-technical, and at times even "foundational" (Frequentist vs. Bayesian). However, its result turns out to be quite similar. While randomization's contribution remains questionable, comparability contributes much to a compelling conclusion. Summing up, classical experimentation based on sound background theory and the systematic construction of exchangeable groups seems to be advisable.
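The claim that random allocation often leaves small groups imbalanced can be illustrated with a short simulation. This is a minimal sketch and not one of the models from the paper: the binary nuisance factor, its prevalence of 0.5, the 0.2 imbalance threshold, and the function name `imbalance_rate` are all assumptions chosen for illustration.

```python
import random


def imbalance_rate(n_per_group, prevalence=0.5, threshold=0.2, trials=2000, seed=0):
    """Estimate how often random allocation leaves two groups imbalanced.

    Each unit carries a hypothetical binary nuisance factor. A trial counts as
    imbalanced when the proportion of units with the factor differs between
    the two groups by at least `threshold`.
    """
    rng = random.Random(seed)
    imbalanced = 0
    for _ in range(trials):
        # Draw the nuisance factor for all units, then allocate them at random.
        units = [rng.random() < prevalence for _ in range(2 * n_per_group)]
        rng.shuffle(units)
        treated = units[:n_per_group]
        control = units[n_per_group:]
        # Compare counts so the threshold test avoids a float division.
        if abs(sum(treated) - sum(control)) >= threshold * n_per_group:
            imbalanced += 1
    return imbalanced / trials


if __name__ == "__main__":
    for n in (10, 50, 500):
        print(n, imbalance_rate(n, trials=500))
```

Under these assumptions the estimated rate is large for 10 units per group and close to zero for 500 units per group, which matches the abstract's point about small to medium-sized samples.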

    Identifying hidden contexts

    In this study we investigate how to identify hidden contexts from the data in classification tasks. Contexts are artifacts in the data that do not predict the class label directly. For instance, in a speech recognition task, speakers might have different accents, which do not directly discriminate between the spoken words. Identifying hidden contexts is treated as a data preprocessing task that can help build more accurate classifiers tailored to particular contexts and give insight into the data structure. We present three techniques to identify hidden contexts, which hide class label information from the input data and partition it using clustering techniques. We form a collection of performance measures to ensure that the resulting contexts are valid. We evaluate the performance of the proposed techniques on thirty real datasets. We present a case study illustrating how the identified contexts can be used to build specialized, more accurate classifiers.

    An MRI-Derived Definition of MCI-to-AD Conversion for Long-Term, Automatic Prognosis of MCI Patients

    Alzheimer's disease (AD) and mild cognitive impairment (MCI) continue to be widely studied. While there is no consensus on whether MCIs actually "convert" to AD, the more important question is not whether MCIs convert but how conversion is best defined. We focus on automatic prognostication, nominally using only a baseline brain scan, of whether an MCI individual will convert to AD within a multi-year period following the initial clinical visit. This is in fact not a traditional supervised learning problem since, in ADNI, there are no definitive labeled examples of MCI conversion. Prior works have defined MCI subclasses based on whether or not clinical/cognitive scores such as CDR significantly change from baseline. There are concerns with these definitions, however, since, e.g., most MCIs (and ADs) do not change from a baseline CDR=0.5, even while physiological changes may be occurring. In defining conversion, these works ignore the rich phenotypical information in an MCI patient's brain scan and in labeled AD and Control examples. We propose an innovative conversion definition, wherein an MCI patient is declared to be a converter if any of the patient's brain scans (at follow-up visits) are classified "AD" by an (accurately-designed) Control-AD classifier. This novel definition bootstraps the design of a second classifier, specifically trained to predict whether or not MCIs will convert. This second classifier thus predicts whether an AD-Control classifier will predict that a patient has AD. Our results demonstrate that this new definition leads not only to much higher prognostic accuracy than by-CDR conversion, but also to subpopulations much more consistent with known AD brain region biomarkers. We also identify key prognostic region biomarkers, essential for accurately discriminating the converter and nonconverter groups.
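The conversion rule stated in the abstract (a patient is a converter if any follow-up scan is classified "AD" by a Control-AD classifier) can be sketched as a labeling step. This is only an illustration under stated assumptions: the feature-vector representation of a scan, the name `label_converters`, and the toy threshold classifier `toy_is_ad` are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: one scan is a vector of image-derived features.
Scan = Sequence[float]


def label_converters(
    followup_scans: Dict[str, List[Scan]],
    is_ad: Callable[[Scan], bool],
) -> Dict[str, bool]:
    """Label each MCI patient as a converter or nonconverter.

    A patient is a converter if at least one follow-up scan is classified
    "AD" by the supplied Control-AD classifier. Patients without follow-up
    scans are labeled nonconverters here, which is an assumption.
    """
    return {
        patient_id: any(is_ad(scan) for scan in scans)
        for patient_id, scans in followup_scans.items()
    }


def toy_is_ad(scan: Scan) -> bool:
    """Stand-in for a trained Control-AD classifier (assumed, not the paper's)."""
    return sum(scan) / len(scan) > 0.5


if __name__ == "__main__":
    scans = {
        "p1": [[0.1, 0.2], [0.9, 0.8]],  # second follow-up looks "AD"
        "p2": [[0.1, 0.1]],
        "p3": [],
    }
    print(label_converters(scans, toy_is_ad))
```

The resulting labels would then serve as training targets for the second classifier, which takes only the baseline scan as input.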

    Data analytical stability of measuring brain activation in fMRI studies

    Population structure and adaptation in fishes: Insights from clupeid and salmonid species
