    Confound-leakage: confound removal in machine learning leads to leakage

    BACKGROUND: Machine learning (ML) approaches are a crucial component of modern data analysis in many fields, including epidemiology and medicine. Nonlinear ML methods often achieve accurate predictions, for instance, in personalized medicine, as they are capable of modeling complex relationships between features and the target. Problematically, ML models and their predictions can be biased by confounding information present in the features. To remove this spurious signal, researchers often employ featurewise linear confound regression (CR). While this is considered a standard approach for dealing with confounding, possible pitfalls of using CR in ML pipelines are not fully understood. RESULTS: We provide new evidence that, contrary to general expectations, linear confound regression can increase the risk of confounding when combined with nonlinear ML approaches. Using a simple framework that uses the target as a confound, we show that information leaked via CR can increase null or moderate effects to near-perfect prediction. By shuffling the features, we provide evidence that this increase is indeed due to confound-leakage and not due to revealing of information. We then demonstrate the danger of confound-leakage in a real-world clinical application where the accuracy of predicting attention-deficit/hyperactivity disorder is overestimated using speech-derived features when using depression as a confound. CONCLUSIONS: Mishandling or even amplifying confounding effects when building ML models due to confound-leakage, as shown, can lead to untrustworthy, biased, and unfair predictions. Our expose of the confound-leakage pitfall and provided guidelines for dealing with it can help create more robust and trustworthy ML models
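
    The featurewise linear confound regression (CR) described in this abstract can be illustrated with a minimal sketch. It assumes NumPy and synthetic data; the function name `confound_regress` and all variable names are hypothetical and do not come from the paper. Each feature column is regressed on the confound by ordinary least squares and only the residuals are kept. The sketch shows the CR step alone, not the leakage experiments reported by the authors.

```python
import numpy as np


def confound_regress(X, c):
    """Return residuals of each column of X after a linear fit on confound c."""
    X = np.asarray(X, dtype=float)
    c = np.asarray(c, dtype=float).reshape(len(X), -1)
    # Design matrix: intercept column plus the confound(s).
    design = np.column_stack([np.ones(len(X)), c])
    # One least-squares fit per feature column, solved jointly.
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ beta


# Synthetic example (assumed data): three features that share a confound.
rng = np.random.default_rng(0)
confound = rng.normal(size=200)
features = rng.normal(size=(200, 3)) + 0.8 * confound[:, None]
residuals = confound_regress(features, confound)
```

    After this step the residuals are linearly uncorrelated with the confound, but nonlinear dependence can remain, which is the setting in which the abstract reports leakage when a nonlinear model follows.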

    A Profile Likelihood Analysis of the Constrained MSSM with Genetic Algorithms

    The Constrained Minimal Supersymmetric Standard Model (CMSSM) is one of the simplest and most widely-studied supersymmetric extensions to the standard model of particle physics. Nevertheless, current data do not sufficiently constrain the model parameters in a way completely independent of priors, statistical measures and scanning techniques. We present a new technique for scanning supersymmetric parameter spaces, optimised for frequentist profile likelihood analyses and based on Genetic Algorithms. We apply this technique to the CMSSM, taking into account existing collider and cosmological data in our global fit. We compare our method to the MultiNest algorithm, an efficient Bayesian technique, paying particular attention to the best-fit points and implications for particle masses at the LHC and dark matter searches. Our global best-fit point lies in the focus point region. We find many high-likelihood points in both the stau co-annihilation and focus point regions, including a previously neglected section of the co-annihilation region at large m_0. We show that there are many high-likelihood points in the CMSSM parameter space commonly missed by existing scanning techniques, especially at high masses. This has a significant influence on the derived confidence regions for parameters and observables, and can dramatically change the entire statistical inference of such scans. Comment: 47 pages, 8 figures; Fig. 8, Table 7 and more discussions added to Sec. 3.4.2 in response to referee's comments; accepted for publication in JHEP

    Identification of Colletotrichum species associated with anthracnose disease of coffee in Vietnam

    Colletotrichum gloeosporioides, C. acutatum, C. capsici and C. boninense associated with anthracnose disease on coffee (Coffea spp.) in Vietnam were identified based on morphology and DNA analysis. Phylogenetic analyses of DNA sequences from the internal transcribed spacer region of nuclear rDNA and a portion of mitochondrial small subunit rRNA were concordant and allowed good separation of the taxa. We found several Colletotrichum isolates of unknown species, and their taxonomic position remains unresolved. The majority of Vietnamese isolates belonged to C. gloeosporioides and they grouped together with the coffee berry disease (CBD) fungus, C. kahawae. However, C. kahawae could be distinguished from the Vietnamese C. gloeosporioides isolates based on ammonium tartrate utilization, growth rate and pathogenicity. C. gloeosporioides isolates were more pathogenic on detached green berries than isolates of the other species, i.e. C. acutatum, C. capsici and C. boninense. Some of the C. gloeosporioides isolates produced slightly sunken lesions on green berries resembling CBD symptoms, but they did not destroy the bean. We did not find any evidence of the presence of C. kahawae in Vietnam.

    Differences in efficacy of monepantel, derquantel and abamectin against multi-resistant nematodes of sheep

    Drug resistance has become a global phenomenon in gastrointestinal nematodes of sheep, particularly resistance to macrocyclic lactones. New anthelmintics are urgently needed for both the control of infections with multi-resistant nematodes in areas where classical anthelmintics are no longer effective, and the prevention of the spread of resistance in areas where the problem is not as severe. Recently, two new active ingredients became commercially available for the treatment of nematode infections in sheep, monepantel (Zolvix®) and derquantel, the latter used only in a formulated combination with the macrocyclic lactone, abamectin (Startect®). In order to assess the potential of the new actives for the control and prevention of spread of anthelmintic resistance, two characterized multi-resistant field isolates from Australia were used in a GLP (good laboratory practice) conducted efficacy study in sheep. Eight infected sheep in each group were treated orally according to the product labels with 2.5 mg/kg body weight monepantel, 0.2 mg/kg abamectin, or with the combination of 2.0 mg/kg derquantel and 0.2 mg/kg abamectin. The results demonstrate that monepantel was fully effective against multi-resistant species, Trichostrongylus colubriformis and Haemonchus contortus (99.9%). In contrast, the combination of derquantel and abamectin was effective against T. colubriformis (99.9%), but was not effective against larval stages of the barber's pole worm H. contortus (18.3%).

    Quality Of Antenatal Care In Rural Southern Tanzania: A Reality Check.

    Counselling on the danger signs of unpredictable obstetric complications and the appropriate management of such complications are crucial in reducing maternal mortality. The objectives of this study were to identify gaps in the provision of ANC services and knowledge of danger signs as well as the quality of care women receive in case of complications. The study took place in the Rufiji District of Tanzania in 2008 and was conducted in seven health facilities. The study used (1) observations from 63 antenatal care (ANC) sessions evaluated with an ANC checklist, (2) self-assessments of 11 health workers, (3) interviews with 28 pregnant women and (4) follow-up of 12 women hospitalized for pregnancy-related conditions. Blood pressure measurements and abdominal examinations were common during ANC visits, while urine testing for albumin or sugar and testing of haemoglobin levels were rare, which was often attributed to a lack of supplies. The reasons for measuring blood pressure or performing abdominal examinations were usually not explained to the women. Only 15/28 (54%) women were able to mention at least one obstetric danger sign requiring medical attention. The outcomes of ten complicated cases were five stillbirths and three maternal complications. There was a considerable delay in first contact with a health professional or the start of timely interventions including checking vital signs, using a partograph, and detailed record keeping. Linking danger signs to clinical and laboratory examination results during ANC with the appropriate follow-up and avoiding delays in emergency obstetric care are crucial to the delivery of coordinated, effective care interventions.

    Towards a resolution of some outstanding issues in transitive research: an empirical test on middle childhood

    Transitive Inference (deduce B > D from B > C and C > D) can help us to understand other areas of sociocognitive development. Across three experiments, learning, memory, and the validity of two transitive paradigms were investigated. In Experiment 1 (N = 121), 7-year-olds completed a three-term nontraining task or a five-term task requiring extensive training. Performance was superior on the three-term task. Experiment 2 presented 5–10-year-olds with a new five-term task, increasing learning opportunities without lengthening training (N = 71). Inferences improved, suggesting children can learn five-term series rapidly. Regarding memory, the minor (CD) premise was the best predictor of BD-inferential performance in both task types. However, tasks exhibited different profiles according to associations between the major (BC) premise and BD inference, correlations between the premises, and the role of age. Experiment 3 (N = 227) helped rule out the possible objection that the above findings simply stemmed from three-term tasks with real objects being easier to solve than computer tasks. It also confirmed that, unlike for five-term tasks (Experiments 1 & 2), inferences on three-term tasks improve with age, whether the age range is wide (Experiment 3) or narrow (Experiment 2). I conclude that the tasks indexed different routes within a dual-process conception of transitive reasoning: the five-term task indexes Type 1 (associative) processing, and the three-term task indexes Type 2 (analytic) processing. As well as demonstrating that both tasks are perfectly valid, these findings open up opportunities to use transitive tasks for educability, to investigate the role of transitivity in other domains of reasoning, and potentially to benefit the lived experiences of persons with developmental issues.

    Ge quantum dot arrays grown by ultrahigh vacuum molecular beam epitaxy on the Si(001) surface: nucleation, morphology and CMOS compatibility

    Issues of morphology, nucleation and growth of Ge cluster arrays deposited by ultrahigh vacuum molecular beam epitaxy on the Si(001) surface are considered. The difference in nucleation of quantum dots during Ge deposition at low (<600 °C) and high (>600 °C) temperatures is studied by high resolution scanning tunneling microscopy. Atomic models of growth of both species of Ge huts, pyramids and wedges, are proposed. The growth cycle of Ge QD arrays at low temperatures is explored. The problem of lowering the array formation temperature is discussed with a focus on CMOS compatibility of the entire process; special attention is paid to approaches for reducing the treatment temperature during the Si(001) surface pre-growth cleaning, which is both a key phase and the highest-temperature phase of the Ge/Si(001) quantum dot dense array formation process. The temperature of the Si clean surface preparation, the final high-temperature step of which is, as a rule, carried out directly in the MBE chamber just before the structure deposition, determines the compatibility of the formation process of Ge-QD-array based devices with the CMOS manufacturing cycle. Silicon surface hydrogenation at the final stage of its wet chemical etching during the preliminary cleaning is proposed as a possible way of efficiently reducing the Si wafer pre-growth annealing temperature. Comment: 30 pages, 11 figures

    Development of a quality assessment tool for systematic reviews of observational studies (QATSO) of HIV prevalence in men having sex with men and associated risk behaviours

    BACKGROUND: Systematic reviews based on the critical appraisal of observational and analytic studies on HIV prevalence and risk factors for HIV transmission among men having sex with men are very useful for health care decisions and planning. Such appraisal is particularly difficult, however, as the quality assessment tools available for use with observational and analytic studies are poorly established. METHODS: We reviewed the existing quality assessment tools for systematic reviews of observational studies and developed a concise quality assessment checklist to help standardise decisions regarding the quality of studies, with careful consideration of issues such as external and internal validity. RESULTS: A pilot version of the checklist was developed based on epidemiological principles, reviews of study designs, and existing checklists for the assessment of observational studies. The Quality Assessment Tool for Systematic Reviews of Observational Studies (QATSO) Score consists of five items: External validity (1 item), reporting (2 items), bias (1 item) and confounding factors (1 item). Expert opinions were sought and it was tested on manuscripts that fulfil the inclusion criteria of a systematic review. Like all assessment scales, QATSO may oversimplify and generalise information yet it is inclusive, simple and practical to use, and allows comparability between papers. CONCLUSION: A specific tool that allows researchers to appraise and guide study quality of observational studies is developed and can be modified for similar studies in the future.

    A framework for using self-organising maps to analyse spatiotemporal patterns, exemplified by analysis of mobile phone usage

    We suggest a visual analytics framework for the exploration and analysis of spatially and temporally referenced values of numeric attributes. The framework supports two complementary perspectives on spatio-temporal data: as a temporal sequence of spatial distributions of attribute values (called spatial situations) and as a set of spatially referenced time series of attribute values representing local temporal variations. To handle a large amount of data, we use the self-organising map (SOM) method, which groups objects and arranges them according to similarity of relevant data features. We apply the SOM approach to spatial situations and to local temporal variations and obtain two types of SOM outcomes, called space-in-time SOM and time-in-space SOM, respectively. The examination and interpretation of both types of SOM outcomes are supported by appropriate visualisation and interaction techniques. This article describes the use of the framework by an example scenario of data analysis. We also discuss how the framework can be extended from supporting explorative analysis to building predictive models of the spatio-temporal variation of attribute values. We apply our approach to phone call data showing its usefulness in real-world analytic scenarios