15 research outputs found

    Semimartingale decomposition of convex functions of continuous semimartingales by Brownian perturbation

    In this note we prove that the local martingale part of a convex function f of a d-dimensional semimartingale X = M + A can be written in terms of an Itô stochastic integral ∫H(X)dM, where H(x) is some particular measurable choice of subgradient of f at x, and M is the martingale part of X. This result was first proved by Bouleau in [2]. Here we present a new treatment of the problem. We first prove the result for X' = X + εB, ε > 0, where B is a standard Brownian motion, and then pass to the limit as ε tends to 0, using results in [1] and [4]. Comment: 16 pages. Re-submitted to ESAIMPS December, 201

    An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat

    Abstract: In phenotype prediction the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are of central importance to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods (elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM)) with two state-of-the-art classical statistical genetics methods: genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression, and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forests. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when this is present.
We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which methods are likely to perform well on any given problem is elusive and non-trivial

    Stochastic search and joint fine-mapping increases accuracy and identifies previously unreported associations in immune-mediated diseases

    Abstract: Thousands of genetic variants are associated with human disease risk, but linkage disequilibrium (LD) hinders fine-mapping the causal variants. Both lack of power, and joint tagging of two or more distinct causal variants by a single non-causal SNP, lead to inaccuracies in fine-mapping, with stochastic search more robust than stepwise. We develop a computationally efficient multinomial fine-mapping (MFM) approach that borrows information between diseases in a Bayesian framework. We show that MFM has greater accuracy than single disease analysis when shared causal variants exist, and negligible loss of precision otherwise. MFM analysis of six immune-mediated diseases reveals causal variants undetected in individual disease analysis, including in IL2RA where we confirm functional effects of multiple causal variants using allele-specific expression in sorted CD4+ T cells from genotype-selected individuals. MFM has the potential to increase fine-mapping resolution in related diseases enabling the identification of associated cellular and molecular phenotypes

    Genetic dissection of the tissue‐specific roles of type III effectors and phytotoxins in the pathogenicity of Pseudomonas syringae pv. syringae to cherry

    When compared with other phylogroups (PGs) of the Pseudomonas syringae species complex, P. syringae pv. syringae (Pss) strains within PG2 have a reduced repertoire of type III effectors (T3Es) but produce several phytotoxins. Effectors within the cherry pathogen Pss 9644 were grouped based on their frequency in strains from Prunus as the conserved effector locus (CEL) common to most P. syringae pathogens; a core of effectors common to PG2; a set of PRUNUS effectors common to cherry pathogens; and a FLEXIBLE set of T3Es. Pss 9644 also contains gene clusters for biosynthesis of toxins syringomycin, syringopeptin and syringolin A. After confirmation of virulence gene expression, mutants with a sequential series of T3E and toxin deletions were pathogenicity tested on wood, leaves and fruits of sweet cherry (Prunus avium) and leaves of ornamental cherry (Prunus incisa). The toxins had a key role in disease development in fruits but were less important in leaves and wood. An effectorless mutant retained some pathogenicity to fruit but not wood or leaves. Striking redundancy was observed amongst effector groups. The CEL effectors have important roles during the early stages of leaf infection and possibly acted synergistically with toxins in all tissues. Deletion of separate groups of T3Es had more effect in P. incisa than in P. avium. Mixed inocula were used to complement the toxin mutations in trans and indicated that strain mixtures may be important in the field. Our results highlight the niche‐specific role of toxins in P. avium tissues and the complexity of effector redundancy in the pathogen Pss 9644

    Implementation of genomic prediction in Lolium perenne (L.) breeding populations

    Perennial ryegrass (Lolium perenne L.) is one of the most widely grown forage grasses in temperate agriculture. In order to maintain and increase its usage as forage in livestock agriculture, there is a continued need for improvement in biomass yield, quality, disease resistance and seed yield. Genetic gain for traits such as biomass yield has been relatively modest. This has been attributed to its long breeding cycle, and the necessity to use population based breeding methods. Thanks to recent advances in genotyping techniques there is increasing interest in genomic selection (GS), from which genomically estimated breeding values (GEBV) are derived. In this paper we compare the classical RRBLUP model with state-of-the-art machine learning (ML) techniques that should lend themselves easily to use in GS and demonstrate their application to predicting quantitative traits in a breeding population of L. perenne. Prediction accuracies varied from 0 to 0.59 depending on trait, prediction model and composition of the training population. The BLUP model produced the highest prediction accuracies for most traits and training populations. Forage quality traits had the highest accuracies compared to yield related traits. There appeared to be no clear pattern to the effect of the training population composition on the prediction accuracies. The heritability of the forage quality traits was generally higher than for the yield related traits, and could partly explain the difference in accuracy. Some population structure was evident in the breeding populations, and probably contributed to the varying effects of training population on the predictions. The average linkage disequilibrium (LD) between adjacent markers ranged from 0.121 to 0.215. Higher marker density and larger training population closely related to the test population are likely to improve the prediction accuracy

    Probabilistic classification of anti-SARS-CoV-2 antibody responses improves seroprevalence estimates.

    OBJECTIVES: Population-level measures of seropositivity are critical for understanding the epidemiology of an emerging pathogen, yet most antibody tests apply a strict cutoff for seropositivity that is not learnt in a data-driven manner, leading to uncertainty when classifying low-titer responses. To improve upon this, we evaluated cutoff-independent methods for their ability to assign likelihood of SARS-CoV-2 seropositivity to individual samples. METHODS: Using robust ELISAs based on SARS-CoV-2 spike (S) and the receptor-binding domain (RBD), we profiled antibody responses in a group of SARS-CoV-2 PCR+ individuals (n = 138). Using these data, we trained probabilistic learners to assign likelihood of seropositivity to test samples of unknown serostatus (n = 5100), identifying a support vector machines-linear discriminant analysis learner (SVM-LDA) suited for this purpose. RESULTS: In the training data from confirmed ancestral SARS-CoV-2 infections, 99% of participants had detectable anti-S and -RBD IgG in the circulation, with titers differing > 1000-fold between persons. In data of otherwise healthy individuals, 7.2% (n = 367) of samples were of uncertain serostatus, with values in the range of 3-6SD from the mean of pre-pandemic negative controls (n = 595). In contrast, SVM-LDA classified 6.4% (n = 328) of test samples as having a high likelihood (> 99% chance) of past infection, 4.5% (n = 230) to have a 50-99% likelihood, and 4.0% (n = 203) to have a 10-49% likelihood. As different probabilistic approaches were more consistent with each other than conventional SD-based methods, such tools allow for more statistically-sound seropositivity estimates in large cohorts. CONCLUSION: Probabilistic antibody testing frameworks can improve seropositivity estimates in populations with large titer variability

    Semimartingale decomposition of convex functions of continuous semimartingales by Brownian perturbation

    In this note we prove that the local martingale part of a convex function f of a d-dimensional semimartingale X = M + A can be written in terms of an Itô stochastic integral ∫H(X)dM, where H(x) is some particular measurable choice of subgradient of f at x, and M is the martingale part of X. This result was first proved by Bouleau in [N. Bouleau, C. R. Acad. Sci. Paris Sér. I Math. 292 (1981) 87–90]. Here we present a new treatment of the problem. We first prove the result for X̃ = X + ϵB, ϵ > 0, where B is a standard Brownian motion, and then pass to the limit as ϵ → 0, using results in [M.T. Barlow and P. Protter, On convergence of semimartingales. In Séminaire de Probabilités, XXIV, 1988/89, Lect. Notes Math., vol. 1426. Springer, Berlin (1990) 188–193; E. Carlen and P. Protter, Illinois J. Math. 36 (1992) 420–427]. The former paper concerns convergence of semimartingale decompositions of semimartingales, while the latter studies a special case of converging convex functions of semimartingales