35 research outputs found

    A Bayesian inference approach for determining player abilities in football

    We consider the task of determining a football player's ability for a given event type, for example, scoring a goal. We propose an interpretable Bayesian model which is fit using variational inference methods. We implement a Poisson model to capture occurrences of event types, from which we infer player abilities. Our approach also allows the visualisation of differences between players, for a specific ability, through the marginal posterior variational densities. We then use these inferred player abilities to extend the Bayesian hierarchical model of Baio and Blangiardo (2010) which captures a team's scoring rate (the rate at which they score goals). We apply the resulting scheme to the English Premier League, capturing player abilities over the 2013/2014 season, before using output from the hierarchical model to predict whether over or under 2.5 goals will be scored in a given game in the 2014/2015 season. This validates our model as a way of providing insights into team formation and the individual success of sports teams. Comment: 31 pages, 14 figures
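
The over/under 2.5 goals prediction mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the two teams' goal counts are independent Poisson variables with known scoring rates; the function name and the example rates are hypothetical, and this is not the authors' hierarchical model.

```python
import math

def prob_over_2_5(rate_home: float, rate_away: float) -> float:
    """P(total goals >= 3), assuming independent Poisson goal counts."""
    # The sum of two independent Poisson variables is Poisson
    # with rate equal to the sum of the two rates.
    lam = rate_home + rate_away
    # P(total <= 2) = exp(-lam) * (1 + lam + lam^2 / 2)
    p_under = math.exp(-lam) * (1.0 + lam + lam ** 2 / 2.0)
    return 1.0 - p_under

# Hypothetical scoring rates, not taken from the paper.
p = prob_over_2_5(1.5, 1.1)
```

In a full Bayesian treatment the rates would be posterior draws rather than fixed numbers, and the probability would be averaged over those draws.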

    Interpretable surface-based detection of focal cortical dysplasias: a Multi-centre Epilepsy Lesion Detection study

    One outstanding challenge for machine learning in diagnostic biomedical imaging is algorithm interpretability. A key application is the identification of subtle epileptogenic focal cortical dysplasias (FCDs) from structural MRI. FCDs are difficult to visualize on structural MRI but are often amenable to surgical resection. We aimed to develop an open-source, interpretable, surface-based machine-learning algorithm to automatically identify FCDs on heterogeneous structural MRI data from epilepsy surgery centres worldwide. The Multi-centre Epilepsy Lesion Detection (MELD) Project collated and harmonized a retrospective MRI cohort of 1015 participants, 618 patients with focal FCD-related epilepsy and 397 controls, from 22 epilepsy centres worldwide. We created a neural network for FCD detection based on 33 surface-based features. The network was trained and cross-validated on 50% of the total cohort and tested on the remaining 50% as well as on 2 independent test sites. Multidimensional feature analysis and integrated gradient saliencies were used to interrogate network performance. Our pipeline outputs individual patient reports, which identify the location of predicted lesions, alongside their imaging features and relative saliency to the classifier. On a restricted 'gold-standard' subcohort of seizure-free patients with FCD type IIB who had T1 and fluid-attenuated inversion recovery MRI data, the MELD FCD surface-based algorithm had a sensitivity of 85%. Across the entire withheld test cohort the sensitivity was 59% and specificity was 54%. After including a border zone around lesions, to account for uncertainty around the borders of manually delineated lesion masks, the sensitivity was 67%. 
    This multicentre, multinational study with open access protocols and code has developed a robust and interpretable machine-learning algorithm for automated detection of focal cortical dysplasias, giving physicians greater confidence in the identification of subtle MRI lesions in individuals with epilepsy.

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples

    Funder: NCI U24CA211006. Abstract: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and un-observed mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.

    Improved bridge constructs for stochastic differential equations

    We consider the task of generating discrete-time realisations of a nonlinear multivariate diffusion process satisfying an Itô stochastic differential equation conditional on an observation taken at a fixed future time-point. Such realisations are typically termed diffusion bridges. Since, in general, no closed form expression exists for the transition densities of the process of interest, a widely adopted solution works with the Euler–Maruyama approximation, by replacing the intractable transition densities with Gaussian approximations. However, the density of the conditioned discrete-time process remains intractable, necessitating the use of computationally intensive methods such as Markov chain Monte Carlo. Designing an efficient proposal mechanism which can be applied to a noisy and partially observed system that exhibits nonlinear dynamics is a challenging problem, and is the focus of this paper. By partitioning the process into two parts, one that accounts for nonlinear dynamics in a deterministic way, and another as a residual stochastic process, we develop a class of novel constructs that bridge the residual process via a linear approximation. In addition, we adapt a recently proposed construct to a partial and noisy observation regime. We compare the performance of each new construct with a number of existing approaches, using three applications
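
The Euler–Maruyama approximation referred to in the abstract above replaces each intractable transition density with a Gaussian. A minimal sketch for a scalar SDE follows; the function name, the drift and diffusion choices, and the step count are illustrative assumptions. It simulates an unconditioned forward path only and does not implement any of the bridge constructs proposed in the paper.

```python
import math
import random

def euler_maruyama(x0, drift, diffusion, t_end, n_steps, rng):
    """Simulate one discrete-time path of dX = drift(X) dt + diffusion(X) dW."""
    dt = t_end / n_steps
    x = x0
    path = [x0]
    for _ in range(n_steps):
        # Gaussian transition approximation:
        # X_{k+1} | X_k = x  ~  N(x + drift(x) * dt, diffusion(x)^2 * dt)
        x = x + drift(x) * dt + diffusion(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

# Hypothetical example: mean-reverting drift with constant noise.
rng = random.Random(0)
path = euler_maruyama(1.0, lambda x: -0.5 * x, lambda x: 0.3, 1.0, 100, rng)
```

A bridge additionally conditions on an observation at the end time, which is why a forward simulator like this one is only the starting point for the proposal mechanisms the abstract describes.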

    Bayesian Inference for Diffusion-Driven Mixed-Effects Models

    Stochastic differential equations (SDEs) provide a natural framework for modelling intrinsic stochasticity inherent in many continuous-time physical processes. When such processes are observed in multiple individuals or experimental units, SDE-driven mixed-effects models allow the quantification of between (as well as within) individual variation. Performing Bayesian inference for such models, using discrete-time data that may be incomplete and subject to measurement error, is a challenging problem and is the focus of this paper. We extend a recently proposed MCMC scheme to include the SDE-driven mixed-effects framework. Fundamental to our approach is the development of a novel construct that allows for efficient sampling of conditioned SDEs that may exhibit nonlinear dynamics between observation times. We apply the resulting scheme to synthetic data generated from a simple SDE model of orange tree growth, and real data consisting of observations on aphid numbers recorded under a variety of different treatment regimes. In addition, we provide a systematic comparison of our approach with an inference scheme based on a tractable approximation of the SDE, that is, the linear noise approximation. Comment: 30 pages

    User-centered development of a smartphone application (Fit2Thrive) to promote physical activity in breast cancer survivors

    Increased moderate and vigorous physical activity (MVPA) is associated with better health outcomes in breast cancer survivors; yet, most are insufficiently active. Smartphone applications (apps) to promote MVPA have high scalability potential, but few evidence-based apps exist. The purpose is to describe the testing and usability of Fit2Thrive, an MVPA promotion app for breast cancer survivors. A user-centered, iterative design process was utilized on three independent groups of participants. Two groups of breast cancer survivors (group 1: n = 8; group 2: n = 14) performed app usability field testing by interacting with the app for ≥3 days in a free-living environment. App refinements occurred following each field test. The Post-Study System Usability Questionnaire (PSSUQ) and the User Version Mobile Application Rating Scale (uMARS) assessed app usability and quality on a 7- and 5-point scale, respectively, and women provided qualitative written feedback. A third group (n = 15) rated potential app notification content. Quantitative data were analyzed using descriptive statistics. Qualitative data were analyzed using a directed content analysis. The PSSUQ app usability score (M1 = 3.8; SD = 1.4 vs. M2 = 3.2; SD = 1.1; lower scores are better) and uMARS app quality score (M1 = 3.4; SD = 1.3 vs. M2 = 3.4; SD = 0.6; higher scores are better) appeared to improve in Field Test 2. Group 1 participants identified app "clunkiness," whereas group 2 participants identified issues with error messaging/functionality. Group 3 "liked" 53% of the self-monitoring, 71% of the entry reminder, 60% of the motivational, and 70% of the goal accomplishment notifications. Breast cancer survivors indicated that the Fit2Thrive app was acceptable and participants were able to use the app. Future work will test the efficacy of this app to increase MVPA.