    distr6: R6 Object-Oriented Probability Distributions Interface in R

    distr6 is an object-oriented (OO) probability distributions interface leveraging the extensibility and scalability of R6, and the speed and efficiency of Rcpp. Over 50 probability distributions are currently implemented in the package, with 'core' methods including density, distribution, and generating functions, and more 'exotic' ones including hazards and distribution function anti-derivatives. In addition to simple distributions, distr6 supports compositions such as truncation, mixtures, and product distributions. This paper presents the core functionality of the package and demonstrates examples for key use-cases. In addition, this paper provides a critical review of the object-oriented programming paradigms in R and describes some novel implementations of design patterns and core object-oriented features introduced by the package to support distr6 components. Comment: Accepted in The R Journal.
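    distr6 itself is written in R using R6 classes. The Python sketch below only illustrates the general object-oriented pattern the abstract describes: 'core' methods (pdf, cdf) declared on a base class, a derived 'exotic' method (the hazard) built from them, and a composition (truncation) that wraps any distribution and is itself a distribution. The class names `Distribution`, `Exponential` and `Truncated` are hypothetical and are not distr6's API.

```python
import math
from abc import ABC, abstractmethod


class Distribution(ABC):
    """Hypothetical base class: every distribution must supply pdf and cdf."""

    @abstractmethod
    def pdf(self, x: float) -> float: ...

    @abstractmethod
    def cdf(self, x: float) -> float: ...

    def survival(self, x: float) -> float:
        # Derived method built only from the core cdf: S(x) = 1 - F(x).
        return 1.0 - self.cdf(x)

    def hazard(self, x: float) -> float:
        # h(x) = f(x) / S(x); only defined where S(x) > 0.
        return self.pdf(x) / self.survival(x)


class Exponential(Distribution):
    def __init__(self, rate: float = 1.0):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.rate = rate

    def pdf(self, x: float) -> float:
        return self.rate * math.exp(-self.rate * x) if x >= 0 else 0.0

    def cdf(self, x: float) -> float:
        return 1.0 - math.exp(-self.rate * x) if x >= 0 else 0.0


class Truncated(Distribution):
    """Wrapper that restricts any Distribution to the interval [lower, upper]."""

    def __init__(self, base: Distribution, lower: float, upper: float):
        self.base, self.lower, self.upper = base, lower, upper
        # Probability mass of the base distribution inside the interval.
        self._mass = base.cdf(upper) - base.cdf(lower)
        if self._mass <= 0:
            raise ValueError("truncation interval has zero probability")

    def pdf(self, x: float) -> float:
        if x < self.lower or x > self.upper:
            return 0.0
        return self.base.pdf(x) / self._mass

    def cdf(self, x: float) -> float:
        if x < self.lower:
            return 0.0
        if x >= self.upper:
            return 1.0
        return (self.base.cdf(x) - self.base.cdf(self.lower)) / self._mass
```

    Because `Truncated` is itself a `Distribution`, it inherits `survival` and `hazard` without extra code, which loosely mirrors how a composed distribution can be used like any simple one. For an exponential distribution the hazard is constant, so `Exponential(2.0).hazard(1.0)` is 2 up to rounding.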

    Algebraic Geometric Comparison of Probability Distributions

    We propose a novel algebraic framework for treating probability distributions represented by their cumulants, such as the mean and covariance matrix. As an example, we consider the unsupervised learning problem of finding the subspace on which several probability distributions agree. Instead of minimizing an objective function involving the estimated cumulants, we show that by treating the cumulants as elements of the polynomial ring we can solve the problem directly, at a lower computational cost and with higher accuracy than the objective-minimization approach. Moreover, the algebraic viewpoint on probability distributions allows us to invoke the theory of Algebraic Geometry, which we demonstrate in a compact proof of an identifiability criterion.
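    To show why polynomials arise, here is one way to write the agreement condition, assuming for illustration that "agree" is required only for the first two cumulants (mean and covariance). The symbols \(P\), \(S\), \(\Delta\mu_i\) and \(\Delta\Sigma_i\) are introduced here and are not taken from the paper; this is a sketch of the problem statement, not the paper's algorithm.

```latex
% Assumed setup: distributions 0, ..., m with means \mu_i \in \mathbb{R}^D and covariances \Sigma_i;
% P \in \mathbb{R}^{d \times D} has orthonormal rows spanning the candidate subspace S.
\[
  \Delta\mu_i = \mu_i - \mu_0, \qquad \Delta\Sigma_i = \Sigma_i - \Sigma_0, \qquad i = 1, \dots, m.
\]
% The projected means and covariances agree exactly when
\[
  P\,\Delta\mu_i = 0 \quad\text{and}\quad P\,\Delta\Sigma_i\,P^{\top} = 0 \qquad \text{for all } i.
\]
% Since each \Delta\Sigma_i is symmetric, by polarization this is equivalent to
\[
  v^{\top}\Delta\mu_i = 0 \quad\text{and}\quad v^{\top}\Delta\Sigma_i\,v = 0
  \qquad \text{for all } v \in S,\ i = 1, \dots, m.
\]
```

    Every vector of \(S\) must therefore be a common zero of linear forms and homogeneous quadratic polynomials in its coordinates, which is the kind of object the polynomial-ring viewpoint handles directly.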

    Problematic internet use (PIU): Associations with the impulsive-compulsive spectrum. An application of machine learning in psychiatry.

    Problematic internet use is common, functionally impairing, and in need of further study. Its relationship with obsessive-compulsive and impulsive disorders is unclear. Our objective was to evaluate whether problematic internet use can be predicted from recognised forms of impulsive and compulsive traits and symptomatology. We recruited volunteers aged 18 and older through media advertisements at two sites (Chicago, USA, and Stellenbosch, South Africa) to complete an extensive online survey. Three machine learning models (Logistic Regression, Random Forests and Naïve Bayes) were evaluated using state-of-the-art out-of-sample methods. Problematic internet use was identified using the Internet Addiction Test (IAT). A total of 2006 complete cases were analysed, of whom 181 (9.0%) had moderate/severe problematic internet use. Using Logistic Regression and Naïve Bayes, we produced a classification prediction with a receiver operating characteristic area under the curve (ROC-AUC) of 0.83 (SD 0.03), whereas the Random Forests algorithm achieved a ROC-AUC of 0.84 (SD 0.03) [all three models superior to baseline models, p < 0.0001]. The models showed robust transfer between the study sites in all validation sets [p < 0.0001]. Prediction of problematic internet use was possible using specific measures of impulsivity and compulsivity in a population of volunteers. Moreover, this study offers proof-of-concept support for using machine learning in psychiatry to demonstrate replicability of results across geographically and culturally distinct settings. This research received internal departmental funds of the Department of Psychiatry at the University of Chicago. This is the final version of the article. It was first published by Elsevier at http://dx.doi.org/10.1016/j.jpsychires.2016.08.010
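    The abstract reports model performance as ROC-AUC. As an illustration only, ROC-AUC equals the probability that a randomly chosen positive case (here, moderate/severe problematic internet use) receives a higher model score than a randomly chosen negative case, with ties counted as one half. The Python sketch below computes it directly from that definition; the function name `roc_auc` is hypothetical and this is not the evaluation code used in the study. Because AUC depends only on how cases are ranked, it does not change with class prevalence (here 181 of 2006 cases), unlike plain accuracy.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    This equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # pair ranked correctly
            elif p == n:
                wins += 0.5  # tie: counted as half a correct ranking
    return wins / (len(pos) * len(neg))
```

    For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, since three of the four positive/negative pairs are ranked correctly. The double loop costs O(n_pos × n_neg); a rank-based formulation scales better for large samples.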

    Data Study Group Final Report: Roche

    Data Study Groups are week-long events at The Alan Turing Institute that bring together some of the country’s top talent from data science, artificial intelligence, and wider fields to analyse real-world data science challenges.

    Roche: Personalised lung cancer treatment modelling using electronic health records and genomics

    Cancer immunotherapy (CIT) is a promising new type of cancer treatment that uses the patient’s own immune system to fight cancer cells. CIT drugs stop cancer cells from switching off the immune system’s T-cells by inhibiting the PD-L1 produced by the tumour cells (PD-L1 is a protein that binds to PD-1 receptors on T-cells and prevents the immune system from attacking the cancer cells). CIT is currently used to treat patients with non-small cell lung cancer (NSCLC) for whom chemotherapy or other drugs have failed. CIT is also being used as part of first-line treatment for patients with advanced NSCLC (aNSCLC: stage III and higher). In theory, patients with high PD-L1 expression levels are more likely to respond well to CIT; in practice, however, patient outcomes vary considerably. In this Data Study Group, we investigated different approaches for predicting survival time for patients treated with CIT as first-line treatment, using both electronic health records and tumour genomic data. We also investigated the causal effects of CIT versus other oncology treatments and studied treatment heterogeneity. The results contribute to identifying the patients most likely to benefit from CIT.
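    The report does not say which survival models were used. As a hedged illustration of what survival-time analysis involves, the Kaplan-Meier estimator, a standard baseline, estimates the survival curve S(t) = ∏ (1 - d_k / n_k) over event times t_k ≤ t, where d_k is the number of deaths at t_k and n_k the number of patients still at risk, while correctly handling censored patients (alive at last follow-up). The function name `kaplan_meier` and the toy data below are hypothetical; this is not the study group's code.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate S(t) = prod over event times t_k <= t of (1 - d_k / n_k).

    times[i]  : follow-up time for patient i
    events[i] : 1 if death was observed at times[i], 0 if the patient was censored
    Returns (t_k, S(t_k)) at each distinct time with at least one observed death.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)  # n_k: patients still under observation just before t_k
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group tied times; patients censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

    With five toy patients, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps down to 0.8, 0.6 and 0.3 at times 1, 2 and 3. The patient censored at time 2 still counts as at risk at that time but not afterwards.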

    Problematic internet use as an age-related multifaceted problem: Evidence from a two-site survey.

    BACKGROUND AND AIMS: Problematic internet use (PIU; otherwise known as Internet Addiction) is a growing problem in modern societies. There is scarce knowledge of the demographic variables and specific internet activities associated with PIU, and a limited understanding of how PIU should be conceptualized. Our aim was to identify specific internet activities associated with PIU and to explore the moderating role of age and gender in those associations.

    METHODS: We recruited 1749 participants aged 18 and above via media advertisements for an Internet-based survey at two sites, one in the US and one in South Africa; we used Lasso regression for the analysis.

    RESULTS: Specific internet activities were associated with higher problematic internet use scores, including general surfing (lasso β: 2.1), internet gaming (β: 0.6), online shopping (β: 1.4), use of online auction websites (β: 0.027), social networking (β: 0.46) and use of online pornography (β: 1.0). Age moderated the relationship between PIU and role-playing games (β: 0.33), online gambling (β: 0.15), use of auction websites (β: 0.35) and streaming media (β: 0.35), with older age associated with higher levels of PIU. The evidence for an association of gender, or of gender × internet activity interactions, with problematic internet use scores was inconclusive. Attention-deficit hyperactivity disorder (ADHD) and social anxiety disorder were associated with high PIU scores in young participants (age ≤ 25; β: 0.35 and 0.65 respectively), whereas generalized anxiety disorder (GAD) and obsessive-compulsive disorder (OCD) were associated with high PIU scores in older participants (age > 55; β: 6.4 and 4.3 respectively).

    CONCLUSIONS: Many types of online behavior (e.g. shopping, pornography, general surfing) bear a stronger relationship with maladaptive internet use than gaming does, supporting the diagnostic classification of problematic internet use as a multifaceted disorder. Furthermore, the internet activities and psychiatric diagnoses associated with problematic internet use vary with age, which has public health implications.
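    The results report lasso coefficients and moderation by age. In a regression model, moderation is usually encoded as an interaction term, so one hedged sketch is: centre an activity score and age, add their product as a third column, standardise all columns, and fit a lasso by cyclic coordinate descent. The functions `moderation_design`, `soft_threshold` and `lasso_cd`, the penalty values and any data passed in are hypothetical (NumPy is assumed to be available); this is not the authors' analysis, and coefficients on this standardised scale are not comparable with the β values reported in the abstract.

```python
import numpy as np


def moderation_design(activity: np.ndarray, age: np.ndarray) -> np.ndarray:
    """Columns: activity, age, activity x age (the moderation term).
    Predictors are centred before forming the product, then every column is
    centred and scaled to unit variance (assumes no constant inputs)."""
    a = activity - activity.mean()
    g = age - age.mean()
    X = np.column_stack([a, g, a * g])
    X = X - X.mean(axis=0)
    return X / X.std(axis=0)


def soft_threshold(z: float, gamma: float) -> float:
    # Proximal operator of gamma * |b|: shrink z toward zero by gamma.
    return float(np.sign(z) * max(abs(z) - gamma, 0.0))


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 500) -> np.ndarray:
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate
    descent. Assumes X's columns and y are centred, so no intercept is fitted."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never enter the model
            # Correlation of column j with the residual that excludes column j.
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * (new_bj - beta[j])  # keep residual = y - X @ beta
            beta[j] = new_bj
    return beta
```

    With `lam=0.0` the coordinate descent reduces to ordinary least squares; increasing `lam` shrinks every coefficient toward zero and sets weak ones, such as an absent interaction, exactly to zero, which is why the lasso doubles as a variable-selection step.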

    Full code to "Prediction and Quantification of Individual Athletic Performance of Runners"

    Full code in R and MATLAB accompanying "Prediction and Quantification of Individual Athletic Performance of Runners" by Duncan A.J. Blythe and Franz J. Király. Data downloadable from: https://figshare.com/articles/thepowerof10/3408202