15 research outputs found

    Imbalanced regression using regressor-classifier ensembles

    Acknowledgements: This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) UK through the ACTION on cancer Grant (EP/R022925/1, EP/R022941/1). The computational resources were provided by the Swedish National Infrastructure for Computing (SNIC) at the Chalmers University of Technology partially funded by the Swedish Research Council through Grant Agreement No. 2018-05973. Prof. King acknowledges the support of the Knut and Alice Wallenberg Foundation Wallenberg Autonomous Systems and Software Program (WASP). N. F. Grinberg would like to acknowledge funding from the Wellcome Trust (WT107881) and the MRC (MC_UU_00002/4).
    Abstract: We present an extension to the federated ensemble regression using classification algorithm, an ensemble learning algorithm for regression problems which leverages the distribution of the samples in a learning set to achieve improved performance. We evaluated the extension using four classifiers and four regressors, two discretizers, and 119 responses from a wide variety of datasets in different domains. Additionally, we compared our algorithm to two resampling methods aimed at addressing imbalanced datasets. Our results show that the proposed extension is highly unlikely to perform worse than the base case, and on average outperforms the two resampling methods with significant differences in performance.
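    The abstract above mentions discretizers without naming them. As a rough illustration of why discretizing a skewed response produces imbalanced classes, the sketch below compares equal-width and equal-frequency binning. Both discretizers, every function name, and the toy data are assumptions made for illustration; none of them is taken from the paper.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Interior cut points splitting [min, max] into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Interior cut points taken at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_bins] for i in range(1, n_bins)]


def assign_bins(values, edges):
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(edges, v) for v in values]


def bin_counts(labels, n_bins):
    """Count how many samples landed in each bin."""
    counts = [0] * n_bins
    for label in labels:
        counts[label] += 1
    return counts


# Made-up, right-skewed response used only to illustrate the effect.
response = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 40]

width_counts = bin_counts(assign_bins(response, equal_width_edges(response, 3)), 3)
freq_counts = bin_counts(assign_bins(response, equal_frequency_edges(response, 3)), 3)

print(width_counts)  # [10, 1, 1]: most samples fall into the first class
print(freq_counts)   # [4, 4, 4]: classes are balanced by construction
```

    With equal-width bins, a classifier trained on the resulting labels would see very few examples from the tail of the response, which is the kind of imbalance the listed paper is concerned with.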

    Stochastic search and joint fine-mapping increases accuracy and identifies previously unreported associations in immune-mediated diseases

    Thousands of genetic variants are associated with human disease risk, but linkage disequilibrium (LD) hinders fine-mapping the causal variants. Both lack of power, and joint tagging of two or more distinct causal variants by a single non-causal SNP, lead to inaccuracies in fine-mapping, with stochastic search more robust than stepwise. We develop a computationally efficient multinomial fine-mapping (MFM) approach that borrows information between diseases in a Bayesian framework. We show that MFM has greater accuracy than single disease analysis when shared causal variants exist, and negligible loss of precision otherwise. MFM analysis of six immune-mediated diseases reveals causal variants undetected in individual disease analysis, including in IL2RA where we confirm functional effects of multiple causal variants using allele-specific expression in sorted CD4+ T cells from genotype-selected individuals. MFM has the potential to increase fine-mapping resolution in related diseases enabling the identification of associated cellular and molecular phenotypes

    Seropositivity in blood donors and pregnant women during the first year of SARS-CoV-2 transmission in Stockholm, Sweden.

    BACKGROUND: In Sweden, social restrictions to contain SARS-CoV-2 have primarily relied upon voluntary adherence to a set of recommendations. Strict lockdowns have not been enforced, potentially affecting viral dissemination. To understand the levels of past SARS-CoV-2 infection in the Stockholm population before the start of mass vaccinations, healthy blood donors and pregnant women (n = 5,100) were sampled at random between 14 March 2020 and 28 February 2021. METHODS: In this cross-sectional prospective study, otherwise-healthy blood donors (n = 2,600) and pregnant women (n = 2,500) were sampled for consecutive weeks (at four intervals) throughout the study period. Sera from all participants and a cohort of historical (negative) controls (n = 595) were screened for IgG responses against stabilized trimers of the SARS-CoV-2 spike (S) glycoprotein and the smaller receptor-binding domain (RBD). As a complement to standard analytical approaches, a probabilistic (cut-off independent) Bayesian framework that assigns likelihood of past infection was used to analyse data over time. SETTING: Healthy participant samples were randomly selected from their respective pools through Karolinska University Hospital. The study was carried out in accordance with Swedish Ethical Review Authority: registration number 2020-01807. PARTICIPANTS: No participants were symptomatic at sampling, and blood donors were all over the age of 18. No additional metadata were available from the participants. RESULTS: Blood donors and pregnant women showed a similar seroprevalence. After a steep rise at the start of the pandemic, the seroprevalence trajectory increased steadily in approach to the winter second wave of infections, approaching 15% of all individuals surveyed by 13 December 2020. By the end of February 2021, 19% of the population tested seropositive. 
Notably, 96% of seropositive healthy donors screened (n = 56) developed neutralizing antibody responses at titres comparable to or higher than those observed in clinical trials of SARS-CoV-2 spike mRNA vaccination, supporting that mild infection engenders a competent B-cell response. CONCLUSIONS: These data indicate that in the first year since the start of community transmission, seropositivity levels in metropolitan Stockholm had reached approximately one in five persons, providing important baseline seroprevalence information prior to the start of vaccination.
    Funding: Swedish Research Council (agreement 2017-00968); National Institutes of Health (agreement 400 SUM1A44462-02); Wellcome Trust (WT107881); Medical Research Council (MC_UP_1302/5); European Union-funded CoroNAb project (coordination number 101003653

    Hidden resilience and adaptive dynamics of the global online hate ecology

    Online hate and extremist narratives have been linked to abhorrent real-world events, including a current surge in hate crimes and an alarming increase in youth suicides that result from social media vitriol; inciting mass shootings such as the 2019 attack in Christchurch, stabbings and bombings; recruitment of extremists, including entrapment and sex-trafficking of girls as fighter brides; threats against public figures, including the 2019 verbal attack against an anti-Brexit politician, and hybrid (racist-anti-women-anti-immigrant) hate threats against a US member of the British royal family; and renewed anti-western hate in the 2019 post-ISIS landscape associated with support for Osama Bin Laden's son and Al Qaeda. Social media platforms seem to be losing the battle against online hate and urgently need new insights. Here we show that the key to understanding the resilience of online hate lies in its global network-of-network dynamics. Interconnected hate clusters form global 'hate highways' that, assisted by collective online adaptations, cross social media platforms, sometimes using 'back doors' even after being banned, as well as jumping between countries, continents and languages. Our mathematical model predicts that policing within a single platform (such as Facebook) can make matters worse, and will eventually generate global 'dark pools' in which online hate will flourish. We observe the current hate network rapidly rewiring and self-repairing at the micro level when attacked, in a way that mimics the formation of covalent bonds in chemistry. This understanding enables us to propose a policy matrix that can help to defeat online hate, classified by the preferred (or legally allowed) granularity of the intervention and top-down versus bottom-up nature. We provide quantitative assessments for the effects of each intervention. This policy matrix also offers a tool for tackling a broader class of illicit online behaviours such as financial fraud.

    Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also indexes 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools

    Genomic basis of European ash tree resistance to ash dieback fungus

    Populations of European ash trees (Fraxinus excelsior) are being devastated by the invasive alien fungus Hymenoscyphus fraxineus, which causes ash dieback. We sequenced whole genomic DNA from 1,250 ash trees in 31 DNA pools, each pool containing trees with the same ash dieback damage status in a screening trial and from the same seed-source zone. A genome-wide association study identified 3,149 single nucleotide polymorphisms (SNPs) associated with low versus high ash dieback damage. Sixty-one of the 192 most significant SNPs were in, or close to, genes with putative homologues already known to be involved in pathogen responses in other plant species. We also used the pooled sequence data to train a genomic prediction model, cross-validated using individual whole genome sequence data generated for 75 healthy and 75 damaged trees from a single seed source. The model’s genomic estimated breeding values (GEBVs) allocated these 150 trees to their observed health statuses with 67% accuracy using 10,000 SNPs. Using the top 20% of GEBVs from just 200 SNPs, we could predict observed tree health with over 90% accuracy. We infer that ash dieback resistance in F. excelsior is a polygenic trait that should respond well to both natural selection and breeding, which could be accelerated using genomic prediction