100 research outputs found

    Manifold-aware Forests: Closing the Gap to Convolutional Neural Networks

    Decision forests (DF), in particular random forests and gradient boosting trees, have demonstrated state-of-the-art accuracy compared to other methods in many supervised learning scenarios. In particular, DFs dominate other methods in tabular data, that is, when the feature space is unstructured, so that the signal is invariant to permuting feature indices. However, in structured data lying on a manifold (such as images, text, and speech), neural nets (NN), specifically convolutional neural nets (CNN), tend to outperform DFs. We conjecture that at least part of the reason for this is that the input to NN is not simply the feature magnitudes, but also their indices (for example, the convolution operation uses feature "locality"). In contrast, naive DF implementations fail to explicitly consider feature indices. A recently proposed DF approach demonstrates that DFs, for each node, implicitly sample a random matrix from some specific distribution. Here, we build on that to show that one can choose distributions in a manifold-aware fashion. For example, for image classification, rather than randomly selecting pixels, one can randomly select contiguous patches. We demonstrate the empirical performance on data living on three different manifolds: images, time-series, and a torus. In all three cases, our Manifold-aware Forest (MF) algorithm empirically dominates other state-of-the-art approaches that ignore feature space structure, achieving a lower classification error on all sample sizes. This dominance extends to the MNIST data set as well. Moreover, training time is significantly faster for MF as compared to deep nets. This approach, therefore, has promise to enable DFs and other machine learning methods to close the gap with deep nets on manifold-valued data.
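The patch idea in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how a patch is turned into a split feature, so the function names, the uniform sampling of patch size and position, and the unit-weight sum over patch pixels are all assumptions made for illustration, not the authors' implementation.

```python
import random


def sample_patch_indices(height, width, max_patch_h, max_patch_w, rng):
    """Return row-major flat indices of one random contiguous rectangular patch.

    Patch size and position are drawn uniformly (an assumption for this sketch).
    """
    patch_h = rng.randint(1, min(max_patch_h, height))
    patch_w = rng.randint(1, min(max_patch_w, width))
    top = rng.randint(0, height - patch_h)
    left = rng.randint(0, width - patch_w)
    return [
        r * width + c
        for r in range(top, top + patch_h)
        for c in range(left, left + patch_w)
    ]


def patch_feature(image_flat, indices):
    """Hypothetical candidate split feature: unit-weight sum of the patch pixels."""
    return sum(image_flat[i] for i in indices)


# Usage: one candidate feature for a flattened 28 x 28 image.
rng = random.Random(0)
image = [float(i) for i in range(28 * 28)]
indices = sample_patch_indices(28, 28, 5, 5, rng)
feature = patch_feature(image, indices)
```

A tree node would draw several such patches and evaluate a split threshold on each resulting feature, in place of drawing individual pixels at random.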

    Bibliometrics Analysis and Density-Equalizing Mapping

    The objective of this paper is to provide a detailed evaluation of type 2 diabetes mellitus research output from 1951-2012, using large-scale data analysis, bibliometric indicators and density-equalizing mapping. Data were retrieved from the Science Citation Index Expanded database, one of the seven curated databases within Web of Science. Using Boolean operators "OR", "AND" and "NOT", a search strategy was developed to estimate the total number of published items. Only studies with an English abstract were eligible. Type 1 diabetes and gestational diabetes items were excluded. Specific software developed for the database analysed the data. Information including titles, authors’ affiliations and publication years was extracted from all files and exported to Excel. Density-equalizing mapping was conducted as described by Groenberg-Kloft et al. (2008). A total of 24,783 items were published and cited 476,002 times. The greatest number of outputs was published in 2010 (n=2,139). The United States contributed 28.8% to the overall output, followed by the United Kingdom (8.2%) and Japan (7.7%). Bilateral cooperation was most common between the United States and United Kingdom (n=237). Harvard University produced 2% of all publications, followed by the University of California (1.1%). The leading journals were Diabetes, Diabetologia and Diabetes Care and they contributed 9.3%, 7.3% and 4.0% of the research yield, respectively. In conclusion, the volume of research is rising in parallel with the increasing global burden of disease due to type 2 diabetes mellitus. Bibliometrics analysis provides useful information to scientists and funding agencies involved in the development and implementation of research strategies to address global health issues.

    Depth variation in coral carbonate production on remote reefs

    Recurrent climate-driven warming events, which can induce severe coral bleaching and mortality on tropical reefs, are predicted to cause homogenisation of coral communities and loss of ecosystem functions in shallow reef systems (<30 m). However, data documenting the variation in coral carbonate production across depth are limited. Here we explore differences in coral cover, community composition, coral colony size structure and carbonate production rates between two depths (10 m and 17.5 m) across four atolls in the remote Chagos Archipelago. We show higher coral carbonate production rates at 10 m depth (4.82 ± 0.27 G, where G = kg CaCO₃ m⁻² yr⁻¹) compared to sites at 17.5 m (3.1 ± 0.18 G). The main carbonate producers at 10 m consisted of fast-growing branching and tabular corals (mainly Acroporids) and massive corals (mainly Porites), with high abundances of medium- and large-sized colonies. In contrast, coral carbonate production at 17.5 m was driven by slow-growing encrusting and foliose morphotypes and small colony sizes. Utilising a dataset following 6–7 years of recovery after the 2015–2017 bleaching event, our results show that depth-homogenisation of coral communities was temporary and carbonate production rates at 10 m depth recovered more quickly at 3 of 4 studied atolls. The exception is Great Chagos Bank, where slower recovery of branching and tabular corals at 10 m has led to a longer-lasting depth-homogenisation of carbonate production rates. The latter example cautions that more frequent bleaching events may drive increasing homogenisation of carbonate production rates across depth gradients, with implications for vital reef geo-ecological functions.

    Manifold Forests: Closing the Gap on Neural Networks

    Decision forests (DFs), in particular random forests and gradient boosting trees, have demonstrated state-of-the-art accuracy compared to other methods in many supervised learning scenarios. In particular, DFs dominate other methods in tabular data, that is, when the feature space is unstructured, so that the signal is invariant to permuting feature indices. However, in structured data lying on a manifold (such as images, text, and speech), deep networks (DNs), specifically convolutional deep networks (ConvNets), tend to outperform DFs. We conjecture that at least part of the reason for this is that the input to DNs is not simply the feature magnitudes, but also their indices (for example, the convolution operation uses feature locality). In contrast, naive DF implementations fail to explicitly consider feature indices. A recently proposed DF approach demonstrates that DFs, for each node, implicitly sample a random matrix from some specific distribution. These DFs, like some classes of DNs, learn by partitioning the feature space into convex polytopes corresponding to linear functions. We build on that approach and show that one can choose distributions in a manifold-aware fashion to incorporate feature locality. We demonstrate the empirical performance on data whose features live on three different manifolds: a torus, images, and time-series. In all simulations, our Manifold Oblique Random Forest (MORF) algorithm empirically dominates other state-of-the-art approaches that ignore feature space structure and challenges the performance of ConvNets. Moreover, MORF runs significantly faster than ConvNets and maintains interpretability and theoretical justification. This approach, therefore, has promise to enable DFs and other machine learning methods to close the gap to deep networks on manifold-valued data.
    Comment: 12 pages, 4 figures

    COVID-19 Infection Risk amongst 14,104 Vaccinated Care Home Residents: A national observational longitudinal cohort study in Wales, United Kingdom, December 2020 to March 2021

    Background: Vaccinations for COVID-19 have been prioritised for older people living in care homes. However, vaccination trials included limited numbers of older people. Aim: We aimed to study infection rates of SARS-CoV-2 for older care home residents following vaccination and identify factors associated with increased risk of infection. Study Design and Setting: We conducted an observational data-linkage study including 14,104 vaccinated older care home residents in Wales (UK) using anonymised electronic health records and administrative data. Methods: We used Cox proportional hazards models to estimate hazard ratios (HRs) for the risk of testing positive for SARS-CoV-2 infection following vaccination, after landmark times of either 7 or 21 days post-vaccination. We adjusted HRs for age, sex, frailty, prior SARS-CoV-2 infections and vaccination type. Results: We observed a small proportion of care home residents with positive polymerase chain reaction tests following vaccination (1.05%, N = 148), with 90% of infections occurring within 28 days. For the 7-day landmark analysis we found a reduced risk of SARS-CoV-2 infection for vaccinated individuals who had a previous infection; HR (95% confidence interval) 0.54 (0.30, 0.95). For the 21-day landmark analysis, we observed high HRs for individuals with low and intermediate frailty compared with those without; 4.59 (1.23, 17.12) and 4.85 (1.68, 14.04), respectively. Conclusions: Increased risk of infection after 21 days was associated with frailty. We found most infections occurred within 28 days of vaccination, suggesting extra precautions to reduce transmission risk should be taken in this time frame.

    Nonpar MANOVA via Independence Testing

    The k-sample testing problem tests whether or not k groups of data points are sampled from the same distribution. Multivariate analysis of variance (MANOVA) is currently the gold standard for k-sample testing but makes strong, often inappropriate, parametric assumptions. Moreover, independence testing and k-sample testing are tightly related, and there are many nonparametric multivariate independence tests with strong theoretical and empirical properties, including distance correlation (Dcorr) and Hilbert-Schmidt-Independence-Criterion (Hsic). We prove that universally consistent independence tests achieve universally consistent k-sample testing and that k-sample statistics like Energy and Maximum Mean Discrepancy (MMD) are exactly equivalent to Dcorr. Empirically evaluating these tests for k-sample scenarios demonstrates that these nonparametric independence tests typically outperform MANOVA, even for Gaussian distributed settings. Finally, we extend these nonparametric k-sample testing procedures to perform multiway and multilevel tests. Thus, we illustrate the existence of many theoretically motivated and empirically performant k-sample tests. A Python package with all independence and k-sample tests called hyppo is available from https://hyppo.neurodata.io/.
    Comment: 15 pages main + 4 pages appendix, 9 figures
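The link between the two testing problems stated in this abstract can be sketched in a few lines of Python. The abstract does not spell out the construction, so what follows is an assumption for illustration: pool all observations into one data set and pair each observation with a one-hot group label, so that "all groups share one distribution" becomes "observation and label are independent". The function name is hypothetical and this is not the hyppo API.

```python
def k_sample_to_independence(groups):
    """Pool k groups into paired data (pooled, labels).

    pooled[i] is one observation; labels[i] is the one-hot encoding of the
    group it came from. An independence test applied to (pooled, labels)
    then plays the role of a k-sample test on the original groups.
    """
    k = len(groups)
    pooled, labels = [], []
    for group_index, samples in enumerate(groups):
        for observation in samples:
            pooled.append(list(observation))
            labels.append([1 if j == group_index else 0 for j in range(k)])
    return pooled, labels


# Usage: three groups of 2-dimensional points with unequal sizes.
groups = [
    [(0.1, 0.2), (0.3, 0.4)],
    [(1.0, 1.1)],
    [(2.0, 2.1), (2.2, 2.3)],
]
pooled, labels = k_sample_to_independence(groups)
```

Any multivariate independence test, such as one of those named in the abstract, could then be run on the paired output.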

    Domestication-induced reduction in eye size revealed in multiple common garden experiments: The case of Atlantic salmon (Salmo salar L.)

    Domestication leads to changes in traits that are under directional selection in breeding programmes, though unintentional changes in nonproduction traits can also arise. In offspring of escaping fish and any hybrid progeny, such unintentionally altered traits may reduce fitness in the wild. Atlantic salmon breeding programmes were established in the early 1970s, resulting in genetic changes in multiple traits. However, the impact of domestication on eye size has not been studied. We measured body size corrected eye size in 4000 salmon from six common garden experiments conducted under artificial and natural conditions, in freshwater and saltwater environments, in two countries. Within these common gardens, offspring of domesticated and wild parents were crossed to produce 11 strains, with varying genetic backgrounds (wild, domesticated, F1 hybrids, F2 hybrids and backcrosses). Size-adjusted eye size was influenced by both genetic and environmental factors. Domesticated fish reared under artificial conditions had smaller adjusted eye size when compared to wild fish reared under identical conditions, in both the freshwater and marine environments, and in both Irish and Norwegian experiments. However, in parr that had been introduced into a river environment shortly after hatching and sampled at the end of their first summer, differences in adjusted eye size observed among genetic groups were of a reduced magnitude and were nonsignificant in 2-year-old sea migrating smolts sampled in the river immediately prior to sea entry. Collectively, our findings could suggest that where natural selection is present, individuals with reduced eye size are maladapted and consequently have reduced fitness, building on our understanding of the mechanisms that underlie a well-documented reduction in the fitness of the progeny of domesticated salmon, including hybrid progeny, in the wild

    Investigating the uptake, effectiveness and safety of COVID-19 vaccines : protocol for an observational study using linked UK national data

    Funding: This research is part of the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (HDRUK2020.146). EAVE II is funded by the Medical Research Council (MC_PC_19075) and supported by the Scottish Government. This work is supported by BREATHE - The Health Data Research Hub for Respiratory Health (MC_PC_19004). BREATHE is funded through the UK Research and Innovation Industrial Strategy Challenge Fund and delivered through Health Data Research UK. ConCOV is supported by the Medical Research Council (MR/V028367/1); Health Data Research UK (HDR-9006) which receives its funding from the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation (BHF) and the Wellcome Trust; and Administrative Data Research UK which is funded by the Economic and Social Research Council (grant ES/S007393/1).
    Introduction: The novel coronavirus SARS-CoV-2, which emerged in December 2019, has caused millions of deaths and severe illness worldwide. Numerous vaccines are currently under development of which a few have now been authorised for population-level administration by several countries. As of 20 September 2021, over 48 million people have received their first vaccine dose and over 44 million people have received their second vaccine dose across the UK. We aim to assess the uptake rates, effectiveness, and safety of all currently approved COVID-19 vaccines in the UK.
    Methods and analysis: We will use prospective cohort study designs to assess vaccine uptake, effectiveness and safety against clinical outcomes and deaths. A test-negative case–control study design will be used to assess vaccine effectiveness (VE) against laboratory confirmed SARS-CoV-2 infection. Self-controlled case series and retrospective cohort study designs will be carried out to assess vaccine safety against mild-to-moderate and severe adverse events, respectively. Individual-level pseudonymised data from primary care, secondary care, laboratory test and death records will be linked and analysed in secure research environments in each UK nation. Univariate and multivariate logistic regression models will be carried out to estimate vaccine uptake levels in relation to various population characteristics. VE estimates against laboratory confirmed SARS-CoV-2 infection will be generated using a generalised additive logistic model. Time-dependent Cox models will be used to estimate the VE against clinical outcomes and deaths. The safety of the vaccines will be assessed using logistic regression models with an offset for the length of the risk period. Where possible, data will be meta-analysed across the UK nations.
    Ethics and dissemination: We obtained approvals from the National Research Ethics Service Committee, Southeast Scotland 02 (12/SS/0201), the Secure Anonymised Information Linkage independent Information Governance Review Panel project number 0911. Concerning English data, University of Oxford is compliant with the General Data Protection Regulation and the National Health Service (NHS) Digital Data Security and Protection Policy. This is an approved study (Integrated Research Application ID 301740, Health Research Authority (HRA) Research Ethics Committee 21/HRA/2786). The Oxford-Royal College of General Practitioners Clinical Informatics Digital Hub meets NHS Digital’s Data Security and Protection Toolkit requirements. In Northern Ireland, the project was approved by the Honest Broker Governance Board, project number 0064. Findings will be made available to national policy-makers, presented at conferences and published in peer-reviewed journals.

    Immune Responses to Plague Infection in Wild Rattus rattus, in Madagascar: A Role in Foci Persistence?

    Plague is endemic within the central highlands of Madagascar, where its main reservoir is the black rat, Rattus rattus. Typically this species is considered susceptible to plague, rapidly dying after infection, inducing the spread of infected fleas and, therefore, dissemination of the disease to humans. However, persistence of transmission foci in the same area from year to year supposes mechanisms of maintenance among which rat immune responses could play a major role. Immunity against plague and subsequent rat survival could play an important role in the stabilization of the foci. In this study, we aimed to investigate serological responses to plague in wild black rats from endemic areas of Madagascar. In addition, we evaluate the use of a recently developed rapid serological diagnostic test to investigate the immune response of potential reservoir hosts in plague foci. We experimentally infected wild rats with Yersinia pestis to investigate short and long-term antibody responses. Anti-F1 IgM and IgG were detected to evaluate this antibody response. High levels of anti-F1 IgM and IgG were found in rats one and three weeks respectively after challenge, with responses greatly differing between villages. Plateaus in anti-F1 IgM and IgG responses were reached for as few as 500 and 1500 colony forming units (cfu) inoculated respectively. More than 10% of rats were able to maintain anti-F1 responses for more than one year. This anti-F1 response was conveniently followed using dipsticks. Inoculation of very few bacteria is sufficient to induce a high immune response in wild rats, allowing their survival after infection. A great heterogeneity of rat immune responses was found within and between villages, which could heavily impact plague epidemiology. In addition, results indicate that, in the field, anti-F1 dipsticks are efficient to investigate plague outbreaks several months after transmission.