
    Identifying protein subcellular localisation in scientific literature using bidirectional deep recurrent neural network

    The increased diversity and scale of published biological data has led to a growing appreciation for the applications of machine learning and statistical methodologies to gain new insights. Key to achieving this aim is solving the Relationship Extraction problem, which specifies the semantic interaction between two or more biological entities in a published study. Here, we employed two deep neural network natural language processing (NLP) methods, namely the continuous bag of words (CBOW) and the bi-directional long short-term memory (bi-LSTM). These methods were employed to predict relations between entities that describe protein subcellular localisation in plants. We applied our system to 1700 published Arabidopsis protein subcellular studies from the SUBA manually curated dataset. The system combines pre-processing of full-text articles in a machine-readable format with relevant sentence extraction for downstream NLP analysis. Using the SUBA corpus, the neural network classifier predicted interactions between protein name, subcellular localisation and experimental methodology with an average precision, recall, accuracy and F1 score of 95.1%, 82.8%, 89.3% and 88.4%, respectively (n = 30). Comparable metrics were obtained using the CropPAL database, which stores protein subcellular localisation in crop species, as an independent testing dataset, demonstrating wide applicability of the prediction model. We provide a framework for extracting protein functional features from unstructured text in the literature with high accuracy, improving data dissemination and unlocking the potential of big data text analytics for generating new hypotheses.
    Rakesh David, Rhys‑Joshua D. Menezes, Jan De Klerk, Ian R. Castleden, Cornelia M. Hooper, Gustavo Carneiro and Matthew Gilliha

    Transitions of cardio-metabolic risk factors in the Americas between 1980 and 2014

    Describing the prevalence and trends of cardiometabolic risk factors that are associated with non-communicable diseases (NCDs) is crucial for monitoring progress, planning prevention, and providing evidence to support policy efforts. We aimed to analyse the transition in body-mass index (BMI), obesity, blood pressure, raised blood pressure, and diabetes in the Americas between 1980 and 2014.

    Galaxy bulges and their massive black holes: a review

    With references to both key and oft-forgotten pioneering works, this article starts by reviewing how we came to believe in the existence of massive black holes at the centres of galaxies. It then presents the historical development of the near-linear (black hole)-(host spheroid) mass relation, before explaining why this has recently been dramatically revised. Past disagreement over the slope of the (black hole)-(velocity dispersion) relation is also explained, and the discovery of sub-structure within the (black hole)-(velocity dispersion) diagram is discussed. As the search for the fundamental connection between massive black holes and their host galaxies continues, the competing array of additional black hole mass scaling relations for samples of predominantly inactive galaxies is presented.
    Comment: Invited (15 Feb. 2014) review article (submitted 16 Nov. 2014). 590 references, 9 figures, 25 pages in emulateApJ format. To appear in "Galactic Bulges", E. Laurikainen, R.F. Peletier, and D.A. Gadotti (eds.), Springer Publishin

    Mapping geographical inequalities in childhood diarrhoeal morbidity and mortality in low-income and middle-income countries, 2000–17 : analysis for the Global Burden of Disease Study 2017

    Background: Across low-income and middle-income countries (LMICs), one in ten deaths in children younger than 5 years is attributable to diarrhoea. The substantial between-country variation in both diarrhoea incidence and mortality is attributable to interventions that protect children, prevent infection, and treat disease. Identifying subnational regions with the highest burden and mapping associated risk factors can aid in reducing preventable childhood diarrhoea.
    Methods: We used Bayesian model-based geostatistics and a geolocated dataset comprising 15 072 746 children younger than 5 years from 466 surveys in 94 LMICs, in combination with findings of the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2017, to estimate posterior distributions of diarrhoea prevalence, incidence, and mortality from 2000 to 2017. From these data, we estimated the burden of diarrhoea at varying subnational levels (termed units) by spatially aggregating draws, and we investigated the drivers of subnational patterns by creating aggregated risk factor estimates.
    Findings: The greatest declines in diarrhoeal mortality were seen in south and southeast Asia and South America, where 54·0% (95% uncertainty interval [UI] 38·1–65·8), 17·4% (7·7–28·4), and 59·5% (34·2–86·9) of units, respectively, recorded decreases in deaths from diarrhoea greater than 10%. Although children in much of Africa remain at high risk of death due to diarrhoea, regions with the most deaths were outside Africa, with the highest mortality units located in Pakistan. Indonesia showed the greatest within-country geographical inequality; some regions had mortality rates nearly four times the average country rate. Reductions in mortality were correlated with improvements in water, sanitation, and hygiene (WASH) or reductions in child growth failure (CGF). Similarly, most high-risk areas had poor WASH, high CGF, or low oral rehydration therapy coverage.
    Interpretation: By co-analysing geospatial trends in diarrhoeal burden and its key risk factors, we could assess candidate drivers of subnational death reduction. Further, by doing a counterfactual analysis of the remaining disease burden using key risk factors, we identified potential intervention strategies for vulnerable populations. In view of the demands for limited resources in LMICs, accurately quantifying the burden of diarrhoea and its drivers is important for precision public health.