17 research outputs found

    DisProt: intrinsic protein disorder annotation in 2020

    The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements to the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and therefore helping to illuminate the ‘dark’ proteome.

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, so that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that each of these methods tends to produce a distinct collection of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.

    Contributions of mean and shape of blood pressure distribution to worldwide trends and variations in raised blood pressure: A pooled analysis of 1018 population-based measurement studies with 88.6 million participants

    © The Author(s) 2018. Background: Change in the prevalence of raised blood pressure could be due to both shifts in the entire distribution of blood pressure (representing the combined effects of public health interventions and secular trends) and changes in its high-blood-pressure tail (representing successful clinical interventions to control blood pressure in the hypertensive population). Our aim was to quantify the contributions of these two phenomena to the worldwide trends in the prevalence of raised blood pressure. Methods: We pooled 1018 population-based studies with blood pressure measurements on 88.6 million participants from 1985 to 2016. We first calculated mean systolic blood pressure (SBP), mean diastolic blood pressure (DBP) and prevalence of raised blood pressure by sex and 10-year age group from 20-29 years to 70-79 years in each study, taking into account complex survey design and survey sample weights, where relevant. We used a linear mixed-effects model to quantify the association between (probit-transformed) prevalence of raised blood pressure and age-group- and sex-specific mean blood pressure. We calculated the contributions of change in mean SBP and DBP, and of change in the prevalence-mean association, to the change in prevalence of raised blood pressure. Results: In 2005-16, at the same level of population mean SBP and DBP, men and women in South Asia and in Central Asia, the Middle East and North Africa would have the highest prevalence of raised blood pressure, and men and women in the high-income Asia Pacific and high-income Western regions would have the lowest. In most region-sex-age groups where the prevalence of raised blood pressure declined, one half or more of the decline was due to the decline in mean blood pressure. Where prevalence of raised blood pressure has increased, the change was entirely driven by increasing mean blood pressure, offset partly by the change in the prevalence-mean association. Conclusions: Change in mean blood pressure is the main driver of the worldwide change in the prevalence of raised blood pressure, but change in the high-blood-pressure tail of the distribution has also contributed to the change in prevalence, especially in older age groups.
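
    The probit transformation named in the Methods maps a prevalence in the open interval (0, 1) onto the standard normal quantile scale, which is the scale on which the abstract says prevalence was related to mean blood pressure. A minimal sketch follows, assuming only Python's standard library. The function names are illustrative, this is not the authors' code, and the mixed-effects model itself is not reproduced here.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def probit(prevalence: float) -> float:
    """Map a prevalence in (0, 1) to the standard normal quantile scale."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(prevalence)


def inverse_probit(z: float) -> float:
    """Map a value on the probit scale back to a prevalence."""
    return _STD_NORMAL.cdf(z)


if __name__ == "__main__":
    # Illustrative prevalences only, not values from the study.
    for p in (0.10, 0.25, 0.50):
        print(f"prevalence {p:.2f} -> probit {probit(p):+.3f}")
```

    Working on the probit scale keeps fitted values inside (0, 1) once they are transformed back with the inverse function.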

    Critical assessment of protein intrinsic disorder prediction

    Intrinsically disordered proteins, defying the traditional protein structure–function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in prediction of intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 after bona fide structured regions are filtered out. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times among methods can vary by up to four orders of magnitude.
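
    Fmax is conventionally the largest F1 score (the harmonic mean of precision and recall) obtained over all decision thresholds applied to a predictor's scores. The sketch below illustrates that definition under stated assumptions: residues are pooled into one list of scores and binary labels, and the function name `f_max` is hypothetical. The abstract does not specify the exact CAID evaluation protocol, so this should not be read as the assessment's own implementation.

```python
from typing import Sequence, Tuple


def f_max(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (best F1, threshold) over all thresholds taken from the scores.

    A residue is predicted disordered when its score is >= the threshold.
    Labels are 1 for disordered residues and 0 otherwise.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")

    thresholds = sorted(set(scores))
    best_f, best_t = 0.0, thresholds[0]
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # F1 is zero (or undefined) without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f:
            best_f, best_t = f1, t
    return best_f, best_t


if __name__ == "__main__":
    # Toy data, not taken from the CAID benchmark.
    toy_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
    toy_labels = [1, 0, 1, 0, 0]
    print(f_max(toy_scores, toy_labels))
```

    Because the threshold is chosen after the fact, Fmax reports the best operating point a method could reach rather than its performance at a default cut-off.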

    Tally-2.0: upgraded validator of tandem repeat detection in protein sequences

    Motivation: Proteins containing tandem repeats (TRs) are abundant, frequently fold into elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. The blurred boundary between imperfect TR motifs and non-repetitive sequences has made it necessary to validate the detected TRs. Results: Tally-2.0 is a scoring tool based on a machine learning (ML) approach that validates the results of TR detection. It was upgraded by using improved training datasets and additional ML features. Tally-2.0 performs at a level of 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve of 95%.
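
    Sensitivity and specificity, the two rates quoted above, are the fraction of true repeats that a validator accepts and the fraction of non-repeats that it rejects. A minimal sketch of the standard definitions follows. The function name and the toy data are illustrative assumptions, and nothing here reflects Tally-2.0's internals.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    predicted: Sequence[int], actual: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary predictions.

    1 marks a (predicted or actual) tandem repeat, 0 a non-repeat.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")

    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels, not data from the Tally-2.0 evaluation.
    print(sensitivity_specificity([1, 1, 0, 0, 1], [1, 1, 1, 0, 0]))
```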

    DisProt: intrinsic protein disorder annotation in 2020

    Other funding: European Regional Development Fund [POCI-01-0145-FEDER-031173, POCI-01-0145-FEDER-029221]; ICREA-Academia 2015. The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements to the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and therefore helping to illuminate the 'dark' proteome.
