28 research outputs found

    The Reproducibility of Lists of Differentially Expressed Genes in Microarray Studies

    Reproducibility is a fundamental requirement in scientific experiments and clinical contexts. Recent publications raise concerns about the reliability of microarray technology because of the apparent lack of agreement between lists of differentially expressed genes (DEGs). In this study we demonstrate that (1) such discordance may stem from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion, the lists become much more reproducible, especially when fewer genes are selected; and (3) the instability of short DEG lists based on P cutoffs is an expected mathematical consequence of the high variability of the t-values. We recommend the use of FC ranking plus a non-stringent P cutoff as a baseline practice in order to generate more reproducible DEG lists. The FC criterion enhances reproducibility while the P criterion balances sensitivity and specificity.

    The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies

    Background: Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resultant variety of existing and emerging methods exacerbates confusion and continuing debate in the microarray community on the appropriate choice of methods for identifying reliable DEG lists. Results: Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated the impact on the reproducibility of DEG lists of a few widely used gene selection procedures. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan – the widely regarded "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion with a non-stringent P-value cutoff filtering, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists solely based on P-value ranking is an expected mathematical consequence of the high variability of the t-values; the more stringent the P-value threshold, the less reproducible the DEG list is. These observations are also consistent with results from extensive simulation calculations. Conclusion: We recommend the use of FC-ranking plus a non-stringent P cutoff as a straightforward and baseline practice in order to generate more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance to choose the appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
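
The recommended selection rule (filter by a non-stringent P cutoff, then rank by fold change) could be sketched roughly as follows. This is an illustration only, not the authors' code: the `GeneStat` record, the function names, the default cutoff of 0.05, the overlap measure, and the toy values are all assumptions made for the example, and the P-values are taken as already computed by whatever per-gene test is in use.

```python
from typing import List, NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log2_fc: float  # log2 fold change between the two conditions
    p_value: float  # P from a per-gene test (computed elsewhere)


def select_degs(stats: List[GeneStat], p_cutoff: float = 0.05,
                n_top: int = 100) -> List[str]:
    """Keep genes passing a non-stringent P cutoff, then rank by |log2 FC|."""
    passing = [s for s in stats if s.p_value < p_cutoff]
    ranked = sorted(passing, key=lambda s: abs(s.log2_fc), reverse=True)
    return [s.gene for s in ranked[:n_top]]


def list_overlap(list_a: List[str], list_b: List[str]) -> float:
    """Shared genes as a fraction of the shorter list (one simple
    way to quantify agreement between two DEG lists)."""
    if not list_a or not list_b:
        return 0.0
    return len(set(list_a) & set(list_b)) / min(len(list_a), len(list_b))


# Toy, made-up values purely to show the call pattern.
site_a = [
    GeneStat("G1", 3.0, 0.04),
    GeneStat("G2", -2.5, 0.01),
    GeneStat("G3", 0.2, 0.0001),  # tiny P but negligible fold change
    GeneStat("G4", 1.8, 0.20),    # fails the P filter
]
top_two = select_degs(site_a, p_cutoff=0.05, n_top=2)
```

Ranking by P alone would put `G3` first despite its negligible fold change; the rule above uses P only as a filter, so the final ordering is driven by effect size.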

    Prediction of overall survival for patients with metastatic castration-resistant prostate cancer : development of a prognostic model through a crowdsourced challenge with open clinical trial data

    Background: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. Methods: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance. Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. Findings: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0.791; Bayes factor >5) and surpassed the reference model (iAUC 0.743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3.32, 95% CI 2.39-4.62, p […]). Interpretation: Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer. Peer reviewed

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180,000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed
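
For readers unfamiliar with the first of the baselines named above, a minimal sketch of Okapi BM25 scoring is given below. It is not the consortium's implementation: the function name, whitespace tokenisation, the parameter values `k1=1.5` and `b=0.75` (common defaults), the particular IDF variant, and the two toy candidate texts are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query_tokens: List[str], docs_tokens: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each candidate document against a seed query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for d in docs_tokens:
        doc_freq.update(set(d))

    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy seed and candidates, tokenised by whitespace for simplicity.
seed = "telomere maintenance in cancer cells".split()
candidates = [
    "telomere dysfunction in cancer cells".split(),
    "fern phylogeny from plastid regions".split(),
]
scores = bm25_scores(seed, candidates)
```

A candidate that shares no terms with the seed scores exactly zero, so the first toy candidate ranks above the second.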

    Controlling a Rehabilitation Robot with Brain-Machine Interface: An approach based on Independent Component Analysis and Multiple Kernel Learning

    Patients suffering from severe motor disabilities usually require assistance from other people when doing rehabilitation exercises, which makes the rehabilitation process time-consuming and inconvenient. Therefore, we propose an automatic feature extraction method for a brain-machine interface that allows patients to control a robot using their own brain waves. A brain–machine interface (BMI) based on the P300 event-related potential (ERP), called Brain Controlled Rehabilitation System (BCRS), was developed to detect the intentions of patients. Using the BCRS, patients can communicate with the robot through their brain waves. However, automatically extracting a useful EEG signal is a difficult and important problem for BMI research. In this paper, Independent Component Analysis – Multiple Kernel Learning (ICA-MKL) is used to directly extract a useful signal and build the classification model for BCRS. The results reveal that this method is useful for automatically extracting the P300 signal and that its accuracy is better than that of MKL alone. In addition, the same method can be extended to motor imagery data, where ICA-MKL also achieves good accuracy and is effective at removing eye-blink artifacts.

    Data from: A revised framework of Dryopteris subg. Nothoperanema (Dryopteridaceae) inferred from phylogenetic evidence, with descriptions of two new sections.

    Dryopteris subgenus Nothoperanema (Dryopteridaceae) includes sections Acrophorus, Diacalpe, Nothoperanema, and Peranema. Phylogenetic relationships among these sections and their relationship to sect. Dryopsis (genus Dryopteris subgenus Erythrovariae, Dryopteridaceae) are unclear. Additionally, previous phylogenetic work has not included Stenolepia, which has been suggested as an important relative of Peranema based on morphology. In this study, we examined phylogenetic relationships within subgenus Nothoperanema by including Stenolepia and utilizing six plastid regions (∼5,500 characters). Our inferred phylogeny revealed that sect. Dryopsis is not monophyletic. The Nothoperanema clade is highly supported, and includes sect. Acrophorus, sect. Diacalpe, sect. Nothoperanema, sect. Peranema, certain Dryopsis species, and Stenolepia. By re-examining diagnostic morphological characters, we establish and describe two new sections under subgenus Nothoperanema: sect. Shiehia and sect. Stenolepia. This revision accommodates new species transferred from sects. Dryopsis and Stenolepia, and makes subgenus Nothoperanema and each of its sections natural groups. Finally, we provide a table with morphological comparisons and a key to sections.

    Heterogeneous Nuclear Ribonucleoproteins A1 and A2 Function in Telomerase-Dependent Maintenance of Telomeres

    The A/B subfamily of heterogeneous nuclear ribonucleoproteins (hnRNPs A/B), which includes hnRNP A1, A2/B1, and A3, plays an important role in cell proliferation. The simultaneous suppression of hnRNP A1/A2, but not the suppression of hnRNP A1 or A2 alone, has been shown to inhibit cell proliferation and induce apoptosis in cancer cells, but not in mortal normal cells. However, the molecular basis for such a differential inhibition of cell proliferation remains unknown. Here, we show that the simultaneous suppression of hnRNP A1 and hnRNP A2 resulted in dysfunctional telomeres and induced DNA damage responses in cancer cells. The inhibition of apoptosis did not alleviate the inhibition of cell proliferation nor the formation of dysfunctional telomeres in cancer cells depleted of hnRNP A1/A2. Moreover, while proliferation of mortal normal fibroblasts was not sensitive to the depletion of hnRNP A1/A2, the ectopic expression of hTERT in normal fibroblasts rendered these cells sensitive to proliferation inhibition, which was associated with the production of dysfunctional telomeres. Our study demonstrates that hnRNP A1 and A2 function to maintain telomeres in telomerase-expressing cells only, suggesting that the maintenance of functional telomeres in telomerase-expressing cancer cells employs factors that differ from those used in the telomerase-negative normal cells.

    FMR1 CGG allele size and prevalence ascertained through newborn screening in the United States

    Background: Population screening for FMR1 mutations has been a topic of considerable discussion since the FMR1 gene was identified in 1991. Advances in understanding the molecular basis of fragile X syndrome (FXS) and in genetic testing methods have led to new, less expensive methodology to use for large screening endeavors. A core criterion for newborn screening is an accurate understanding of the public health burden of a disease, considering both disease severity and prevalence rate. This article addresses this need by reporting prevalence rates observed in a pilot newborn screening study for FXS in the US. Methods: Blood spot screening of 14,207 newborns (7,312 males and 6,895 females) was conducted in three birthing hospitals across the United States beginning in November 2008, using a PCR-based approach. Results: The prevalence of gray zone alleles was 1:66 females and 1:112 males, while the prevalence of a premutation was 1:209 females and 1:430 males. Differences in prevalence rates were observed among the various ethnic groups; specifically higher frequency for gray zone alleles in males was observed in the White group compared to the Hispanic and African-American groups. One full mutation male was identified (>200 CGG repeats). Conclusions: The presented pilot study shows that newborn screening in fragile X is technically feasible and provides overall prevalence of the premutation and gray zone alleles in the USA, suggesting that the prevalence of the premutation, particularly in males, is higher than has been previously reported.
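
Prevalence figures quoted as "1:N" can be easier to compare once converted to a common rate. The short sketch below does only that arithmetic on the four ratios reported above; the helper name `per_10k` and the choice of a per-10,000 scale are assumptions for illustration, and no counts or intervals beyond the quoted ratios are implied.

```python
def per_10k(ratio: str) -> float:
    """Convert a '1:N' prevalence ratio into cases per 10,000 individuals."""
    cases, population = ratio.split(":")
    return 10_000 * float(cases) / float(population)


# Ratios exactly as quoted in the abstract.
reported = {
    "gray zone, females": "1:66",
    "gray zone, males": "1:112",
    "premutation, females": "1:209",
    "premutation, males": "1:430",
}
rates = {label: per_10k(r) for label, r in reported.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.1f} per 10,000")
```

On this scale the quoted female rates are roughly 1.7 to 2.1 times the corresponding male rates.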