28 research outputs found

    Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

    We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications, such as collection management and navigation, summary, and analysis. The few existing comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts, and subject headings. The nine approaches were built from five analytical techniques applied to two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models, BM25 and PMRA (PubMed Related Articles). The two data sources were (a) MeSH subject headings and (b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.
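Three steps of the pipeline described above (tf-idf cosine similarity, top-n filtering per document, and a Jensen-Shannon-based coherence score) can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the tf-idf weighting (raw term frequency times log(N/df)), the coherence definition (mean Jensen-Shannon divergence between each document's word distribution and the pooled word distribution of its cluster, lower meaning more coherent), and all function names are assumptions chosen for illustration. The graph-layout and average-link clustering step is not reproduced, and the brute-force pairwise comparison would not scale to two million records.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse tf-idf vectors (term -> weight) for tokenized documents.

    Assumed weighting: raw term frequency * log(N / df). Other variants exist.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_n_neighbors(vectors, n):
    """Keep only the n highest-similarity neighbours (index, similarity) per document."""
    result = []
    for i, u in enumerate(vectors):
        sims = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        sims.sort(reverse=True)  # highest similarity first
        result.append([(j, s) for s, j in sims[:n]])
    return result


def distribution(tokens):
    """Relative word frequencies of a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def kl(p, q):
    """Kullback-Leibler divergence in bits; q must cover the support of p."""
    return sum(pi * math.log2(pi / q[t]) for t, pi in p.items() if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    terms = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cluster_divergence(cluster_docs):
    """Assumed coherence proxy: mean JSD of each document vs. the pooled cluster."""
    pooled = distribution([t for doc in cluster_docs for t in doc])
    return sum(jsd(distribution(doc), pooled) for doc in cluster_docs) / len(cluster_docs)
```

For example, with four toy documents tokenized by `str.split()`, `top_n_neighbors(tfidf_vectors(docs), 1)` pairs each document with the one sharing its rarer terms, and a cluster of two topically related documents yields a lower `cluster_divergence` than a cluster mixing unrelated ones. Keeping only the top-n similarities per document, as in the study, turns a dense similarity matrix into a sparse graph that downstream clustering can handle.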

    Prognostic DNA methylation markers for sporadic colorectal cancer: a systematic review

    Background Biomarkers that can predict the prognosis of colorectal cancer (CRC) patients and that can distinguish high-risk from low-risk early-stage patients are urgently needed for better management of CRC. Over the last decades, a large variety of prognostic DNA methylation markers has been published in the literature. However, to date, none of these markers is used in clinical practice. Methods To obtain an overview of the number of published prognostic methylation markers for CRC, the number of markers that were validated independently, and the current level of evidence (LoE), we conducted a systematic review of PubMed, EMBASE, and MEDLINE. In addition, we scored studies based on the REMARK guidelines, which were established to achieve more transparent and complete reporting of prognostic biomarker studies. Eighty-three studies reporting on 123 methylation markers fulfilled the study entry criteria and were scored according to REMARK. Results Sixty-three studies investigated single methylation markers, whereas 20 studies reported combinations of methylation markers. We observed substantial variation in the reporting of sample sizes and patient characteristics, statistical analyses, and methodology. The median (range) REMARK score for the studies was 10.7 points (4.5 to 17.5) out of a maximum of 20 possible points. The median REMARK score was lower in studies that reported a p value below 0.05 than in those that did not (p = 0.005). A borderline statistically significant association was observed between the reported p value of the survival analysis and the size of the study population (p = 0.051). Only 23 out of 123 markers (17%) were investigated in two or more study series. For 12 markers and two multimarker panels, consistent results were reported in two or more study series. For four markers, the current LoE is level II; for all other markers, the LoE is lower.
Conclusion This systematic review shows that adequate reporting according to REMARK and validation of prognostic methylation markers are absent in the majority of CRC methylation marker studies. However, it also provides a comprehensive overview of published prognostic methylation markers for CRC and highlights the most promising markers published in the last two decades.

    ‘MCC’ protein interacts with E-cadherin and β-catenin strengthening cell–cell adhesion of HCT116 colon cancer cells

    E-cadherin and β-catenin are key proteins that are essential in the formation of the epithelial cell layer in the colon, but the regulatory pathways that are disrupted in cancer metastasis are not completely understood. Mutated in colorectal cancer (MCC) is a tumour suppressor gene that is silenced by promoter methylation in colorectal cancer, particularly in patients with increased lymph node metastasis. Here, we show that MCC methylation is found in 45% of colon and 24% of rectal cancers and is associated with proximal colon, poorly differentiated, circumferential and mucinous tumours, as well as increasing T stage and larger tumour size. Knockdown of MCC in HCT116 colon cancer cells caused a reduction in E-cadherin protein level, a hallmark of epithelial–mesenchymal transition in cancer, and consequently diminished the E-cadherin/β-catenin complex. MCC knockdown disrupted cell–cell adhesive strength and integrity in the dispase and transepithelial electrical resistance assays, enhanced hepatocyte growth factor-induced cell scatter, and increased tumour cell invasiveness in an organotypic assay. The Src/Abl inhibitor dasatinib, a candidate anti-invasive drug, abrogated the invasive properties induced by MCC deficiency. Mechanistically, we establish that MCC interacts with the E-cadherin/β-catenin complex. These data provide a significant advance in the current understanding of cell–cell adhesion in colon cancer cells.