    A Pairwise Feature Selection Method For Gene Data Using Information Gain

    Current classification practice has limitations when applied to gene expression microarray data. For example, the robustness of top scoring pairs does not extend to some small datasets, and the gene set with the best discrimination power may not involve a combination of genes. Hence, it is necessary to construct a discriminative and stable classifier that generates highly informative gene sets. Not all features are active in a given biological process, so a good feature selector should be robust to noise and outliers; the challenge is to select the most informative genes. In this study, the top discriminating pair (TDP) approach is motivated by this issue and aims to reveal which features rank highest by discrimination power. To identify TDPs, each pair of genes is assigned a score based on their relative probability distribution. Our experiment combines the TDP methodology with information gain (IG) to obtain an effective feature set. To illustrate the effectiveness of TDP with IG, we applied the method to two breast cancer datasets (Wang et al., 2005 and van't Veer et al., 2002). On these datasets, the TDP method is competitive with a random forest baseline. Information gain combined with the TDP algorithm provides a new, effective method of feature selection for machine learning.
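    The pair-scoring idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the TDP score itself, so the code below makes an assumption borrowed from top scoring pairs: each gene pair (i, j) is turned into a binary indicator "expression of i < expression of j", and pairs are ranked by the information gain of that indicator with respect to the class labels. The names `entropy`, `information_gain`, and `rank_pairs` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting samples on a discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset = [y for f, y in zip(feature, labels) if f == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


def rank_pairs(expr, labels):
    """Rank gene pairs by information gain of the ordering indicator.

    expr: list of samples, each a list of expression values (one per gene).
    Assumption (not from the paper): the pair feature is expr[i] < expr[j].
    Returns a list of (gain, (i, j)) sorted from highest to lowest gain.
    """
    n_genes = len(expr[0])
    scored = []
    for i, j in combinations(range(n_genes), 2):
        indicator = [row[i] < row[j] for row in expr]
        scored.append((information_gain(indicator, labels), (i, j)))
    return sorted(scored, reverse=True)


# Toy data: 4 samples, 3 genes, two classes.
toy_expr = [[1, 2, 5], [1, 3, 4], [3, 1, 6], [4, 2, 7]]
toy_labels = [0, 0, 1, 1]
ranking = rank_pairs(toy_expr, toy_labels)
```

    On the toy data, the ordering of genes 0 and 1 flips between the two classes, so that pair ranks first; pairs whose ordering never changes carry no information about the label.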

    The signal transducer IL6ST (gp130) as a predictive and prognostic biomarker in breast cancer

    Novel biomarkers are needed to continue to improve breast cancer clinical management and outcome. IL6-like cytokines, whose pleiotropic functions include roles in many hallmarks of malignancy, rely on the signal transducer IL6ST (gp130) for all their signalling. To date, 10 independent studies based on the analysis of clinical breast cancer samples have identified IL6ST as a predictor. Consistent findings suggest that IL6ST is a positive prognostic factor and is associated with ER status. Interestingly, these studies include 4 multigene signatures (EndoPredict, EER4, IRSN-23 and 42GC) that incorporate IL6ST to predict risk of recurrence or outcome from endocrine therapy or chemotherapy. Here we review the existing evidence on the predictive and prognostic value of IL6ST. We also discuss how this potential could be further translated into clinical practice beyond the EndoPredict tool, which is already available in the clinic. The most promising route to further exploit IL6ST’s predictive power will likely be through additional hybrid multifactor signatures that allow for more robust stratification of ER+ breast tumours into discrete groups with distinct outcomes, thus enabling greater refinement of the treatment-selection process.

    Specialized Named Entity Recognition For Breast Cancer Subtyping

    The amount of data and analysis being published and archived in the biomedical research community is more than can feasibly be sifted through manually, which limits the information an individual or small group can synthesize and integrate into their own research. This presents an opportunity for using automated methods, including Natural Language Processing (NLP), to extract important information from text on various topics. Named Entity Recognition (NER) is one way to automate knowledge extraction from raw text. NER is defined as the task of identifying named entities in text using labels such as people, dates, locations, diseases, and proteins. Several NLP tools are designed for entity recognition, but they rely on large established corpora for training data. Biomedical research has the potential to guide diagnostic and therapeutic decisions, yet the overwhelming density of publications acts as a barrier to getting these results into a clinical setting. An exceptional example of this is the field of breast cancer biology, where over 2 million people are diagnosed worldwide every year and billions of dollars are spent on research. Breast cancer biology literature and research relies on a highly specific domain with unique language and vocabulary, and therefore requires specialized NLP tools that can generate biologically meaningful results. This thesis presents a novel annotation tool that is optimized for quickly creating training data for spaCy pipelines, and explores the viability of that data for analyzing papers with automated processing. Custom pipelines trained on these annotations are shown to recognize custom entities at levels comparable to large-corpus-based recognition.

    Integrative mixture of experts to combine clinical factors and gene markers

    Motivation: Microarrays are being increasingly used in cancer research to better characterize and classify tumors by selecting marker genes. However, as very few of these genes have been validated as predictive biomarkers so far, it is mostly conventional clinical and pathological factors that are being used as prognostic indicators of clinical course. Combining clinical data with gene expression data may add valuable information, but it is a challenging task due to their categorical versus continuous characteristics. We have further developed the mixture of experts (ME) methodology, a promising approach to tackle complex non-linear problems. We propose several variants of integrative ME, together with various gene selection methods for selecting a hybrid signature.

    Prognostic impact of alternative splicing-derived hMENA isoforms in resected, node-negative, non-small-cell lung cancer

    Risk assessment and treatment choice remain a challenge in early non-small-cell lung cancer (NSCLC). Alternative splicing is an emerging source for diagnostic, prognostic and therapeutic tools. Here, we investigated the prognostic value of the actin cytoskeleton regulator hMENA and its isoforms, hMENA(11a) and hMENA Delta v6, in early NSCLC. The epithelial hMENA(11a) isoform was expressed in NSCLC lines expressing E-CADHERIN and was alternatively expressed with hMENA Delta v6. Enforced expression of hMENA Delta v6 or hMENA(11a) increased or decreased the invasive ability of A549 cells, respectively. hMENA isoform expression was evaluated in 248 node-negative NSCLC. On multivariate analysis, high pan-hMENA and low hMENA(11a) were the only independent predictors of shorter disease-free and cancer-specific survival, and low hMENA(11a) was an independent predictor of shorter overall survival. Patients with low pan-hMENA/high hMENA(11a) expression fared significantly better (P <= 0.0015) than any other subgroup. This hybrid variable was incorporated with T-size and number of resected lymph nodes into a 3-class-risk stratification model, which strikingly discriminated between different risks of relapse, cancer-related death, and death. The model was externally validated in an independent dataset of 133 patients. Relative expression of hMENA splice isoforms is a powerful prognostic factor in early NSCLC, complementing clinical parameters to accurately predict individual patient risk.

    Prognostic relevance of gene-expression signatures

    Cancer prognosis can be regarded as estimating the risk of future outcomes from multiple variables. In prognostic signatures, these variables represent expressions of genes that are summed up to calculate a risk score. However, it is a natural phenomenon in living systems that the whole is more than the sum of its parts. We hypothesize that the prognostic power of signatures is fundamentally limited without incorporating emergent effects. Convergent evidence from a set of unprecedented size (ca. 10,000 signatures) implicates a maximum prognostic power. We show that a signature can correctly discriminate patients' prognoses no more than 80% of the time. Using a simple simulation, we show that more than 50% of the potentially available information is still missing at this value. Comment: 27 pages, 6 figures, supporting information
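    The additive risk score described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the score is an unweighted sum of expression values over the signature genes, and discrimination is measured as the fraction of patient pairs with different outcomes in which the patient with the event has the higher score (ties count as half). The abstract does not say how its 80% figure was computed, and the function names and toy values are hypothetical.

```python
from itertools import combinations


def risk_score(expression, signature):
    """Sum the expression values of the signature genes for one patient.

    expression: dict mapping gene name -> expression value.
    signature: iterable of gene names (assumed unweighted here).
    """
    return sum(expression[gene] for gene in signature)


def pairwise_discrimination(scores, outcomes):
    """Fraction of discordant-outcome pairs ranked correctly by the score.

    outcomes: 1 = event (poor prognosis), 0 = no event.
    Returns NaN when no pair of patients differs in outcome.
    """
    correct = 0.0
    total = 0
    for (s_a, o_a), (s_b, o_b) in combinations(list(zip(scores, outcomes)), 2):
        if o_a == o_b:
            continue  # pairs with the same outcome are uninformative
        total += 1
        event_score, other_score = (s_a, s_b) if o_a > o_b else (s_b, s_a)
        if event_score > other_score:
            correct += 1.0
        elif event_score == other_score:
            correct += 0.5
    return correct / total if total else float("nan")


# Toy cohort: hypothetical genes "A" and "B", four patients, one event.
patients = [
    {"A": 2.0, "B": 1.0},
    {"A": 0.5, "B": 0.5},
    {"A": 1.5, "B": 2.0},
    {"A": 0.2, "B": 0.1},
]
outcomes = [1, 0, 0, 0]
scores = [risk_score(p, ["A", "B"]) for p in patients]
discrimination = pairwise_discrimination(scores, outcomes)
```

    In the toy cohort, the patient with the event outranks two of the three event-free patients, so the sketch reports a discrimination of two thirds.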

    Pathway-Based Multi-Omics Data Integration for Breast Cancer Diagnosis and Prognosis.

    Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017

    Breast-Lesion Characterization using Textural Features of Quantitative Ultrasound Parametric Maps

    © 2017 The Author(s). This study evaluated, for the first time, the efficacy of quantitative ultrasound (QUS) spectral parametric maps in conjunction with texture-analysis techniques to differentiate non-invasively benign versus malignant breast lesions. Ultrasound B-mode images and radiofrequency data were acquired from 78 patients with suspicious breast lesions. QUS spectral-analysis techniques were performed on radiofrequency data to generate parametric maps of mid-band fit, spectral slope, spectral intercept, spacing among scatterers, average scatterer diameter, and average acoustic concentration. Texture-analysis techniques were applied to determine imaging biomarkers consisting of mean, contrast, correlation, energy and homogeneity features of parametric maps. These biomarkers were utilized to classify benign versus malignant lesions with leave-one-patient-out cross-validation. Results were compared to histopathology findings from biopsy specimens and radiology reports on MR images to evaluate the accuracy of the technique. Among the biomarkers investigated, one mean-value parameter and 14 textural features demonstrated statistically significant differences (p < 0.05) between the two lesion types. A hybrid biomarker developed using a stepwise feature selection method could classify the lesions with a sensitivity of 96%, a specificity of 84%, and an AUC of 0.97. Findings from this study pave the way towards adapting novel QUS-based frameworks for breast cancer screening and rapid diagnosis in the clinic.

    Clinical and Experimental Importance of Circulating Tumor Cells in Prostate Cancer

    Prostate cancer (PCa) remains a leading cause of death in men, primarily due to ineffective treatment in the metastatic setting. During this phase of PCa, circulating tumor cells (CTCs) are shed into the bloodstream and their presence and number are important in patient prognosis. The CellSearch® system (CSS) is the only U.S. Food and Drug Administration (FDA) and Health Canada approved instrument for detection of CTCs, making it the current clinical gold standard in CTC technology. Although the CSS provides a minimally invasive means of patient monitoring in the metastatic setting, little is known about the role of CTCs in early-stage PCa. Additionally, examination of the utility of CTC molecular characterization in personalized patient care is an area of great interest. However, the underlying biology of CTCs remains poorly understood. In the present study, we demonstrated that CTCs are detectable in early-stage, post-surgical PCa patients undergoing adjuvant and salvage radiotherapy, and that in combination with other clinicopathological risk factors, CTCs may be useful in predicting treatment failure earlier than currently utilized clinical techniques. Additionally, we provide 2 technical resources outlining the FDA and Health Canada approved process of CTC identification and enumeration using the CSS, the detailed experimental process of user-defined protein molecular characterization using the CSS, and a comparable CTC assay for use in in vivo pre-clinical mouse models of metastasis. Finally, a comprehensive biological examination of the role of the epithelial-to-mesenchymal transition (EMT) in CTC kinetics and metastatic dissemination in PCa is presented, demonstrating that highly mesenchymal PCa cells shed CTCs earlier and in greater numbers during the metastatic cascade and have a greater metastatic capacity than PCa cells with an epithelial phenotype.
    Collectively, these data improve our understanding of the biology of CTCs in PCa, including CTC kinetics, their relationship with EMT, and metastasis. These results will guide future research and technology development in the identification and capture of CTCs with the greatest metastatic potential, and may ultimately lead to changes in patient treatment guidelines.