
    A resampling-based meta-analysis for detection of differential gene expression in breast cancer

    Background: Accuracy in the diagnosis of breast cancer and classification of cancer subtypes has improved over the years with the development of well-established immunohistopathological criteria. More recently, diagnostic gene-sets at the mRNA expression level have been tested as better predictors of disease state. However, breast cancer is heterogeneous in nature; thus extraction of differentially expressed gene-sets that stably distinguish normal tissue from various pathologies poses challenges. Meta-analysis of high-throughput expression data using a collection of statistical methodologies leads to the identification of robust tumor gene expression signatures. Methods: A resampling-based meta-analysis strategy was developed, which combines resampling with distribution statistics to assess the degree of significance in differential expression between sample classes. Two independent microarray datasets that contain normal breast, invasive ductal carcinoma (IDC), and invasive lobular carcinoma (ILC) samples were used for the meta-analysis. Expression of genes selected from the gene list for classification of normal breast samples and breast tumors encompassing both the ILC and IDC subtypes was tested on 10 independent primary IDC samples and matched non-tumor controls by real-time qRT-PCR. Other existing breast cancer microarray datasets were used in support of the resampling-based meta-analysis. Results: The two independent microarray studies were found to be comparable, although differing in their experimental methodologies (Pearson correlation coefficient, R = 0.9389 and R = 0.8465 for ductal and lobular samples, respectively). The resampling-based meta-analysis led to the identification of a highly stable set of genes for classification of normal breast samples and breast tumors encompassing both the ILC and IDC subtypes. The expression results of the selected genes obtained through real-time qRT-PCR supported the meta-analysis results. Conclusion: The proposed meta-analysis approach has the ability to detect a set of differentially expressed genes with the least amount of within-group variability, thus providing highly stable gene lists for class prediction. Increased statistical power and stringent filtering criteria used in the present study also make identification of novel candidate genes possible and may provide further insight to improve our understanding of breast cancer development.

    Galectin-3 performance in histologic and cytologic assessment of thyroid nodules. A systematic review and meta-analysis

    The literature on Galectin-3 (Gal-3) was systematically reviewed to obtain more robust information on its histologic reliability in identifying thyroid cancers and on the concordance between Gal-3 testing in histologic and cytologic samples. A computer search of the PubMed and Scopus databases was conducted using combinations of the terms thyroid and Gal-3. Initially, 545 articles were found and, after critical review, 52 original papers were finally included. They reported 8172 nodules with histologic evaluation of Gal-3, of which 358 also had preoperative fine needle aspiration cytology (FNAC) Gal-3 assessment. At histology, Gal-3 sensitivity was 87% (95% confidence interval [CI], 86% to 88%) and specificity 87% (95% CI, 86% to 88%); in both cases, we found heterogeneity (I² = 85% and 93%, respectively) and significant publication bias (p < 0.001). The pooled rate of positive Gal-3 at FNAC among cancers with histologically proven Gal-3 positivity was 94% (95% CI, 89% to 97%), with neither heterogeneity (I² = 14.5%) nor bias (p = 0.086). These data show high reliability of Gal-3 for thyroid cancer at histology, while its sensitivity on FNAC samples is lower. The limits of cytologic preparations and of the interpretation of Gal-3 results remain to be resolved.

    The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures

    Motivation: Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. Methods: We compare 32 feature selection methods on 4 public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability of the signatures they produce. Results: We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Simple filter methods generally outperform more complex embedded or wrapper methods, and ensemble feature selection has generally no positive effect. Overall a simple Student's t-test seems to provide the best results. Availability: Code and data are publicly available at http://cbio.ensmp.fr/~ahaury/
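
As a rough illustration of the kind of simple filter method the abstract above refers to, the following sketch ranks features by the absolute value of a two-sample Student's t statistic (pooled variance) and keeps the top k. It is not the authors' code: the function names `student_t` and `top_k_features`, the toy data, and the choice of k are all assumptions made for this example.

```python
import math
import random
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def top_k_features(X, y, k):
    """Rank features by |t| between class 0 and class 1; return the k best indices."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(student_t(g0, g1)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: 20 samples, 5 features, only feature 2 differs by class.
rng = random.Random(0)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in y]
for row, label in zip(X, y):
    if label == 1:
        row[2] += 5.0  # shift feature 2 in class 1

selected = top_k_features(X, y, k=1)
print(selected)
```

A filter of this kind scores each feature independently of any classifier, which is what makes it cheap compared with wrapper or embedded methods; in practice the selection would be done inside each cross-validation fold to avoid optimistic bias.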

    Computational Models for Transplant Biomarker Discovery.

    Translational medicine offers a rich promise for improved diagnostics and drug discovery for biomedical research in the field of transplantation, where unmet diagnostic and therapeutic needs persist. The recent advent of genomic and proteomic profiling, collectively called "omics", provides new resources to develop novel biomarkers for routine clinical use. Establishing such a marker system depends heavily on the appropriate application of computational algorithms and software, which are based on mathematical theories and models. Understanding these theories helps in choosing appropriate algorithms and so in building successful biomarker systems. Here, we review the key advances in theories and mathematical models relevant to transplant biomarker development. The advantages and limitations inherent in these models are discussed. The principles of key computational approaches for efficiently selecting the best subset of biomarkers from high-dimensional omics data are highlighted. Prediction models are also introduced, and the integration of multi-microarray data is discussed. Appreciating these key advances would help to accelerate the development of clinically reliable biomarker systems.

    Unlocking biomarker discovery: Large scale application of aptamer proteomic technology for early detection of lung cancer

    Lung cancer is the leading cause of cancer deaths, because ~84% of cases are diagnosed at an advanced stage. Worldwide in 2008, ~1.5 million people were diagnosed and ~1.3 million died, a survival rate unchanged since 1960. However, patients who are diagnosed at an early stage and have surgery experience an 86% overall 5-year survival. New diagnostics are therefore needed to identify lung cancer at this stage. Here we present the first large scale clinical use of aptamers to discover blood protein biomarkers in disease with our breakthrough proteomic technology. This multi-center case-control study was conducted in archived samples from 1,326 subjects from four independent studies of non-small cell lung cancer (NSCLC) in long-term tobacco-exposed populations. We measured >800 proteins in 15 µL of serum, identified 44 candidate biomarkers, and developed a 12-protein panel that distinguished NSCLC from controls with 91% sensitivity and 84% specificity in a training set and 89% sensitivity and 83% specificity in a blinded, independent verification set. Performance was similar for early and late stage NSCLC. This is a significant advance in proteomics in an area of high clinical need.

    Knowledge-based gene expression classification via matrix factorization

    Motivation: Modern machine learning methods based on matrix decomposition techniques, like independent component analysis (ICA) or non-negative matrix factorization (NMF), provide new and efficient analysis tools which are currently being explored to analyze gene expression profiles. These exploratory feature extraction techniques yield expression modes (ICA) or metagenes (NMF). These extracted features are considered indicative of underlying regulatory processes. They can also be applied to the classification of gene expression datasets, either by grouping samples into different categories for diagnostic purposes or by grouping genes into functional categories for further investigation of related metabolic pathways and regulatory networks. Results: In this study we focus on unsupervised matrix factorization techniques and apply ICA and sparse NMF to microarray datasets. The latter monitor the gene expression levels of human peripheral blood cells during differentiation from monocytes to macrophages. We show that these tools are able to identify relevant signatures in the deduced component matrices and extract informative sets of marker genes from these gene expression profiles. The methods rely on the joint discriminative power of a set of marker genes rather than on single marker genes. With these sets of marker genes, corroborated by leave-one-out or random forest cross-validation, the datasets could easily be classified into related diagnostic categories. The latter correspond to either monocytes versus macrophages or healthy versus Niemann Pick C disease patients. Funding: Siemens AG, Munich; DFG (Graduate College 638); DAAD (PPP Luso-Alemã and PPP Hispano-Alemanas).

    Is now the time for molecular driven therapy for diffuse large B-cell lymphoma?

    Introduction: Recent genetic and molecular discoveries regarding alterations in diffuse large B-cell lymphoma (DLBCL) have deeply changed the approach to this lymphoproliferative disorder. Novel additional predictors of outcome and new therapeutic strategies are being introduced to improve outcomes. Areas covered: This review aims to analyse the recent molecular discoveries in DLBCL, the rationale of novel molecular driven treatments and their impact on DLBCL prognosis, especially in ABC-DLBCL and High Grade B Cell Lymphoma. Pre-clinical and clinical evidence is reviewed to critically evaluate the novel DLBCL management strategies. Expert commentary: New insights into DLBCL molecular characteristics should guide the therapeutic approach; the results of the current studies investigating the safety and efficacy of novel 'X-RCHOP' regimens will probably lead, in future, to a cell of origin (COO) based upfront therapy. Moreover, it is necessary to identify early those patients with DLBCL who carry MYC, BCL2 and/or BCL6 rearrangements (double hit lymphomas, DHL), because they should not receive standard R-CHOP but high intensity treatment, as reported in many retrospective studies. New prospective trials are needed to investigate the most appropriate treatment of DHL.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.