
    Using the information embedded in the testing sample to break the limits caused by the small sample size in microarray-based classification

    Background: Microarray-based tumor classification is characterized by a very large number of features (genes) and a small number of samples. In such cases, statistical techniques cannot determine which genes are correlated with each tumor type. A popular solution is to use a subset of pre-specified genes. However, molecular variations are generally correlated with a large number of genes, and a gene that is not correlated with a disease on its own may still contribute to it in combination with other genes.
    Results: In this paper, we propose a new classification strategy that can reduce the effect of over-fitting without the need to pre-select a small subset of genes. Our solution works by taking advantage of the information embedded in the testing samples. We note that a well-defined classification algorithm works best when the data is properly labeled. Hence, our classification algorithm will discriminate all samples best when the testing sample is assumed to belong to the correct class. We compare our solution with several well-known alternatives for tumor classification on a variety of publicly available data sets. Our approach consistently leads to better classification results.
    Conclusion: Studies indicate that thousands of samples may be required to extract useful statistical information from microarray data. Herein, it is shown that this problem can be circumvented by using the information embedded in the testing samples.
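    The idea in the abstract above, assigning a test sample the label under which all samples are best discriminated, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the separability criterion (a Fisher-style ratio of between-class to within-class scatter) and the names `separability` and `classify_transductive` are hypothetical choices made for this example.

```python
from __future__ import annotations

from typing import Sequence

Vector = Sequence[float]


def _mean(vectors: list[Vector]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def separability(samples: list[Vector], labels: list[str]) -> float:
    """Assumed criterion: between-class scatter divided by within-class scatter."""
    overall = _mean(samples)
    between = within = 0.0
    for c in set(labels):
        members = [s for s, lab in zip(samples, labels) if lab == c]
        centroid = _mean(members)
        between += len(members) * _sq_dist(centroid, overall)
        within += sum(_sq_dist(s, centroid) for s in members)
    return between / within if within > 0 else float("inf")


def classify_transductive(train_x: list[Vector], train_y: list[str], test_x: Vector) -> str:
    """Try every candidate label for the test sample and keep the one under
    which the augmented data set (training samples + test sample) is most separable."""
    best_label, best_score = None, float("-inf")
    for candidate in sorted(set(train_y)):
        score = separability(train_x + [test_x], train_y + [candidate])
        if score > best_score:
            best_label, best_score = candidate, score
    return best_label


if __name__ == "__main__":
    train_x = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
    train_y = ["A", "A", "B", "B"]
    print(classify_transductive(train_x, train_y, [4.8, 5.2]))  # B
```

    Note that the test sample influences the decision only through the criterion computed on all samples, which is the point the abstract makes; any other discriminant criterion could be substituted for the assumed one.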

    Stable Feature Selection for Biomarker Discovery

    Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered, and only recently has this issue received increasing attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) to provide an overview of this new but fast-growing topic as a convenient reference; and (2) to categorize existing methods under an expandable framework for future research and development.
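    One common way to quantify the stability discussed above is to run a feature selector on many resampled versions of the data and average the pairwise Jaccard similarity of the selected subsets. The sketch below assumes that measure, bootstrap resampling, and a hypothetical selector (`top_k_by_mean_difference`) that ranks features by the absolute difference of class means; these are illustrative choices, not methods taken from the review.

```python
from __future__ import annotations

import random
from itertools import combinations


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability(subsets: list[set[int]]) -> float:
    """Average pairwise Jaccard similarity across selected feature subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def top_k_by_mean_difference(x: list[list[float]], y: list[int], k: int) -> set[int]:
    """Hypothetical selector: the k features with the largest |mean(class 1) - mean(class 0)|."""
    pos = [row for row, label in zip(x, y) if label == 1]
    neg = [row for row, label in zip(x, y) if label == 0]
    scores = []
    for j in range(len(x[0])):
        diff = sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        scores.append((abs(diff), j))
    scores.sort(reverse=True)
    return {j for _, j in scores[:k]}


def bootstrap_stability(x, y, k, n_rounds=20, seed=0) -> float:
    """Select features on bootstrap resamples and measure how much the subsets agree."""
    rng = random.Random(seed)
    n = len(x)
    subsets = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain only one class
            continue
        subsets.append(top_k_by_mean_difference([x[i] for i in idx], ys, k))
    return stability(subsets)


if __name__ == "__main__":
    rng = random.Random(1)
    y = [i % 2 for i in range(20)]
    # Feature 0 separates the classes; features 1-5 are uniform noise in [0, 1).
    x = [[10.0 * label] + [rng.random() for _ in range(5)] for label in y]
    print(bootstrap_stability(x, y, k=1))  # 1.0: feature 0 is selected every time
```

    A value near 1 means the selector returns nearly the same features regardless of which samples it sees; selecting among weak, noisy features typically drives the score down.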

    Transcriptional responses to radiation exposure facilitate the discovery of biomarkers functioning as radiation biodosimeters

    The development of new methods for the retrospective quantification of the radiation dose received by exposed individuals is of widespread interest. To this end, I developed a computational framework for biomarker discovery and radiation dose prediction and identified gene signatures with which both low and medium-to-high radiation doses can be accurately quantified. To improve our understanding of the radiation-induced transcriptional response, I also analyzed microarray data of human peripheral blood lymphocytes (PBLs) after ex vivo gamma irradiation and characterized the affected functional processes and pathways.

    Preanalytical variables and performance of diagnostic RNA-based gene expression analysis in breast cancer

    Prognostic multigene expression assays have become widely available to complement standard clinical parameters with additional information and to support clinicians in treatment decisions. In this study, we analyzed the impact of variations in tissue handling on the results of the diagnostic EndoPredict test. EndoPredict is a quantitative reverse transcription PCR assay, performed on RNA from formalin-fixed, paraffin-embedded (FFPE) tissue, that predicts the likelihood of distant recurrence in patients with ER-positive/HER2-negative breast cancer. We performed a total of 138 EndoPredict assays to study the effects of preanalytical variables such as time to fixation, fixation time, tumor cell content, and section storage time on the test results. A time to fixation of up to 12 h and a fixation time of up to 5 days did not affect the results of the gene expression test. Paired samples of FFPE sections with tumor cell content ranging from 15 to 95% and tumor-enriched samples showed a correlation coefficient of 0.97. Test results of tissue sections that had been stored for 12 months at +4 or +20 °C showed a correlation of 0.99 with the results of nonstored sections. In conclusion, preanalytical tissue handling is not a critical factor for diagnostic gene expression analysis with the EndoPredict assay, which can therefore be easily integrated into the standard workflow of molecular pathology.

    Challenges in the Analysis of Mass-Throughput Data: A Technical Commentary from the Statistical Machine Learning Perspective

    Sound data analysis is critical to the success of modern molecular medicine research involving the collection and interpretation of mass-throughput data. The novel nature and high dimensionality of such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses over-fitting, error estimation, the curse of dimensionality, causal versus predictive modeling, the integration of heterogeneous types of data, and the lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them.

    INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE

    Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for improving the molecular classification, diagnosis, prognosis, and prediction of therapeutic response toward personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists can build engines that leverage massive genomic variation and integrate it with clinical data to identify "at risk" individuals for prevention, diagnosis, and therapeutic intervention. In my Ph.D. thesis, I mined genomic sequencing data to comprehensively characterize molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, applying high-dimensional omics data to improve the understanding of gene signatures and somatic molecular alterations that contribute to cancer progression and clinical outcomes. Following this motivation, my dissertation focused on three topics in translational genomics.
    1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I integrated transcriptomic, genomic, protein, and clinical data to increase the accuracy of GBM classification, and identified an association between the GBM mesenchymal subtype and reduced tumor purity, accompanied by an increased presence of tumor-associated microglia. I then traced the microglia solely to the intrinsic tumor bulk, and not to the corresponding neurosphere cells, through transcriptional and protein-level analysis of a panel of sphere-forming glioma cultures and their parent GBM samples. Furthermore, through longitudinal analysis of paired primary and recurrent GBM samples, I demonstrated that the phenotypic alterations of GBM subtypes are not due to an intrinsic proneural-to-mesenchymal transition in tumor cells, but are instead intertwined with an increased level of microglia upon disease recurrence. Collectively, I elucidated the critical role of the tumor microenvironment (microglia and macrophages of the central nervous system) in intra-tumor heterogeneity and in the accurate classification of GBM patients based on transcriptomic profiling, which will not only have a significant clinical impact but also pave the way for preclinical cancer research.
    2) Identification of prognostic gene signatures that stratify adult diffuse glioma patients harboring 1p/19q co-deletions. I compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. I then identified inflammatory response and acetylation activity associated with the malignant progression of 1p/19q co-deleted glioma. In addition, I showed that this signature translates to other types of adult diffuse glioma, suggesting its universality in the pathobiology of other glioma subsets. My integrative analysis of this highly curated data set using optimized statistical models will reflect the pending update to the WHO classification of tumors of the central nervous system (CNS).
    3) Comprehensive characterization of somatic fusion transcripts across cancers. I identified a panel of novel fusion transcripts across all TCGA cancer types through transcriptomic profiling. I then predicted fusion proteins with kinase activity and hub functions in pathway networks, based on the annotation of genetically mobile domains and functional domain architectures. I evaluated a panel of in-frame gene fusions as potential driver mutations based on the network fusion centrality hypothesis. I also characterized the emerging complexity of the genetic architecture of fusion transcripts by integrating genomic structure and somatic variants, and delineated the distinct genomic patterns of fusion events across different cancer types. Overall, my exploration of the pathogenic impact and clinical relevance of candidate gene fusions has provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signaling and specific drug targets encoded by these fusion genes.
    Taken together, the translational genomic research I conducted during my Ph.D. will shed new light on precision medicine and contribute to the cancer research community. The novel classification concept, gene signature, and fusion transcripts I identified address several hotly debated issues in translational genomics, such as the complex interactions between tumor bulk and the adjacent microenvironment, prognostic markers for clinical diagnostics and personalized therapy, and the distinct patterns of genomic structural alterations and oncogenic events in different cancer types, thereby facilitating our understanding of genomic alterations and moving us toward precision medicine.

    Highly sensitive and multiplexed platforms for allergy diagnostics

    Thesis (Ph.D.)--Boston University
    Allergy is a disorder of the immune system caused by an immune response to otherwise harmless environmental allergens. Currently, 20% of the US population is allergic, and 90% of pediatric patients and 60% of adult patients with asthma have allergies. These percentages have increased by 18.5% in the past decade, and similar trends are predicted for the future. Here we design sensitive, multiplexed platforms that use the Interferometric Reflectance Imaging Sensor (IRIS) to detect allergen-specific IgE in various clinical settings. A microarray platform for allergy diagnosis allows testing of specific IgE sensitivity to a multitude of allergens while requiring only small volumes of patient blood. However, conventional fluorescent microarray technology is limited by i) variation in probe immobilization, which hinders the quantitative, assertive, and statistically relevant conclusions necessary in immunodiagnostics, and ii) the use of fluorophore labels, which are unsuitable for some clinical applications because fluorophores tend to stick to blood particulates and require daily calibration. The calibrated fluorescence enhancement (CaFE) method integrates the low-magnification modality of IRIS with enhanced fluorescence sensing to directly correlate the density of immobilized probes (major allergens) with allergen-specific IgE in patient serum. However, this platform operates only on processed serum samples, which is not ideal for point-of-care (POC) testing. Thus, the high-magnification modality of IRIS was adapted as an alternative allergy diagnostic platform that automatically discriminates and sizes single nanoparticles bound to specific IgE in unprocessed, characterized human blood and serum samples. These features make IRIS an ideal candidate for clinical and diagnostic applications such as POC testing.
    Combining the high-magnification (nanoparticle counting) and low-magnification modalities of IRIS in a single instrument offers five significant advantages over existing sensing technologies: IRIS i) corrects for any variation in probe immobilization, ii) detects proteins from attomolar to nanomolar concentrations in unprocessed biological samples, iii) unambiguously discriminates nanoparticle tags on a robust and physically large sensor area, iv) detects protein targets with conjugated nanoparticle tags (~40 nm diameter), which affect assay kinetics minimally compared to conventional microparticle tagging methods, and v) uses components that make the instrument inexpensive, robust, and portable. This platform was successfully validated on patient serum and whole blood samples with documented allergy profiles (ImmunoCAP®, ThermoFisher Scientific).

    Acute Myeloid Leukemia

    Acute myeloid leukemia (AML) is the most common acute leukemia in adults. The Cancer Genome Atlas Research Network has demonstrated the increasing genomic complexity of AML and has facilitated our understanding of the molecular events leading to this deadly malignancy, whose prognosis has not improved over past decades. AML is a highly heterogeneous disease, and cytogenetic and molecular analysis of its various chromosomal aberrations, including deletions, duplications, aneuploidy, balanced reciprocal translocations, and fusions of transcription factor genes and tyrosine kinases, has led to a better understanding and identification of AML subgroups with different prognoses. Furthermore, molecular classification based on mRNA expression profiling has facilitated the identification of novel subclasses and defined AML risk groups, including poor-risk AML, based on specific molecular signatures. However, despite increased understanding of AML genetics, the outcome for AML patients, whose number is likely to rise as the population ages, has not changed significantly. Until it does, further investigation of the genomic complexity of the disease and advances in drug development are needed. In this review, leading AML clinicians and research investigators provide an up-to-date understanding of the molecular biology of the disease, addressing advances in diagnosis, classification, prognostication, and therapeutic strategies that may have significant promise and impact on overall patient survival.