
    Technical Variables in High-Throughput miRNA Expression Profiling: Much Work Remains to Be Done

    MicroRNA (miRNA) gene expression profiling has provided important insights into plant and animal biology. However, relatively little has been published about pitfalls associated with technical parameters in miRNA gene expression profiling. One source of pertinent information about technical variables in gene expression profiling is the separate and more well-established literature on mRNA expression profiling. However, many aspects of miRNA biochemistry are unique. For example, the cellular processing and compartmentation of miRNAs, the differential stability of specific miRNAs, and aspects of global miRNA expression regulation require specific consideration. Additional possible sources of systematic bias in miRNA expression studies include the differential impact of pre-analytical variables, the substrate specificity of nucleic acid processing enzymes used in labeling and amplification, and issues regarding new miRNA discovery and annotation. We conclude that greater focus on technical parameters is required to bolster the validity, reliability, and cultural credibility of miRNA gene expression profiling studies.

    Robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling

    Identifying relevant signatures for clinical patient outcome is a fundamental task in high-throughput studies. Signatures, composed of features such as mRNAs, miRNAs, SNPs or other molecular variables, are often non-overlapping, even though they have been identified from similar experiments considering samples with the same type of disease. The lack of consensus is mostly due to the fact that sample sizes are far smaller than the number of candidate features considered, so signature selection suffers from large variation. We propose a robust signature selection method that enhances the selection stability of penalized regression algorithms for predicting survival risk. Our method is based on an aggregation of multiple, possibly unstable, signatures obtained with the preconditioned lasso algorithm applied to random (internal) subsamples of a given cohort, where the aggregated signature is shrunk by a simple thresholding strategy. The resulting method, RS-PL, is conceptually simple and easy to apply, relying on parameters automatically tuned by cross-validation. Robust signature selection using RS-PL operates within an (external) subsampling framework to estimate the selection probabilities of features across multiple trials of RS-PL. These probabilities are used to identify reliable features to include in a signature. Our method was evaluated on microarray data sets from neuroblastoma, lung adenocarcinoma, and breast cancer patients, extracting robust and relevant signatures for predicting survival risk. Signatures obtained by our method achieved high prediction performance and robustness consistently over the three data sets. Genes with high selection probability in our robust signatures have been reported as cancer-relevant. The ordering of predictor coefficients associated with signatures was well preserved across multiple trials of RS-PL, demonstrating the capability of our method for identifying a transferable consensus signature. The software is available as the R package rsig on CRAN (http://cran.r-project.org).
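
    The following sketch illustrates the general subsample-and-aggregate idea behind this kind of robust selection; it is not the authors' RS-PL implementation. It substitutes scikit-learn's ordinary LassoCV for the preconditioned lasso and a continuous risk score for censored survival outcomes, and all data and parameter values are hypothetical placeholders.

```python
# Sketch: stability-style feature selection by subsampling + lasso.
# Hypothetical stand-in for RS-PL: ordinary LassoCV instead of the
# preconditioned lasso, a continuous outcome instead of survival times.
import numpy as np
from sklearn.linear_model import LassoCV

def robust_signature(X, y, n_trials=100, subsample_frac=0.5,
                     prob_threshold=0.6, random_state=0):
    rng = np.random.default_rng(random_state)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_trials):
        idx = rng.choice(n, size=int(subsample_frac * n), replace=False)
        model = LassoCV(cv=5).fit(X[idx], y[idx])   # inner CV tunes the penalty
        counts += (model.coef_ != 0)                # record which features were selected
    probs = counts / n_trials                       # per-feature selection probability
    return np.where(probs >= prob_threshold)[0], probs

# Toy usage with synthetic data
X = np.random.default_rng(1).normal(size=(80, 200))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + np.random.default_rng(2).normal(size=80)
signature, probs = robust_signature(X, y)
print("selected features:", signature)
```

    Features whose selection probability across trials exceeds the threshold form the aggregated signature, mirroring the external-subsampling step described above.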

    Computational study of cancer

    In my thesis, I focused on integrative analysis of high-throughput oncogenomic data. This was done in two parts. In the first part, I describe IntOGen, an integrative data mining tool for the study of cancer. This system collates, annotates, pre-processes and analyzes large-scale data for transcriptomic, copy number aberration and mutational profiling of a large number of tumors in multiple cancer types. All oncogenomic data are annotated with ICD-O terms. We perform analysis at different levels of complexity: at the level of genes, at the level of modules, at the level of studies, and finally at the level of combinations of studies. The results are publicly available through a web service. I also present the BioMart interface of IntOGen for bulk download of data. In the final part, I propose a methodology based on sample-level enrichment analysis to identify patient subgroups from high-throughput profiling of tumors. I also apply this approach to a specific biological problem and characterize properties of worse-prognosis tumors in multiple cancer types. This methodology can be used in the translational version of IntOGen.
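
    As a rough illustration of sample-level enrichment analysis for patient subgrouping, the sketch below scores each sample for a gene set against a permutation null; it is a simplified stand-in for the methodology in the thesis, and the gene sets, dimensions, and data are synthetic placeholders.

```python
# Sketch of sample-level enrichment analysis (SLEA) for patient subgrouping.
# Per sample, a gene-set score is the mean z-scored expression of the set's
# genes, compared against randomly drawn gene sets of the same size.
import numpy as np

def sample_level_enrichment(expr, gene_sets, n_perm=1000, seed=0):
    """expr: (genes x samples) expression matrix.
    gene_sets: dict mapping set name -> list of gene row indices (hypothetical)."""
    rng = np.random.default_rng(seed)
    # z-score each gene across samples
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    n_genes = expr.shape[0]
    scores = {}
    for name, genes in gene_sets.items():
        observed = z[genes].mean(axis=0)              # per-sample set score
        null = np.array([z[rng.choice(n_genes, size=len(genes), replace=False)].mean(axis=0)
                         for _ in range(n_perm)])
        scores[name] = (observed - null.mean(axis=0)) / null.std(axis=0)
    return scores  # per-sample enrichment z-scores; cluster these to find subgroups

expr = np.random.default_rng(3).normal(size=(500, 40))
print(sample_level_enrichment(expr, {"toy_set": list(range(20))}, n_perm=200)["toy_set"].shape)
```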

    Global gene expression profiling of healthy human brain and its application in studying neurological disorders

    The human brain is the most complex structure known to mankind, and one of the greatest challenges in modern biology is to understand how it is built and organized. The power of the brain arises from its variety of cells and structures, and ultimately from where and when different genes are switched on and off throughout the brain tissue. In other words, brain function depends on the precise regulation of gene expression in its sub-anatomical structures. Our understanding of the complexity and dynamics of the transcriptome of the human brain, however, is still incomplete. To address this need, we designed a gene expression model that accurately defines the consistent blueprint of the brain transcriptome, thereby identifying the core brain-specific transcriptional processes conserved across individuals. Functionally characterizing this model provides profound insights into the transcriptional landscape, biological pathways and the expression distribution of neurotransmitter systems. In this dissertation, we developed an expression model by capturing similarly expressed gene patterns across congruently annotated brain structures in six individual brains, using data from the Allen Brain Atlas (ABA). We found that 84% of genes are expressed in at least one of the 190 brain structures. By employing hierarchical clustering we showed that distinct structures of a larger brain region can cluster together while still retaining their expression identity. Furthermore, weighted correlation network analysis identified 19 robust modules of co-expressed genes in the brain that demonstrated a wide range of functional associations. Since signatures of local phenomena can be masked by larger signatures, we performed local analysis on each distinct brain structure. Pathway and gene ontology enrichment analysis on these structures showed striking enrichment for brain region-specific processes. In addition, we mapped the structural distribution of the gene expression profiles of genes associated with the major neurotransmission systems in the human brain. We also postulated the utility of healthy brain tissue gene expression to predict potential genes involved in a neurological disorder, in the absence of data from diseased tissues. To this end, we developed a supervised classification model, which achieved an accuracy of 84% and an AUC (area under the curve) of 0.81 from ROC plots for predicting autism-implicated genes using the healthy expression model as the baseline. This study represents the first use of healthy brain gene expression to predict genes implicated in autism, and this generic methodology can be applied to predict genes involved in other neurological disorders.
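
    A minimal sketch of the kind of supervised gene classification described above, assuming each gene is represented by its expression across brain structures and labeled by prior disease implication; the classifier choice (random forest), data, and labels here are hypothetical stand-ins, not the dissertation's model.

```python
# Sketch: predicting disorder-implicated genes from healthy brain expression.
# Each gene is a sample whose features are its expression across structures;
# labels mark genes already implicated in the disorder. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score, accuracy_score

rng = np.random.default_rng(0)
n_genes, n_structures = 1000, 190
X = rng.normal(size=(n_genes, n_structures))     # expression per gene per structure
y = (X[:, :10].mean(axis=1) + rng.normal(scale=0.5, size=n_genes)) > 0.2  # toy labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
print("AUC:", roc_auc_score(y, proba))
print("accuracy:", accuracy_score(y, proba > 0.5))
```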

    Integrative analysis of complex genomic and epigenomic maps

    Modern healthcare research demands collaboration across disciplines to build preventive measures and innovate predictive capabilities for curing diseases. Along with the emergence of cutting-edge computational and statistical methodologies, data generation and analysis have become cheaper in the last ten years. However, the complexity of big data, due to its variety, volume, and velocity, creates new challenges for biologists, physicians, bioinformaticians, statisticians, and computer scientists. Combining data from multiple complex profiles is useful to better understand cellular functions and the pathways that regulate cell function, providing insights that could not have been obtained using the individual profiles alone. However, current normalization and artifact correction methods are platform and data type specific, and may require both the training and test sets for any application (e.g. biomarker development). This often leads to over-fitting and reduces the reproducibility of genomic findings across studies. In addition, many bias correction and integration approaches require renormalization or reanalysis if additional samples are later introduced. The motivation behind this research was to develop and evaluate strategies for addressing data integration issues across data types and profiling platforms, which should improve healthcare-informatics research and its application in personalized medicine. We have demonstrated a comprehensive and coordinated framework for data standardization across tissue types and profiling platforms. This allows easy integration of data from multiple data-generating consortiums. The main goal of this research was to identify regions of genetic-epigenetic coordination that are independent of tissue type and consistent across epigenomic profiling platforms. We developed multi-‘omic’ therapeutic biomarkers for epigenetic drug efficacy by combining our biomarker regions with drug perturbation data generated in our previous studies. We used an adaptive Bayesian factor analysis approach to develop biomarkers for multiple HDACs simultaneously, allowing for predictions of comparative efficacy between the drugs. We showed that this approach leads to different predictions across breast cancer subtypes compared to profiling the drugs separately. We extended this approach to patient samples from multiple public data resources containing epigenetic profiling data from cancer and normal tissues (The Cancer Genome Atlas, TCGA; NIH Roadmap Epigenomics data).
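
    As a simplified illustration of factor-based multi-omic integration, the sketch below derives shared latent factors from concatenated expression and methylation blocks using scikit-learn's maximum-likelihood FactorAnalysis; this is a stand-in for the adaptive Bayesian factor analysis referenced above, and all matrices and dimensions are synthetic.

```python
# Sketch: shared latent factors from concatenated multi-omic profiles.
# Simplified stand-in for an adaptive Bayesian factor analysis; synthetic data.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 60
expression = rng.normal(size=(n_samples, 300))    # e.g. mRNA expression block
methylation = rng.normal(size=(n_samples, 200))   # e.g. CpG methylation block

# Standardize each block so neither platform dominates, then concatenate
X = np.hstack([StandardScaler().fit_transform(expression),
               StandardScaler().fit_transform(methylation)])

fa = FactorAnalysis(n_components=5, random_state=0).fit(X)
factors = fa.transform(X)          # per-sample factor scores: candidate biomarkers
loadings = fa.components_          # which features drive each factor
print(factors.shape, loadings.shape)
```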

    SITC cancer immunotherapy resource document: a compass in the land of biomarker discovery.

    Since the publication of the Society for Immunotherapy of Cancer's (SITC) original cancer immunotherapy biomarkers resource document, there have been remarkable breakthroughs in cancer immunotherapy, in particular the development and approval of immune checkpoint inhibitors, engineered cellular therapies, and tumor vaccines to unleash antitumor immune activity. The most notable feature of these breakthroughs is the achievement of durable clinical responses in some patients, enabling long-term survival. These durable responses have been noted in tumor types that were not previously considered immunotherapy-sensitive, suggesting that all patients with cancer may have the potential to benefit from immunotherapy. However, a persistent challenge in the field is that only a minority of patients respond to immunotherapy, especially therapies that rely on endogenous immune activation such as checkpoint inhibitors and vaccination, due to the complex and heterogeneous immune escape mechanisms that can develop in each patient. Therefore, the development of robust biomarkers for each immunotherapy strategy, enabling rational patient selection and the design of precise combination therapies, is key to the continued success and improvement of immunotherapy. In this document, we summarize and update established biomarkers, guidelines, and regulatory considerations for clinical immune biomarker development, discuss well-known and novel technologies for biomarker discovery and validation, and provide tools and resources that can be used by the biomarker research community to facilitate the continued development of immuno-oncology and aid in the goal of durable responses in all patients.

    Genomic and proteomic analysis with dynamically growing self organising tree (DGSOT) for measuring clinical outcomes of cancer

    Genomics and proteomics microarray technologies are used for analysing the molecular and cellular expression of cancer. This creates a challenge for the analysis and interpretation of the data generated, as it is produced in large volumes. The current review describes a combined system for the genetic and molecular interpretation and analysis of genomics and proteomics technologies that offers a wide range of interpreted results. Artificial neural network technology is well suited to dealing with these large volumes of analytical data. The artificial system recommended here was determined by analysing and selecting the best of the different technologies currently being used or reviewed for microarray data analysis. The system proposed here is a tree structure, a new hierarchical clustering algorithm called the dynamically growing self-organizing tree (DGSOT) algorithm, which overcomes drawbacks of traditional hierarchical clustering algorithms. The DGSOT algorithm combines horizontal and vertical growth to construct a multifurcating hierarchical tree from top to bottom to cluster the data. It is designed to combine the strengths of neural networks (NN), namely speed and robustness to noise, with those of hierarchical clustering tree structures, which require minimal prior specification of the number of clusters and minimal training, in order to output results in an interpretable biological context. The combined system will generate an output of biological interpretation of expression profiles associated with diagnosis of disease (including early detection, molecular classification and staging), metastasis (spread of the disease to non-adjacent organs and/or tissues), prognosis (predicting clinical outcome) and response to treatment; it also gives possible therapeutic options, ranking them according to their benefits for the patient.
    Key words: genomics, proteomics, microarray, dynamically growing self-organizing tree (DGSOT)
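
    The sketch below illustrates only the top-down, multifurcating tree-construction idea by recursive splitting; it is not the DGSOT algorithm itself (which grows nodes with self-organizing-map-style learning), and the splitting rule, parameters, and data are placeholders.

```python
# Sketch: top-down construction of a multifurcating cluster tree by recursive
# splitting (simplified illustration of the hierarchical idea, not DGSOT).
import numpy as np
from sklearn.cluster import KMeans

def grow_tree(X, indices=None, min_size=10, k=3, depth=0, max_depth=4):
    """Return a nested dict: each node holds its member indices and children."""
    if indices is None:
        indices = np.arange(X.shape[0])
    node = {"members": indices, "children": []}
    if len(indices) < min_size or depth >= max_depth:
        return node                                   # stop growing this branch
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X[indices])
    for c in range(k):
        child_idx = indices[labels == c]
        if 0 < len(child_idx) < len(indices):         # only keep genuine splits
            node["children"].append(grow_tree(X, child_idx, min_size, k, depth + 1, max_depth))
    return node

X = np.random.default_rng(0).normal(size=(200, 50))   # e.g. expression profiles
tree = grow_tree(X)
print("top-level children:", len(tree["children"]))
```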

    Probabilistic analysis of the human transcriptome with side information

    Understanding the functional organization of genetic information is a major challenge in modern biology. Following the initial publication of the human genome sequence in 2001, advances in high-throughput measurement technologies and efficient sharing of research material through community databases have opened up new views on the study of living organisms and the structure of life. In this thesis, novel computational strategies have been developed to investigate a key functional layer of genetic information, the human transcriptome, which regulates the function of living cells through protein synthesis. The key contributions of the thesis are general exploratory tools for high-throughput data analysis that have provided new insights into cell-biological networks, cancer mechanisms and other aspects of genome function. A central challenge in functional genomics is that high-dimensional genomic observations are associated with high levels of complex and largely unknown sources of variation. By combining statistical evidence across multiple measurement sources and the wealth of background information in genomic data repositories, it has been possible to resolve some of the uncertainties associated with individual observations and to identify functional mechanisms that could not be detected from individual measurement sources alone. Statistical learning and probabilistic models provide a natural framework for such modeling tasks. Open source implementations of the key methodological contributions have been released to facilitate further adoption of the developed methods by the research community.
    Comment: Doctoral thesis. 103 pages, 11 figures