18 research outputs found

    Computational Biomarker Discovery: From Systems Biology to Predictive and Personalized Medicine Applications

    Poster abstract. With the advent of genome-based medicine, there is an escalating need to discover how modifications of biological molecules, either individually or as an ensemble, can be uniquely associated with human physiological states. This knowledge could lead to breakthroughs in the development of clinical tests, known as "biomarker tests", that assess disease risk, early onset, and prognosis, and that predict treatment outcomes. The development of molecular biomarkers is therefore a key agenda item for the next 5-10 years if we are to take full advantage of the human genome to improve human well-being. However, the complexity of human biological systems and the imperfections of high-throughput instruments and their results have created significant hurdles in biomarker development. Conventional molecular biomarker development has been both extremely slow and costly, and only recently have computational methods become an important player in this field. At the Indiana Center for Systems Biology and Personalized Medicine, we are developing several computational systems biology strategies to address these challenges. We will show examples of how we approach the problem using a variety of computational techniques, including data mining, algorithm development that takes biological context into account, biological knowledge integration, and information visualization. Finally, we outline how research in this direction, by deriving more robust molecular biomarkers, may lead to predictive and personalized medicine. The Indiana Center for Systems Biology and Personalized Medicine (CSBPM) was founded in 2007 as an IUPUI signature center by Dr. Jake Chen and his colleagues in the Indiana University School of Informatics, School of Medicine, and School of Science. CSBPM is the only research center in the State of Indiana whose primary goal is to pursue predictive and personalized medicine. 
CSBPM currently consists of eleven faculty members from the School of Medicine, School of Science, School of Engineering, School of Informatics, and Indiana University Simon Cancer Center. The primary mission of the center is to foster the development and use of systems biology and computational modeling techniques to address challenges in future genome-based medicine. The ultimate goal of the center is to shorten the discovery-to-practice gap between integrative "omics" biology studies (including genomics, transcriptomics, proteomics, and metabolomics) and predictive and personalized medicine applications.

    Multicentre harmonisation of a six-colour flow cytometry panel for naïve/memory T cell immunomonitoring

    Background. Personalised medicine in oncology needs standardised immunological assays. Flow cytometry (FCM) methods are an essential tool for immunomonitoring, and their harmonisation is crucial to obtaining comparable data in multicentre clinical trials. The objective of this study was to design a harmonisation workflow able to address the issues contributing most to intra- and inter-operator variability in a multicentre project. Methods. The Italian National Institute of Health (Istituto Superiore di Sanità, ISS) coordinated the harmonisation of a multiparametric flow cytometry panel among thirteen operators belonging to five clinical and research centres in the Lazio region (Italy). The panel was based on a backbone mixture of dried antibodies (anti-CD3, anti-CD4, anti-CD8, anti-CD45RA, and anti-CCR7) to detect naïve/memory T cells, which are recognised as potential prognostic/predictive immunological biomarkers in cancer immunotherapies. To mimic a real-life scenario, the coordinating centre distributed frozen peripheral blood mononuclear cells (PBMCs) and fresh whole blood (WB) samples from healthy donors, together with reagents and Standard Operating Procedures (SOPs), to participants, who performed the experiments on their own equipment. Operators returned raw and locally analysed data to ISS for central analysis and statistical elaboration. Results. Harmonised and reproducible results were obtained by sharing the experimental set-up and procedures and by centralising data analysis, which reduced cross-centre variability in naïve/memory subset frequencies, particularly in the whole blood setting. Conclusion. Our experimental and analytical workflow proved suitable for harmonising FCM assays in a multicentre setting, where high-quality data are required to evaluate potential immunological markers that may help in selecting better therapeutic options.
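The abstract does not state which statistic was used to quantify cross-centre variability. A common choice is the coefficient of variation (CV) of a subset frequency across centres; the minimal sketch below, assuming one frequency value per centre for the same donor sample, shows how the reported reduction in variability could be expressed as a lower CV. The function name is hypothetical, and no values from the study are used.

```python
from statistics import mean, stdev


def cross_centre_cv(frequencies):
    """Coefficient of variation (%) of one subset frequency across centres.

    frequencies: one value per centre (e.g. % naive CD4+ T cells measured
    on the same donor sample); at least two centres are required.
    """
    if len(frequencies) < 2:
        raise ValueError("need measurements from at least two centres")
    # sample standard deviation expressed relative to the mean
    return 100.0 * stdev(frequencies) / mean(frequencies)
```

Calling `cross_centre_cv` once on the operators' locally gated frequencies and once on the centrally re-analysed frequencies for the same subset gives one number per subset for a before/after comparison.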

    A text mining based approach for biomarker discovery

    Master's dissertation in Bioinformatics. Biomarkers have long been heralded as potential drivers of new diagnostic and treatment procedures. However, for many years biomarker discovery could only be achieved through experimental means, and the usually large number of candidates made the process costly and time-consuming, which deterred wider adoption. Growing computational capability has shifted the biomarker discovery paradigm from the clinical laboratory to in silico environments. Furthermore, text mining, the automatic extraction of information from text through computational means, has grown in popularity in the biomedical fields. The number of studies and clinical trials in these fields has increased greatly in recent years, making the task of manually examining and annotating them cumbersome, to say the least. Moreover, even though the development of efficient and thorough natural language processing is still ongoing, the potential for discovering both commonly reported and hidden behaviours in the scientific literature is too high to be ignored. Several existing tools, technologies, pipelines, and frameworks already offer at least a glimpse of how analysing the available body of scientific literature can pave the way for novel medical techniques that may help in the prevention, diagnosis, and treatment of diseases. 
As such, this work presents a novel approach to biomarker discovery that integrates gene-disease associations extracted from the current biomedical literature with RNA-Seq gene expression data in an L1-regularized mixed-integer linear programming model to identify potential biomarkers. The approach can potentially provide an optimal and robust genetic signature for disease diagnosis and help identify novel biomarker candidates. The analysis was carried out on five publicly available RNA-Seq datasets obtained from the Genomic Data Commons Data Portal, related to breast, colon, lung, and prostate cancer, and head and neck squamous cell carcinoma. Hyperparameter optimization was also performed for this approach, and the performance of the optimal set of parameters was compared against other machine learning methods.
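The dissertation solves an L1-regularized mixed-integer linear program, which requires a MILP solver. As a lighter stand-in, the sketch below swaps in L1-penalized logistic regression fitted by proximal gradient descent (ISTA) and shows one hypothetical way to fold literature evidence into the penalty: each gene's L1 weight is scaled down according to a gene-disease association score in [0, 1], so literature-supported genes are easier to select. The penalty mapping, default parameter values, and function names are assumptions for illustration, not the dissertation's formulation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def literature_penalties(scores, floor=0.2):
    """Hypothetical mapping: stronger literature support -> smaller penalty."""
    return [max(floor, 1.0 - s) for s in scores]


def weighted_l1_logistic(X, y, penalties, lam=0.2, lr=0.1, iters=2000):
    """Fit L1-penalised logistic regression by ISTA.

    X: samples x genes (lists of expression values), y: 0/1 labels,
    penalties: per-gene multipliers on lam. Returns (weights, bias).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            # gradient of the mean log-loss with respect to the linear score
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalised
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam * penalties[j])
             for j in range(p)]
    return w, b
```

Genes whose fitted weight is exactly zero drop out; the non-zero weights form the candidate signature.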

    Opportunities for improving data sharing and FAIR data practices to advance global mental health

    It is crucial to optimize global mental health research to address the high burden that mental health challenges and mental illness place on individuals and societies. Data sharing and reuse have demonstrated value for advancing science and accelerating knowledge development. The FAIR (Findable, Accessible, Interoperable, and Reusable) Guiding Principles for scientific data provide a framework to improve the transparency, efficiency, and impact of research. In this review, we describe ethical and equity considerations in data sharing and reuse, delineate the FAIR principles as they apply to mental health research, and consider the current state of FAIR data practices in global mental health research, identifying challenges and opportunities. We describe noteworthy examples of collaborative efforts, often across disciplinary and national boundaries, to improve the Findability and Accessibility of global mental health data, as well as efforts to create integrated data resources and tools that improve Interoperability and Reusability. Based on this review, we propose a vision for the future of FAIR global mental health research and suggest practical steps for researchers with regard to study planning, data preservation and indexing, machine-actionable metadata, data reuse to advance science and improve equity, and metrics and recognition.
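Machine-actionable metadata, one of the practical steps listed above, means describing a dataset in a structured vocabulary that software can parse. The sketch below assumes schema.org's `Dataset` type serialized as JSON-LD (one common choice; the review may recommend other schemas) together with a hypothetical mapping of fields to FAIR facets. The identifier, URL, and title are placeholders, not real datasets.

```python
import json

# Hypothetical minimal fields per FAIR facet (schema.org property names).
REQUIRED = {
    "Findable": ["identifier", "name", "keywords"],
    "Accessible": ["url"],
    "Interoperable": ["variableMeasured"],
    "Reusable": ["license", "creator"],
}


def dataset_record(**fields):
    """Wrap metadata fields in a schema.org Dataset JSON-LD object."""
    return {"@context": "https://schema.org", "@type": "Dataset", **fields}


def missing_fields(record):
    """Map each FAIR facet to its required fields absent from the record."""
    return {facet: [f for f in names if f not in record]
            for facet, names in REQUIRED.items()
            if any(f not in record for f in names)}


record = dataset_record(
    identifier="doi:10.0000/placeholder",         # placeholder, not a real DOI
    name="Example mental health survey (placeholder)",
    keywords=["global mental health", "depression screening"],
    url="https://example.org/datasets/placeholder",
    license="https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(record, indent=2))
print(missing_fields(record))  # flags gaps before the record is deposited
```

A check like `missing_fields` could run at data-deposit time so that gaps (here, no variable descriptions and no creator) are fixed before the record is indexed.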

    Instrumental drift removal in GC-MS data for breath analysis: the short-term and long-term temporal validation of putative biomarkers for COPD

    Breath analysis holds promise as a non-invasive technique for diagnosing diverse respiratory conditions, including COPD and lung cancer. Breath contains small metabolites that may be putative biomarkers of these conditions. However, discovering reliable biomarkers is a considerable challenge in the presence of both clinical and instrumental confounding factors. Among the latter, instrumental time drifts are highly relevant, as they call into question the short- and long-term validity of predictive models. In this work, we present a methodology to counter instrumental drifts using information from interleaved blanks, applied to a case study of GC-MS data from breath samples. The proposed method includes feature filtering and additive, multiplicative, and multivariate drift corrections, the latter based on Component Correction. Biomarker discovery was based on Genetic Algorithms in a filter configuration, using Fisher's ratio computed in the Partial Least Squares-Discriminant Analysis (PLS-DA) subspace as the figure of merit. Using our protocol, we found nine peaks that provide a statistically significant area under the ROC curve (AUC) of 0.75 for COPD discrimination. The model was successfully validated with blind samples in a short-term temporal validation. However, an attempt to use the model for patient screening six months later was not successful. This negative result highlights the importance of more rigorous validation when reporting biomarker discovery results.
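Component Correction (CC) is usually described as estimating the dominant drift direction from reference measurements by principal component analysis and projecting that direction out of every sample. A minimal pure-Python sketch, assuming the interleaved blanks serve as the reference set and that only the first component is removed (the abstract does not say how many components were used), might look like this; the function names are hypothetical.

```python
import math


def _centre(rows):
    """Subtract the column means from each row."""
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def _first_loading(rows, iters=200):
    """Leading principal-component loading via power iteration on R^T R."""
    p = len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    v = [1.0] * p  # start vector; assumed not orthogonal to the drift direction
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # blanks show no variation: nothing to remove
            return [0.0] * p
        v = [x / norm for x in w]
    return v


def component_correction(samples, blanks):
    """Remove from each sample its projection onto the blanks' drift direction."""
    p1 = _first_loading(_centre(blanks))
    corrected = []
    for s in samples:
        score = sum(a * b for a, b in zip(s, p1))
        corrected.append([a - score * b for a, b in zip(s, p1)])
    return corrected
```

Each row is assumed to be one run's vector of aligned peak areas. Because the drift direction is learned only from blanks, the class-related variation in the breath samples is left untouched unless it happens to align with the drift.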

    Robust and consistent biomarker candidates identification by a machine learning approach applied to pancreatic ductal adenocarcinoma metastasis.

    Machine learning (ML) plays a crucial role in biomedical research. Nevertheless, it still has limitations in data integration and reproducibility, and robust methods are needed to address these challenges. Pancreatic ductal adenocarcinoma (PDAC), a highly aggressive cancer with low early-detection and survival rates, is used as a case study. PDAC lacks reliable diagnostic biomarkers, and metastatic biomarkers in particular remain an unmet need. In this study, we propose an ML-based approach for discovering disease biomarkers, apply it to identify a composite biomarker candidate for PDAC metastasis, and demonstrate the advantages of harnessing public data resources. We used primary tumour RNA-seq data from five public repositories, pooling samples to maximise statistical power and integrating the data by correcting for technical variance. The data were split into training and validation sets. The training set underwent variable selection via a 10-fold cross-validation process that combined three algorithms in 100 models per fold. Genes found in at least 80% of models and five folds were considered robust and used to build a consensus multivariate model. A random forest model was built from the selected genes on the training set and tested on the validation set. We also assessed the goodness of prediction by recalibrating a model using only the validation data. The biological context and relevance of the signals were explored through enrichment and pathway analyses using QIAGEN Ingenuity Pathway Analysis and GeneMANIA. We developed a pipeline that can detect robust signatures for building composite biomarkers. We tested the pipeline in PDAC, exploiting transcriptomics data from different sources, and proposed a composite biomarker candidate of fifteen consistently selected genes that showed very promising predictive capability. Biological contextualisation revealed links with cancer progression and metastasis, underscoring the genes' potential relevance. 
All code is available on GitHub. This study establishes a robust framework for identifying composite biomarkers across various disease contexts. We demonstrate its potential by proposing a plausible composite biomarker candidate for PDAC metastasis. By reusing data from public repositories, we highlight the sustainability of our research and the wider applications of our pipeline. The preliminary findings point to a promising path for validation and application.
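The consensus rule "at least 80% of models and five folds" admits more than one reading. The sketch below assumes one interpretation: a gene is stable within a fold if at least 80% of that fold's models select it, and robust overall if it is stable in at least five of the ten folds. The function name and the data layout (one collection of selected genes per model) are assumptions for illustration; the authors' GitHub code is the reference for their actual rule.

```python
from collections import Counter


def robust_genes(folds, model_frac=0.8, min_folds=5):
    """Consensus gene selection across cross-validation folds.

    folds: one entry per fold; each entry is a list with one collection of
    selected gene names per model fitted in that fold.
    A gene is stable in a fold if selected by >= model_frac of its models,
    and robust if stable in >= min_folds folds.
    """
    stable_in = Counter()
    for models in folds:
        # count each gene at most once per model
        counts = Counter(g for genes in models for g in set(genes))
        stable_in.update(g for g, c in counts.items()
                         if c / len(models) >= model_frac)
    return sorted(g for g, k in stable_in.items() if k >= min_folds)
```

Keeping both thresholds as parameters makes it easy to test how sensitive the final signature is to the chosen consensus rule.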

    Cross-Tissue Transcriptomic Analysis Leveraging Machine Learning Approaches Identifies New Biomarkers for Rheumatoid Arthritis

    There is an urgent need to identify biomarkers for diagnosis and disease activity monitoring in rheumatoid arthritis (RA). We leveraged publicly available microarray gene expression data in the NCBI GEO database for whole blood (N=1,885) and synovial (N=284) tissues from RA patients and healthy controls. We developed a robust machine learning feature selection pipeline, with validation on five independent datasets, that culminated in 13 genes (TNFAIP6, S100A8, TNFSF10, DRAM1, LY96, QPCT, KYNU, ENTPD1, CLIC1, ATP6V0E1, HSP90AB1, NCL, and CIRBP) defining the RA score. The score demonstrates clinical utility: it tracks disease activity as measured by DAS28 (p = 7e-9), distinguishes osteoarthritis (OA) from RA (OR 0.57, p = 8e-10) and polyJIA from healthy controls (OR 1.15, p = 2e-4), and monitors treatment effect in RA (p = 2e-4). Finally, immunoblotting analysis of six proteins in an independent cohort confirmed two of them, TNFAIP6/TSG6 and HSP90AB1/HSP90.
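The abstract does not describe how the 13 genes are combined into the RA score. Purely as an illustration, the sketch below assumes a simple signed mean of per-gene z-scores, with the direction of change for each gene supplied by the user, and uses a Pearson correlation to check whether such a score tracks a clinical measure such as DAS28. The scoring rule and the function names are hypothetical, not the authors' method.

```python
import math

# The 13 genes named in the abstract.
RA_GENES = ["TNFAIP6", "S100A8", "TNFSF10", "DRAM1", "LY96", "QPCT", "KYNU",
            "ENTPD1", "CLIC1", "ATP6V0E1", "HSP90AB1", "NCL", "CIRBP"]


def zscores(values):
    """Standardise one gene's expression across samples (sample SD)."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]


def signed_mean_score(expr, directions):
    """Per-sample score: mean of direction * z-score over available genes.

    expr: gene -> expression values across samples (same sample order).
    directions: gene -> +1 (up in disease) or -1 (down); hypothetical input.
    """
    genes = [g for g in directions if g in expr]
    z = {g: zscores(expr[g]) for g in genes}
    n = len(expr[genes[0]])
    return [sum(directions[g] * z[g][i] for g in genes) / len(genes)
            for i in range(n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

To score the RA signature, `directions` would map each name in `RA_GENES` to its direction of change, which has to be taken from the study itself rather than guessed.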

    Non-Coding RNAs in the Brain-Heart Axis: The Case of Parkinson’s Disease

    Parkinson’s disease (PD) is a complex and heterogeneous disorder involving multiple genetic and environmental influences. Although a wide range of PD risk factors and clinical markers for the symptomatic motor stage of the disease have been identified, there are still no reliable biomarkers for the early pre-motor phase of PD or for predicting disease progression. High-throughput RNA-based biomarker profiling and modeling may provide a means to exploit the joint information content of a multitude of markers to derive diagnostic and prognostic signatures. No clinically validated RNA-based biomarker models for PD are currently available, but previous studies have reported several significant disease-associated changes in RNA abundances and activities in multiple human tissues and body fluids. Here, we review the current knowledge of the regulation and function of non-coding RNAs in PD, focusing on microRNAs, long non-coding RNAs, and circular RNAs. Since there is growing evidence of functional interactions between the heart and the brain, we discuss the benefits of studying the role of non-coding RNAs in organ interactions when deciphering the complex regulatory networks involved in PD progression. Finally, we review important concepts in the harmonization and curation of high-throughput datasets, and we discuss the potential of systems biomedicine to derive and evaluate RNA biomarker signatures from high-throughput expression data.
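The review does not prescribe a particular harmonization method. One widely used step when combining expression profiles is quantile normalization, which maps every sample onto a common reference distribution (the mean of the sorted samples). The minimal sketch below illustrates the idea under the assumption that ties are ranked by position rather than averaged, as fuller implementations usually do; the function name is hypothetical.

```python
def quantile_normalize(samples):
    """Map every sample onto the mean of the sorted samples.

    samples: list of samples, each a list of feature values of equal length.
    Ties are ranked by position (sorted() is stable), not averaged.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # reference distribution: mean value at each rank across samples
    reference = [sum(s[r] for s in sorted_samples) / len(samples)
                 for r in range(n)]
    result = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # feature indices by rank
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = reference[rank]
        result.append(normalized)
    return result
```

After normalization, every sample has an identical value distribution while each feature keeps its within-sample rank, which removes global intensity differences but not batch effects that alter ranks.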

    INTEGRATIVE SYSTEM BIOLOGY STUDIES ON HIGH THROUGHPUT GENOMICS AND PROTEOMICS DATASET

    Indiana University-Purdue University Indianapolis (IUPUI). The post-genomic era has led us to view biological systems as complex networks of interacting genes, proteins, and small molecules that give rise to biological form and function. The past decade has seen the advent of a number of new technologies designed to study biological systems on a genome-wide scale. These technologies offer insight into the activity of thousands of genes and proteins in a cell and have thereby changed the conventional reductionist view of these systems. However, the deluge of data exceeds researchers' analytical and critical capacities and therefore demands new computational methods. The challenge no longer lies in acquiring expression profiles, but in interpreting the results to gain insight into biological mechanisms. In three case studies, we applied various systems biology techniques to publicly available and in-house genomics and proteomics datasets to identify sub-network signatures. In the first study, we integrated prior knowledge from gene signatures, GSEA, and gene/protein network modeling to identify pathways involved in colorectal cancer. In the second, we identified plasma-based network signatures for Alzheimer's disease by combining various feature selection and classification approaches. In the final study, we performed an integrated miRNA-mRNA analysis to identify the role of myeloid-derived suppressor cells (MDSCs) in T-cell suppression.
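GSEA, used in the first case study, scores a gene set by walking down a ranked gene list with a running sum that rises at gene-set members and falls at non-members; the enrichment score is the running sum's maximum deviation from zero. The sketch below follows the weighted running-sum statistic of the original GSEA method, with the weight exponent `p` defaulting to 1. It omits the permutation test and normalization that a full analysis requires, and the function name is hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score.

    ranked: list of (gene, statistic) pairs sorted by statistic, descending.
    gene_set: set of gene names. Returns the signed maximum deviation.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(stat) ** p for (_, stat), hit in zip(ranked, hits) if hit)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must hit non-zero statistics and leave genes out")
    running, best = 0.0, 0.0
    for (_, stat), hit in zip(ranked, hits):
        # step up by the member's weight, or down by a uniform miss penalty
        running += abs(stat) ** p / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score means the set's members cluster near the top of the ranking; a negative score means they cluster near the bottom.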