4,384 research outputs found

    Computational Models for Transplant Biomarker Discovery.

    Translational medicine holds considerable promise for improved diagnostics and drug discovery in transplantation, a field in which unmet diagnostic and therapeutic needs persist. The recent advent of genomic and proteomic profiling, collectively called "omics", provides new resources for developing novel biomarkers for routine clinical use. Establishing such a marker system depends heavily on the appropriate application of computational algorithms and software, which are in turn based on mathematical theories and models. Understanding these theories helps in choosing appropriate algorithms and so in building successful biomarker systems. Here, we review the key advances in theories and mathematical models relevant to transplant biomarker development. The advantages and limitations inherent in these models are discussed. The principles of key computational approaches for efficiently selecting the best subset of biomarkers from high-dimensional omics data are highlighted. Prediction models are also introduced, and the integration of multi-microarray data is discussed. Appreciating these key advances would help to accelerate the development of clinically reliable biomarker systems.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Machine Learning Approaches for Cancer Analysis

    In addition, we propose several machine learning models, each of which contributes to solving a biological problem. First, we present Zseq, a linear-time method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq scores the complexity of each sequence by counting its unique k-mers, and it also takes into account other factors, such as ambiguous nucleotides or a high GC-content percentage in k-mers. Zseq then sweeps through the sequences again and filters out those whose z-score is below a user-defined threshold. Zseq provides a better mapping rate and significantly reduces the number of ambiguous bases in comparison with other methods. The filtered reads were evaluated by aligning them and assembling the transcripts, both against the reference genome and by de novo assembly. The assembled transcripts separate cancer and normal samples better than those obtained with another state-of-the-art method. Studying the abundance of select mRNA species throughout prostate cancer progression may provide some insight into the molecular mechanisms that advance the disease. In the second contribution of this dissertation, we show that a combination of proper clustering, a distance function, and a cluster validation index is suitable for identifying outlier transcripts, that is, transcripts whose trend differs from that of the majority, where the trend of a transcript is its abundance across the stages of prostate cancer. We compare this model with a standard hierarchical time-series clustering method based on Euclidean distance.
Using time-series profile hierarchical clustering methods, we identified stage-specific mRNA species, termed outlier transcripts, that exhibit unique trending patterns compared with most other transcripts during disease progression. Compared with the hierarchical clustering method based on Euclidean distance, this method identifies those outliers rather than finding patterns among the trending transcripts. A wet-lab experiment on a biomarker (CAM2G gene) confirmed the result of the computational model. Genes related to these outlier transcripts were found to be strongly associated with cancer and, in particular, with prostate cancer. Further investigation of these outlier transcripts in prostate cancer may identify them as potential stage-specific biomarkers that can predict the progression of the disease. Breast cancer, on the other hand, is a widespread cancer in women and accounts for a large share of cancer cases and deaths worldwide. Identifying the subtype of breast cancer plays a crucial role in selecting the best treatment. In the third contribution, we propose an optimized hierarchical classification model that predicts the breast cancer subtype. Suitable filter feature selection methods and new hybrid feature selection methods are used to find discriminative genes. Our proposed model achieves 100% accuracy in predicting breast cancer subtypes using the same number of genes or fewer. Studying breast cancer survivability among patients who received various treatments may help to clarify the relationship between survivability and treatment therapy based on gene expression. In the fourth contribution, we built a classifier system that predicts whether a given breast cancer patient who underwent hormone therapy, radiotherapy, or surgery will survive beyond five years after treatment.
Our classifier is a tree-based hierarchical approach that partitions breast cancer patients based on survivability classes; each node in the tree is associated with a treatment therapy and finds a predictive subset of genes that can best predict whether a given patient will survive after that particular treatment. We applied our tree-based method to a gene expression dataset that consists of 347 treated breast cancer patients and identified potential biomarker subsets with prediction accuracies ranging from 80.9% to 100%. We have further investigated the roles of many biomarkers through the literature. Studying gene expression through various time intervals of breast cancer survival may provide insights into the recovery of the patients. Discovery of gene indicators can be a crucial step in predicting survivability and handling of breast cancer patients. In the fifth contribution, we propose a hierarchical clustering method to separate dissimilar groups of genes in time-series data as outliers. These isolated outliers, genes that trend differently from other genes, can serve as potential biomarkers of breast cancer survivability. In the last contribution, we introduce a method that uses machine learning techniques to identify transcripts that correlate with prostate cancer development and progression. We have isolated transcripts that have the potential to serve as prognostic indicators and may have significant value in guiding treatment decisions. Our study also supports PTGFR, NREP, scaRNA22, DOCK9, FLVCR2, IK2F3, USP13, and CLASP1 as potential biomarkers to predict prostate cancer progression, especially between stage II and subsequent stages of the disease
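The Zseq scoring step described in this abstract (count the unique k-mers of each sequence, convert the counts to z-scores, and drop sequences below a user-defined threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the default `k`, the default threshold, and the choice to simply skip k-mers containing `N` are assumptions made for illustration, and the GC-content adjustment mentioned in the abstract is left out.

```python
from statistics import mean, pstdev


def unique_kmer_count(seq, k):
    """Count distinct k-mers in seq, ignoring k-mers with an ambiguous base (N).

    Skipping N-containing k-mers is an illustrative assumption, not a detail
    taken from the abstract.
    """
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(1 for kmer in kmers if "N" not in kmer)


def zscore_filter(seqs, k=4, threshold=-1.0):
    """Keep sequences whose complexity z-score is at least `threshold`.

    First pass: score every sequence. Second pass: filter by z-score.
    Both passes are linear in the total input length for a fixed k.
    """
    scores = [unique_kmer_count(s, k) for s in seqs]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All sequences are equally complex, so nothing can be filtered.
        return list(seqs)
    return [s for s, score in zip(seqs, scores) if (score - mu) / sigma >= threshold]


if __name__ == "__main__":
    reads = [
        "ACGTTGCAAGCTTAGC",   # diverse sequence
        "AAAAAAAAAAAAAAAA",   # low-complexity repeat
        "ACGGTCATGCAATGCC",   # diverse sequence
        "NNNNNNNNACGTNNNN",   # mostly ambiguous bases
    ]
    for read in zscore_filter(reads, k=4, threshold=0.0):
        print(read)
```

With a threshold of 0.0, only reads that are more complex than average survive, so the homopolymer read and the mostly ambiguous read are dropped in this toy input.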

    Unveiling Novel Glioma Biomarkers through Multi-omics Integration and Classification

    Glioma is currently one of the most prevalent types of primary brain cancer. Given its high level of heterogeneity and its complex biological molecular markers, many efforts have been made to accurately classify the type of glioma in each patient, which in turn is critical to improving early diagnosis and increasing survival. Nonetheless, as a result of fast-growing technological advances in high-throughput sequencing and the evolving molecular understanding of glioma biology, its classification has recently been subject to significant alterations. In this study, multiple glioma omics modalities (including mRNA, DNA methylation, and miRNA) from The Cancer Genome Atlas (TCGA) are integrated, using the revised glioma labels, with a supervised method based on sparse canonical correlation analysis (DIABLO) to discriminate between glioma types. It was possible to find a set of highly correlated features distinguishing glioblastoma from low-grade gliomas (LGG); these features were mainly associated with the disruption of receptor tyrosine kinase signaling pathways and with extracellular matrix organization and remodeling. The discrimination of the LGG types, on the other hand, was characterized primarily by features involved in ubiquitination and DNA transcription processes. Furthermore, several novel glioma biomarkers likely to be helpful in both the diagnosis and the prognosis of patients were identified, including the genes PPP1R8, GPBP1L1, KIAA1614, C14orf23, CCDC77, BVES, EXD3, CD300A and HEPN1. Overall, this classification method discriminated the different TCGA glioma patients with very high performance while seeking common information across multiple data types, ultimately enabling the understanding of essential mechanisms driving glioma heterogeneity and unveiling potential therapeutic targets.

    ENDOMET database – A means to identify novel diagnostic and prognostic tools for endometriosis

    Endometriosis is a common, benign, hormone-dependent inflammatory gynecological disease that affects women of fertile age and has a considerable economic impact on healthcare systems. Symptoms include intense menstrual pain, persistent pelvic pain, and infertility. It is defined by the presence of endometrium-like tissue in ectopic locations outside the uterine cavity and by inflammation in the peritoneal cavity. Endometriosis has a multifactorial etiology that, despite extensive research, is still poorly understood. The diagnostic delay from the onset of the disease to a conclusive diagnosis is 7 to 12 years. There is no known cure, although symptoms can be improved with hormonal medications (which often have multiple side effects and prevent pregnancy) or through surgery, which carries its own risks. Current non-invasive diagnostic tools are not sufficiently dependable, and a definite diagnosis is achieved through laparoscopy or laparotomy. This study was based on two prospective cohorts: the ENDOMET study, which included 137 endometriosis patients scheduled for surgery and 62 healthy women, and the PROENDO study, which included 138 endometriosis patients and 33 healthy women. Our long-term goal was to support the discovery of new tools for efficient diagnosis of endometriosis, as well as tools to further understand the etiology and pathogenesis of the disease. To this end we created a database, EndometDB, based on a relational data model and implemented in PostgreSQL. The database allows, for example, the exploration of global genome-wide expression patterns in the peritoneum, endometrium, and endometriosis lesions of endometriosis patients, as well as in the peritoneum and endometrium of healthy control women of reproductive age.
The data collected in EndometDB were also used to develop and validate a symptom- and biomarker-based predictive model designed for risk evaluation and early prediction of endometriosis without invasive diagnostic methods. Using the data in EndometDB, we discovered that, compared with the eutopic endometrium, the WNT signaling pathway is one of the molecular pathways that undergo strong changes in endometriosis. We then evaluated the potential role of secreted frizzled-related protein 2 (SFRP-2, a WNT signaling pathway modulator) in improving the detection of endometriosis lesion borders. SFRP-2 expression visualizes the lesion better than previously used markers and can be used to better define lesion size and to confirm that surgical excision of the lesions is complete.
Finnish abstract (translated): ENDOMET database: a means to identify a new diagnostic and prognostic tool for endometriosis. Endometriosis is a common, benign, hormone-dependent inflammatory gynecological disease of women of reproductive age that places a significant burden on the healthcare system. Its symptoms include severe menstrual pain, persistent pelvic pain, and infertility. The disease is defined as the presence of endometrium-like tissue outside the uterus together with associated peritoneal inflammation. The etiology of endometriosis is multifaceted and, despite extensive research, still poorly understood. The time from disease onset to a final diagnosis is often as long as 7 to 12 years. No cure is known, but symptoms can be relieved, for example, with hormonal medications (which often have many side effects and prevent pregnancy) or with surgery, which carries its own known risks. Current non-invasive diagnostic tools are not reliable enough to identify the disease, and a definite diagnosis of endometriosis is reached by laparoscopy or laparotomy.
This study was based on two prospective cohorts: the ENDOMET study, with 137 endometriosis patients and 62 healthy women, and the PROENDO study, with 138 endometriosis patients and 33 healthy women. Our long-term goal in this study was to find new tools for diagnosing endometriosis and to understand its etiology and pathogenesis. In the first phase we created the EndometDB database in PostgreSQL. This database, which has been partly released for open use, makes it possible to study genome-wide expression, for example the expression of all known genes, in the peritoneum, the endometrium, and the endometriosis lesions of endometriosis patients. The data collected in EndometDB were used to develop and validate a symptom- and biomarker-based predictive model. The model produces a risk assessment for the early prediction of endometriosis without laparoscopy. Using the EndometDB data, we observed strong gene expression changes in endometriosis tissue, particularly in genes related to the regulation of the WNT signaling pathway. The key finding was that expression of the SFRP-2 protein was markedly elevated in endometriosis tissue and that immunohistochemical staining for SFRP-2 distinguishes diseased endometriosis tissue from healthy tissue better than earlier markers. The method can therefore be used to determine the extent of the diseased tissue and, where needed, to show that surgery has removed all of it.

    MACHINE LEARNING APPROACHES FOR BIOMARKER IDENTIFICATION AND SUBGROUP DISCOVERY FOR POST-TRAUMATIC STRESS DISORDER

    Post-traumatic stress disorder (PTSD) is a psychiatric disorder caused by environmental and genetic factors resulting from alterations in genetic variation, epigenetic changes and neuroimaging characteristics. There is a pressing need to identify reliable molecular and physiological biomarkers for accurate diagnosis, prognosis, and treatment, as well as to deepen the understanding of PTSD pathophysiology. Machine learning methods are widely used to infer patterns from biological data, identify biomarkers, and make predictions. The objective of this research is to apply machine learning methods to the accurate classification of human diseases from genome-scale datasets, focusing primarily on PTSD. The DoD-funded Systems Biology of PTSD Consortium has recruited combat veterans with and without PTSD and measured molecular and physiological data from blood or urine samples, with the goal of identifying accurate and specific PTSD biomarkers. As a member of the Consortium with access to these multi-omics PTSD datasets, we first completed a project titled Clinical Subgroup-Specific PTSD Classification and Biomarker Discovery. We applied machine learning approaches to these data to build classification models, consisting of molecular and clinical features, that predict PTSD status. We also identified candidate biomarkers for diagnosis, which improves our understanding of PTSD pathogenesis. In a second project, entitled Multi-Omic PTSD Subgroup Identification and Clinical Characterization, we applied methods for integrating multiple omics datasets to investigate the complex, multivariate nature of the biological systems underlying PTSD. Using two different machine learning approaches on 82 PTSD-positive samples, we identified two as the optimal number of PTSD subgroups, and we found that the subgroups exhibited different remitting behavior, as inferred from subjects recalled at a later time point.
The results from our association, differential expression, and classification analyses demonstrated the distinct clinical and molecular features characterizing these subgroups. Taken together, our work has advanced the understanding of PTSD biomarkers and subgroups through the use of machine learning approaches. Results from our work should contribute strongly to the precise diagnosis and eventual treatment of PTSD, as well as of other diseases. Future work will continue to leverage these results to enable precision medicine for PTSD.

    Genomic insights and advanced machine learning: characterizing autism spectrum disorder biomarkers and genetic interactions

    Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition characterized by altered brain connectivity and function. In this study, we employed advanced bioinformatics and explainable AI to analyze gene expression associated with ASD, using data from five GEO datasets. Among 351 neurotypical controls and 358 individuals with autism, we identified 3,339 Differentially Expressed Genes (DEGs) with an adjusted p-value ≤ 0.05. A subsequent meta-analysis pinpointed 342 DEGs (adjusted p-value ≤ 0.001), including 19 upregulated and 10 downregulated genes across all datasets. Shared genes, pathogenic single nucleotide polymorphisms (SNPs), chromosomal positions, and their impact on biological pathways were examined. Through text mining, we identified potential biomarkers (HOXB3, NR2F2, MAPK8IP3, PIGT, SEMA4D, and SSH1) that merit further investigation. Additionally, we shed light on the roles of the RPS4Y1 and KDM5D genes in neurogenesis and neurodevelopment. Our analysis detected 1,286 SNPs linked to ASD-related conditions, of which 14 high-risk SNPs were located on chromosomes 10 and X. We highlighted potential missense SNPs associated with FGFR inhibitors, suggesting that they may serve as promising biomarkers of responsiveness to targeted therapies. Our explainable AI model identified the MID2 gene as a potential ASD biomarker. This research unveils vital genes and potential biomarkers, providing a foundation for novel gene discovery in complex diseases.
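The abstract selects DEGs by an adjusted p-value cutoff but does not say which adjustment was used. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure, a common choice for this kind of analysis; the function name, the gene labels, and the p-values are all hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each raw p-value is scaled by m / rank (rank 1 = smallest p-value),
    then a running minimum from the largest rank downward enforces
    monotonicity of the adjusted values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical gene labels and raw p-values, for illustration only.
    raw = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039, "geneD": 0.041, "geneE": 0.6}
    adjusted = benjamini_hochberg(list(raw.values()))
    for gene, p_adj in zip(raw, adjusted):
        status = "DEG" if p_adj <= 0.05 else "not significant"
        print(f"{gene}: adjusted p = {p_adj:.5f} ({status})")
```

In this toy input, geneC and geneD have raw p-values below 0.05 but fall just above the cutoff after adjustment, which is the reason an adjusted threshold yields fewer DEGs than a raw one.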

    Transcriptomics in Toxicogenomics, Part III: Data Modelling for Risk Assessment

    Transcriptomics data are relevant to address a number of challenges in Toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, the TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, the publicly available omics datasets are constantly increasing together with a plethora of different methods that are made available to facilitate their analysis, interpretation and the generation of accurate and stable predictive models. In this review, we present the state-of-the-art of data modelling applied to transcriptomics data in TGx. We show how the benchmark dose (BMD) analysis can be applied to TGx data. We review read across and adverse outcome pathways (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics
