
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
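    Class imbalance, one of the five challenges listed in this abstract, is often mitigated by weighting each class inversely to its frequency. The following is a minimal sketch that assumes the "balanced" heuristic w_c = n / (k * n_c), which is the formula scikit-learn documents for class_weight="balanced". The function name and the case/control labels are hypothetical illustrations and are not taken from the review.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Hypothetical labels: 90 controls and 10 cases.
labels = ["control"] * 90 + ["case"] * 10
weights = balanced_class_weights(labels)
print(weights)  # case -> 5.0, control -> about 0.556
```

    With these weights, each class contributes the same total weight (50 here) to a weighted loss. The minority class therefore no longer dominates, or is dominated by, the optimization.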

    Challenges of Big Data Analysis

    Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and how these features drive a paradigm change in statistical and computational methods as well as in computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. Such unverifiable assumptions can lead to wrong statistical inferences and consequently wrong scientific conclusions.
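    The spurious-correlation problem mentioned in this abstract can be reproduced with a small simulation. The response below is independent of every feature, yet the largest absolute sample correlation still grows as the number of features p increases. This is a sketch using NumPy. The sample size, the values of p and the seeds are arbitrary assumptions and are not taken from the article.

```python
import numpy as np

def max_spurious_correlation(n, p, rng):
    """Largest |sample correlation| between a pure-noise response and p pure-noise features."""
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)
    # Standardize (population std) so that the mean of products equals Pearson's r.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    corr = Xs.T @ ys / n
    return np.abs(corr).max()

rng = np.random.default_rng(0)
for p in (10, 1_000, 20_000):
    print(p, round(float(max_spurious_correlation(n=60, p=p, rng=rng)), 3))
```

    With n fixed at 60, the maximum correlation climbs steadily with p even though no true association exists. A naive feature screen would therefore "discover" signals that are pure noise.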

    Precision Medicine Informatics: Principles, Prospects, and Challenges

    Precision Medicine (PM) is an emerging approach that promises to change the existing paradigm of medical practice. Recent technological innovations, advances in genetics, and the growing availability of health data have set a new pace for research and impose a new set of requirements on different stakeholders. To date, several studies have discussed different aspects of PM. Nevertheless, a holistic representation of these aspects from a technological perspective, covering applications and challenges, has been largely overlooked. In this context, this paper surveys advances in PM from an informatics viewpoint and reviews the enabling tools and techniques in a categorized manner. In addition, the study discusses how other technological paradigms, including big data, artificial intelligence, and the internet of things, can be exploited to advance the potential of PM. Furthermore, the paper provides guidelines for future research toward seamless implementation and wide-scale deployment of PM, based on identified open issues and associated challenges. To this end, the paper proposes an integrated holistic framework for PM that motivates informatics researchers to design their research in an appropriate context.
    Comment: 22 pages, 8 figures, 5 tables, journal paper

    Study of disease- and tissue-specific DNA methylation-based biomarkers

    The electronic version of the thesis does not include the publications.
    DNA contains the genetic information required for the growth and development of an organism. In addition to the nucleotide sequence, certain chemical modifications of the DNA influence these processes. The most studied DNA modification is DNA methylation, in which a methyl group is added to the cytosine base. DNA is often methylated across whole genomic regions, forming so-called "methylation patterns". These patterns are involved in the regulation of gene expression, switching genes on and off in certain cells or adjusting their activity. Environmental factors strongly influence DNA methylation: depending on environmental conditions, certain genomic regions may become methylated or, conversely, demethylated. Thus, DNA methylation serves as a mediator between the environment and the genome. Many of these patterns are inherent to normal biological processes; however, some indicate the presence of disease. For example, specific methylation patterns have been observed in diabetes, neurological disorders, and cancer. Therefore, methylation patterns are considered biomarker candidates for characterizing the progression of certain diseases or normal biological processes. This thesis focuses on the study of DNA methylation in different tissues and conditions to identify potential biomarker candidates using various bioinformatics and statistical methods. In total, three published studies were included in this thesis, investigating tissue- and endometriosis-specific biomarker candidates as well as changes in DNA methylation during the transition of the endometrium from the pre-receptive to the receptive state. In addition, a novel and user-friendly web application, MethSurv, was developed in this thesis. MethSurv uses methylation and clinical data from the publicly available "The Cancer Genome Atlas" (TCGA) project and allows users to explore cancer patient survival based on specific DNA methylation-based prognostic markers. The MethSurv tool is aimed at assisting the scientific community in exploring methylation-based prognostic biomarkers.
    https://www.ester.ee/record=b522744
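    A common way to relate a methylation marker to survival is to split patients into high- and low-methylation groups and compare their Kaplan-Meier curves. The sketch below implements the Kaplan-Meier estimator in plain Python. The median-split cutoff, the function names and the toy data are hypothetical illustrations. They do not describe MethSurv's actual implementation and are not TCGA data.

```python
from statistics import median

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Count deaths and removals among all subjects sharing time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, survival))
        at_risk -= removed
    return curve

def split_by_methylation(betas, times, events):
    """Dichotomize patients at the median beta value (hypothetical cutoff)."""
    cut = median(betas)
    high = [(t, e) for b, t, e in zip(betas, times, events) if b > cut]
    low = [(t, e) for b, t, e in zip(betas, times, events) if b <= cut]
    return high, low

# Toy data (hypothetical): beta values, months of follow-up, death indicator.
betas = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
times = [5, 8, 12, 20, 24, 30]
events = [1, 1, 0, 1, 0, 0]
high, low = split_by_methylation(betas, times, events)
print("high:", kaplan_meier(*zip(*high)))
print("low: ", kaplan_meier(*zip(*low)))
```

    In practice, the two curves would be compared with a log-rank test or a Cox proportional-hazards model. Those steps are omitted here to keep the sketch short.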

    Wavelet-Based Cancer Drug Recommender System

    The molecular nature of cancer is the basis for systematic studies of cancer genomes, which provide valuable insights and enable the development of clinical treatments. Above all, these studies are driving the clinical use of genomic information in choosing treatments that would not otherwise have been anticipated, in patients with diverse types of cancer, making precision medicine possible. With this in mind, in this project we combine image processing techniques, used for data enhancement, with recommender systems to propose a personalized ranking of anticancer drugs. The system is implemented in Python and tested on a database of drug sensitivity records containing more than 310,000 IC50 values, which in turn describe the response of more than 300 anticancer drugs in 987 cancer cell lines. After several preprocessing tasks, two experiments are carried out. The first experiment uses the original DNA microarray images, and the second uses the same images after applying a wavelet transform. The experiments confirm that wavelet-transformed DNA microarray images improve the performance of the recommender system by optimizing the search for cancer cell lines whose profile is similar to that of the new cell line. Furthermore, we conclude that DNA microarray images with appropriate wavelet transforms not only provide richer information for the search for similar users but also compress these images efficiently, optimizing computational resources. To the best of our knowledge, this project is novel in its use of wavelet-transformed DNA microarray images to profile cell lines in a personalized anticancer drug recommender system.
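    The pipeline described in this abstract has three steps: wavelet-transform each cell line's microarray image, find the most similar known cell lines, and rank drugs from their recorded IC50 values. It can be sketched roughly as follows. This is a hedged illustration, not the project's code:
    - A single-level 2D Haar transform stands in for the unspecified "appropriate wavelet transforms".
    - Cosine similarity on the approximation coefficients is an assumed similarity measure.
    - All function names are hypothetical.
    - The arrays are random placeholders rather than real images or IC50 records.

```python
import numpy as np

def haar2d_approx(img):
    """One level of a 2D Haar transform, keeping only the approximation (LL) band.

    (a + b + c + d) / 2 over each 2x2 block is the LL coefficient of the
    orthonormal Haar transform; it smooths the image and keeps 1/4 of the values.
    """
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]  # crop to even dimensions
    return (img[0::2, 0::2] + img[0::2, 1::2] + img[1::2, 0::2] + img[1::2, 1::2]) / 2.0

def cosine_similarity(query, matrix):
    """Cosine similarity between one vector and every row of a matrix."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q

def recommend_drugs(new_image, known_images, ic50, k=3):
    """Rank drugs for a new cell line by the mean IC50 of its k most similar cell lines.

    ic50 has shape (n_cell_lines, n_drugs); a lower IC50 means higher sensitivity,
    so drugs are returned in ascending order of predicted IC50.
    """
    query = haar2d_approx(new_image).ravel()
    features = np.stack([haar2d_approx(im).ravel() for im in known_images])
    sims = cosine_similarity(query, features)
    neighbours = np.argsort(sims)[::-1][:k]  # indices of the k most similar lines
    predicted = ic50[neighbours].mean(axis=0)
    return np.argsort(predicted), predicted

rng = np.random.default_rng(42)
known_images = rng.random((20, 64, 64))  # placeholder "microarray images"
ic50 = rng.lognormal(size=(20, 5))       # placeholder IC50s: 20 cell lines x 5 drugs
new_image = known_images[3] + 0.01 * rng.random((64, 64))
ranking, predicted = recommend_drugs(new_image, known_images, ic50)
print("drug indices, most to least potent:", ranking)
```

    Here the cell lines play the role of "users" in a user-based recommender. Weighting neighbours by similarity, or using more decomposition levels, are natural variations on this sketch.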

    EpiGe: A machine-learning strategy for rapid classification of medulloblastoma using PCR-based methyl-genotyping

    Molecular classification of medulloblastoma is critical for the treatment of this brain tumor. Array-based DNA methylation profiling has emerged as a powerful approach for brain tumor classification. However, this technology is currently not widely available. We present a machine-learning decision support system (DSS) that enables the classification of the principal molecular groups (WNT, SHH, and non-WNT/non-SHH) directly from quantitative PCR (qPCR) data. We propose a framework in which the developed DSS is deployed as a user-friendly web application, EpiGe-App, that enables automated interpretation of qPCR methylation data and subsequent molecular group prediction. The basis of our classification strategy is a previously validated six-cytosine signature with subgroup-specific methylation profiles. This reduced set of markers enabled us to develop a methyl-genotyping assay capable of determining the methylation status of cytosines using qPCR instruments. This study provides a comprehensive approach for rapid classification of clinically relevant medulloblastoma groups, using readily accessible equipment and an easy-to-use web application.
    The study was supported by the Associations of Parents and Families of Children with Cancer and by funding from the Spanish Ministry of Science, Innovation and Universities (grant PI20/00519; PI CL) and the Foundation La Marató TV3 (grant 201921-30; PI CL). We acknowledge the multidisciplinary team who helped in the molecular analyses and care of patients, and the BioBank Hospital Sant Joan de Déu of the Spanish BioBank Network for sample procurement. We also acknowledge Marta Fortuny for communication strategy advice and Eduard Puig for legal assistance and data protection regulations.
    The authors acknowledge the SJD Fundraising Team.
    Peer Reviewed
    Article signed by 23 authors: Soledad Gómez-González, Joshua Llano, Marta Garcia, Alicia Garrido-Garcia, Mariona Suñol, Isadora Lemos, Sara Perez-Jaume, Noelia Salvador, Nagore Gene-Olaciregui, Raquel Arnau Galán, Vicente Santa-María, Marta Perez-Somarriba, Alicia Castañeda, José Hinojosa, Ursula Winter, Francisco Barbosa Moreira, Fabiana Lubieniecki, Valeria Vazquez, Jaume Mora, Ofelia Cruz, Andrés Morales La Madrid, Alexandre Perera, Cinzia Lavarino.
    Postprint (published version)