1,966 research outputs found
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On one hand, Big Data hold great promise for discovering subtle
population patterns and heterogeneities that are not possible with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottlenecks, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinctive and require new computational and statistical paradigms. This
article gives an overview of the salient features of Big Data and how these
features drive paradigm change in statistical and computational methods as
well as computing architectures. We also provide various new perspectives on
Big Data analysis and computation. In particular, we emphasize the
viability of the sparsest solution in a high-confidence set and point out that
the exogeneity assumptions in most statistical methods for Big Data cannot be
validated due to incidental endogeneity. They can lead to wrong statistical
inferences and consequently wrong scientific conclusions.
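The spurious-correlation challenge named in this abstract can be illustrated with a small simulation. The sketch below is not taken from the article: the function names, sample sizes, and Gaussian noise model are assumptions chosen for illustration. It draws a response and a set of features that are independent of it by construction, then reports the largest absolute sample correlation, which tends to grow as the number of features increases.

```python
import random
import statistics


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_spurious_correlation(n_samples, n_features, rng):
    """Largest |correlation| between a response and independent noise features.

    Hypothetical helper: every feature is pure Gaussian noise, so any
    correlation with the response is spurious by construction.
    """
    y = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(x, y)))
    return best


if __name__ == "__main__":
    # Assumed settings: 50 samples, increasing numbers of noise features.
    for p in (10, 100, 1000):
        rng = random.Random(0)
        print(p, round(max_spurious_correlation(50, p, rng), 3))
```

Because each run reuses the same seed, the features examined for a smaller feature count are a prefix of those examined for a larger one, so the reported maximum can only stay the same or increase as more features are added.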
Computational solutions for omics data
High-throughput experimental technologies are generating increasingly massive and complex genomic data sets. The sheer enormity and heterogeneity of these data threaten to make the arising problems computationally infeasible. Fortunately, powerful algorithmic techniques lead to software that can answer important biomedical questions in practice. In this Review, we sample the algorithmic landscape, focusing on state-of-the-art techniques, the understanding of which will aid the bench biologist in analysing omics data. We spotlight specific examples that have facilitated and enriched analyses of sequence, transcriptomic and network data sets.
National Institutes of Health (U.S.) (Grant GM081871)
Precision Medicine Informatics: Principles, Prospects, and Challenges
Precision Medicine (PM) is an emerging approach that appears poised to change
the existing paradigm of medical practice. Recent
advances in technological innovations and genetics, and the growing
availability of health data, have set a new pace for research and impose a
set of new requirements on different stakeholders. To date, some studies
discuss different aspects of PM. Nevertheless, a holistic
representation of those aspects from the technological perspective,
in relation to applications and challenges, is mostly missing. In this context,
this paper surveys advances in PM from an informatics viewpoint and reviews the
enabling tools and techniques in a categorized manner. In addition, the study
discusses how other technological paradigms, including big data, artificial
intelligence, and the internet of things, can be exploited to advance the potential
of PM. Furthermore, the paper provides some guidelines for future research toward
seamless implementation and wide-scale deployment of PM, based on identified
open issues and associated challenges. To this end, the paper proposes an
integrated holistic framework for PM, motivating informatics researchers to
design their relevant research works in an appropriate context.
Comment: 22 pages, 8 figures, 5 tables, journal paper
A study of disease- and tissue-specific DNA methylation-based biomarkers
The electronic version of the dissertation does not include the publications.
DNA contains the genetic information required for the growth and development of the organism. In addition to the nucleotide sequence, certain chemical modifications influence the activity of the DNA. The most studied DNA modification is DNA methylation, where a methyl group is added to the cytosine base of the DNA. DNA is often methylated within a genomic region, forming so-called "methylation patterns." These patterns are involved in the regulation of gene expression, switching genes on and off in certain cells or adjusting their activity. Environmental factors strongly influence DNA methylation: depending on environmental conditions, certain genomic regions may become methylated or, conversely, lose their methyl groups. Thus, DNA methylation serves as a link between genetics and the environment. Many of these patterns are characteristic of normal biological processes. However, some of them indicate the presence of disease. For example, specific methylation patterns have been observed in diabetes, neurological disorders, and cancer. Therefore, methylation patterns are considered good biomarker candidates for characterizing, for example, the progression of certain diseases. This thesis focuses on the study of DNA methylation in different tissues and conditions to identify potential biomarker candidates using various bioinformatics and statistical methods. In total, three published studies were included in this thesis to investigate both tissue-specific and endometriosis-specific biomarker candidates as well as changes in DNA methylation during the transition of the endometrium from the pre-receptive to the receptive state.
In addition, a novel and user-friendly web application, MethSurv, was developed in this thesis. MethSurv uses methylation and clinical data from the publicly available "The Cancer Genome Atlas" (TCGA) and lets users explore the survival of cancer patients according to a specific DNA methylation-based prognostic marker. The tool is aimed at assisting the scientific community in exploring methylation-based prognostic biomarkers.
https://www.ester.ee/record=b522744
Wavelet-Based Cancer Drug Recommender System
The molecular nature of cancer serves as the basis for systematic studies of cancer
genomes, providing valuable insights and enabling the development of
clinical treatments. Above all, these studies are driving the clinical use of
genomic information in choosing treatments that would otherwise not be expected for
patients with various types of cancer, making precision medicine possible.
With this in mind, in this project we combine image processing techniques,
for data enhancement, with recommender systems to propose a personalized
ranking of anticancer drugs. The system is implemented in Python and tested
using a database of drug sensitivity records, with more than
310,000 IC50 values that describe the response of more than 300 anticancer
drugs in 987 cancer cell lines.
After several pre-processing tasks, two experiments are carried out. The first
experiment uses the original DNA microarray images and the second uses the same
images after a wavelet transform. The experiments confirm that
DNA microarray images subjected to wavelet transforms improve the
performance of the recommender system, optimizing the search for cancer cell
lines with a profile similar to that of the new cell line.
In addition, we conclude that DNA microarray images with appropriate wavelet
transforms not only provide richer information for the search for
similar users, but also compress these images efficiently,
optimizing computational resources.
To the best of our knowledge, this project is innovative in its use
of DNA microarray images subjected to wavelet transforms to profile
cell lines in a personalized anticancer drug recommender system.
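The idea of compressing a profile with a wavelet transform before searching for similar cell lines can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the abstract does not name the wavelet family or the similarity measure, so the Haar wavelet, cosine similarity, flattened 1-D profiles, and all names and values below (`haar_step`, `most_similar`, `line_A`, `line_B`) are hypothetical choices.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_approximation(signal, levels):
    """Keep only the approximation coefficients after `levels` Haar steps.

    Each level halves the length, which is the compression effect.
    """
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query, profiles, levels=1):
    """Name of the stored profile whose compressed form is closest to the query."""
    q = haar_approximation(query, levels)
    return max(
        profiles,
        key=lambda name: cosine_similarity(
            q, haar_approximation(profiles[name], levels)
        ),
    )


if __name__ == "__main__":
    # Made-up flattened intensity profiles for two hypothetical cell lines.
    profiles = {
        "line_A": [1.0, 1.2, 5.0, 5.1, 0.9, 1.1, 4.8, 5.2],
        "line_B": [5.0, 4.9, 1.0, 1.1, 5.2, 5.0, 0.8, 1.0],
    }
    new_line = [1.1, 1.0, 4.9, 5.3, 1.0, 0.9, 5.0, 5.0]
    print(most_similar(new_line, profiles, levels=1))
```

In a recommender of this kind, the drugs that worked best on the most similar stored cell lines would then be ranked for the new one. The number of levels trades compression against detail: each extra level halves the vector but can erase the structure that distinguishes profiles.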
EpiGe: A machine-learning strategy for rapid classification of medulloblastoma using PCR-based methyl-genotyping
Molecular classification of medulloblastoma is critical for the treatment of this brain tumor. Array-based DNA methylation profiling has emerged as a powerful approach for brain tumor classification. However, this technology is currently not widely available. We present a machine-learning decision support system (DSS) that enables the classification of the principal molecular groups (WNT, SHH, and non-WNT/non-SHH) directly from quantitative PCR (qPCR) data. We propose a framework where the developed DSS appears as a user-friendly web application, EpiGe-App, that enables automated interpretation of qPCR methylation data and subsequent molecular group prediction. The basis of our classification strategy is a previously validated six-cytosine signature with subgroup-specific methylation profiles. This reduced set of markers enabled us to develop a methyl-genotyping assay capable of determining the methylation status of cytosines using qPCR instruments. This study provides a comprehensive approach for rapid classification of clinically relevant medulloblastoma groups, using readily accessible equipment and an easy-to-use web application.
The study was supported by Associations of Parents and Families of Children with Cancer and by funding from the Spanish Ministry of Science, Innovation and University (grant PI20/00519; PI CL) and the Foundation La Marató TV3 (grant 201921-30; PI CL). We acknowledge the multidisciplinary team who helped in the molecular analyses and care of patients, and the BioBank Hospital Sant Joan de Déu of the Spanish BioBank Network for sample procurement. We also acknowledge Marta Fortuny for communication strategy advice and Eduard Puig for legal assistance and data protection regulations. The authors acknowledge the SJD Fundraising Team.
Peer Reviewed
Article signed by 23 authors: Soledad Gómez-González, Joshua Llano, Marta Garcia, Alicia Garrido-Garcia, Mariona Suñol, Isadora Lemos, Sara Perez-Jaume, Noelia Salvador, Nagore Gene-Olaciregui, Raquel Arnau Galán, Vicente Santa-María, Marta Perez-Somarriba, Alicia Castañeda, José Hinojosa, Ursula Winter, Francisco Barbosa Moreira, Fabiana Lubieniecki, Valeria Vazquez, Jaume Mora, Ofelia Cruz, Andrés Morales La Madrid, Alexandre Perera, Cinzia Lavarino.
Postprint (published version)