Search CORE

1,966 research outputs found

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

Challenges of Big Data Analysis

Author: Fan Jianqing
Han Fang
Liu Han
Publication venue: Oxford University Press (OUP)
Publication date: 05/02/2014

Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and of how these features drive a paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. Relying on them can lead to wrong statistical inferences and consequently wrong scientific conclusions.
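
The spurious-correlation challenge named in this abstract can be illustrated with a small simulation. The sketch below is not taken from the article: the function names, sample size, feature count, and seed are all illustrative assumptions. It draws a response and many noise features that are independent of it by construction, then tracks the largest absolute sample correlation seen so far as more features are scanned.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def running_max_correlation(n_samples, n_features, rng):
    """Running maximum of |corr(feature, response)| over independent noise.

    Every feature is drawn independently of the response, so any
    correlation observed here is spurious by construction.
    """
    response = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    running = []
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, response)))
        running.append(best)
    return running


# Illustrative settings (assumptions): 50 samples, up to 2000 noise features.
rng = random.Random(0)
running = running_max_correlation(n_samples=50, n_features=2000, rng=rng)
low_dim = running[9]     # best spurious correlation among the first 10 features
high_dim = running[-1]   # best spurious correlation among all 2000 features
print(f"max |r| after 10 features:   {low_dim:.2f}")
print(f"max |r| after 2000 features: {high_dim:.2f}")
```

Because the running maximum can only grow, the value after 2000 features is never smaller than the value after 10. With few samples and many features, some pure-noise feature will typically look noticeably correlated with the response.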

arXiv.org e-Print Archive

CiteSeerX

Princeton University Open Access Repository

Crossref

PubMed Central

Computational solutions for omics data

Author: Bonnie Berger
Jian Peng
Mona Singh
Publication venue: Springer Science and Business Media LLC
Publication date: 01/04/2013

High-throughput experimental technologies are generating increasingly massive and complex genomic data sets. The sheer enormity and heterogeneity of these data threaten to make the arising problems computationally infeasible. Fortunately, powerful algorithmic techniques lead to software that can answer important biomedical questions in practice. In this Review, we sample the algorithmic landscape, focusing on state-of-the-art techniques, the understanding of which will aid the bench biologist in analysing omics data. We spotlight specific examples that have facilitated and enriched analyses of sequence, transcriptomic and network data sets. National Institutes of Health (U.S.) (Grant GM081871)

Princeton University Open Access Repository

DSpace@MIT

Crossref

PubMed Central

Precision Medicine Informatics: Principles, Prospects, and Challenges

Author: Afzal Muhammad
Hussain Maqbool
Islam S. M. Riazul
Lee Sungyoung
Publication date: 03/11/2019

Precision Medicine (PM) is an emerging approach that promises to change the existing paradigm of medical practice. Recent advances in technology and genetics, together with the growing availability of health data, have set a new pace for research and impose a set of new requirements on different stakeholders. To date, some studies discuss different aspects of PM. Nevertheless, a holistic representation of those aspects from a technological perspective, in relation to applications and challenges, is mostly missing. In this context, this paper surveys advances in PM from an informatics viewpoint and reviews the enabling tools and techniques in a categorized manner. In addition, the study discusses how other technological paradigms, including big data, artificial intelligence, and the internet of things, can be exploited to advance the potential of PM. Furthermore, the paper provides some guidelines for future research toward seamless implementation and wide-scale deployment of PM, based on identified open issues and associated challenges. To this end, the paper proposes an integrated holistic framework for PM, motivating informatics researchers to design their research in an appropriate context. Comment: 22 pages, 8 figures, 5 tables, journal paper

arXiv.org e-Print Archive

Aberdeen University Research

Investigation of disease- and tissue-specific biomarkers based on DNA methylation

Author: Modhukur Vijayachitra
Publication date: 29/04/2019

The electronic version of the dissertation does not include the publications. DNA contains the genetic information required for the growth and development of an organism. In addition to the nucleotide sequence, certain chemical modifications influence the activity of the DNA. The most studied DNA modification is DNA methylation, in which a methyl group is added to the cytosine base. DNA is often methylated across a genomic region, forming so-called "methylation patterns." These patterns are involved in the regulation of gene expression, switching genes on and off in certain cells or adjusting their activity. Environmental factors strongly influence DNA methylation: depending on environmental conditions, certain genomic regions may gain or lose methyl groups. Thus, DNA methylation serves as a link between genetics and the environment. Many of these patterns are characteristic of normal biological processes, but some indicate the presence of disease. For example, specific methylation patterns have been observed in diabetes, neurological disorders, and cancer. Methylation patterns are therefore considered good biomarker candidates, for example to characterize the progression of certain diseases. This thesis focuses on the study of DNA methylation in different tissues and conditions to identify potential biomarker candidates using various bioinformatics and statistical methods. In total, three published studies were included to investigate both tissue-specific and endometriosis-specific biomarker candidates, as well as changes in DNA methylation during the transition of the endometrium from the pre-receptive to the receptive state. In addition, a novel and user-friendly web application, MethSurv, was developed as part of this thesis. MethSurv uses methylation and clinical data from the publicly available "The Cancer Genome Atlas" (TCGA) and lets users explore the survival of cancer patients based on a specific DNA methylation-based prognostic marker. The tool is aimed at assisting the scientific community in exploring methylation-based prognostic biomarkers.
https://www.ester.ee/record=b522744

DSpace at Tartu University Library

THE BIOLOGY OF GENOMES

Author: Celniker S.
Clark A.
Ponting C.
Weinstock G.
Publication date: 01/05/2010

Cold Spring Harbor Laboratory Institutional Repository

Wavelet-Based Cancer Drug Recommender System

Author: Brandão Liliana Carina Pereira
Publication date: 01/01/2020

The molecular nature of cancer underpins systematic studies of cancer genomes, providing valuable insights and enabling the development of clinical treatments. Above all, these studies are driving the clinical use of genomic information to choose treatments that would otherwise not be expected for patients with various types of cancer, making precision medicine possible. With this in mind, in this project we combine image processing techniques, for data enhancement, with recommender systems to propose a personalized ranking of anticancer drugs. The system is implemented in Python and tested using a database of drug sensitivity records, with more than 310,000 IC50 values that describe the response of more than 300 anticancer drugs across 987 cancer cell lines. After several preprocessing tasks, two experiments are performed. The first uses the original DNA microarray images, and the second uses the same images after a wavelet transform. The experiments confirm that wavelet-transformed DNA microarray images improve the performance of the recommender system, optimizing the search for cancer cell lines with a profile similar to that of the new cell line. In addition, we conclude that DNA microarray images with appropriate wavelet transforms not only provide richer information for the search for similar users but also compress these images efficiently, optimizing computational resources. To the best of our knowledge, this project is innovative in its use of wavelet-transformed DNA microarray images to profile cell lines in a personalized anticancer drug recommender system.
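
The idea of profiling cell lines by wavelet-transformed images and then searching for the most similar known profile can be sketched in a few lines. The abstract does not say which wavelet or similarity measure the thesis uses, so the Haar wavelet, the cosine similarity, the function names, and the toy 4x4 "images" below are all assumptions made for illustration.

```python
import math


def haar_approximation(image):
    """Approximation (LL) subband of a one-level 2D Haar transform.

    `image` is a list of equal-length rows with even height and width.
    With orthonormal scaling, each 2x2 block collapses to the sum of its
    four pixels divided by 2, which quarters the number of values.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def flatten(matrix):
    """Turn a 2D list into a flat feature vector."""
    return [value for row in matrix for value in row]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(query, known):
    """Name of the known profile whose compressed image best matches `query`."""
    q = flatten(haar_approximation(query))
    return max(
        known,
        key=lambda name: cosine_similarity(
            q, flatten(haar_approximation(known[name]))
        ),
    )


# Hypothetical toy data standing in for microarray images of known cell lines.
known = {
    "line_A": [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]],
    "line_B": [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]],
}
query = [[0.9, 1.1, 0, 0.1], [1, 1, 0, 0], [0, 0.1, 1, 0.9], [0, 0, 1, 1]]
nearest = most_similar(query, known)
print(nearest)
```

In a recommender of the kind the abstract describes, the drug ranking recorded for the nearest known cell line would then serve as the starting point for the new one. That step is omitted here because the abstract gives no details about it.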

Repositório Comum

EpiGe: A machine-learning strategy for rapid classification of medulloblastoma using PCR-based methyl-genotyping

Author: Arnau Galán Raquel
García López Marta
Garrido Garcia Alicia
Gene Olaciregui Nagore
Gómez González Soledad
Lavarino Cinzia
Lemos Isadora
Llano Viles Joshua
Mora Graupera Jaume
Morales Andrés
Perera Lluna Alexandre
Perez Jaume Sara
Perez Somarriba Marta
Salvador Marcos Noelia
Santa María López Vicente
Suñol Capella Mariona
Publication venue: Elsevier
Publication date: 15/09/2023

Molecular classification of medulloblastoma is critical for the treatment of this brain tumor. Array-based DNA methylation profiling has emerged as a powerful approach for brain tumor classification. However, this technology is currently not widely available. We present a machine-learning decision support system (DSS) that enables the classification of the principal molecular groups (WNT, SHH, and non-WNT/non-SHH) directly from quantitative PCR (qPCR) data. We propose a framework where the developed DSS appears as a user-friendly web application, EpiGe-App, that enables automated interpretation of qPCR methylation data and subsequent molecular group prediction. The basis of our classification strategy is a previously validated six-cytosine signature with subgroup-specific methylation profiles. This reduced set of markers enabled us to develop a methyl-genotyping assay capable of determining the methylation status of cytosines using qPCR instruments. This study provides a comprehensive approach for rapid classification of clinically relevant medulloblastoma groups, using readily accessible equipment and an easy-to-use web application.
The study was supported by Associations of Parents and Families of Children with Cancer and by funding of the Spanish Ministry of Science, Innovation and University (grant PI20/00519; PI CL) and the Foundation La Marató TV3 (grant 201921-30; PI CL). We acknowledge the multidisciplinary team who helped in the molecular analyses and care of patients, and the BioBank Hospital Sant Joan de Déu of the Spanish BioBank Network for sample procurement. We also acknowledge Marta Fortuny for communication strategy advice and Eduard Puig for legal assistance and data protection regulations. Authors acknowledge the SJD Fundraising Team.
Peer Reviewed
Article signed by 23 authors: Soledad Gómez-González, Joshua Llano, Marta Garcia, Alicia Garrido-Garcia, Mariona Suñol, Isadora Lemos, Sara Perez-Jaume, Noelia Salvador, Nagore Gene-Olaciregui, Raquel Arnau Galán, Vicente Santa-María, Marta Perez-Somarriba, Alicia Castañeda, José Hinojosa, Ursula Winter, Francisco Barbosa Moreira, Fabiana Lubieniecki, Valeria Vazquez, Jaume Mora, Ofelia Cruz, Andrés Morales La Madrid, Alexandre Perera, Cinzia Lavarino.
Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Stable expansion of high-grade serous ovarian cancer organoids requires a low-Wnt environment

Author: Berger H.
Braicu E.
Chekerov R.
Darb-Esfahani S.
Hoffmann K.
Kessler M.
Kulbe H.
Mangler M.
Meyer T.
Mollenkopf H.
Sehouli J.
Taube E.
Thillainadarasan S.
Zemojtel T.
Publication venue: EMBO
Publication date: 03/02/2020

MPG.PuRe