8,333 research outputs found

A System for Accessible Artificial Intelligence

Author: D. A. Ferrucci
EM Ronald
F Pedregosa
Ignacio Arnaldo
Jason H. Moore
JH Moore
Karthik Kannappan
Randal S. Olson
RS Olson
Sara Silva
William La Cava
Publication date: 10/08/2017

While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers or the general public because using them requires deep knowledge of the systems. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects. Comment: 14 pages, 5 figures, submitted to Genetic Programming Theory and Practice 2017 workshop

arXiv.org e-Print Archive

Crossref

Bioinformatics challenges for genome-wide association studies

Author: Ahmed
Altshuler
Amundadottir
Askland
Bureau
Bush
Calle
Chang
Chanock
Cook
Culverhouse
Donnelly
Easton
Eiberg
Elbers
Emily
F. W. Asselbergs
Greene
Hahn
Hahn
Hirschhorn
Holmans
Infante
J. H. Moore
Jakobsdottir
Kooperberg
Kraft
Lewontin
Lou
Lunetta
Manolio
Manolio
Marchini
McKinney
McKinney
Mei
Millstein
Moore
Moore
Moore
Moore
Moore
Moore
Moore
Moore
Moore
Moore
Motsinger
Namkung
Nelson
Pan
Pattin
Reich
Reif
Ripperger
Ritchie
Ritchie
Ritchie
S. M. Williams
Schork
Sinnott-Armstrong
Spencer
Thornton-Wells
Torkamani
Velez
Wang
Wilke
Williams
Wongseree
Yu
Yu
Zhang
Publication venue: Oxford University Press
Publication date: 15/02/2010

Motivation: The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time thus ignoring their genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype–phenotype relationship that is characterized by significant heterogeneity and gene–gene and gene–environment interaction. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods
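The multiple-testing issue mentioned in this abstract can be illustrated with a small sketch. It assumes the simplest correction, Bonferroni, which is not necessarily the method the review discusses; the function names and the example p-values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Return the indices of p-values that pass the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Testing one million SNPs one at a time at alpha = 0.05 (an assumed level)
# leaves a per-SNP threshold of roughly 5e-8.
genome_wide = bonferroni_threshold(0.05, 1_000_000)
```

The threshold shrinks linearly with the number of tests, which is one reason single-SNP scans mainly recover the strongest associations.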

CiteSeerX

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

PubMed Central

UCL Discovery

Dissertations of the University of Groningen

Toward a Standardized Strategy of Clinical Metabolomics for the Advancement of Precision Medicine

Author: Kang Yun Pyo
Kim Hyung Min
Kwon Sung Won
NGHI TRAN
Nguyen Hoang Anh
Nguyen Phuoc Long
PARK SANG KI
Publication venue: MDPI AG
Publication date: 01/01/2020

Despite the tremendous success of clinical metabolomics, pitfalls have been observed in every step of its workflow, which impede the internal validity of a study. Furthermore, the demand for logistics, instrumentation, and computational resources for metabolic phenotyping studies has far exceeded our expectations. In this conceptual review, we will cover inclusive barriers of a metabolomics-based clinical study and suggest potential solutions in the hope of enhancing study robustness, usability, and transferability. The importance of quality assurance and quality control procedures is discussed, followed by a practical rule containing five phases, including two additional "pre-pre-" and "post-post-" analytical steps. Besides, we will elucidate the potential involvement of machine learning and demonstrate that the need for automated data mining algorithms to improve the quality of future research is undeniable. Consequently, we propose a comprehensive metabolomics framework, along with an appropriate checklist refined from current guidelines and our previously published assessment, in the attempt to accurately translate achievements in metabolomics into clinical and epidemiological research. Furthermore, the integration of multifaceted multi-omics approaches with metabolomics as the pillar member is urgently needed. When combined with other social or nutritional factors, these approaches can yield complete omics profiles for a particular disease. Our discussion reflects the current obstacles and potential solutions toward the progressing trend of utilizing metabolomics in clinical research to create the next-generation healthcare system.

Multidisciplinary Digital Publishing Institute

SNU Open Repository and Archive

Pohang University of Science and Technology (POSTECH)

Exploring Complex Disease Gene Relationships Using Simultaneous Analysis

Author: Romano Joseph D
Sarkar Indra Neil
Tharp William G
Publication venue: UVM ScholarWorks
Publication date: 01/01/2014

The characterization of complex diseases remains a great challenge for biomedical researchers due to the myriad interactions of genetic and environmental factors. Adaptation of phylogenomic techniques to increasingly available genomic data provides an evolutionary perspective that may elucidate important unknown features of complex diseases. Here an automated method is presented that leverages publicly available genomic data and phylogenomic techniques. The approach is tested with nine genes implicated in the development of Alzheimer Disease, a complex neurodegenerative syndrome. The developed technique, an update to a previously described Perl script called “ASAP” that is implemented as a suite of Ruby scripts entitled “ASAP2,” first compiles a list of sequence-similarity based orthologues using PSI-BLAST and a recursive NCBI BLAST+ search strategy, then constructs maximum parsimony phylogenetic trees for each set of nucleotide and protein sequences, and finally calculates phylogenetic metrics (partitioned Bremer support values, combined branch scores, and Robinson-Foulds distance) to provide an empirical assessment of evolutionary conservation within a given genetic network. This study demonstrates the potential for using automated simultaneous phylogenetic analysis to uncover previously unknown relationships among disease-associated genes that may not have been apparent using traditional, single-gene methods. Furthermore, the results provide the first integrated evolutionary history of an Alzheimer Disease gene network and identify potentially important co-evolutionary clustering around components of oxidative stress pathways.
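Of the metrics named in this abstract, the Robinson-Foulds distance is the simplest to sketch. The example below is not ASAP2's code: it assumes rooted trees written as nested tuples of taxon names and counts clades found in only one tree, whereas the standard definition works on bipartitions of unrooted trees. The taxon names and both helper functions are hypothetical.

```python
def clades(tree):
    """Collect the leaf set under every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):            # a leaf is a plain taxon name
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    root = leaves(tree)
    found.discard(root)                      # every tree on these taxa shares the root clade
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Two hypothetical gene trees over the same four taxa.
tree_a = ((("human", "chimp"), "mouse"), "chicken")
tree_b = ((("human", "mouse"), "chimp"), "chicken")
distance = robinson_foulds(tree_a, tree_b)   # 2: one unshared clade in each tree
```

A distance of zero means the two trees imply the same groupings; larger values indicate more disagreement between, for example, the nucleotide and protein trees of one gene.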

UVM ScholarWorks

Population gene introgression and high genome plasticity for the zoonotic pathogen Streptococcus agalactiae

Author: Abbott
Abby
Almeida
Baily
Bankevich
Beerli
Beerli
Benjamini
Bertels
Bikard
Bisharat
Bishop
Bohnsack
Borchardt
Brett M Probert
Brochet
Bruen
Brynildsrud
Capella-Gutierrez
Chen
Chen
Cheng
Chiara Crestani
Chopra
Christopher D Town
Conrad
Croucher
Da Cunha
Delannoy
Delannoy
Dogan
Edgar
Enright
Erwin
Fernandez
Ferreira
Flores
Fluegge
Garrett H Springer
Gauthier
Glazko
Glazko
Greig
Guglielmini
Gupta
Hayley B Hassler
Heaps
Holt
Imperi
Inouye
Irina M Velsko
Jafar
Jaskowiak
Jeukens
Johri
Jones
Jones
Jorgensen
Joubrel
Kalimuddin
Kim
Konig
Langdon
Librado
Lin
Lindahl
Liu
Liu
Lopez-Sanchez
Loytynoja
Lyhs
Manning
Manning
Martins
Marttinen
Mather
McArthur
Md Tauqeer Alam
Michael J Stanhope
Morse
Murrell
Page
Pal
Paulina D Pavinski Bitar
Pedersen
Petrovska
Pond
Poyart
Price
Qin
Richards
Richards
Richards
Rosinski-Chupin
Ruth N Zadoks
Sahl
Sahl
Sahl
Scheffer
Schrieber
Seemann
Shannon D Manning
Shapiro
Shepheard
Sheppard
Spoor
Springman
Srivastava
Stamatakis
Stoddard
Sukhnanand
Supek
Tettelin
Tettelin
Tian
van der Mee-Marquet
Verani
Viana
Vincent P Richards
Yu
Zadoks
Zankari
Zerbino
Zhang
Zhu
Publication venue: Oxford University Press (OUP)
Publication date: 01/11/2019

The influence that bacterial adaptation (or niche partitioning) within species has on gene spillover and transmission among bacteria populations occupying different niches is not well understood. Streptococcus agalactiae is an important bacterial pathogen that has a taxonomically diverse host range making it an excellent model system to study these processes. Here we analyze a global set of 901 genome sequences from nine diverse host species to advance our understanding of these processes. Bayesian clustering analysis delineated twelve major populations that closely aligned with niches. Comparative genomics revealed extensive gene gain/loss among populations and a large pan-genome of 9,527 genes, which remained open and was strongly partitioned among niches. As a result, the biochemical characteristics of eleven populations were highly distinctive (significantly enriched). Positive selection was detected and biochemical characteristics of the dispensable genes under selection were enriched in ten populations. Despite the strong gene partitioning, phylogenomics detected gene spillover. In particular, tetracycline resistance (which likely evolved in the human-associated population) spilled over from humans to bovines, canines, seals, and fish, demonstrating how a gene selected in one host can ultimately be transmitted into another. Biased transmission from humans to bovines was confirmed with a Bayesian migration analysis. Our findings show high bacterial genome plasticity acting in balance with selection pressure from distinct functional requirements of niches that is associated with an extensive and highly partitioned dispensable genome, likely facilitating continued and expansive adaptation.

Crossref

Enlighten

MPG.PuRe

Novel Bayesian Networks for Genomic Prediction of Developmental Traits in Biomass Sorghum.

Author: Brown Patrick J
Buckler Edward S
Dos Santos Jhonathan PR
Fernandes Samuel B
Garcia Antonio AF
Gore Michael A
Leakey Andrew DB
Lozano Roberto
McCoy Scott
Publication venue: eScholarship, University of California
Publication date: 01/02/2020

The ability to connect genetic information between traits over time allows Bayesian networks to offer a powerful probabilistic framework to construct genomic prediction models. In this study, we phenotyped a diversity panel of 869 biomass sorghum (Sorghum bicolor (L.) Moench) lines, which had been genotyped with 100,435 SNP markers, for plant height (PH) with biweekly measurements from 30 to 120 days after planting (DAP) and for end-of-season dry biomass yield (DBY) in four environments. We evaluated five genomic prediction models: Bayesian network (BN), Pleiotropic Bayesian network (PBN), Dynamic Bayesian network (DBN), multi-trait GBLUP (MTr-GBLUP), and multi-time GBLUP (MTi-GBLUP) models. In fivefold cross-validation, prediction accuracies ranged from 0.46 (PBN) to 0.49 (MTr-GBLUP) for DBY and from 0.47 (DBN, DAP120) to 0.75 (MTi-GBLUP, DAP60) for PH. Forward-chaining cross-validation further improved prediction accuracies of the DBN, MTi-GBLUP and MTr-GBLUP models for PH (training slice: 30-45 DAP) by 36.4-52.4% relative to the BN and PBN models. Coincidence indices (target: biomass, secondary: PH) and a coincidence index based on lines (PH time series) showed that the ranking of lines by PH changed minimally after 45 DAP. These results suggest a two-level indirect selection method for PH at harvest (first-level target trait) and DBY (second-level target trait) could be conducted earlier in the season based on ranking of lines by PH at 45 DAP (secondary trait). With the advance of high-throughput phenotyping technologies, our proposed two-level indirect selection framework could be valuable for enhancing genetic gain per unit of time when selecting on developmental traits.
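Forward-chaining cross-validation, as used in this abstract, trains only on earlier time points and tests on later ones. The sketch below shows one common form of the split. It is an assumption rather than the study's exact scheme, the 15-day spacing of measurement days is inferred from the DAP values quoted above, and `forward_chaining_splits` is a hypothetical helper.

```python
def forward_chaining_splits(time_points, min_train=2):
    """Yield (train, test) pairs in which every training time point precedes the test point."""
    ordered = sorted(time_points)
    for end in range(min_train, len(ordered)):
        yield ordered[:end], ordered[end]


# Assumed measurement days (DAP) for plant height, 30 through 120.
measurement_days = [30, 45, 60, 75, 90, 105, 120]
splits = list(forward_chaining_splits(measurement_days))
# The first split trains on days 30 and 45 and tests on day 60;
# each later split adds one more day to the training slice.
```

Unlike shuffled k-fold splitting, this never lets a model see a later measurement while predicting an earlier one, which matches the goal of selecting lines early in the season.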

Directory of Open Access Journals

eScholarship - University of California

Genomic and transcriptomic approaches toward plant selection (Enfoques genómicos y transcriptómicos hacia la selección de plantas)

Author: Jazayeri Seyed Mehdi
Torres Ronald Villamar
Publication venue: Universidad Tecnica de Babahoyo
Publication date: 01/12/2017

The omics era has opened a new window onto biology. Genomics and transcriptomics are two well-known fields through which plant selection and breeding can be studied more easily and accurately. They provide useful information about genes, transcripts, and their functions, which serve as the principal data for subsequent approaches. Reference genomes of various plants are available and facilitate genome-based studies. The combination of genomic and transcriptomic data with findings from various methods such as QTLs (quantitative trait loci), SNPs (single nucleotide polymorphisms), CNVs (copy number variants), resequencing, and GBS (genotyping-by-sequencing) is extremely important for plant selection in terms of price and time. New workflows routinely mix different approaches based on the genomic/transcriptomic information in their subsequent steps and are validated throughout the process of screening for genotypes that possess an agronomically important desired trait. SNP-Seq, presented here, is a new approach for analyzing plants for selection and screening by sequencing SNPs in various genotypes simultaneously. It can accelerate the cycle of plant selection from genotypes to phenotypes in a reverse engineering way.

Portal de Revistas Científicas de la Universidad Técnica de Babahoyo

Directory of Open Access Journals

Biological data sciences in genome research

Author: Schatz M. C.
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/10/2015

The last 20 years have been a remarkable era for biology and medicine. One of the most significant achievements has been the sequencing of the first human genomes, which has laid the foundation for profound insights into human genetics, the intricacies of regulation and development, and the forces of evolution. Incredibly, as we look into the future over the next 20 years, we see the very real potential for sequencing more than 1 billion genomes, bringing even deeper insight into human genetics as well as the genetics of millions of other species on the planet. Realizing this great potential for medicine and biology, though, will only be achieved through the integration and development of highly scalable computational and quantitative approaches that can keep pace with the rapid improvements to biotechnology. In this perspective, I aim to chart out these future technologies, anticipate the major themes of research, and call out the challenges ahead. One of the largest shifts will be in the training used to prepare the class of 2035 for their highly interdisciplinary world

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

Updates in metabolomics tools and resources: 2014-2015

Author: Misra Biswapriya B.
van der Hooft Justin
Publication venue: Wiley
Publication date: 01/01/2016

Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources (tools, software, and databases) is currently lacking. Thus, here we provide an overview of freely available, open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and research for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview Table.

Enlighten

Scalable Feature Selection Applications for Genome-Wide Association Studies of Complex Diseases

Author: Okser Sebastian
Publication venue: Turku Centre for Computer Science
Publication date: 19/08/2015

Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining this with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to effectively create predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated to be not only effective at predicting the disease phenotypes, but also efficient through the use of computational shortcuts. While much of the work could be run on high-end desktops, some work was further extended so that it could be implemented on parallel computers, helping to assure that the algorithms will also scale to the NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm.
It was shown that there is no universally optimal algorithm for variant selection in GWAS, but rather methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can result in overly optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype–phenotype relationships and biological insights from genetic data sets.
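The nested cross-validation mentioned in this abstract can be sketched in a few lines. This is not the thesis's implementation: the toy 1-D data, the k-nearest-neighbour classifier, and every function name are hypothetical stand-ins. The point is only the structure, in which the hyperparameter is tuned on inner folds and never sees the outer held-out fold.

```python
import random


def k_folds(indices, k):
    """Split a list of indices into k disjoint, roughly equal folds."""
    return [indices[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (1-D feature, 0/1 label)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test points whose label is predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def nested_cv(data, candidates=(1, 3, 5), outer_k=5, inner_k=3, seed=0):
    """Return one held-out accuracy per outer fold; k is chosen on inner folds only."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    outer_scores = []
    for held_out in k_folds(order, outer_k):
        held = set(held_out)
        rest = [i for i in order if i not in held]
        inner = k_folds(rest, inner_k)

        def inner_score(k):
            # Mean validation accuracy of this candidate across the inner folds.
            scores = []
            for val in inner:
                val_set = set(val)
                fit = [data[i] for i in rest if i not in val_set]
                scores.append(accuracy(fit, [data[i] for i in val], k))
            return sum(scores) / len(scores)

        best_k = max(candidates, key=inner_score)
        train = [data[i] for i in rest]
        outer_scores.append(accuracy(train, [data[i] for i in held_out], best_k))
    return outer_scores


# Toy data: 30 points per class, class means 0 and 1 (purely illustrative).
rng = random.Random(1)
data = [(rng.gauss(label, 0.5), label) for label in (0, 1) for _ in range(30)]
scores = nested_cv(data)
```

Reporting the mean of `scores` rather than the best inner-fold accuracy avoids the overly optimistic estimates that arise when the same data both select and evaluate a model.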

UTUPub