
41 research outputs found

Automating biomedical data science through tree-based pipeline optimization

Author: Andrews Peter C.
Kidd La Creis
Lavender Nicole A.
Moore Jason H.
Olson Randal S.
Urbanowicz Ryan J.
Publication date: 27/01/2016

Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators---such as synthetic feature constructors---that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.
Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceedings

arXiv.org e-Print Archive

Scipedia

Predicting the Difficulty of Pure, Strict, Epistatic Models: Metrics for Simulated Model Selection

Author: Fisher Jonathan M
Kiralis Jeff
Moore Jason H
Urbanowicz Ryan J
Publication venue: Dartmouth Digital Commons
Publication date: 01/09/2012

Background: Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (i.e. number of loci, heritability, and minor allele frequency). Such studies neglect to account for model architecture (i.e. the unique specification and arrangement of penetrance values comprising the genetic model), which alone can influence the detectability of a model. In order to design a simulation study which efficiently takes architecture into account, a reliable metric is needed for model selection. Results: We evaluate three metrics as predictors of relative model detection difficulty derived from previous works: (1) Penetrance table variance (PTV), (2) customized odds ratio (COR), and (3) our own Ease of Detection Measure (EDM), calculated from the penetrance values and respective genotype frequencies of each simulated genetic model. We evaluate the reliability of these metrics across three very different data search algorithms, each with the capacity to detect epistatic interactions. We find that a model’s EDM and COR are each stronger predictors of model detection success than heritability. Conclusions: This study formally identifies and evaluates metrics which quantify model detection difficulty. We utilize these metrics to intelligently select models from a population of potential architectures. This allows for an improved simulation study design which accounts for differences in detection difficulty attributed to model architecture. We implement the calculation and utilization of EDM and COR into GAMETES, an algorithm which rapidly and precisely generates pure, strict, n-locus epistatic models.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

A Classification and Characterization of Two-Locus, Pure, Strict, Epistatic Models for Simulation and Detection

Author: Granizo-Mackenzie Ambrose L. S.
Kiralis Jeff
Moore Jason H
Urbanowicz Ryan J
Publication venue: Dartmouth Digital Commons
Publication date: 01/01/2014

Background: The statistical genetics phenomenon of epistasis is widely acknowledged to confound disease etiology. In order to evaluate strategies for detecting these complex multi-locus disease associations, simulation studies are required. The development of the GAMETES software for the generation of complex genetic models has provided the means to randomly generate an architecturally diverse population of epistatic models that are both pure and strict, i.e. all n loci, but no fewer, are predictive of phenotype. Previous theoretical work characterizing complex genetic models has yet to examine pure, strict epistasis, which should be the most challenging to detect. This study addresses three goals: (1) Classify and characterize pure, strict, two-locus epistatic models, (2) Investigate the effect of model ‘architecture’ on detection difficulty, and (3) Explore how adjusting GAMETES constraints influences diversity in the generated models.

Springer - Publisher Connector

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

A classification and characterization of two-locus, pure, strict, epistatic models for simulation and detection

Author: Ambrose LS Granizo-Mackenzie
B McKinney
C Barber
C Greene
D Shriner
E Brodie III
E Eichler
H Cordell
IB Hallgrímsdóttir
J Cheverud
J Moore
J Rambau
Jason H Moore
Jeff Kiralis
LW Hahn
M Wade
N Beerenwinkel
P Phillips
R Culverhouse
R Fisher
R Neuman
RJ Urbanowicz
Ryan J Urbanowicz
W Bateson
W Frankel
W Kruskal
W Li
Publication venue: 'Springer Science and Business Media LLC'

Crossref

The Effect of the Achilles Tendon on Trabecular Structure in the Primate Calcaneus

Author: Alemseged
Alexander
Barak
Biewener
Bouxsein
Bramble
Cotter
Crompton
Day
Deloison
DeSilva
Devlin
Eckstein
Fajardo
Fisk
Frey
Gebo
Griffin
Hanna
Harris
Hodgkinson
Hudelmaier
Ker
Latimer
Lazenby
Lieberman
Lovejoy
Maga
McCann
Milz
Myatt
O'Brien
Payne
Pontzer
Rauwerdink
Reeve
Ridler
Rose
Ryan
Sati
Scherf
Sellers
Shaw
Swartz
Thackeray
Trussel
Urbanowicz
Vereecke
White
Whitehouse
Wolff
Zipfel
Publication venue: 'Wiley'
Publication date: 01/10/2013

Humans possess the longest Achilles tendon relative to total muscle length of any primate, an anatomy that is beneficial for bipedal locomotion. Reconstructing the evolutionary history of the Achilles tendon has been challenging, in part because soft tissue does not fossilize. The only skeletal evidence for Achilles tendon anatomy in extinct taxa is the insertion site on the calcaneal tuber, which is rarely preserved in the fossil record and, when present, is equivocal for reconstructing tendon morphology. In this study, we used high‐resolution three‐dimensional microcomputed tomography (micro‐CT) to quantify the microstructure of the trabecular bone underlying the Achilles tendon insertion site in baboons, gibbons, chimpanzees, and humans to test the hypothesis that trabecular orientation differs among primates with different tendon morphologies. Surprisingly, despite their very different Achilles tendon lengths, we were unable to find differences between the trabecular properties of chimpanzee and human calcanei in this specific region. There were regional differences within the calcaneus in the degree of anisotropy (DA) in both chimpanzees and humans, though the patterns were similar between the two species (higher DA inferiorly in the calcaneal tuber). Our results suggest that while trabecular bone within the calcaneus varies, it does not respond to the variation of Achilles tendon morphology across taxa in the way we hypothesized. These results imply that internal bone architecture may not be informative for reconstructing Achilles tendon anatomy in early hominins. Anat Rec, 296:1509–1517, 2013. © 2013 Wiley Periodicals, Inc.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/100175/1/ar22739.pd

Crossref

Deep Blue Documents at the University of Michigan

Open Problems in Extracellular RNA Data Analysis: Insights From an ERCC Online Workshop.

Author: Alexander Roger P
Balaj Leonora
Chang Justin
Kaczor-Urbanowicz Karolina Elżbieta
Kitchen Robert R
LaPlante Emily
Losic Bojan
Mateescu Bogdan
Max Klaas E A
Mestdagh Pieter
Milosavljevic Aleksander
Roth Matthew
Rozowsky Joel
Spengler Ryan M
Stolovitzky Gustavo
Tosar Juan Pablo
Van Nostrand Eric L
White Brian S
Yu Rongshan
Publication venue: The Mouseion at the JAXlibrary
Publication date: 01/01/2022

We now know RNA can survive the harsh environment of biofluids when encapsulated in vesicles or by associating with lipoproteins or RNA binding proteins. These extracellular RNAs (exRNAs) play a role in intercellular signaling, serve as biomarkers of disease, and form the basis of new strategies for disease treatment. The Extracellular RNA Communication Consortium (ERCC) hosted a two-day online workshop (April 19-20, 2021) on the unique challenges of exRNA data analysis. The goal was to foster an open dialog about best practices and discuss open problems in the field, focusing initially on small exRNA sequencing data. Video recordings of workshop presentations and discussions are available (https://exRNA.org/exRNAdata2021-videos/). There were three target audiences: experimentalists who generate exRNA sequencing data, computational and data scientists who work with those groups to analyze their data, and experimental and data scientists new to the field. Here we summarize issues explored during the workshop, including progress on an effort to develop an exRNA data analysis challenge to engage the community in solving some of these open problems.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

The Jackson Laboratory: The Mouseion at the JAXlibrary

Ghent University Academic Bibliography

PubMed Central

The human gut symbiont Ruminococcus gnavus shows specificity to blood group A antigen during mucin glycan foraging: Implication for niche colonisation in the gastrointestinal tract

Author: Angulo Jesus
Colvile Anna
Crost Emmanuelle H.
Griffiths Ryan
Hicks Thomas
Juge Nathalie
Latousakis Dimitrios
Martínez Gascueña Ana
Monaco Serena
Ndeh Didier
Owen C. David
Reynolds Raven S.
Spencer Daniel I. R.
Sánchez Salom Laura
Urbanowicz Paulina A.
Van Bakel Wouter
Walpole Samuel
Walsh Martin
Wu Haiyang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 22/12/2021

The human gut symbiont Ruminococcus gnavus displays strain-specific repertoires of glycoside hydrolases (GHs) contributing to its spatial location in the gut. Sequence similarity network analysis identified strain-specific differences in blood-group endo-β-1,4-galactosidase belonging to the GH98 family. We determined the substrate and linkage specificities of GH98 from R. gnavus ATCC 29149, RgGH98, against a range of defined oligosaccharides and glycoconjugates including mucin. We showed by HPAEC-PAD and LC-FD-MS/MS that RgGH98 is specific for blood group A tetrasaccharide type II (BgA II). Isothermal titration calorimetry (ITC) and saturation transfer difference (STD) NMR confirmed RgGH98 affinity for blood group A over blood group B and H antigens. The molecular basis of RgGH98 strict specificity was further investigated using a combination of glycan microarrays, site-directed mutagenesis, and X-ray crystallography. The crystal structures of RgGH98 in complex with BgA trisaccharide (BgAtri) and of RgGH98 E411A with BgA II revealed a dedicated hydrogen network of residues, which were shown by site-directed mutagenesis to be critical to the recognition of the BgA epitope. We demonstrated experimentally that RgGH98 is part of an operon of 10 genes that is overexpressed in vitro when R. gnavus ATCC 29149 is grown on mucin as sole carbon source, as shown by RNAseq analysis, and RT-qPCR confirmed RgGH98 expression on BgA II growth. Using MALDI-ToF MS, we showed that RgGH98 releases BgAtri from mucin and that pretreatment of mucin with RgGH98 conferred on R. gnavus E1 the ability to grow, by enabling the E1 strain to metabolise BgAtri and access the underlying mucin glycan chain. These data further support that the GH repertoire of R. gnavus strains enables them to colonise different nutritional niches in the human gut and has potential applications in diagnostics and therapeutics against infection.

PubMed Central

University of East Anglia digital repository

Rule-based machine learning classification and knowledge discovery for complex problems

Author: Ryan J. Urbanowicz
Urbanowicz Ryan
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref