110 research outputs found

How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition

Author: Anderson-Cook Christine M.
Fugate Michael L.
Lu Lu
Myers Kary L.
Pawley Norma
Quinlan Kevin R.
Publication venue: Wiley
Publication date: 01/01/2019

Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. This paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition based on our experience. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well-solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, they also provide more detailed analysis of individual sub-questions including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries to enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment.

Comment: 36 page

arXiv.org e-Print Archive

USFSP Digital Archive

Scholar Commons - University of South Florida

Binary Interval Search (BITS): A Scalable Algorithm for Counting Interval Intersections

Author: Gabriel Robins
Ira M. Hall
Kevin Skadron
R. Quinlan
Ryan M. Layer

Motivation: The comparison of diverse genomic datasets is fundamental to understanding genome biology. Researchers must explore many large datasets of genome intervals (e.g., genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect: that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. Results: We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures such as Graphics Processing Units (GPUs) by illustrating its utility for efficient Monte-Carlo simulations measuring the significance of relationships between sets of genomic intervals.

CiteSeerX

Rational Design of Temperature-Sensitive Alleles Using Computational Structure Prediction

Author: B Cunningham
B Lee
C Cortes
Ca Rohl
Christopher S. Poultney
CJ Burges
David Gresham
Dennis E. Shasha
EH Kellogg
G Chakshusmathi
Glenn L. Butterfoss
HM Muller
JM Word
JR Quinlan
K Bajaj
K Drew
KD Pruitt
Kevin Drew
Kristin C. Gunsalus
M Hall
Michelle R. Gutwein
N Eswar
N Siew
R Varadarajan
Richard Bonneau
RJ Dohmen
S Tweedie
SF Altschul
TW Harris
Vladimir N. Uversky
WS Noble
WS Sandberg
Publication venue: Public Library of Science
Publication date: 02/09/2011

Temperature-sensitive (ts) mutations are mutations that exhibit a mutant phenotype at high or low temperatures and a wild-type phenotype at normal temperature. Temperature-sensitive mutants are valuable tools for geneticists, particularly in the study of essential genes. However, finding ts mutations typically relies on generating and screening many thousands of mutations, which is an expensive and labor-intensive process. Here we describe an in silico method that uses Rosetta and machine learning techniques to predict a highly accurate “top 5” list of ts mutations given the structure of a protein of interest. Rosetta is a protein structure prediction and design code, used here to model and score how proteins accommodate point mutations with side-chain and backbone movements. We show that integrating Rosetta relax-derived features with sequence-based features results in accurate temperature-sensitive mutation predictions

Public Library of Science (PLOS)

Crossref

PubMed Central

Old Lessons for New Governance: Safety or Profit and the New Conventional Wisdom

Author: Bc Court
Cheryl Edwards
David Weil
Emile Tompa
Eric Tucker
Geldart
Joan Eakin
Kevin See
Michael Quinlan
Michele Campolieti
Norm Keith
Norman Keith
R V Scrocca
R V Transpavé Inc
See Sybil Geldart
Stefan Dubowski
Toni Schofield
Publication venue: Elsevier BV
Publication date: 01/01/2012

Crossref

A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE

Author: Andreas Wilke
AR Quinlan
B Ewing
B Niu
C Quince
C von Mering
DH Huson
EA Dinsdale
F Meyer
Folker Meyer
HC Bravo
J Reeder
Jared Wilkening
JC Dohm
JG Caporaso
Kevin P. Keegan
KJ Hoff
KJ McKernan
M Margulies
Mark D'Souza
MJ Pallen
MP Cox
PJ Cock
R Seshadri
RA Freitas
RC Edgar
Scott Markel
SG Tringe
SM Huse
TD Harris
Travis Harrison
V Gomez-Alvarez
V Kunin
VM Markowitz
WC Kao
William L. Trimble
Y Sun
Publication venue: Public Library of Science
Publication date: 07/06/2012

We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as “noise” or “error”) within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

A Novel Protein LZTFL1 Regulates Ciliary Trafficking of the BBSome and Smoothened

Author: AV Loktev
B Craige
Charles C. Searby
CJ Haycraft
CL Williams
David K. Breslow
DY Nishimura
EA Otto
EM Zdobnov
FR Garcia-Gonzalo
G Ou
Gregory S. Barsh
H Jin
H Kiss
J Kim
JC Kim
JL Badano
JL Tobin
JS Domire
K Arnold
K Mykytyn
KC Corbit
Kevin Bugge
KF Lechtreck
KJ Livak
KM Misura
L Sang
Maxence V. Nachury
MV Nachury
N Kishimoto
NA Zaghloul
NF Berbari
O Soderberg
PJ Ocbina
PV Tran
Q Wei
Qihong Zhang
R Rohatgi
RB Sutton
RJ Quinlan
S Seo
SC Goetz
Seongjin Seo
SK Kim
V Singla
Val C. Sheffield
Publication venue: Public Library of Science
Publication date: 01/01/2011

Many signaling proteins including G protein-coupled receptors localize to primary cilia, regulating cellular processes including differentiation, proliferation, organogenesis, and tumorigenesis. Bardet-Biedl Syndrome (BBS) proteins are involved in maintaining ciliary function by mediating protein trafficking to the cilia. However, the mechanisms governing ciliary trafficking by BBS proteins are not well understood. Here, we show that a novel protein, Leucine-zipper transcription factor-like 1 (LZTFL1), interacts with a BBS protein complex known as the BBSome and regulates ciliary trafficking of this complex. We also show that all BBSome subunits and BBS3 (also known as ARL6) are required for BBSome ciliary entry and that reduction of LZTFL1 restores BBSome trafficking to cilia in BBS3 and BBS5 depleted cells. Finally, we found that BBS proteins and LZTFL1 regulate ciliary trafficking of hedgehog signal transducer, Smoothened. Our findings suggest that LZTFL1 is an important regulator of BBSome ciliary trafficking and hedgehog signaling

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Systematic Evaluation of Factors Influencing ChIP-Seq Fidelity

Author: A Barski
A Mortazavi
A Valouev
AP Boyle
AR Quinlan
Barbara J Wold
DA Nix
DS Johnson
E Larschan
EG Wilbanks
G Benson
G Robertson
H Ji
Housheng Hansen He
I Kozarewa
J Rozowsky
Jason D Lieb
JC Dohm
Jennifer Zieba
Joanna O Mieczkowska
JW Ho
Kevin P White
L Teytelman
Matthew Slattery
N Negre
Nicolas Negre
NU Rashid
P Kolasinska-Zwierz
Peter J Bickel
PV Kharchenko
Q Li
Qunhua Li
R Jothi
Richard M Myers
RM Myers
S Pepke
S Roy
SE Celniker
Tae-Kyung Kim
Tao Liu
TD Laajala
TS Mikkelsen
WE Johnson
X Feng
X Shirley Liu
Y Zhang
Yijun Ruan
Yiwen Chen
Yong Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

We performed a systematic evaluation of how variations in sequencing depth and other parameters influence interpretation of chromatin immunoprecipitation (ChIP) followed by sequencing (ChIP-seq) experiments. Using Drosophila S2 cells, we generated ChIP-seq datasets for a site-specific transcription factor (Suppressor of Hairy-wing) and a histone modification (H3K36me3). We detected a chromatin state bias: open chromatin regions yielded higher coverage, which led to false positives if not corrected and had a greater effect on detection specificity than any base-composition bias. Paired-end sequencing revealed that single-end data underestimated ChIP library complexity at high coverage. The removal of reads originating at the same base reduced false positives while having little effect on detection sensitivity. Even at a depth of ~1 read/bp coverage of mappable genome, ~1% of the narrow peaks detected on a tiling array were missed by ChIP-seq. Evaluation of widely used ChIP-seq analysis tools suggests that adjustments or algorithm improvements are required to handle datasets with deep coverage.

Crossref

Harvard University - DASH

PubMed Central

Carolina Digital Repository

Caltech Authors

Functional similarities between pigeon 'milk' and mammalian milk: induction of immune gene expression and modification of the microbiota

Pigeon ‘milk’ and mammalian milk have functional similarities in terms of nutritional benefit and delivery of immunoglobulins to the young. Mammalian milk has been clearly shown to aid in the development of the immune system and microbiota of the young, but similar effects have not yet been attributed to pigeon ‘milk’. Therefore, using a chicken model, we investigated the effect of pigeon ‘milk’ on immune gene expression in the Gut Associated Lymphoid Tissue (GALT) and on the composition of the caecal microbiota. Chickens fed pigeon ‘milk’ had a faster rate of growth and a better feed conversion ratio than control chickens. There was significantly enhanced expression of immune-related gene pathways and interferon-stimulated genes in the GALT of pigeon ‘milk’-fed chickens. These pathways include the innate immune response, regulation of cytokine production and regulation of B cell activation and proliferation. The caecal microbiota of pigeon ‘milk’-fed chickens was significantly more diverse than control chickens, and appears to be affected by prebiotics in pigeon ‘milk’, as well as being directly seeded by bacteria present in pigeon ‘milk’. Our results demonstrate that pigeon ‘milk’ has further modes of action which make it functionally similar to mammalian milk. We hypothesise that pigeon ‘lactation’ and mammalian lactation evolved independently but resulted in similarly functional products

Public Library of Science (PLOS)

CiteSeerX

Deakin Research Online

Crossref

Directory of Open Access Journals

PubMed Central

RMIT Research Repository

aCQUIRe

ACQUIRE

FigShare

Mapping H4K20me3 onto the chromatin landscape of senescent cells indicates a function in control of cell senescence and tumor suppression through preservation of genetic and epigenetic stability

Author: A Barski
A Chicas
A Freund
A Ivanov
A Krtolica
A Lujambio
A Van Den Broeck
AG Evertts
AP Bracken
AR Quinlan
AS Hinrichs
B Langmead
B Sarg
Benjamin A. Garcia
C Michaloglou
C Trapnell
C Zang
Colin Nixon
CS Ross-Innes
D Kim
D Nicetto
David M. Nelson
Desiree Piscitello
DK Shumaker
DM Feldser
Duncan M. Baird
F d’Adda di Fagagna
Farah Jaber-Hijazi
G Ferbeyre
G Schotta
Gabriel L. Otte
GP Dimri
Gunnar Schotta
H Bierhoff
H Paterson
HA Cruickshanks
Harold Riethman
Hong Wu
IP Pogribny
J Giordano
J Vijg
JA Kreiling
JC Acosta
JC Jeyapalan
Jeffrey S. Pawlikowski
JH Martens
John J. Cole
Kevin T. Norris
KR Blahnik
L Guelen
M Benhamed
M Braig
M Collado
M De Cecco
M Hahn
M Narita
M Shogren-Knaak
M Van Meter
MD Plazas-Mayorca
MF Fraga
MJ Vogel
N Kourmouli
N Martin
Neil A. Robertson
Nicholas Stong
Nicola Neretti
Nikolay A. Pchelintsev
OA Sedelnikova
Peter D. Adams
PP Shah
R Benetti
R Brown
R Di Micco
R Funayama
R Salama
R Zhang
RA Irizarry
RJ O'Sullivan
RM Marion
RS Hansen
S Phalke
S Rasheed
Shelley L. Berger
SJ Kuerbitz
Steven W. Criscione
T Chandra
T Hori
T Kuilman
T Shimi
Taranjit Singh Rai
Tony McBryan
TS Rai
TW Kang
VC Gray-Schopfer
VP Tryndyak
W Cosme-Blanco
W Xue
William Clark
X Ye
Y Yokoyama
Z Chen
Z Dou
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Background: Histone modification H4K20me3 and its methyltransferase SUV420H2 have been implicated in suppression of tumorigenesis. The underlying mechanism is unclear, although H4K20me3 abundance increases during cellular senescence, a stable proliferation arrest and tumor suppressor process, triggered by diverse molecular cues, including activated oncogenes. Here, we investigate the function of H4K20me3 in senescence and tumor suppression. Results: Using immunofluorescence and ChIP-seq we determine the distribution of H4K20me3 in proliferating and senescent human cells. Altered H4K20me3 in senescence is coupled to H4K16ac and DNA methylation changes in senescence. In senescent cells, H4K20me3 is especially enriched at DNA sequences contained within specialized domains of senescence-associated heterochromatin foci (SAHF), as well as specific families of non-genic and genic repeats. Altered H4K20me3 does not correlate strongly with changes in gene expression between proliferating and senescent cells; however, in senescent cells, but not proliferating cells, H4K20me3 enrichment at gene bodies correlates inversely with gene expression, reflecting de novo accumulation of H4K20me3 at repressed genes in senescent cells, including at genes also repressed in proliferating cells. Although elevated SUV420H2 upregulates H4K20me3, this does not accelerate senescence of primary human cells. However, elevated SUV420H2/H4K20me3 reinforces oncogene-induced senescence-associated proliferation arrest and slows tumorigenesis in vivo. Conclusions: These results corroborate a role for chromatin in underpinning the senescence phenotype but do not support a major role for H4K20me3 in initiation of senescence. Rather, we speculate that H4K20me3 plays a role in heterochromatinization and stabilization of the epigenome and genome of pre-malignant, oncogene-expressing senescent cells, thereby suppressing epigenetic and genetic instability and contributing to long-term senescence-mediated tumor suppression.

Crossref

Online Research @ Cardiff

Springer - Publisher Connector

Open Access LMU

PubMed Central

Ulster University's Research Portal

Enlighten

Research Repository and Portal - University of the West of Scotland

FigShare