SMaSH: A Benchmarking Toolkit for Human Genome Variant Calling
Motivation: Computational methods are essential to extract actionable information from raw sequencing data, and thus to fulfill the promise of next-generation sequencing technology. Unfortunately, computational tools developed to call variants from human sequencing data disagree on many of their predictions, and current methods for evaluating accuracy and computational performance are ad hoc and incomplete. Agreement on how to benchmark variant calling methods would stimulate the development of genomic processing tools and facilitate communication among researchers.

Results: We propose SMaSH, a benchmarking methodology for evaluating human genome variant calling algorithms. We generate synthetic datasets, organize and interpret a wide range of existing benchmarking data for real genomes, and propose a set of accuracy and computational performance metrics for evaluating variant calling methods on these benchmarking data. Moreover, we illustrate the utility of SMaSH by evaluating several leading single nucleotide polymorphism (SNP), indel, and structural variant calling algorithms.

Availability: We provide free and open access online to the SMaSH toolkit, along with detailed documentation, at smash.cs.berkeley.edu
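The accuracy metrics SMaSH proposes boil down to comparing a caller's output against a ground-truth variant set. As a minimal illustration of that idea (not SMaSH's actual implementation; the tuple-based variant representation and exact-match rule below are simplifying assumptions), a precision/recall/F1 computation for SNP calls might look like this:

```python
# A minimal sketch of SNP-calling accuracy evaluation in the spirit of SMaSH.
# The variant representation and the exact-matching rule are simplifying
# assumptions, not SMaSH's actual implementation.

def evaluate_snp_calls(truth, calls):
    """Compute precision, recall, and F1 for a set of SNP calls.

    truth, calls: sets of (chromosome, position, ref_allele, alt_allele) tuples.
    """
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Example: a caller that finds two of three true SNPs plus one false positive.
truth = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")}
calls = {("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A"), ("chr3", 42, "T", "C")}
print(evaluate_snp_calls(truth, calls))  # approximately (0.667, 0.667, 0.667)
```

Real benchmarking, as the abstract notes, must also cover indel and structural variant calls, where deciding what counts as a matching prediction is considerably less clear-cut than exact SNP matching.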
MLSys: The New Frontier of Machine Learning Systems
Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To that end, we describe a new conference, MLSys, that explicitly targets research at the intersection of systems and machine learning, with a program committee split evenly between experts in systems and ML and a focus on topics spanning the two fields.
A large-scale evaluation of computational protein function prediction.
Automated annotation of protein function is challenging. As the number of sequenced genomes grows rapidly, the overwhelming majority of protein products can be annotated only computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results of the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art in protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
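For context on how such an assessment scores predictors: the headline CAFA accuracy measure is the maximum F-measure (Fmax) over prediction-score thresholds. The sketch below is a deliberately simplified, per-protein version of that idea; it treats function terms as a flat set, whereas the actual evaluation propagates predictions through the Gene Ontology hierarchy and averages over all target proteins.

```python
# A simplified per-protein Fmax-style evaluation, in the spirit of CAFA.
# Real CAFA scoring propagates predictions up the Gene Ontology hierarchy
# and averages over proteins; this sketch treats terms as a flat set.

def f_max(truth_terms, scored_predictions, thresholds=None):
    """Return the maximum F1 over score thresholds for one protein.

    truth_terms: set of true function terms for the protein.
    scored_predictions: dict mapping term -> prediction score in [0, 1].
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for t in thresholds:
        predicted = {term for term, score in scored_predictions.items()
                     if score >= t}
        if not predicted:
            continue
        tp = len(predicted & truth_terms)
        precision = tp / len(predicted)
        recall = tp / len(truth_terms) if truth_terms else 0.0
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best

# Example: two correct terms at high score, one spurious term at low score.
truth = {"GO:0003677", "GO:0005634"}
preds = {"GO:0003677": 0.9, "GO:0005634": 0.8, "GO:0016301": 0.2}
print(f_max(truth, preds))  # 1.0, reached at any threshold above 0.2
```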