
16 research outputs found

The COMBREX Project: Design, Methodology, and Initial Results

Author: Allen Benjamin
Anton Brian P.
Bateman Alex
Bhagwat Ashok S.
Blumenthal Robert M.
Bollinger J. Martin
Brenner Steven E.
Brown Peter J.
Chang Woo-Suk
Choi Han-Pil
Columbus Linda
Crécy-Lagard Valérie de
DeLisi Charles
Faller Lina L.
Ferguson Donald
Ferrer Manuel
Fomenkov Alexey
Friedberg Iddo
Gadda Giovanni
Galperin Michael Y.
Gobeill Julien
Greiner Russell
Guleria Jyotsna
Haft Daniel
Horn David
Housman Genevieve
Hu Jie
Hu Zhenjun
Hunt John
Karp Peter
Kasif Simon
Klimke William
Klitgord Niels
Krebs Carsten
Letovsky Stanley
Levy-Moonshine Ami
Macelis Dana
Madupu Ramana
Maksad Almaz
McGettrick Mark
Martín María J.
Mazumdar Varun
Miller Jeffrey H.
Monahan Caitlin
Morgan Richard D.
Osmani Lais
Osterman Andrei L.
O’Donovan Claire
Palsson Bernhard
Plata Germán
Pokrzywa Revonda
Rachlin John
Roberts Richard J.
Rochussen Krista
Rodionov Dmitry A.
Rodionova Irina A.
Ruch Patrick
Rudd Kenneth E.
Salzberg Steven L.
Segre Daniel
Setterdahl Aaron
Sjölander Kimmen
Spain James
Steffen Martin
Sutton Granger
Swaminathan Rajeswari
Söll Dieter
Tao Kevin
Tate John
Tchigvintsev Dmitri
Vitkup Dennis
Xu Shuang-yong
Yakunin Alexander F.
Chang Yi-Chien
Publication venue: Public Library of Science (PLoS)
Publication date: 05/06/2019

© 2013 Anton et al. Prior to the “genomic era,” when the acquisition of DNA sequence involved significant labor and expense, the sequencing of genes was strongly linked to the experimental characterization of their products. Sequencing at that time directly resulted from the need to understand an experimentally determined phenotype or biochemical activity. Now that DNA sequencing has become orders of magnitude faster and less expensive, focus has shifted to sequencing entire genomes. Since biochemistry and genetics have not, by and large, enjoyed the same improvement of scale, public sequence repositories now predominantly contain putative protein sequences for which there is no direct experimental evidence of function. Computational approaches attempt to leverage evidence associated with the ever-smaller fraction of experimentally analyzed proteins to predict function for these putative proteins. Maximizing our understanding of function over the universe of proteins in toto requires not only robust computational methods of inference but also a judicious allocation of experimental resources, focusing on proteins whose experimental characterization will maximize the number and accuracy of follow-on predictions. COMBREX is funded by a GO grant from the National Institute of General Medical Sciences (NIGMS) (1RC2GM092602-01). Peer Reviewed

Digital.CSIC

Analysis of protein-coding genetic variation in 60,706 humans

Author: Abboud
Abecasis
Aguilar-Salinas
Altshuler David M.
Ardissino Diego
Arellano-Campos
Atzmon
Aukrust
Banks Eric
Barr
Bell
Bergen
Berghout Joanne
Birnbaum Daniel P.
Bjørkhaug
Blangero
Boehnke Michael
Bowden
Budman
Burtt
Centeno-Cruz
Chambers
Chambert
Clarke
Collins
Cooper David N.
Coppola
Cortes
Cox
Cummings Beryl B.
Córdova
Daly Mark J.
Danesh John
Deflaux Nicole
DePristo Mark
Do Ron
Donnelly Stacey
Duggirala
Duncan Laramie E.
Elosua Roberto
Estrada Karol
Farrall
Fennell Timothy
Fernandez-Lopez
Flannick Jason
Florez Jose C.
Fontanillas
Frayling
Freimer
Fromer Menachem
Fuchsberger
Gabriel Stacey B.
García-Ortiz
Gauthier Laura
Getz Gad
Glatt Stephen J.
Goel
Goldstein Jackie
González-Villalpando
González-Villalpando
Grados
Groop
Gupta Namrata
Gómez-Vázquez
Haiman
Hanis
Hattersley
Henderson
Hill Andrew J.
Hopewell
Howrigan Daniel
Huerta-Chagoya
Hultman Christina M.
Islas-Andrade
Jacobs
Jalilzadeh
Jenkinson
Jiménez-Morales
Karczewski Konrad J.
Kathiresan Sekar
Kiezun Adam
King
Kirov
Kooner
Kosmicki Jack A.
Kurki Mitja I.
Kyriakou
Kähler
Laakso Markku
Lee
Lehman
Lek Monkol
Lyon
MacArthur Daniel G.
MacMahon
Magnusson
Mahajan
Marrugat
Martínez-Hernández
Mathews
McCarroll Steven
McCarthy Mark I.
McGovern Dermot
McPherson Ruth
McVean
Meigs
Meitinger
Mendoza-Caamal
Mercader
Minikel Eric V.
Mohlke
Moonshine Ami Levy
Moran
Moreno-Macías
Morris
Najmi
Natarajan Pradeep
Neale Benjamin M.
Njølstad
O'Donnell-Luria Anne H.
O'Donovan
Ordóñez-Sánchez
Orozco Lorena
Owen
Palotie Aarno
Park
Pauls
Peloso Gina M.
Pierce-Hoffman Emma
Poplin Ryan
Posthuma
Purcell Shaun M.
Revilla-Monsalve
Riba
Ripke
Rivas Manuel A.
Rodríguez-Guillén
Rodríguez-Torres
Rose Samuel A.
Ruano-Rubio Valentin
Ruderfer Douglas M.
Saleheen Danish
Samocha Kaitlin E.
Sandor
Scharf Jeremiah M.
Seielstad
Shakir Khalid
Sklar Pamela
Sladek
Soberón
Spector
Stenson Peter D.
Stevens Christine
Sullivan Patrick F.
Tai
Teslovich
Thomas Brett P.
Tiao Grace
Tsuang Ming T.
Tukiainen Taru
Tuomilehto Jaakko
Tusie-Luna Maria T.
Walford
Ware James S.
Watkins Hugh C.
Weisburd Ben
Wilkens
Williams
Wilson James G.
Won Hong-Hee
Yu Dongmei
Zhao Fengmei
Zou James
Publication date: 01/01/2016

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. We describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of truncating variants, with 72% having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human “knockout” variants in protein-coding genes.

Carolina Digital Repository

Analysis of protein-coding genetic variation in 60,706 humans

Author: A Freischmidt
A Piton
Aarno Palotie
Adam Kiezun
Ami Levy Moonshine
Andrew J. Hill
Anne H. O’Donnell-Luria
B Vicoso
Ben Weisburd
Benjamin M. Neale
Beryl B. Cummings
BF Voight
Brett P. Thomas
Christina M. Hultman
Christine Stevens
CJ Bell
Daniel G. MacArthur
Daniel Howrigan
Daniel P. Birnbaum
Danish Saleheen
David M. Altshuler
David N. Cooper
Dermot McGovern
DF Gudbjartsson
DG MacArthur
Diego Ardissino
DN Cooper
Dongmei Yu
Douglas M. Ruderfer
Emma Pierce-Hoffman
Eric Banks
Eric V. Minikel
ET Lim
EV Minikel
FE Dewey
Fengmei Zhao
Gad Getz
Gina M. Peloso
Grace Tiao
H Jeong
H Li
Hong-Hee Won
Hugh C. Watkins
JA Tennessen
Jaakko Tuomilehto
Jack A. Kosmicki
Jackie Goldstein
James G. Wilson
James S. Ware
James Zou
Jason Flannick
Jeremiah M. Scharf
JM Zook
Joanne Berghout
John Danesh
Jose C. Florez
JX Chong
K-I Goh
Kaitlin E. Samocha
Karol Estrada
KE Samocha
Khalid Shakir
Konrad J. Karczewski
Laramie E. Duncan
Laura Gauthier
Lorena Orozco
M Fromer
M Stoneking
MA DePristo
Manuel A. Rivas
Maria T. Tusie-Luna
Mark DePristo
Mark I. McCarthy
Mark J. Daly
Markku Laakso
Menachem Fromer
Michael Boehnke
Ming T. Tsuang
Mitja I. Kurki
MJ Bamshad
Monkol Lek
Namrata Gupta
Nicole Deflaux
P Chagnon
P Sulem
Pamela Sklar
Patrick F. Sullivan
PD Stenson
Peter D. Stenson
Pradeep Natarajan
R Blekhman
Roberto Elosua
Ron Do
Ruth McPherson
Ryan Poplin
S Kathiresan
S Petrovski
S Richards
Samuel A. Rose
Sekar Kathiresan
Shaun M. Purcell
Stacey B. Gabriel
Stacey Donnelly
Stephen J. Glatt
Steven McCarroll
T Rolland
Taru Tukiainen
Timothy Fennell
Valentin Ruano-Rubio
W Fu
Y Itan
Y Xue
Publication date: 01/01/2016

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes. Peer reviewed

Crossref

VU Research Portal

Online Research @ Cardiff

The Jackson Laboratory: The Mouseion at the JAXlibrary

Harvard University - DASH

PubMed Central

eScholarship - University of California

Oxford University Research Archive

UPF Digital Repository

Helsingin yliopiston digitaalinen arkisto

Apollo (Cambridge)

Additional file 11: Figure S9. of Tools and best practices for data processing in allelic expression analysis

Author: Ami Levy-Moonshine (450739)
Eric Banks (151539)
Pejman Mohammadi (278857)
Stephane Castel (3588383)
Tuuli Lappalainen (95615)

QC measures improve the power to detect biologically relevant allelic expression at genes that have eQTLs (eGenes), where individuals that are heterozygous for the top eQTL SNP (eSNP) are expected to have more allelic expression than homozygous individuals (extended). (a) QC measures increase the significance of the difference between heterozygous and homozygous individuals within eGenes. (b) QC measures reduce the variance of allelic expression between individuals within eGenes. (TIFF 2856 kb)

The Francis Crick Institute

Additional file 13: Table S3. of Tools and best practices for data processing in allelic expression analysis

Author: Ami Levy-Moonshine (450739)
Eric Banks (151539)
Pejman Mohammadi (278857)
Stephane Castel (3588383)
Tuuli Lappalainen (95615)

Summary of QC problems for AE data, proposed solutions, and potential drawbacks. (XLSX 31 kb)

The Francis Crick Institute

Additional file 1: Figure S1. of Tools and best practices for data processing in allelic expression analysis

Author: Ami Levy-Moonshine (450739)
Eric Banks (151539)
Pejman Mohammadi (278857)
Stephane Castel (3588383)
Tuuli Lappalainen (95615)

Allelic expression signal from a population of monoclonal versus polyclonal cells. In the latter, standard RNA-sequencing will show allelic imbalance only when the two alleles are systematically differentially expressed, e.g., due to a regulatory variant or imprinting. (TIFF 3238 kb)

The Francis Crick Institute

Additional file 12: Figure S10. of Tools and best practices for data processing in allelic expression analysis

Author: Ami Levy-Moonshine (450739)
Eric Banks (151539)
Pejman Mohammadi (278857)
Stephane Castel (3588383)
Tuuli Lappalainen (95615)

Complete workflow for AE analysis illustrating appropriate quality control measures and filters. (TIFF 782 kb)

The Francis Crick Institute

Thousands of missed genes found in bacterial genomes and their analysis with COMBREX

Author: Ami Levy-Moonshine
Brian P Anton
Derrick E Wood
Henry Lin
Lais Osmani
Martin Steffen
Rajeswari Swaminathan
Simon Kasif
Steven L Salzberg
Yi-Chien Chang
Publication venue: National Center for Biotechnology Information
Publication date: 01/01/2012

The dramatic reduction in the cost of sequencing has allowed many researchers to join in the effort of sequencing and annotating prokaryotic genomes. Annotation methods vary considerably and may fail to identify some genes. Here we draw attention to a large number of likely genes missing from annotations using common tools such as Glimmer and BLAST. By analyzing 1,474 prokaryotic genome annotations in GenBank, we identify 13,602 likely missed genes that are homologs to non-hypothetical proteins, and 11,792 likely missed genes that are homologs only to hypothetical proteins, yet have supporting evidence of their protein-coding nature from COMBREX, a newly created gene function database. We also estimate the likelihood that each potential missing gene found is a genuine protein-coding gene using COMBREX. Our analysis of the causes of missed genes suggests that larger annotation centers tend to produce annotations with fewer missed genes than smaller centers, and many of the missed genes are short genes <300 bp. Over 1,000 of the likely missed genes could be associated with phenotype information available in COMBREX. 359 of these genes, found in pathogenic organisms, may be potential targets for pharmaceutical research. The newly identified genes are available on COMBREX’s website. https://doi.org/10.1186/1745-6150-7-3

Crossref

Springer

PubMed Central

Digital Repository at the University of Maryland

Author Correction: Comprehensive comparative analysis of 5′-end RNA-sequencing methods

Author: Adam L. Haber
Ami Levy Moonshine
Aviv Regev
Jen Q. Pan
Joshua Z. Levin
Justin Jacques
Madeline A. Lancaster
Michele A. Busby
Sean K. Simmons
Xi Shi
Xian Adiconis
Zhe Ji
Publication venue: Springer Science and Business Media LLC

Crossref

Enhancement of beta-sheet assembly by cooperative hydrogen bonds potential

Author: Ami Levy-Moonshine
Amir
Backer
Beck
Berman
Bonneau
Bradley
Bradley
Brenner
Brooks
Buck
Burgess
Chen Keasar
Chivian
Cornell
Dahiyat
El-ad David Amir
Fleming
Gibson
Godzik
Heinz
Huggins
Jorgensen
Kabsch
Kalisman
Keasar
Kolinski
Kolinski
Kortemme
Krieger
Kirkpatrick
Levinthal
Levitt
Levitt
Li
Liu
Liwo
Maximova
Mayo
McDonald
Murzin
Neria
Orengo
Sokal
Summa
Trosset
Verlet
Weiner
Xiang
Yang
Yao
Publication venue: Oxford University Press

Motivation: The roughness of energy landscapes is a major obstacle to protein structure prediction, since it forces conformational searches to spend much time struggling to escape numerous traps. Specifically, beta-sheet formation is prone to stray, since many possible combinations of hydrogen bonds are dead ends in terms of beta-sheet assembly. It has been shown that cooperative terms for backbone hydrogen bonds ease this problem by augmenting hydrogen bond patterns that are consistent with beta sheets. Here, we present a novel cooperative hydrogen-bond term that is both effective in promoting beta sheets and computationally efficient. In addition, the new term is differentiable and operates on all-atom protein models.

Crossref

PubMed Central