Search CORE

342 research outputs found

Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects

Author: Daniel MacArthur
Gregory Valiant
James Zou
Kaitlin Samocha
Konrad Karczewski
Mark Daly
Monkol Lek
Paul Valiant
Shamil Sunyaev
Siu On Chan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/11/2015

As new proposals aim to sequence ever larger collections of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not yet been observed in current cohorts. Our results quantify the number of new variants we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have a loss-of-function frequency of <0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation.

Funding: United States National Institutes of Health (U54DK105566); United States National Institutes of Health (R01GM104371).
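The projection from a frequency distribution to "variants we expect to identify" can be illustrated with a short Python sketch. This is not UnseenEst: it assumes a frequency histogram has already been estimated, that each of the 2n chromosomes in a cohort of n individuals carries a variant of allele frequency p independently, and it uses a hypothetical toy histogram rather than the paper's data. Under those assumptions a variant is observed at least once with probability 1 - (1 - p)^(2n).

```python
"""Expected number of distinct variants observed as a cohort grows.

Sketch only, NOT UnseenEst: assumes an already estimated (hypothetical)
histogram of allele frequencies and independent sampling of 2n chromosomes.
"""
from typing import Iterable, List, Tuple

FreqHist = Iterable[Tuple[float, float]]  # (allele frequency p, number of variants at p)


def expected_discovered(freq_hist: FreqHist, n_individuals: int) -> float:
    """Sum over bins of count * P(allele seen at least once among 2n chromosomes)."""
    n_chrom = 2 * n_individuals
    return sum(count * (1.0 - (1.0 - p) ** n_chrom) for p, count in freq_hist)


def fraction_captured(freq_hist: FreqHist, n_individuals: int) -> float:
    """Expected fraction of all variants in the histogram observed at least once."""
    hist: List[Tuple[float, float]] = list(freq_hist)
    total = sum(count for _, count in hist)
    return expected_discovered(hist, n_individuals) / total


if __name__ == "__main__":
    # Hypothetical toy histogram, not estimates from the paper.
    toy_hist = [(1e-2, 1_000), (1e-4, 50_000), (1e-6, 2_000_000)]
    for n in (60_706, 500_000):
        print(n, round(fraction_captured(toy_hist, n), 4))
```

Because very rare bins dominate the number of possible variants in such a histogram, the captured fraction grows slowly with cohort size; the printed values reflect only the toy input.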

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central

Coherent Functional Modules Improve Transcription Factor Target Identification, Cooperativity Prediction, and Disease Association

Author: Altman Russ B.
Karczewski Konrad J.
Snyder Michael
Tatonetti Nicholas P.
Publication date: 01/01/2014

Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that these factors play in disease biology. However, because target identification suffers from a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease.
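The abstract does not say which statistic links a TF to a module. One common choice for asking whether a TF's ChIP-Seq targets are over-represented in a functional module is a one-sided hypergeometric test; the sketch below assumes that choice, and the function name and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genes: int, module_size: int, n_targets: int, overlap: int) -> float:
    """One-sided P(X >= overlap) for X ~ Hypergeometric(population=n_genes,
    successes=module_size, draws=n_targets).

    n_genes:     genes in the background universe
    module_size: genes in the functional module
    n_targets:   genes called as ChIP-Seq targets of the TF
    overlap:     targets that fall inside the module
    """
    upper = min(module_size, n_targets)
    denom = comb(n_genes, n_targets)
    # Exact integer arithmetic for the tail, then a single division to float.
    tail = sum(
        comb(module_size, k) * comb(n_genes - module_size, n_targets - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Hypothetical numbers: 200-gene module, 1,000 targets, 20,000 genes, 30 shared.
    print(hypergeom_enrichment_p(20_000, 200, 1_000, 30))
```

With thousands of TF-module pairs tested, these p-values would still need a multiple-testing correction before any relationship is called significant.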

Crossref

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

Structural-semantic analysis of eurysemantic words in Ukrainian (based on the lexico-semantic field "thing")

Author: Asselbergs Folkert W
Barnes Michael R
Cushman Mary
Drenos Fotios
Duan Qing
Elbers Clara C
Guo Yiran
Karczewski Konrad J
Keating Brendan J
Lange Leslie A
Lanktree Matthew B
Li Yun
North Kari E
Reiner Alex P
Tragante Vinicius
Wilson James G
Zhang Guosheng
Publication venue: Crimean Scientific Center of the National Academy of Sciences of Ukraine and the Ministry of Education and Science of Ukraine
Publication date: 01/01/2007

The article examines the lexico-semantic features of eurysemantic words (words of very broad meaning) in Ukrainian, classifies them semantically, and carries out a structural analysis using componential analysis. It presents a fragment of a hierarchically ordered paradigm of broad-meaning nouns within the lexico-semantic field "Thing", represented by the lexico-semantic groups "Object" and "Matter".

Scientific Electronic Library of Periodicals of the National Academy of Sciences of Ukraine (Vernadsky National Library of Ukraine)

Crossref

PubMed Central

Carolina Digital Repository

Utrecht University Repository

Explore Bristol Research

STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud

Author: Dudley Joel T.
Fernald Guy Haskin
Karczewski Konrad J.
Martin Alicia R.
Snyder Michael
Tatonetti Nicholas P.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2014

The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving, and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a cloud computing solution with a graphical interface that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5–10 hours to process a full exome sequence and $30 and 3–8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.
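The "customizable and modular" design can be pictured as a list of interchangeable stages applied in order. The sketch below is a hypothetical illustration of that idea, not STORMSeq's code: the stage names are placeholders, and each stage only appends its name to a log where a real one would call an aligner, a variant caller, or an annotator.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[Record], Record]


def make_stage(name: str) -> Stage:
    """Hypothetical placeholder stage that only records its own name in a log."""
    def stage(record: Record) -> Record:
        record["log"] = list(record.get("log", [])) + [name]
        return record
    return stage


def run_pipeline(stages: List[Stage], record: Record) -> Record:
    """Apply stages in order; each receives a shallow copy, so the caller's record is untouched."""
    for stage in stages:
        record = stage(dict(record))
    return record


DEFAULT_STAGES = [make_stage(n) for n in ("map_reads", "clean_reads", "call_variants", "annotate")]

if __name__ == "__main__":
    sample = {"id": "exome_01"}
    print(run_pipeline(DEFAULT_STAGES, sample)["log"])
    # Swap in an updated caller without touching the other stages.
    updated = DEFAULT_STAGES[:2] + [make_stage("call_variants_v2")] + DEFAULT_STAGES[3:]
    print(run_pipeline(updated, sample)["log"])
```

Replacing one entry of the list, such as the variant caller, leaves the other stages unchanged, which is the property that lets users refresh their calls as tools improve.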

Crossref

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

Quantifying supercoiling-induced denaturation bubbles in DNA

Author: Adamcik Jozef
Dietler Giovanni
Jeon Jae-Hyung
Karczewski Konrad J.
Metzler Ralf
Publication venue: 'Royal Society of Chemistry (RSC)'
Publication date: 27/02/2013

In both eukaryotic and prokaryotic DNA, AT-rich sequences of 30–100 base pairs have been identified at which the double helix preferentially unwinds. Such DNA unwinding elements are commonly associated with origins of DNA replication and transcription, and with chromosomal matrix attachment regions. Here we present a quantitative study of local DNA unwinding based on extensive imaging of single DNA plasmids. We demonstrate that long-lived single-stranded denaturation bubbles exist in negatively supercoiled DNA, at the expense of partial twist release. Remarkably, we observe a linear relation between the degree of supercoiling and the bubble size, in excellent agreement with statistical modelling. Furthermore, we obtain the full distribution of bubble sizes and the opening probabilities at varying salt and temperature conditions. The results presented herein underline the important role of denaturation bubbles in negatively supercoiled DNA for biological processes such as transcription and replication initiation in vivo.
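A back-of-the-envelope model shows why bubble size can scale linearly with supercoiling. It assumes a helical repeat of about 10.5 bp per turn, uses Lk = Tw + Wr with ΔLk = σ·Lk0 and Lk0 = N/10.5, and supposes that melting m base pairs removes about m/10.5 turns of twist. Writhe changes, salt, temperature, and the free-energy cost of the bubble are all ignored, so this is far simpler than the paper's statistical model; the function names are hypothetical.

```python
BP_PER_TURN = 10.5  # approximate helical repeat of relaxed B-DNA (bp per turn)


def relaxed_linking_number(n_bp: int) -> float:
    """Lk0 = N / h for a relaxed circular duplex of N base pairs."""
    return n_bp / BP_PER_TURN


def linking_difference(n_bp: int, sigma: float) -> float:
    """Delta Lk = sigma * Lk0 (negative for negatively supercoiled DNA)."""
    return sigma * relaxed_linking_number(n_bp)


def bubble_size_bp(n_bp: int, sigma: float, fraction_absorbed: float) -> float:
    """Idealised bubble size: melting m bp removes about m / h turns of twist,
    so absorbing a fraction f of Delta Lk needs m = -f * Delta Lk * h = -f * sigma * N."""
    if sigma >= 0:
        return 0.0  # this toy model only opens bubbles under negative supercoiling
    return -fraction_absorbed * linking_difference(n_bp, sigma) * BP_PER_TURN


if __name__ == "__main__":
    for sigma in (-0.02, -0.04, -0.06):
        print(sigma, round(bubble_size_bp(3000, sigma, 0.5), 1))
```

In this toy model m = -f·σ·N, so for a fixed plasmid length N and absorbed fraction f the bubble size is proportional to σ.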

Infoscience - École polytechnique fédérale de Lausanne

Fast DEM collision checks on multicore nodes.

Author: Deelman Ewa
Dongarra J.J.
Karczewski Konrad
Koziara Tomasz
Krestenitis Konstantinos
Weinzierl Tobias
Wyrzykowski Roman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Many particle simulations today rely on spherical or analytical particle shape descriptions, because non-spherical, triangulated particle models are considered computationally infeasible due to expensive collision detection. We propose a hybrid collision detection algorithm based upon an iterative solve of a minimisation problem that automatically falls back to a brute-force, comparison-based variant if the problem is ill-posed. Such a hybrid can exploit the vector facilities of modern chips and is well prepared for the emerging manycore era. Our approach pushes back the boundary at which non-analytical particle shapes, and the more accurate first-principles physics they allow, become manageable.
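The hybrid idea, an iterative minimisation with a brute-force fallback when the problem is ill-posed, can be sketched on a simpler case than the paper's triangulated particles: the distance between two 3D line segments. Here the iterative path is projected gradient descent on the squared distance, the problem is treated as ill-posed when the segments are (nearly) parallel or degenerate, and the fallback samples a grid of parameter pairs, so it is exact only at grid points. All names and tolerances are assumptions.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return tuple(x - y for x, y in zip(a, b))


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _point(origin: Vec, direction: Vec, s: float) -> Vec:
    return tuple(o + s * d for o, d in zip(origin, direction))


def _brute_force(p0: Vec, u: Vec, q0: Vec, v: Vec, grid: int) -> float:
    """Fallback: minimum distance over a (grid x grid) lattice of (s, t) samples."""
    best = math.inf
    for i in range(grid):
        ps = _point(p0, u, i / (grid - 1))
        for j in range(grid):
            d = _sub(ps, _point(q0, v, j / (grid - 1)))
            best = min(best, _dot(d, d))
    return math.sqrt(best)


def segment_distance(p0, p1, q0, q1, tol=1e-12, max_iter=10_000, grid=101):
    """Distance between P(s) = p0 + s*u and Q(t) = q0 + t*v, s, t in [0, 1].
    Returns (distance, method used)."""
    u, v = _sub(p1, p0), _sub(q1, q0)
    a, b, c = _dot(u, u), _dot(u, v), _dot(v, v)
    # Near-parallel or degenerate segments: the minimiser is not unique (ill-posed).
    if a * c - b * b <= 1e-12 * a * c:
        return _brute_force(p0, u, q0, v, grid), "brute-force"
    step = 1.0 / (2.0 * (a + c))  # 2(a + c) bounds the largest Hessian eigenvalue
    s = t = 0.5
    for _ in range(max_iter):
        d = _sub(_point(p0, u, s), _point(q0, v, t))
        # Projected gradient step on f(s, t) = |d|^2, clamped back into [0, 1].
        s_new = min(1.0, max(0.0, s - step * 2.0 * _dot(u, d)))
        t_new = min(1.0, max(0.0, t + step * 2.0 * _dot(v, d)))
        moved = abs(s_new - s) + abs(t_new - t)
        s, t = s_new, t_new
        if moved < tol:
            break
    else:
        # Not converged within max_iter: fall back to the comparison-based variant.
        return _brute_force(p0, u, q0, v, grid), "brute-force"
    d = _sub(_point(p0, u, s), _point(q0, v, t))
    return math.sqrt(_dot(d, d)), "iterative"
```

Skew segments take the cheap iterative path, while parallel ones, whose closest points are not unique, are routed to the fallback.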

Durham Research Online

Crossref

SAIGE-GENE plus improves the efficiency and accuracy of set-based rare variant association tests

Author: Bi Wenjian
Daly Mark J.
Dey Kushal K.
Jagadeesh Karthik A.
Karczewski Konrad J.
Lee Seunggeun
Neale Benjamin M.
Zhao Zhangchen
Zhou Wei
Publication date: 01/01/2022

Several biobanks, including UK Biobank (UKBB), are generating large-scale sequencing data. An existing method, SAIGE-GENE, performs well when testing variants with minor allele frequency (MAF) […]. SAIGE-GENE+ performs set-based rare variant association tests with improved type I error control and computational efficiency by collapsing ultra-rare variants and conducting multiple tests corresponding to different MAF cutoffs and annotations.

Peer reviewed
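Collapsing ultra-rare variants and building one variant set per MAF cutoff can be sketched as a few small helpers. Assumptions: dosages already count the minor allele (0, 1, or 2 per individual), a variant counts as ultra-rare when its minor allele count (MAC) is at or below a threshold, and collapsed variants are merged by taking each individual's maximum dosage. The default threshold of 10 is an assumption to check against the SAIGE-GENE+ documentation, the function names are hypothetical, and the association test itself is omitted.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[int]]  # rows = individuals, columns = variants


def minor_allele_counts(genotypes: Matrix) -> List[int]:
    """MAC per variant, assuming dosages already count the minor allele."""
    return [sum(row[j] for row in genotypes) for j in range(len(genotypes[0]))]


def collapse_ultra_rare(genotypes: Matrix, mac_threshold: int = 10) -> List[List[int]]:
    """Keep variants with MAC > threshold; merge the rest into one pseudo-variant
    whose value for each individual is that individual's maximum dosage."""
    macs = minor_allele_counts(genotypes)
    keep = [j for j, m in enumerate(macs) if m > mac_threshold]
    rare = [j for j, m in enumerate(macs) if m <= mac_threshold]
    collapsed = []
    for row in genotypes:
        new_row = [row[j] for j in keep]
        if rare:
            new_row.append(max(row[j] for j in rare))
        collapsed.append(new_row)
    return collapsed


def maf_mask(genotypes: Matrix, max_maf: float) -> List[int]:
    """Indices of variants whose MAF (MAC / 2n) is at or below max_maf."""
    n_chrom = 2 * len(genotypes)
    return [j for j, m in enumerate(minor_allele_counts(genotypes)) if m / n_chrom <= max_maf]
```

Each mask, for example with max_maf set to 0.0001, 0.001, and 0.01, would define its own variant set and therefore its own test, repeated per annotation class.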

Crossref

PubMed Central

Helsingin yliopiston digitaalinen arkisto

An experience report on (auto-)tuning of mesh-based PDE solvers on shared memory systems.

Author: Charrier Dominic E.
Deelman Ewa
Dongarra J.J.
Karczewski Konrad
Weinzierl Tobias
Wyrzykowski Roman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

With the advent of manycore systems, shared memory parallelisation has gained importance in high performance computing. Once a code is decomposed into tasks or parallel regions, it becomes crucial to identify reasonable grain sizes, i.e. minimum problem sizes per task that make the algorithm expose high concurrency at low overhead. Many papers do not detail what reasonable task sizes are, and treat their findings as craftsmanship not worth discussing. We have implemented an autotuning algorithm, a machine learning approach, for a project developing a hyperbolic equation system solver. Autotuning is important here because the grid and the task workload are multifaceted and change frequently during runtime. In this paper, we summarise our lessons learned. We infer tweaks and idioms for general autotuning algorithms, and we clarify that such an approach does not completely free users from grain size awareness.
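The paper describes a machine-learning autotuner; the sketch below swaps in the simplest possible alternative, an exhaustive search over candidate grain sizes that keeps the one with the lowest median measured cost. The cost model and all names are hypothetical: per-task overhead penalises small grains and load imbalance penalises large ones, so the optimum sits in between.

```python
from typing import Callable, Iterable, Optional


def autotune_grain_size(measure: Callable[[int], float],
                        candidates: Iterable[int],
                        repeats: int = 3) -> Optional[int]:
    """Return the candidate grain size with the lowest median measured cost."""
    best_size, best_cost = None, float("inf")
    for grain in candidates:
        samples = sorted(measure(grain) for _ in range(repeats))
        cost = samples[len(samples) // 2]  # the median damps timing noise
        if cost < best_cost:
            best_size, best_cost = grain, cost
    return best_size


def synthetic_cost(grain: int, n_items: int = 10_000,
                   overhead: float = 1.0, imbalance: float = 0.01) -> float:
    """Hypothetical model: n_tasks * per-task overhead + grain * imbalance penalty."""
    n_tasks = n_items / grain
    return n_tasks * overhead + grain * imbalance


if __name__ == "__main__":
    print(autotune_grain_size(synthetic_cost, [10, 100, 1_000, 10_000]))
```

In a real solver, `measure` would time a parallel region with `time.perf_counter()`, and the search would be rerun whenever the grid or task workload changes, since the abstract notes these change frequently during runtime.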

Durham Research Online

Crossref

Base-specific mutational intolerance near splice sites clarifies the role of nonessential splice nucleotides

Author: Daly Emma
Daly Mark J.
Karczewski Konrad J.
MacArthur Daniel G.
Neale Benjamin M.
Rivas Manuel A.
Samocha Kaitlin E.
Schmandt Ben
Zhang Sidi
Publication date: 01/06/2018

Variation in RNA splicing (i.e., alternative splicing) plays an important role in many diseases. Variants near 5' and 3' splice sites often affect splicing, but the effects of these variants on splicing and disease have not been fully characterized beyond the two "essential" splice nucleotides flanking each exon. Here we provide quantitative measurements of tolerance to mutational disruptions by position and by reference allele-alternative allele combination. We show that certain reference alleles are particularly sensitive to mutations, regardless of the alternative alleles into which they are mutated. Using public RNA-seq data, we demonstrate that individuals carrying such variants have significantly lower levels of the correctly spliced transcript compared to individuals without them, and we confirm that these specific substitutions are highly enriched for known Mendelian mutations. Our results propose a more refined definition of the "splice region" and offer a new way to prioritize and provide functional interpretation of variants identified in diagnostic sequencing and association studies.

Peer reviewed
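Tolerance by position and reference allele can be summarised as an observed/expected ratio pooled over alternative alleles, where a ratio well below 1 indicates that mutations at that site are depleted. This sketch assumes per-substitution observed counts and model-based expected counts are already available; the record layout, the function name, and position labels such as "donor+5" are hypothetical, and the paper's exact metric may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (position label, reference allele, alternative allele, observed count, expected count)
SubstitutionRecord = Tuple[str, str, str, float, float]


def obs_exp_by_position_ref(records: Iterable[SubstitutionRecord]) -> Dict[Tuple[str, str], float]:
    """Pool observed and expected counts over alternative alleles for each
    (position, reference allele) pair and return observed / expected."""
    observed: Dict[Tuple[str, str], float] = defaultdict(float)
    expected: Dict[Tuple[str, str], float] = defaultdict(float)
    for pos, ref, _alt, obs, exp in records:
        observed[(pos, ref)] += obs
        expected[(pos, ref)] += exp
    # Skip keys with no expected mutations, where the ratio is undefined.
    return {key: observed[key] / expected[key] for key in observed if expected[key] > 0}
```

Pooling over alternative alleles is what lets a strongly depleted reference allele stand out "regardless of the alternative alleles into which they are mutated."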

Crossref

Helsingin yliopiston digitaalinen arkisto


Genetic effects on gene expression across human tissues

Author: Abell Nathan S.
Abell Nathan S.
Addington Anjene
Addington Anjene M.
Aguet François
Akey Joshua M.
Ardlie Kristin G.
Balliu Brunilda
Barcus Mary E.
Barker Laura K.
Barshir Ruth
Basha Omer
Bates Daniel
Battle Alexis
Billy Li Jin
Bogu Gireesh K.
Branton Philip A.
Bridge Jason
Brigham Lori E.
Brown Andrew
Brown Andrew A.
Brown Christopher D.
Bustamante Carlos D.
Carithers Latarsha J.
Castel Stephane E.
Chan Joanne
Chen Lin S.
Chiang Colby
Claussnitzer Melina
Conrad Donald F.
Cox Nancy J.
Craft Brian
Cummings Beryl B.
Damani Farhan N.
Davis David A.
Davis Joe R.
Delaneau Olivier
Demanelis Kathryn
Dermitzakis Emmanouil T.
Diegel Morgan
Doherty Jennifer A.
Engelhardt Barbara E.
Eskin Eleazar
Feinberg Andrew P.
Fernando Marian S.
Ferreira Pedro G.
Flicek Paul
Foster Barbara A.
Frésard Laure
Gamazon Eric R.
Garrido-Martín Diego
Gelfand Ellen T.
Getz Gad
Gewirtz Ariel D.H.
Gillard Bryan M.
Gliner Genna
Gloudemans Michael J.
Goldman Mary
Gould Sarah E.
Guan Ping
Guigo Roderic
Guigó Roderic
Hadley Kane
Haeussler Maximilian
Hall Ira M.
Halow Jessica
Han Buhm
Handsaker Robert E.
Hansen Kasper D.
Hariharan Pushpa
Hasz Richard
Haugen Eric
He Amy Z.
He Yuan
Hickey Peter F.
Hormozdiari Farhad
Hou Lei
Howald Cedric
Huang Katherine H.
Hunter Marcus
Hunter Steven
Jasmine Farzana
Jewell Scott D.
Jian Ruiqi
Jiang Lihua
Jo Brian
Johns Christopher
Johnson Audra
Johnson Mark
Juettemann Thomas
Kang Eun Yong
Karasik Ellen
Karczewski Konrad J.
Kashin Seva
Kaul Rajinder
Kellis Manolis
Kent W. James
Kibriya Muhammad G.
Kim Yungil
Kim-Hellmuth Sarah
Koester Susan
Koester Susan E.
Kopen Gene
Kumar Rachna
Kyung Im Hae
Lappalainen Tuuli
Lee Christopher M.
Lee Kristen
Leinweber William F.
Lek Monkol
Li Gen
Li Qin
Li Xiao
Li Xin
Lin Jessica
Lin Shin
Linder Sandra
Linke Caroline
Little A. Roger
Liu Boxiang
Liu Yaping
Lockart Nicole C.
Lockhart Nicole C.
Lonsdale John T.
MacArthur Daniel G.
Mangul Serghei
Martin Casey
Mash Deborah C.
Matose Takunda
Maurano Matthew T.
McCarthy Mark I.
McDonald Alisa
McDowell Ian C.
McLean Jeffrey A.
Mestichelli Bernadette
Miklos Mark
Mohammadi Pejman
Molinie Benoit
Monlong Jean
Montgomery Stephen B.
Montroy Robert G.
Moore Helen M.
Mosavel Maghboeba
Moser Michael T.
Muñoz-Aguirre Manuel
Myer Kevin
Ndungu Anne W.
Nedzel Jared L.
Nelson Jemma
Neri Fidencio J.
Nguyen Duyen T.
Nguyen Duyen Y.
Nicolae Dan L.
Nierras Concepcion R.
Nobel Andrew B.
Noble Michael S.
Oliva Meritxell
Ongen Halit
Palowitch John J.
Panousis Nikolaos
Papasaikas Panagiotis
Park Yongjin
Park YoSon
Parsana Princy
Paten Benedict
Payne Anthony J.
Peterson Christine B.
Pierce Brandon L.
Qi Liqun
Quan Jie
Quon Gerald
Rao Abhi
Reverter Ferran
Rinaldi Nicola J.
Ripke Stephan
Rizzardi Lindsay F.
Robinson Karna L.
Roche Nancy V.
Roe Brian
Roe Bryan
Rohrer Daniel C.
Rosenbloom Kate R.
Ruffier Magali
Sabatti Chiara
Saha Ashis
Salvatore Michael
Sammeth Michael
Sandstrom Richard
Scott Alexandra J.
Segrè Ayellet V.
Shabalin Andrey A.
Shad Saboor
Sheppard Dan
Shimko Tyler C.
Siminoff Laura A.
Singh Shilpi
Skol Andrew
Smith Anna M.
Smith Kevin S.
Snyder Michael P.
Sobin Leslie
Sodaei Reza
Stamatoyannopoulos John
Stephens Matthew
Stranger Barbara E.
Strober Benjamin J.
Struewing Jeffery P.
Sul Jae Hoon
Sullivan Timothy J.
Tabor David E.
Tang Hua
Taylor Kieron
Teran Nicole A.
Thomas Jeffrey A.
Tomaszewski Maria M.
Traino Heather M.
Trevanion Stephen J.
Trowbridge Casandra A.
Tsang Emily K.
Tukiainen Taru
Um Ki Sung
Undale Anita H.
Urbut Sarah
Valentino Kimberly M.
Valley Dana
Valley Dana R.
van de Bunt Martijn
Van Wittenberghe Nicholas
Vatanian Negin
Vaught Jimmie B.
Vivian John
Volpi Simona
Walters Gary
Wang Gao
Wang Li
Wang Meng
Washington Michael
Wen Xiaoquan
Wheeler Joseph
Wright Fred A.
Wu Fan
Xi Hualin S.
Yeger-Lotem Esti
Yong Kang Eun
Zappala Zachary
Zaugg Judith B.
Zerbino Daniel R.
Zhang Hailei
Zhang Rui
Zhou Yi-Hui
Zhu Jingchun
Publication venue: Macmillan Publishers
Publication date: 01/01/2017

Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

Postprint (published version)
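At its core, detecting a local genetic effect on expression (a cis-eQTL) regresses a gene's expression on genotype dosage across individuals. The sketch below computes only the ordinary least-squares slope for one variant-gene pair; it omits the covariates, expression normalisation, and multiple-testing control that a production pipeline such as GTEx's would use, and the function name is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def eqtl_slope(dosages: Sequence[float], expression: Sequence[float]) -> float:
    """OLS slope of expression on genotype dosage (0/1/2) for one variant-gene pair:
    slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)."""
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have the same length")
    mx, my = fmean(dosages), fmean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("monomorphic variant: slope is undefined")
    return sxy / sxx
```

A positive slope means each extra copy of the alternative allele is associated with higher expression; whether it is significant would come from a test on this coefficient after covariate adjustment.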

Princeton University Open Access Repository

UNIL IRIS | Institutional Research Information System

University of Miami: Scholarship@Miami

UPF Digital Repository

Archive ouverte UNIGE

DSpace@MIT

UPCommons. Portal del coneixement obert de la UPC

Oxford University Research Archive

Discovery Research Portal

UPCommons (Universitat Politècnica de Catalunya)