12 research outputs found

A Deep Catalogue of Protein-Coding Variation in 983,578 Individuals

Author: Abecasis Gonçalo
Alegre-Díaz Jesús
Backman Joshua
Bai Xiaodong
Balasubramanian Suganthi
Bao Suying
Baras Aris
Berumen Jaime
Boutkov Boris
Bovijn Jonas
Cantor Michael
Chen Siying
Collins Rory
Cremona M Laura
Di Gioia Alessandro
Emberson Jonathan
Ganel Liron
Gelfman Sahar
Gokhale Sujit
Gorovits Alexander
Habegger Lukas
Hawes Alicia
Joseph Tyler
Kang Hyun Min
Kapoor Manav
Kessler Michael D
Kuri-Morales Pablo
Locke Adam E
Lopez Alexander
Mansfield Adam
Marchini Jonathan
Marcketta Anthony
Maxwell Evan
Mitra George
Nafde Mona
Overton John D
Rajagopal Veera M
Regeneron Genetics Center
Reid Jeffrey G
RGC-ME Cohort Partners
Salerno William
Sharma Deepika
Shuldiner Alan R
Staples Jeffrey
Sun Kathie Y
Tapia-Conyer Roberto
Thornton Timothy
Torres Jason
Varela Jennifer Rico
Zhang Chuanyi
Publication venue: DigitalCommons@TMC
Publication date: 01/07/2024

Rare coding variants that substantially affect function provide insights into the biology of a gene [1-3]. However, ascertaining the frequency of such variants requires large sample sizes [4-8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.

DigitalCommons@The Texas Medical Center

Evaluation of next-generation sequencing software in mapping and assembly

Author: A Bashir
A Bateman
AC McHardy
AD Smith
B Langmead
BinBin Wang
C Trapnell
CA Tilford
D Campagna
D Hernandez
D Weese
DR Bentley
DR Zerbino
DS Horner
DW Bryant Jr
ER Mardis
ER Mardis
ES Lander
EW Myers
F Sanger
H Jiang
H Li
H Li
H Li
H Lin
HL Eaves
J Butler
JC Dohm
JC Venter
JO Korbel
JR Miller
JR Miller
JT Simpson
JT Simpson
K Chen
KE Holt
L Engstrand
L Noe
M Margulies
M Pop
M Pop
MC Schatz
MJ Chaisson
ML Metzker
MS Hossain
N Homer
N Malhis
NL Clement
O Morozova
O Morozova
P Flicek
P Flicek
P Medvedev
PA Pevzner
PJ Campbell
PJ Hurd
R Staden
RF Service
RL Warren
RQ Li
RQ Li
Rui Jiang
SC Schuster
SM Rumble
Suying Bao
WingKeung Kwan
WJ Ansorge
WR Jeck
Xu Ma
Y Chen
YJ Kim
You-Qiang Song
Z Ning
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Next-generation high-throughput DNA sequencing technologies have steadily advanced sequence-based genomic research and novel biological applications with the promise of sequencing DNA at unprecedented speed. These new non-Sanger-based technologies feature several advantages over traditional sequencing methods: higher sequencing speed, lower per-run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, which restricts the application of NGS platforms in genome assembly and annotation. We presented an overview of the challenges that these novel technologies face and, in particular, illustrated various bioinformatics approaches to mapping and assembly that address them. We then compared the performance of several programs in these two fields and provided advice on selecting suitable tools for specific biological applications.

Crossref

HKU Scholars Hub

Deciphering the mechanisms of genetic disorders by high throughput genomic data

Author: Bao Suying
鲍素莹
Publication venue: The University of Hong Kong Libraries
Publication date: 01/01/2013

A new generation of non-Sanger-based sequencing technologies, so-called “next-generation” sequencing (NGS), has been changing the landscape of genetics at unprecedented speed. In particular, our capacity to decipher the genotypes underlying phenotypes, such as diseases, has never been greater. However, before fully applying NGS in medical genetics, researchers have to bridge the widening gap between the generation of massively parallel sequencing output and the capacity to analyze the resulting data. In addition, even when an effective NGS analysis yields a list of candidate genes with potential causal variants, pinpointing disease genes from the long list remains a challenge. The issue becomes especially difficult when the molecular basis of the disease is not fully elucidated. New NGS users are always bewildered by a plethora of options in mapping, assembly, variant calling and filtering programs and may have no idea how to compare these tools and choose the “right” ones. To get an overview of various bioinformatics approaches to mapping and assembly, a series of performance evaluations was conducted using both real and simulated NGS short reads. For NGS variant detection, the performance of the two most widely used toolkits, SAMtools and GATK, was assessed. Based on the results of this systematic evaluation, an NGS data processing and analysis pipeline was constructed. This pipeline proved successful, identifying a mutation (a frameshift deletion in Hnrnpa1, p.Leu181Valfs*6) related to congenital heart defect (CHD) in procollagen type IIA deficient mice. In order to prioritize risk genes for diseases, especially those with limited prior knowledge, a network-based gene prioritization model was constructed. It consists of two parts: network analysis on known disease genes (seed-based network strategy) and network analysis on differential expression (DE-based network strategy).
Case studies of various complex diseases/traits demonstrated that the DE-based network strategy can greatly outperform traditional gene expression analysis in predicting disease-causing genes. A series of simulations indicated that the DE-based strategy is especially valuable for diseases with limited prior knowledge, and that the model’s performance can be further improved by integrating it with the seed-based network strategy. Moreover, a successful application of the network-based gene prioritization model in an influenza host genetics study further demonstrated the capacity of the model to identify promising candidates and to mine new risk genes and pathways not biased toward our current knowledge. In conclusion, an efficient NGS analysis framework has been constructed for medical genetics, running from quality control and variant detection to result analysis and gene prioritization. The novelty of this framework lies in its attempt to prioritize risk genes for poorly characterized diseases by network analysis on known disease genes and differential expression data. The successful applications in detecting genetic factors associated with CHD and influenza host resistance demonstrated the efficacy of this framework. This may further stimulate more applications of high throughput genomic data in dissecting the genetic components of human disorders in the near future.

HKU Scholars Hub

Retraction Note: Evaluation of next-generation sequencing software in mapping and assembly

Author: BinBin Wang
Rui Jiang
SuYing Bao
WingKeung Kwan
Xu Ma
You-Qiang Song
Publication venue: Springer Science and Business Media LLC

Crossref

PKU-PET-II: A novel SiPM-based PET imaging system for small animals

Author: Association
Baixuan Xu
Bao
Cherry
Du
Goertzen
Harteveld
Huan Xu
Ivan Vuletic
James
Jinming Zhang
Kun Yang
Kun Zhou
Kwon
Llosá
Lu
Lu
Mackewn
Milbrath
Otte
O’Neill
Prasad
Qiushi Ren
Sato
Schaart
Sihao Zhu
Song
Suying Li
Tetrault
Wang
Wang
Weissleder
Xiangxi Meng
Xuan
Yamamoto
Zhaoheng Xie
Publication venue: Elsevier BV

Crossref

A deep catalogue of protein-coding variation in 983,578 individuals

Author: Abecasis Goncalo
Alegre Jesus
Backman Joshua
Bai Xiaodong
Balasubramanian Suganthi
Bao Suying
Baras Aris
Berumen Jaime
Boutkov Boris
Bovijn Jonas
Cantor Michael
Chen Siying
Collins Rory
Cremona M Laura
Di Gioia Alessandro
Emberson Jonathan
Ganel Liron
Gelfman Sahar
Gokhale Sujit
Gorovits Alexander
Habegger Lukas
Hawes Alicia
Joseph Tyler
Kang Hyun Min
Kapoor Manav
Kessler Michael D
Kuri-Morales Pablo
Locke Adam E
Lopez Alexander
Mansfield Adam
Marchini Jonathan
Marcketta Anthony
Maxwell Evan
Mitra George
Nafde Mona
Overton John D
Rajagopal Veera M
Regeneron Genetics Center
Reid Jeffrey G
RGC-ME Cohort Partners
Salerno William
Sharma Deepika
Shuldiner Alan R
Staples Jeffrey
Sun Kathie Y
Tapia-Conyer Roberto
Thornton Timothy
Torres Jason
Varela Jennifer Rico
Zhang Chuanyi
Publication venue: Springer Nature
Publication date: 20/05/2024

Rare coding variants that significantly impact function provide insights into the biology of a gene [1-3]. However, ascertaining their frequency requires large sample sizes [4-8]. Here, we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. 23% of the Regeneron Genetics Center Million Exome data (RGC-ME) comes from non-European individuals of African, East Asian, Indigenous American, Middle Eastern, and South Asian ancestry. This catalogue includes over 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss-of-function, we identify 3,988 loss-of-function intolerant genes, including 86 that were previously assessed as tolerant and 1,153 lacking established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions depleted of missense variants despite being tolerant to pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this important resource of coding variation from the RGC-ME accessible via a public variant allele frequency browser.

Oxford University Research Archive

Evaluation of next-generation sequencing software in mapping and assembly

Author: A Bashir
A Bateman
AC McHardy
AD Smith
B Langmead
BinBin Wang
C Trapnell
CA Tilford
D Campagna
D Hernandez
D Weese
DR Bentley
DR Zerbino
DS Horner
DW Bryant Jr
ER Mardis
ER Mardis
ES Lander
EW Myers
F Sanger
H Jiang
H Li
H Li
H Li
H Lin
HL Eaves
J Butler
JC Dohm
JC Venter
JO Korbel
JR Miller
JR Miller
JT Simpson
JT Simpson
K Chen
KE Holt
L Engstrand
L Noe
M Margulies
M Pop
M Pop
MC Schatz
MJ Chaisson
ML Metzker
MS Hossain
N Homer
N Malhis
NL Clement
O Morozova
O Morozova
P Flicek
P Flicek
P Medvedev
PA Pevzner
PJ Campbell
PJ Hurd
R Staden
RF Service
RL Warren
RQ Li
RQ Li
Rui Jiang
SC Schuster
SM Rumble
Suying Bao
WingKeung Kwan
WJ Ansorge
WR Jeck
Xu Ma
Y Chen
YJ Kim
You-Qiang Song
Z Ning
Publication venue: Springer Science and Business Media LLC

Crossref

An overview of the engineered graphene nanostructures and nanocomposites

Author: Abanin
Allen
Alvi
Ang
Archer
Asapu
Babrauskas
Baby
Bao
Becerril
Biswas
Blankenburg
Bo
Bo
Bonanni
Bose
Brownson
Brownson
Brownson
Bu
Bunch
Cao
Chandra
Chandra
Chen
Chen
Chen
Chiu
Choi
Choi
Cipiriano
Dan
Dasari
Dasari
Davies
Dresselhaus
Du
Du
Du
Duquesne
Eda
Fan
Fan
Fan
Feng
Feng
Ferrari
Fileti
Fowler
Gao
Gao
Gao
Garaj
Geim
Goh
Gomez De Arco
Goncalves
Guardia
Guo
Guo
Guo
Gwon
He
He
Heremans
Higginbotham
Hong
Horacek
Hu
Huang
Huang
Huang
Huang
Huang
Huang
Huang
Hwang
Jablan
James
Jiang
Kalita
Kang
Kashiwagi
Kashiwagi
Kashiwagi
Kashiwagi
Katsnelson
Keeley
Kim
Kim
Kim
Koenig
Kuila
Lakshmi
Lee
Lee
Li
Li
Li
Li
Li
Li
Li
Li
Li
Li
Li
Li
Li
Liang
Liang
Liao
Lim
Lin
Lin
Liu
Liu
Loh
Lotya
Lu
Lu
Lu
Lu
Lu
Lu
Lu
Luo
Mao
McAllister
Merchant
Meyer
Mishra
Mishra
Mohanty
Moon
Nair
Ni
Nie
Novoselov
Novoselov
Ouyang
Park
Patake
Peeterbroeck
Pei
Perera
Perera
Qu
Rafiee
Rahman
Rajeshwar
Rakhi
Rao
Ratinac
Recher
Robinson
Roy-Mayhew
Scharf
Schniepp
Schrier
Seol
Sevinçli
Shaju
Shan
Shan
Shang
Shi
Shi
Shi
Shi
Song
Song
Song
Stankovich
Stoller
Stoller
Su
Sun
Sun
Sun
Sun
Sun
Sundaram
Suying
Tan
Tang
Tang
Tang
Tang
Tang
Tang
Tung
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wei
Wei
Woan
Wu
Wu
Wu
Wu
Wu
Wu
Xia
Xia
Xiong
Xu
Xu
Xue
Yan
Yang
Yang
Yang
Yang
Yang
Yang
Yang
Yang
Yang
Yang
Yin
Yu
Yu
Yu
Yu
Zang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhao
Zhao
Zhao
Zheng
Zhou
Zhou
Zhou
Zhu
Zhu
Zhu
Zhu
Zhu
Zhu
Zuo
Publication venue: Royal Society of Chemistry (RSC)
Publication date: 01/01/2013

Crossref