706 research outputs found

De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations.

Author: Kwok Pui-Yan
Levy-Sakin Michal
Wong Karen HY
Publication venue: eScholarship, University of California
Publication date: 01/08/2018

The human reference genome is used extensively in modern biological research. However, a single consensus representation is inadequate to provide a universal reference structure because it is a haplotype among many in the human population. Using 10× Genomics (10×G) "Linked-Read" technology, we perform whole genome sequencing (WGS) and de novo assembly on 17 individuals across five populations. We identify 1842 breakpoint-resolved non-reference unique insertions (NUIs) that, in aggregate, add up to 2.1 Mb of so far undescribed genomic content. Among these, 64% are considered ancestral to humans since they are found in non-human primate genomes. Furthermore, 37% of the NUIs can be found in the human transcriptome and 14% likely arose from Alu-recombination-mediated deletion. Our results underline the need for a set of human reference genomes that includes a comprehensive list of alternative haplotypes to depict the complete spectrum of genetic diversity across populations.

Directory of Open Access Journals

eScholarship - University of California

Limit theorems for functions of marginal quantiles

Author: Babu G. Jogesh
Bai Zhidong
Choi Kwok Pui
Mangalam Vasudevan
Publication venue: Bernoulli Society for Mathematical Statistics and Probability
Publication date: 22/04/2011

Multivariate distributions are explored using the joint distributions of marginal sample quantiles. Limit theory for the mean of a function of order statistics is presented. The results include a multivariate central limit theorem and a strong law of large numbers. A result similar to Bahadur's representation of quantiles is established for the mean of a function of the marginal quantiles. In particular, it is shown that

\[
\sqrt{n}\Biggl(\frac{1}{n}\sum_{i=1}^n\phi\bigl(X_{n:i}^{(1)},\ldots,X_{n:i}^{(d)}\bigr)-\bar{\gamma}\Biggr)=\frac{1}{\sqrt{n}}\sum_{i=1}^nZ_{n,i}+\mathrm{o}_P(1)
\]

as $n\rightarrow\infty$, where $\bar{\gamma}$ is a constant and $Z_{n,i}$ are i.i.d. random variables for each $n$. This leads to the central limit theorem. Weak convergence to a Gaussian process using equicontinuity of functions is indicated. The results are established under very general conditions. These conditions are shown to be satisfied in many commonly occurring situations.

Comment: Published at http://dx.doi.org/10.3150/10-BEJ287 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).

arXiv.org e-Print Archive

Crossref

ScholarBank@NUS

Kinetic FP-TDI assay for SNP allele frequency determination

Author: Kwok Pui-Yan
Latif Sherif Medhat
Xiao Ming
Publication venue: Digital Commons@Becker
Publication date: 01/01/2003

Digital Commons@Becker

AT excursion: a new approach to predict replication origins in viral genomes by locating AT-rich regions

Author: Chew David SH
Choi Kwok Pui
Leung Ming-Ying
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for a particular kind of genome might not work well for another. In this paper, we propose the AT excursion method, which is a score-based approach, to quantify local AT abundance in genomic sequences and use the identified high scoring segments for predicting replication origins. This method has the advantages of requiring no preset window size and having rigorous criteria to evaluate statistical significance of high scoring segments. Results: We have evaluated the AT excursion method by checking its predictions against known replication origins in herpesviruses and comparing its performance with an existing base weighted score method (BWS1). Out of 43 known origins, 39 are predicted by either one or the other method and 26 origins are predicted by both. The excursion method identifies six origins not predicted by BWS1, showing that the AT excursion method is a valuable complement to BWS1. We have also applied the AT excursion method to two other families of double stranded DNA viruses, the poxviruses and iridoviruses, of which very few replication origins are documented in the public domain. The prediction results are made available as supplementary materials at [1]. Preliminary investigation shows that the proposed method works well on some larger genomes too. Conclusion: The AT excursion method will be a useful computational tool for identifying replication origins in a variety of genomic sequences.
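The score-based idea in this abstract can be illustrated with a minimal sketch of a generic excursion scan. This is not the authors' implementation: the function name `at_excursions`, the per-base scores (`at_score`, `gc_score`) and the fixed cutoff `min_peak` are all assumptions chosen for illustration, and the fixed cutoff merely stands in for the statistical significance criteria that the abstract mentions but does not specify.

```python
def at_excursions(seq, at_score=1.0, gc_score=-1.5, min_peak=5.0):
    """Return (start, end, peak) for AT-rich excursions in seq.

    Hypothetical scoring: A/T add at_score, every other character adds
    gc_score (negative). The running score restarts whenever it drops to
    zero or below, so no window size has to be fixed in advance.
    Each reported segment seq[start:end] runs from the start of an
    excursion to the position where its score peaked.
    """
    segments = []
    running, peak = 0.0, 0.0
    start = peak_end = None
    for i, base in enumerate(seq.upper()):
        step = at_score if base in "AT" else gc_score
        if start is None:
            if step <= 0:
                continue  # no excursion is open and this base cannot open one
            start = i
        running += step
        if running > peak:
            peak, peak_end = running, i + 1
        if running <= 0:
            # The excursion has returned to zero: keep it if it peaked high enough.
            if peak >= min_peak:
                segments.append((start, peak_end, peak))
            running, peak = 0.0, 0.0
            start = peak_end = None
    # An excursion may still be open at the end of the sequence.
    if start is not None and peak >= min_peak:
        segments.append((start, peak_end, peak))
    return segments
```

Calling `at_excursions(genome)` on a sequence string returns candidate AT-rich segments as half-open index ranges. In practice the cutoff would come from a significance calculation rather than a hand-picked constant.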

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

A post-processing method for optimizing synthesis strategy for oligonucleotide microarrays

Author: Choi Kwok Pui
Leong Hon Wai
Ning Kang
Zhang Louxin
Publication venue: Oxford University Press
Publication date: 01/01/2005

The broad applicability of gene expression profiling to genomic analyses has generated huge demand for mass production of microarrays and hence for improving the cost effectiveness of microarray fabrication. We developed a post-processing method for deriving a good synthesis strategy. In this paper, we assessed all the known efficient methods and our post-processing method for reducing the number of synthesis cycles for manufacturing a DNA-chip of a given set of oligos. Our experimental results on both simulated and 52 real datasets show that no single method consistently gives the best synthesis strategy, and post-processing an existing strategy is necessary as it often reduces the number of synthesis cycles further.

Crossref

PubMed Central

ScholarBank@NUS

Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences

ConReg-R: Extrapolative recalibration of the empirical distribution of p-values to improve false discovery rate estimates

Author: Choi Kwok Pui
Karuturi R Krishna Murthy
Li Juntao
Paramita Puteri
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: False discovery rate (FDR) control is commonly accepted as the most appropriate error control in multiple hypothesis testing problems. The accuracy of FDR estimation depends on the accuracy of the estimation of p-values from each test and validity of the underlying assumptions of the distribution. However, in many practical testing problems such as in genomics, the p-values could be under-estimated or over-estimated for many known or unknown reasons. Consequently, FDR estimation would then be influenced and lose its veracity. Results: We propose a new extrapolative method called Constrained Regression Recalibration (ConReg-R) to recalibrate the empirical p-values by modeling their distribution to improve the FDR estimates. Our ConReg-R method is based on the observation that accurately estimated p-values from true null hypotheses follow a uniform distribution and the observed distribution of p-values is indeed a mixture of distributions of p-values from true null hypotheses and true alternative hypotheses. Hence, ConReg-R recalibrates the observed p-values so that they exhibit the properties of an ideal empirical p-value distribution. The proportion of true null hypotheses (π0) and FDR are estimated after the recalibration. Conclusions: ConReg-R provides an efficient way to improve the FDR estimates. It only requires the p-values from the tests and avoids permutation of the original test data. We demonstrate that the proposed method significantly improves FDR estimation on several gene expression datasets obtained from microarray and RNA-seq experiments. Reviewers: The manuscript was reviewed by Prof. Vladimir Kuznetsov, Prof. Philippe Broet, and Prof. Hongfang Liu (nominated by Prof. Yuriy Gusev).
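The mixture observation in this abstract (uniform p-values under true nulls, mixed with small p-values under true alternatives) is also what makes the proportion of true nulls estimable from p-values alone. The sketch below is not ConReg-R, whose constrained regression step is not described here. It swaps in the standard tail-based (Storey-type) estimator of π0 and the corresponding plug-in FDR estimate, purely to illustrate the idea. The function names and the tuning value `lam=0.5` are assumptions.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Tail-based estimate of the proportion of true null hypotheses.

    Assumes p-values above lam come almost entirely from true nulls,
    which are uniform on [0, 1], so about pi0 * m * (1 - lam) of the
    m p-values are expected to exceed lam.
    """
    m = len(pvalues)
    if m == 0:
        raise ValueError("need at least one p-value")
    tail = sum(1 for p in pvalues if p > lam)
    return min(1.0, tail / (m * (1.0 - lam)))


def estimate_fdr(pvalues, threshold, lam=0.5):
    """Plug-in FDR estimate when rejecting every p-value <= threshold."""
    m = len(pvalues)
    rejected = sum(1 for p in pvalues if p <= threshold)
    if rejected == 0:
        return 0.0
    # Expected number of null p-values below the threshold is pi0 * m * threshold.
    expected_false = estimate_pi0(pvalues, lam) * m * threshold
    return min(1.0, expected_false / rejected)
```

Both estimates rely on the null p-values being accurately uniform. If they are systematically under- or over-estimated, as the abstract warns, these plug-in values are biased, which is the situation a recalibration step is meant to address.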

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Queensland University of Technology ePrints Archive

ScholarBank@NUS

Application of next generation sequencing to CEPH cell lines to discover variants associated with FDA approved chemotherapeutics

Author: Hariani GD
Havener T
Kwok Pui-Yan
Kwok PY
Lam EJ
McLeod HL
Motsinger-Reif AA
Wagner MJ
Publication venue: Springer Nature
Publication date: 01/01/2014

After publication of this work [1], it has come to our attention that there is an error in the author list of the initial version of this manuscript; rather than Ernest J Lam, the second author of the manuscript should be listed as Ernest T Lam.

Crossref

Springer - Publisher Connector

PubMed Central

Carolina Digital Repository

eScholarship - University of California

Comprehensive Analysis of Human Subtelomeres by Whole Genome Mapping

Author: Abid Heba Z.
Kwok Pui-Yan
Riethman Harold
Xiao Ming
Young Eleanor
Publication venue: ODU Digital Commons
Publication date: 01/01/2020

Detailed, comprehensive knowledge of the structures of individual long-range telomere-terminal haplotypes is needed to understand their impact on telomere function, and to delineate the population structure and evolution of subtelomere regions. However, the abundance of large evolutionarily recent segmental duplications and high levels of large structural variations have complicated both the mapping and sequence characterization of human subtelomere regions. Here, we use high throughput optical mapping of large single DNA molecules in nanochannel arrays for 154 human genomes from 26 populations to present a comprehensive look at human subtelomere structure and variation. The results catalog many novel long-range subtelomere haplotypes and determine the frequencies and contexts of specific subtelomeric duplicons on each chromosome arm, helping to clarify the currently ambiguous nature of many specific subtelomere structures as represented in the current reference sequence (HG38). The organization and content of some duplicons in subtelomeres appear to show both chromosome arm and population-specific trends. Based upon these trends we estimate a timeline for the spread of these duplication blocks.

Directory of Open Access Journals

eScholarship - University of California

Old Dominion University