190 research outputs found

Haplotype Threading Using the Positional Burrows-Wheeler Transform

Author: Sanaullah Ahsan
Zhang Shaojie
Zhi Degui
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)
Publication date: 01/01/2022

In the classic model of population genetics, one haplotype (query) is considered a mosaic copy of segments from a number of haplotypes in a panel, which is described as threading the haplotype through the panel. The Li and Stephens model parameterized this problem using a hidden Markov model (HMM). However, HMM algorithms are linear in the sample size, and can be very expensive for biobank-scale panels. Here, we formulate the haplotype threading problem as the Minimal Positional Substring Cover problem, where a query is represented by a mosaic of a minimal number of substring matches from the panel. We show that this problem can be solved by a sequential set of greedy set maximal matches. Moreover, the solution space can be bounded by the left-most and the right-most solutions by the greedy approach. Based on these results, we formulate and solve several variations of this problem. Although our results are yet to be generalized to the cases with mismatches, they offer a theoretical framework for designing methods for genotype imputation and haplotype phasing.
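As a rough illustration of the greedy idea in this abstract, the sketch below covers a query with positional substring matches by always extending the current segment as far as any panel haplotype allows. It is a naive scan written for clarity, not the PBWT-based method of the paper; the function name `greedy_cover` and the string encoding of haplotypes are assumptions made for this example.

```python
def greedy_cover(panel, query):
    """Cover `query` with positional substring matches from `panel`.

    Hypothetical helper: `panel` is a list of equal-length strings and
    `query` is a string of the same length. Returns a list of
    (start, end, haplotype_index) tuples with half-open [start, end)
    ranges, or None if some site cannot be matched by any haplotype.
    """
    n = len(query)
    if any(len(hap) != n for hap in panel):
        raise ValueError("all haplotypes must have the same length as the query")

    cover = []
    start = 0
    while start < n:
        best_end, best_hap = start, None
        # Find the haplotype whose exact match starting at `start` runs longest.
        for idx, hap in enumerate(panel):
            end = start
            while end < n and hap[end] == query[end]:
                end += 1
            if end > best_end:
                best_end, best_hap = end, idx
        if best_hap is None:
            return None  # no haplotype matches the query at this site
        cover.append((start, best_end, best_hap))
        start = best_end  # the next segment begins where this one stopped
    return cover


if __name__ == "__main__":
    print(greedy_cover(["0000", "1111", "0011"], "0111"))
```

Each segment is taken as long as possible from the left, so the scan uses as few segments as an exact cover permits. Ties between haplotypes are broken by panel order here, which is an arbitrary choice.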

Dagstuhl Research Online Publication Server

Detecting transcription of ribosomal protein pseudogenes in diverse human tissues from RNA-seq data

Author: Srinivasasainagendra Vinodh
Tonner Peter
Zhang Shaojie
Zhi Degui
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2012

Background: Ribosomal proteins (RPs) have about 2000 pseudogenes in the human genome. While anecdotal reports of RP pseudogene transcription exist, it is unclear to what extent these pseudogenes are transcribed. RP pseudogene transcription is difficult to identify in microarrays due to potential cross-hybridization between transcripts from the parent genes and pseudogenes. Recently, transcriptome sequencing (RNA-seq) has provided an opportunity to ascertain the transcription of pseudogenes. A challenge for pseudogene expression discovery in RNA-seq data lies in the difficulty of uniquely identifying reads mapped to pseudogene regions, which are typically also similar to the parent genes. Results: Here we developed a specialized pipeline for pseudogene transcription discovery. We first construct a composite genome that includes the entire human genome sequence as well as mRNA sequences of real ribosomal protein genes. We then map all sequence reads to the composite genome and retain only exact matches. Moreover, we restrict our analysis to strictly defined mappable regions and calculate the RPKM values as a measurement of pseudogene transcription levels. We report evidence for the transcription of RP pseudogenes in 16 human tissues. By analyzing the Human Body Map 2.0 study RNA-sequencing data using our pipeline, we identified that one ribosomal protein (RP) pseudogene (PGOHUM-249508) is transcribed with RPKM 170 in thyroid. Moreover, three other RP pseudogenes are transcribed with RPKM > 10, a level similar to that of the normal RP genes, in white blood cell, kidney, and testes, respectively. Furthermore, an additional thirteen RP pseudogenes are of RPKM > 5, corresponding to the 20th to 30th percentile among all genes. Unlike ribosomal protein genes that are constitutively expressed in almost all tissues, RP pseudogenes are differentially expressed, suggesting that they may contribute to tissue-specific biological processes.
Conclusions: Using a specialized bioinformatics method, we identified the transcription of ribosomal protein pseudogenes in human tissues using RNA-seq data.
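For readers unfamiliar with the RPKM measure used in this abstract, the following is a minimal sketch of the standard definition (reads per kilobase of region per million mapped reads). The function name and argument names are assumptions for illustration, and treating the region length as the mappable length is a guess at how the restriction to mappable regions might enter. The abstract does not give the pipeline's exact computation.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads Per Kilobase of region per Million mapped reads.

    Hypothetical helper. `region_length_bp` is assumed here to be the
    mappable length of the region in base pairs.
    """
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 500 reads on a 1,000 bp region in a library of 10 million mapped reads
    print(rpkm(500, 1000, 10_000_000))
```

Under this definition, a threshold such as RPKM > 10 corresponds to more than ten reads per kilobase for every million mapped reads in the sample.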

Crossref

Springer - Publisher Connector

PubMed Central

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Efficient Haplotype Block Matching in Bi-Directional PBWT

Author: Naseri Ardalan
Yue William
Zhang Shaojie
Zhi Degui
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)
Publication date: 01/01/2021

Efficient haplotype matching search is of great interest as large genotyped cohorts become available. Positional Burrows-Wheeler Transform (PBWT) enables efficient searching for blocks of haplotype matches. However, existing efficient PBWT algorithms sweep across the haplotype panel from left to right, capturing all exact matches. As a result, PBWT does not account for mismatches. It is also not easy to investigate the patterns of changes between the matching blocks. Here, we present an extension to PBWT, called bi-directional PBWT, that allows the information about the blocks of matches to be present at both sides of each site. We also present a set of algorithms to efficiently merge the matching blocks or examine the patterns of changes on both sides of each site. The time complexity of the algorithms to find and merge matching blocks using bi-directional PBWT is linear in the input size. Using real data from the UK Biobank, we demonstrate the run time and memory efficiency of our algorithms. More importantly, our algorithms can identify more blocks by enabling tolerance of mismatches. Moreover, by using mutual information (MI) between the forward and the reverse PBWT matching block sets as a measure of haplotype consistency, we found the MI derived from European samples in the 1000 Genomes Project is highly correlated (Spearman correlation r=0.87) with the deCODE recombination map.
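To make the left-to-right sweep mentioned in this abstract concrete, here is a small sketch of the positional prefix array update that underlies PBWT, run once forward and once on the reversed panel. It covers only this ordering step under simplifying assumptions (biallelic 0/1 data, a panel stored as lists of integers) and does not attempt the block matching or merging algorithms of the paper. Both function names are hypothetical.

```python
def pbwt_prefix_arrays(panel):
    """Return the haplotype ordering before each site and after the last.

    `panel` is a list of equal-length lists of 0/1 alleles. Entry k of the
    result lists haplotype indices sorted by their reversed prefix ending
    just before site k.
    """
    num_sites = len(panel[0]) if panel else 0
    order = list(range(len(panel)))
    arrays = [order[:]]
    for k in range(num_sites):
        # Stable partition by the allele at site k keeps earlier sites
        # as lower-priority sort keys.
        zeros = [h for h in order if panel[h][k] == 0]
        ones = [h for h in order if panel[h][k] == 1]
        order = zeros + ones
        arrays.append(order[:])
    return arrays


def bidirectional_prefix_arrays(panel):
    """Forward sweep plus a sweep over the site-reversed panel."""
    forward = pbwt_prefix_arrays(panel)
    # Entry k of `reverse` refers to the last k sites of the original panel.
    reverse = pbwt_prefix_arrays([row[::-1] for row in panel])
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = bidirectional_prefix_arrays([[0, 1], [1, 0], [0, 0]])
    print(fwd)
    print(rev)
```

Each sweep touches every allele a constant number of times, so the ordering step is linear in the size of the panel. Haplotypes that are adjacent in an ordering share the longest matches ending at that site, which is what makes block detection efficient.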

Dagstuhl Research Online Publication Server

Potential of tropical maize populations for improving an elite maize hybrid

Author: Li Mingshun
Li Xinhai
Liu Zhipeng
Wang Jianjun
Yong Hongjun
Zhang Degui
Zhang Shihuang
Publication venue: Maydica
Publication date: 19/09/2012

Identifying exotic maize (Zea mays L.) populations possessing favorable new alleles lacking in local elite hybrids is an important strategy for improving maize hybrids. Selection of an appropriate breeding method will increase the chance of successfully transferring these favorable new alleles into elite inbred lines of local hybrids. The objectives of this study were to: (i) evaluate 14 maize populations from CIMMYT and identify those containing favorable alleles for grain yield, ear length, ear diameter, kernel length, plant height, and ear height that are lacking in a local super hybrid [Jidan261 (W9706 × Ji853)], and to (ii) determine which inbred parent should be improved. These results showed that the populations Pob43, Pob501, and La Posta had positive and significant numbers of favorable alleles not found in hybrid W9706 × Ji853 that could be used for simultaneous improvement of its grain yield, ear length, and kernel length, and that population QPM-Y was also a good donor for improvement of ear diameter and kernel length in the hybrid. Based on allele frequencies in the two inbred lines and the donor population, when the populations Pob43, La Posta, Pob501, and QPM-Y were used as donors, inbred line W9706 would be improved by selfing the F1 of the cross W9706 × donor population. These results suggested that CIMMYT germplasm has potential to improve temperate elite hybrids. The relationship between GCA and SCA from a previous study and the parameters obtained from the Dudley method are discussed. The results showed that the values of Lplμ’ estimates obtained by applying the Dudley method had the same trend as GCA effects for grain yield but a less clear trend for ear length, while the trends in the relationship value were reversed for SCA between these populations and Lancaster-derived lines.

CREA Journals (Consiglio per la ricerca in agricoltura e l’analisi dell’economia agraria)

A hidden markov model for haplotype inference for present-absent data of clustered genes using identified haplotypes and haplotype patterns

Author: Chen Guo-Bo
Liu Nianjun
Wu Jihua
Zhang Kui
Zhi Degui
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2014

The majority of killer cell immunoglobulin-like receptor (KIR) genes are detected as either present or absent using locus-specific genotyping technology. Ambiguity arises from the presence of a specific KIR gene since the exact copy number (one or two) of that gene is unknown. Therefore, haplotype inference for these genes is becoming more challenging due to such a large portion of missing information. Meanwhile, many haplotypes and partial haplotype patterns have been previously identified due to tight linkage disequilibrium (LD) among these clustered genes and can thus be incorporated to facilitate haplotype inference. In this paper, we developed a hidden Markov model (HMM) based method that can incorporate identified haplotypes or partial haplotype patterns for haplotype inference from present-absent data of clustered genes (e.g., KIR genes). We compared its performance with a previously developed expectation maximization (EM) based method in terms of haplotype assignments and haplotype frequency estimation through extensive simulations for KIR genes. The simulation results showed that the new HMM based method outperformed the previous method when some incorrect haplotypes were included as identified haplotypes and/or the standard deviation of haplotype frequencies was small. We also compared the performance of our method with two methods that do not use previously identified haplotypes and haplotype patterns, including an EM based method, HAPLORE, and an HMM based method, MaCH. Our simulation results showed that the incorporation of identified haplotypes and partial haplotype patterns can improve accuracy for haplotype inference. The new software package HaploHMM is available and can be downloaded at http://www.soph.uab.edu/ssg/files/People/KZhang/HaploHMM/haplohmm-index.html

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

University of Queensland eSpace

Inhibition of glycolytic enzyme hexokinase II (HK2) suppresses lung tumor growth

Author: Degui Lin
Huanan Wang
Ji Wang
Lei Wang
Yibin Deng
Yingjie Zhang
Publication venue: Springer Nature
Publication date: 01/01/2016

Springer - Publisher Connector

Model selection and structure specification in ultra-high dimensional generalised semi-varying coefficient models

Author: Ke Yuan
Li Degui
Zhang Wenyang
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 29/10/2015

In this paper, we study the model selection and structure specification for the generalised semi-varying coefficient models (GSVCMs), where the number of potential covariates is allowed to be larger than the sample size. We first propose a penalised likelihood method with the LASSO penalty function to obtain the preliminary estimates of the functional coefficients. Then, using the quadratic approximation for the local log-likelihood function and the adaptive group LASSO penalty (or the local linear approximation of the group SCAD penalty) with the help of the preliminary estimation of the functional coefficients, we introduce a novel penalised weighted least squares procedure to select the significant covariates and identify the constant coefficients among the coefficients of the selected covariates, which could thus specify the semiparametric modelling structure. The developed model selection and structure specification approach not only inherits many nice statistical properties from the local maximum likelihood estimation and nonconcave penalised likelihood method, but is also computationally attractive thanks to the computational algorithm that is proposed to implement our method. Under some mild conditions, we establish the asymptotic properties for the proposed model selection and estimation procedure such as the sparsity and oracle property. We also conduct simulation studies to examine the finite sample performance of the proposed method, and finally apply the method to analyse a real data set, which leads to some interesting findings.

arXiv.org e-Print Archive

Crossref

White Rose Research Online

Nonparametric Homogeneity Pursuit in Functional-Coefficient Models

Author: Chen Jia
Li Degui
Wei Lingling
Zhang Wenyang
Publication venue: 'Informa UK Limited'
Publication date: 01/12/2021

This paper explores homogeneity of coefficient functions in nonlinear models with functional coefficients and identifies the underlying semiparametric modelling structure. With initial kernel estimates, we combine the classic hierarchical clustering method with a generalised version of the information criterion to estimate the number of clusters, each of which has a common functional coefficient, and determine the membership of each cluster. To identify a possible semi-varying coefficient modelling framework, we further introduce a penalised local least squares method to determine zero coefficients, non-zero constant coefficients and functional coefficients which vary with an index variable. Through the nonparametric kernel-based cluster analysis and the penalised approach, we can substantially reduce the number of unknown parametric and nonparametric components in the models, thereby achieving the aim of dimension reduction. Under some regularity conditions, we establish the asymptotic properties for the proposed methods including the consistency of the homogeneity pursuit. Numerical studies, including Monte-Carlo experiments and two empirical applications, are given to demonstrate the finite-sample performance of our methods

White Rose Research Online

Erratum to: Inhibition of glycolytic enzyme hexokinase II (HK2) suppresses lung tumor growth

Author: Degui Lin
H Wang
Huanan Wang
Ji Wang
Lei Wang
Yibin Deng
Yingjie Zhang
Publication venue: 'Springer Science and Business Media LLC'

Crossref