Conservation and implications of eukaryote transcriptional regulatory regions across multiple species

Author: Deng Minghua
Fu Wenjiang J
Li Dayong
Liu Xue
Qian Minping
Sun Fengzhu
Wan Lin
Zhang Donglei
Zhu Lihuang
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Increasing evidence shows that eukaryotic genomes are almost entirely transcribed, yielding both protein-coding transcripts and an enormous number of non-protein-coding RNAs (ncRNAs). Revealing the regulatory mechanisms underlying these transcripts is therefore imperative. A complete understanding of transcriptional regulation, however, requires identifying the regions in which those mechanisms operate. We call these transcriptional regulation regions (TRRs): functional regions containing a cluster of regulatory elements that cooperatively recruit transcription factors, which bind and then regulate the expression of transcripts. Results: We constructed a hierarchical stochastic language (HSL) model for identifying core TRRs in yeast, based on regulatory cooperation among TRR elements. The HSL model trained on yeast achieved comparable accuracy in predicting TRRs in other species (e.g., fruit fly, human, and rice), demonstrating the conservation of TRRs across species. The HSL model was also used to identify the TRRs of genes such as p53 and OsALYL1, as well as those of microRNAs. In addition, the ENCODE regions were examined with HSL, and TRRs were found to be pervasive across the genome. Conclusion: Our findings indicate that 1) the HSL model can accurately predict core TRRs of transcripts across species and 2) core TRRs identified by HSL are suitable candidates for further scrutiny of specific regulatory elements and mechanisms. The regulatory activity associated with the abundant ncRNAs might account for the ubiquitous presence of TRRs across the genome. We also found that the TRRs of protein-coding genes and ncRNAs are similar in structure, with the latter being more conserved than the former.

Springer - Publisher Connector

Harvard University - DASH

Recommended from our members

DiseaseConnect: a comprehensive web server for mechanism-based disease–disease connections

Author: Chaudhary Preet M.
Chen Jeremy J. W.
Crandall Edward
Li Wenyuan
Liu Chun-Chi
Loscalzo Joseph
Mayzus Ilya
Rzhetsky Andrey
Sun Fengzhu
Tseng Yu-Ting
Waterman Michael
Wu Chia-Yu
Zhou Xianghong Jasmine
Publication venue: Oxford University Press (OUP)
Publication date: 03/06/2014

DiseaseConnect (http://disease-connect.org) is a web server for analyzing and visualizing comprehensive knowledge of mechanism-based disease connectivity. The traditional disease classification system groups diseases by similar clinical symptoms and phenotypic traits. As a result, diseases with entirely different pathologies can be grouped together and end up with similar treatment designs. Such problems could be avoided if diseases were classified by their molecular mechanisms. Connecting diseases with similar pathological mechanisms could also inspire novel strategies for effectively repositioning existing drugs and therapies. Although several studies have attempted to generate disease connectivity networks, they have not yet used the enormous and rapidly growing public repositories of disease-related omics data and literature, two primary resources capable of providing insights into disease connections at an unprecedented level of detail. DiseaseConnect, the first public web server to do so, integrates comprehensive omics and literature data, including a large amount of gene expression data, the Genome-Wide Association Studies catalog, and text-mined knowledge, to discover disease–disease connectivity via common molecular mechanisms. In addition, clinical comorbidity data and a comprehensive compilation of known drug–disease relationships are used to advance the understanding of the disease landscape and to facilitate the mechanism-based development of new drug treatments.

National Chung Hsing University Institutional Repository

Public Library of Science (PLOS)

Accurate Genome Relative Abundance Estimation Based on Shotgun Metagenomic Reads

Author: A Brady
AC McHardy
B Beszteri
B Langmead
DB Rusch
DC Richter
DH Huson
DR Kelley
EJ Biers
Emmanuel Dias-Neto
FE Angly
Fengzhu Sun
GL Rosen
GW Tyson
H Li
H Teeling
J Peterson
J Qin
Jacob A. Cram
JC Venter
Jed A. Fuhrman
JL Morgan
JS Liu
K Kurokawa
K Liolios
K Mavromatis
KE Nelson
Li C. Xia
M Monzoorul Haque
NN Diaz
PA Vaishampayan
PJ Turnbaugh
R Sandberg
R Stepanauskas
RJ Case
RM Engeman
S Chatterji
SF Altschul
SR Gill
T Woyke
Ting Chen
VM Markowitz
Y Chen
YW Wu
Publication venue: Public Library of Science
Publication date: 06/12/2011

Accurate estimation of microbial community composition from metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods mainly summarize alignment results directly, or use variants of that approach, and often yield biased and/or unstable estimates. We have developed a unified probabilistic framework, named GRAMMy, that explicitly models read assignment ambiguities, genome size biases, and read distributions along the genomes. A maximum likelihood method is employed to compute the Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the datasets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance of Bacteroides species in human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment, or composition-based), even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation as read sets grow and the database of reference genomes expands.
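The mixture-model idea in this abstract can be illustrated with a small expectation-maximization loop. The sketch below is not GRAMMy's implementation. It assumes that each read is given as the set of genomes it maps to, that reads are drawn uniformly along a genome (so the per-read likelihood is proportional to 1 / genome length), and that every read has at least one hit. The function name `estimate_abundance` and its parameters are hypothetical.

```python
def estimate_abundance(hits, genome_lengths, n_iter=200, tol=1e-10):
    """EM for a read-to-genome mixture model (illustrative sketch only).

    hits[i] is the set of genome indices that read i maps to (assumed non-empty).
    Returns (pi, abundance): the mixing proportions of reads and the
    genome-size-corrected relative abundances.
    """
    n_genomes = len(genome_lengths)
    pi = [1.0 / n_genomes] * n_genomes  # start from uniform read proportions
    for _ in range(n_iter):
        counts = [0.0] * n_genomes
        # E-step: split each ambiguous read across its candidate genomes.
        for read_hits in hits:
            weights = {j: pi[j] / genome_lengths[j] for j in read_hits}
            total = sum(weights.values())
            for j, w in weights.items():
                counts[j] += w / total
        # M-step: the new mixing proportions are the expected read fractions.
        new_pi = [c / len(hits) for c in counts]
        delta = max(abs(a - b) for a, b in zip(pi, new_pi))
        pi = new_pi
        if delta < tol:
            break
    # Longer genomes attract more reads, so divide by length and renormalize.
    per_base = [p / length for p, length in zip(pi, genome_lengths)]
    norm = sum(per_base)
    return pi, [x / norm for x in per_base]
```

With 10 reads unique to a 1000-bp genome and 20 reads unique to a 2000-bp genome, the read proportions are 1/3 and 2/3, while the size-corrected abundances come out equal, which is the genome size bias the abstract refers to.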

A model-based approach to selection of tag SNPs

Author: A Barron
A Thomas
AP Dempster
B Halldórsson
BV Halldórsson
CE Shannon
CS Carlson
D Botstein
DC Crawford
EC Anderson
Fengzhu Sun
G Schwarz
GA McVean
H Akaike
H Mannila
J Besag
JD Wall
JFC Kingman
JN Hirschhorn
K Zhang
L Breiman
L Excoffier
L Li
LE Baum
Lei M Li
LR Rabiner
M Koivisto
M Nothnagel
M Stephens
MJ Daly
N Li
N Patil
Pierre Nicolas
S Lin
SB Gabriel
SE Ptak
T Niu
TG Schulze
The International HapMap Consortium
TM Cover
W Zhai
X Ke
X Sun
Z Liu
Z Meng
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphism found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to a constraint on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can account for LD structure more accurately. It also provides machinery for predicting tagged SNPs, and thereby for assessing the performance of tag sets through their ability to predict larger SNP sets. RESULTS: Here, we compute the description code-lengths of SNP data for an array of models, and we develop tag SNP selection methods based on these models and the strategy of entropy maximization. Using data sets from the HapMap and ENCODE projects, we show that the hidden Markov model introduced by Li and Stephens outperforms the other models in several respects: description code-length of SNP data, information content of tag sets, and prediction of tagged SNPs. This is the first use of this model in the context of tag SNP selection. CONCLUSION: Our study provides strong evidence that the tag sets selected by our best method, based on the Li and Stephens model, outperform those chosen by several existing methods. The results also suggest that information content evaluated with a good model is more sensitive for assessing the quality of a tagging set than the correct prediction rate of tagged SNPs. In addition, we show that haplotype phase uncertainty has an almost negligible impact on the ability of good tag sets to predict tagged SNPs. This justifies selecting tag SNPs on the basis of haplotype informativeness, even though genotyping studies do not directly assess haplotypes. Software that implements our approach is available.
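The entropy-maximization strategy described above can be sketched as a greedy forward selection. This is only an illustration under simplifying assumptions: it scores candidate tag sets by the empirical joint entropy of phased haplotypes, not by the Li and Stephens hidden Markov model that the abstract reports as the best-performing model, and the function names `joint_entropy` and `greedy_tag_snps` are hypothetical.

```python
from collections import Counter
from math import log2


def joint_entropy(haplotypes, snps):
    """Empirical joint entropy (in bits) of the haplotypes restricted to `snps`."""
    counts = Counter(tuple(h[s] for s in snps) for h in haplotypes)
    n = len(haplotypes)
    return -sum(c / n * log2(c / n) for c in counts.values())


def greedy_tag_snps(haplotypes, k):
    """Greedily pick up to k SNP indices that maximize joint entropy.

    haplotypes: equal-length sequences of alleles (e.g. strings of '0'/'1').
    At each step, add the SNP whose inclusion yields the largest joint entropy;
    ties go to the lowest index.
    """
    n_snps = len(haplotypes[0])
    selected = []
    for _ in range(min(k, n_snps)):
        best = max(
            (s for s in range(n_snps) if s not in selected),
            key=lambda s: joint_entropy(haplotypes, selected + [s]),
        )
        selected.append(best)
    return selected
```

On the four haplotypes `"0000"`, `"0011"`, `"1100"`, `"1111"`, SNPs 0 and 1 are in perfect LD, as are SNPs 2 and 3, so a two-SNP tag set takes one SNP from each pair and a redundant second SNP adds no entropy. Greedy selection is a heuristic and is not guaranteed to find the globally optimal tag set.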

Springer - Publisher Connector

HAL Descartes

Hal-Diderot

Novel compound heterozygous mutations in the OTOF Gene identified by whole-exome sequencing in auditory neuropathy spectrum disorder

Author: BY Choi
CC Morton
D Duman
Dengke Ma
Fei Liu
Fengzhu Tang
I Rouillon
I Roux
Jianping Liang
L Yuan
Liang Xu
M Choi
M Rodriguez-Ballesteros
M Shizuka
Min Liu
Min Shi
N Hilgert
N Mahdieh
O Diaz-Horta
Qingqing Wang
Qiutian Lu
QJ Wang
QJ Zhang
R Varga
RA Reynoso
S Delmaghani
S Liu
S Yasunaga
TB Kim
V Migliosi
VK Manchaiah
WM Roberts
X Cheng
Y Iwasa
Y Lu
Yuecai Qiu
Yulan Wang
Publication venue: Springer Science and Business Media LLC

Method for stereo mapping based on ObjectARX and pipeline technology

Author: Chen Tianen
Lin Zongjian
Liu Fengzhu
Publication date: 04/08/2011

Stereo mapping is an important way to acquire 4D products. Drawing on the development of stereo mapping and the characteristics of ObjectARX and pipeline technology, a new stereo mapping scheme is proposed that uses ObjectARX and pipeline technology to enable interaction between AutoCAD and a digital photogrammetry system. An experiment with the software MAP-AT (Modern Aerial Photogrammetry Automatic Triangulation) was carried out to verify feasibility. The results show that the scheme is feasible and that it is significant for integrating acquisition and editing.

Scipedia

Sparse generalized linear model with L0 approximation for feature selection and prediction with big omics data

Author: Dermot P. McGovern
Fengzhu Sun
Zhenqiu Liu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2017

Background: Feature selection and prediction are the most important tasks in big data mining. The common strategies for feature selection in big data mining are L1, SCAD, and MC+. However, none of the existing algorithms optimizes L0, which penalizes the number of nonzero features directly. Results: In this paper, we develop a novel sparse generalized linear model (GLM) with L0 approximation for feature selection and prediction with big omics data. The proposed approach approximates the L0 optimization directly. Even though the original L0 problem is non-convex, the proposed algorithm approximates it by a sequence of convex optimizations. The method is easy to implement, requiring only several lines of code. Novel adaptive ridge algorithms (L0ADRIDGE) are developed for the L0-penalized GLM with ultra-high-dimensional big data. The proposed approach outperforms other cutting-edge regularization methods, including SCAD and MC+, in simulations. When applied to the integrated analysis of mRNA, microRNA, and methylation data from TCGA ovarian cancer, it simultaneously identifies multilevel gene signatures associated with suboptimal debulking. The biological significance and potential clinical importance of those genes are further explored. Conclusions: The software L0ADRIDGE, developed in MATLAB, is available at https://github.com/liuzqx/L0adridge
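The idea of approximating a non-convex L0 penalty by a sequence of convex problems can be sketched with a generic adaptive-ridge reweighting scheme. This is not the L0ADRIDGE code from the repository above. It assumes plain linear regression (the Gaussian special case of a GLM), a small dense design matrix, and a penalty weight of 1 / (beta_j^2 + eps) that is updated after each ridge solve. The names `solve` and `l0_adaptive_ridge`, and all parameter defaults, are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def l0_adaptive_ridge(X, y, lam=1.0, n_iter=50, eps=1e-8, threshold=1e-4):
    """Approximate L0-penalized least squares by iteratively reweighted ridge.

    Each iteration solves the convex problem
        min ||y - X beta||^2 + lam * sum_j w_j * beta_j^2
    and then sets w_j = 1 / (beta_j^2 + eps), so that w_j * beta_j^2 is close
    to 1 for nonzero coefficients and close to 0 for vanishing ones.
    """
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    weights = [1.0] * p
    beta = [0.0] * p
    for _ in range(n_iter):
        a = [[xtx[i][j] + (lam * weights[i] if i == j else 0.0)
              for j in range(p)] for i in range(p)]
        beta = solve(a, xty)  # weighted ridge solution
        weights = [1.0 / (b * b + eps) for b in beta]
    # Coefficients driven toward zero are reported as exactly zero.
    return [b if abs(b) > threshold else 0.0 for b in beta]
```

In this sketch, coefficients whose contribution to the fit is small relative to `lam` receive ever larger ridge weights and collapse toward zero, while strong coefficients are only mildly shrunk. A real high-dimensional application would need a solver that avoids forming and factorizing the full p-by-p matrix.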