
45 research outputs found

Application of Data Pipelining Technology in Cheminformatics and Bioinformatics

Author: Mao Linyong
Publication date: 01/12/2002

Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Master of Sciences in the School of Informatics, Indiana University, December 2002. Data pipelining is the processing, analysis, and mining of large volumes of data through a branching network of computational steps. A data pipelining system consists of a collection of modular computational components and a network for streaming data between them. By defining a logical path for data through a network of computational components and configuring each component accordingly, a user can create a protocol to perform virtually any desired function with data and extract knowledge from them. A set of data pipelines was constructed to explore the relationship between the biodegradability and structural properties of halogenated aliphatic compounds in a data set in which each compound has one degradation rate and nine structure-derived properties. After training, the data pipeline was able to calculate the degradation rates of new compounds with relatively high accuracy. A second set of data pipelines was generated to cluster new DNA sequences. The data pipelining technology was applied to identify a core sequence to represent a DNA cluster and construct the 95% confidence distance interval for the cluster. The result shows that 74% of the DNA sequences were correctly clustered and there was no false clustering.

IUPUIScholarWorks

Genome-wide analysis of Dongxiang wild rice (Oryza rufipogon Griff.) to investigate lost/acquired genes during rice domestication

Author: Fantao Zhang
Jiankun Xie
Linyong Mao
Rui Chen
Shan Gao
Shuangyong Yan
Tao Xu
Xiangdong Luo
Xiwen Chen
Zhenfeng Wu
Publication venue: Springer Nature
Publication date: 01/01/2016

This file reports the functional annotation of 99,092 DXWR transcripts from the NCBI NR database using the software blast2go. This file is in tab-delimited format and can be opened using the software Excel. (TXT 12649 kb)

Springer - Publisher Connector

FigShare

Combining comparative genomics with de novo motif discovery to identify human transcription factor DNA-binding motifs

Author: A Prakash
CT Harbison
G Pavesi
J Gertz
J Hu
Linyong Mao
M Blanchette
M Kellis
M Tompa
S Aerts
S Sinha
SR Eddy
T Wang
W Jim Zheng
WW Wasserman
X Li
X Xie
Y Liu
Publication venue: BioMed Central
Publication date: 12/12/2006

BACKGROUND: As more and more genomes are sequenced, comparative genomics approaches provide a methodology for identifying conserved regulatory elements that may be involved in gene regulation. RESULTS: We developed a novel method to combine comparative genomics with de novo motif discovery to identify human transcription factor binding motifs that are overrepresented and conserved in the upstream regions of a set of co-regulated genes. The method is validated by analyzing a well-characterized muscle-specific gene set, and the results showed that our approach performed better than the existing programs in terms of sensitivity and prediction rate. CONCLUSION: The newly developed method can be used to extract regulatory signals in co-regulated genes, which can be derived from microarray clustering analysis.

Crossref

Springer - Publisher Connector

PubMed Central

Genome-Wide Identification and Analysis of Grape Aldehyde Dehydrogenase (ALDH) Gene Superfamily

Author: A Marchler-Bauer
AJ Wood
AP Stines
B Jackson
C Brocker
C Li
Chad Brocker
CX Gao
D Bartels
D Huang
D Rambaldi
E Lyons
EV Koonin
F Liu
G Szekely
G Witz
GA Tuskan
GJ Kelly
GK Smyth
H Esterbauer
H Parkinson
H Tang
HH Kirch
Hua Wang
I Searle
JC Jimenez-Lopez
JK Zhu
K Oguchi
K Tamura
KS Ling
L Gautier
Linyong Mao
M Fujita
M García-Ríos
M Ishitani
M Riera
M Su
Miguel A. Blazquez
N Stiti
N Strizhov
NA Sophos
O Jaillon
P Horton
P Rice
R Chenna
R Sunkar
R Velasco
R Wise
S Cannon
S Kotchoni
S Lehmann
S Zenoni
SA Goff
SA Marchitti
SO Kotchoni
V Chinnusamy
V Vasiliou
Vasilis Vasiliou
W Jakobi
W Wang
WJ Black
X Wang
Xiangjing Yin
Xiping Wang
Y Benjamini
Yucheng Zhang
Z Wu
Zhangjun Fei
Publication venue: Public Library of Science
Publication date: 15/02/2012

The completion of the grape genome sequencing project has paved the way for novel gene discovery and functional analysis. Aldehyde dehydrogenases (ALDHs) comprise a gene superfamily encoding NAD(P)(+)-dependent enzymes that catalyze the irreversible oxidation of a wide range of endogenous and exogenous aromatic and aliphatic aldehydes. Although ALDHs have been systematically investigated in several plant species including Arabidopsis and rice, our knowledge concerning the ALDH genes, their evolutionary relationship and expression patterns in grape has been limited. A total of 23 ALDH genes were identified in the grape genome and grouped into ten families according to the unified nomenclature system developed by the ALDH Gene Nomenclature Committee (AGNC). Members within the same grape ALDH families possess nearly identical exon-intron structures. Evolutionary analysis indicates that both segmental and tandem duplication events have contributed significantly to the expansion of grape ALDH genes. Phylogenetic analysis of ALDH protein sequences from seven plant species indicates that grape ALDHs are more closely related to those of Arabidopsis. In addition, synteny analysis between grape and Arabidopsis shows that homologs of a number of grape ALDHs are found in the corresponding syntenic blocks of Arabidopsis, suggesting that these genes arose before the speciation of grape and Arabidopsis. Microarray gene expression analysis revealed a large number of grape ALDH genes responsive to drought or salt stress. Furthermore, we found that a number of ALDH genes showed significantly changed expression in response to infection with different pathogens and during grape berry development, suggesting novel roles of ALDH genes in plant-pathogen interactions and berry development. The genome-wide identification, evolutionary and expression analysis of grape ALDH genes should facilitate research in this gene family and provide new insights regarding their evolutionary history and functional roles in plant stress tolerance.

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Arabidopsis gene co-expression network and its functional modules

Author: Dash Sudhansu
Dickerson Julie
Mao Linyong
Van Hemert John
Publication date: 21/10/2009

Background: Biological networks characterize the interactions of biomolecules at a systems level. One important property of biological networks is their modular structure: nodes within a module are densely connected with each other, while connections between modules are sparse. In this report, we attempted to find the relationship between the network topology and formation of modular structure by comparing gene co-expression networks with random networks. The organization of gene functional modules was also investigated. Results: We constructed a genome-wide Arabidopsis gene co-expression network (AGCN) by using 1094 microarrays. We then analyzed the topological properties of AGCN and partitioned the network into modules by using an efficient graph clustering algorithm. In the AGCN, 382 hub genes formed a clique, and they were densely connected only to a small subset of the network. At the module level, the network clustering results provide a systems-level understanding of the gene modules that coordinate multiple biological processes to carry out specific biological functions. For instance, the photosynthesis module in AGCN involves a very large number (> 1000) of genes which participate in various biological processes including photosynthesis, electron transport, pigment metabolism, chloroplast organization and biogenesis, cofactor metabolism, protein biosynthesis, and vitamin metabolism. The cell cycle module orchestrated the coordinated expression of hundreds of genes involved in cell cycle, DNA metabolism, and cytoskeleton organization and biogenesis. We also compared the AGCN constructed in this study with a graphical Gaussian model (GGM) based Arabidopsis gene network. The photosynthesis, protein biosynthesis, and cell cycle modules identified from the GGM network were much smaller than the corresponding modules found in the AGCN. Conclusion: This study reveals new insight into the topological properties of biological networks. The preferential hub-hub connections might be necessary for the formation of modular structure in gene co-expression networks. The study also reveals new insight into the organization of gene functional modules. This article is from BMC Bioinformatics 10 (2009): 346. Posted with permission.
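
The network construction and hub analysis summarized in this abstract can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the edge cutoff, or the hub definition, so Pearson correlation, the 0.9 threshold, the degree-based hub rule, and all function names and toy expression values below are assumptions made for illustration only; they are not the authors' method.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def build_network(expr, threshold=0.9):
    """Link two genes when |r| meets the (assumed) threshold."""
    adj = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def hubs(adj, min_degree):
    """Genes with at least min_degree neighbours (assumed hub rule)."""
    return sorted(g for g, nbrs in adj.items() if len(nbrs) >= min_degree)


def is_clique(adj, genes):
    """True when every pair of the given genes is directly connected."""
    return all(b in adj[a] for a, b in combinations(genes, 2))


# Hypothetical toy data: four genes measured on four arrays.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [1, 2, 3, 5],
    "g4": [4, 1, 3, 2],
}
net = build_network(expr)
hub_genes = hubs(net, min_degree=2)
print(hub_genes, is_clique(net, hub_genes))
```

In this toy data the three mutually correlated genes are the hubs and form a clique, while the fourth gene stays unconnected. A real analysis over thousands of genes and 1094 arrays would need a vectorized correlation computation rather than this pairwise loop.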

Digital Repository @ Iowa State University (ISU)

Population differentiation in allele frequencies of obesity-associated SNPs

Author: Linyong Mao
Michael Campbell
William M. Southerland
Yayin Fang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/11/2017

Background: Obesity is emerging as a global health problem, with more than one-third of the world’s adult population being overweight or obese. In this study, we investigated worldwide population differentiation in allele frequencies of obesity-associated SNPs (single nucleotide polymorphisms). Results: We collected a total of 225 obesity-associated SNPs from a public database. Their population-level allele frequencies were derived from the genotype data of the 1000 Genomes Project (phase 3). We used a hypergeometric model to assess whether the effect allele at a given SNP is significantly enriched or depleted in each of the 26 populations surveyed in the 1000 Genomes Project with respect to the overall pooled population. Our results indicate that 195 out of 225 SNPs (86.7%) possess effect alleles significantly enriched or depleted in at least one of the 26 populations. Populations within the same continental group exhibit similar allele enrichment/depletion patterns whereas inter-continental populations show distinct patterns. Among the 225 SNPs, 15 SNPs cluster in the first intron region of the FTO gene, which is a major gene associated with body-mass index (BMI) and fat mass. African populations exhibit much smaller blocks of LD (linkage disequilibrium) among these 15 SNPs while European and Asian populations have larger blocks. To estimate the cumulative effect of all variants associated with obesity, we developed the personal composite genetic risk score for obesity. Our results indicate that the East Asian populations have the lowest averages of the composite risk scores, whereas three European populations have the highest averages. In addition, the population-level average of composite genetic risk scores is significantly correlated (R² = 0.35, P = 0.0060) with obesity prevalence. Conclusions: We have detected substantial population differentiation in allele frequencies of obesity-associated SNPs. The results will help elucidate the genetic basis which may contribute to population disparities in obesity prevalence.
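
The hypergeometric enrichment/depletion test mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it treats one population's allele copies as a draw without replacement from the pooled allele copies, the function names are hypothetical, and the example counts are invented (only the pooled total of 5008 allele copies, from the 2504 phase 3 individuals, is a known figure).

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): k effect alleles among n copies drawn without
    replacement from N pooled copies, K of which are effect alleles."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p(k, N, K, n):
    """Upper-tail probability P(X >= k); small values suggest enrichment."""
    upper = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))


def depletion_p(k, N, K, n):
    """Lower-tail probability P(X <= k); small values suggest depletion."""
    lower = max(0, n - (N - K))
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lower, k + 1))


# Hypothetical counts for one SNP in one population (assumed values).
N = 5008   # pooled allele copies (2504 individuals x 2)
K = 1500   # effect-allele copies in the pooled sample (invented)
n = 200    # allele copies in the population of interest (invented)
k = 90     # effect-allele copies observed in that population (invented)

print(enrichment_p(k, N, K, n), depletion_p(k, N, K, n))
```

Because the test is repeated across 225 SNPs and 26 populations, a multiple-testing correction would normally be applied to the resulting p-values; the abstract does not say which one was used.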

Directory of Open Access Journals

Arabidopsis gene co-expression network and its functional modules

Author: Dash Sudhansu
Dickerson Julie A
Mao Linyong
Van Hemert John L
Publication venue: Springer Science and Business Media LLC
Publication date: 01/10/2009

Background: Biological networks characterize the interactions of biomolecules at a systems level. One important property of biological networks is their modular structure: nodes within a module are densely connected with each other, while connections between modules are sparse. In this report, we attempted to find the relationship between the network topology and formation of modular structure by comparing gene co-expression networks with random networks. The organization of gene functional modules was also investigated. Results: We constructed a genome-wide Arabidopsis gene co-expression network (AGCN) by using 1094 microarrays. We then analyzed the topological properties of AGCN and partitioned the network into modules by using an efficient graph clustering algorithm. In the AGCN, 382 hub genes formed a clique, and they were densely connected only to a small subset of the network. At the module level, the network clustering results provide a systems-level understanding of the gene modules that coordinate multiple biological processes to carry out specific biological functions. For instance, the photosynthesis module in AGCN involves a very large number (> 1000) of genes which participate in various biological processes including photosynthesis, electron transport, pigment metabolism, chloroplast organization and biogenesis, cofactor metabolism, protein biosynthesis, and vitamin metabolism. The cell cycle module orchestrated the coordinated expression of hundreds of genes involved in cell cycle, DNA metabolism, and cytoskeleton organization and biogenesis. We also compared the AGCN constructed in this study with a graphical Gaussian model (GGM) based Arabidopsis gene network. The photosynthesis, protein biosynthesis, and cell cycle modules identified from the GGM network were much smaller than the corresponding modules found in the AGCN. Conclusion: This study reveals new insight into the topological properties of biological networks. The preferential hub-hub connections might be necessary for the formation of modular structure in gene co-expression networks. The study also reveals new insight into the organization of gene functional modules.

Directory of Open Access Journals

Arabidopsis gene co-expression network and its functional modules

Author: John L Van Hemert
Julie A Dickerson
Linyong Mao
Sudhansu Dash
Publication date: 01/01/2009

Digital Repository @ Iowa State University (ISU)

CiteSeerX

Springer - Publisher Connector

PubMed Central

Additional file 1: Table S1. of Population differentiation in allele frequencies of obesity-associated SNPs

Author: Linyong Mao
Michael Campbell
William Southerland
Yayin Fang

Effect allele frequencies in 26 populations for obesity SNPs. The table lists 225 obesity-associated SNPs and their effect allele frequencies in 26 populations surveyed in the 1000 Genomes Project. (XLSX 229 kb)

FigShare

Additional file 2: Table S2. of Population differentiation in allele frequencies of obesity-associated SNPs

Author: Linyong Mao
Michael Campbell
William Southerland
Yayin Fang

GWA studies of obesity. The table lists 29 GWA studies of obesity, the major ethnic group in each GWA study, and their references. (DOCX 70 kb)

FigShare