61 research outputs found

    Latent Semantic Indexing of PubMed abstracts for identification of transcription factor candidates from microarray derived gene sets

    Background: Identification of transcription factors (TFs) responsible for modulation of differentially expressed genes is a key step in deducing gene regulatory pathways. Most current methods identify TFs by searching for the presence of DNA-binding motifs in the promoter regions of co-regulated genes. However, this strategy may not always be useful, as the presence of a motif does not necessarily imply a regulatory role; conversely, motif presence may not be required for a TF to regulate a set of genes. Therefore, it is imperative to include functional (biochemical and molecular) associations, such as those found in the biomedical literature, in algorithms for identification of putative regulatory TFs that might be explicitly or implicitly linked to the genes under investigation.
    Results: In this study, we present a Latent Semantic Indexing (LSI) based text-mining approach for identification and ranking of putative regulatory TFs from microarray-derived differentially expressed genes (DEGs). Two LSI models were built using different term-weighting schemes to derive pair-wise similarities between 21,027 mouse genes annotated in the Entrez Gene repository. Among these genes, 433 are designated TFs in the TRANSFAC database. The LSI-derived TF-to-gene similarities were used to calculate TF literature-enrichment p-values and to rank the TFs for a given set of genes. We evaluated our approach using five publicly available microarray datasets focusing on the TFs Rel, Stat6, Ddit3, Stat5 and Nfic. In addition, for each dataset we constructed a gold standard of TFs known to be functionally relevant to the study in question. Receiver Operating Characteristic (ROC) curves showed that the log-entropy LSI model outperformed the tf-normal LSI model and a benchmark co-occurrence based method for four out of five datasets, and also outperformed motif-searching approaches, in identifying putative TFs.
    Conclusions: Our results suggest that our LSI-based text-mining approach can complement existing approaches used in systems biology research to decipher gene regulatory networks by providing ranked lists of putative TFs that might be explicitly or implicitly associated with sets of DEGs derived from microarray experiments. In addition, unlike motif-searching approaches, LSI-based approaches can reveal TFs that may indirectly regulate genes.
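    To make the log-entropy LSI step concrete, the sketch below shows the standard recipe the abstract refers to: weight a term-by-gene count matrix with a log local weight and an entropy global weight, take a rank-k truncated SVD, and compare gene vectors by cosine similarity. This is only an illustration, not the paper's code; the toy matrix, its dimensions, and the rank k are invented for the example.

    import numpy as np
    from scipy.sparse.linalg import svds

    # Hypothetical term-by-gene count matrix: rows = terms, columns = gene "documents"
    # (in the paper, each gene's document is built from its linked PubMed abstracts).
    counts = np.random.poisson(0.3, size=(5000, 200)).astype(float)

    # Log-entropy weighting: local weight log(1 + tf), global weight 1 + entropy term.
    p = counts / (counts.sum(axis=1, keepdims=True) + 1e-12)
    n_docs = counts.shape[1]
    logp = np.log(np.where(p > 0, p, 1.0))            # log(0) terms contribute 0
    entropy = 1.0 + (p * logp).sum(axis=1) / np.log(n_docs)
    weighted = np.log(counts + 1.0) * entropy[:, None]

    # LSI: rank-k truncated SVD of the weighted term-by-gene matrix.
    k = 100
    u, s, vt = svds(weighted, k=k)
    gene_vecs = (s[:, None] * vt).T                   # one k-dimensional vector per gene

    # Pairwise cosine similarities between genes (TF-to-gene similarities in the paper).
    unit = gene_vecs / (np.linalg.norm(gene_vecs, axis=1, keepdims=True) + 1e-12)
    sims = unit @ unit.T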

    Exploiting the bootstrap method to analyze patterns of gene expression


    A Method for Constructing Large DNA Codesets

    The word design problem is an important problem in DNA computing. The goal is to design a set of single-stranded DNA molecules that are structure-free and mutually non-crosshybridizing. We present an innovative method for constructing such large sets, so-called codesets, whose degree of structure-freeness and non-crosshybridization is controlled by a given parameter. Using a well-known, simplified model of hybridization affinity, the h-distance model, our method produces sets larger than those produced under similar models; for example, a similar (slightly less realistic) model can construct 108 8-mers, while our construction produces 256 8-mers under the same constraints on the strands' mutual hybridization affinity and GC-content. We further justify the claimed sizes of the produced sets by estimating their sizes relative to those of theoretically maximal sets. The existence of such large sets of structure-free, non-crosshybridizing strands is necessary for large-scale DNA computations and related approaches. Current methods that employ more realistic estimates of DNA hybridization affinity have been unable to produce sets large enough to encode more than several thousand, or even several hundred, parameters. It seems feasible to extend our methodology to include more realistic assumptions, such as making use of known thermodynamic parameters.
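    The abstract does not give the construction itself. As a point of reference only, the sketch below shows a naive greedy filter under an assumed h-distance definition (minimum mismatch count over relative shifts against the reverse complement, counting overhangs as mismatches) and an assumed GC band; neither the thresholds nor the greedy strategy come from the paper.

    from itertools import product

    COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def revcomp(s):
        """Watson-Crick reverse complement."""
        return "".join(COMP[b] for b in reversed(s))

    def h_distance(x, y):
        """Assumed h-distance: minimum mismatches between x and revcomp(y) over all
        shifts, with non-overlapping positions counted as mismatches."""
        n, yc, best = len(x), revcomp(y), len(x)
        for shift in range(-(len(x) - 1), len(x)):
            mismatches, overlap = 0, 0
            for i in range(n):
                j = i + shift
                if 0 <= j < n:
                    overlap += 1
                    mismatches += x[i] != yc[j]
            best = min(best, mismatches + (n - overlap))
        return best

    def gc_ok(s, lo=0.4, hi=0.6):
        """Arbitrary GC-content band for the example."""
        return lo <= sum(b in "GC" for b in s) / len(s) <= hi

    def greedy_codeset(length=8, tau=4, limit=20):
        """Keep strands whose self h-distance (a crude structure-freeness proxy)
        and pairwise h-distances stay at or above tau."""
        kept = []
        for tup in product("ACGT", repeat=length):
            s = "".join(tup)
            if gc_ok(s) and h_distance(s, s) >= tau and all(h_distance(s, t) >= tau for t in kept):
                kept.append(s)
                if len(kept) >= limit:
                    break
        return kept

    print(greedy_codeset())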

    Alignment of short reads to multiple genomes using hashing

    Background: Recent advances in biotechnology have enabled high-throughput sequencing of genomes based on large numbers of short reads. Current methods [1,2], however, mostly align reads to only one reference genome at a time, making it difficult to differentiate sequencing errors from single nucleotide variants (SNVs).
    Materials and methods: Inspired by [3], we propose a new method that takes advantage of multiple genomes and SNV information to align reads. This approach is promising in that it allows us to distinguish between sequencing errors and SNVs. Our alignment algorithm uses read fragments to identify seeds and extends these seeds to find occurrences of reads in the genome. In this study, we developed and implemented an algorithm that captures genomic variation, indexes multiple genomes and performs short-read alignment against a collection of genomes. Preliminary results were validated on Aspergillus fumigatus.
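    A minimal hash-based seed-and-extend sketch is shown below to illustrate the general idea of indexing several genomes at once and tolerating a few mismatches during extension (so a consistent mismatch across reads could be flagged as a candidate SNV rather than a sequencing error). It is not the paper's implementation; the k-mer length, mismatch budget, and the "strain_A"/"strain_B" toy genomes are invented.

    from collections import defaultdict

    K = 12

    def build_index(genomes, k=K):
        """Map each k-mer to the (genome_name, position) pairs where it occurs."""
        index = defaultdict(list)
        for name, seq in genomes.items():
            for i in range(len(seq) - k + 1):
                index[seq[i:i + k]].append((name, i))
        return index

    def seed_and_extend(read, genomes, index, k=K, max_mismatches=2):
        """Use the read's first k-mer as a seed, then extend by direct comparison,
        tolerating a few mismatches (candidate SNVs or sequencing errors)."""
        hits = []
        for name, pos in index.get(read[:k], []):
            window = genomes[name][pos:pos + len(read)]
            if len(window) < len(read):
                continue
            mismatches = [i for i, (a, b) in enumerate(zip(read, window)) if a != b]
            if len(mismatches) <= max_mismatches:
                hits.append((name, pos, mismatches))
        return hits

    # Toy usage: two hypothetical genome variants that differ at a single position.
    genomes = {
        "strain_A": "ACGTACGTTTGCAAGCTAGCTAGGCT",
        "strain_B": "ACGTACGTTTGCAAGCTAGCTCGGCT",
    }
    index = build_index(genomes)
    print(seed_and_extend("TTGCAAGCTAGCTA", genomes, index))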

    How to build a system that manages bioinformatics content

    We discuss how to build systems with sophisticated features on top of a core set of bioinformatics tools. Besides serving as the front-end to the core analyses, such a system can manage users, bioinformatics content and shared data. Using tools available in the open-source community, we show how to build a system with these features with relatively minimal effort, especially in comparison to building such a system from scratch. We discuss the notion of Model-Content Management-View, a framework under which similar applications with such desirable features, beyond the field of bioinformatics, can be built. Our system can be viewed a…
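    The abstract does not name the specific open-source components it uses; as an assumption-laden sketch of the Model-Content Management-View split, the toy Flask service below separates a model layer (stored objects), a content-management layer that wraps a core analysis tool, and a view layer of HTTP endpoints. All names and the analysis placeholder are illustrative.

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Model layer: persistent objects (users, datasets, analysis results) kept in memory here.
    USERS = {}
    RESULTS = {}

    # Content-management layer: wraps a core bioinformatics tool behind a stable function.
    def run_core_tool(sequence: str) -> dict:
        """Placeholder for invoking an existing command-line analysis tool."""
        gc = sum(b in "GC" for b in sequence) / max(len(sequence), 1)
        return {"length": len(sequence), "gc": gc}

    # View layer: HTTP endpoints that expose managed content to users.
    @app.route("/analyze", methods=["POST"])
    def analyze():
        seq = request.json["sequence"]
        result = run_core_tool(seq)
        RESULTS[seq] = result
        return jsonify(result)

    if __name__ == "__main__":
        app.run()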