
3 research outputs found

Finding Sequence Features in Tissue-specific Sequences

Authors: Engel, James Douglas; Hero III, Alfred O.; Rao, Arvind; States, David J.
Publication date: 01/01/2007

The discovery of motifs underlying gene expression is a challenging problem. Some of these motifs are known transcription factor binding sites, but sequence inspection often provides valuable clues and can even lead to the discovery of novel motifs with uncharacterized roles in gene expression. Given the complexity underlying tissue-specific gene expression, several motifs are putatively responsible for expression in a given cell type. This has important implications for understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the principled selection of motifs (not necessarily transcription factor binding sites) and examine its application to several questions in current bioinformatics research. This work makes two main contributions: first, we introduce a new metric for variable selection during classification; second, we investigate the problem of finding specific sequence motifs that underlie tissue-specific gene expression. In conjunction with an SVM classifier, we find these motifs and discover several novel motifs that have not yet been attributed any particular functional role (e.g., as transcription factor binding sites). We hypothesize that the discovery of these motifs would enable large-scale investigation of the tissue-specific regulatory potential of any conserved sequence element identified in genome-wide studies. Finally, we propose that this framework can be used not only to aid the discovery of discriminatory motifs but also to examine the role of any motif of choice in the co-regulation or co-expression of gene groups.
Comment: 11 pages, 9 figures

arXiv.org e-Print Archive

CiteSeerX
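
As a rough illustration of pairing sequence-motif features with an SVM, the sketch below counts overlapping k-mers in each sequence, trains a linear SVM (batch subgradient descent on the L2-regularized hinge loss, with no bias term) and ranks k-mers by their learned weight. This is a minimal sketch under stated assumptions, not the authors' method: ranking by SVM weight is a generic stand-in for the new variable-selection metric mentioned in the abstract, and the function names, the parameters `k`, `lam`, `lr` and `epochs`, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def build_features(seqs, k):
    """Map each sequence to a count vector over all k-mers seen in the data set."""
    counts = [kmer_counts(s, k) for s in seqs]
    vocab = sorted(set().union(*counts))
    X = [[c[m] for m in vocab] for c in counts]
    return X, vocab


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean_i max(0, 1 - y_i <w, x_i>) by batch subgradient descent.

    Labels y_i must be +1 or -1. No bias term, to keep the sketch short.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # inside the margin: hinge subgradient is -y_i * x_i / n
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def rank_motifs(foreground, background, k=4, **svm_kwargs):
    """Rank k-mers by SVM weight; large positive weights favor the foreground set."""
    X, vocab = build_features(foreground + background, k)
    y = [1] * len(foreground) + [-1] * len(background)
    w = train_linear_svm(X, y, **svm_kwargs)
    return sorted(zip(vocab, w), key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: every foreground ("tissue-specific") sequence contains GATA.
    foreground = ["TTGATATT", "CCGATACC", "AAGATAAA"]
    background = ["CCCCCCCC", "TTTTTTTT", "ACACACAC"]
    for motif, weight in rank_motifs(foreground, background)[:3]:
        print(motif, round(weight, 3))
```

On this toy input the shared k-mer GATA receives the largest positive weight, because it gains a contribution from every foreground sequence that violates the margin, whereas each other foreground k-mer gains from only one sequence. A real analysis would use a tuned SVM library and held-out validation rather than this hand-rolled optimizer.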

Understanding Distal Transcriptional Regulation from Sequence Motif, Network Inference and Interactome Perspectives

Authors: Engel, James Douglas; Hero III, Alfred O.; Rao, Arvind; States, David J.
Publication date: 21/03/2008

Gene regulation in higher eukaryotes involves a complex interplay between the gene-proximal promoter and distal genomic elements (such as enhancers), which work in concert to drive spatio-temporal expression. The experimental characterization of gene regulatory elements is a complex and resource-intensive process. One of the major goals in computational biology is the in silico annotation of previously uncharacterized elements using results from the subset of known, annotated regulatory elements. Computationally annotating these hitherto uncharacterized regions requires identifying features that have good predictive value for regulatory behavior. In this work, we study transcriptional regulation as a problem in heterogeneous data integration across sequence-, expression- and interactome-level attributes. Using the Gata2 gene and its recently discovered urogenital enhancers [Khandekar2004] as a case study, we examine the predictive value of various high-throughput functional genomic assays in characterizing these enhancers and their regulatory role. Based on the results of applying modern statistical learning methods to each of these data modalities, we propose a set of attributes that are most discriminatory for the localization and behavior of these enhancers.

arXiv.org e-Print Archive
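
One simple way to compare the predictive value of attributes drawn from different data modalities is to score each attribute on its own by ROC AUC, computed here through the Mann-Whitney statistic. The sketch below assumes a table with one numeric value per candidate region for each attribute, plus binary labels (1 for known enhancers, 0 for control regions). The attribute names and values are hypothetical placeholders, and this univariate ranking is a simplification, not the statistical learning pipeline used in the work.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the Mann-Whitney statistic: P(pos > neg), with ties counted as 1/2."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def rank_attributes(table, labels):
    """Rank attributes by how well each one alone separates label 1 from label 0.

    table maps attribute name -> list of values (one per region); labels is a list of 1/0.
    Reports max(AUC, 1 - AUC) so an attribute informative in either direction ranks high.
    """
    ranked = []
    for name, values in table.items():
        pos = [v for v, lab in zip(values, labels) if lab == 1]
        neg = [v for v, lab in zip(values, labels) if lab == 0]
        auc = roc_auc(pos, neg)
        ranked.append((name, max(auc, 1.0 - auc)))
    return sorted(ranked, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]  # hypothetical: 3 enhancers, 3 control regions
    table = {
        "motif_score": [0.9, 0.8, 0.7, 0.2, 0.3, 0.1],   # sequence modality
        "coexpression": [0.6, 0.4, 0.5, 0.5, 0.3, 0.2],  # expression modality
        "ppi_degree": [3, 1, 2, 2, 3, 1],                # interactome modality
    }
    for name, score in rank_attributes(table, labels):
        print(f"{name}: {score:.3f}")
    # prints motif_score: 1.000, coexpression: 0.833, ppi_degree: 0.500
```

Scoring each attribute separately makes modalities directly comparable on one scale. It ignores interactions between attributes, which a multivariate classifier would capture.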

Complexity-Regularized Multiresolution Density Estimation

Author: Rebecca M. Willett
Publication date: 01/01/2004

Abstract: The density estimation method proposed in this paper employs piecewise polynomial fits on adaptive dyadic partitions. The proposed estimator enjoys the minimax adaptivity associated with wavelet-based density estimators, as well as the following additional advantages: estimates are guaranteed to be non-negative, theoretical bounds provide an indication of performance even for small sample sizes, and the method can be extended to free-degree piecewise polynomial estimation, which allows the data to adaptively determine the smoothness of the underlying basis functions.

I. MULTISCALE PENALIZED LIKELIHOOD ESTIMATION

We estimate a density, f: [0, 1] → [0, ∞), from n observations using a minimum description length (coding-theoretic) approach to regularization and information-theoretic techniques based on the Li-Barron bound and its extension [1, 2]. We accomplis

CiteSeerX
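
To make the idea of an adaptive dyadic partition concrete, the sketch below fits only the simplest case, a piecewise-constant (degree-0) density on [0, 1]. It grows a dyadic tree, scores each cell by its negative log-likelihood plus a per-cell penalty, and keeps a split only when it lowers the total cost. The BIC-like penalty `0.5 * log(n)`, the `max_depth` cap and the function name are assumptions for illustration; they are not the MDL/Li-Barron penalty or the free-degree polynomial fits described in the abstract. The estimate is non-negative by construction and integrates to one, because every observation falls in exactly one cell.

```python
import math


def fit_dyadic_histogram(data, max_depth=6, penalty=None):
    """Piecewise-constant density on an adaptively pruned dyadic partition of [0, 1].

    Cell cost = -(log-likelihood of the points in the cell) + penalty. A split is kept
    only if the two children's best total cost beats the parent's cost as a single cell.
    Returns a list of (left, right, density) tuples covering [0, 1].
    """
    n = len(data)
    if n == 0:
        raise ValueError("need at least one observation")
    if penalty is None:
        penalty = 0.5 * math.log(n)  # hypothetical BIC-like per-cell penalty

    def cell_cost(lo, hi, m):
        if m == 0:
            return penalty  # empty cell: density 0, log-likelihood term 0 * log 0 := 0
        return -m * math.log(m / (n * (hi - lo))) + penalty

    def best(lo, hi, pts, depth):
        leaf_cost = cell_cost(lo, hi, len(pts))
        leaf = [(lo, hi, len(pts) / (n * (hi - lo)))]
        if depth == max_depth or not pts:
            return leaf_cost, leaf
        mid = (lo + hi) / 2
        cost_l, cells_l = best(lo, mid, [x for x in pts if x < mid], depth + 1)
        cost_r, cells_r = best(mid, hi, [x for x in pts if x >= mid], depth + 1)
        if cost_l + cost_r < leaf_cost:  # prune unless splitting pays for its penalty
            return cost_l + cost_r, cells_l + cells_r
        return leaf_cost, leaf

    return best(0.0, 1.0, list(data), 0)[1]


if __name__ == "__main__":
    sample = [(i + 0.5) / 400 for i in range(200)]  # 200 evenly spaced points in (0, 0.5)
    for lo, hi, dens in fit_dyadic_histogram(sample):
        print(f"[{lo}, {hi}): density {dens}")
```

For the evenly spaced sample above, the pruned partition keeps just two cells, density 2.0 on [0, 0.5) and 0.0 on [0.5, 1]. Finer splits of the left half are rejected because they barely change the likelihood but each adds a penalty.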