Finding Sequence Features in Tissue-specific Sequences
Discovering the motifs that underlie gene expression is challenging. Some
of these motifs correspond to known transcription factor binding sites, but
sequence inspection often provides valuable clues, including the discovery of
novel motifs with uncharacterized function in gene expression. Given the
complexity of tissue-specific gene expression, several motifs may be putatively
responsible for expression in a given cell type. Identifying them has important
implications for understanding fundamental biological processes, such as
development and disease progression. In this work, we present an approach to
the principled selection of motifs (not necessarily transcription factor sites)
and examine its application to several questions in current bioinformatics
research.
This work makes two main contributions. First, we introduce a new
metric for variable selection during classification. Second, we
investigate the problem of finding specific sequence motifs that underlie
tissue-specific gene expression. Using this metric in conjunction with an SVM
classifier, we find these motifs and discover several novel motifs that have
not yet been assigned any particular functional role (e.g., as TFBS motifs).
We hypothesize that the discovery of these motifs would enable large-scale
investigation of the tissue-specific regulatory potential of any conserved
sequence element identified from genome-wide studies.
Finally, we propose that this framework can be used not only to aid the
discovery of discriminatory motifs, but also to examine the role of any motif
of choice in the co-regulation or co-expression of gene groups.
Comment: 11 pages, 9 figures
Understanding Distal Transcriptional Regulation from Sequence Motif, Network Inference and Interactome Perspectives
Gene regulation in higher eukaryotes involves a complex interplay between the
gene proximal promoter and distal genomic elements (such as enhancers) which
work in concert to drive spatio-temporal expression. The experimental
characterization of gene regulatory elements is a very complex and
resource-intensive process. One of the major goals in computational biology is
the \textit{in-silico} annotation of previously uncharacterized elements using
results from the subset of known, annotated, regulatory elements.
Computationally annotating these hitherto uncharacterized regions
requires identifying features that have good predictive value for
regulatory behavior.
In this work, we study transcriptional regulation as a problem in
heterogeneous data integration, across sequence, expression and interactome
level attributes. Using the example of the \textit{Gata2} gene and its recently
discovered urogenital enhancers \cite{Khandekar2004} as a case study, we
examine the predictive value of various high throughput functional genomic
assays in characterizing these enhancers and their regulatory role. Based on
results from applying modern statistical learning methods to each of these
data modalities, we propose a set of attributes that are most discriminatory
for the localization and behavior of these enhancers.
Complexity-Regularized Multiresolution Density Estimation
Abstract: The density estimation method proposed in this paper employs piecewise polynomial fits on adaptive dyadic partitions. The proposed estimator enjoys the minimax adaptivity associated with wavelet-based density estimators as well as the following additional advantages: estimates are guaranteed to be non-negative, theoretical bounds provide an indication of performance even for small sample sizes, and the method can be extended to free-degree piecewise polynomial estimation, which allows the data to adaptively determine the smoothness of the underlying basis functions.
I. MULTISCALE PENALIZED LIKELIHOOD ESTIMATION
We estimate a density, f: [0, 1] → [0, ∞), from n observations using a minimum description length/coding theoretic approach to regularization and information theoretic techniques based on the Li-Barron bound and its extension [1, 2]. We accomplis