3 research outputs found

    Finding Sequence Features in Tissue-specific Sequences

    Discovering the motifs that underlie gene expression is challenging. Some of these motifs are known transcription factor binding sites, but sequence inspection often provides valuable clues and can even reveal novel motifs with uncharacterized function in gene expression. Given the complexity of tissue-specific gene expression, several motifs may be putatively responsible for expression in a given cell type. This has important implications for understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the principled selection of motifs (not necessarily transcription factor binding sites) and examine its application to several questions in current bioinformatics research. This work makes two main contributions: first, we introduce a new metric for variable selection during classification; second, we investigate the problem of finding specific sequence motifs that underlie tissue-specific gene expression. Using this metric in conjunction with an SVM classifier, we find these motifs and discover several novel motifs that have not yet been attributed any particular functional role (e.g., as TFBS motifs). We hypothesize that the discovery of these motifs would enable large-scale investigation of the tissue-specific regulatory potential of any conserved sequence element identified from genome-wide studies. Finally, we propose that the framework can be used not only to aid discovery of discriminatory motifs, but also to examine the role of any motif of choice in the co-regulation or co-expression of gene groups. Comment: 11 pages, 9 figures

    Understanding Distal Transcriptional Regulation from Sequence Motif, Network Inference and Interactome Perspectives

    Gene regulation in higher eukaryotes involves a complex interplay between the gene proximal promoter and distal genomic elements (such as enhancers) which work in concert to drive spatio-temporal expression. The experimental characterization of gene regulatory elements is a very complex and resource-intensive process. One of the major goals in computational biology is the in silico annotation of previously uncharacterized elements using results from the subset of known, annotated, regulatory elements. The computational annotation of these hitherto uncharacterized regions would require an identification of features that have good predictive value for regulatory behavior. In this work, we study transcriptional regulation as a problem in heterogeneous data integration, across sequence, expression and interactome level attributes. Using the example of the Gata2 gene and its recently discovered urogenital enhancers (Khandekar 2004) as a case study, we examine the predictive value of various high throughput functional genomic assays in characterizing these enhancers and their regulatory role. Observing results from the application of modern statistical learning methodologies for each of these data modalities, we propose a set of attributes that are most discriminatory in the localization and behavior of these enhancers.

    Complexity-Regularized Multiresolution Density Estimation

    Abstract: The density estimation method proposed in this paper employs piecewise polynomial fits on adaptive dyadic partitions. The proposed estimator enjoys the minimax adaptivity associated with wavelet-based density estimators as well as the following additional advantages: estimates are guaranteed to be non-negative, theoretical bounds provide an indication of performance even for small sample sizes, and the method can be extended to free-degree piecewise polynomial estimation, which allows the data to adaptively determine the smoothness of the underlying basis functions. I. MULTISCALE PENALIZED LIKELIHOOD ESTIMATION. We estimate a density, f: [0, 1] → [0, ∞), from n observations using a minimum description length/coding theoretic approach to regularization and information theoretic techniques based on the Li-Barron bound and its extension [1, 2]. We accomplis
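
To illustrate the general idea of a penalized-likelihood fit on an adaptive dyadic partition of [0, 1], here is a minimal Python sketch. It is not the paper's estimator: it uses only piecewise-constant (degree-0) fits, and the per-cell penalty of 0.5 * log(n) is an assumed placeholder, not the penalty derived from the Li-Barron bound. The function name `fit_dyadic_histogram` and its parameters are hypothetical.

```python
import math


def fit_dyadic_histogram(samples, max_depth=6, penalty=None):
    """Piecewise-constant density on a pruned dyadic partition of [0, 1].

    Returns a list of (lo, hi, height) cells ordered left to right.
    Assumption: every sample lies in [0, 1].
    """
    n = len(samples)
    if penalty is None:
        # Assumed placeholder penalty per cell (not taken from the paper).
        penalty = 0.5 * math.log(n)

    def best(lo, hi, pts, depth):
        k = len(pts)
        width = hi - lo
        # Maximum-likelihood constant height on this cell.
        height = k / (n * width)
        # Negative log-likelihood contributed by the points in this cell.
        nll = -k * math.log(height) if k > 0 else 0.0
        leaf_cost = nll + penalty
        leaf = [(lo, hi, height)]
        if depth == max_depth or k == 0:
            return leaf_cost, leaf
        # Dyadic split at the midpoint; keep it only if it lowers the cost.
        mid = (lo + hi) / 2.0
        left_cost, left_cells = best(lo, mid, [p for p in pts if p < mid], depth + 1)
        right_cost, right_cells = best(mid, hi, [p for p in pts if p >= mid], depth + 1)
        if left_cost + right_cost < leaf_cost:
            return left_cost + right_cost, left_cells + right_cells
        return leaf_cost, leaf

    _, cells = best(0.0, 1.0, list(samples), 0)
    return cells
```

Because each cell's height is its fraction of the samples divided by its width, the resulting estimate is non-negative and integrates to one by construction; cells are split more finely only where the likelihood gain outweighs the added penalty.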