
230 research outputs found

Extreme Value Distribution Based Gene Selection Criteria for Discriminant Microarray Data Analysis Using Logistic Regression

Author: Baldi
Fengzhu Sun
Ivo Grosse
Jaeger J.
Smyth G.K.
Wentian Li
Publication venue: Mary Ann Liebert Inc
Publication date: 26/03/2004

One important issue commonly encountered in the analysis of microarray data is deciding which and how many genes should be selected for further study. For discriminant microarray data analyses based on statistical models, such as logistic regression models, gene selection can be accomplished by comparing the maximum likelihood of the model given the real data, $\hat{L}(D|M)$, with the expected maximum likelihood of the model given an ensemble of surrogate data with randomly permuted labels, $\hat{L}(D_0|M)$. Typically, the computational burden of obtaining $\hat{L}(D_0|M)$ is immense, often exceeding the limits of available computing resources by orders of magnitude. Here, we propose an approach that circumvents such heavy computations by mapping the simulation problem to an extreme-value problem. We present the derivation of an asymptotic distribution of the extreme value as well as its mean, median, and variance. Using this distribution, we propose two gene selection criteria, and we apply them to two microarray datasets and three classification tasks for illustration. Comment: to be published in Journal of Computational Biology (2004)
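The idea of replacing label-permutation simulations with an extreme-value calculation can be illustrated with a minimal sketch. It is not the derivation from the paper. It assumes (these are assumptions of the sketch, not statements from the abstract) that each gene's null likelihood-ratio statistic is approximately chi-square distributed with one degree of freedom and that genes are independent, so the best statistic among N genes has distribution function F(x)^N. The function names are hypothetical.

```python
import math
import random
import statistics


def chi2_1_cdf(x):
    """CDF of a chi-square variable with 1 degree of freedom (assumed null law)."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0 else 0.0


def analytic_median_of_max(n_genes, lo=0.0, hi=100.0):
    """Median of the maximum of n_genes independent statistics.

    Solves F(x) ** n_genes = 0.5 by bisection, with no simulation needed.
    """
    target = 0.5 ** (1.0 / n_genes)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_1_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def simulated_median_of_max(n_genes, n_permutations, seed=0):
    """Brute-force counterpart: draw null statistics and record each maximum.

    A squared standard normal is chi-square with 1 degree of freedom, so each
    inner draw stands in for one gene under one label permutation.
    """
    rng = random.Random(seed)
    maxima = [
        max(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_genes))
        for _ in range(n_permutations)
    ]
    return statistics.median(maxima)


if __name__ == "__main__":
    n = 200
    print("analytic median of max :", analytic_median_of_max(n))
    print("simulated median of max:", simulated_median_of_max(n, 400))
```

The analytic route costs a few hundred function evaluations regardless of the number of genes, whereas the simulated route grows with genes times permutations, which is the kind of saving the abstract describes.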

arXiv.org e-Print Archive

Crossref

InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites

Author: Eggeling Ralf
Grau Jan
Grosse Ivo
Publication date: 29/11/2016

Summary: Recent studies have shown that the traditional position weight matrix model is often insufficient for modeling transcription factor binding sites, as intra-motif dependencies play a significant role in an accurate description of binding motifs. Here, we present the Java application InMoDe, a collection of tools for learning, leveraging, and visualizing such dependencies of putative higher order. The distinguishing feature of InMoDe is robust model selection from a class of parsimonious models, taking dependencies into account only if justified by the data and opting for simplicity otherwise. Availability and Implementation: InMoDe is implemented in Java and is available as a command-line application, as an application with a graphical user interface, and as an integration into Galaxy on the project website at http://www.jstacs.de/index.php/InMoDe. Peer reviewed
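Why a position weight matrix can miss intra-motif dependencies is easy to see in a toy example. The sketch below is not InMoDe and does not use its parsimonious models; it swaps in a plain first-order (adjacent-position) model as the simplest stand-in for a dependency-aware model. The function names, the pseudocount value, and the toy binding sites are all hypothetical.

```python
def pwm_probability(sites, seq, pseudo=0.5):
    """Probability of seq under a position weight matrix estimated from sites.

    Every position is treated as independent of all others.
    """
    n = len(sites)
    prob = 1.0
    for pos, base in enumerate(seq):
        count = sum(1 for s in sites if s[pos] == base)
        prob *= (count + pseudo) / (n + 4 * pseudo)
    return prob


def first_order_probability(sites, seq, pseudo=0.5):
    """Probability of seq when each position depends on its left neighbour."""
    n = len(sites)
    first = sum(1 for s in sites if s[0] == seq[0])
    prob = (first + pseudo) / (n + 4 * pseudo)
    for pos in range(1, len(seq)):
        # Count sites that share the preceding base, then those that also
        # share the current base, to estimate P(current | previous).
        context = sum(1 for s in sites if s[pos - 1] == seq[pos - 1])
        joint = sum(
            1 for s in sites
            if s[pos - 1] == seq[pos - 1] and s[pos] == seq[pos]
        )
        prob *= (joint + pseudo) / (context + 4 * pseudo)
    return prob


if __name__ == "__main__":
    # Toy sites in which A is always followed by C and T by G.
    toy_sites = ["AC", "AC", "TG", "TG"]
    for candidate in ("AC", "AG"):
        print(candidate,
              pwm_probability(toy_sites, candidate),
              first_order_probability(toy_sites, candidate))
```

In this toy data the position weight matrix cannot distinguish the observed site "AC" from the never-observed "AG", while the first-order model can. Whether such a dependency is worth modeling on real data is the model-selection question the abstract refers to.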

Crossref

Helsingin yliopiston digitaalinen arkisto

Extended Sunflower Hidden Markov Models for the recognition of homotypic cis-regulatory modules

Author: Eggeling Ralf
Grosse Ivo
Lemnian Ioana M.
Publication venue: OASIcs - OpenAccess Series in Informatics. German Conference on Bioinformatics 2013
Publication date: 01/01/2013

The transcription of genes is often regulated not only by transcription factors binding at single sites per promoter, but by the interplay of multiple copies of one or more transcription factors binding at multiple sites forming a cis-regulatory module. The computational recognition of cis-regulatory modules from ChIP-seq or other high-throughput data is crucial in modern life and medical sciences. A common type of cis-regulatory module is the homotypic cluster of binding sites, i.e., a cluster of binding sites of one transcription factor. For the recognition of such clusters, the homotypic Sunflower Hidden Markov Model is a promising statistical model. However, this model neglects statistical dependences among nucleotides within binding sites and flanking regions, which makes it poorly suited for de novo motif discovery. Here, we propose an extension of this model that allows statistical dependences within binding sites, their reverse complements, and flanking regions. We study the efficacy of this extended homotypic Sunflower Hidden Markov Model based on ChIP-seq data from the Human ENCODE Project and find that it often outperforms the traditional homotypic Sunflower Hidden Markov Model.

Dagstuhl Research Online Publication Server

A general approach for discriminative de-novo motif discovery from high-throughput data

Author: Grau Jan
Grosse Ivo
Keilwagen Jens
Posch Stefan
Publication venue: Berichte aus dem Julius Kühn-Institut
Publication date: 19/09/2013

De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches works perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research.

Crossref

JKI Open Journal Systems (Julius Kühn-Institut)

PubMed Central


Cross-kingdom comparison of the developmental hourglass.

Author: Drost Hajk-Georg
Grosse Ivo
Janitza Philipp
Quint Marcel
Publication venue: Curr Opin Genet Dev
Publication date: 01/08/2017

The developmental hourglass model has its foundations in classic anatomical studies by von Baer and Haeckel. In this context, even the conservation of animal body plans has been explained by evolutionary constraints acting on mid-embryogenic development. Recent studies have shown that developmental hourglass patterns also exist on the transcriptomic level, mirroring the corresponding morphological patterns. The identification of similar patterns in embryonic, post-embryonic, and life-cycle-spanning transcriptomes in plant and fungus development, however, contradicts the notion of a direct coupling between morphological and molecular patterns. To explain the existence of hourglass patterns across kingdoms and developmental processes, we propose the organizational checkpoint model that integrates the developmental hourglass model into a framework of transcriptome switches.

Apollo (Cambridge)

Comparison of NML and Bayesian scoring criteria for learning parsimonious Markov models

Author: Eggeling Ralf
Grosse Ivo
Myllymäki Petri
Roos Teemu Teppo
Publication date: 01/01/2012

Parsimonious Markov models, a generalization of variable order Markov models, have recently been introduced for modeling biological sequences. Up to now, they have been learned by Bayesian approaches. However, sufficient prior knowledge is not always available, and a fully uninformative prior is difficult to define. To avoid cumbersome cross-validation procedures for obtaining the optimal prior choice, we here adapt scoring criteria for Bayesian networks that approximate the Normalized Maximum Likelihood (NML) to parsimonious Markov models. We empirically compare their performance with the Bayesian approach by classifying splice sites, an important problem in computational biology. Non peer reviewed

Helsingin yliopiston digitaalinen arkisto

MotifAdjuster: a tool for computational reassessment of transcription factor binding site annotations

Author: Baumbach Jan
Grosse Ivo
Keilwagen Jens
Kohl Thomas A
Publication venue: BioMed Central
Publication date: 01/01/2009

MotifAdjuster helps to detect errors in binding site annotations

Crossref

Springer

Springer - Publisher Connector

PubMed Central

Publications at Bielefeld University


Prediction of regulatory targets of alternative isoforms of the epidermal growth factor receptor in a glioblastoma cell line.

Author: Ardell David H
Eckert Alexander W
Grosse Ivo
Kappler Matthias
Kotrba Johanna
Vordermark Dirk
Weinholdt Claus
Wichmann Henri
Publication venue: eScholarship, University of California
Publication date: 01/08/2019

Background: The epidermal growth factor receptor (EGFR) is a major regulator of proliferation in tumor cells. Elevated expression levels of EGFR are associated with prognosis and clinical outcomes of patients in a variety of tumor types. There are at least four splice variants of the mRNA encoding four protein isoforms of EGFR in humans, named I through IV. EGFR isoform I is the full-length protein, whereas isoforms II-IV are shorter protein isoforms. Nevertheless, all EGFR isoforms bind the epidermal growth factor (EGF). Although EGFR is an essential target of long-established and successful tumor therapeutics, the exact function and biomarker potential of alternative EGFR isoforms II-IV are unclear, motivating more in-depth analyses. Hence, we analyzed transcriptome data from glioblastoma cell line SF767 to predict target genes regulated by EGFR isoforms II-IV, but not by EGFR isoform I nor other receptors such as HER2, HER3, or HER4. Results: We analyzed the differential expression of potential target genes in a glioblastoma cell line in two nested RNAi experimental conditions and one negative control, contrasting expression with EGF stimulation against expression without EGF stimulation. In one RNAi experiment, we selectively knocked down EGFR splice variant I, while in the other we knocked down all four EGFR splice variants, so the associated effects of EGFR II-IV knock-down can only be inferred indirectly. For this type of nested experimental design, we developed a two-step bioinformatics approach based on the Bayesian Information Criterion for predicting putative target genes of EGFR isoforms II-IV. Finally, we experimentally validated a set of six putative target genes, and we found that qPCR validations confirmed the predictions in all cases. Conclusions: By performing RNAi experiments for three poorly investigated EGFR isoforms, we were able to successfully predict 1140 putative target genes specifically regulated by EGFR isoforms II-IV using the developed Bayesian Gene Selection Criterion (BGSC) approach. This approach is easily utilizable for the analysis of data of other nested experimental designs, and we provide an implementation in R that is easily adaptable to similar data or experimental designs together with all raw datasets used in this study in the BGSC repository, https://github.com/GrosseLab/BGSC
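The Bayesian Information Criterion mentioned in the abstract is BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of observations, and L the maximized likelihood; lower values are preferred. The sketch below is a generic illustration of BIC-based model choice for a single gene and is not the BGSC procedure itself (the R implementation lives in the repository named above). It assumes Gaussian expression values with one mean per group and a shared variance; the function name, group labels, and numbers are hypothetical.

```python
import math


def bic_for_grouping(values, groups):
    """BIC of a Gaussian model with one mean per group and a shared variance.

    values: expression measurements for one gene
    groups: one label per measurement; measurements with equal labels
            share a mean (a single label everywhere means "no effect")
    """
    n = len(values)
    by_group = {}
    for value, label in zip(values, groups):
        by_group.setdefault(label, []).append(value)
    means = {label: sum(vs) / len(vs) for label, vs in by_group.items()}

    # Maximum-likelihood variance estimate; the floor avoids log(0).
    rss = sum((v - means[g]) ** 2 for v, g in zip(values, groups))
    sigma2 = max(rss / n, 1e-12)

    # Gaussian log-likelihood evaluated at the ML estimates.
    log_lik = -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)

    n_params = len(by_group) + 1  # group means plus the shared variance
    return n_params * math.log(n) - 2.0 * log_lik


if __name__ == "__main__":
    expression = [5.0, 5.1, 4.9, 7.0, 7.2, 6.9]
    no_effect = ["all"] * 6
    effect = ["control"] * 3 + ["knockdown"] * 3
    print("BIC, one shared mean :", bic_for_grouping(expression, no_effect))
    print("BIC, two group means :", bic_for_grouping(expression, effect))
```

A gene would be flagged as responding when the grouping that separates the conditions of interest has the lowest BIC among the candidate groupings. The penalty term k ln(n) keeps the richer model from winning when the extra mean explains nothing.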

eScholarship - University of California