
3 research outputs found

Clinically driven semi-supervised class discovery in gene expression data

Author: Diego Ardigò
Israel Steinfeld
Ivana Zavaroni
Roy Navon
Zohar Yakhini
Publication date: 09/08/2008

Abstract
Motivation: Unsupervised class discovery in gene expression data relies on the statistical signals in the data to exclusively drive the results. It is often the case, however, that one is interested in constraining the search space to respect certain biological prior knowledge while still allowing a flexible search within these boundaries.
Results: We develop an approach to semi-supervised class discovery. One component of our approach uses clinical sample information to constrain the search space and guide the class discovery process to yield biologically relevant partitions. A second component consists of using known biological annotation of genes to drive the search, seeking partitions that manifest strong differential expression in specific sets of genes. We develop efficient algorithmics for these tasks, implementing both approaches and combinations thereof. We show that our method is robust enough to detect known clinical parameters in accordance with expected clinical values. We also use our method to elucidate cardiovascular disease (CVD) putative risk factors.
Availability: MonoClaD (Monotone Class Discovery). See http://bioinfo.cs.technion.ac.il/people/zohar/MonoClad/
Supplementary information: Supplementary data is available at http://bioinfo.cs.technion.ac.il/people/zohar/MonoClad/software.html
Contact: [email protected]

Open Access Repository

Mutual Enrichment in Ranked Lists and the Statistical Assessment of Position Weight Matrix Motifs

Author: Leibovich Limor
Yakhini Zohar
Publication date: 30/07/2013

Statistics in ranked lists is important in analyzing molecular biology measurement data, such as ChIP-seq, which yields ranked lists of genomic sequences. State-of-the-art methods study fixed motifs in ranked lists. More flexible models such as position weight matrix (PWM) motifs are not addressed in this context. To assess the enrichment of a PWM motif in a ranked list we use a PWM-induced second ranking on the same set of elements. Possible orders of one ranked list relative to the other are modeled by permutations. Due to sample space complexity, it is difficult to characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the top of two uniformly and independently drawn permutations and demonstrate advantages of this approach using our software implementation, mmHG-Finder, to study PWMs in several datasets. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI 2013)
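
The quantity discussed in this abstract, the size of the intersection of the tops of two independently and uniformly drawn rankings, follows a hypergeometric distribution once both cutoffs are fixed. The Python sketch below illustrates that fact and scans all cutoff pairs for the smallest tail probability. It is an illustration only, not the mmHG-Finder implementation and not the paper's bounds: the function names `hypergeom_tail` and `min_tail_over_cutoffs` are hypothetical, and the brute-force scan is only practical for short lists.

```python
from bisect import bisect_left
from math import comb


def hypergeom_tail(N, n1, n2, b):
    """P(X >= b), where X is the overlap between the top n1 of one ranking
    and the top n2 of a second, independent uniform ranking of N elements."""
    total = comb(N, n2)
    hits = sum(comb(n1, k) * comb(N - n1, n2 - k)
               for k in range(b, min(n1, n2) + 1))
    return hits / total


def min_tail_over_cutoffs(rank_a, rank_b):
    """Scan every cutoff pair (n1, n2) and return (smallest tail, n1, n2).

    rank_a and rank_b are two orderings of the same elements (assumed input
    format). The returned minimum is a statistic, not a p-value: it still
    needs correcting for the search over cutoffs.
    """
    N = len(rank_a)
    pos_b = {item: i for i, item in enumerate(rank_b)}
    best = (1.0, 0, 0)
    for n1 in range(1, N):
        # 0-based positions in rank_b of the top-n1 elements of rank_a
        top_a_positions = sorted(pos_b[item] for item in rank_a[:n1])
        for n2 in range(1, N):
            # overlap size = number of those positions that fall below n2
            b = bisect_left(top_a_positions, n2)
            tail = hypergeom_tail(N, n1, n2, b)
            if tail < best[0]:
                best = (tail, n1, n2)
    return best
```

Because the minimum is taken over many cutoff pairs, it cannot be read directly as a significance level; bounding the tail distribution of such a minimized statistic is the problem the abstract says the paper addresses.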

arXiv.org e-Print Archive

Springer - Publisher Connector

Clinically driven semi-supervised class discovery in gene expression data

Author: ARDIGÒ D
NAVON R
STEINFELD I
YAKHINI Z
ZAVARONI I.

MOTIVATION: Unsupervised class discovery in gene expression data relies on the statistical signals in the data to exclusively drive the results. It is often the case, however, that one is interested in constraining the search space to respect certain biological prior knowledge while still allowing a flexible search within these boundaries. RESULTS: We develop an approach to semi-supervised class discovery. One component of our approach uses clinical sample information to constrain the search space and guide the class discovery process to yield biologically relevant partitions. A second component consists of using known biological annotation of genes to drive the search, seeking partitions that manifest strong differential expression in specific sets of genes. We develop efficient algorithmics for these tasks, implementing both approaches and combinations thereof. We show that our method is robust enough to detect known clinical parameters in accordance with expected clinical values. We also use our method to elucidate cardiovascular disease (CVD) putative risk factors.

Archivio istituzionale della Ricerca - Università degli Studi di Parma