69 research outputs found

    Novel Rank-Based Statistical Methods Reveal MicroRNAs with Differential Expression in Multiple Cancer Types

    BACKGROUND: MicroRNAs (miRNAs) regulate target genes at the post-transcriptional level and play important roles in cancer pathogenesis and development. Variation amongst individuals is a significant confounding factor in miRNA (or other) expression studies. The true character of biologically or clinically meaningful differential expression can be obscured by inter-patient variation. In this study we aim to identify miRNAs with consistent differential expression in multiple tumor types using a novel data analysis approach. METHODS: Using microarrays, we profiled the expression of more than 700 miRNAs in 28 matched tumor/normal samples from 8 different tumor types (breast, colon, liver, lung, lymphoma, ovary, prostate and testis). This set is unique in putting emphasis on minimizing tissue type and patient related variability using normal and tumor samples from the same patient. We develop scores for comparing miRNA expression in the above matched sample data based on a rigorous characterization of the distribution of order statistics over a discrete state set, including exact p-values. Specifically, we compute a Rank Consistency Score (RCoS) for every miRNA measured in our data. Our methods are also applicable in various other contexts. We compare our methods, as applied to matched samples, to the paired t-test and to the Wilcoxon Signed Rank test. RESULTS: We identify miRNAs that are consistently differentially expressed across the cancer types measured. 41 miRNAs are under-expressed in cancer compared to normal at an FDR (False Discovery Rate) of 0.05, and 17 are over-expressed at the same FDR level. Differentially expressed miRNAs include known oncomiRs (e.g., miR-96) as well as miRNAs that were not previously universally associated with cancer. Specific examples include miR-133b and miR-486-5p, which are consistently down-regulated, and miR-629*, which is consistently up-regulated in cancer, in the context of our cohort. Data is available in GEO.
Software is available at: http://bioinfo.cs.technion.ac.il/people/zohar/RCoS

    Analysis of Expression Patterns: The Scope of the Problem, the Problem of Scope

    Studies of the expression patterns of many genes simultaneously lead to the observation that even in closely related pathologies, there are numerous genes that are differentially expressed in consistent patterns correlated to each sample type. The early uses of the enabling technology, microarrays, were focused on gathering mechanistic biological insights. The early findings now pose another clear challenge: finding ways to effectively use this kind of information to develop diagnostics.

    Agilent Laboratories

    Recent studies (Alizadeh et al., [1]; Bittner et al., [5]; Golub et al., [11]) demonstrate the discovery of putative disease subtypes from gene expression data. The underlying computational problem is to partition the set of sample tissues into statistically meaningful classes. In this paper we present a novel approach to class discovery and develop automatic analysis methods. Our approach is based on statistically scoring candidate partitions according to the overabundance of genes that separate the different classes. Indeed, in biological datasets, an overabundance of genes separating known classes is typically observed. We measure overabundance against a stochastic null model. This allows for highlighting subtle, yet meaningful, partitions that are supported on a small subset of the genes. Using simulated annealing, we explore the space of all possible partitions of the set of samples, seeking partitions with statistically significant overabundance of differentially expressed genes. We demonstrate the performance of our methods on synthetic data, where we recover planted partitions. Finally, we turn to tumor expression datasets, and show that we find several highly pronounced partitions.

    Clustering gene expression patterns

    Keywords: gene expression patterns, clustering, random graphs. With the advance of hybridization array technology, researchers can measure expression levels of sets of genes across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. We describe the problem of clustering multi-condition gene expression patterns. We define an appropriate stochastic model of the input, and use this model for performance evaluations. We present an O(n (log n)^c)-time algorithm that recovers cluster structures with high probability, in this model, where n is the number of genes. In addition to the theoretical treatment, we suggest practical heuristic improvements to the algorithm. We demonstrate the algorithm’s performance first on simulated data, and then on actual gene expression data.

    Zero-One Permanent is #P-Complete, A Simpler Proof

    In 1979, Valiant proved that computing the permanent of a 01-matrix is #P-Complete. In this paper we present another proof for the same result. Our proof uses "black box" methodology, which facilitates its presentation. We also prove that deciding whether the permanent is divisible by a small prime is #P-Hard. We conclude by proving that a polynomially bounded function cannot be #P-Complete under "reasonable" complexity assumptions.
    1 Introduction. The permanent has been the object of study by mathematicians since first appearing in the work of Cauchy and Binet in 1812. Despite its syntactical similarity to the determinant, no efficient procedure for computing the permanent is known. In 1979, Valiant provided a reason for this difficulty. In a landmark paper ([Val79a]) he showed that the permanent function is complete for the class #P of enumeration problems. Moreover, Valiant proved that even for 01-matrices, the problem remains #P-Complete. Valiant's proof has two parts. In the fir..

    Class Discovery in Gene Expression Data

    Recent studies (Alizadeh et al., [1]; Bittner et al., [5]; Golub et al., [11]) demonstrate the discovery of putative disease subtypes from gene expression data. The underlying computational problem is to partition the set of sample tissues into statistically meaningful classes. In this paper we present a novel approach to class discovery and develop automatic analysis methods. Our approach is based on statistically scoring candidate partitions according to the overabundance of genes that separate the different classes. Indeed, in biological datasets, an overabundance of genes separating known classes is typically observed. We measure overabundance against a stochastic null model. This allows for highlighting subtle, yet meaningful, partitions that are supported on a small subset of the genes. Using simulated annealing, we explore the space of all possible partitions of the set of samples, seeking partitions with statistically significant overabundance of differentially expressed genes. We ..