72 research outputs found

Compositional Mining of Multi-Relational Biological Datasets

Author: Jin Ying
Murali T.M.
Ramakrishnan Naren
Publication date: 01/01/2007

High-throughput biological screens are yielding ever-growing streams of information about multiple aspects of cellular activity. As more and more categories of datasets come online, there is a corresponding multitude of ways in which inferences can be chained across them, motivating the need for compositional data mining algorithms. In this paper, we argue that such compositional data mining can be effectively realized by functionally cascading redescription mining and biclustering algorithms as primitives. Both these primitives mirror shifts of vocabulary that can be composed in arbitrary ways to create rich chains of inferences. Given a relational database and its schema, we show how the schema can be automatically compiled into a compositional data mining program, and how different domains in the schema can be related through logical sequences of biclustering and redescription invocations. This feature allows us to rapidly prototype new data mining applications, yielding greater understanding of scientific datasets. We describe two applications of compositional data mining: (i) matching terms across categories of the Gene Ontology and (ii) understanding the molecular mechanisms underlying stress response in human cells.

Computer Science Technical Reports @Virginia Tech

CiteSeerX

BicSPAM: flexible biclustering using sequential patterns

Author: A Ben-Dor
A Califano
A Patrikainen
A Prelić
A Serin
A Tanay
AA Alizadeh
AR Donders
C Creighton
C Ding
C Tang
D Bozdağ
D Martin
DS Hochbaum
F Zhu
G Atluri
G Bebek
G Getz
G Pandey
GF Berriz
H Choi
H Toivonen
H Wang
J Bellay
J Han
J Ihmels
J Liu
J Pei
J Wang
J Yang
JA Hartigan
K Sim
K Yip
L Lazzeroni
M Charrad
M de Souto
M Steinbach
MA Mahfouz
MJ Zaki
NR Mabroukeh
O Troyanskaya
P Carmona-Saez
P Fournier-Viger
Q Fang
Q Sheng
R Henriques
R Martinez
Rui Henriques
S Barkow
S Hochreiter
S Madeira
S Tavazoie
Sara C Madeira
SC Madeira
SS Young
T Calders
T Hellem
TR Golub
U Alon
X Yan
Y Huang
Y Okada
Publication venue: Springer Science and Business Media LLC

Crossref

Neural Biclustering in Gene Expression Analysis

Author: Barbiero P.
Bertotti A.
Ciravegna G.
Cirrincione G.
Piccolo E.
Publication venue: country:USA
Publication date: 01/01/2017

Clustering in high dimensional spaces is a very difficult task. Dealing with DNA microarrays is even more difficult because gene subsets are coregulated and coexpressed only under specific conditions. Biclustering addresses the problem of finding such submanifolds by exploiting both gene and condition (tissue) clustering. The paper proposes a self-organizing neural network, GH EXIN, which builds a hierarchical tree by adapting its architecture to data. It is integrated in a framework in which gene and tissue clustering are alternated and controlled by the quality of the bicluster. Examples of the approach and a biological validation of results are also given.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series

Author: A Ben-Dor
A Prelic
A Tanay
AP Gasch
Arlindo L Oliveira
C Wu
D Gusfield
D Martin
E Yang
GJ McLachlan
IP Androulakis
IV Mechelen
J Liu
L Ji
M Koyuturk
MC Teixeira
MF Sagot
Q Sheng
R Peeters
S Lonardi
Sara C Madeira
SC Madeira
TM Murali
Y Cheng
Y Zhang
Z Bar-Joseph
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series, obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters. Methods: In this work, we propose e-CCC-Biclustering, a biclustering algorithm that finds and reports all maximal contiguous column coherent biclusters with approximate expression patterns in time polynomial in the size of the time series gene expression matrix. This polynomial time complexity is achieved by manipulating a discretized version of the original matrix using efficient string processing techniques. We also propose extensions to deal with missing values, discover anticorrelated and scaled expression patterns, and different ways to compute the errors allowed in the expression patterns. We propose a scoring criterion combining the statistical significance of expression patterns with a similarity measure between overlapping biclusters. Results: We present results in real data showing the effectiveness of e-CCC-Biclustering and its relevance in the discovery of regulatory modules describing the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress. In particular, the results show the advantage of considering approximate patterns when compared to state of the art methods that require exact matching of gene expression time series. Discussion: The identification of co-regulated genes, involved in specific biological processes, remains one of the main avenues open to researchers studying gene regulatory networks. The ability of the proposed methodology to efficiently identify sets of genes with similar expression patterns is shown to be instrumental in the discovery of relevant biological phenomena, leading to more convincing evidence of specific regulatory mechanisms. Availability: A prototype implementation of the algorithm coded in Java together with the dataset and examples used in the paper is available at http://kdbio.inesc-id.pt/software/e-ccc-biclustering.
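
The idea of restricting biclusters to contiguous columns of a discretized matrix can be illustrated with a small sketch. This is not the e-CCC-Biclustering algorithm: it matches patterns exactly (no errors allowed), uses brute-force enumeration rather than the string processing techniques the abstract mentions, and reports every shared pattern rather than only maximal ones. The function names, the three-symbol alphabet (U, D, N), and the threshold value are all assumptions made for illustration.

```python
from collections import defaultdict


def discretize(matrix, threshold=0.5):
    """Encode each change between consecutive time points as a symbol.

    'U' = up, 'D' = down, 'N' = no change (hypothetical alphabet).
    A row with k time points yields a string of k - 1 symbols.
    """
    symbols = []
    for row in matrix:
        s = ""
        for a, b in zip(row, row[1:]):
            diff = b - a
            s += "U" if diff > threshold else "D" if diff < -threshold else "N"
        symbols.append(s)
    return symbols


def contiguous_biclusters(symbols, min_rows=2, min_cols=2):
    """Group rows that share the same pattern over the same contiguous span.

    Keys are (start, end, pattern), where start/end index the discretized
    (transition) columns, end exclusive. Brute force: every row contributes
    every contiguous substring of length >= min_cols.
    """
    groups = defaultdict(list)
    n_cols = len(symbols[0])
    for r, s in enumerate(symbols):
        for start in range(n_cols):
            for end in range(start + min_cols, n_cols + 1):
                groups[(start, end, s[start:end])].append(r)
    # Keep only patterns shared by enough rows (genes).
    return {k: rows for k, rows in groups.items() if len(rows) >= min_rows}


if __name__ == "__main__":
    expression = [
        [0, 1, 2, 1],  # gene 0: up, up, down
        [0, 1, 2, 3],  # gene 1: up, up, up
        [5, 4, 3, 2],  # gene 2: down, down, down
    ]
    print(contiguous_biclusters(discretize(expression)))
```

Because each key fixes both the column span and the pattern, the number of candidate groups grows polynomially with the matrix size, which is the intuition behind the tractability claim in the abstract.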

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Unsupervised Algorithms for Microarray Sample Stratification

Author: Cattelani Luca
Federico Antonio
Fratello Michele
Greco Dario
Pavel Alisa
Scala Giovanni
Serra Angela
Publication venue: Springer, UK
Publication date: 01/01/2022

The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interactions. In order to be able to discover hidden patterns, the most disparate analytical techniques have been proposed. Here, we describe the basic methodologies to approach the analysis of microarray datasets that focus on the task of (sub)group discovery. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

A method for visual identification of small sample subgroups and potential biomarkers

Author: Fontes Magnus
Soneson Charlotte
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2011

In order to find previously unknown subgroups in biomedical data and generate testable hypotheses, visually guided exploratory analysis can be of tremendous importance. In this paper we propose a new dissimilarity measure that can be used within the Multidimensional Scaling framework to obtain a joint low-dimensional representation of both the samples and variables of a multivariate data set, thereby providing an alternative to conventional biplots. In comparison with biplots, the representations obtained by our approach are particularly useful for exploratory analysis of data sets where there are small groups of variables sharing unusually high or low values for a small group of samples. Comment: Published at http://dx.doi.org/10.1214/11-AOAS460 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

arXiv.org e-Print Archive

Lund University Publications