
142 research outputs found

Improving the performance of the iterative signature algorithm for the identification of relevant patterns

Author: Agresti
Barkow
Berriz
Cheng
Datta
Eisen
Fisz
Gasch
Ihmels
Ihmels
Lui
Madeira
Moura
Moura
Pinheiro
Prelic
Tanay
Publication venue: Wiley

The iterative signature algorithm (ISA) has become very attractive for detecting co-regulated genes from microarray data matrices and can be a useful tool for identifying similar patterns in many other kinds of numerical data matrices. Nevertheless, its algorithmic strategy exhibits some limitations, since it is based on the statistical behavior of the average and considers averages weighted by scores that are not necessarily positive. Hence, we propose to take the median instead of the average and to use absolute scores in ISA's structure. Furthermore, a generalized function is also introduced in the algorithm in order to improve its strategy for detecting high-value or low-value biclusters. The effects of these simple modifications on the performance of the biclustering algorithm are evaluated through an experimental comparative study involving synthetic data sets and real data from the organism Saccharomyces cerevisiae. The experimental results show that the proposed variations of ISA outperform the original version in many situations. Absolute scores in ISA are shown to be essential for the correct interpretation of the biclusters found by the algorithm. Using the median instead of the average makes the biclustering algorithm more resilient to outliers in the data sets. Copyright © 2011 Wiley Periodicals, Inc.
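The robustness argument in this abstract can be illustrated with a small sketch. This is not the ISA implementation from the paper: the function name `condition_scores`, the `center` parameter, and the toy matrix are assumptions made purely for illustration. The sketch scores each condition (column) over a chosen gene set, once with the mean and once with the median, so the influence of a single outlier can be compared.

```python
from statistics import mean, median


def condition_scores(matrix, gene_rows, center=mean):
    """Score every condition (column) over the selected gene rows.

    `center` is the summary statistic applied per column: `mean` mimics an
    average-based score, `median` mimics the more robust variant.
    (Hypothetical helper, not taken from the paper.)
    """
    selected = [matrix[g] for g in gene_rows]
    return [center(column) for column in zip(*selected)]


# Toy expression matrix: 4 genes x 2 conditions.
# Gene 3 carries one extreme value in condition 0 (an outlier).
toy = [
    [1.0, 0.1],
    [1.2, 0.0],
    [0.9, -0.1],
    [25.0, 0.2],
]

mean_scores = condition_scores(toy, [0, 1, 2, 3], center=mean)
median_scores = condition_scores(toy, [0, 1, 2, 3], center=median)
print("mean-based scores:  ", mean_scores)
print("median-based scores:", median_scores)
```

In this toy case the single outlier pulls the mean-based score of the first condition far above the values of the other three genes, while the median-based score stays close to them.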

Crossref

Repositório Institucional da Universidade de Aveiro

BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data

Author: Gonçalves Joana P
Madeira Sara C
Oliveira Arlindo L
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The ability to monitor changes in expression patterns over time, and to observe the emergence of coherent temporal responses using expression time series, is critical to advance our understanding of complex biological processes. Biclustering has been recognized as an effective method for discovering local temporal expression patterns and unraveling potential regulatory mechanisms. The general biclustering problem is NP-hard. In the case of time series this problem is tractable, and efficient algorithms can be used. However, there is still a need for specialized applications able to take advantage of the temporal properties inherent to expression time series, both from a computational and a biological perspective. Findings: BiGGEsTS makes available state-of-the-art biclustering algorithms for analyzing expression time series. Gene Ontology (GO) annotations are used to assess the biological relevance of the biclusters. Methods for preprocessing expression time series and post-processing results are also included. The analysis is additionally supported by a visualization module capable of displaying informative representations of the data, including heatmaps, dendrograms, expression charts and graphs of enriched GO terms. Conclusion: BiGGEsTS is a free open source graphical software tool for revealing local coexpression of genes in specific intervals of time, while integrating meaningful information on gene annotations. It is freely available at: http://kdbio.inesc-id.pt/software/biggests. We present a case study on the discovery of transcriptional regulatory modules in the response of Saccharomyces cerevisiae to heat stress.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A biclustering algorithm based on a Bicluster Enumeration Tree: application to DNA microarray data

Author: A Ben-Dor
A Dharan
A Prelic
A Schliep
A Tanay
A Yip
B Pontes
C Cano
C Gallo
DD Lewis
EL Lehmann
F Angiulli
F Divina
GF Berriz
H Turner
H Wang
IS Dhillon
J Liu
J Yang
JA Hartigan
Jin-Kao Hao
JS Aguilar-Ruiz
K Bryan
K Cheng
L Lazzeroni
L Teng
Mourad Elloumi
R Agrawal
R Balasubramaniyan
S Barkow
S Bergmann
S Bleuler
S Mitra
S Tavazoie
SC Madeira
SC Madeira
SD Peddada
T Hofmann
U Maulik
W Gaul
Wassim Ayadi
X Liu
Y Cheng
Y Cheng
Y Christinat
Y Luan
Y Okada
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: In a number of domains, such as DNA microarray data analysis, we need to simultaneously cluster rows (genes) and columns (conditions) of a data matrix to identify groups of rows coherent with groups of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. Methods: We introduce BiMine, a new enumeration algorithm for biclustering of DNA microarray data. The proposed algorithm is based on three original features. First, BiMine relies on a new evaluation function called Average Spearman's rho (ASR). Second, BiMine uses a new tree structure, called Bicluster Enumeration Tree (BET), to represent the different biclusters discovered during the enumeration process. Third, to avoid the combinatorial explosion of the search tree, BiMine introduces a parametric rule that allows the enumeration process to cut tree branches that cannot lead to good biclusters. Results: The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data. The experimental results show that BiMine competes well with several other biclustering methods. Moreover, we test the biological significance using a gene annotation web-tool to show that our proposed method is able to produce biologically relevant biclusters. The software is available upon request from the authors to academic users.
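The abstract names the Average Spearman's rho (ASR) evaluation function without defining it. The sketch below shows one plausible reading: the mean pairwise Spearman rank correlation among the rows and among the columns of a candidate bicluster, keeping the larger of the two. That combination rule, and every function name here, is an assumption for illustration; the paper's exact formula may differ.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean rank of the tied group
        for k in range(pos, end + 1):
            result[order[k]] = shared
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_pairwise_spearman(vectors):
    """Average Spearman's rho over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


def average_spearman_score(submatrix):
    """Assumed ASR-style score: best of row-wise and column-wise coherence."""
    rows = [list(r) for r in submatrix]
    cols = [list(c) for c in zip(*rows)]
    return max(mean_pairwise_spearman(rows), mean_pairwise_spearman(cols))


# Three genes whose values rise together across three conditions.
candidate = [[1, 2, 3], [2, 4, 6], [10, 20, 35]]
print(average_spearman_score(candidate))
```

Because the score depends only on ranks, rows that move in the same direction with very different magnitudes still count as coherent, which is the usual motivation for a rank-based measure.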

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Okina

Development of Biclustering Techniques for Gene Expression Data Modeling and Mining

Author: Xie Juan
Publication venue: Open PRAIRIE: Open Public Research Access Institutional Repository and Information Exchange
Publication date: 01/01/2018

Next-generation sequencing technologies can generate large-scale biological data with higher resolution, better accuracy, and lower technical variation than their array-based counterparts. RNA sequencing (RNA-Seq) can generate genome-scale gene expression data in biological samples at a given moment, facilitating a better understanding of cell functions at genetic and cellular levels. The abundance of gene expression datasets provides an opportunity to identify genes with similar expression patterns across multiple conditions, i.e., co-expression gene modules (CEMs). Genome-scale identification of CEMs can be modeled and solved by biclustering, a two-dimensional data mining technique that clusters the rows and columns of a gene expression matrix simultaneously. Compared with traditional clustering, which targets global patterns, biclustering can predict local patterns. This unique feature makes biclustering very useful when applied to big gene expression data, since genes that participate in a cellular process are only active in specific conditions and are thus usually co-expressed under a subset of all conditions. The combination of biclustering and large-scale gene expression data holds promising potential for condition-specific functional pathway/network analysis. However, existing biclustering tools do not perform satisfactorily on high-resolution RNA-Seq data, mainly due to the lack of (i) a consideration of the high sparsity of RNA-Seq data, especially scRNA-Seq data, and (ii) an understanding of the underlying transcriptional regulation signals of the observed gene expression values. QUBIC2, a novel biclustering algorithm, is designed for large-scale bulk RNA-Seq and single-cell RNA-Seq (scRNA-Seq) data analysis.
Critical novelties of the algorithm include (i) using a truncated model to handle the unreliable quantification of genes with low or moderate expression; (ii) adopting the Gaussian mixture distribution and an information-divergence objective function to capture shared transcriptional regulation signals among a set of genes; (iii) utilizing a Dual strategy to expand the core biclusters, aiming to save dropouts from the background; and (iv) developing a statistical framework to evaluate the significance of all the identified biclusters. Method validation on comprehensive data sets suggests that QUBIC2 had superior performance in functional module detection and cell type classification. Applications to temporal and spatial data demonstrated that QUBIC2 could derive meaningful biological information from scRNA-Seq data. Also presented in this dissertation is QUBICR. This R package is characterized by an average efficiency improvement of 82% compared to the source C code of QUBIC. It provides a comprehensive set of functions to facilitate biclustering-based biological studies, including the discretization of expression data, query-based biclustering, bicluster expansion, bicluster comparison, heatmap visualization of any identified biclusters, and co-expression network elucidation. Finally, a systematic summary is provided of the primary applications of biclustering for biological data and more advanced applications for biomedical data. It will help researchers analyze their big data effectively and generate valuable biological knowledge and novel insights with higher efficiency.

Public Research Access Institutional Repository and Information Exchange

Extracting expression modules from perturbational gene expression compendia

Author: A Joshi
A Prelić
A Tanay
A Tanay
AL Barabási
AW Rives
C Stark
CE Horak
CT Harbison
D Pe'er
DJ Reiss
Dk Lee
E Ragni
E Ravasz
E Segal
E Segal
G Getz
G Lesage
GD Bader
GK Smyth
H Kitano
I Laloux
I Laloux
J Ihmels
J Ihmels
J Supper
JA Ubersax
JDJ Han
L Lazzeroni
LA Amaral
LF Wu
LH Hartwell
M Ashburner
M Gaisne
M Halkidi
M Schmid
Martin Kuiper
MB Eisen
MG Walker
MZ Bao
N Bolshakova
N Metropolis
P D'haeseleer
Patrick Van Dijck
Q Sheng
R Albert
R Shamir
R Tanaka
S Barkow
S Bergmann
S Bergmann
S Erdman
S Hohmann
S Kirkpatrick
S Maere
SC Madeira
SK Kim
Steven Maere
T Ideker
T Michoel
TR Hughes
W Zhang
X Cui
Y Benjamini
Y Cheng
Y Kluger
Z Bar-Joseph
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Compendia of gene expression profiles under chemical and genetic perturbations constitute an invaluable resource from a systems biology perspective. However, the perturbational nature of such data imposes specific challenges on the computational methods used to analyze them. In particular, traditional clustering algorithms have difficulties in handling one of the prominent features of perturbational compendia, namely partial coexpression relationships between genes. Biclustering methods, on the other hand, are specifically designed to capture such partial coexpression patterns, but they show a variety of other drawbacks. For instance, some biclustering methods are less suited to identify overlapping biclusters, while others generate highly redundant biclusters. Also, none of the existing biclustering tools takes advantage of the staple of perturbational expression data analysis: the identification of differentially expressed genes. Results: We introduce a novel method, called ENIGMA, that addresses some of these issues. ENIGMA leverages differential expression analysis results to extract expression modules from perturbational gene expression data. The core parameters of the ENIGMA clustering procedure are automatically optimized to reduce the redundancy between modules. In contrast to the biclusters produced by most other methods, ENIGMA modules may show internal substructure, i.e. subsets of genes with distinct but significantly related expression patterns. The grouping of these (often functionally) related patterns in one module greatly aids in the biological interpretation of the data. We show that ENIGMA outperforms other methods on artificial datasets, using a quality criterion that, unlike other criteria, can be used for algorithms that generate overlapping clusters and that can be modified to take redundancy between clusters into account.
Finally, we apply ENIGMA to the Rosetta compendium of expression profiles for Saccharomyces cerevisiae and we analyze one pheromone response-related module in more detail, demonstrating the potential of ENIGMA to generate detailed predictions. Conclusion: It is increasingly recognized that perturbational expression compendia are essential to identify the gene networks underlying cellular function, and efforts to build these for different organisms are currently underway. We show that ENIGMA constitutes a valuable addition to the repertoire of methods to analyze such data.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Ghent University Academic Bibliography

PubMed Central

Construction of gene regulatory networks using biclustering and bayesian networks

Author: A Ben-Dor
A Faisal
A Prelic
A Tanay
AC Lozano
AP Gasch
C Wolfe
CT Ronald
D Jesse
D Reiss
F Azuaje
Fadhl M Alakwaa
FM Al-Akwaa
FM Alakwaa
G Bader
G Fung
G Stolovitzky
I Avila-Campillo
J Ihmels
KO Cheng
MD Dyer
N Friedman
Nahed H Solouma
O Troyanskaya
P D'haeseleer
P Shannon
Pe Dana
PTSG Spellman
R Bonneau
R Guthke
S Barkow
S Datta
S Kauffman
S Maere
S Tavazoie
SC Madeira
T Chen
TM Murali
X Liu
Xw Chen
Y Assenov
Y Cheng
Yasser M Kadah
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Understanding gene interactions in complex living systems can be seen as the ultimate goal of the systems biology revolution. Hence, to elucidate disease ontology fully and to reduce the cost of drug development, gene regulatory networks (GRNs) have to be constructed. During the last decade, many GRN inference algorithms based on genome-wide data have been developed to unravel the complexity of gene regulation. Time series transcriptomic data measured by genome-wide DNA microarrays are traditionally used for GRN modelling. One of the major problems with microarrays is that a dataset consists of relatively few time points with respect to the large number of genes. Dimensionality is one of the interesting problems in GRN modelling. Results: In this paper, we develop a biclustering function enrichment analysis toolbox (BicAT-plus) to study the effect of biclustering in reducing data dimensions. The network generated from our system was validated via available interaction databases and was compared with previous methods. The results revealed the performance of our proposed method. Conclusions: Because of the sparse nature of GRNs, the results of biclustering techniques differ significantly from those of previous methods.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Data Mining Using the Crossing Minimization Paradigm

Author: Abdullah Ahsan
Publication venue: University of Stirling
Publication date: 01/01/2007

Our ability and capacity to generate, record and store multi-dimensional, apparently unstructured data is increasing rapidly, while the cost of data storage is going down. The data recorded is not perfect, as noise is introduced into it from different sources. Some of the basic forms of noise are incorrect recording of values and missing values. The formal study of discovering useful hidden information in the data is called Data Mining. Because of the size and complexity of the problem, practical data mining problems are best attempted using automatic means. Data Mining can be categorized into two types, i.e., supervised learning or classification, and unsupervised learning or clustering. Clustering only the records in a database (or data matrix) gives a global view of the data and is called one-way clustering. For a detailed analysis or a local view, biclustering (also called co-clustering or two-way clustering) is required, involving the simultaneous clustering of the records and the attributes. In this dissertation, a novel, fast and white-noise-tolerant data mining solution is proposed based on the Crossing Minimization (CM) paradigm; the solution works for one-way as well as two-way clustering for discovering overlapping biclusters. For decades the CM paradigm has been used for graph drawing and VLSI (Very Large Scale Integration) circuit design for reducing wire length and congestion. The utility of the proposed technique is demonstrated by comparing it with other biclustering techniques using simulated noisy data, as well as real data from Agriculture, Biology and other domains. Two other interesting and hard problems also addressed in this dissertation are (i) the Minimum Attribute Subset Selection (MASS) problem and (ii) the Bandwidth Minimization (BWM) problem of sparse matrices. The proposed CM technique is demonstrated to provide very convincing results while attempting to solve the said problems using real public domain data.
Pakistan is the fourth largest supplier of cotton in the world. An apparent anomaly was observed during 1989-97 between cotton yield and pesticide consumption in Pakistan, showing unexpected periods of negative correlation. By applying the indigenous CM technique for one-way clustering to real Agro-Met data (2001-2002), a possible explanation of the anomaly is presented in this thesis.
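The link between crossing minimization and clustering can be made concrete with a small sketch. A binary data matrix is read as a bipartite graph (rows on one side, columns on the other, an edge for every nonzero cell), and rows are reordered with the barycenter heuristic, a standard crossing-reduction technique from graph drawing. Whether the dissertation uses this particular heuristic is an assumption; the function names and the toy matrix are hypothetical.

```python
def count_crossings(matrix):
    """Count edge crossings of the bipartite graph drawn in the given order.

    Two edges (r1, c1) and (r2, c2) cross when their row order and column
    order disagree.
    """
    edges = [(r, c) for r, row in enumerate(matrix) for c, v in enumerate(row) if v]
    crossings = 0
    for i, (r1, c1) in enumerate(edges):
        for r2, c2 in edges[i + 1:]:
            if (r1 - r2) * (c1 - c2) < 0:
                crossings += 1
    return crossings


def barycenter_row_order(matrix):
    """Order rows by the mean column index of their nonzero cells."""
    def barycenter(row):
        cols = [c for c, v in enumerate(row) if v]
        # Rows without edges are placed last.
        return sum(cols) / len(cols) if cols else float("inf")
    return sorted(range(len(matrix)), key=lambda r: barycenter(matrix[r]))


def reorder_rows(matrix, order):
    """Return a copy of the matrix with rows arranged in the given order."""
    return [list(matrix[r]) for r in order]


# Two interleaved row groups that share the same columns.
shuffled = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

order = barycenter_row_order(shuffled)
arranged = reorder_rows(shuffled, order)
print("crossings before:", count_crossings(shuffled))
print("crossings after: ", count_crossings(arranged))
for row in arranged:
    print(row)
```

After reordering, rows with similar column profiles sit next to each other and the nonzero cells form contiguous blocks, which is how a reduction in crossings can expose cluster structure in this toy case. A full two-way procedure would alternate the same step between rows and columns.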

CiteSeerX

Stirling Online Research Repository

Propagation-Based Biclustering Algorithm for Extracting Inclusion-Maximal Motifs

Author: Boryczko Krzysztof
Orzechowski Patryk
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 11/07/2016

Biclustering, the simultaneous clustering of columns and rows in a data matrix, became an issue when classical clustering algorithms proved not to be good enough to detect similar expression of genes under a subset of conditions. Biclustering algorithms may also be applied to other kinds of datasets, such as medical, economic, and social network data. In this article we explain the concept behind hybrid biclustering algorithms and present details of propagation-based biclustering, a novel approach for extracting inclusion-maximal gene expression motifs conserved in gene microarray data. We show that this approach may successfully compete with other well-recognized biclustering algorithms.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)