
382 research outputs found

Development of Biclustering Techniques for Gene Expression Data Modeling and Mining

Author: Xie Juan
Publication venue: Open PRAIRIE: Open Public Research Access Institutional Repository and Information Exchange
Publication date: 01/01/2018

Next-generation sequencing technologies can generate large-scale biological data with higher resolution, better accuracy, and lower technical variation than their array-based counterparts. RNA sequencing (RNA-Seq) can generate genome-scale gene expression data in biological samples at a given moment, facilitating a better understanding of cell functions at genetic and cellular levels. The abundance of gene expression datasets provides an opportunity to identify genes with similar expression patterns across multiple conditions, i.e., co-expression gene modules (CEMs). Genome-scale identification of CEMs can be modeled and solved by biclustering, a two-dimensional data mining technique that clusters the rows and columns of a gene expression matrix simultaneously. Compared with traditional clustering, which targets global patterns, biclustering can predict local patterns. This feature makes biclustering very useful when applied to big gene expression data, since genes that participate in a cellular process are active only in specific conditions and are thus usually co-expressed under a subset of all conditions. The combination of biclustering and large-scale gene expression data holds promising potential for condition-specific functional pathway/network analysis. However, existing biclustering tools do not perform satisfactorily on high-resolution RNA-Seq data, mainly due to the lack of (i) a consideration of the high sparsity of RNA-Seq data, especially scRNA-Seq data, and (ii) an understanding of the transcriptional regulation signals underlying the observed gene expression values. QUBIC2, a novel biclustering algorithm, is designed for large-scale bulk RNA-Seq and single-cell RNA-Seq (scRNA-Seq) data analysis.
Critical novelties of the algorithm include (i) the use of a truncated model to handle the unreliable quantification of genes with low or moderate expression; (ii) the adoption of a Gaussian mixture distribution and an information-divergence objective function to capture shared transcriptional regulation signals among a set of genes; (iii) a Dual strategy to expand the core biclusters, aiming to save dropouts from the background; and (iv) a statistical framework to evaluate the significance of all identified biclusters. Method validation on comprehensive data sets suggests that QUBIC2 has superior performance in functional module detection and cell type classification. Applications to temporal and spatial data demonstrated that QUBIC2 could derive meaningful biological information from scRNA-Seq data. Also presented in this dissertation is QUBICR, an R package characterized by an 82% average improvement in efficiency compared to the source C code of QUBIC. It provides a comprehensive set of functions to facilitate biclustering-based biological studies, including discretization of expression data, query-based biclustering, bicluster expansion, bicluster comparison, heatmap visualization of any identified biclusters, and co-expression network elucidation. Finally, a systematic summary is provided of the primary applications of biclustering to biological data and more advanced applications to biomedical data. It will assist researchers in analyzing their big data effectively and generating valuable biological knowledge and novel insights more efficiently.
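
To make the idea of a local pattern concrete, the sketch below grows a toy bicluster from two seed genes. It is only an illustration of biclustering in general, not QUBIC2 or QUBICR: the thresholding rule, the seed-based growth, and the names `discretize`, `grow_bicluster`, `threshold` and `min_fraction` are all assumptions made for this example.

```python
def discretize(row, threshold):
    """Mark a condition as 'on' (1) when expression exceeds the threshold (assumed rule)."""
    return [1 if value > threshold else 0 for value in row]


def grow_bicluster(matrix, seed_a, seed_b, threshold=1.0, min_fraction=1.0):
    """Return (genes, conditions) for a toy bicluster grown from two seed genes.

    matrix maps a gene name to its expression values across conditions.
    """
    binary = {gene: discretize(row, threshold) for gene, row in matrix.items()}
    # Conditions (columns) in which both seed genes are 'on'.
    cols = [j for j, (a, b) in enumerate(zip(binary[seed_a], binary[seed_b]))
            if a == 1 and b == 1]
    # Keep every gene that is 'on' in a large enough share of those conditions.
    genes = [gene for gene, bits in binary.items()
             if cols and sum(bits[j] for j in cols) / len(cols) >= min_fraction]
    return sorted(genes), cols


expression = {
    "g1": [5, 6, 0, 0],
    "g2": [4, 7, 0, 1],
    "g3": [6, 5, 0, 0],
    "g4": [0, 0, 5, 6],
}
genes, conditions = grow_bicluster(expression, "g1", "g2")
print(genes, conditions)
```

Here g1, g2 and g3 are grouped only on the first two conditions, which is the kind of condition-specific module that a clustering over all columns would not isolate.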

Public Research Access Institutional Repository and Information Exchange

Using geodesic space density gradients for network community detection

Author: Al-Maadeed Somaya
Mahmood Arif
Rajpoot Nasir M.
Small Michael
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

Many real-world complex systems naturally map to network data structures instead of geometric spaces because the only available information is the presence or absence of a link between two entities in the system. To enable data mining techniques to solve problems in the network domain, the nodes need to be mapped to a geometric space. We propose this mapping by representing each network node with its geodesic distances from all other nodes. The space spanned by the geodesic distance vectors is the geodesic space of that network. The positions of different nodes in the geodesic space encode the network structure. In this space, considering a continuous density field induced by each node, the density at a specific point is the summation of the density fields induced by all nodes. We drift each node in the direction of positive density gradient using an iterative algorithm until each node reaches a local maximum. Because this space captures the network structure, the nodes that drift to the same region of space belong to the same community in the original network. We use the direction of movement and final position of each node as important clues for community membership assignment. The proposed algorithm is compared with more than ten state-of-the-art community detection techniques on two benchmark networks with known communities using the Normalized Mutual Information criterion. The proposed algorithm outperformed these methods by a significant margin. Moreover, the proposed algorithm has also shown excellent performance on many real-world networks.
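
The mapping and drift described in this abstract can be sketched roughly as follows. This is not the authors' implementation: the Gaussian kernel, the `bandwidth`, `steps` and `tol` values, and the function names are assumptions, and a standard mean-shift update stands in for the paper's gradient-following step. The graph is assumed to be connected and unweighted.

```python
from collections import deque
import math


def geodesic_vectors(adj):
    """Represent each node by its hop distances (BFS) to every node."""
    nodes = sorted(adj)
    vectors = {}
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Assumes a connected graph, so every node has a finite distance.
        vectors[src] = [float(dist[n]) for n in nodes]
    return vectors


def drift(vectors, bandwidth=1.5, steps=50):
    """Move each point uphill on a Gaussian kernel density (mean-shift update)."""
    anchors = list(vectors.values())  # fixed sources of the density field
    moved = {}
    for node, start in vectors.items():
        p = list(start)
        for _ in range(steps):
            weights = [
                math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * bandwidth ** 2))
                for q in anchors
            ]
            total = sum(weights)
            p = [sum(w * q[i] for w, q in zip(weights, anchors)) / total
                 for i in range(len(p))]
        moved[node] = p
    return moved


def communities(moved, tol=0.5):
    """Give the same label to nodes whose drifted positions lie within tol."""
    labels, modes = {}, []
    for node, p in moved.items():
        for idx, mode in enumerate(modes):
            if math.dist(p, mode) < tol:
                labels[node] = idx
                break
        else:
            modes.append(p)
            labels[node] = len(modes) - 1
    return labels


# Two triangles joined by the single edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = communities(drift(geodesic_vectors(graph)))
print(labels)
```

In this toy graph the nodes deep inside each triangle drift to a shared position, while the two triangles stay apart; how the bridge nodes 2 and 3 are labeled depends on the assumed bandwidth and step count.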

Qatar University Institutional Repository

Warwick Research Archives Portal Repository

Community mining using three closely joint techniques based on community mutual membership and refinement strategy

Author: Ghalamzan E. Amir M.
Jiao Licheng
Liu Huan
Shang Ronghua
Publication venue: 'Elsevier BV'
Publication date: 01/12/2017

Community structure has become one of the central topics in the study of the topological structure of complex networks in the past decades. Although many advanced approaches have been proposed to identify community structure, those state-of-the-art methods still lack efficiency in terms of a balance between stability, accuracy and computation time. Here, we propose an algorithm with different stages, called TJA-net, to efficiently identify communities in a large network with a good balance between accuracy, stability and computation time. First, we propose an initial labeling algorithm, called ILPA, combining K-nearest neighbor (KNN) and the label propagation algorithm (LPA). To produce a number of sub-communities automatically, ILPA iteratively labels a node in a network using the labels of its adjacent nodes and their index of closeness. Next, we merge sub-communities using the mutual membership of two communities. Finally, a refinement strategy is designed for modifying the labels of wrongly clustered nodes at boundaries. In our approach, we propose and use modularity density as the objective function rather than the commonly used modularity. This can deal with the issue of the resolution limit for different network structures, enhancing the precision of the results. We present a series of experiments with artificial and real data sets and compare the results obtained by our proposed algorithm with those obtained by state-of-the-art algorithms, which shows the effectiveness of our proposed approach. The experimental results on large-scale artificial networks and real networks illustrate the superiority of our algorithm.
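
The initial labeling stage can be pictured with a plain weighted label propagation loop, sketched below. This is not ILPA or TJA-net itself: the abstract does not define the index of closeness, so the `closeness` function here (shared neighbours plus one) is a hypothetical stand-in, and the merging and refinement stages are omitted. The fixed node order and smallest-label tie-break are also assumptions, chosen only to keep the run deterministic.

```python
def closeness(adj, u, v):
    """Hypothetical closeness index: number of shared neighbours, plus one for the edge."""
    return len(set(adj[u]) & set(adj[v])) + 1


def propagate_labels(adj, max_rounds=20):
    """Each node repeatedly adopts the label with the largest closeness-weighted vote."""
    labels = {node: node for node in adj}  # every node starts as its own sub-community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adj):
            votes = {}
            for nb in adj[node]:
                votes[labels[nb]] = votes.get(labels[nb], 0) + closeness(adj, node, nb)
            if not votes:
                continue  # isolated node keeps its own label
            # Strongest vote wins; ties go to the smallest label.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Two triangles joined by the single edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
result = propagate_labels(graph)
print(result)
```

On this graph the loop settles after two passes with one label per triangle; the bridge edge carries the lowest closeness weight, so it never outvotes the within-triangle edges.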

University of Lincoln Institutional Repository

University of Birmingham Research Portal

A general co-expression network-based approach to gene expression analysis: comparison and applications

Author: A Aggarwal
A Alizadeh
A Barabasi
A Gasch
A Ghazalpour
A Presson
A Thalamuthu
Angela K Dean
AY Ng
C Cooper
C Harbison
C Stark
D Altman
D Ellis
D Weston
D Zhu
E Boyle
E Keller
E Ravasz
F Azuaje
H Jeong
H Lee
I Jordan
J Herrero
J Jaeger
J Ruan
J Stuart
J Tegner
Jianhua Ruan
JJ Faith
KS Jones
L Elo
M Davidich
M Eisen
M Garey
M Meila
M Newman
M Newman
M Oldham
M Ray
M Shipp
M Siegal
MR Carlson
N Friedman
P Fjallstrom
P Magwene
P Rousseeuw
P Tamayo
P Tsaparas
R Albert
R Tibshirani
S Carter
S Dwight
S Horvath
SV Dongen
U Brandes
V Srinivasasainagendra
V van Noort
Weixiong Zhang
X Zhou
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Co-expression network-based approaches have become popular in analyzing microarray data, such as for detecting functional gene modules. However, co-expression networks are often constructed by ad hoc methods, and network-based analyses have not been shown to outperform conventional cluster analyses, partially due to the lack of an unbiased evaluation metric. Results: Here, we develop a general co-expression network-based approach for analyzing both genes and samples in microarray data. Our approach consists of a simple but robust rank-based network construction method, a parameter-free module discovery algorithm and a novel reference network-based metric for module evaluation. We report some interesting topological properties of rank-based co-expression networks that are very different from those of value-based networks in the literature. Using a large set of synthetic and real microarray data, we demonstrate the superior performance of our approach over several popular existing algorithms. Applications of our approach to yeast, Arabidopsis and human cancer microarray data reveal many interesting modules, including a fatal subtype of lymphoma and a gene module regulating yeast telomere integrity, which were missed by the existing methods. Conclusions: We demonstrated that our novel approach is very effective in discovering the modular structures in microarray data, both for genes and for samples. As the method is essentially parameter-free, it may be applied to large data sets where the number of clusters is difficult to estimate. The method is also very general and can be applied to other types of data. A MATLAB implementation of our algorithm can be downloaded from http://cs.utsa.edu/~jruan/Software.html.
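
One way to read "rank-based network construction" is that each gene is linked to the few genes ranked most similar to it, rather than to every gene above a fixed similarity value. The sketch below follows that reading only as an assumption: the use of Pearson correlation, the parameter `k`, and the function names are illustrative and are not taken from the paper or its MATLAB code.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rank_based_network(profiles, k=1):
    """Link each gene to its k top-ranked partners by correlation; edges are undirected."""
    edges = set()
    for gene, profile in profiles.items():
        others = [other for other in profiles if other != gene]
        ranked = sorted(others, key=lambda o: pearson(profile, profiles[o]), reverse=True)
        for partner in ranked[:k]:
            edges.add(frozenset((gene, partner)))
    return edges


profiles = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.5],
    "c": [4, 3, 2, 1],
    "d": [8, 6, 4.5, 2],
}
network = rank_based_network(profiles, k=1)
print(sorted(sorted(edge) for edge in network))
```

Because only the ranking matters, every gene receives at least `k` links regardless of how strong its correlations are in absolute terms, which is one plausible reason such networks differ topologically from value-thresholded ones.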

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

Community landscapes: an integrative approach to determine overlapping network module hierarchy, identify key nodes and predict network dynamics

Author: A Arenas
A Capocci
A Hinneburg
A Lancichinetti
A Lancichinetti
AK Ramani
C Baerveldt
D Ekman
D Krioukov
DJ Watts
DL Nelson
ER Gansner
F Radicchi
G Palla
G Tibély
H Yu
I Kovacs
I Vragovic
István A. Kovács
J Moody
JB Axelsen
JD Han
JM Kumpula
JM Thevelein
JP Bagrow
JP Eckmann
JW Berry
K Komurov
M Blatt
M Fiedler
M Girvan
M Grendar
M Rosvall
ME Newman
ME Newman
ML Clark
Máté S. Szalay
N Bertin
Olaf Sporns
P Csermely
P Pons
Peter Csermely
PM Kim
Robin Palotai
S Fortunato
S Fortunato
S Fortunato
T Nepusz
TS Evans
V Latora
VD Blondel
WW Zachary
Y-Y Ahn
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2010

Background: Network communities help the functional organization and evolution of complex networks. However, developing a method that is both fast and accurate and that provides modular overlaps and partitions of a heterogeneous network has proven to be rather difficult. Methodology/Principal Findings: Here we introduce the novel concept of ModuLand, an integrative method family that determines overlapping network modules as hills of an influence function-based, centrality-type community landscape and includes several widely used modularization methods as special cases. As various adaptations of the method family, we developed several algorithms that provide an efficient analysis of weighted and directed networks and (1) determine pervasively overlapping modules with high resolution; (2) uncover a detailed hierarchical network structure allowing an efficient, zoom-in analysis of large networks; (3) allow the determination of key network nodes; and (4) help to predict network dynamics. Conclusions/Significance: The concept opens a wide range of possibilities to develop new approaches and applications including network routing, classification, comparison and prediction. Comment: 25 pages with 6 figures and a Glossary + Supporting Information containing pseudo-codes of all algorithms used, 14 Figures, 5 Tables (with 18 module definitions, 129 different modularization methods, 13 module comparison methods) and 396 references. All algorithms can be downloaded from this web-site: http://www.linkgroup.hu/modules.ph

arXiv.org e-Print Archive

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

ELTE Digital Institutional Repository (EDIT)

Recent Advances in Social Data and Artificial Intelligence 2019

Publication venue: 'MDPI AG'
Publication date: 12/08/2022

The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to Cyberspace.

Directory of Open Access Books (DOAB)

Large-scale community detection based on node membership grade and sub-communities integration

Author: Jiao Licheng
Li Yangyang
Luo Shuang
Shang Ronghua
Stolkin Rustam
Publication venue: 'Elsevier BV'
Publication date: 15/06/2015

University of Birmingham Research Portal

Specialized dynamical properties of promiscuous residues revealed by simulated conformational ensembles

Author: Abecasis G. R.
Alessandro Pandini
Altschul S. F.
Altshuler D. M.
Amadei A.
Aranda B.
Arianna Fornili
Bahar I.
Bahar I.
Bahar I.
Berendsen H.
Bhardwaj N.
Bobay B. G.
Boehr D. D.
Bogan A. A.
Bordogna A.
Bouvier B.
Brookes A. J.
Camacho C.
Carbonell P.
Chandonia J.-M.
Cover T. M.
Cukuroglu E.
Cumming G.
Daily M. D.
Dasgupta B.
Davis F. P.
de Groot B. L.
de Groot B. L.
De Simone A.
del Sol A.
DeLano W.
Dobbins S. E.
Dong Q.
Doruker P.
Dosztányi Z.
Dunbrack R. L.
Dyson H. J.
Echave J.
Ekman D.
Erijman A.
Essmann U.
Eyrisch S.
Fernández A.
Ferrer-Costa C.
Fong J. H.
Fornili A.
Franca Fraternali
Fraternali F.
Goldenberg O.
Haliloglu T.
Haliloglu T.
Hamosh A.
Han J.-D. J.
Hess B.
Hess B.
Higurashi M.
Higurashi M.
Hub J. S.
Hui-Chun Lu
Humphris E. L.
Jeong H.
Jones S.
Jorgensen W.
Kabsch W.
Kar G.
Keskin O.
Keskin O.
Keskin O.
Keskin O.
Keskin O.
Kiel C.
Kim P. M.
Kim P. M.
Kim S.
Kleinjung J.
Kohn J. E.
Kortemme T.
Krissinel E.
Kuttner Y. Y.
Kuzu G.
Lange O. F.
Li X.
Liu L.
Lounnas V.
Maguid S.
Margreitter C.
Martin A. C. R.
Meireles L.
Meireles L. M. C.
Micheletti C.
Mittag T.
Münz M.
Nussinov R.
Pandini A.
Pandini A.
Pandini A.
Pandini A.
Pandini A.
Park B. H.
Patil A.
Patil A.
Peters J. H.
Petrov D.
Poirot O.
Qin H.
R-Development-Core-Team
Rajamani D.
Roulston M.
Rousseeuw P.
Schlitter J.
Schäfer H.
Seeliger D.
Sherry S. T.
Stein A.
Tsai C.-J.
Tuncbag N.
Tuncbag N.
Tyagi M.
van der Spoel D.
Van Gunsteren W.
Vogel C.
Vogel C.
Volkman B. F.
Wells J. A.
Winget J. M.
Wolfe R.
Yogurtcu O. N.
Zen A.
Zen A.
Zhang Q. C.
Zheng W.
Zhu X.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 18/10/2013

The ability to interact with different partners is one of the most important features of proteins. Proteins that bind a large number of partners (hubs) have often been associated with intrinsic disorder. However, many examples exist of hubs with an ordered structure, and evidence of a general mechanism promoting promiscuity in ordered proteins is still elusive. An intriguing hypothesis is that promiscuous binding sites have specific dynamical properties, distinct from the rest of the interface and pre-existing in the isolated state of the protein. Here, we present the first comprehensive study of the intrinsic dynamics of promiscuous residues in a large protein data set. Different computational methods, from coarse-grained elastic models to geometry-based sampling methods and full-atom Molecular Dynamics simulations, were used to generate conformational ensembles for the isolated proteins. The flexibility and dynamic correlations of interface residues with different degrees of binding promiscuity were calculated and compared considering side chain and backbone motions, the latter on both a local and a global scale. The study revealed that (a) promiscuous residues tend to be more flexible than nonpromiscuous ones, (b) this additional flexibility has a higher degree of organization, and (c) evolutionary conservation and binding promiscuity have opposite effects on intrinsic dynamics. Findings on simulated ensembles were also validated on ensembles of experimental structures extracted from the Protein Data Bank (PDB). Additionally, the low occurrence of single nucleotide polymorphisms observed for promiscuous residues indicated a tendency to preserve binding diversity at these positions. A case study on two ubiquitin-like proteins exemplifies how binding promiscuity in evolutionarily related proteins can be modulated by the fine-tuning of the interface dynamics.
The interplay between promiscuity and flexibility highlighted here can inspire new directions in protein-protein interaction prediction and design methods. © 2013 American Chemical Society

Crossref

PubMed Central

King's Research Portal

Brunel University Research Archive

Modular design and analysis of synthetic biochemical networks

Author: Roekel van, H.W.H.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2015

Repository TU/e

Pure OAI Repository