297 research outputs found

    Mining Biological Networks towards Protein complex Detection and Gene-Disease Association

    Large amounts of biological data are continuously generated nowadays, thanks to advances in high-throughput experimental techniques. Mining valuable knowledge from such data still motivates the design of suitable computational methods to complement experimental work, which is often bound by considerable time and cost requirements. Protein complexes, or groups of interacting proteins, are key players in most cellular events. The identification of complexes not only allows a better understanding of normal biological processes but also helps uncover disease-triggering malfunctions. Ultimately, findings in this research branch can greatly enhance the design of effective medical treatments. The aim of this research is to detect protein complexes in protein-protein interaction networks and to associate the detected entities with diseases. The work is divided into three main objectives: first, develop a suitable method for the identification of protein complexes in static interaction networks; second, model the dynamic aspect of protein interaction networks and detect complexes accordingly; and third, design a learning model to link proteins, and subsequently protein complexes, to diseases. In response to these objectives, we present ProRank+, a novel complex-detection approach based on a ranking algorithm and a merging procedure. Then, we introduce DyCluster, which uses gene expression data to model the dynamics of the interaction networks, and we adapt the detection algorithm accordingly. Finally, we integrate network topology attributes and several biological features of proteins to form a classification model for gene-disease association. The reliability of the proposed methods is supported by various experimental studies conducted to compare them with existing approaches. ProRank+ detects more protein complexes than other state-of-the-art methods. DyCluster goes a step further and achieves better performance than similar techniques. Then, our learning model shows that combining topological and biological features can greatly enhance the gene-disease association process. Finally, we present a comprehensive case study of breast cancer in which we pinpoint disease genes using our learning model; subsequently, we detect favorable groupings of those genes in a protein interaction network using the ProRank+ algorithm.

    Assessment of brain cancer atlas maps with multimodal imaging features.

    BACKGROUND: Glioblastoma Multiforme (GBM) is a fast-growing and highly aggressive brain tumor that invades the nearby brain tissue and presents secondary nodular lesions across the whole brain but generally does not spread to distant organs. Without treatment, GBM can result in death in about 6 months. The challenges are known to depend on multiple factors: brain localization, resistance to conventional therapy, disrupted tumor blood supply inhibiting effective drug delivery, complications from peritumoral edema, intracranial hypertension, seizures, and neurotoxicity. MAIN TEXT: Imaging techniques are routinely used to obtain accurate detections of lesions that localize brain tumors. Magnetic resonance imaging (MRI) in particular delivers multimodal images both before and after the administration of contrast, displaying enhancement and describing physiological features such as hemodynamic processes. This review considers one possible extension of the use of radiomics in GBM studies, one that recalibrates the analysis of targeted segmentations to the whole-organ scale. After identifying critical areas of research, the focus is on illustrating the potential utility of an integrated approach with multimodal imaging, radiomic data processing and brain atlases as the main components. The templates associated with the outcome of straightforward analyses represent promising inference tools able to inform spatio-temporally on GBM evolution while also being generalizable to other cancers. CONCLUSIONS: The focus on novel inference strategies applicable to complex cancer systems and based on building radiomic models from multimodal imaging data can be well supported by machine learning and other computational tools potentially able to translate suitably processed information into more accurate patient stratifications and evaluations of treatment efficacy.

    Plsi: A Computational Software Pipeline For Pathway Level Disease Subtype Identification

    It is accepted that many complex diseases, like cancer, consist of collections of distinct genetic diseases. Clinical advances in treatments are attributed to molecular treatments aimed at specific genes, resulting in greater efficacy and fewer debilitating side effects. This shows that it is important to identify and appropriately treat each individual disease subtype. Our current understanding of subtypes is limited: despite advances in targeted treatment, targeted therapies often fail for some patients. The main limitation of current methods for subtype identification is that they focus on gene expression and are therefore subject to its intrinsic noise. Signaling pathways describe biological processes that are carried out by networks of genes interacting with each other. We developed PLSI, a software tool that identifies the specific pathways impacted in individual patients, subgroups of patients, or a given subtype of disease. The expected impact includes a better understanding of disease and resistance to treatment.

    Generalized topographic block model

    Co-clustering leads to parsimony in data visualisation with a number of parameters dramatically reduced in comparison to the dimensions of the data sample. Herein, we propose a new generalized approach for nonlinear mapping by a re-parameterization of the latent block mixture model. The densities modeling the blocks are in an exponential family such that the Gaussian, Bernoulli and Poisson laws are particular cases. The inference of the parameters is derived from the block expectation–maximization algorithm with a Newton–Raphson procedure at the maximization step. Empirical experiments with textual data validate the interest of our generalized model

    Evaluating the effects of high-throughput structural neuroimaging predictors on whole-brain functional connectome outcomes via network-based vector-on-matrix regression

    The joint analysis of multimodal neuroimaging data is critical in the field of brain research because it reveals complex interactive relationships between neurobiological structures and functions. In this study, we focus on investigating the effects of structural imaging (SI) features, including white matter micro-structure integrity (WMMI) and cortical thickness, on the whole brain functional connectome (FC) network. To achieve this goal, we propose a network-based vector-on-matrix regression model to characterize the FC-SI association patterns. We have developed a novel multi-level dense bipartite and clique subgraph extraction method to identify which subsets of spatially specific SI features intensively influence organized FC sub-networks. The proposed method can simultaneously identify highly correlated structural-connectomic association patterns and suppress false positive findings while handling millions of potential interactions. We apply our method to a multimodal neuroimaging dataset of 4,242 participants from the UK Biobank to evaluate the effects of whole-brain WMMI and cortical thickness on the resting-state FC. The results reveal that the WMMI on corticospinal tracts and inferior cerebellar peduncle significantly affect functional connections of sensorimotor, salience, and executive sub-networks with an average correlation of 0.81 (p<0.001).
    Comment: 20 pages, 5 figures, 2 tables

    Onset of an outline map to get a hold on the wildwood of clustering methods

    The domain of cluster analysis is a meeting point for a very rich multidisciplinary encounter, with cluster-analytic methods being studied and developed in discrete mathematics, numerical analysis, statistics, data analysis and data science, and computer science (including machine learning, data mining, and knowledge discovery), to name but a few. The other side of the coin, however, is that the domain suffers from a major accessibility problem as well as from the fact that it is rife with division across many pretty isolated islands. As a way out, the present paper offers an outline map for the clustering domain as a whole, which takes the form of an overarching conceptual framework and a common language. With this framework we wish to contribute to structuring the domain, to characterizing methods that have often been developed and studied in quite different contexts, to identifying links between them, and to introducing a frame of reference for optimally setting up cluster analyses in data-analytic practice.
    Comment: 33 pages, 4 figures

    Integrating biclustering techniques with de novo gene regulatory network discovery using RNA-seq from skeletal tissues

    In order to improve upon stem cell therapy for osteoarthritis, it is necessary to understand the molecular and cellular processes behind bone development and the differences from cartilage formation. To further elucidate these processes would provide a means to analyze the relatedness of bone and cartilage tissue by determining genes that are expressed and regulated for stem cells to differentiate into skeletal tissues. It would also contribute to the classification of differences in normal skeletogenesis and degenerative conditions involving these tissues. The three predominant skeletal tissues of interest are bone, immature cartilage and mature cartilage. Analysis of the transcriptome of these skeletal tissues using RNA-seq technology was performed using differential expression, clustering and biclustering algorithms, to detect similarly expressed genes, which provides evidence for genes potentially interacting together to produce a particular phenotype. Identifying key regulators in the gene regulatory networks (GRNs) driving cartilage and bone development and the differences in the GRNs they drive will facilitate a means to make comparisons between the tissues at the transcriptomic level. Due to a small number of available samples for gene expression data in bone, immature and mature cartilage, it is necessary to determine how the number of samples influences the ability to make accurate GRN predictions. Machine learning techniques for GRN prediction that can incorporate multiple data types have not been well evaluated for complex organisms, nor has RNA-seq data been used often for evaluating these methods. Therefore, techniques identified to work well with microarray data were applied to RNA-seq data from mouse embryonic stem cells, where more samples are available for evaluation compared to the skeletal tissue RNA-seq samples. The RNA-seq data was combined with ChIP-seq data to determine if the machine learning methods outperform simple, correlation-based methods that have been evaluated using RNA-seq data alone. Two of the best performing GRN prediction algorithms from previous large-scale evaluations, which are incapable of incorporating data beyond expression data, were used as a baseline to determine if the addition of multiple data types could help reduce the number of gene expression samples. It was also necessary to identify a biclustering algorithm that could identify potentially biologically relevant modules. Publicly available ChIP-seq and RNA-seq samples from embryonic stem cells were used to measure the performance and consistency of each method, as there was a well-established network in mouse embryonic stem cells to compare results. The methods were then compared to cMonkey2, a biclustering method used in conjunction with ChIP-seq for two important transcription factors in the embryonic stem cell network. This was done to determine if any of these GRN prediction methods could potentially use the small number of skeletal tissue samples available to determine transcription factors orchestrating the expression of other genes driving cartilage and bone formation. Using the embryonic stem cell RNA-seq samples, it was found that sample size, if above 10, does not have a significant impact on the number of true positives in the top predicted interactions. Random forest methods outperform correlation-based methods when using RNA-seq, with area under ROC (AUROC) for evaluation, but the number of true positive interactions predicted when compared to a literature network was similar when using a strict cut-off. Using a limited set of ChIP-seq data was found not to improve the confidence in the transcription factor interactions and had no obvious effect on biclustering results. Correlation-based methods are likely the safest option when based on consistency of the results over multiple runs, but there is still the challenge of determining an appropriate cut-off to the predictions. To predict the skeletal tissue GRNs, cMonkey was used as an initial feature selection method to identify important genes in skeletal tissues and compared with other biclustering methods that do not use ChIP-seq. The predicted skeletal tissue GRNs will be utilized in future analyses of skeletal tissues, focussing on the evolutionary relationship between the GRNs driving skeletal tissue development.
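
    As a rough illustration of the correlation-based baseline and the cut-off problem this abstract mentions, the following minimal Python sketch ranks gene pairs by absolute Pearson correlation and keeps the top `top_k`. It is not the thesis's pipeline: the function name `rank_edges_by_correlation`, the toy expression matrix, the gene labels and the choice of a fixed `top_k` cut-off are all assumptions made for illustration.

```python
import numpy as np

def rank_edges_by_correlation(expr, gene_names, top_k):
    """Rank candidate gene-gene edges by |Pearson r| (hypothetical helper).

    expr: array of shape (genes, samples); each row is one gene's expression.
    Returns the top_k pairs as (gene_i, gene_j, |r|), strongest first.
    """
    corr = np.corrcoef(expr)                   # Pearson correlation between rows
    i_idx, j_idx = np.triu_indices(corr.shape[0], k=1)  # each unordered pair once
    scores = np.abs(corr[i_idx, j_idx])
    order = np.argsort(-scores, kind="stable")[:top_k]  # assumed fixed cut-off
    return [(gene_names[i_idx[n]], gene_names[j_idx[n]], float(scores[n]))
            for n in order]

# Toy data (made up): geneB is an exact multiple of geneA.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],    # geneA
    [2.0, 4.0, 6.0, 8.0, 10.0],   # geneB
    [5.0, 3.0, 4.0, 1.0, 2.0],    # geneC
])
edges = rank_edges_by_correlation(expr, ["geneA", "geneB", "geneC"], top_k=2)
for gene_i, gene_j, score in edges:
    print(gene_i, gene_j, round(score, 3))
```

    The result depends entirely on `top_k`, which is the cut-off difficulty the abstract points out: there is no principled value built into the method itself.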

    From Classical to Modern Computational Approaches to Identify Key Genetic Regulatory Components in Plant Biology

    The selection of plant genotypes with improved productivity and tolerance to environmental constraints has always been a major concern in plant breeding. Classical approaches based on the generation of variability and selection of better phenotypes from large variant collections have improved their efficacy and processivity due to the implementation of molecular biology techniques, particularly genomics, Next Generation Sequencing and other omics such as proteomics and metabolomics. In this regard, the identification of interesting variants before they develop the phenotype trait of interest with molecular markers has advanced the breeding process of new varieties. Moreover, the correlation of phenotype or biochemical traits with gene expression or protein abundance has boosted the identification of potential new regulators of the traits of interest, using a relatively low number of variants. These important breakthrough technologies, built on top of classical approaches, will be improved in the future by including the spatial variable, allowing the identification of gene(s) involved in key processes at the tissue and cell levels

    Statistical Techniques for Exploratory Analysis of Structured Three-Way and Dynamic Network Data.

    In this thesis, I develop different techniques for the pattern extraction and visual exploration of a collection of data matrices. Specifically, I present methods to help home in on and visualize an underlying structure and its evolution over ordered (e.g., time) or unordered (e.g., experimental conditions) index sets. The first part of the thesis introduces a biclustering technique for such three-dimensional data arrays. This technique is capable of discovering potentially overlapping groups of samples and variables that evolve similarly with respect to a subset of conditions. To facilitate and enhance visual exploration, I introduce a framework that utilizes kernel smoothing to guide the estimation of bicluster responses over the array. In the second part of the thesis, I introduce two matrix factorization models. The first is a data integration model that decomposes the data into two factors: a basis common to all data matrices, and a coefficient matrix that varies for each data matrix. The second model is meant for visual clustering of nodes in dynamic network data, which often contains complex evolving structure. Hence, this approach is more flexible and additionally lets the basis evolve for each matrix in the array. Both models utilize a regularization within the framework of non-negative matrix factorization to encourage local smoothness of the basis and coefficient matrices, which improves interpretability and highlights the structural patterns underlying the data, while mitigating noise effects. I also address computational aspects of applying regularized non-negative matrix factorization models to large data arrays by presenting multiple algorithms, including an approximation algorithm based on alternating least squares.
    PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/99838/1/smankad_1.pd

    Development of mathematical methods for modeling biological systems
