
    Non-linear dimensionality reduction of signaling networks

    Abstract

    Background: Systems-wide modeling and analysis of signaling networks is essential for understanding complex cellular behaviors, such as the biphasic responses to different combinations of cytokines and growth factors. For example, tumor necrosis factor (TNF) can act as a proapoptotic or prosurvival factor depending on its concentration, the current state of the signaling network and the presence of other cytokines. To understand combinatorial regulation in such systems, new computational approaches are required that can take into account non-linear interactions in signaling networks and provide tools for clustering, visualization and predictive modeling.

    Results: Here we extended and applied an unsupervised non-linear dimensionality reduction approach, Isomap, to find clusters of similar treatment conditions in two cell signaling networks: (I) the apoptosis signaling network in human epithelial cancer cells treated with different combinations of TNF, epidermal growth factor (EGF) and insulin, and (II) the combination of signal transduction pathways stimulated by 21 different ligands, based on AfCS double-ligand screen data. For the analysis of the apoptosis signaling network we used the Cytokine compendium dataset, in which the activity and concentration of 19 intracellular signaling molecules were measured to characterize the apoptotic response to TNF, EGF and insulin. By projecting the original 19-dimensional space of intracellular signals into a low-dimensional space, Isomap was able to reconstruct clusters corresponding to different cytokine treatments that were identified with graph-based clustering. In comparison, Principal Component Analysis (PCA) and Partial Least Squares Discriminant Analysis (PLS-DA) were unable to find biologically meaningful clusters. We also showed that by using Isomap components for supervised classification with k-nearest neighbor (k-NN) and quadratic discriminant analysis (QDA), apoptosis intensity can be predicted for different combinations of TNF, EGF and insulin. Prediction accuracy was highest when early activation time points in the apoptosis signaling network were used to predict apoptosis rates at later time points. Extended Isomap also outperformed PCA on the AfCS double-ligand screen data: Isomap identified more functionally coherent clusters than PCA and captured more information in the first two components. The Isomap projection performs slightly worse when more signaling networks are analyzed, suggesting that the mapping function between cues and responses becomes increasingly non-linear when large signaling pathways are considered.

    Conclusion: We developed and applied an extended Isomap approach for the analysis of cell signaling networks. Potential biological applications of this method include characterization, visualization and clustering of different treatment conditions (e.g. low and high doses of TNF) in terms of the changes in intracellular signaling they induce.
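    The workflow described above, non-linear embedding followed by supervised classification on the components, can be sketched with standard tooling. The snippet below is a minimal, hypothetical example, not the authors' code: `X` stands in for the conditions-by-19-signals matrix and `y` for a discretized apoptosis intensity.

```python
# Minimal sketch of Isomap embedding + k-NN classification (placeholder data).
import numpy as np
from sklearn.manifold import Isomap
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 19))    # placeholder for measured intracellular signals
y = rng.integers(0, 3, size=120)  # placeholder apoptosis-intensity classes

# Unsupervised non-linear projection of the 19-D signal space into 2-D.
Z = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# Supervised classification on the Isomap components; QDA could be swapped in
# via sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis.
knn = KNeighborsClassifier(n_neighbors=5)
print(cross_val_score(knn, Z, y, cv=5).mean())
```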

    Rank Priors for Continuous Non-Linear Dimensionality Reduction

    Non-linear dimensionality reduction methods are powerful techniques for dealing with high-dimensional datasets. However, they are often susceptible to local minima and perform poorly when initialized far from the global optimum, even when the intrinsic dimensionality is known a priori. In this work we introduce a prior over the dimensionality of the latent space and simultaneously optimize both the latent space and its intrinsic dimensionality. Ad-hoc initialization schemes are unnecessary with our approach: we initialize the latent space to the observation space and automatically infer the latent dimensionality using an optimization scheme that drops dimensions in a continuous fashion. We report results applying our prior to various tasks involving probabilistic non-linear dimensionality reduction, and show that our method can outperform graph-based dimensionality reduction techniques as well as previously suggested ad-hoc initialization strategies.
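    As a rough illustration of the continuous dimension-dropping idea, the toy below (not the paper's rank prior or optimizer) initializes the latent space `Z` to the centered observations, takes gradient steps that keep latent inner products close to the observed ones, and applies a proximal group-shrinkage step that lets whole latent dimensions decay toward zero; collapsed columns are candidates to drop.

```python
# Toy sketch only: continuous dimension dropping via a group-sparsity penalty.
import numpy as np

rng = np.random.default_rng(1)
n, d, r = 100, 8, 3
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, d))  # rank-3 data in 8-D
X -= X.mean(axis=0)
Z = X.copy()                 # initialize latent space to the observation space
lam, lr = 2.0, 0.05          # illustrative values for this random example

for _ in range(3000):
    G_err = Z @ Z.T - X @ X.T          # mismatch of latent vs. observed Grams
    Z -= lr * (4.0 / n**2) * (G_err @ Z)
    # Proximal step for a penalty on column norms: dimensions shrink
    # continuously and can hit exactly zero.
    norms = np.maximum(np.linalg.norm(Z, axis=0), 1e-12)
    Z *= np.maximum(0.0, 1.0 - lr * lam / norms)

print(np.linalg.norm(Z, axis=0).round(2))  # collapsed columns = droppable dims
```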

    Non-linear dimensionality reduction techniques for classification

    This thesis project concerns dimensionality reduction through manifold learning, with a focus on non-linear techniques. Dimension Reduction (DR) is the process of mapping a high-dimensional dataset with d features (dimensions) to one with a lower number of features p (p ≪ d) that preserves the information contained in the original higher-dimensional space. More generally, the concept of manifold learning is introduced, a generalized approach encompassing algorithms for dimensionality reduction. Manifold learning methods can be divided into two main categories: linear and non-linear. Although linear methods, such as Principal Component Analysis (PCA) and Multidimensional Scaling (MDS), are widely used and well known, there are many non-linear techniques, e.g. Isometric Feature Mapping (Isomap), Locally Linear Embedding (LLE) and Local Tangent Space Alignment (LTSA), which in recent years have been the subject of study. This project is inspired by the work of [Bahadur et al., 2017], with the aim of estimating the dimensionality of the US market using the Russell 3000 as a proxy for the financial market. Since financial markets are high-dimensional and complex environments, an approach using non-linear techniques alongside linear ones is proposed.
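    A compact way to line up the methods named in the thesis is to run them side by side on a synthetic manifold; the sketch below uses sklearn on the standard S-curve (the Russell 3000 data itself is not reproduced here).

```python
# Side-by-side run of the linear (PCA, MDS) and non-linear (Isomap, LLE, LTSA)
# methods on a synthetic 3-D manifold, each reduced to 2 dimensions.
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, Isomap, LocallyLinearEmbedding

X, color = make_s_curve(n_samples=500, random_state=0)

methods = {
    "PCA": PCA(n_components=2),
    "MDS": MDS(n_components=2),
    "Isomap": Isomap(n_neighbors=12, n_components=2),
    "LLE": LocallyLinearEmbedding(n_neighbors=12, n_components=2),
    "LTSA": LocallyLinearEmbedding(n_neighbors=12, n_components=2, method="ltsa"),
}
for name, method in methods.items():
    Z = method.fit_transform(X)
    print(name, Z.shape)
```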

    A framework for high dimensional data reduction in the microarray domain

    Microarray analysis and visualization are very helpful for biologists and clinicians to understand gene expression in cells and to facilitate diagnosis and treatment of patients. However, a typical microarray dataset has thousands of features and a very small number of observations. Such very high-dimensional data carry a massive amount of information, often including noise and non-useful content, with only a small number of features relevant to a disease or genotype. This paper proposes a framework for very high-dimensional data reduction based on three technologies: feature selection, linear dimensionality reduction and non-linear dimensionality reduction. In this paper, feature selection based on mutual information is proposed for filtering features and selecting the most relevant ones with minimum redundancy. A kernel linear dimensionality reduction method is also used to extract the latent variables from a high-dimensional dataset. In addition, a non-linear dimensionality reduction based on locally linear embedding is used to reduce the dimension and visualize the data. Experimental results are presented to show the outputs of each step and the efficiency of this framework. © 2010 IEEE
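    A hedged sketch of the three-stage framework with synthetic stand-in data follows. Note the assumptions: plain mutual-information ranking via sklearn's SelectKBest captures relevance but not the redundancy term, and KernelPCA with a linear kernel stands in for the kernel linear reduction step.

```python
# Three-stage pipeline: MI feature selection -> kernel linear DR -> LLE.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.decomposition import KernelPCA
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5000))   # thousands of features, few observations
y = rng.integers(0, 2, size=60)   # placeholder disease / genotype labels

# Stage 1: keep the features most informative about the class labels.
X_sel = SelectKBest(mutual_info_classif, k=200).fit_transform(X, y)
# Stage 2: extract latent variables with a (linear-kernel) kernel method.
X_lat = KernelPCA(n_components=20, kernel="linear").fit_transform(X_sel)
# Stage 3: non-linear reduction to 2-D for visualization.
X_vis = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X_lat)
print(X_sel.shape, X_lat.shape, X_vis.shape)
```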

    Graph Analytics Methods In Feature Engineering

    High-dimensional data sets can be difficult to visualize and analyze, while data in low-dimensional space tend to be more accessible. To aid visualization of the underlying structure of a dataset, its dimension is reduced. The simplest approach to this task is a random projection of the data. Even though a random projection allows some degree of visualization of the underlying structure, it risks losing more interesting structure within the data. To address this concern, various supervised and unsupervised linear dimensionality reduction algorithms have been designed, such as Principal Component Analysis and Linear Discriminant Analysis. These methods can be powerful, but often miss important non-linear structure in the data. In this thesis, manifold learning approaches to dimensionality reduction are developed. These approaches combine both linear and non-linear methods of dimension reduction.
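    The progression sketched above, from a random projection through linear supervised and unsupervised methods to manifold learning, can be illustrated on any labeled high-dimensional dataset; the hypothetical snippet below uses sklearn's digits data as a stand-in.

```python
# Random projection vs. linear (PCA, LDA) vs. non-linear (Isomap) reduction.
from sklearn.datasets import load_digits
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import Isomap

X, y = load_digits(return_X_y=True)

Z_rand = GaussianRandomProjection(n_components=2, random_state=0).fit_transform(X)
Z_pca = PCA(n_components=2).fit_transform(X)
Z_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised
Z_iso = Isomap(n_components=2).fit_transform(X)                         # non-linear
for name, Z in [("random", Z_rand), ("PCA", Z_pca), ("LDA", Z_lda), ("Isomap", Z_iso)]:
    print(name, Z.shape)
```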

    Positive semi-definite embedding for dimensionality reduction and out-of-sample extensions

    In machine learning or statistics, it is often desirable to reduce the dimensionality of a sample of data points in a high-dimensional space $\mathbb{R}^d$. This paper introduces a dimensionality reduction method where the embedding coordinates are the eigenvectors of a positive semi-definite kernel obtained as the solution of an infinite-dimensional analogue of a semi-definite program. This embedding is adaptive and non-linear. A main feature of our approach is the existence of a non-linear out-of-sample extension formula for the embedding coordinates, called a projected Nyström approximation. This extrapolation formula yields an extension of the kernel matrix to a data-dependent Mercer kernel function. Our empirical results indicate that this embedding method is more robust with respect to the influence of outliers, compared with a spectral embedding method. Comment: 16 pages, 5 figures. Improved presentation
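    The paper's kernel is learned by solving a semi-definite program, which is not reproduced here. As a loose stand-in, the sketch below shows the same usage pattern, spectral embedding coordinates from a PSD kernel plus an out-of-sample extension for unseen points, using ordinary kernel PCA, whose transform applies a Nyström-type projection formula.

```python
# Spectral embedding from a PSD kernel with out-of-sample extension (kernel PCA
# as a stand-in for the paper's SDP-learned kernel).
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split

X, _ = make_swiss_roll(n_samples=1500, random_state=0)
X_train, X_new = train_test_split(X, test_size=0.2, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.05).fit(X_train)
Z_train = kpca.transform(X_train)  # embedding: eigenvectors of the PSD kernel
Z_new = kpca.transform(X_new)      # out-of-sample extension for unseen points
print(Z_train.shape, Z_new.shape)
```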

    Fast Robust PCA on Graphs

    Mining useful clusters from high-dimensional data has received significant attention from the computer vision and pattern recognition communities in recent years. Linear and non-linear dimensionality reduction have played an important role in overcoming the curse of dimensionality. However, such methods often suffer from three problems: high computational complexity (usually associated with nuclear norm minimization), non-convexity (for matrix factorization methods) and susceptibility to gross corruptions in the data. In this paper we propose a principal component analysis (PCA) based solution that overcomes these three issues and approximates a low-rank recovery method for high-dimensional datasets. We target the low-rank recovery by enforcing two types of graph smoothness assumptions, one on the data samples and the other on the features, by designing a convex optimization problem. The resulting algorithm is fast, efficient and scalable for huge datasets, with O(n log n) computational complexity in the number of data samples. It is also robust to gross corruptions in the dataset as well as to the model parameters. Clustering experiments on 7 benchmark datasets with different types of corruptions and background separation experiments on 3 video datasets show that our proposed model outperforms 10 state-of-the-art dimensionality reduction models. Our theoretical analysis proves that the proposed model is able to recover approximate low-rank representations with a bounded error for clusterable data.
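    In the spirit of the model described above, but not the paper's algorithm, the toy below performs low-rank recovery by proximal gradient with two graph-smoothness penalties (one on samples, one on features) and an l1 data-fit term that absorbs sparse gross corruptions. The graphs, weights and step size are illustrative choices.

```python
# Toy graph-smooth low-rank recovery with an l1 fit, via proximal gradient.
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X0 = rng.normal(size=(150, 3)) @ rng.normal(size=(3, 40))  # clean low-rank part
X = X0.copy()
X[rng.random(X.shape) < 0.05] += 10.0                      # sparse gross corruptions

A_s = kneighbors_graph(X, n_neighbors=8)       # graph between data samples
A_f = kneighbors_graph(X.T, n_neighbors=5)     # graph between features
Phi_s = laplacian(0.5 * (A_s + A_s.T)).toarray()
Phi_f = laplacian(0.5 * (A_f + A_f.T)).toarray()

g1, g2, step = 1.0, 1.0, 0.01
L = X.copy()
for _ in range(300):
    grad = 2 * g1 * (Phi_s @ L) + 2 * g2 * (L @ Phi_f)  # smoothness gradients
    V = L - step * grad
    R = X - V
    L = X - np.sign(R) * np.maximum(np.abs(R) - step, 0.0)  # prox of ||X - L||_1
print(float(np.abs(L - X0).mean()))  # small if the corruptions were absorbed
```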