    Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

    Recent advances in high-throughput methodologies offer researchers the ability to understand complex systems via high-dimensional and multi-relational data. One example is the realm of molecular biology, where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high-dimensional and multi-relational data allows for unprecedentedly detailed analysis, but also presents challenges in accounting for all the variability. High-dimensional data often has a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high-dimensional and multi-relational data, we developed three feature selection and cross-clustering methods: 1) the infinite relational model with feature selection (FIRM), which incorporates the rich information of multi-relational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to the Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) a randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to inference in BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM to categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where the latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences, where incorporating data related to varying features is often regarded as a daunting task.

    Revisiting Interval Graphs for Network Science

    The vertices of an interval graph represent intervals over a real line, where overlapping intervals denote that their corresponding vertices are adjacent. This implies that the vertices are measurable by a metric and that there exists a linear structure in the system. The generalization is an embedding of a graph into a multi-dimensional Euclidean space, which scientists used to study the multi-relational complexity of ecology. However, the research went out of fashion in the 1980s and was not revisited when Network Science recently took an interest in multi-relational networks known as multiplexes. This paper studies interval graphs from the perspective of Network Science.
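The definition in this abstract can be illustrated with a minimal sketch. It assumes closed intervals given as `(low, high)` pairs; the function name `interval_graph` and the example intervals are hypothetical and do not come from the paper.

```python
from itertools import combinations


def interval_graph(intervals):
    """Return adjacency sets for the interval graph of `intervals`.

    `intervals` maps a vertex name to a closed interval (low, high).
    Two vertices are adjacent when their intervals overlap.
    """
    adjacency = {name: set() for name in intervals}
    for (a, (a_lo, a_hi)), (b, (b_lo, b_hi)) in combinations(intervals.items(), 2):
        # Two closed intervals overlap iff each one starts before the other ends.
        if a_lo <= b_hi and b_lo <= a_hi:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical example: A overlaps B, B overlaps C, D is isolated.
intervals = {"A": (0.0, 2.0), "B": (1.0, 3.0), "C": (2.5, 4.0), "D": (5.0, 6.0)}
graph = interval_graph(intervals)
```

The pairwise check is quadratic in the number of intervals, which is fine for illustration; a sweep over sorted endpoints would scale better.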

    Social dilemmas in an online social network: the structure and evolution of cooperation

    We investigate two paradigms for studying the evolution of cooperation, the Prisoner's Dilemma and the Snowdrift game, in an online friendship network obtained from a social networking site. We demonstrate that this social network has the small-world property and that its degree distribution has a power-law tail. It also has a hierarchical organization and exhibits a disassortative mixing pattern. We study the evolutionary version of the two types of games on it. It is found that the enhancement and sustainment of cooperative behaviors are attributable to the underlying network topological organization. It is also shown that cooperators can survive when confronted with the invasion of defectors throughout the entire ranges of parameters of both games. The evolution of cooperation on empirical networks is influenced by various network effects in a combined manner, compared with that on model networks. Our results can help understand the cooperative behaviors in human groups and society. Comment: 14 pages, 7 figures
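As a rough illustration of the two games on a network, the sketch below uses the standard payoff orderings (T > R > P > S for the Prisoner's Dilemma, T > R > S > P for the Snowdrift game) and sums each node's payoff over its neighbors. The payoff values, the tiny network, and the function names are assumptions for illustration; the abstract does not state the paper's parametrization or its strategy update rule, so none is modeled here.

```python
def classify_game(R, S, T, P):
    """Classify a symmetric 2x2 game by its payoff ordering.

    R: reward (C vs C), S: sucker's payoff (C vs D),
    T: temptation (D vs C), P: punishment (D vs D).
    """
    if T > R > P > S:
        return "prisoners_dilemma"
    if T > R > S > P:
        return "snowdrift"
    return "other"


def node_payoffs(adjacency, strategy, R, S, T, P):
    """Sum each node's payoff over one round played against every neighbor."""
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    return {
        node: sum(table[(strategy[node], strategy[nbr])] for nbr in neighbors)
        for node, neighbors in adjacency.items()
    }


# Hypothetical three-node network: "a" is linked to "b" and "c".
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
strategy = {"a": "C", "b": "C", "c": "D"}
payoffs = node_payoffs(adjacency, strategy, R=3, S=0, T=5, P=1)
```

In an evolutionary version, nodes would then revise their strategies by comparing payoffs with neighbors; which rule to use is a modeling choice left open here.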

    A relational approach to knowledge spillovers in biotech. Network structures as drivers of inter-organizational citation patterns

    In this paper, we analyze the geography of knowledge spillovers in biotech by investigating the way in which knowledge ties are organized. Following a relational account of knowledge spillovers, we depict knowledge networks as complex evolving structures that build on pre-existing knowledge and previously formed ties. In economic geography, there is still little understanding of how structural network forces (like preferential attachment and closure) shape the structure and formation of knowledge spillover networks in space. Our study investigates the knowledge spillover networks of biotech firms by means of inter-organizational citation patterns based on USPTO biotech patents in the years 2008-2010. Using a Stochastic Actor-Oriented Model (SAOM), we explain the driving forces behind the decision of actors to cite patents produced by other actors. In doing so, we directly address the endogenous forces of knowledge dynamics. Keywords: knowledge spillovers, network structure, patent citations, biotech, proximity

    A Short Survey on Data Clustering Algorithms

    With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains, for instance bioinformatics, speech recognition, and financial analysis. Formally speaking, given a set of data instances, a clustering algorithm is expected to divide the set into subsets that maximize the intra-subset similarity and the inter-subset dissimilarity, where a similarity measure is defined beforehand. In this work, state-of-the-art clustering algorithms are reviewed from design concept to methodology. Different clustering paradigms are discussed, as are advanced clustering algorithms. After that, the existing clustering evaluation metrics are reviewed. A summary with future insights is provided at the end.
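The formal statement in this abstract (high similarity within subsets, low similarity between them) can be made concrete with a small sketch. The similarity measure `1 / (1 + |x - y|)`, the one-dimensional data, and the function names are assumptions chosen for illustration, not something taken from the survey.

```python
from itertools import combinations


def similarity(x, y):
    """Hypothetical similarity measure for 1-D points, in (0, 1]."""
    return 1.0 / (1.0 + abs(x - y))


def partition_quality(points, labels):
    """Return (mean intra-subset similarity, mean inter-subset similarity)."""
    intra, inter = [], []
    for (i, x), (j, y) in combinations(enumerate(points), 2):
        bucket = intra if labels[i] == labels[j] else inter
        bucket.append(similarity(x, y))

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)


# Two well-separated groups: a good partition versus a poor one.
points = [1.0, 1.1, 5.0, 5.2]
good_intra, good_inter = partition_quality(points, [0, 0, 1, 1])
bad_intra, bad_inter = partition_quality(points, [0, 1, 0, 1])
```

A partition that matches the group structure should score a higher intra-subset mean and a lower inter-subset mean than one that cuts across the groups, which is what comparing the two calls shows.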

    Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering

    Let Q be a given n×n symmetric matrix of similarities, with nonnegative elements between 0 and 1. Fuzzy clustering results in a fuzzy assignment of individuals to K clusters. In additive fuzzy clustering, the n×K fuzzy membership matrix P is found by least-squares approximation of the off-diagonal elements of Q by inner products of rows of P. By contrast, kernelized fuzzy c-means is not least-squares and requires an additional fuzziness parameter. The aim is to popularize additive fuzzy clustering by interpreting it as a latent class model, whereby the elements of Q are modeled as the probability that two individuals share the same class on the basis of the assignment probability matrix P. Two new algorithms are provided, a brute-force genetic algorithm (differential evolution) and an iterative row-wise quadratic programming algorithm, of which the latter is the more effective. Simulations showed that (1) the method usually has a unique solution, except in special cases, (2) both algorithms reached this solution from random restarts, and (3) the number of clusters can be well estimated by AIC. Additive fuzzy clustering is computationally efficient and combines attractive features of both the vector model and the cluster model.
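The least-squares criterion described in this abstract can be sketched as follows. This only evaluates the objective; it is not either of the paper's two fitting algorithms. The function name, the choice to count each off-diagonal pair once (i < j), and the example matrices are assumptions for illustration.

```python
def additive_fuzzy_loss(Q, P):
    """Sum of squared errors between off-diagonal Q[i][j] and the inner
    product of rows i and j of P, counting each pair i < j once.

    Q: n x n symmetric similarity matrix (list of lists, values in [0, 1]).
    P: n x K membership matrix; under the latent class reading each row
       holds assignment probabilities that sum to 1.
    """
    n = len(Q)
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Modeled probability that individuals i and j share a class.
            shared = sum(p_ik * p_jk for p_ik, p_jk in zip(P[i], P[j]))
            loss += (Q[i][j] - shared) ** 2
    return loss


# Hypothetical data: individuals 0 and 1 are similar, individual 2 is not.
Q = [[1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
crisp_P = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
vague_P = [[0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]
crisp_loss = additive_fuzzy_loss(Q, crisp_P)
vague_loss = additive_fuzzy_loss(Q, vague_P)
```

Because Q is symmetric, summing over both (i, j) and (j, i) would only double the value, so the minimizer is the same either way.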