
    Bipartite graph partitioning and data clustering

    Many data types arising from data mining applications can be modeled as bipartite graphs; examples include terms and documents in a text corpus, customers and purchased items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, we propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. We show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. We point out the connection between our clustering algorithm and correspondence analysis as used in multivariate analysis. We also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, we apply our clustering algorithm to document clustering to illustrate its effectiveness and efficiency.
    Comment: Proceedings of ACM CIKM 2001, the Tenth International Conference on Information and Knowledge Management, 200
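    As a rough illustration of the SVD step, the sketch below splits the rows and columns of a nonnegative edge-weight matrix W into two clusters. It assumes a scaling of the form D1^{-1/2} W D2^{-1/2} (D1 and D2 holding the row and column weighted degrees), uses the second singular vector pair, and thresholds at zero. The function name and the zero threshold are illustrative assumptions, not necessarily the paper's exact procedure; the code uses NumPy.

```python
import numpy as np


def bipartite_spectral_split(W):
    """Two-way co-clustering of a bipartite graph from its edge-weight matrix W.

    Rows of W are one vertex set (e.g. terms), columns the other (e.g. documents).
    Assumes every row and every column has a positive total weight.
    """
    W = np.asarray(W, dtype=float)
    d_rows = W.sum(axis=1)  # weighted degrees of row vertices
    d_cols = W.sum(axis=0)  # weighted degrees of column vertices
    # Scaled matrix D1^{-1/2} W D2^{-1/2}; its largest singular value is 1.
    A = W / np.sqrt(np.outer(d_rows, d_cols))
    U, s, Vt = np.linalg.svd(A)
    # The first singular pair is trivial, so take the second and undo the scaling.
    x = U[:, 1] / np.sqrt(d_rows)
    y = Vt[1, :] / np.sqrt(d_cols)
    # Threshold at zero: vertices with the same sign fall in the same cluster.
    return x >= 0, y >= 0


# Two nearly disconnected blocks joined by weak cross edges.
W = [[5, 4, 0.1, 0],
     [4, 5, 0, 0.1],
     [0.1, 0, 5, 4],
     [0, 0.1, 4, 5]]
row_labels, col_labels = bipartite_spectral_split(W)
print(row_labels, col_labels)
```

    Rows and columns from the same block receive the same boolean label. For more than two clusters one would typically combine several singular vectors, which this sketch does not attempt.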

    Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation

    We study the problem of quickly detecting top-k Personalized PageRank lists. This problem has a number of important applications, such as finding local cuts in large graphs, estimating similarity distance, and name disambiguation. In particular, we apply our results to construct efficient algorithms for the person name disambiguation problem. We argue that two observations are important when finding top-k Personalized PageRank lists. First, it is crucial to quickly detect the top-k most important neighbours of a node, while the exact order within the top-k list and the exact PageRank values are far less important. Second, a small number of wrong elements in a top-k list does not noticeably degrade its quality, but tolerating them can lead to significant computational savings. Based on these two key observations, we propose Monte Carlo methods for fast detection of top-k Personalized PageRank lists. We provide a performance evaluation of the proposed methods and supply stopping criteria. We then apply the methods to the person name disambiguation problem. The resulting algorithm achieved second place in the WePS 2010 competition.
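    The two observations suggest why plain random-walk sampling is attractive here. The sketch below shows one basic end-point Monte Carlo estimator: each walk starts at the seed node, continues with probability alpha at each step, and records the node where it stops; stopping frequencies estimate the Personalized PageRank vector, and the most frequent nodes form the top-k list. The adjacency-list dict, alpha = 0.85, the fixed number of walks, stopping at dangling nodes, and all names are assumptions for illustration; the paper's specific method variants and stopping criteria are not reproduced.

```python
import random
from collections import Counter


def mc_top_k_ppr(graph, seed, k, num_walks=10_000, alpha=0.85, rng=None):
    """Estimate the top-k Personalized PageRank list of `seed` by random walks.

    graph: dict mapping each node to a list of out-neighbours (hypothetical format).
    alpha: probability of taking another step; 1 - alpha is the stop probability.
    """
    rng = rng or random.Random(0)
    counts = Counter()
    for _ in range(num_walks):
        node = seed
        while rng.random() < alpha:
            nbrs = graph.get(node)
            if not nbrs:  # dangling node: end the walk here (one convention of several)
                break
            node = rng.choice(nbrs)
        counts[node] += 1  # the end point of each walk is one PPR sample
    # Only the identity of the top-k matters, not their exact scores.
    return [node for node, _ in counts.most_common(k)]


graph = {0: [1, 2], 1: [0], 2: [0], 3: [4], 4: [3]}
top = mc_top_k_ppr(graph, seed=0, k=3)
print(top)
```

    Nodes 3 and 4 are unreachable from the seed and never appear. Because only the ranking is needed, the number of walks can be kept far smaller than what accurate PageRank values would require, which is the saving the abstract points to.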

    Building Mini-Categories in Product Networks

    We constructed a product network based on sales data collected and provided by a Fortune 500 speciality retailer. The structure of the network is dominated by small isolated components, dense clique-based communities, and sparse stars, linear chains, and pendants. We used the identified structural elements (tiles) to organize products into mini-categories: compact collections of potentially complementary and substitute items. The mini-categories extend the traditional hierarchy of retail products (group, class, subcategory) and may serve as building blocks for exploring consumer projects and long-term customer behavior.
    Comment: Accepted to CompleNet, March 2015, NYC, NY, USA; 12 pages, 4 figures
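    To make the structural vocabulary concrete, the sketch below takes an undirected product graph as a list of product-pair edges and labels each connected component as a clique, star, chain, or pair using simple degree and edge-count tests. It is a toy under loudly stated assumptions: the edge list, the label names, and classifying whole components (rather than tiles inside larger components) are hypothetical, and it does not reproduce how the authors built the network or identified tiles.

```python
from collections import defaultdict


def classify_components(edges):
    """Label each connected component of an undirected graph by its shape.

    Assumes no self-loops. Returns a list of (frozenset_of_nodes, label) pairs.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        result.append((frozenset(comp), _shape(comp, adj)))
    return result


def _shape(comp, adj):
    n = len(comp)
    degrees = [len(adj[v]) for v in comp]
    m = sum(degrees) // 2  # each edge is counted once from each endpoint
    if n == 2:
        return "pair"
    if m == n * (n - 1) // 2:
        return "clique"  # every product is linked to every other
    if m == n - 1:  # connected with n - 1 edges, so the component is a tree
        if max(degrees) <= 2:
            return "chain"
        if max(degrees) == n - 1:
            return "star"  # one hub linked to all other products
    return "other"


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("h", "i"), ("h", "j"), ("h", "k"),
         ("p", "q"), ("q", "r"), ("r", "s"),
         ("x", "y")]
for comp, label in classify_components(edges):
    print(sorted(comp), label)
```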

    The boundaries of dipole graphs and the complete bipartite graphs K_{2,n}

    We study Seifert surfaces of links by relating graph embeddings through induced graphs. As applications, we prove that every link $L$ is the boundary of an oriented surface obtained from an embedding of the complete bipartite graph $K_{2,n}$ in which all voltage assignments on the edges of $K_{2,n}$ are 0. We also provide an algorithm to construct such a graph diagram for a given link and demonstrate it on the links $4_1^2$ and $5_2$.
    Comment: 14 pages, 12 figures

    An agent-based model for mRNA export through the nuclear pore complex.

    mRNA export from the nucleus is an essential step in the expression of every protein-coding gene in eukaryotes, but many aspects of this process remain poorly understood. The density of export receptors that must bind an mRNA to ensure export, and how receptor distribution affects transport dynamics, are not known. It is also unclear whether the rate-limiting step for transport occurs at the nuclear basket, in the central channel, or on the cytoplasmic face of the nuclear pore complex. Using previously published biophysical and biochemical parameters of mRNA export, we implemented a three-dimensional, coarse-grained, agent-based model of mRNA export in the nanosecond regime to gain insight into these issues. Running the model, we observed that mRNA export is sensitive to the number and distribution of transport receptors coating the mRNA, and that there is a rate-limiting step in the nuclear basket, potentially associated with the mRNA reconfiguring itself to thread into the central channel. Notably, our results also suggest that a single location-monitoring mRNA label may be insufficient to correctly capture the time regime of mRNA threading through the pore and subsequent transport. This has implications for the design of future experiments on mRNA transport dynamics.

    Designing identity of a new material: a new product design approach

    This is practice-based design research grounded in the industrial development of a new concrete. The research focuses on developing the specific identity of a new material. It aims to demonstrate that product design can be used as a new strategy to create a material's identity and thus differentiate it from existing materials. To design a material-specific identity into new products, we need to understand how people perceive shaped materials. We therefore conducted an exploratory study of material recognition in products. We identified two types of products: "messenger" products have specific shapes characteristic of the material, while "wrong messenger" products imitate other well-known materials. The results of a questionnaire on material recognition show that the ease of identifying the material varies from product to product, depending on whether the shape is familiar or new, whether it is an imitation or a specific shape, and whether the material is well known or new. We distinguish two types of shapes: on the one hand, familiar and typical shapes make material recognition easier and more certain; on the other hand, new shapes leave people less certain of what the product is made of but more amazed. Designing amazing new shapes can therefore serve as a differentiation strategy for creating the specific sensory identity of each new material. In other words, the product itself can be a genuinely useful medium for communicating about a new material, beyond traditional material samples.
    Keywords: New Material; Sensory Identity; Product Design