10 research outputs found

    Graph machine learning for assembly modeling

    Assembly modeling refers to the design engineering process of composing assemblies (e.g., machines or machine components) from a common catalog of existing parts. There is a natural correspondence of assemblies to graphs which can be exploited for services based on graph machine learning such as part recommendation, clustering/taxonomy creation, or anomaly detection. However, this domain imposes particular challenges such as the treatment of unknown or new parts, ambiguously extracted edges, incomplete information about the design sequence, interaction with design engineers as users, to name a few. Along with open research questions, we present a novel data set

    Mining Unfollow Behavior in Large-Scale Online Social Networks via Spatial-Temporal Interaction

    Online Social Networks (OSNs) evolve through two pervasive behaviors: follow and unfollow, which respectively signify relationship creation and relationship dissolution. Research on social network evolution mainly focuses on the follow behavior, while the unfollow behavior has largely been ignored. Mining unfollow behavior is challenging because a user's decision to unfollow is affected not only by the simple combination of the user's attributes, such as informativeness and reciprocity, but also by the complex interaction among them. Meanwhile, prior datasets seldom contain sufficient records for inferring such complex interaction. To address these issues, we first construct a large-scale real-world Weibo dataset, which records detailed post content and relationship dynamics of 1.8 million Chinese users. Next, we divide a user's attributes into two categories: spatial attributes (e.g., social role of user) and temporal attributes (e.g., post content of user). Leveraging the constructed dataset, we systematically study how the interaction effects between a user's spatial and temporal attributes contribute to the unfollow behavior. Afterwards, we propose a novel unified model with heterogeneous information (UMHI) for unfollow prediction. Specifically, our UMHI model: 1) captures the user's spatial attributes through social network structure; 2) infers the user's temporal attributes through user-posted content and unfollow history; and 3) models the interaction between spatial and temporal attributes with nonlinear MLP layers. Comprehensive evaluations on the constructed dataset demonstrate that the proposed UMHI model outperforms baseline methods by 16.44% on average in terms of precision. In addition, factor analyses verify that both spatial attributes and temporal attributes are essential for mining unfollow behavior. Comment: 8 pages, 7 figures, Accepted by AAAI 202

    Unsupervised annotation of regulatory domains by integrating functional genomic assays and Hi-C data

    In each cell type, chromosomes are organized into a specific 3D structure that controls the function of a cell through different mechanisms including domain-scale regulation. Because of the correlation between genome structure and its function, different methods have been proposed to integrate 1D functional genomic and 2D Hi-C data to identify domain types. Existing methods rely on the assumption that directly connected genomic regions are more likely to have the same domain type; however, spatial clustering of genomic regions is based on both their first-order and second-order proximities. Here, we present an integrative approach that uses 1D functional genomic features and 3D interactions from Hi-C data to assign labels to genomic regions that can discriminate both spatial and functional genomic patterns. We use graph embedding to learn latent variables for nodes (genomic regions) that preserve the Hi-C graph second-order proximity. Such latent variables summarize spatial information in Hi-C data, and we feed them, in addition to existing 1D functional features, to Segway, a genome annotation method, to infer domain states. We show that our labels distinguish a combination of the spatial and functional states of the genomic regions; for example, loci located in the nucleus interior can be further clustered into significantly and moderately expressed domains. We also determined the importance of each of the spatial and functional features in explaining different cell activities, including replication timing and gene expression profile, and how coupling the two feature types improves the prediction of such activities. Finally, we showed that incorporating spatial features allows finding domain types that are co-regulated even at large genomic distances from each other. Our framework can be generalized to aggregate different 1D genomic assays and 3D interactions from Hi-C to find the mechanisms behind the association of genome 3D structure and epigenetic profile
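
The distinction this abstract draws between first-order and second-order proximity can be illustrated with a minimal Python sketch. The toy graph, the function names, and the choice of Jaccard overlap as the second-order measure are assumptions made for illustration; they are not the embedding method used in the paper.

```python
def first_order(adj, u, v):
    """Return 1 if u and v are directly connected, else 0."""
    return 1 if v in adj[u] else 0


def second_order(adj, u, v):
    """Return the Jaccard overlap of the neighbor sets of u and v."""
    nu, nv = adj[u], adj[v]
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Hypothetical undirected toy graph: "a" and "b" are not linked to each
# other but share both of their neighbors.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
}

print(first_order(adj, "a", "b"), second_order(adj, "a", "b"))
print(first_order(adj, "a", "c"), second_order(adj, "a", "c"))
```

In this toy graph, "a" and "b" have no direct edge yet identical neighborhoods, which is the kind of relationship a method relying only on direct connections would miss.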

    Mobile app recommendations using deep learning and big data

    Dissertation presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Marketing Research and CRM. Recommender systems were first introduced to solve information overload problems in enterprises. Over the last decades, recommender systems have found applications in several major websites related to e-commerce, music and video streaming, travel and movie sites, social media and mobile app stores. Several methods have been proposed over the years to build recommender systems. The most popular approaches are based on collaborative filtering techniques, which leverage the similarities between consumer tastes. But the current state of the art in recommender systems is deep-learning methods, which can leverage not only item consumption data but also content, context, and user attributes. Mobile app stores generate data with Big Data properties from app consumption data, behavioral, geographic, demographic, social network and user-generated content data, which includes reviews, comments and search queries. In this dissertation, we propose a deep-learning architecture for recommender systems in mobile app stores that leverages most of these data sources. We analyze three issues: the impact of the data sources, the impact of embedding layer pretraining, and the efficiency of using Kernel methods to improve app scoring at a Big Data scale. An experiment is conducted on a Portuguese Android app store. Results suggest that models can be improved by combining structured and unstructured data. The results also suggest that embedding layer pretraining is essential to obtain good results. Some evidence is provided showing that Kernel-based methods might not be efficient when deployed in Big Data contexts

    Unsupervised Structural Embedding Methods for Efficient Collective Network Mining

    How can we align accounts of the same user across social networks? Can we identify the professional role of an email user from their patterns of communication? Can we predict the medical effects of chemical compounds from their atomic network structure? Many problems in graph data mining, including all of the above, are defined on multiple networks. The central element to all of these problems is cross-network comparison, whether at the level of individual nodes or entities in the network or at the level of entire networks themselves. To perform this comparison meaningfully, we must describe the entities in each network expressively in terms of patterns that generalize across the networks. Moreover, because the networks in question are often very large, our techniques must be computationally efficient. In this thesis, we propose scalable unsupervised methods that embed nodes in vector space by mapping nodes with similar structural roles in their respective networks, even if they come from different networks, to similar parts of the embedding space. We perform network alignment by matching nodes across two or more networks based on the similarity of their embeddings, and refine this process by reinforcing the consistency of each node’s alignment with those of its neighbors. By characterizing the distribution of node embeddings in a graph, we develop graph-level feature vectors that are highly effective for graph classification. With principled sparsification and randomized approximation techniques, we make all our methods computationally efficient and able to scale to graphs with millions of nodes or edges. We demonstrate the effectiveness of structural node embeddings on industry-scale applications, and propose an extensive set of embedding evaluation techniques that lay the groundwork for further methodological development and application. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/162895/1/mheimann_1.pd
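
The idea of aligning networks by matching nodes whose structural embeddings are similar can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the thesis's method: the two-dimensional "embedding" (degree, mean neighbor degree) is a hand-crafted stand-in for a learned structural embedding, the nearest-neighbor matching omits the neighbor-consistency refinement the abstract mentions, and all function names and the example graphs are hypothetical.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def structural_embedding(adj):
    """Map each node to (degree, mean degree of its neighbors).

    Assumption: this hand-crafted vector stands in for a learned embedding.
    """
    deg = {n: len(nbrs) for n, nbrs in adj.items()}
    return {
        n: (deg[n], sum(deg[m] for m in nbrs) / deg[n] if deg[n] else 0.0)
        for n, nbrs in adj.items()
    }


def align(adj_a, adj_b):
    """Match each node of graph A to the node of graph B nearest in embedding space."""
    emb_a = structural_embedding(adj_a)
    emb_b = structural_embedding(adj_b)
    return {
        u: min(sorted(emb_b), key=lambda v: math.dist(emb_a[u], emb_b[v]))
        for u in emb_a
    }


# Two hypothetical graphs with the same structure but different node labels.
graph_a = build_adjacency(
    [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "f")]
)
graph_b = build_adjacency(
    [("u", "v"), ("u", "w"), ("v", "w"), ("w", "x"), ("x", "y"), ("v", "z")]
)

alignment = align(graph_a, graph_b)
print(alignment)
```

The example graph was chosen so that every node has a distinct feature vector; in graphs with symmetric nodes, features this coarse cannot separate them, which is one reason richer embeddings and a refinement step are needed in practice.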

    Decoding Task-Based fMRI Data Using Graph Neural Networks, Considering Individual Differences

    Functional magnetic resonance imaging (fMRI) is a non-invasive technology that provides high spatial resolution in determining the human brain's responses and measures regional brain activity through metabolic changes in blood oxygen consumption associated with neural activity. Task fMRI provides an opportunity to analyze the working mechanisms of the human brain during specific task performance. Over the past several years, a variety of computational methods have been proposed to decode task fMRI data that can identify brain regions associated with different task stimulations. Despite the advances made by these methods, several limitations exist due to graph representations and graph embeddings transferred from task fMRI signals. In the present study, we proposed an end-to-end graph convolutional network by combining the convolutional neural network with graph representation, with three convolutional layers to classify task fMRI data from the Human Connectome Project (302 participants, 22–35 years of age). One goal of this dissertation was to improve classification performance. We applied four of the most widely used node embedding algorithms (NetMF, RandNE, Node2Vec, and Walklets) to automatically extract the structural properties of the nodes in the brain functional graph, then evaluated the performance of the classification model. The empirical results indicated that the proposed GCN framework accurately identified the brain's state in task fMRI data and achieved comparable macro F1 scores of 0.978 and 0.976 with the NetMF and RandNE embedding methods, respectively. Another goal of the dissertation was to assess the effects of individual differences (i.e., gender and fluid intelligence) on classification performance. We tested the proposed GCN framework on sub-datasets divided according to gender and fluid intelligence. Experimental results indicated significant differences in the classification predictions of gender, but not high/low fluid intelligence fMRI data. Our experiments yielded promising results and demonstrated the superior ability of our GCN in modeling task fMRI data

    Learning Effective Embeddings for Dynamic Graphs and Quantifying Graph Embedding Interpretability

    Graph representation learning has been a very active research area in recent years. The goal of graph representation learning is to generate representation vectors that accurately capture the structure and features of large graphs. This is especially important because the quality of the graph representation vectors will affect the performance of these vectors in downstream tasks such as node classification and link prediction. Many techniques have been proposed for generating effective graph representation vectors. These methods can be applied to both static and dynamic graphs. A static graph is a single fixed graph, while a dynamic graph evolves over time, with nodes and edges being added to or deleted from the graph. We surveyed the graph embedding methods for both static and dynamic graphs. The majority of the existing graph embedding methods are developed for static graphs. Since most real-world graphs are dynamic, developing novel graph embedding methods suitable for evolving graphs is essential. This dissertation proposes three dynamic graph embedding models. In previous dynamic methods, the inputs were mainly adjacency matrices of graphs, which are not memory efficient and may not capture the neighbourhood structure in graphs effectively. Therefore, we developed Dynnode2vec based on random walks using the static model Node2vec. Dynnode2vec generates node embeddings in each snapshot by initializing the current model with previous embedding vectors and training the model using a set of random walks obtained for nodes in the snapshot. Our second model, LSTM-Node2vec, is also based on random walks. This method leverages the LSTM model to capture the long-range dependencies between nodes in combination with Node2vec to generate node embeddings. Finally, inspired by the importance of substructures in the graphs, our third model, TGR-Clique, generates node embeddings by considering the effects of neighbours of a node in the maximal cliques containing the node. Experiments on real-world datasets demonstrate the effectiveness of our proposed methods in comparison to the state-of-the-art models. In addition, motivated by the lack of proper measures for quantifying and comparing graph embedding interpretability, we proposed two interpretability measures for graph embeddings using the centrality properties of graphs
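
The warm-start idea described for Dynnode2vec (carrying embedding vectors over from the previous snapshot and collecting random walks on the current one) can be sketched as follows. This is a simplified illustration under stated assumptions rather than the dissertation's implementation: the walks are uniform instead of Node2vec's biased walks, the skip-gram training step is omitted, and the function names, the initialization range, and the toy snapshot are hypothetical.

```python
import random


def init_from_previous(nodes, prev_emb, dim, rng):
    """Reuse previous vectors where available; randomly initialize new nodes."""
    return {
        n: list(prev_emb[n]) if n in prev_emb
        else [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        for n in nodes
    }


def random_walks(adj, start_nodes, walk_len, rng):
    """Generate one uniform random walk per start node.

    Assumption: uniform transitions simplify Node2vec's biased walks.
    """
    walks = []
    for start in start_nodes:
        walk = [start]
        while len(walk) < walk_len and adj[walk[-1]]:
            walk.append(rng.choice(sorted(adj[walk[-1]])))
        walks.append(walk)
    return walks


rng = random.Random(0)

# Embeddings learned on the previous snapshot (hypothetical values).
prev_emb = {"a": [0.1, 0.2], "b": [0.3, 0.4]}

# Current snapshot: node "c" is new and attached to "b".
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

emb = init_from_previous(adj, prev_emb, dim=2, rng=rng)
walks = random_walks(adj, sorted(adj), walk_len=5, rng=rng)
# A skip-gram model initialized with `emb` would then be trained on `walks`.
```

Existing nodes keep their previous vectors as the starting point, so only the new node "c" begins from a random initialization.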