    Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

    Intel Xeon Phi is a recently released high-performance coprocessor that features 61 cores, each supporting 4 hardware threads and 512-bit wide SIMD registers, for a peak theoretical performance of 1 Tflop/s in double precision. Many scientific applications, such as linear solvers, eigensolvers, and graph mining algorithms, involve operations on large sparse matrices. The core of most of these applications is the multiplication of a large, sparse matrix with a dense vector (SpMV). In this paper, we investigate the performance of the Xeon Phi coprocessor for SpMV. We first provide a comprehensive introduction to this new architecture and analyze its peak performance with a number of microbenchmarks. Although the design of a Xeon Phi core is not much different from that of the cores in modern processors, its large number of cores and hyperthreading capability allow many applications to saturate the available memory bandwidth, which is not the case for many cutting-edge processors. Yet our performance studies show that it is memory latency, not bandwidth, that creates the bottleneck for SpMV on this architecture. Finally, our experiments show that Xeon Phi's sparse kernel performance is very promising and even better than that of cutting-edge general-purpose processors and GPUs.
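
For readers unfamiliar with the kernel, the following is a minimal Python sketch of SpMV over a matrix in compressed sparse row (CSR) form. The CSR layout, the function name `spmv_csr`, and the 3x3 example are assumptions made for illustration; the abstract does not say which storage format or implementation the paper evaluates. The indirect read `x[col_idx[k]]` is the kind of irregular memory access that tends to make such kernels sensitive to memory latency.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect, data-dependent access into the dense vector x.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 matrix: [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
print(y)  # [5.0, 6.0, 19.0]
```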

    Reordering matrices for optimal sparse matrix bipartitioning

    Sparse matrix-vector multiplication (SpMV) is one of the most widely used and extensively studied kernels in today’s scientific computing and high-performance computing domains. The efficiency and scalability of this kernel have been extensively investigated on single-core, multi-core, and many-core processors, on accelerators, and on distributed memory. In general, a good mapping of an application’s tasks to the processing units in a distributed environment is important, since communication among these tasks is the main bottleneck for scalability. A fundamental approach to this problem is to model the application as a graph or hypergraph and partition it. For SpMV, several graph and hypergraph models have been proposed. These approaches treat the problem as a balanced partitioning problem in which the vertices (tasks) are partitioned (assigned) to the parts (processors) so that the total vertex weight (processor load) is balanced and the total communication incurred among the processors is minimized. The partitioning problem is NP-hard, and all existing studies and tools use heuristics to solve it. For graphs, the literature on optimal partitioning contains a number of notable studies; for hypergraphs, however, very little work has been done. This is unfortunate, since it has been shown that, unlike graphs, hypergraphs can exactly model the total communication for SpMV. Recently, Pelt and Bisseling proposed a novel, purely combinatorial branch-and-bound approach to the sparse-matrix bipartitioning problem, which can tackle relatively large hypergraphs that previous methods could not optimally partition into two. This work can be considered an extension of their approach with two ideas. We propose to use: 1) matrix ordering techniques to exploit more information in the earlier branches of the tree, and 2) a machine learning approach to choose the best ordering based on matrix features. As our experiments on various matrices will show, these enhancements make the optimal bipartitioning process much faster.
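
The objective being minimized can be made concrete with a small Python sketch. It assumes a fine-grain formulation in which every nonzero is assigned to one of two parts and each row or column whose nonzeros are split across both parts contributes one unit of communication; the function names and the example matrix are hypothetical, and the sketch only evaluates a given bipartitioning rather than searching for an optimal one.

```python
def communication_volume(nonzeros, part):
    """Count communication for an assignment of nonzeros to parts.

    nonzeros: list of (row, col) positions.
    part:     list of part ids (0 or 1), one per nonzero.
    A row or column touched by p distinct parts contributes p - 1.
    """
    rows, cols = {}, {}
    for (i, j), p in zip(nonzeros, part):
        rows.setdefault(i, set()).add(p)
        cols.setdefault(j, set()).add(p)
    return (sum(len(s) - 1 for s in rows.values())
            + sum(len(s) - 1 for s in cols.values()))


def part_sizes(part):
    """Number of nonzeros per part, used to check load balance."""
    return [part.count(0), part.count(1)]


# Illustrative 3x3 pattern with five nonzeros.
nonzeros = [(0, 0), (0, 1), (1, 1), (2, 0), (2, 2)]
part = [0, 0, 1, 1, 1]
volume = communication_volume(nonzeros, part)
sizes = part_sizes(part)
print(volume, sizes)  # 2 [2, 3]
```

Here columns 0 and 1 each hold nonzeros from both parts, so the volume is 2, while no row is split.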

    Non-convex Optimization for Machine Learning

    A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed, or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks. The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but such problems are often NP-hard to solve. A popular workaround has been to relax non-convex problems to convex ones and use traditional methods to solve the relaxed (convex) optimization problems. However, this approach may be lossy and nevertheless presents significant challenges for large-scale optimization. On the other hand, direct approaches to non-convex optimization have met with resounding success in several domains and remain the methods of choice for the practitioner, as they frequently outperform relaxation-based techniques; popular heuristics include projected gradient descent and alternating minimization. However, these heuristics are often poorly understood in terms of their convergence and other properties. This monograph presents a selection of recent advances that bridge a long-standing gap in our understanding of these heuristics. The monograph will lead the reader through several widely used non-convex optimization techniques, as well as applications thereof. The goal of this monograph is both to introduce the rich literature in this area and to equip the reader with the tools and techniques needed to analyze these simple procedures for non-convex problems.
    Comment: The official publication is available from now publishers via http://dx.doi.org/10.1561/220000005
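
As an illustration of one heuristic named above, the following Python sketch applies projected gradient descent to a sparsity-constrained least-squares problem, where the projection keeps the s largest-magnitude entries (often called hard thresholding). The objective, step size, function names, and the toy identity-matrix example are assumptions chosen for illustration and are not taken from the monograph.

```python
def hard_threshold(v, s):
    """Project v onto the (non-convex) set of vectors with at most s nonzeros."""
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:s]
    out = [0.0] * len(v)
    for i in keep:
        out[i] = v[i]
    return out


def projected_gradient_descent(A, b, s, step, iters):
    """Minimize 0.5 * ||A x - b||^2 subject to x having at most s nonzeros."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(A[r][c] * x[c] for c in range(n)) - b[r] for r in range(m)]
        grad = [sum(A[r][c] * residual[r] for r in range(m)) for c in range(n)]
        # Gradient step followed by projection onto the sparse set.
        x = hard_threshold([x[c] - step * grad[c] for c in range(n)], s)
    return x


# Toy problem: A is the identity, so the best 2-sparse fit keeps
# the two largest entries of b.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.1, -2.0]
x = projected_gradient_descent(A, b, s=2, step=1.0, iters=5)
print(x)  # [3.0, 0.0, -2.0]
```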

    User-Assisted Similarity Estimation for Searching Related Web Pages

    To utilize the similarity information hidden in the Web graph, we investigate the problem of adaptively retrieving related Web pages with user assistance. Given a definition of similarity between pages, it is intuitive to assume that similarity propagates from page to page, inducing an implicit topical relatedness between pages. In this paper, we extract connected subgraphs from the graph that consists of all pairs of pages whose similarity scores are above a given threshold, and then sort the candidate related pages by a novel rank measure based on the combination distances of a flexible hierarchical clustering. Moreover, because similarity values are subjective, we dynamically supply the ordered list of related pages according to a parameter adjusted by users. We show that our approach effectively handles a set of pages originating from three related categories of Web hierarchies, such as Google Directory. Experiments with three similarity measures demonstrate that using in-link information is favorable, while using a combined measure of in-links and out-links lowers the precision of identifying similar pages.
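
The subgraph extraction step can be sketched in Python as follows. This is a minimal illustration that assumes pairwise similarity scores are supplied as a dictionary keyed by page pairs; the function name, the page identifiers, and the scores are hypothetical, and the ranking by hierarchical clustering described in the abstract is not reproduced.

```python
from collections import defaultdict


def related_components(similarity, threshold):
    """Group pages into connected subgraphs of the thresholded similarity graph.

    similarity: dict mapping (page_u, page_v) to a similarity score.
    Only pairs scoring strictly above threshold become edges.
    """
    graph = defaultdict(set)
    for (u, v), score in similarity.items():
        if score > threshold:
            graph[u].add(v)
            graph[v].add(u)

    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collects one connected subgraph.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components


# Hypothetical scores; the c-d pair falls below the threshold.
similarity = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.2, ("d", "e"): 0.7}
groups = related_components(similarity, threshold=0.5)
print(groups)  # [['a', 'b', 'c'], ['d', 'e']]
```

Raising or lowering `threshold` plays the role of the user-adjusted parameter in this sketch: a lower value merges the two groups into one.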

    Learning Collective Behavior in Multi-relational Networks

    With the rapid expansion of the Internet and the WWW, the problem of analyzing social media data has received an increasing amount of attention in the past decade. The boom in social media platforms offers many possibilities to study human collective behavior and interactions on an unprecedented scale. In the past, much work has been done on the problem of learning from networked data with homogeneous topologies, where instances are explicitly or implicitly interconnected by a single type of relationship. In contrast to traditional content-only classification methods, relational learning succeeds in improving classification performance by leveraging the correlation of the labels between linked instances. However, networked data extracted from social media, web pages, and bibliographic databases can contain entities of multiple classes that are linked for various causal reasons; hence, treating all links in a homogeneous way can limit the performance of relational classifiers. Learning the collective behavior and interactions in heterogeneous networks becomes much more complex. The contributions of this dissertation include 1) two classification frameworks for identifying human collective behavior in multi-relational social networks; and 2) unsupervised and supervised learning models for relationship prediction in multi-relational collaborative networks. Our methods improve the performance of homogeneous predictive models by differentiating heterogeneous relations and capturing the prominent interaction patterns underlying the network structure. The work has been evaluated on various real-world social networks. We believe that this study will be useful for analyzing human collective behavior and interactions, specifically when the heterogeneous relationships in the network arise from various causal reasons.

    Exploiting Latent Features of Text and Graphs

    As the size and scope of online data continue to grow, new machine learning techniques become necessary to best capitalize on the wealth of available information. However, the models that help convert data into knowledge require nontrivial processes to make sense of large collections of text and massive online graphs. In both scenarios, modern machine learning pipelines produce embeddings (semantically rich vectors of latent features) to convert human constructs for machine understanding. In this dissertation we focus on information available within biomedical science, including human-written abstracts of scientific papers as well as machine-generated graphs of biomedical entity relationships. We present the Moliere system and our method for identifying new discoveries through the use of natural language processing and graph mining algorithms. We propose heuristically based ranking criteria to augment Moliere, and leverage this ranking to identify a new gene-treatment target for HIV-associated Neurodegenerative Disorders. We additionally focus on the latent features of graphs and propose a new bipartite graph embedding technique. Using our graph embedding, we advance the state of the art in hypergraph partitioning quality. With this newfound intuition about graph embeddings, we present Agatha, a deep-learning approach to hypothesis generation. This system learns a data-driven ranking criterion derived from the embeddings of our large proposed biomedical semantic graph. To produce human-readable results, we additionally propose CBAG, a technique for conditional biomedical abstract generation.

    Compact and efficient representations of graphs

    In this thesis we study the problem of creating compact and efficient representations of graphs. We propose new data structures to store and query graph data from diverse domains, paying special attention to the design of efficient solutions for attributed and RDF graphs. We have designed a new tool to generate graphs from heterogeneous data sources through a rule definition system. It is a general-purpose solution that, to the best of our knowledge, is the first with these characteristics. Another contribution of this work is a very compact representation for attributed graphs that provides efficient access to the properties and links of the graph. We also study the problem of graph distribution in a parallel environment using compact structures, proposing nine different alternatives that are experimentally compared. We also propose a novel RDF indexing technique that supports basic SPARQL query resolution in compressed space. Finally, we present a new compact structure to store ternary relationships whose design is focused on the efficient representation of RDF data. All of these proposals were experimentally evaluated with widely accepted datasets, obtaining competitive results compared with other state-of-the-art alternatives.

    Multilinear algebra for analyzing data with multiple linkages

