
    Sparse Allreduce: Efficient Scalable Communication for Power-Law Data

    Many large datasets exhibit power-law statistics: the web graph, social networks, text data, click-through data, etc. Their adjacency graphs are termed natural graphs, and are known to be difficult to partition. As a consequence, most distributed algorithms on these graphs are communication-intensive. Many algorithms on natural graphs involve an Allreduce: a sum or average of partitioned data which is then shared back to the cluster nodes. Examples include PageRank, spectral partitioning, and many machine learning algorithms including regression, factor (topic) models, and clustering. In this paper we describe an efficient and scalable Allreduce primitive for power-law data. We point out scaling problems with existing butterfly and round-robin networks for Sparse Allreduce, and show that a hybrid approach improves on both. Furthermore, we show that Sparse Allreduce stages should be nested instead of cascaded (as in the dense case), and that the optimum-throughput Allreduce network should be a butterfly of heterogeneous degree, where degree decreases with depth into the network. Finally, a simple replication scheme is introduced to deal with node failures. We present experiments showing significant improvements over existing systems such as PowerGraph and Hadoop.
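    To make the primitive concrete, here is a minimal single-stage sketch in Python of the reduce-scatter/allgather pattern behind a Sparse Allreduce, with sparse vectors held as dicts. This illustrates the general idea only, not the paper's hybrid, nested, heterogeneous-degree network; the `owner` partitioning helper is an assumption of this sketch.

```python
# Minimal sketch of a one-stage Sparse Allreduce (illustrative only):
# each node holds a sparse vector as a dict {index: value}; a
# reduce-scatter sums each key range at its owning node, then an
# allgather shares the reduced partitions back to every node.

def owner(index, num_nodes, dim):
    # Hypothetical partitioner: assign each index to one owning node.
    return min(index * num_nodes // dim, num_nodes - 1)

def sparse_allreduce(vectors, dim):
    num_nodes = len(vectors)
    # Reduce-scatter: every node's entries are summed at the owner.
    partitions = [dict() for _ in range(num_nodes)]
    for vec in vectors:
        for idx, val in vec.items():
            part = partitions[owner(idx, num_nodes, dim)]
            part[idx] = part.get(idx, 0.0) + val
    # Allgather: each node receives every reduced partition.
    total = {}
    for part in partitions:
        total.update(part)
    return [dict(total) for _ in range(num_nodes)]

if __name__ == "__main__":
    # Three nodes with sparse gradients over a dim-8 index space.
    vecs = [{0: 1.0, 3: 2.0}, {0: 0.5, 7: 1.0}, {3: -1.0, 5: 4.0}]
    print(sparse_allreduce(vecs, dim=8)[0])
    # -> {0: 1.5, 3: 1.0, 5: 4.0, 7: 1.0}
```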

    Active Sampling of Pairs and Points for Large-scale Linear Bipartite Ranking

    Bipartite ranking is a fundamental ranking problem that learns to order relevant instances ahead of irrelevant ones. The pair-wise approach for bipartite ranking constructs a quadratic number of pairs to solve the problem, which is infeasible for large-scale data sets. The point-wise approach, albeit more efficient, often results in inferior performance. That is, it is difficult to conduct bipartite ranking accurately and efficiently at the same time. In this paper, we develop a novel active sampling scheme within the pair-wise approach to conduct bipartite ranking efficiently. The scheme is inspired by active learning and can reach a competitive ranking performance while focusing only on a small subset of the many pairs during training. Moreover, we propose a general Combined Ranking and Classification (CRC) framework to accurately conduct bipartite ranking. The framework unifies point-wise and pair-wise approaches and is simply based on the idea of treating each instance point as a pseudo-pair. Experiments on 14 real-world large-scale data sets demonstrate that the proposed algorithm of Active Sampling within CRC, when coupled with a linear Support Vector Machine, usually outperforms state-of-the-art point-wise and pair-wise ranking approaches in terms of both accuracy and efficiency. Comment: a shorter version was presented at ACML 201
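    As a rough illustration of the CRC idea of treating each instance as a pseudo-pair, the sketch below mixes the original points with sampled relevant-minus-irrelevant difference vectors and trains a linear SVM on both. The uniform sampling scheme and `n_pairs` value are illustrative assumptions, not the authors' active sampling strategy.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def crc_training_set(X, y, n_pairs=200):
    """Combined Ranking and Classification (rough sketch).

    Point-wise part: each instance x acts as a 'pseudo-pair' (x - 0, y).
    Pair-wise part: sampled differences x_pos - x_neg labelled +1
    (and the reversed differences labelled -1).
    """
    pos, neg = X[y == 1], X[y == -1]
    i = rng.integers(0, len(pos), n_pairs)
    j = rng.integers(0, len(neg), n_pairs)
    diffs = pos[i] - neg[j]
    X_crc = np.vstack([X, diffs, -diffs])
    y_crc = np.concatenate([y, np.ones(n_pairs), -np.ones(n_pairs)])
    return X_crc, y_crc

# Toy bipartite-ranking data: relevant (+1) vs irrelevant (-1) points.
X = rng.normal(size=(500, 10))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)
Xc, yc = crc_training_set(X, y)
clf = LinearSVC(C=1.0).fit(Xc, yc)
scores = clf.decision_function(X)          # ranking scores
print("top-5 scored instances:", np.argsort(-scores)[:5])
```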

    Positive Definite Kernels in Machine Learning

    This survey is an introduction to positive definite kernels and the set of methods they have inspired in the machine learning literature, namely kernel methods. We first discuss some properties of positive definite kernels as well as reproducing kernel Hilbert spaces, the natural extension of the set of functions {k(x,·), x ∈ X} associated with a kernel k defined on a space X. We discuss at length the construction of kernel functions that take advantage of well-known statistical models. We provide an overview of numerous data-analysis methods which take advantage of reproducing kernel Hilbert spaces and discuss the idea of combining several kernels to improve the performance on certain tasks. We also provide a short cookbook of different kernels which are particularly useful for certain data types such as images, graphs or speech segments. Comment: draft; corrected a typo in figure
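    A small sketch of the survey's central objects, under the standard definitions: two classic positive definite kernels, a weighted combination (sums and products of p.d. kernels remain p.d.), and an empirical positive semidefiniteness check on the Gram matrix.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    # Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2),
    # a classic positive definite kernel.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def poly(X, Y, degree=2, c=1.0):
    # Polynomial kernel k(x, y) = (<x, y> + c)^degree, also p.d.
    return (X @ Y.T + c) ** degree

X = np.random.default_rng(0).normal(size=(50, 3))

# Nonnegative combinations of p.d. kernels are again p.d., so
# kernels can be mixed to combine notions of similarity.
K = 0.7 * rbf(X, X) + 0.3 * poly(X, X)

# Sanity check: the Gram matrix of a p.d. kernel is positive
# semidefinite (eigenvalues >= 0 up to round-off).
print(np.linalg.eigvalsh(K).min() >= -1e-8)  # True
```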

    Training Support Vector Machines Using Frank-Wolfe Optimization Methods

    Training a Support Vector Machine (SVM) requires the solution of a quadratic programming problem (QP) whose computational complexity becomes prohibitively expensive for large-scale datasets. Traditional optimization methods cannot be directly applied in these cases, mainly due to memory restrictions. By adopting a slightly different objective function and under mild conditions on the kernel used within the model, efficient algorithms to train SVMs have been devised under the name of Core Vector Machines (CVMs). This framework exploits the equivalence of the resulting learning problem with the task of building a Minimal Enclosing Ball (MEB) in a feature space, where data is implicitly embedded by a kernel function. In this paper, we improve on the CVM approach by proposing two novel methods to build SVMs based on the Frank-Wolfe algorithm, recently revisited as a fast method to approximate the solution of a MEB problem. In contrast to CVMs, our algorithms do not require computing the solutions of a sequence of increasingly complex QPs and are defined using only analytic optimization steps. Experiments on a large collection of datasets show that our methods scale better than CVMs in most cases, sometimes at the price of a slightly lower accuracy. Like CVMs, the proposed methods can be easily extended to machine learning problems other than binary classification. Moreover, effective classifiers are also obtained using kernels which do not satisfy the condition required by CVMs, and can thus be used for a wider set of problems.
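    A minimal sketch of the Frank-Wolfe iteration on the MEB dual, assuming the standard formulation max_a a'diag(K) - a'Ka over the unit simplex; it illustrates the kind of analytic optimization steps involved, not the paper's two specific algorithms.

```python
import numpy as np

def fw_meb(K, iters=200):
    """Frank-Wolfe on the MEB dual (sketch, not the paper's variants).

    Maximizes f(a) = a.diag(K) - a.K.a over the unit simplex; the
    optimizer a gives the ball center sum_i a_i * phi(x_i) in feature space.
    """
    n = K.shape[0]
    a = np.full(n, 1.0 / n)            # start at the simplex center
    dK = np.diag(K)
    for k in range(iters):
        grad = dK - 2.0 * K @ a        # gradient of the dual objective
        i = int(np.argmax(grad))       # linear maximizer: vertex e_i
        gamma = 2.0 / (k + 2.0)        # classic diminishing step size
        a *= (1.0 - gamma)             # analytic convex-combination step
        a[i] += gamma
    return a

# Toy example with a linear kernel: MEB of points in the plane.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
K = X @ X.T
a = fw_meb(K)
center = a @ X                         # center = sum_i a_i x_i
radius = np.sqrt(np.max(((X - center) ** 2).sum(1)))
print(center, radius)
```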

    Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus

    In Web search, entity-seeking queries often trigger a special Question Answering (QA) system. It may use a parser to interpret the question as a structured query, execute that on a knowledge graph (KG), and return direct entity responses. QA systems based on precise parsing tend to be brittle: minor syntax variations may dramatically change the response. Moreover, KG coverage is patchy. At the other extreme, a large corpus may provide broader coverage, but in an unstructured, unreliable form. We present AQQUCN, a QA system that gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of query syntax, from well-formed questions to short `telegraphic' keyword sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals from KGs and large corpora to directly rank KG entities, rather than commit to one semantic interpretation of the query. AQQUCN models the ideal interpretation as an unobservable or latent variable. Interpretations and candidate entity responses are scored as pairs, by combining signals from multiple convolutional networks that operate collectively on the query, KG and corpus. On four public query workloads, amounting to over 8,000 queries with diverse query syntax, we see 5-16% absolute improvement in mean average precision (MAP) compared to the entity ranking performance of recent systems. Our system is also competitive at entity set retrieval, almost doubling F1 scores for challenging short queries. Comment: Accepted to Information Retrieval Journal
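    The latent-interpretation idea can be caricatured in a few lines: score (interpretation, entity) pairs and aggregate over interpretations instead of committing to one. Everything below, from the scorer to the toy data, is a hypothetical stand-in; AQQUCN's real pair scores come from convolutional networks over the query, KG and corpus.

```python
# Sketch: rank entities while treating the query interpretation as a
# latent variable, aggregating pair scores over all interpretations.

import math

def rank_entities(interpretations, entities, pair_score):
    """Score each entity by a softmax-weighted sum over interpretations
    instead of committing to a single semantic interpretation."""
    weights = [math.exp(w) for _, w in interpretations]
    z = sum(weights)
    ranked = []
    for e in entities:
        s = sum(w / z * pair_score(i, e)
                for (i, _), w in zip(interpretations, weights))
        ranked.append((s, e))
    return sorted(ranked, reverse=True)

# Hypothetical toy data for the query "woodman of oz".
interps = [("character in The Wizard of Oz", 1.2),
           ("lumberjack in Australia", -0.3)]
ents = ["Tin Woodman", "Oz (TV series)", "Jack Thompson"]
toy = {("character in The Wizard of Oz", "Tin Woodman"): 2.0}
print(rank_entities(interps, ents, lambda i, e: toy.get((i, e), 0.0)))
```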

    Antifragility = Elasticity + Resilience + Machine Learning: Models and Algorithms for Open System Fidelity

    We introduce a model of the fidelity of open systems, fidelity being interpreted here as the compliance between corresponding figures of interest in two separate but communicating domains. A special case of fidelity is given by real-timeliness and synchrony, in which the figures of interest are physical time and the system's notion of time. Our model covers two orthogonal aspects of fidelity, the first one focusing on a system's steady state and the second one capturing that system's dynamic and behavioural characteristics. We discuss how the two aspects correspond respectively to elasticity and resilience, and we highlight each aspect's qualities and limitations. We then sketch the elements of a new model coupling both of the first model's aspects and complementing them with machine learning. Finally, a conjecture is put forward that the new model may represent a first step towards compositional criteria for antifragile systems. Comment: Preliminary version submitted to the 1st International Workshop "From Dependable to Resilient, from Resilient to Antifragile Ambients and Systems" (ANTIFRAGILE 2014), https://sites.google.com/site/resilience2antifragile

    Supervised Random Walks: Predicting and Recommending Links in Social Networks

    Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future, or which existing interactions we are missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node and edge level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.
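    A sketch of the scoring side of this approach, assuming a logistic edge-strength function over edge features and a random walk with restart. The feature values and weight vector below are made up, and learning w (by minimizing a ranking loss over known future links) is omitted.

```python
import numpy as np

def srw_scores(edge_feats, w, source, n, alpha=0.15, iters=100):
    """Random walk with restart where edge strengths come from a weight
    vector w over edge features (scoring pass only; training w is the
    supervised part and is not shown here)."""
    # Edge strength a_uv = logistic(w . features(u, v)).
    A = np.zeros((n, n))
    for (u, v), f in edge_feats.items():
        A[u, v] = 1.0 / (1.0 + np.exp(-w @ f))
    # Row-normalize strengths into transition probabilities.
    rows = A.sum(1, keepdims=True)
    P = np.divide(A, rows, out=np.zeros_like(A), where=rows > 0)
    p = np.zeros(n)
    p[source] = 1.0
    r = p.copy()
    for _ in range(iters):
        r = alpha * p + (1 - alpha) * (P.T @ r)   # walk with restart
    return r                                      # visit scores per node

# Toy graph: edge features are [common_friends, messages_exchanged].
feats = {(0, 1): np.array([3.0, 1.0]), (0, 2): np.array([0.0, 0.0]),
         (1, 2): np.array([5.0, 2.0]), (2, 3): np.array([1.0, 0.0])}
w = np.array([0.8, 0.4])                  # pretend these were learned
scores = srw_scores(feats, w, source=0, n=4)
print(np.argsort(-scores))                # candidate ranking from node 0
```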

    Analysis of a Custom Support Vector Machine for Photometric Redshift Estimation and the Inclusion of Galaxy Shape Information

    Aims: We present a custom support vector machine classification package for photometric redshift estimation, including comparisons with other methods. We also explore the efficacy of including galaxy shape information in redshift estimation. Support vector machines, a type of machine learning, utilize optimization theory and supervised learning algorithms to construct predictive models based on the information content of data in a way that can treat different input features symmetrically. Methods: The custom support vector machine package we have developed is designated SPIDERz and made available to the community. As test data for evaluating performance and comparison with other methods, we apply SPIDERz to four distinct data sets: 1) the publicly available portion of the PHAT-1 catalog based on the GOODS-N field with spectroscopic redshifts in the range z < 3.6, 2) 14,365 galaxies from the COSMOS bright survey with photometric band magnitudes, morphology, and spectroscopic redshifts inside z < 1.4, 3) 3,048 galaxies from the overlap of COSMOS photometry and morphology with 3D-HST spectroscopy extending to z < 3.9, and 4) 2,612 galaxies with five-band photometric magnitudes and morphology from the All-wavelength Extended Groth Strip International Survey (AEGIS), with z < 1.57. Results: We find that SPIDERz achieves results competitive with other empirical packages on the PHAT-1 data, and performs quite well in estimating redshifts with the COSMOS and AEGIS data, including in the cases of a large redshift range (0 < z < 3.9). We also determine from analyses with both the COSMOS and AEGIS data that the inclusion of morphological information does not have a statistically significant benefit for photometric redshift estimation with the techniques employed here. Comment: Submitted to A&A, 11 pages, 10 figures, 1 table, updated to version in revision
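    The classification framing of photo-z estimation is easy to sketch: bin spectroscopic redshifts into discrete classes and train an SVM on band magnitudes. The synthetic catalog, bin width, and hyperparameters below are purely illustrative and not the SPIDERz pipeline.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Synthetic stand-in for a photometric catalog: 5 band magnitudes per
# galaxy, with a crude built-in magnitude-redshift correlation.
n = 2000
z_spec = rng.uniform(0.0, 1.4, n)
mags = 22.0 + 2.0 * z_spec[:, None] + rng.normal(0, 0.3, (n, 5))

# Classification approach to photo-z: bin redshift into discrete
# classes (the 0.07 bin width is an arbitrary illustrative choice).
bins = np.arange(0.0, 1.47, 0.07)
z_class = np.digitize(z_spec, bins)

train, test = slice(0, 1500), slice(1500, None)
svm = SVC(kernel="rbf", C=10.0, gamma="scale")
svm.fit(mags[train], z_class[train])

# Map predicted class back to its bin center as the redshift estimate.
z_pred = bins[svm.predict(mags[test]) - 1] + 0.035
scatter = np.std((z_pred - z_spec[test]) / (1 + z_spec[test]))
print(f"normalized scatter: {scatter:.3f}")
```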