6,723 research outputs found

    Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering

    Full text link
    User information needs vary significantly across different tasks, and therefore their queries will also differ considerably in their expressiveness and semantics. Many studies have been proposed to model such query diversity by obtaining query types and building query-dependent ranking models. These studies typically require either a labeled query dataset or clicks from multiple users aggregated over the same document. These techniques, however, are not applicable when manual query labeling is not viable, and aggregated clicks are unavailable due to the private nature of the document collection, e.g., in email search scenarios. In this paper, we study how to obtain query type in an unsupervised fashion and how to incorporate this information into query-dependent ranking models. We first develop a hierarchical clustering algorithm based on truncated SVD and varimax rotation to obtain coarse-to-fine query types. Then, we study three query-dependent ranking models, including two neural models that leverage query type information as additional features, and one novel multi-task neural model that views query type as the label for the auxiliary query cluster prediction task. This multi-task model is trained to simultaneously rank documents and predict query types. Our experiments on tens of millions of real-world email search queries demonstrate that the proposed multi-task model can significantly outperform the baseline neural ranking models, which either do not incorporate query type information or just simply feed query type as an additional feature.Comment: CIKM 201

    DeepWalk: Online Learning of Social Representations

    Full text link
    We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1F_1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification, and anomaly detection.Comment: 10 pages, 5 figures, 4 table

    From Frequency to Meaning: Vector Space Models of Semantics

    Full text link
    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field

    Enabling High-Dimensional Hierarchical Uncertainty Quantification by ANOVA and Tensor-Train Decomposition

    Get PDF
    Hierarchical uncertainty quantification can reduce the computational cost of stochastic circuit simulation by employing spectral methods at different levels. This paper presents an efficient framework to simulate hierarchically some challenging stochastic circuits/systems that include high-dimensional subsystems. Due to the high parameter dimensionality, it is challenging to both extract surrogate models at the low level of the design hierarchy and to handle them in the high-level simulation. In this paper, we develop an efficient ANOVA-based stochastic circuit/MEMS simulator to extract efficiently the surrogate models at the low level. In order to avoid the curse of dimensionality, we employ tensor-train decomposition at the high level to construct the basis functions and Gauss quadrature points. As a demonstration, we verify our algorithm on a stochastic oscillator with four MEMS capacitors and 184 random parameters. This challenging example is simulated efficiently by our simulator at the cost of only 10 minutes in MATLAB on a regular personal computer.Comment: 14 pages (IEEE double column), 11 figure, accepted by IEEE Trans CAD of Integrated Circuits and System

    Similarity-Aware Spectral Sparsification by Edge Filtering

    Full text link
    In recent years, spectral graph sparsification techniques that can compute ultra-sparse graph proxies have been extensively studied for accelerating various numerical and graph-related applications. Prior nearly-linear-time spectral sparsification methods first extract low-stretch spanning tree from the original graph to form the backbone of the sparsifier, and then recover small portions of spectrally-critical off-tree edges to the spanning tree to significantly improve the approximation quality. However, it is not clear how many off-tree edges should be recovered for achieving a desired spectral similarity level within the sparsifier. Motivated by recent graph signal processing techniques, this paper proposes a similarity-aware spectral graph sparsification framework that leverages efficient spectral off-tree edge embedding and filtering schemes to construct spectral sparsifiers with guaranteed spectral similarity (relative condition number) level. An iterative graph densification scheme is introduced to facilitate efficient and effective filtering of off-tree edges for highly ill-conditioned problems. The proposed method has been validated using various kinds of graphs obtained from public domain sparse matrix collections relevant to VLSI CAD, finite element analysis, as well as social and data networks frequently studied in many machine learning and data mining applications

    Efficient Optimization of Loops and Limits with Randomized Telescoping Sums

    Full text link
    We consider optimization problems in which the objective requires an inner loop with many steps or is the limit of a sequence of increasingly costly approximations. Meta-learning, training recurrent neural networks, and optimization of the solutions to differential equations are all examples of optimization problems with this character. In such problems, it can be expensive to compute the objective function value and its gradient, but truncating the loop or using less accurate approximations can induce biases that damage the overall solution. We propose randomized telescope (RT) gradient estimators, which represent the objective as the sum of a telescoping series and sample linear combinations of terms to provide cheap unbiased gradient estimates. We identify conditions under which RT estimators achieve optimization convergence rates independent of the length of the loop or the required accuracy of the approximation. We also derive a method for tuning RT estimators online to maximize a lower bound on the expected decrease in loss per unit of computation. We evaluate our adaptive RT estimators on a range of applications including meta-optimization of learning rates, variational inference of ODE parameters, and training an LSTM to model long sequences
    corecore