2,216 research outputs found

    Explaining Latent Factor Models for Recommendation with Influence Functions

    Latent factor models (LFMs) such as matrix factorization achieve state-of-the-art performance among Collaborative Filtering (CF) approaches for recommendation. Despite their high recommendation accuracy, LFMs lack explainability, a critical issue that remains to be resolved. Extensive efforts have been made in the literature to incorporate explainability into LFMs. However, they either rely on auxiliary information which may not be available in practice, or fail to provide easy-to-understand explanations. In this paper, we propose a fast influence analysis method named FIA, which equips LFMs with explicit neighbor-style explanations using influence functions, a technique stemming from robust statistics. We first describe how to apply influence functions to LFMs to deliver neighbor-style explanations. Then we develop a novel, highly efficient influence computation algorithm for matrix factorization. We further extend it to the more general neural collaborative filtering and introduce an approximation algorithm to accelerate influence analysis over neural network models. Experimental results on real datasets demonstrate the correctness, efficiency and usefulness of our proposed method.

    Towards making NLG a voice for interpretable Machine Learning

    I would like to acknowledge the support given to me by the Engineering and Physical Sciences Research Council (EPSRC) DTP grant number EP/N509814/1.

    Are contrastive explanations useful?

    Funding Information: Supported by EPSRC DTP Grant Number EP/N509814/1. Peer reviewed.

    Capacity-achieving Polar-based LDGM Codes

    In this paper, we study codes with sparse generator matrices. More specifically, low-density generator matrix (LDGM) codes with a certain constraint on the weight of all the columns in the generator matrix are considered. It is first shown that when a binary-input memoryless symmetric (BMS) channel $W$ and a constant $s>0$ are given, there exists a polarization kernel such that the corresponding polar code is capacity-achieving and the column weights of the generator matrices are bounded from above by $N^s$. Then, a general construction based on a concatenation of polar codes and a rate-$1$ code, and a new column-splitting algorithm that guarantees a much sparser generator matrix, are given. More specifically, for any BMS channel and any $\epsilon > 2\epsilon^*$, where $\epsilon^* \approx 0.085$, the existence of a sequence of capacity-achieving codes with all the column weights of the generator matrix upper bounded by $(\log N)^{1+\epsilon}$ is shown. Furthermore, coding schemes for BEC and BMS channels, based on a second column-splitting algorithm, are devised with low-complexity decoding that uses successive cancellation. The second splitting algorithm allows for the use of a low-complexity decoder by preserving the reliability of the bit-channels observed by the source bits, and by increasing the code block length. In particular, for any BEC and any $\lambda > \lambda^* = 0.5+\epsilon^*$, the existence of a sequence of capacity-achieving codes where all the column weights of the generator matrix are bounded from above by $(\log N)^{2\lambda}$ and with decoding complexity $O(N\log \log N)$ is shown. The existence of similar capacity-achieving LDGM codes with low-complexity decoding is shown for any BMS channel, and for any $\lambda > \lambda^{\dagger} \approx 0.631$. Comment: arXiv admin note: text overlap with arXiv:2001.1198

    High throughput protein-protein interaction data: clues for the architecture of protein complexes

    Background: High-throughput techniques are becoming widely used to study protein-protein interactions and protein complexes on a proteome-wide scale. Here we have explored the potential of these techniques to accurately determine the constituent proteins of complexes and their architecture within the complex. Results: Two-dimensional representations of the 19S and 20S proteasome, mediator, and SAGA complexes were generated and overlaid with high quality pairwise interaction data, core-module-attachment classifications from affinity purifications of complexes, and predicted domain-domain interactions. Pairwise interaction data could accurately determine the members of each complex, but were unexpectedly poor at deciphering the topology of proteins in complexes. Core and module data from affinity purification studies were less useful for accurately defining the member proteins of these complexes. However, these data gave strong information on the spatial proximity of many proteins. Predicted domain-domain interactions provided some insight into the topology of proteins within complexes, but were affected by a lack of available structural data for the co-activator complexes and the presence of shared domains in paralogous proteins. Conclusion: The constituent proteins of complexes are likely to be determined with accuracy by combining data from high-throughput techniques. It will be possible to clearly infer the topology of some proteins in the complexes. We finally suggest strategies that can be employed to use high throughput interaction data to define the membership and understand the architecture of proteins in novel complexes.