    Robust Federated Training via Collaborative Machine Teaching using Trusted Instances

    Federated learning performs distributed model training using local data hosted by agents. It shares only model parameter updates for iterative aggregation at the server. Although it is privacy-preserving by design, federated learning is vulnerable to noise corruption of local agents, as demonstrated in a previous study on the adversarial data poisoning threat against federated learning systems. Even a single noise-corrupted agent can bias the model training. In this work, we propose a collaborative and privacy-preserving machine teaching paradigm with multiple distributed teachers to improve the robustness of the federated training process against local data corruption. We assume that each local agent (teacher) has the resources to verify a small portion of trusted instances, which may not by itself be adequate for learning. In the proposed collaborative machine teaching method, these trusted instances guide the distributed agents to jointly select a compact yet informative training subset from their own locally hosted data. Simultaneously, the agents learn to make changes of limited magnitude to the selected data instances, in order to improve the test performance of the federated model despite the training data corruption. Experiments on toy and real data demonstrate that our approach can identify training set bugs effectively and suggest appropriate changes to the labels. Our algorithm is a step toward trustworthy machine learning.
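
    The claim that a single corrupted agent can bias training can be illustrated with a minimal sketch. It assumes the server aggregates by a plain weighted average of the agents' parameter vectors; the abstract does not specify the aggregation rule, and the name `federated_average` and the toy numbers are hypothetical.

```python
from typing import List


def federated_average(updates: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted average of per-agent parameter vectors (assumed aggregation rule)."""
    total = sum(weights)
    dim = len(updates[0])
    return [
        sum(w * u[j] for u, w in zip(updates, weights)) / total
        for j in range(dim)
    ]


# Hypothetical toy data: three honest agents agree on roughly [1, -1];
# one noise-corrupted agent reports something very different.
honest = [[1.0, -1.0], [1.1, -0.9], [0.9, -1.1]]
corrupted = [[-5.0, 5.0]]

clean_model = federated_average(honest, [1.0] * 3)
biased_model = federated_average(honest + corrupted, [1.0] * 4)
```

    With equal weights, the single corrupted agent moves the aggregate from about [1, -1] to about [-0.5, 0.5], flipping the sign of both parameters.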

    Graph Embedding with Rich Information through Heterogeneous Network

    Graph embedding has attracted increasing attention due to its critical application in social network analysis. Most existing algorithms for graph embedding rely only on the topology information and fail to use the copious information in nodes and edges. As a result, their performance on many tasks may not be satisfactory. In this paper, we propose a novel and general framework of representation learning for graphs with rich text information by constructing a bipartite heterogeneous network. Specifically, we design a biased random walk to explore the constructed heterogeneous network with the notion of a flexible neighborhood. The efficacy of our method is demonstrated by extensive comparison experiments with several baselines on various datasets. It improves the Micro-F1 and Macro-F1 of node classification by 10% and 7% on the Cora dataset. Comment: 9 pages, 7 figures, 4 tables
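
    A biased random walk over a graph plus a node-word bipartite network might look like the sketch below. The abstract does not give the walk's transition rule, so everything here is an assumption for illustration: the function `biased_walk`, the bias parameter `p_text`, and the rule "hop through a shared word with probability `p_text`, otherwise follow a graph edge".

```python
import random
from typing import Dict, List


def biased_walk(
    graph_nbrs: Dict[str, List[str]],
    node_words: Dict[str, List[str]],
    word_nodes: Dict[str, List[str]],
    start: str,
    length: int,
    p_text: float,
    rng: random.Random,
) -> List[str]:
    """Return a walk of up to `length` graph nodes starting at `start`.

    At each step the walker hops through a shared word with probability
    `p_text` (if the current node has any words); otherwise it follows an
    ordinary graph edge. Both rules are illustrative assumptions.
    """
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        words = node_words.get(current, [])
        nbrs = graph_nbrs.get(current, [])
        if words and (not nbrs or rng.random() < p_text):
            word = rng.choice(words)
            walk.append(rng.choice(word_nodes[word]))  # node -> word -> node
        elif nbrs:
            walk.append(rng.choice(nbrs))  # plain structural step
        else:
            break  # isolated node with no text: nothing to explore
    return walk


# Hypothetical toy network: "d" has no graph edges but shares a word with "a" and "c".
graph_nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
node_words = {"a": ["graph"], "c": ["graph"], "d": ["graph"]}
word_nodes = {"graph": ["a", "c", "d"]}
walk = biased_walk(graph_nbrs, node_words, word_nodes, "a", 10, 0.5, random.Random(0))
```

    In this toy network, node "d" is reachable only through the text side, which is the kind of neighborhood a purely topological walk would miss.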

    Coarse Grained Exponential Variational Autoencoders

    Variational autoencoders (VAEs) often use Gaussian or categorical distributions to model the inference process. This limits variational learning because the simplified assumption does not match the true posterior distribution, which is usually much more sophisticated. To break this limitation and apply arbitrary parametric distributions during inference, this paper derives a \emph{semi-continuous} latent representation, which approximates a continuous density up to a prescribed precision and is much easier to analyze than its continuous counterpart because it is fundamentally discrete. We showcase the proposition by applying polynomial exponential family distributions as the posterior, which are universal probability density function generators. Our experimental results show consistent improvements over commonly used VAE models.

    Weakly-paired Cross-Modal Hashing

    Hashing has been widely adopted for large-scale data retrieval in many domains, due to its low storage cost and high retrieval speed. Existing cross-modal hashing methods optimistically assume that the correspondence between training samples across modalities is readily available. This assumption is unrealistic in practical applications. In addition, these methods generally require the same number of samples across different modalities, which restricts their flexibility. We propose a flexible cross-modal hashing approach (FlexCMH) to learn effective hashing codes from weakly-paired data, whose correspondence across modalities is partially (or even totally) unknown. FlexCMH first introduces a clustering-based matching strategy to explore the local structure of each cluster, and thus to find the potential correspondence between clusters (and samples therein) across modalities. To reduce the impact of an incomplete correspondence, it jointly optimizes, in a unified objective function, the potential correspondence, the cross-modal hashing functions derived from the correspondence, and a hashing quantitative loss. An alternating optimization technique is also proposed to coordinate the correspondence and hash functions, and to reinforce the reciprocal effects of the two objectives. Experiments on publicly available multi-modal datasets show that FlexCMH achieves significantly better results than state-of-the-art methods, and that it offers a high degree of flexibility for practical cross-modal hashing tasks.

    Multi-View Multiple Clustering

    Multiple clustering aims at exploring alternative clusterings to organize the data into meaningful groups from different perspectives. Existing multiple clustering algorithms are designed for single-view data. We assume that the individuality and commonality of multi-view data can be leveraged to generate high-quality and diverse clusterings. To this end, we propose a novel multi-view multiple clustering (MVMC) algorithm. MVMC first adapts multi-view self-representation learning to explore the individuality encoding matrices and the shared commonality matrix of multi-view data. It additionally reduces the redundancy (i.e., enhances the individuality) among the matrices using the Hilbert-Schmidt Independence Criterion (HSIC), and collects shared information by forcing the shared matrix to be smooth across all views. It then uses matrix factorization on the individual matrices, along with the shared matrix, to generate diverse, high-quality clusterings. We further extend multiple co-clustering to multi-view data and propose a solution called multi-view multiple co-clustering (MVMCC). Our empirical study shows that MVMC (MVMCC) can exploit multi-view data to generate multiple high-quality and diverse clusterings (co-clusterings), with superior performance to the state-of-the-art methods. Comment: 7 pages, 5 figures, uses ijcai19.st
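
    For readers unfamiliar with HSIC, the sketch below computes the standard biased empirical estimator, trace(K H L H) / (n - 1)^2, where H = I - (1/n) 1 1^T is the centering matrix. The linear kernel and the toy inputs are assumptions made for illustration; the abstract does not say which kernel MVMC uses or how the term enters its objective.

```python
from typing import List

Matrix = List[List[float]]


def linear_kernel(features: Matrix) -> Matrix:
    """Gram matrix K[i][j] = <x_i, x_j> for row-wise samples."""
    return [[sum(a * b for a, b in zip(x, y)) for y in features] for x in features]


def center(k: Matrix) -> Matrix:
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(k)
    row_mean = [sum(row) / n for row in k]
    col_mean = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [k[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(k: Matrix, l: Matrix) -> float:
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2 for symmetric K, L."""
    n = len(k)
    kc = center(k)
    # trace(H K H L) equals the elementwise sum of (H K H) * L when L is symmetric.
    return sum(kc[i][j] * l[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


# Hypothetical toy inputs: one varying feature and one constant feature.
x = [[1.0], [2.0], [3.0]]
constant = [[4.0], [4.0], [4.0]]
dependent = hsic(linear_kernel(x), linear_kernel(x))
independent = hsic(linear_kernel(x), linear_kernel(constant))
```

    A value near zero indicates little measured dependence between two representations, which is why penalizing HSIC between matrices pushes them toward non-redundant structure.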

    Addressing Class-Imbalance Problem in Personalized Ranking

    Pairwise ranking models have been widely used to address recommendation problems. The basic idea is to learn the rank of users' preferred items by separating items into \emph{positive} samples if user-item interactions exist, and \emph{negative} samples otherwise. Due to the limited number of observable interactions, pairwise ranking models face serious \emph{class-imbalance} issues. Our theoretical analysis shows that current sampling-based methods cause a vertex-level imbalance problem, which drives the norm of the learned item embeddings toward infinity after a certain number of training iterations, and consequently results in vanishing gradients and affects the model inference results. We thus propose an efficient \emph{\underline{Vi}tal \underline{N}egative \underline{S}ampler} (VINS) to alleviate the class-imbalance issue for pairwise ranking models, in particular for deep learning models optimized by gradient methods. The core of VINS is a biased sampler with a rejection probability that tends to accept a negative candidate with a larger degree weight than the given positive item. Evaluation results on several real datasets demonstrate that the proposed sampling method speeds up the training procedure by 30\% to 50\% for ranking models ranging from shallow to deep, while maintaining and even improving the quality of ranking results in top-N item recommendation. Comment: Preprint
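
    The rejection idea can be sketched as follows. This is not the VINS algorithm itself: the abstract only says that candidates with a larger degree than the positive item tend to be accepted, so the acceptance rule min(1, degree[j] / degree[pos_item]), the function `sample_negative`, and the `max_tries` cap are all assumptions for illustration.

```python
import random
from typing import Dict, List, Set


def sample_negative(
    pos_item: str,
    user_items: Set[str],
    degree: Dict[str, int],
    all_items: List[str],
    rng: random.Random,
    max_tries: int = 50,
) -> str:
    """Draw a non-interacted item, favoring items at least as popular as `pos_item`.

    Assumed acceptance rule: accept candidate j with probability
    min(1, degree[j] / degree[pos_item]).
    """
    fallback = None
    for _ in range(max_tries):
        candidate = rng.choice(all_items)
        if candidate in user_items:
            continue  # observed interaction, so not a negative
        fallback = candidate
        accept_prob = min(1.0, degree[candidate] / degree[pos_item])
        if rng.random() < accept_prob:
            return candidate
    if fallback is None:
        # No negative was drawn at all; pick any non-interacted item.
        fallback = next(i for i in all_items if i not in user_items)
    return fallback


# Hypothetical item degrees (interaction counts); the user has interacted with "a" only.
degree = {"a": 10, "b": 1, "c": 20, "d": 5}
negative = sample_negative("a", {"a"}, degree, list(degree), random.Random(0))
```

    Under this rule the popular item "c" is always accepted when drawn, while the rare item "b" is accepted only one time in ten, so popular negatives dominate the samples.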

    GESF: A Universal Discriminative Mapping Mechanism for Graph Representation Learning

    Graph embedding is a central problem in social network analysis and many other applications, aiming to learn a vector representation for each node. While most existing approaches need to specify the neighborhood and the form of dependence on the neighborhood, which may significantly degrade the flexibility of the representation, we propose a novel graph node embedding method (namely GESF) via the set function technique. Our method can 1) learn an arbitrary form of representation function from the neighborhood, 2) automatically decide the significance of neighbors at different distances, and 3) be applied to heterogeneous graph embedding, where the graph may contain multiple types of nodes. A theoretical guarantee for the representation capability of our method is proved for general homogeneous and heterogeneous graphs, and evaluation results on benchmark data sets show that the proposed GESF outperforms the state-of-the-art approaches in producing node vectors for classification tasks. Comment: 18 pages

    Risk Convergence of Centered Kernel Ridge Regression with Large Dimensional Data

    This paper carries out a large dimensional analysis of a variation of kernel ridge regression that we call \emph{centered kernel ridge regression} (CKRR), also known in the literature as kernel ridge regression with offset. This modified technique is obtained by accounting for the bias in the regression problem, resulting in the standard kernel ridge regression but with \emph{centered} kernels. The analysis is carried out under the assumption that the data is drawn from a Gaussian distribution and relies heavily on tools from random matrix theory (RMT). Under the regime in which the data dimension and the training size grow infinitely large with a fixed ratio, and under some mild assumptions controlling the data statistics, we show that both the empirical and the prediction risks converge to deterministic quantities that describe in closed form the performance of CKRR in terms of the data statistics and dimensions. Inspired by this theoretical result, we subsequently build a consistent estimator of the prediction risk based on the training data, which allows optimal tuning of the design parameters. A key insight of the proposed analysis is that asymptotically a large class of kernels achieves the same minimum prediction risk. This insight is validated with both synthetic and real data. Comment: Submitted to IEEE Transactions on Signal Processing

    Tracking Influential Nodes in Time-Decaying Dynamic Interaction Networks

    Identifying influential nodes that can jointly trigger the maximum influence spread in networks is a fundamental problem in many applications such as viral marketing, online advertising, and disease control. Most existing studies assume that social influence is static and fail to capture the dynamics of influence in reality. In this work, we address the dynamic influence challenge by designing efficient streaming methods that can identify influential nodes from highly dynamic node interaction streams. We first propose a general time-decaying dynamic interaction network (TDN) model to represent node interaction streams with the ability to smoothly discard outdated data. Based on the TDN model, we design three algorithms: SieveADN, BasicReduction, and HistApprox. SieveADN efficiently identifies influential nodes from a special kind of TDN. BasicReduction uses SieveADN as a basic building block to identify influential nodes from general TDNs. HistApprox significantly improves the efficiency of BasicReduction. More importantly, we theoretically show that all three algorithms enjoy constant factor approximation guarantees. Experiments conducted on various real interaction datasets demonstrate that our approach finds near-optimal solutions at least 5 to 15 times faster than baseline methods. Comment: 14 pages, 15 figures
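
    One way to picture "smoothly discarding outdated data" is to give every interaction a finite lifetime and remove it when the lifetime runs out. The sketch below does only that; the class `DecayingInteractionGraph`, its methods, and the per-interaction integer lifetime are assumptions for illustration, not the TDN model as defined in the paper.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


class DecayingInteractionGraph:
    """Keeps interactions alive for a limited number of time steps."""

    def __init__(self) -> None:
        self._lifetime: Dict[Edge, int] = {}

    def add(self, src: str, dst: str, lifetime: int) -> None:
        """Record an interaction; a repeated edge keeps the longer lifetime."""
        edge = (src, dst)
        self._lifetime[edge] = max(lifetime, self._lifetime.get(edge, 0))

    def tick(self) -> None:
        """Advance time by one step and drop interactions that have expired."""
        self._lifetime = {
            edge: remaining - 1
            for edge, remaining in self._lifetime.items()
            if remaining > 1
        }

    def edges(self) -> List[Edge]:
        """Interactions that are still alive, in sorted order."""
        return sorted(self._lifetime)


# Hypothetical stream: a short-lived and a longer-lived interaction.
g = DecayingInteractionGraph()
g.add("u", "v", lifetime=1)
g.add("v", "w", lifetime=3)
g.tick()
alive_after_one_step = g.edges()
```

    After one step only the longer-lived interaction ("v", "w") remains, so any influence computation run at that point sees just the recent part of the stream.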

    ActiveHNE: Active Heterogeneous Network Embedding

    Heterogeneous network embedding (HNE) is a challenging task due to the diverse node types and/or diverse relationships between nodes. Existing HNE methods are typically unsupervised. To maximize the benefit of the rare and valuable supervised information in HNE, we develop a novel Active Heterogeneous Network Embedding (ActiveHNE) framework, which includes two components: Discriminative Heterogeneous Network Embedding (DHNE) and Active Query in Heterogeneous Networks (AQHN). In DHNE, we introduce a novel semi-supervised heterogeneous network embedding method based on a graph convolutional neural network. In AQHN, we first introduce three active selection strategies based on uncertainty and representativeness, and then derive a batch selection method that assembles these strategies using a multi-armed bandit mechanism. ActiveHNE aims at improving the performance of HNE by feeding the most valuable supervision obtained by AQHN into DHNE. Experiments on public datasets demonstrate the effectiveness of ActiveHNE and its advantage in reducing the query cost. Comment: Accepted to IJCAI201
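
    Choosing among several selection strategies with a multi-armed bandit can be sketched with the classic UCB1 rule, treating each strategy as one arm. UCB1 is swapped in here purely for illustration; the abstract does not say which bandit algorithm or reward signal AQHN uses, and the functions `pick_strategy` and `update` and the reward values are hypothetical.

```python
import math
from typing import List


def pick_strategy(counts: List[int], reward_sums: List[float]) -> int:
    """UCB1: try every arm once, then pick the best mean plus exploration bonus."""
    for arm, count in enumerate(counts):
        if count == 0:
            return arm
    total = sum(counts)
    scores = [
        reward_sums[arm] / counts[arm] + math.sqrt(2.0 * math.log(total) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def update(counts: List[int], reward_sums: List[float], arm: int, reward: float) -> None:
    """Record the reward observed after querying with strategy `arm`."""
    counts[arm] += 1
    reward_sums[arm] += reward


# Three hypothetical arms, one per active selection strategy.
counts = [0, 0, 0]
reward_sums = [0.0, 0.0, 0.0]
for reward in (0.1, 0.9, 0.2):  # made-up rewards for the first pull of each arm
    update(counts, reward_sums, pick_strategy(counts, reward_sums), reward)
next_arm = pick_strategy(counts, reward_sums)
```

    Once every strategy has been tried once, the exploration bonuses are equal, so the next query goes to the strategy with the highest observed reward (the second arm in this toy run).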