Linear filtering reveals false negatives in species interaction data
Species interaction datasets, often represented as sparse matrices, are usually collected through observational studies targeted at identifying species interactions. Because of the extensive sampling effort required, species interaction datasets usually contain many false negatives, which often bias derived descriptors. We show that a simple linear filter can be used to detect false negatives by scoring interactions based on the structure of the interaction matrices. On 180 different datasets of various sizes, sparsities and ecological interaction types, we found that, on average, in about 75% of the cases a false negative interaction received a higher score than a true negative interaction. Furthermore, we show that this filter is very robust, even when the interaction matrix contains a very large number of false negatives. Our results demonstrate that unobserved interactions can be detected in species interaction datasets, even without resorting to information about the species involved.
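As a rough illustration of scoring interactions from matrix structure alone, the sketch below assumes the filter is a weighted sum of a cell's own value, its column mean, its row mean and the global mean. The function name `linear_filter_scores`, the equal weights and the toy matrix are assumptions made for this example, not details taken from the abstract.

```python
import numpy as np

def linear_filter_scores(Y, alphas=(0.25, 0.25, 0.25, 0.25)):
    """Score every cell of a binary interaction matrix Y (rows x columns).

    Assumed filter form (hypothetical weights): a weighted sum of the
    cell value, its column mean, its row mean and the global mean.
    """
    Y = np.asarray(Y, dtype=float)
    a1, a2, a3, a4 = alphas
    col_mean = Y.mean(axis=0, keepdims=True)  # shape (1, m)
    row_mean = Y.mean(axis=1, keepdims=True)  # shape (n, 1)
    return a1 * Y + a2 * col_mean + a3 * row_mean + a4 * Y.mean()

# Toy nested interaction matrix: 1 = observed interaction, 0 = not observed.
Y = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0]])

scores = linear_filter_scores(Y)

# Rank the unobserved cells: higher scores are better candidates
# for being false negatives.
zero_cells = np.argwhere(Y == 0)
order = np.argsort(-scores[Y == 0], kind="stable")
ranked_candidates = [tuple(int(i) for i in cell) for cell in zero_cells[order]]
```

In this toy case an unobserved cell in the row of a well-connected species, such as (0, 3), scores higher than a cell whose row and column are both empty, such as (3, 3). That is the kind of ordering the filter relies on.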
Multiple imputation for continuous variables using a Bayesian principal component analysis
We propose a multiple imputation method based on principal component analysis
(PCA) to deal with incomplete continuous data. To reflect the uncertainty of
the parameters from one imputation to the next, we use a Bayesian treatment of
the PCA model. Using a simulation study and real data sets, the method is
compared to two classical approaches: multiple imputation based on joint
modelling and on fully conditional modelling. Contrary to the others, the
proposed method can be easily used on data sets where the number of individuals
is less than the number of variables and when the variables are highly
correlated. In addition, it provides unbiased point estimates of quantities of
interest, such as an expectation, a regression coefficient or a correlation
coefficient, with a smaller mean squared error. Furthermore, the widths of the
confidence intervals built for the quantities of interest are often smaller
whilst ensuring a valid coverage. Comment: 16 pages
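The core idea of PCA-based imputation can be sketched with a much simpler, non-Bayesian variant: fill the gaps with column means, fit a low-rank PCA reconstruction, overwrite only the missing cells, and repeat. This yields a single imputation and is not the method proposed in the abstract, which additionally reflects parameter uncertainty across several imputed datasets. The function name `iterative_pca_impute` and its parameters are hypothetical.

```python
import numpy as np

def iterative_pca_impute(X, n_components=2, n_iter=50):
    """Single imputation by iterative PCA (simplified, non-Bayesian sketch).

    Assumes missing values are encoded as NaN and that every column
    has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column-mean imputation.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        # PCA via SVD of the centred data.
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        k = n_components
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k] + mean
        # Only the missing cells are updated; observed values stay fixed.
        filled[missing] = low_rank[missing]
    return filled

# Toy data: fewer individuals (5) than variables (8), built from 2 factors.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 8))
X_obs = X_full.copy()
X_obs[0, 1] = np.nan
X_obs[3, 6] = np.nan

X_imputed = iterative_pca_impute(X_obs, n_components=2)
```

The toy data deliberately has fewer rows than columns, the setting the abstract highlights, because the SVD-based reconstruction does not require inverting a covariance matrix.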
Data Privacy Preservation in Collaborative Filtering Based Recommender Systems
This dissertation studies data privacy preservation in collaborative filtering based recommender systems and proposes several collaborative filtering models that aim at preserving user privacy from different perspectives.
The empirical study on multiple classical recommendation algorithms presents the basic idea of the models and explores their performance on real world datasets. The algorithms that are investigated in this study include a popularity based model, an item similarity based model, a singular value decomposition based model, and a bipartite graph model. Top-N recommendations are evaluated to examine the prediction accuracy.
It is apparent that with more customers' preference data, recommender systems can better profile customers' shopping patterns, which in turn produces product recommendations with higher accuracy. Precautions should be taken to address the privacy issues that arise during data sharing between two vendors. The study shows that matrix factorization techniques are, by their nature, ideal choices for data privacy preservation. In this dissertation, singular value decomposition (SVD) and nonnegative matrix factorization (NMF) are adopted as the fundamental techniques for collaborative filtering to make privacy-preserving recommendations. The proposed SVD based model utilizes missing value imputation, a randomization technique, and the truncated SVD to perturb the raw rating data. The NMF based models, namely iAux-NMF and iCluster-NMF, take into account the auxiliary information of users and items to help missing value imputation and privacy preservation. Additionally, these models support efficient incremental data updates.
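The three perturbation steps named for the SVD based model (imputation, randomization, truncated SVD) can be sketched generically as follows. The specific choices here are assumptions for illustration only: item-mean imputation, additive Gaussian noise, and the hypothetical name `perturb_ratings` with its parameters. The dissertation's actual procedure may differ.

```python
import numpy as np

def perturb_ratings(R, rank=2, noise_scale=0.1, seed=0):
    """Perturb a user x item rating matrix before sharing it.

    Assumes missing ratings are NaN and every item has at least
    one observed rating.
    """
    rng = np.random.default_rng(seed)
    R = np.asarray(R, dtype=float)
    missing = np.isnan(R)
    # Step 1: missing value imputation (assumed: item/column means).
    filled = np.where(missing, np.nanmean(R, axis=0), R)
    # Step 2: randomization (assumed: additive Gaussian noise).
    noisy = filled + rng.normal(0.0, noise_scale, size=R.shape)
    # Step 3: truncated SVD keeps only the top `rank` singular triplets.
    U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Toy ratings on a 1-5 scale; NaN marks an unrated item.
R = np.array([[5.0, 4.0, np.nan, 1.0],
              [4.0, np.nan, 2.0, 1.0],
              [1.0, 2.0, 5.0, np.nan],
              [np.nan, 1.0, 4.0, 5.0]])

R_shared = perturb_ratings(R, rank=2)
```

The shared matrix is dense and low rank, so individual raw ratings are no longer exposed directly, while the dominant preference structure used by collaborative filtering is retained.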
A good number of online vendors allow people to leave feedback on products. Such feedback is regarded as users' public preferences. However, due to the connections between users' public and private preferences, if a recommender system fails to distinguish real customers from attackers, the private preferences of real customers can be exposed. This dissertation addresses an attack model in which an attacker holds real customers' partial ratings and tries to obtain their private preferences by cheating recommender systems. To resolve this problem, trustworthiness information is incorporated into NMF based collaborative filtering techniques to detect the attackers and make reasonably different recommendations to the normal users and the attackers. By doing so, users' private preferences can be effectively protected.
Fast Methods for Recovering Sparse Parameters in Linear Low Rank Models
In this paper, we investigate the recovery of a sparse weight vector
(parameters vector) from a set of noisy linear combinations. However, only
partial information about the matrix representing the linear combinations is
available. Assuming a low-rank structure for the matrix, one natural solution
would be to first apply a matrix completion on the data, and then to solve the
resulting compressed sensing problem. In big data applications such as massive
MIMO and medical data, the matrix completion step imposes a huge computational
burden. Here, we propose to reduce the computational cost of the completion
task by ignoring the columns corresponding to zero elements in the sparse
vector. To this end, we employ a technique to initially approximate the support
of the sparse vector. We further propose to unify the partial matrix completion
and sparse vector recovery into an augmented four-step problem. Simulation
results reveal that the augmented approach achieves the best performance, while
both proposed methods outperform the natural two-step technique with
substantially lower computational requirements.
- …