Measuring Similarity in Large-Scale Folksonomies
Social (or folksonomic) tagging has become a very popular way to describe content within Web 2.0 websites. Unlike taxonomies, which impose a hierarchical categorisation of content, folksonomies enable end-users to freely create and choose the categories (in this case, tags) that best describe some content. However, as tags are informally defined, continually changing, and ungoverned, social tagging has often been criticised for lowering, rather than increasing, the efficiency of searching, due to the number of synonyms, homonyms, and polysemous tags, as well as the heterogeneity of users and the noise they introduce. To address this issue, a variety of approaches have been proposed that recommend to users which tags to use, both when labelling and when looking for resources. As we illustrate in this paper, real-world folksonomies are characterised by power-law distributions of tags, over which commonly used similarity metrics, including the Jaccard coefficient and the cosine similarity, fail to produce meaningful similarity scores. We thus propose a novel metric, specifically developed to capture similarity in large-scale folksonomies, that is based on a mutual reinforcement principle: two tags are deemed similar if they have been associated with similar resources and, vice versa, two resources are deemed similar if they have been labelled by similar tags. We offer an efficient realisation of this similarity metric and assess its quality experimentally by comparing it against cosine similarity on three large-scale datasets, namely Bibsonomy, MovieLens and CiteULike.
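The abstract contrasts set-overlap metrics (Jaccard, cosine) with a mutual reinforcement principle but does not give the metric's formula. The sketch below is therefore only an illustration under stated assumptions: it uses a SimRank-style iteration on the bipartite tag-resource graph as a stand-in for the paper's metric, and the toy folksonomy, the `decay` factor, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy folksonomy: tag -> set of resources it labels.
TAG_RESOURCES = {
    "python": {"r1", "r2"},
    "programming": {"r1", "r2", "r3"},
    "snake": {"r4"},
    "reptile": {"r4", "r5"},
}


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two resource sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a: set, b: set) -> float:
    """Cosine similarity of binary vectors: |A ∩ B| / sqrt(|A| * |B|)."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def _update(nodes, neighbours, other_sim, decay):
    """One SimRank-style step for one side of the bipartite graph: x and y are
    as similar as the average similarity of their neighbours, times `decay`."""
    new = {}
    for x in nodes:
        for y in nodes:
            if x == y:
                new[(x, y)] = 1.0
                continue
            nx, ny = neighbours[x], neighbours[y]
            total = sum(other_sim[(a, b)] for a in nx for b in ny)
            new[(x, y)] = decay * total / (len(nx) * len(ny))
    return new


def mutual_reinforcement(tag_res, decay=0.8, iterations=5):
    """Tags are similar if they label similar resources, and resources are
    similar if they carry similar tags (assumed SimRank-style stand-in)."""
    res_tags = defaultdict(set)
    for tag, labelled in tag_res.items():
        for r in labelled:
            res_tags[r].add(tag)
    tags, resources = sorted(tag_res), sorted(res_tags)
    # Start from the identity: each node is similar only to itself.
    s_tag = {(x, y): float(x == y) for x in tags for y in tags}
    s_res = {(x, y): float(x == y) for x in resources for y in resources}
    for _ in range(iterations):
        # Tuple assignment: both sides are updated from the previous iteration.
        s_tag, s_res = (_update(tags, tag_res, s_res, decay),
                        _update(resources, res_tags, s_tag, decay))
    return s_tag, s_res


tag_sim, _ = mutual_reinforcement(TAG_RESOURCES)
print(round(jaccard(TAG_RESOURCES["python"], TAG_RESOURCES["programming"]), 3))  # → 0.667
print(tag_sim[("python", "programming")] > tag_sim[("python", "snake")])  # → True
```

The nested loops are quadratic in the number of tags and resources, so this direct form only suits toy data; the abstract's "efficient realisation" for large-scale folksonomies is not reproduced here.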
Neural Collaborative Ranking
Recommender systems are aimed at generating a personalized ranked list of
items that an end user might be interested in. With the unprecedented success
of deep learning in computer vision and speech recognition, bridging the gap
between recommender systems and deep neural networks has recently become a hot
topic, and deep learning methods have been shown to achieve state-of-the-art
performance on many recommendation tasks. For example, a recent model, NeuMF,
first projects users and items into a shared low-dimensional latent feature space,
and then employs neural nets to model the interaction between the user and item
latent features to obtain state-of-the-art performance on the recommendation
tasks. NeuMF assumes that the non-interacted items are inherently negative and
uses negative sampling to relax this assumption. In this paper, we examine an
alternative approach which does not assume that the non-interacted items are
necessarily negative, just that they are less preferred than interacted items.
Specifically, we develop a new classification strategy based on the widely used
pairwise ranking assumption. We combine our classification strategy with the
recently proposed neural collaborative filtering framework, and propose a
general collaborative ranking framework called Neural Network based
Collaborative Ranking (NCR). We use a neural network architecture to
model a user's pairwise preference between items, on the premise that neural
networks can effectively capture the latent structure of the latent factors. The
experimental results on two real-world datasets show the superior performance
of our models in comparison with several state-of-the-art approaches.
Comment: Proceedings of the 2018 ACM on Conference on Information and
Knowledge Management
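The key difference from NeuMF described above is the training signal: rather than labelling non-interacted items as negatives, the pairwise ranking assumption only states that an interacted item i is preferred over a non-interacted item j. A minimal sketch of that signal, assuming a logistic (BPR-style) pairwise loss and a toy one-hidden-layer MLP scorer with random, untrained weights; the data, dimensions, and function names are hypothetical and this is not the NCR architecture itself.

```python
import math
import random

# Toy implicit feedback (hypothetical): user -> items the user interacted with.
INTERACTIONS = {"u1": {"i1", "i2"}, "u2": {"i3"}}
ALL_ITEMS = {"i1", "i2", "i3", "i4"}


def sample_triples(interactions, items, n_per_user, seed=0):
    """Build (user, i, j) training triples under the pairwise ranking
    assumption: interacted item i is preferred over non-interacted item j.
    j is never labelled negative, only less preferred than i."""
    rng = random.Random(seed)
    triples = []
    for u, pos in sorted(interactions.items()):
        positives, others = sorted(pos), sorted(items - pos)
        for _ in range(n_per_user):
            triples.append((u, rng.choice(positives), rng.choice(others)))
    return triples


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_loss(score, triples):
    """Mean binary cross-entropy for the label "u prefers i over j" (always 1),
    modelling P(i > j | u) = sigmoid(score(u, i) - score(u, j))."""
    return -sum(math.log(sigmoid(score(u, i) - score(u, j)))
                for u, i, j in triples) / len(triples)


# Hypothetical neural scorer: a one-hidden-layer MLP over concatenated
# user/item embeddings, with random untrained weights (illustration only).
DIM, HIDDEN = 4, 8
_rng = random.Random(42)
EMB = {k: [_rng.gauss(0, 0.1) for _ in range(DIM)]
       for k in [*sorted(INTERACTIONS), *sorted(ALL_ITEMS)]}
W1 = [[_rng.gauss(0, 0.1) for _ in range(2 * DIM)] for _ in range(HIDDEN)]
W2 = [_rng.gauss(0, 0.1) for _ in range(HIDDEN)]


def mlp_score(u, i):
    x = EMB[u] + EMB[i]  # list concatenation of the two embeddings
    h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W1]  # ReLU
    return sum(w * v for w, v in zip(W2, h))


triples = sample_triples(INTERACTIONS, ALL_ITEMS, n_per_user=3)
print(pairwise_loss(mlp_score, triples))  # untrained scores are near 0, so near log(2)
```

Training would minimise `pairwise_loss` with respect to the embeddings and MLP weights (e.g. with an autodiff framework); that step is omitted to keep the sketch dependency-free.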
Semantic data mining and linked data for a recommender system in the AEC industry
Even though it can provide design teams with valuable performance insights and enhance decision-making, monitored building data is rarely reused in an effective feedback loop from operation to design. Data mining allows users to obtain such insights from the large datasets generated throughout the building life cycle. Furthermore, semantic web technologies make it possible to formally represent the built environment and retrieve knowledge in response to domain-specific requirements. Both approaches have independently established themselves as powerful aids in decision-making. Combining them can enrich data mining processes with domain knowledge and facilitate knowledge discovery, representation and reuse. In this article, we review the available data mining techniques and investigate to what extent they can be fused with semantic web technologies to provide recommendations to the end user in performance-oriented design. We demonstrate an initial implementation of a linked data-based system for generating recommendations.
Developing a Prediction Model for Author Collaboration in Bioinformatics Research Using Graph Mining Techniques and Big Data Applications
Nowadays, scientific collaboration has increased dramatically thanks to web-based technologies, advanced communication systems, and information and scientific databases. The present study aims to provide a predictive model for author collaborations in bioinformatics research output using graph mining techniques and big data applications. The study is applied-developmental research adopting a mixed-method approach, i.e., a mix of quantitative and qualitative measures. The research population consisted of all bioinformatics research documents indexed in PubMed (n = 699,160). The correlations between bioinformatics articles were examined in terms of weight and strength based on article sections, including title, abstract, keywords, journal title, and author affiliation, using graph mining techniques and big data applications. Eventually, the prediction model of author collaboration in bioinformatics research was developed using the abovementioned tools and expert-assigned weights. The calculations and data analysis were carried out on a big data server using Expert Choice, Excel, Spark, and the Scala and Python programming languages. The research was conducted in three phases: 1) identifying and weighting the factors contributing to authors’ similarity measurement; 2) implementing the co-authorship prediction model; and 3) integrating the first and second phases (i.e., integrating the weights obtained in the previous phases). The results showed that journal title, citation, article title, author affiliation, keywords, and abstract scored 0.374, 0.374, 0.091, 0.075, 0.055, and 0.031, respectively. Moreover, the journal title achieved the highest score in the model for the co-author recommender system. As the data in bibliometric information networks are static, content-based features proved remarkably effective for similarity measures, enabling the recommender system to offer the most suitable collaboration suggestions.
The model is expected to work efficiently in other databases and to provide suitable recommendations for author collaborations in other subject areas. By integrating expert opinion and systemic weights, the model can help alleviate the current information overload and facilitate collaborator lookup by authors.
https://dorl.net/dor/20.1001.1.20088302.2021.19.2.1.
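The reported feature scores sum to 1.0, so they can be read as weights in a linear combination of per-feature similarities between authors. A minimal sketch of such a weighted collaborator recommender, assuming each per-feature similarity is a Jaccard overlap of token sets; the paper's actual similarity measures and its graph mining (co-authorship prediction) phase are not shown, and the author profiles and function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Feature weights as reported in the abstract; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "journal_title": 0.374,
    "citation": 0.374,
    "article_title": 0.091,
    "author_affiliation": 0.075,
    "keywords": 0.055,
    "abstract": 0.031,
}


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Per-feature similarity (an assumption: Jaccard overlap of token sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def author_similarity(p: Dict[str, set], q: Dict[str, set]) -> float:
    """Weighted sum of per-feature similarities between two author profiles."""
    return sum(w * jaccard(p.get(f, ()), q.get(f, ())) for f, w in WEIGHTS.items())


def recommend(target: str, profiles: Dict[str, Dict[str, set]], k: int = 2) -> List[str]:
    """Return the k authors most similar to `target` as collaborator suggestions."""
    scored = sorted(
        ((author_similarity(profiles[target], p), name)
         for name, p in profiles.items() if name != target),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


# Hypothetical author profiles built from the metadata of their papers.
PROFILES = {
    "alice": {"journal_title": {"Bioinformatics"}, "citation": {"p1", "p2"},
              "keywords": {"protein", "alignment"}},
    "bob": {"journal_title": {"Bioinformatics"}, "citation": {"p2", "p3"},
            "keywords": {"protein"}},
    "carol": {"journal_title": {"Nature"}, "citation": {"p9"},
              "keywords": {"alignment"}},
}
print(recommend("alice", PROFILES, k=1))  # → ['bob']
```

Because journal title and citation carry most of the weight, a shared venue or shared cited paper dominates the ranking, which matches the abstract's observation about the journal title.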
Editable User Profiles for Controllable Text Recommendation
Methods for making high-quality recommendations often rely on learning latent
representations from interaction data. These methods, while performant, do not
provide ready mechanisms for users to control the recommendations they receive.
Our work tackles this problem by proposing LACE, a novel concept value
bottleneck model for controllable text recommendations. LACE represents each
user with a succinct set of human-readable concepts, retrieved from the
documents the user has interacted with, and learns personalized representations
of the concepts based on those documents. This concept-based user profile is then
leveraged to make recommendations. The design of our model affords control over
the recommendations through a number of intuitive interactions with a
transparent user profile. We first establish the quality of recommendations
obtained from LACE in an offline evaluation on three recommendation tasks
spanning six datasets in warm-start, cold-start, and zero-shot setups. Next, we
validate the controllability of LACE under simulated user interactions.
Finally, we implement LACE in an interactive controllable recommender system
and conduct a user study to demonstrate that users are able to improve the
quality of recommendations they receive through interactions with an editable
user profile.
Comment: Accepted to SIGIR 2023; pre-print, camera-ready to follow
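The controllability described above comes from scoring documents through a transparent, human-readable concept profile that the user can edit. A minimal sketch of that idea, assuming a hand-weighted concept profile scored by a dot product against per-document concept relevance; LACE itself retrieves concepts and learns personalized concept representations, which this sketch does not do, and the documents, concepts, and function names are hypothetical.

```python
from typing import Dict, List

Concepts = Dict[str, float]  # concept name -> weight or relevance

# Hypothetical candidate documents annotated with concept relevance scores.
DOCS: Dict[str, Concepts] = {
    "d1": {"graph neural networks": 0.9, "recommendation": 0.4},
    "d2": {"topic models": 0.8},
    "d3": {"recommendation": 0.7, "user studies": 0.6},
}


def score(profile: Concepts, doc: Concepts) -> float:
    """Dot product between the user's concept weights and a document's concepts."""
    return sum(w * doc.get(c, 0.0) for c, w in profile.items())


def rank(profile: Concepts, docs: Dict[str, Concepts]) -> List[str]:
    return sorted(docs, key=lambda d: score(profile, docs[d]), reverse=True)


def remove_concept(profile: Concepts, concept: str) -> Concepts:
    """User edit: drop a concept from the profile (returns a new profile)."""
    return {c: w for c, w in profile.items() if c != concept}


def set_weight(profile: Concepts, concept: str, weight: float) -> Concepts:
    """User edit: add a concept or change how much it matters."""
    return {**profile, concept: weight}


profile = {"graph neural networks": 1.0, "recommendation": 0.5}
print(rank(profile, DOCS))  # → ['d1', 'd3', 'd2']
edited = set_weight(remove_concept(profile, "graph neural networks"), "user studies", 1.0)
print(rank(edited, DOCS))  # → ['d3', 'd1', 'd2']
```

The edit functions return new profiles instead of mutating the original, so a user's change can be previewed or undone without losing the previous state.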
MatRec: Matrix Factorization for Highly Skewed Dataset
Recommender systems are among the most successful AI technologies applied by
internet corporations. Popular internet products such as TikTok, Amazon,
and YouTube have all integrated recommender systems as a core product
feature. Although recommender systems have achieved great success, it is well
known that, for highly skewed datasets, engineers and researchers need to adapt
their methods to the specific problem to obtain good results. An inability
to deal with highly skewed datasets usually creates hard computational
problems for big data clusters and unsatisfactory results for customers. In
this paper, we propose a new algorithm that solves this problem within the
framework of matrix factorization. We incorporate data skewness factors into
the theoretical model of the approach, using easy-to-interpret and
easy-to-implement formulas. Our experiments show that our method produces
favourable results comparable to those of popular recommender system
algorithms such as Learning to Rank, Alternating Least Squares, and Deep
Matrix Factorization.
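The abstract does not state MatRec's skewness formulas, so the following is only a generic illustration of how skewness can enter a matrix factorization objective. A minimal sketch, assuming stochastic gradient descent on a weighted squared error where each observed rating is down-weighted by its item's popularity raised to a hypothetical exponent `alpha`; the data, hyperparameters, and function names are all hypothetical and are not MatRec's model.

```python
import math
import random
from collections import Counter


def train_weighted_mf(ratings, n_users, n_items, k=2, alpha=0.5,
                      lr=0.02, reg=0.01, epochs=300, seed=0):
    """SGD matrix factorization in which each observed rating (u, i, r) is
    weighted by popularity(i) ** -alpha, so that the few very popular items of
    a skewed dataset do not dominate the loss. This weighting is a
    hypothetical illustration, not MatRec's published formulas."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    popularity = Counter(i for _, i, _ in ratings)
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            w = popularity[i] ** -alpha
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on w * err^2 / 2 plus L2 regularization.
                P[u][f] += lr * (w * err * qi - reg * pu)
                Q[i][f] += lr * (w * err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent factors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2
                         for u, i, r in ratings) / len(ratings))


# Hypothetical skewed data: item 0 is rated by every user, items 1 and 2 rarely.
RATINGS = [(0, 0, 5), (1, 0, 4), (2, 0, 5), (3, 0, 4),
           (0, 1, 1), (2, 1, 1), (1, 2, 2), (3, 2, 5)]
P0, Q0 = train_weighted_mf(RATINGS, 4, 3, epochs=0)  # untrained initialization
P, Q = train_weighted_mf(RATINGS, 4, 3)
print(rmse(P0, Q0, RATINGS), rmse(P, Q, RATINGS))
```

Setting `alpha=0` recovers plain, unweighted SGD matrix factorization, which makes it easy to compare the effect of the skew-aware weighting on the same data.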
- …