2,022 research outputs found

Measuring Similarity in Large-Scale Folksonomies

Author: Capra Licia
De Meo Pasquale
Ferrara Emilio
Quattrone Giovanni
Publication date: 01/01/2011

Social (or folksonomic) tagging has become a very popular way to describe content within Web 2.0 websites. Unlike taxonomies, which overimpose a hierarchical categorisation of content, folksonomies enable end-users to freely create and choose the categories (in this case, tags) that best describe some content. However, as tags are informally defined, continually changing, and ungoverned, social tagging has often been criticised for lowering, rather than increasing, the efficiency of searching, due to the number of synonyms, homonyms, and polysemous terms, as well as the heterogeneity of users and the noise they introduce. To address this issue, a variety of approaches have been proposed that recommend to users which tags to use, both when labelling and when looking for resources. As we illustrate in this paper, real-world folksonomies are characterized by power law distributions of tags, over which commonly used similarity metrics, including the Jaccard coefficient and the cosine similarity, fail to compute. We thus propose a novel metric, specifically developed to capture similarity in large-scale folksonomies, that is based on a mutual reinforcement principle: two tags are deemed similar if they have been associated with similar resources, and, vice versa, two resources are deemed similar if they have been labelled with similar tags. We offer an efficient realisation of this similarity metric, and assess its quality experimentally, by comparing it against cosine similarity, on three large-scale datasets, namely Bibsonomy, MovieLens and CiteULike.
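For orientation, the two baseline metrics named in this abstract can be sketched as follows. This is a minimal illustration only, not the metric the paper proposes: the mutual reinforcement metric is not specified in the abstract and is not attempted here. The tag names, resource ids, and counts are hypothetical toy data.

```python
import math

def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy data: for each tag, the resources it labels and how often.
python_tag = {"r1": 3, "r2": 1, "r3": 2}
coding_tag = {"r2": 2, "r3": 2, "r4": 1}

# Jaccard looks only at which resources are shared; cosine also uses the counts.
jaccard_score = jaccard(set(python_tag), set(coding_tag))
cosine_score = cosine(python_tag, coding_tag)
```

Both metrics return 0 for two tags that share no resource at all, which is the common case when most tags label only a handful of resources.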

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery

CogPrints Cognitive Sciences Eprint Archive

Neural Collaborative Ranking

Author: Cao Yi
Song Bo
Xu Congfu
Yang Xin
Publication date: 14/08/2018

Recommender systems aim to generate a personalized ranked list of items that an end user might be interested in. With the unprecedented success of deep learning in computer vision and speech recognition, bridging the gap between recommender systems and deep neural networks has recently become a hot topic, and deep learning methods have been shown to achieve state-of-the-art results on many recommendation tasks. For example, a recent model, NeuMF, first projects users and items into a shared low-dimensional latent feature space, and then employs neural nets to model the interaction between the user and item latent features to obtain state-of-the-art performance on the recommendation tasks. NeuMF assumes that the non-interacted items are inherently negative and uses negative sampling to relax this assumption. In this paper, we examine an alternative approach which does not assume that the non-interacted items are necessarily negative, just that they are less preferred than interacted items. Specifically, we develop a new classification strategy based on the widely used pairwise ranking assumption. We combine our classification strategy with the recently proposed neural collaborative filtering framework, and propose a general collaborative ranking framework called Neural Network based Collaborative Ranking (NCR). We resort to a neural network architecture to model a user's pairwise preference between items, with the belief that a neural network will effectively capture the latent structure of latent factors. The experimental results on two real-world datasets show the superior performance of our models in comparison with several state-of-the-art approaches.
Comment: Proceedings of the 2018 ACM on Conference on Information and Knowledge Management
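The pairwise ranking assumption described in this abstract (interacted items are preferred over non-interacted ones, without treating the latter as negatives) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the toy data, the logistic loss on score differences, and the lookup-table scorer standing in for a neural network. It is not the NCR architecture.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_triples(interactions: dict, items: list) -> list:
    """Build (user, preferred, less_preferred) triples by pairing each
    interacted item with each item the user has not interacted with."""
    triples = []
    for user, seen in interactions.items():
        for pos in sorted(seen):
            for neg in items:
                if neg not in seen:
                    triples.append((user, pos, neg))
    return triples

def pairwise_log_loss(score, triples: list) -> float:
    """Mean logistic loss on score differences (expects a non-empty list);
    it is low when preferred items score higher than less-preferred ones."""
    losses = [-math.log(sigmoid(score(u, i) - score(u, j))) for u, i, j in triples]
    return sum(losses) / len(losses)

# Hypothetical toy data and a stand-in scorer (a lookup table, not a neural net).
items = ["a", "b", "c"]
interactions = {"u1": {"a"}, "u2": {"b", "c"}}
toy_scores = {("u1", "a"): 2.0, ("u1", "b"): 0.0, ("u1", "c"): 0.0,
              ("u2", "a"): 0.0, ("u2", "b"): 1.0, ("u2", "c"): 1.0}

triples = pairwise_triples(interactions, items)
loss = pairwise_log_loss(lambda u, i: toy_scores[(u, i)], triples)
```

In a trained model the scorer would be learned so that this loss decreases; only the training signal, a preference between two items for one user, is shown here.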

arXiv.org e-Print Archive

Crossref

Semantic data mining and linked data for a recommender system in the AEC industry

Author: Jensen Rasmus Lund
Pauwels Pieter
Petrova Ekaterina
Svidt Kjeld
Publication venue: 'European Council for Computing in Construction'
Publication date: 01/01/2019

Even though it can provide design teams with valuable performance insights and enhance decision-making, monitored building data is rarely reused in an effective feedback loop from operation to design. Data mining allows users to obtain such insights from the large datasets generated throughout the building life cycle. Furthermore, semantic web technologies make it possible to formally represent the built environment and retrieve knowledge in response to domain-specific requirements. Both approaches have independently established themselves as powerful aids in decision-making. Combining them can enrich data mining processes with domain knowledge and facilitate knowledge discovery, representation and reuse. In this article, we look into the available data mining techniques and investigate to what extent they can be fused with semantic web technologies to provide recommendations to the end user in performance-oriented design. We demonstrate an initial implementation of a linked data-based system for generation of recommendations.

Crossref

Ghent University Academic Bibliography

Archivsystem Ask23

VBN

Developing a Prediction Model for Author Collaboration in Bioinformatics Research Using Graph Mining Techniques and Big Data Applications

Author: Asemi Asefeh
Ebrahimi Fezzeh
Nezarat Amin
Shabani Ahmad
Publication venue: Regional Information Center for Science & Technology
Publication date: 05/07/2021

Scientific collaboration has increased dramatically thanks to web-based technologies, advanced communication systems, and information and scientific databases. The present study aims to provide a predictive model for author collaborations in bioinformatics research output using graph mining techniques and big data applications. The study is applied-developmental research adopting a mixed-method approach, i.e., a mix of quantitative and qualitative measures. The research population consisted of all bioinformatics research documents indexed in PubMed (n=699160). The correlations of bioinformatics articles were examined in terms of weight and strength based on article sections including title, abstract, keywords, journal title, and author affiliation using graph mining techniques and big data applications. Eventually, the prediction model of author collaboration in bioinformatics research was developed using the abovementioned tools and expert-assigned weights. The calculations and data analysis were carried out using Expert Choice, Excel, Spark, and the Scala and Python programming languages on a big data server. Accordingly, the research was conducted in three phases: 1) identifying and weighting the factors contributing to authors’ similarity measurement; 2) implementing the co-authorship prediction model; and 3) integrating the first and second phases (i.e., integrating the weights obtained in the previous phases). The results showed that journal title, citation, article title, author affiliation, keywords, and abstract scored 0.374, 0.374, 0.091, 0.075, 0.055, and 0.031, respectively. Moreover, the journal title achieved the highest score in the model for the co-author recommender system. Because the data in bibliometric information networks is static, content-based features proved remarkably effective for similarity measures, so that the recommender system can offer the most suitable collaboration suggestions.
The model is expected to work efficiently in other databases and to provide suitable recommendations for author collaborations in other subject areas. By integrating expert opinion and systemic weights, the model can help alleviate the current information overload and facilitate collaborator lookup by authors.
https://dorl.net/dor/20.1001.1.20088302.2021.19.2.1
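As a rough illustration of how the six reported weights could enter a similarity score, the sketch below combines per-field similarities with a weighted sum. The abstract does not state how the weights are combined, so the weighted sum, the function name, the field keys, and the example similarity values are all assumptions; only the six weight values come from the abstract.

```python
# Weights as reported in the abstract; the field keys are hypothetical names.
WEIGHTS = {
    "journal_title": 0.374,
    "citation": 0.374,
    "article_title": 0.091,
    "author_affiliation": 0.075,
    "keywords": 0.055,
    "abstract": 0.031,
}

def combined_similarity(field_sims: dict) -> float:
    """Weighted sum of per-field similarities in [0, 1].
    Fields missing from field_sims are treated as similarity 0 (an assumption)."""
    return sum(w * field_sims.get(name, 0.0) for name, w in WEIGHTS.items())

# Hypothetical per-field similarities between two authors.
example = {"journal_title": 1.0, "citation": 0.5, "keywords": 0.2}
score = combined_similarity(example)
```

Because the reported weights sum to 1, a pair of authors that is identical on every field would receive a combined score of 1 under this assumed scheme.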

International Journal of Information Science and Management (IJISM)

Editable User Profiles for Controllable Text Recommendation

Author: Jasim Mahmood
McCallum Andrew
Mysore Sheshera
Zamani Hamed
Publication date: 09/04/2023

Methods for making high-quality recommendations often rely on learning latent representations from interaction data. These methods, while performant, do not provide ready mechanisms for users to control the recommendations they receive. Our work tackles this problem by proposing LACE, a novel concept value bottleneck model for controllable text recommendations. LACE represents each user with a succinct set of human-readable concepts through retrieval given user-interacted documents and learns personalized representations of the concepts based on user documents. This concept-based user profile is then leveraged to make recommendations. The design of our model affords control over the recommendations through a number of intuitive interactions with a transparent user profile. We first establish the quality of recommendations obtained from LACE in an offline evaluation on three recommendation tasks spanning six datasets in warm-start, cold-start, and zero-shot setups. Next, we validate the controllability of LACE under simulated user interactions. Finally, we implement LACE in an interactive controllable recommender system and conduct a user study to demonstrate that users are able to improve the quality of recommendations they receive through interactions with an editable user profile.
Comment: Accepted to SIGIR 2023; Pre-print, camera-ready to follow

arXiv.org e-Print Archive

MatRec: Matrix Factorization for Highly Skewed Dataset

Author: Bertin-Mahieux T.
Cheng Heng-Tze
Guo Huifeng
Hofmann Thomas
Paterek Arkadiusz
Rendle Steffen
Takacs Gabor
Wang Chong
Wang Hao
Xue Hong-Jian
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/11/2020

Recommender systems are among the most successful AI technologies applied in internet corporations. Popular internet products such as TikTok, Amazon, and YouTube have all integrated recommender systems as a core product feature. Although recommender systems have achieved great success, it is well known that for highly skewed datasets, engineers and researchers need to adjust their methods to the specific problem to obtain good results. An inability to deal with highly skewed datasets usually creates hard computational problems for big data clusters and unsatisfactory results for customers. In this paper, we propose a new algorithm that solves the problem in the framework of matrix factorization. We model the data skewness factors in the theoretical modeling of the approach with formulas that are easy to interpret and easy to implement. We show in experiments that our method generates results comparably favorable to those of popular recommender system algorithms such as Learning to Rank, Alternating Least Squares and Deep Matrix Factorization.

arXiv.org e-Print Archive

Crossref