3,717 research outputs found
Recommender system to support comprehensive exploration of large scale scientific datasets
Databases for scientific entities, such as chemical compounds, diseases and astronomical objects, are growing in size and complexity, reaching billions of items per database. Researchers need new and innovative tools to assist in choosing these items.
This work proposes the use of Recommender Systems (RS) to help researchers find items of interest. We identified the lack of standard, open-access datasets with information about user preferences as one of the major challenges for applying RS in scientific fields.
To overcome this challenge, we developed a methodology called LIBRETTI - LIterature Based RecommEndaTion of scienTific Items, whose goal is to create datasets related to scientific fields. These datasets are created based on scientific literature, the major resource of knowledge that Science has. The LIBRETTI methodology allowed the development and testing of new recommender algorithms specific to each field. Besides LIBRETTI, the main contributions of this thesis are standard and sequence-aware recommendation datasets in the fields of Astronomy, Chemistry, and Health (related to COVID-19 disease); a hybrid semantic recommender system for chemical compounds in large-scale datasets; a hybrid approach based on sequential enrichment (SeEn) for sequence-aware recommendations; and a multi-field semantic-based pipeline for recommending biomedical entities related to COVID-19 disease.
Scalable and interpretable product recommendations via overlapping co-clustering
We consider the problem of generating interpretable recommendations by
identifying overlapping co-clusters of clients and products, based only on
positive or implicit feedback. Our approach is applicable on very large
datasets because it exhibits almost linear complexity in the input examples and
the number of co-clusters. We show, both on real industrial data and on
publicly available datasets, that the recommendation accuracy of our algorithm
is competitive with that of state-of-the-art matrix factorization techniques. In
addition, our technique has the advantage of offering recommendations that are
textually and visually interpretable. Finally, we examine how to implement our
technique efficiently on Graphics Processing Units (GPUs). Comment: In IEEE International Conference on Data Engineering (ICDE) 201
Capturing and Exploiting Citation Knowledge for the Recommendation of Scientific Publications
With the continuous growth of scientific literature, it is becoming increasingly challenging to discover relevant scientific publications from the plethora of available academic digital libraries. Despite the current scale, important efforts have been made towards the research and development of academic search engines, reference management tools, review management platforms, scientometrics systems, and recommender systems that help find a variety of relevant scientific items, such as publications, books, researchers, grants and events, among others.
This thesis focuses on recommender systems for scientific publications. Existing systems do not always provide the most relevant scientific publications to users, even though they are present in the recommendation space. A common limitation is the lack of access to the full content of the publications when designing the recommendation methods. Solutions are largely based on the exploitation of metadata (e.g., titles, abstracts, lists of references, etc.), but rarely on the full text of the publications. Another important limitation is the lack of time awareness. Existing works have not addressed the important scenario of recommending the most recent publications to users, due to the challenge of recommending items for which no ratings (i.e., user preferences) have yet been provided. The lack of evaluation benchmarks also limits the evolution and progress of the field.
This thesis investigates the use of fine-grained forms of citation knowledge, extracted from the full textual content of scientific publications, to enhance recommendations: citation proximity, citation context, citation section, citation graph and citation intention. We design and develop new recommendation methods that incorporate such knowledge, individually and in combination.
By conducting offline evaluations, as well as user studies, we show how the use of citation knowledge helps enhance the performance of existing recommendation methods when addressing two key tasks: (i) recommending scientific publications for a given work, and (ii) recommending recent scientific publications to a user. Two novel evaluation benchmarks have also been generated and made available to the scientific community.
Capturing and Exploiting Citation Knowledge for Recommending Recently Published Papers
With the continuous growth of scientific literature, discovering relevant academic papers for a researcher has become a challenging task, especially when looking for the most recent papers. In this case, traditional collaborative filtering systems are ineffective, since they are unable to recommend items not previously seen, rated or cited. In this paper, we explore the potential of exploiting citation knowledge to provide a given user with relevant suggestions about recent scientific publications. A novel hybrid recommendation method that encapsulates such citation knowledge is proposed. Experimental results show improvements over baseline methods, evidencing the benefits of using citation knowledge to recommend recently published papers in a personalised way. Moreover, as a result of our work, we also provide a unique dataset that, unlike previous corpora, contains detailed paper citation information.
A Systematic Literature Review of Linked Data-based Recommender Systems
Recommender Systems (RS) are software tools that use analytic technologies to suggest different items of interest to an end user. Linked Data is a set of best practices for publishing and connecting structured data on the Web. This paper presents a systematic literature review to summarize the state of the art in recommender systems that use structured data published as Linked Data for providing recommendations of items from diverse domains. It considers the most relevant research problems addressed and classifies RS according to how Linked Data has been used to provide recommendations. Furthermore, it analyzes contributions, limitations, application domains, evaluation techniques, and directions proposed for future research. We found that many open challenges remain before RS based on Linked Data are efficient enough for real applications. The main ones are personalization of recommendations; use of more datasets considering the heterogeneity introduced; creation of new hybrid RS for adding information; definition of more advanced similarity measures that take into account the large amount of data in Linked Data datasets; and implementation of testbeds to study evaluation techniques and to assess the accuracy, scalability, and computational complexity of RS.
Biases in scholarly recommender systems: impact, prevalence, and mitigation
We create a simulated financial market and examine the effect of different levels of active and passive investment on fundamental market efficiency. In our simulated market, active, passive, and random investors interact with each other through issuing orders. Active and passive investors select their portfolio weights by optimizing Markowitz-based utility functions. We find that higher fractions of active investment within a market lead to an increased fundamental market efficiency. The marginal increase in fundamental market efficiency per additional active investor is lower in markets with higher levels of active investment. Furthermore, we find that a large fraction of passive investors within a market may facilitate technical price bubbles, resulting in market failure. By examining the effect of specific parameters on market outcomes, we find that lower transaction costs, lower individual forecasting errors of active investors, and less restrictive portfolio constraints tend to increase fundamental market efficiency.