61 research outputs found

    Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings

    [Abstract] Nowadays, item recommendation is an increasing concern for many companies. Users tend to be more reactive than proactive in solving information needs. Recommendation accuracy has become the most studied aspect of the quality of the suggestions. However, novel and diverse suggestions also contribute to user satisfaction. Unfortunately, it is common to harm those two aspects when optimizing recommendation accuracy. In this paper, we present EER, a linear model for the top-N recommendation task, which takes advantage of user and item embeddings for improving novelty and diversity without harming accuracy. This work was supported by project RTI2018-093336-B-C22 (MCIU & ERDF), project GPC ED431B 2019/03 (Xunta de Galicia & ERDF) and accreditation ED431G 2019/01 (Xunta de Galicia & ERDF). The first author also acknowledges the support of grant FPU17/03210 (MCIU). Xunta de Galicia; ED431B 2019/03. Xunta de Galicia; ED431G 2019/0

    Writing Science, Compiling Science: The Coruña Corpus of English Scientific Writing

    [Abstract] The Coruña Corpus: A Collection of Samples for the Historical Study of English Scientific Writing is a project on which the MUSTE Group has been working since 2003 at the University of A Coruña (Spain). It has been designed as a tool for the study of language change in English scientific writing in general as well as within the different scientific disciplines. Its purpose is to facilitate investigation at all linguistic levels, though, in principle, phonology is not included among our intended research topics. A rough definition of our corpus would say it contains English scientific texts, other than medical ones, produced between 1600 and 1900. In order to retrieve information from the compiled data, we decided to create a corpus management tool. Loosely speaking, the Coruña Corpus Tool (CCT) is an Information Retrieval (IR) system where the indexed textual repository is the set of compiled documents that constitutes the CC

    Probabilistic collaborative filtering with negative cross entropy

    This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in RecSys '13 Proceedings of the 7th ACM conference on Recommender systems, http://dx.doi.org/10.1145/2507157.2507191. Relevance-Based Language Models are an effective IR approach which explicitly introduces the concept of relevance in the statistical Language Modelling framework of Information Retrieval. These models have been shown to achieve state-of-the-art retrieval performance in the pseudo relevance feedback task. In this paper we propose a novel adaptation of this language modelling approach to rating-based Collaborative Filtering. In a memory-based approach, we apply the model to the formation of user neighbourhoods, and the generation of recommendations based on such neighbourhoods. We report experimental results where our method outperforms other standard memory-based algorithms in terms of ranking precision. This work was funded by Secretaría de Estado de Investigación, Desarrollo e Innovación from the Spanish Government under projects TIN2012-33867 and TIN2011-28538-C02

    Relevance-based language models : new estimations and applications

    [Abstract] Relevance-Based Language Models introduced the concept of relevance into the Language Modelling framework, a concept which is explicit in other retrieval models such as the Probabilistic models. Relevance Models have been mainly used for a specific task within Information Retrieval called Pseudo-Relevance Feedback, a kind of local query expansion technique where relevance is assumed over a top set of documents from the initial retrieval and where those documents are used to select expansion terms for the original query and produce a second, hopefully more effective, retrieval. In this thesis we investigate some new estimations for Relevance Models for both Pseudo-Relevance Feedback and other tasks beyond retrieval, particularly, constrained text clustering and item recommendation in Recommender Systems. We study the benefits of our proposals for those tasks in comparison with existing estimations. These new modellings are able not only to improve the effectiveness of the existing estimations and methods but also to improve on their robustness, a critical factor when dealing with Pseudo-Relevance Feedback methods. These objectives are pursued by different means: promoting divergent terms in the estimation of the Relevance Models, presenting new cluster-based retrieval models, introducing new methods for automatically determining the size of the pseudo-relevant set on a per-query basis, and originally producing new modellings under the Relevance-Based Language Modelling framework for the constrained text clustering and the item recommendation problems

    Designing an Open Source Virtual Assistant

    [Abstract] A chatbot is a type of agent that allows people to interact with an information repository using natural language. Nowadays, chatbots have been incorporated in the form of conversational assistants on the most important mobile and desktop platforms. In this article, we present our design of an assistant developed with open-source and widely used components. Our proposal covers the process end-to-end, from information gathering and processing to visual and speech-based interaction. We have deployed a proof of concept over the website of our Computer Science Faculty. This work was supported by projects RTI2018-093336-B-C22 (MCIU & ERDF) and GPC ED431B 2019/03 (Xunta de Galicia & ERDF). Also, this work has received financial support from CITIC, Centro de Investigación del Sistema universitario de Galicia, which is financially supported by Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia through the ERDF (80%) and Secretaría Xeral de Universidades (20%), (Ref ED431G 2019/01). Xunta de Galicia; ED431B 2019/03. Xunta de Galicia; ED431G2019/0

    Experimental Analysis of the Relevance of Features and Effects on Gender Classification Models for Social Media Author Profiling

    [Abstract] Automatic user profiling from social networks has become a popular task due to its commercial applications (targeted advertising, market studies...). Automatic profiling models infer demographic characteristics of social network users from their generated content or interactions. Users’ demographic information is also valuable for more socially concerning tasks such as automatic early detection of mental disorders. For this type of user analysis task, it has been shown that the way users employ language is an important indicator which contributes to the effectiveness of the models. Therefore, we also consider that for identifying aspects such as gender, age or user’s origin, it is interesting to consider the use of language through both psycho-linguistic and semantic features. A good selection of features will be vital for the performance of retrieval, classification, and decision-making software systems. In this paper, we address gender classification as a part of the automatic profiling task. We show an experimental analysis of the performance of existing gender classification models based on external corpus and baselines for automatic profiling. We analyse in depth the influence of the linguistic features on the classification accuracy of the model. After that analysis, we have put together a feature set for gender classification models in social networks with an accuracy above existing baselines. This work was supported by projects RTI2018-093336-B-C21, RTI2018-093336-B-C22 (Ministerio de Ciencia e Innovacion & ERDF) and the financial support supplied by the Conselleria de Educacion, Universidade e Formacion Profesional (accreditation 2019-2022 ED431G/01, ED431B 2019/03) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruna as a Research Center of the Galician University System. Xunta de Galicia; ED431G/01. Xunta de Galicia; ED431B 2019/0

    Relevance-based language modelling for recommender systems

    This is the author’s version of a work that was accepted for publication in Journal Information Processing and Management: an International Journal. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal Information Processing and Management: an International Journal, 49, 4, (2013) DOI: 10.1016/j.ipm.2013.03.001. Relevance-Based Language Models, commonly known as Relevance Models, are successful approaches to explicitly introduce the concept of relevance in the statistical Language Modelling framework of Information Retrieval. These models achieve state-of-the-art retrieval performance in the pseudo relevance feedback task. On the other hand, the field of recommender systems is a fertile research area where users are provided with personalised recommendations in several applications. In this paper, we propose an adaptation of the Relevance Modelling framework to effectively suggest recommendations to a user. We also propose a probabilistic clustering technique to perform the neighbour selection process as a way to achieve a better approximation of the set of relevant items in the pseudo relevance feedback process. These techniques, although well known in the Information Retrieval field, have not been applied yet to recommender systems, and, as the empirical evaluation results show, both proposals outperform individually several baseline methods. Furthermore, by combining both approaches even larger effectiveness improvements are achieved. This work was funded by Secretaría de Estado de Investigación, Desarrollo e Innovación from the Spanish Government under Projects TIN2012-33867 and TIN2011-28538-C02

    Priors for Diversity and Novelty on Neural Recommender Systems

    [Abstract] PRIN is a neural-based recommendation method that allows the incorporation of item prior information into the recommendation process. In this work we study how the system behaves in terms of novelty and diversity under different configurations of item prior probability estimations. Our results show the versatility of the framework and how its behavior can be adapted to the desired properties, whether accuracy is preferred or diversity and novelty are the desired properties, or how a balance can be achieved with the proper selection of prior estimations. Ministerio de Ciencia, Innovación y Universidades; RTI2018-093336-B-C22. Xunta de Galicia; GPC ED431B 2019/03. Xunta de Galicia; ED431G/01. Ministerio de Ciencia, Innovación y Universidades; FPU17/03210. Ministerio de Ciencia, Innovación y Universidades; FPU014/0172

    Keyword Embeddings for Query Suggestion

    Nowadays, search engine users commonly rely on query suggestions to improve their initial inputs. Current systems are very good at recommending lexical adaptations or spelling corrections to users' queries. However, they often struggle to suggest semantically related keywords given a user's query. The construction of a detailed query is crucial in some tasks, such as legal retrieval or academic search. In these scenarios, keyword suggestion methods are critical to guide the user during the query formulation. This paper proposes two novel models for the keyword suggestion task trained on scientific literature. Our techniques adapt the architecture of Word2Vec and FastText to generate keyword embeddings by leveraging documents' keyword co-occurrence. Along with these models, we also present a specially tailored negative sampling approach that exploits how keywords appear in academic publications. We devise a ranking-based evaluation methodology following both known-item and ad-hoc search scenarios. Finally, we evaluate our proposals against the state-of-the-art word and sentence embedding models showing considerable improvements over the baselines for the tasks