10 research outputs found

    Conversational Data Exploration: A Game-Changer for Designing Data Science Pipelines

    Full text link
    This paper proposes a conversational approach, implemented in the system Chatin, for driving an intuitive data exploration experience. Our work aims to unlock the full potential of data analytics and artificial intelligence with a new generation of data science solutions. Chatin is a cutting-edge tool that democratises access to AI-driven solutions, empowering non-technical users from various disciplines to explore data and extract knowledge from it.

    Sequential Recommendation Based on Objective and Subjective Features

    Get PDF
    Nowadays, sequential recommender systems are widely used in e-commerce to capture consumers’ dynamic short-term preferences. Existing transformer-based recommendation models mainly consider consumer preference for products and some related objective features, such as price. However, besides such objective features, subjective features, such as consumers’ preference for product quality, also affect purchase decisions. In this paper, we design a Sequential Recommender system based on Objective and Subjective features (SROS). We construct subjective features by applying natural language processing to online consumer reviews. We then design a feature-level multi-head self-attention to explore the interactions between objective and subjective features and to capture consumers’ dynamic preferences for them across different purchases. Experimental results on real-world datasets demonstrate the effectiveness of the proposed model.
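    The feature-level self-attention the abstract describes can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' SROS implementation: the random projection weights stand in for learned parameters, and the toy mix of objective and subjective feature embeddings is an assumption.

```python
import numpy as np

def feature_level_self_attention(features, num_heads=2, seed=0):
    """Scaled dot-product self-attention over a set of feature
    embeddings (objective features such as price alongside subjective
    features mined from reviews), letting each feature attend to all
    others."""
    rng = np.random.default_rng(seed)
    n, d = features.shape            # n features, embedding dim d
    assert d % num_heads == 0
    dh = d // num_heads              # per-head dimension
    # Random projections stand in for learned Q/K/V weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = features @ Wq, features @ Wk, features @ Wv
    out = np.zeros_like(features)
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)
        scores = Q[:, sl] @ K[:, sl].T / np.sqrt(dh)
        scores -= scores.max(axis=1, keepdims=True)    # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)  # softmax over features
        out[:, sl] = weights @ V[:, sl]
    return out

# Four hypothetical feature embeddings: two objective (price, category)
# and two subjective (quality sentiment, service sentiment), dim 8.
feats = np.random.default_rng(1).standard_normal((4, 8))
mixed = feature_level_self_attention(feats)
```

    In the full model, the attended feature representations would feed a sequence-level transformer over the purchase history; here only the feature-interaction step is shown.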

    Vocabulary Evolution on the Semantic Web: From Changes to Evolution of Vocabularies and its Impact on the Data

    Get PDF
    The main objective of the Semantic Web is to give data on the web a well-defined meaning. Vocabularies are used for modeling data on the web: they provide a shared understanding of a domain and consist of a collection of types and properties, the so-called terms. A vocabulary can import terms from other vocabularies, and data publishers use vocabulary terms for modeling data. Importing terms across vocabularies results in a Network of Linked vOcabularies (NeLO). Vocabularies are subject to change during their lifetime. When vocabularies change, already published data become a problem if they are not updated accordingly. So far, there has been no study that analyzes vocabulary changes over time. Furthermore, it is unknown how data publishers react to such vocabulary changes. Ontology engineers and data publishers may not be aware of changes in vocabulary terms that have already happened, since such changes occur rather rarely. This work addresses the problem of vocabulary changes and their impact on other vocabularies and on the published data. We analyzed the changes of vocabularies and their reuse. We selected the most dominant vocabularies, based on their use by data publishers. Additionally, we analyzed the changes of 994 vocabularies. Furthermore, we analyzed various vocabularies to better understand by whom and how they are used in the modeled data, and how these changes are adopted in the Linked Open Data cloud. We computed the state of the NeLO from the available versions of vocabularies over a period of 17 years. We analyzed static parameters of the NeLO such as its size, density, average degree, and the most important vocabularies at certain points in time. We further investigated how the NeLO changes over time, specifically measuring the impact of a change in one vocabulary on others, how the reuse of terms changes, and the importance of vocabulary changes. 
Our results show that the vocabularies are highly static, and many of the changes occurred in annotation properties. Additionally, 16% of the existing terms are reused by other vocabularies, and some of the deprecated and deleted terms are still reused. Furthermore, most of the newly coined terms are adopted immediately. Our results show that even though the change frequency of terms is rather low, changes can have a high impact on the data due to the large amount of data on the web. Moreover, due to the large number of vocabularies in the NeLO, and therefore the growing number of available terms, the percentage of imported terms compared with the available ones has decreased over time. Additionally, based on the average number of exports for the vocabularies in the NeLO, some vocabularies have become more popular over time. Overall, understanding the evolution of vocabulary terms is important for ontology engineers and data publishers to avoid wrong assumptions about the data published on the web. Furthermore, it may foster a better understanding of the impact of changes in vocabularies and how they are adopted, making it possible to learn from previous experience. Our results provide, for the first time, in-depth insights into the structure and evolution of the NeLO. Supported by proper tools exploiting the analysis of this thesis, they may help ontology engineers to identify data modeling shortcomings and assess the dependencies implied by reusing a specific vocabulary.
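    The static network parameters the thesis reports (size, density, average degree) can be computed from the vocabulary import relation in a few lines. A minimal sketch, assuming the NeLO is given as directed (importer, exporter) edges; the vocabulary names are illustrative:

```python
def graph_stats(edges):
    """Size, density, and average degree of a directed vocabulary
    import graph given as (importer, exporter) pairs."""
    nodes = {v for e in edges for v in e}
    n, m = len(nodes), len(edges)
    # Density: edges present relative to the n*(n-1) possible directed edges.
    density = m / (n * (n - 1)) if n > 1 else 0.0
    avg_deg = m / n if n else 0.0    # average out-degree (imports per vocabulary)
    return n, m, density, avg_deg

# Hypothetical snapshot: foaf imports rdf and dcterms; dcterms imports rdf.
edges = [("foaf", "rdf"), ("dcterms", "rdf"), ("foaf", "dcterms")]
n, m, density, avg_deg = graph_stats(edges)
```

    Computing these statistics per vocabulary-version snapshot over the 17-year period yields the time series analyzed in the thesis.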

    Data augmentation for recommender system: A semi-supervised approach using maximum margin matrix factorization

    Full text link
    Collaborative filtering (CF) has become a popular method for developing recommender systems (RS), where a user's ratings for new items are predicted based on her past preferences and the available preference information of other users. Despite the popularity of CF-based methods, their performance is often greatly limited by the sparsity of observed entries. In this study, we explore the data augmentation and refinement aspects of Maximum Margin Matrix Factorization (MMMF), a widely accepted CF technique for rating prediction, which have not been investigated before. We exploit the inherent characteristics of CF algorithms to assess the confidence level of individual ratings and propose a semi-supervised approach for rating augmentation based on self-training. We hypothesize that a CF algorithm's low-confidence predictions are due to some deficiency in the training data, and hence the performance of the algorithm can be improved by adopting a systematic data augmentation strategy. We iteratively use some of the ratings predicted with high confidence to augment the training data and remove low-confidence entries through a refinement process. By repeating this process, the system learns to improve prediction accuracy. Our method is experimentally evaluated on several state-of-the-art CF algorithms and leads to informative rating augmentation, improving the performance of the baseline approaches.
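    The augment-and-refine loop described above can be sketched as follows. This is a toy illustration of the self-training idea, not the paper's MMMF implementation: `predict_fn` and `confidence_fn` are assumed interfaces, and the column-mean predictor with constant confidence is a deliberately simple stand-in.

```python
import numpy as np

def self_training_augment(ratings, predict_fn, confidence_fn,
                          hi=0.8, lo=0.2, rounds=3):
    """Semi-supervised rating augmentation by self-training: adopt
    high-confidence predictions for missing cells, and drop observed
    entries the model distrusts (refinement)."""
    R = ratings.copy()              # 0 marks an unobserved rating
    for _ in range(rounds):
        preds = predict_fn(R)       # dense predicted rating matrix
        conf = confidence_fn(R)     # per-entry confidence in [0, 1]
        # Augment: fill missing cells whose prediction is trusted.
        missing = (R == 0) & (conf >= hi)
        R[missing] = preds[missing]
        # Refine: remove observed entries with very low confidence.
        distrusted = (ratings != 0) & (conf <= lo)
        R[distrusted] = 0
    return R

# Toy stand-ins: predict each item's mean observed rating, with a
# uniform confidence of 0.9 for every entry.
def col_mean(R):
    means = R.sum(0) / np.maximum((R != 0).sum(0), 1)
    return means * np.ones_like(R)

ratings = np.array([[5., 0., 3.],
                    [4., 2., 0.]])
augmented = self_training_augment(ratings, col_mean,
                                  lambda R: np.full_like(R, 0.9),
                                  rounds=1)
```

    With real CF models, the confidence estimate would come from the model itself (e.g. margins in MMMF), and the loop would rerun training between rounds.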

    Exploring attributes, sequences, and time in Recommender Systems: From classical to Point-of-Interest recommendation

    Full text link
    Unpublished doctoral thesis, read at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defense date: 08-07-2021. Since the emergence of the Internet and the spread of digital communications throughout the world, the amount of data stored on the Web has been growing exponentially. In this new digital era, a large number of companies have emerged with the purpose of filtering the information available on the web and providing users with interesting items. The algorithms and models used to recommend these items are called Recommender Systems. These systems are applied to a large number of domains, from music, books, or movies to dating or Point-of-Interest (POI) recommendation, an increasingly popular domain where users receive recommendations of different places when they arrive in a city. In this thesis, we focus on exploiting the use of contextual information, especially temporal and sequential data, and applying it in novel ways in both traditional and Point-of-Interest recommendation. We believe that this type of information can be used not only for creating new recommendation models but also for developing new metrics for analyzing the quality of these recommendations. In one of our first contributions we propose different metrics, some of them derived from previously existing frameworks, using this contextual information. Besides, we also propose an intuitive algorithm that is able to provide recommendations to a target user by exploiting the last common interactions with other similar users of the system. At the same time, we conduct a comprehensive review of the algorithms that have been proposed in the area of POI recommendation between 2011 and 2019, identifying the common characteristics and methodologies used. Once this classification of the algorithms proposed to date is completed, we design a mechanism to recommend complete routes (not only independent POIs) to users, making use of reranking techniques. 
In addition, due to the great difficulty of making recommendations in the POI domain, we propose the use of data aggregation techniques to use information from different cities to generate POI recommendations in a given target city. In the experimental work we present our approaches on different datasets belonging to both classical and POI recommendation. The results obtained in these experiments confirm the usefulness of our recommendation proposals, in terms of ranking accuracy and other dimensions like novelty, diversity, and coverage, and the appropriateness of our metrics for analyzing temporal information and biases in the recommendations produced.
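    One way such route reranking can work is greedy selection under a relevance/proximity trade-off. A minimal sketch under assumed inputs; the POI names, coordinates, scores, and the `lam` weight are all illustrative, not the thesis's actual reranker:

```python
import math

def rerank_route(candidates, start, length=3, lam=0.5):
    """Greedily build a route: at each step pick the candidate POI with
    the best trade-off between its base relevance score and its
    distance from the previously selected POI."""
    route, pos = [], start
    pool = dict(candidates)          # name -> (relevance, (x, y))
    for _ in range(min(length, len(pool))):
        def utility(item):
            rel, (x, y) = item[1]
            dist = math.hypot(x - pos[0], y - pos[1])
            return rel - lam * dist  # penalize far-away POIs
        name, (rel, xy) = max(pool.items(), key=utility)
        route.append(name)
        pos = xy                     # next step measures from here
        del pool[name]
    return route

# Hypothetical candidates: (relevance score, coordinates).
pois = {"museum": (0.90, (0.0, 1.0)),
        "park":   (0.80, (0.1, 1.1)),
        "tower":  (0.95, (5.0, 5.0))}
route = rerank_route(pois, start=(0.0, 0.0), length=2)
```

    The tower has the highest standalone score but loses to the nearby museum and park once distance is penalized, which is the effect that makes reranked routes coherent.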

    Information and Communication Technologies in Tourism 2022

    Get PDF
    This open access book presents the proceedings of the International Federation for IT and Travel & Tourism (IFITT)’s 29th Annual International eTourism Conference, assembling the latest research presented at the ENTER2022 conference, held on January 11–14, 2022. The book provides an extensive overview of how information and communication technologies can be used to develop tourism and hospitality. It covers the latest research on various topics within the field, including augmented and virtual reality, website development, social media use, e-learning, big data, analytics, and recommendation systems. Readers will gain insights and ideas on how information and communication technologies can be used in tourism and hospitality. Academics working in the eTourism field, as well as students and practitioners, will find up-to-date information on the status of research.
