29 research outputs found
Use of Graph Analytics to Identify Relevant Entities and Communities in Mercado Libre: A Case Study
This article represents the information available in non-relational databases, taking advantage of their scalability, high availability, resilience and ease of development. It also describes a set of algorithms provided by the Neo4j graph database engine for computing graph, node and relationship metrics. First, a public data set taken from the Mercado Libre online sales system is consolidated. The data are then modelled in a graph schema whose nodes are users (who can be sellers or buyers), products and their characteristics. Next, algorithms that compute metrics of the graph, its nodes and its relationships are applied, and the results are visualised. Finally, the most important categories offered, the existing communities and the most influential users are identified.
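The abstract above does not say which graph algorithms were run, so the following is only a minimal sketch of the general idea, assuming degree centrality as a simple influence measure and connected components as a crude stand-in for community detection. It uses plain Python rather than Neo4j, and the node names and edges are hypothetical toy data, not the Mercado Libre data set.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(neigh) / (n - 1) for node, neigh in adjacency.items()}


def connected_components(adjacency):
    """Group nodes into connected components using an iterative DFS."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Hypothetical toy graph: sellers, buyers and products as nodes.
edges = [
    ("seller_a", "product_1"),
    ("seller_a", "product_2"),
    ("buyer_x", "product_1"),
    ("buyer_y", "product_1"),
    ("seller_b", "product_3"),
    ("buyer_z", "product_3"),
]
adjacency = build_adjacency(edges)
centrality = degree_centrality(adjacency)
communities = connected_components(adjacency)
most_connected = max(centrality, key=centrality.get)
```

In this toy graph `most_connected` is the node with the most direct neighbours, and `communities` holds one set of nodes per disconnected group. A graph database engine would typically offer richer centrality and community measures than these two.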
Eliciting preferences to find your perfect laptop: a usability study
Most e-commerce websites that sell technological items provide a simplified Query-by-Example preference elicitation approach that lets users search for the product they want by specifying constraints over criteria and filtering the results. Such an interface may be adequate when the technical specification is known. In this work, the usability of an alternative preference elicitation approach, Pairwise Comparisons, which enables users to specify trade-offs between potentially conflicting criteria, is compared and contrasted with the traditional Query-by-Example approach. This is done by building two search tools, one for each preference elicitation approach, to collect data about user behaviour. An experiment is carried out to evaluate performance and usability when searching for laptops over a web-scraped data set across varied tasks. It is found that while Query-by-Example tends to be preferred when the technical specification criteria are precisely specified, the result sets generated by the Pairwise Comparisons approach are preferred when it is less clear what the criteria values should be. As such, Pairwise Comparisons preference elicitation is deemed a worthwhile complementary approach for such e-commerce websites.
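The abstract does not state how the pairwise judgements are turned into a ranking. One common option, assumed here purely for illustration, is the geometric-mean method over a reciprocal comparison matrix. The criteria names and comparison values below are hypothetical.

```python
import math


def weights_from_pairwise(matrix):
    """Derive normalised criterion weights from a reciprocal comparison matrix.

    matrix[i][j] states how many times more important criterion i is than
    criterion j, so matrix[j][i] is expected to be 1 / matrix[i][j].
    """
    geometric_means = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Hypothetical trade-offs: price is twice as important as battery life
# and four times as important as weight.
criteria = ["price", "battery_life", "weight"]
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = dict(zip(criteria, weights_from_pairwise(comparisons)))
```

The resulting weights sum to one and could be used to score each laptop as a weighted sum of its normalised criterion values, whereas a Query-by-Example interface would filter on hard constraints instead.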
Classifying incoming customer messages for an e-commerce site using supervised learning
Throughout the world, the provision of online goods and services has increased significantly over the last few years. We consider the case of Tango Discos, a small company in Colombia that sells entertainment products through an e-commerce website and receives customer messages through various channels, including a web form, email, Facebook and Twitter. The data set comprises 29,970 messages collected from 2019 to 2021. Each message can be categorized as a sale, a request or a complaint. In this work we evaluate different supervised classification models to automate the task of classifying the messages, viz. decision trees, Naive Bayes, linear Support Vector Machines and logistic regression. As the data set is unbalanced, the models are evaluated in combination with various data balancing approaches to obtain the best performance. In order to maximize revenue, the management is interested in prioritizing messages that may result in potential sales. The best model for deployment is therefore one that minimizes false positives in the sales category, so that these messages are processed in a timely fashion. The best performing model is found to be the linear Support Vector Machine using the Random Over Sampler balancing technique. This model is deployed in the cloud and exposed through a RESTful interface.
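As a rough illustration of the balancing step, the sketch below shows the idea behind random oversampling: minority-class examples are duplicated at random until every class matches the size of the largest one. It is a plain-Python approximation written for this note, not the library implementation used in the study, and the messages and labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_label.values())

    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical, heavily simplified customer messages.
messages = [
    "where is my order", "do you ship abroad", "is this in stock",
    "can I change my address", "the disc arrived broken",
    "I was charged twice", "I want to buy two copies",
]
labels = [
    "request", "request", "request", "request",
    "complaint", "complaint", "sale",
]
balanced_messages, balanced_labels = random_oversample(messages, labels)
class_counts = Counter(balanced_labels)
```

Oversampling is normally applied to the training split only, so that duplicated messages do not leak into the evaluation data. The balanced set would then be vectorised and passed to a classifier such as a linear Support Vector Machine.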
Exploring the Colombian digital divide using Moodle logs through supervised learning
Purpose
This study aims to explore the digital divide between students living in metropolitan and non-metropolitan areas in the Antioquia region of Colombia. This is achieved by collecting data about student interactions from the Moodle learning management system (LMS), and subsequently applying supervised machine learning models to infer the gap between students in metropolitan and non-metropolitan areas.
Design/methodology/approach
This work uses the well-established Cross-Industry Standard Process for Data Mining methodology, which comprises six phases, viz., problem understanding, data understanding, data preparation, modelling, evaluation and implementation. In this case, student data was collected from the Moodle platform of the Antioquia campus of the UNAD distance learning university.
Findings
The digital divide is evident in the classification model when observing differences in variables such as the number of accesses to the LMS, the total time spent and the number of distinct IP addresses used, as well as the number of system modification events.
Originality/value
This study provides conclusions regarding the problems that students in virtual education may face as a result of the digital divide in Colombia, which have become increasingly visible since the implementation of machine learning methodologies on LMS such as Moodle. These practices may be replicated in any virtual educational context and can furthermore be extended to enable personalisation of various aspects of the Moodle platform to meet the individual needs of students.
Pairwise comparisons or constrained optimization? A usability evaluation of techniques for eliciting decision priorities
Decision support methodologies provide notations for expressing and communicating the priorities that inform a decision. Although a substantial literature has explored the theoretical merits of such notations and methodologies, much less work has investigated their usability in practice, which is of vital importance for their widespread adoption by users. In this paper, we explore the usability of two well-known preference elicitation techniques, pairwise comparisons and constrained optimization. The techniques were explored through two contrasting crowd worker experiments, a preliminary one evaluating recognition, that is, the ability to identify the most suitable formulation for a given task, and the other synthesis, that is, the ability to construct formulations for a given task. The tasks are based on a case study involving source selection, a well-known problem in the data integration domain. The results of the empirical evaluation show that, overall, pairwise comparisons resulted in significantly higher performance than constrained optimization, yet there is negligible difference between the usability appraisals for each technique. Furthermore, we observed that the technique that participants perform better with is not necessarily the one that they consider more usable.