
    Prediction and analysis of user interactions on online teaching platforms (Predicción y análisis de interacciones de usuarios en plataformas de enseñanza online)

    Online teaching platforms generate large amounts of metadata about students' interactions with one another and with the platform. Teachers can use this information to improve their courses and the students' experience of them. In this context, the aim of this bachelor's thesis (TFG) is to analyze the interactions students perform in online courses and to predict student behavior from their patterns of access to the platform. Because of the volume of data involved, we use a parallel computing tool, Apache Spark, to preprocess the data generated by the platform. With Apache Spark we build an application that extracts the students' access patterns and reduces the large volume of metadata generated in an online course. Finally, we apply machine learning algorithms to predict target variables related to the students' interaction with the course, such as the probability of dropout or academic performance. This step is also carried out with Apache Spark. Specifically, we apply the Random Forest algorithm from Spark's MLlib library in order to obtain the best possible result when predicting the course's target variables.
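The access-pattern extraction described in this abstract can be illustrated with a small sketch. It is not the thesis's Spark application: it is plain Python, and the event format (`(student_id, date)` pairs), the function name `access_pattern`, and the weekly bucketing are all assumptions made for illustration. It only shows the general idea of reducing a raw event log to one compact feature vector per student.

```python
from collections import defaultdict
from datetime import date


def access_pattern(events, course_start, n_weeks):
    """Reduce raw access events to per-student weekly access counts.

    events: iterable of (student_id, date) pairs (hypothetical log format).
    Returns a dict mapping each student to a list of n_weeks counts.
    """
    pattern = defaultdict(lambda: [0] * n_weeks)
    for student, day in events:
        week = (day - course_start).days // 7
        # Ignore events that fall outside the course window.
        if 0 <= week < n_weeks:
            pattern[student][week] += 1
    return dict(pattern)


# Hypothetical sample log: two students, three-week course.
events = [
    ("s1", date(2016, 1, 4)),
    ("s1", date(2016, 1, 5)),
    ("s1", date(2016, 1, 12)),
    ("s2", date(2016, 1, 20)),
]
patterns = access_pattern(events, course_start=date(2016, 1, 4), n_weeks=3)
```

Vectors of this kind could then serve as input features for a classifier such as a random forest; at the data volumes the abstract mentions, the same grouping would be expressed as a distributed aggregation rather than an in-memory loop.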

    Database integrated analytics using R : initial experiences with SQL-Server + R

    © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    Most data scientists nowadays use functional or semi-functional languages such as SQL, Scala, or R to process data obtained directly from databases. This requires fetching the data, processing it, and storing it again, and the processing tends to happen outside the DB, often in complex data flows. Recently, database service providers have decided to integrate “R-as-a-Service” into their DB solutions. The analytics engine is called directly from the SQL query tree, and results are returned as part of the same query. Here we give a first taste of this technology by testing the portability of our ALOJA-ML analytics framework, coded in R, to Microsoft SQL-Server 2016, one of the recently released SQL+R solutions. We discuss several data-flow schemes for porting a local DB + analytics engine architecture towards Big Data, focusing especially on the new DB Integrated Analytics approach, and we comment on our first experiences with the usability and performance of these new services and capabilities. Peer Reviewed. Postprint (author's final draft).

    Real-Time Context-Aware Microservice Architecture for Predictive Analytics and Smart Decision-Making

    The impressive evolution of the Internet of Things and the great amount of data flowing through its systems provide an inspiring scenario for Big Data analytics, real-time context-aware predictions, and smart decision-making. However, this requires a scalable system for continuous stream processing that can also make decisions and take actions based on the predictions it performs. This paper proposes a scalable architecture that provides real-time context-aware actions based on predictive stream processing of data. It evolves a previously presented event-driven service-oriented architecture, which already permitted the context-aware detection and notification of relevant data, into a microservice-based architecture that we have defined and implemented. As a result, the architecture has been enhanced in two ways. On the one hand, it has been supplied with reliable predictions through the use of predictive analytics and complex event processing techniques, which permit the notification of relevant context-aware information ahead of time. On the other hand, it has been refactored towards a microservice architecture pattern, which greatly improves its maintenance and evolution. The performance of the architecture has been evaluated with an air quality case study.

    Customer churn prediction in telecom using machine learning and social network analysis in big data platform

    Customer churn is a major problem and one of the most important concerns for large companies. Because churn directly affects revenue, especially in the telecom field, companies are seeking ways to predict which customers are likely to churn. Finding the factors that increase customer churn is therefore important for taking the actions needed to reduce it. The main contribution of our work is a churn prediction model that helps telecom operators identify the customers most likely to churn. The model uses machine learning techniques on a big data platform and introduces a new approach to feature engineering and selection. To measure the performance of the model, the standard Area Under Curve (AUC) measure is adopted, and the AUC value obtained is 93.3%. Another main contribution is the use of the customer social network in the prediction model, by extracting Social Network Analysis (SNA) features. The use of SNA improved the performance of the model from 84% to 93.3% AUC. The model was prepared and tested in a Spark environment on a large dataset created by transforming big raw data provided by the SyriaTel telecom company. The dataset contained all customers' information over 9 months and was used to train, test, and evaluate the system at SyriaTel. Four algorithms were tested: Decision Tree, Random Forest, Gradient Boosted Machine Tree "GBM", and Extreme Gradient Boosting "XGBOOST". The best results were obtained with the XGBOOST algorithm, which was used for classification in this churn prediction model.
    Comment: 24 pages, 14 figures. PDF https://rdcu.be/budK
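The AUC measure mentioned in this abstract can be made concrete with a short sketch. It is a plain-Python illustration, not the paper's Spark pipeline: the function name `auc` and the sample labels and scores are hypothetical. It computes AUC as the fraction of (churner, non-churner) pairs in which the churner receives the higher predicted score, counting ties as half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive (e.g. churned), 0 for negative.
    scores: predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores: a perfect ranking and a mixed one.
perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
mixed = auc([1, 0, 1, 0, 1], [0.9, 0.8, 0.7, 0.3, 0.2])
```

The pairwise loop is quadratic in the number of examples, so it suits only small samples; on a dataset of the size the abstract describes, a rank-based or library implementation would be the practical choice.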