Bridging the Gap Between the Least and the Most Influential Twitter Users
Social networks play an increasingly important role in shaping the behaviour of Web users. Twitter arguably stands out from the others, not only for the platform's simplicity but also for the great influence that messages sent over the network can have. The impact of such messages determines the influence of a Twitter user and is what tools such as Klout, PeerIndex or TwitterGrader aim to calculate. Reducing all the factors that make a person influential to a single number is not an easy task, and the effort involved is wasted if Twitter users do not know how to improve that number. In this paper we identify the specific actions a Twitterer should carry out to increase their influence in each of the above-mentioned tools. To that end, we apply data mining techniques based on classification and regression algorithms to information collected from a set of Twitter users.
This work has been partially funded by the European Commission project "SiSOB: An Observatorium for Science in Society based in Social Models" (http://sisob.lcc.uma.es) (Contract no.: FP7 266588), "Sistemas Inalámbricos de Gestión de Información Crítica" (code number TIN2011-23795, granted by the MEC, Spain) and "3DTUTOR: Sistema Interoperable de Asistencia y Tutoría Virtual e Inteligente 3D" (code number IPT-2011-0889-900000, granted by the MINECO, Spain).
New Approaches in Incremental Learning (Nuevos enfoques en aprendizaje incremental)
The volume of data currently generated in different fields is very large, to the point that it can be difficult even to store. Performing machine learning tasks on such amounts of information is making new algorithms necessary. This thesis presents several contributions in the field of incremental learning, which are mainly aimed at improving it through algorithms based on concentration bounds and multi-classifier systems.
GNUsmail: Open framework for on-line email classification
Real-time classification of massive email data is a challenging task that presents its own particular difficulties. Since email data presents an important temporal component, several problems arise: emails arrive continuously, and the criteria used to classify those emails can change, so the learning algorithms have to be able to deal with concept drift. Our problem is more general than spam detection, which has received much more attention in the literature.
In this paper we present GNUsmail, an open-source extensible framework for email classification whose structure supports incremental and on-line learning. The framework allows algorithms developed by other researchers to be incorporated, such as those included in WEKA and MOA. We evaluate the framework, which is characterized by two overlapping phases (pre-processing and learning), using the ENRON dataset, and we compare the results achieved by WEKA and MOA algorithms.
A novel clustering based method for characterizing household electricity consumption profiles
A new methodology based on expert knowledge and data mining is proposed to obtain data-driven models that characterize household consumption profiles. These profiles help electricity marketers understand their customers' consumption, so that they can adjust their electricity purchases in the market and give their customers recommendations for managing their consumption. The novelty of this research is a new procedure to determine an adequate number of clusters for a clustering task. The proposed methodology uses this procedure to build the models in two phases. In the first phase, clustering algorithms are used to group the data using different numbers of clusters. In the second phase, a new procedure (k-ISAC_TLP) is used to select the most appropriate number of clusters. The methodology allows domain information to be included. In the case of household electricity consumption, where only groups with a significant number are relevant as long as the error is small, it allows the use of metrics such as the mean absolute error and the number of observations (daily electricity consumption profiles). According to experts, the results achieved on two real datasets (from Spain and Ireland), with millions of observations, support the methodology and reveal novel knowledge. In both cases, two and a half million observations have been analyzed and around twenty electricity consumption profiles have been detected. The methodology is easily extensible to problems in any domain where clustering algorithms are applicable. A software solution has been implemented and made freely available.
Funding for open access charge: Universidad de Málaga/CBUA. The authors would like to thank the University College Dublin Library for access to the Irish Social Science Data Archive (ISSDA).
This work was supported by Grant RTI2018-095097-B-I00 funded by MCIN (Spain), Grant CPP2021-008403 funded by MCIN/AEI/ 10.13039/501100011033 and by the "European Union NextGenerationEU/PRTR".
Mining Web-based Educational Systems to Predict Student Learning Achievements
Educational Data Mining (EDM) is gaining importance as a new interdisciplinary research field related to several other areas. It is directly connected with Web-based Educational Systems (WBES) and Data Mining (DM, a fundamental part of Knowledge Discovery in Databases).
The former defines the context: WBES store and manage huge amounts of data. These data keep growing and contain hidden knowledge that could be very useful to the users (both teachers and students). It is desirable to identify such knowledge in the form of models, patterns or any other representation schema that allows a better exploitation of the system. The latter is the tool for achieving such discovery. Data mining must cope with very complex and varied situations to reach quality solutions, so it is a research field where many advances are being made to accommodate and solve emerging problems, and many techniques are usually considered for this purpose.
In this paper we study how data mining can be used to induce student models from the data acquired by a specific Web-based tool for adaptive testing, called SIETTE. Specifically, we have used top-down induction of decision trees algorithms to extract the patterns, because the resulting models (decision trees) are easy to understand. In addition, the validation processes carried out indicate that the models are of high quality.
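As an illustration of how top-down induction chooses an attribute to split on, the following minimal Python sketch computes information gain, one common splitting criterion (as used in ID3-style algorithms). The abstract does not say which criterion or algorithm was applied to the SIETTE data, so the criterion, the function names and the toy student records are all assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_split(rows, attributes, target):
    """Attribute with the highest information gain (the root test of the tree)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy records, NOT data from the study.
students = [
    {"quiz": "high", "attempts": "few", "passed": "yes"},
    {"quiz": "high", "attempts": "many", "passed": "yes"},
    {"quiz": "low", "attempts": "few", "passed": "no"},
    {"quiz": "low", "attempts": "many", "passed": "no"},
]

root_attribute = best_split(students, ["quiz", "attempts"], "passed")
```

A full tree learner would apply `best_split` recursively to each resulting group until the groups are pure or no attributes remain.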
Teaching compilers: automatic question generation and intelligent assessment of grammars' Parsing
Automatic question generation and the assessment of procedural knowledge are still challenging research topics. This article focuses on one such case: the techniques for parsing grammars used in compiler construction. There are two well-known parsing techniques: top-down parsing with LL(1) and bottom-up parsing with LR(1). Learning these techniques, and learning to design grammars that can be parsed with them, requires practice. This article describes an application that covers all the tasks needed to automate the learning and assessment process: 1) automatically generate context-free languages and grammars of different complexity; 2) pose different types of questions to the student with an appropriate response interface; 3) automatically correct the student's answer, including grammar design for a given language; and 4) provide feedback on errors. The application has been implemented as a plug-in of the SIETTE assessment system, which, in addition, can provide adaptive behavior for question selection. It has been successfully used by more than a thousand students for formative and summative assessment.
Funding for open access charge: Universidad de Málaga / CBU
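To make the LL(1) technique mentioned above concrete, the following Python sketch computes FIRST sets for a small grammar and checks for FIRST/FIRST conflicts between alternatives of the same nonterminal. It is a textbook-style illustration, not the SIETTE plug-in's code; the grammar encoding (a dict of tuples, with the empty string standing for the empty word) and the function names are assumptions. A complete LL(1) test would also need FOLLOW sets, which are omitted here.

```python
EPSILON = ""  # assumed marker for the empty word


def first_of_sequence(symbols, first):
    """FIRST set of a sequence of grammar symbols, given the current FIRST table."""
    result = set()
    for symbol in symbols:
        if symbol not in first:  # anything that is not a nonterminal is a terminal
            result.add(symbol)
            return result
        result |= first[symbol] - {EPSILON}
        if EPSILON not in first[symbol]:
            return result
    result.add(EPSILON)  # every symbol can vanish, or the sequence is empty
    return result


def first_sets(grammar):
    """Iterate to a fixed point. `grammar` maps nonterminal -> list of productions."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, productions in grammar.items():
            for production in productions:
                before = len(first[nonterminal])
                first[nonterminal] |= first_of_sequence(production, first)
                changed = changed or len(first[nonterminal]) != before
    return first


def has_first_first_conflict(grammar, first):
    """True if two alternatives of one nonterminal share a FIRST symbol."""
    for productions in grammar.values():
        seen = set()
        for production in productions:
            current = first_of_sequence(production, first)
            if seen & current:
                return True
            seen |= current
    return False


# Classic expression fragment: E -> T E' ; E' -> + T E' | empty ; T -> id | ( E )
grammar = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("id",), ("(", "E", ")")],
}
first = first_sets(grammar)

# A grammar that is not LL(1): both alternatives of S start with "a".
ambiguous_prefix = {"S": [("a", "b"), ("a", "c")]}
```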
StreetQR Project. Device for Information Assistance in Streets and Places of Interest
This work presents an example of knowledge transfer from the university to society, within the field of Artificial Intelligence, aimed at establishing a productive university-industry link. It describes the StreetQR project, currently under development, whose goal is to deploy the device of the same name on the campus of the Universidad de Málaga. StreetQR is an information assistance device for street signs and places of interest that provides three functions: giving situational information to citizens in a city, capturing information on the city's vehicle and pedestrian flows, and alerting the population in special situations. The paper explains the device and how it works, as well as the institutional framework that the Universidad de Málaga has offered to move from a patent to a project whose goal is a functional prototype of the device on the university campus. The current state of development of the project is also presented.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec
Educational Data Mining for Personalized Prediction of Academic Performance
Educational Data Mining (EDM) is gaining importance as a new interdisciplinary research field related to several other areas. It is directly connected with Web-based Educational Systems (WBES) and Data Mining (DM), the latter being a fundamental part of Knowledge Discovery in Databases (KDD).
WBES store and manage large amounts of data. These data keep growing and contain hidden knowledge that could be very useful to the users (both teachers and students). It is desirable to identify such knowledge in the form of models, patterns or any other representation schema that allows a better exploitation of the system. Data mining is the tool for achieving such discovery, which gives rise to EDM. In this complex context, different learning techniques and algorithms are usually applied to obtain the best results.
This work studies how to predict the academic performance of students from intermediate tests in a Theoretical Computer Science course, specifically "Teoría de Autómatas y Lenguajes Formales" (Automata Theory and Formal Languages). To that end, different types of learning algorithms (nearest neighbours, decision trees, multi-classifiers) have been applied and compared. The whole process of monitoring and assessing the students during the course was carried out through the web tool SIETTE, developed in our department, which is also used outside our own university.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work was partially funded by the I Plan Propio de Investigación y Transferencia of the Universidad de Málaga.
Detection of unfavourable urban areas with higher temperatures and lack of green spaces using satellite imagery in sixteen Spanish cities.
This paper seeks to identify the most unfavourable areas of a city in terms of high temperatures and the absence of green infrastructure. An automatic methodology based on remote sensing and data analysis has been developed and applied in sixteen Spanish cities with different characteristics. Landsat-8 satellite images were selected for each city from the July-August period of 2019 and 2020 to calculate the spatial variation of land surface temperature (LST). The Normalized Difference Vegetation Index (NDVI) was used to determine the abundance of vegetation across the city. Based on the NDVI and LST maps created, a k-means unsupervised classification clustering was performed to automatically identify the different clusters according to how favourable these areas were in terms of temperature and presence of vegetation. A Disadvantaged Area Index (DAI), combining both variables, was developed to produce a map showing the most unfavourable areas for each city. Overall, the percentage of the area susceptible to improvement with more vegetation in the cities studied ranged from 13 % in Huesca to 64–65 % in Bilbao and Valencia. The influence of several factors, such as the presence of water bodies or large buildings, is discussed. Detecting unfavourable areas is a very interesting tool for defining future planning strategy for green spaces.
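The NDVI and clustering steps described above can be sketched in a few lines of Python. NDVI is the standard ratio (NIR - Red) / (NIR + Red); for Landsat-8 the near-infrared and red reflectances come from bands 5 and 4. Everything else below is an assumption for illustration: the plain Lloyd's k-means with hand-picked starting centroids, the scaling of LST to the range 0..1, and the four toy pixels. The paper's DAI formula is not given in the abstract, so it is not reproduced.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # assumed convention for pixels with no signal
    return (nir - red) / total


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm started from the given centroids (list of tuples)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable
            break
        centroids = updated
    labels = [nearest(point, centroids) for point in points]
    return labels, centroids


# Hypothetical pixels: (NIR reflectance, red reflectance, LST scaled to 0..1).
pixels = [(0.50, 0.08, 0.20), (0.45, 0.10, 0.25), (0.22, 0.18, 0.90), (0.20, 0.17, 0.85)]
features = [(ndvi(nir, red), lst) for nir, red, lst in pixels]

# Start one centroid at a vegetated, cool pixel and one at a bare, hot pixel.
labels, centroids = kmeans(features, [features[0], features[2]])
```

Scaling both features to comparable ranges before clustering matters here, because otherwise the variable with the larger numeric range dominates the distance.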
Online and Non-Parametric Drift Detection Methods Based on Hoeffding’s Bounds
I. Frías-Blanco, J. d. Campo-Ávila, G. Ramos-Jiménez, R. Morales-Bueno, A. Ortiz-Díaz and Y. Caballero-Mota, "Online and Non-Parametric Drift Detection Methods Based on Hoeffding’s Bounds," in IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 3, pp. 810-823, 1 March 2015
doi: 10.1109/TKDE.2014.2345382.
© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Incremental and online learning algorithms are more relevant in the data mining context because of the increasing necessity to process data streams. In this context, the target function may change over time, an inherent problem of online learning (known as concept drift). In order to handle concept drift regardless of the learning model, we propose new methods to monitor the performance metrics measured during the learning process, to trigger drift signals when a significant variation has been detected. To monitor this performance, we apply some probability inequalities that assume only independent, univariate and bounded random variables to obtain theoretical guarantees for the detection of such distributional changes. Some common restrictions for the online change detection as well as relevant types of change (abrupt and gradual) are considered. Two main approaches are proposed. The first one involves moving averages and is more suitable to detect abrupt changes. The second one follows a widespread intuitive idea to deal with gradual changes using weighted moving averages. The simplicity of the proposed methods, together with their computational efficiency, makes them very advantageous. We use a Naïve Bayes classifier and a Perceptron to evaluate the performance of the methods over synthetic and real data.
Supported in part by the SESAAME project number TIN2008-06582-C03-03 of the MICINN, Spain.
Supported in part by the AUIP (Asociación Universitaria Iberoamericana de Postgrado).
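The idea of monitoring a bounded performance metric with Hoeffding's inequality can be illustrated with the simplified Python sketch below. It is not the published method: the class name, the reference-prefix heuristic and the default `delta` are assumptions. It only shows the core test, namely that for independent values in [0, 1] the means of two samples of sizes n and m differ by more than sqrt((1/n + 1/m) * ln(1/delta) / 2) with probability at most delta when no change has occurred.

```python
from math import log, sqrt


class HoeffdingDriftDetector:
    """Simplified drift detector for a stream of values in [0, 1] (e.g. 0/1 errors)."""

    def __init__(self, delta=0.001):
        self.delta = delta  # assumed confidence parameter
        self.reset()

    def reset(self):
        self.n_total = 0
        self.sum_total = 0.0
        self.n_ref = 0  # length of the reference prefix
        self.sum_ref = 0.0

    def _bound(self, n):
        """One-sided Hoeffding bound on the mean of n values in [0, 1]."""
        return sqrt(log(1.0 / self.delta) / (2.0 * n))

    def update(self, value):
        """Add one observation; return True when a significant increase is detected."""
        self.n_total += 1
        self.sum_total += value
        upper_total = self.sum_total / self.n_total + self._bound(self.n_total)
        # Keep as reference the prefix with the lowest upper bound on its mean.
        if self.n_ref == 0 or upper_total <= self.sum_ref / self.n_ref + self._bound(self.n_ref):
            self.n_ref = self.n_total
            self.sum_ref = self.sum_total
            return False
        n_recent = self.n_total - self.n_ref
        mean_ref = self.sum_ref / self.n_ref
        mean_recent = (self.sum_total - self.sum_ref) / n_recent
        epsilon = sqrt((1.0 / self.n_ref + 1.0 / n_recent) * log(1.0 / self.delta) / 2.0)
        if mean_recent - mean_ref > epsilon:
            self.reset()  # start monitoring the new concept from scratch
            return True
        return False


# Hypothetical error stream: about 10 % errors for 300 steps, then only errors.
stream = [1.0 if i % 10 == 0 else 0.0 for i in range(1, 301)] + [1.0] * 100

detector = HoeffdingDriftDetector()
detections = [step for step, value in enumerate(stream, start=1) if detector.update(value)]
```

In practice `value` would be the 0/1 error of a classifier on each incoming example, and a drift signal would typically trigger retraining or replacement of the model.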