    Multi-tenant Pub/Sub processing for real-time data streams

    Devices and sensors generate streams of data across a diversity of locations and protocols. That data usually reaches a central platform that stores and processes the streams. Processing can be done in real time, with transformations and enrichment happening on the fly, or after the data has been stored and organized in repositories. In the former case, stream processing technologies are required to operate on the data; in the latter, batch analytics and queries are commonly used. This paper introduces a runtime that dynamically constructs data stream processing topologies from user-supplied code. These dynamic topologies are built on the fly using a data subscription model defined by the applications that consume data. Each user-defined processing unit is called a Service Object. Every Service Object consumes input data streams and may produce output streams that others can consume. The subscription-based programming model enables multiple users to deploy their own data-processing services. The runtime handles the dynamic forwarding of data and the execution of Service Objects from different users. Data streams can originate in real-world devices or be the outputs of Service Objects. The runtime leverages Apache Storm for parallel data processing, which, combined with dynamic user-code injection, provides multi-tenant stream processing topologies. In this work we describe the runtime, its features and implementation details, and include a performance evaluation of some of its core components. This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitivity (TIN2015-65316-P) and the Generalitat de Catalunya (2014-SGR-1051). Peer Reviewed. Postprint (author's final draft).
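The subscription model described in this abstract can be illustrated with a minimal in-process sketch. It is not the paper's runtime and does not use Apache Storm; the names `ServiceObject`, `Runtime`, `deploy`, `publish` and the stream names are hypothetical, chosen only to show how Service Objects from different users could be chained through subscriptions.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class ServiceObject:
    """Hypothetical user-defined processing unit (names are assumptions)."""
    owner: str                       # tenant that deployed the code
    inputs: List[str]                # streams this object subscribes to
    output: Optional[str]            # stream it publishes to, if any
    code: Callable[[str, Any], Any]  # user code: (stream, value) -> value or None


class Runtime:
    """Toy stand-in for the runtime: forwards each update to its subscribers."""

    def __init__(self) -> None:
        self.subscribers = defaultdict(list)  # stream name -> Service Objects
        self.log = []                         # every (stream, value) forwarded

    def deploy(self, so: ServiceObject) -> None:
        # Deploying extends the topology: one edge per subscribed stream.
        for stream in so.inputs:
            self.subscribers[stream].append(so)

    def publish(self, stream: str, value: Any) -> None:
        # Breadth-first forwarding; this sketch assumes an acyclic topology.
        queue = deque([(stream, value)])
        while queue:
            name, val = queue.popleft()
            self.log.append((name, val))
            for so in self.subscribers.get(name, ()):
                out = so.code(name, val)
                if out is not None and so.output is not None:
                    queue.append((so.output, out))


runtime = Runtime()
# Tenant "alice" converts a device stream from Celsius to Fahrenheit.
runtime.deploy(ServiceObject("alice", ["sensor/temp_c"], "alice/temp_f",
                             lambda s, v: v * 9 / 5 + 32))
# Tenant "bob" consumes alice's output stream and emits alerts.
runtime.deploy(ServiceObject("bob", ["alice/temp_f"], "bob/alerts",
                             lambda s, v: "hot" if v > 86 else None))
runtime.publish("sensor/temp_c", 35.0)
```

Here the second Service Object never sees the device directly; it only subscribes to the output stream of the first, which is the multi-tenant composition the abstract describes.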


    Wood mouse feeding effort and decision-making when encountering a restricted unknown food source

    This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Animals making foraging decisions must balance the energy gained, the time invested, and the influence of key environmental factors. In our work, we examined the effect of predation risk cues and experience on feeding effort when a novel food resource was made available. To achieve this, we live-trapped wood mice (Apodemus sylvaticus) in Monte de Valdelatas (Madrid), where 80 Sherman traps were set in four plots. Traps were subjected to two food-access difficulties in treatments consisting of three consecutive nights: open plastic bottles (easy) and closed bottles (difficult), both using corn as bait. To simulate predation risk, we set fox faeces in half of the traps in each plot. We also considered moonlight (medium/low) as an indirect predation risk cue. We recorded whether bottles had been bitten by mice and measured the gnawed area of each bottle. Our results indicated that food access difficulty, experience, and predation risk determined mice feeding decisions and efforts. The ability of mice to adapt feeding effort when a new food source is available was demonstrated because a higher proportion of closed bottles exhibited bite marks and the gnawed area was bigger. Moreover, mouse experience was a determining factor in the use of this new resource, since recaptured mice gnawed broader orifices in the bottles and the gnawed area increased each time an individual was recaptured. Additionally, direct predation risk cues prompted mice to bite the bottles, whereas the effect of different moon phases varied among the food access treatments. This study provides direct evidence of the formidable efficacy of wild mice in exploiting a new nutrient resource while considering crucial environmental factors that shape the decision-making procedure.

    Inclusión de la “Metodología Multicriterio” en el campo de la valoración de bienes inmuebles

    Graduation Project (Licentiate in Construction Engineering), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Construcción, 2010. The subject is land valuation using the Multicriteria Method, a set of mathematical and statistical procedures that reduce the expert's subjectivity and give the expert a broader view for arriving at the right price for the land. The main objective is to investigate what the method is and how it can be used by the Diquís Hydroelectric Project (PHED), and to establish, through examples, an easy and practical guide for applying the Multicriteria Method. Bibliographical research and interviews with experts from ICE and other companies define the starting point and lead to a process for putting the new valuation approach into practice. Two real examples of properties are used to apply the Multicriteria Method. The conclusions, based on the results and the theoretical summary, are that this methodology is well adapted to the national reality and to the purposes of the PHED, which is seeking the best way to purchase all the affected properties; in addition, the process is flexible and does not require highly developed software. Instituto Tecnológico de Costa Rica. Escuela Ingeniería en Construcción; Empresa PHED

    Scalable processing of aggregate functions for data streams in resource-constrained environments

    The fast evolution of data analytics platforms has resulted in an increasing demand for real-time data stream processing. From Internet of Things applications to the monitoring of telemetry generated in large datacenters, a common demand in currently emerging scenarios is the need to process vast amounts of data with low latencies, generally performing the analysis as close to the data source as possible. Devices and sensors generate streams of data across a diversity of locations and protocols. That data usually reaches a central platform that stores and processes the streams. Processing can be done in real time, with transformations and enrichment happening on the fly, or after the data has been stored and organized in repositories. In the former case, stream processing technologies are required to operate on the data; in the latter, batch analytics and queries are commonly used. Stream processing platforms are required to be malleable and to absorb spikes caused by fluctuations in data generation rates. Data is usually produced as time series that have to be aggregated using multiple operators, with sliding windows being one of the most common abstractions used to process data in real time. To satisfy these demands, efficient stream processing techniques that aggregate data with minimal computational cost need to be developed. However, data analytics may require aggregating extensive windows of data. Approximate computing has been a central paradigm in data analytics for decades as a way to improve performance and reduce the resources needed, such as memory, computation time, bandwidth or energy. In exchange for these improvements, the aggregated results suffer from a level of inaccuracy that in some cases can be predicted and constrained.
This doctoral thesis aims to demonstrate that it is possible to have constant-time and memory-efficient aggregation functions with approximate computing mechanisms for constrained environments. To achieve this goal, the work is structured around three research challenges. First, we introduce a runtime that dynamically constructs data stream processing topologies from user-supplied code. These dynamic topologies are built on the fly using a data subscription model defined by the applications that consume data. The subscription-based programming model enables multiple users to deploy their own data-processing services. On top of this runtime, we present the Amortized Monoid Tree Aggregator (AMTA), a general sliding window aggregation framework that combines the following features: amortized O(1) time complexity and a worst case of O(log n) between insertions; a window aggregation mechanism and a window slide policy that are both user programmable; enforcement of the window sliding policy with amortized O(1) computational cost for single evictions and support for bulk evictions with cost O(log n); and a local memory requirement of O(log n). The framework can compute aggregations over multiple data dimensions, and has been designed to support decoupling computation and data storage through the use of distributed Key-Value Stores to keep window elements and partial aggregations. Motivated in particular by edge computing scenarios, we contribute the Approximate and Amortized Monoid Tree Aggregator (A2MTA). It is, to our knowledge, the first general-purpose programmable sliding window framework that combines constant-time aggregations with error-bounded approximate computing techniques.
A2MTA uses statistical analysis of the stream data to perform approximate aggregations, providing a critical reduction in the resources needed for massive stream data aggregation and an improvement in performance. The rapid evolution of data analytics platforms has resulted in an increasing demand for real-time processing of continuous data streams. From the Internet of Things to the monitoring of telemetry generated in large servers, a recurring demand in emerging scenarios is the need to process large amounts of data with very low latencies, generally processing the data as close to its sources as possible. Data is generated as continuous streams by devices that use a variety of locations and protocols. This processing can be done in real time, with transformations performed on the fly, in which case stream processing platforms are required. Stream processing platforms must absorb spikes in data rates. Data is generated as time series that are aggregated using multiple operators, with windows being the most common abstraction. To satisfy the required low latencies and malleability, the operators need to have minimal computational cost, even with extensive windows of data to aggregate. Approximate computing has for decades been a relevant paradigm in data analytics where the performance of different algorithms must be improved and their computation time, memory requirements, bandwidth or energy consumption reduced. In exchange for these improvements, the results may suffer from a lack of accuracy that can be estimated and controlled. This doctoral thesis aims to demonstrate that it is possible to have aggregation functions for stream processing that have constant time cost, are memory efficient and make use of approximate computing.
To achieve these goals, this thesis is divided into three challenges. First, we present an environment for the dynamic construction of data stream computing topologies using user code. These topologies are built using a stream subscription model, in which the data-consuming applications extend the topologies while they are running. This environment allows multiple entities to extend the same topology. On top of this environment, we present a general-purpose framework for the aggregation of data windows called AMTA (Amortized Monoid Tree Aggregator). This framework combines: amortized constant time for all operations, with a logarithmic worst case; programmability both in terms of aggregation and in terms of eviction of elements from the window. Bulk eviction of elements from the window is considered an atomic operation, with amortized constant cost; and it requires local memory space for O(log n) elements of the window. This framework can compute aggregations over multiple data dimensions, and has been designed to decouple the computation on the data from its storage, so that the window contents can be distributed across different machines. Motivated by edge computing, we contribute A2MTA (Approximate and Amortized Monoid Tree Aggregator). To our knowledge, it is the first general-purpose framework for window computation that combines constant cost for all its operations with approximate computing techniques with error control. A2MTA makes use of statistical analysis to perform aggregations with bounded error, critically reducing the resources needed to process large amounts of data.
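To make the idea of amortized constant-time sliding window aggregation over a monoid concrete, the following sketch uses the well-known two-stacks technique. This is a deliberately simpler, swapped-in technique and not the AMTA tree structure: it needs O(n) local memory rather than O(log n) and does not support bulk eviction. The class and method names are hypothetical.

```python
from typing import Any, Callable


class TwoStackWindow:
    """FIFO window aggregation with amortized O(1) insert, evict and query.

    `combine` must be associative and `identity` its neutral element
    (a monoid); commutativity is not required.
    """

    def __init__(self, combine: Callable[[Any, Any], Any], identity: Any) -> None:
        self.combine = combine
        self.identity = identity
        # back: newest elements; each entry stores the aggregate of the
        # whole back stack up to and including that element.
        self.back = []
        # front: oldest elements, oldest on top; each entry stores the
        # aggregate of that element and every newer element in front.
        self.front = []

    def insert(self, value: Any) -> None:
        agg = self.combine(self.back[-1][1], value) if self.back else value
        self.back.append((value, agg))

    def evict(self) -> Any:
        """Remove and return the oldest element."""
        if not self.front:
            # Flip: each element is moved at most once, hence amortized O(1).
            while self.back:
                value, _ = self.back.pop()
                agg = self.combine(value, self.front[-1][1]) if self.front else value
                self.front.append((value, agg))
        if not self.front:
            raise IndexError("evict from empty window")
        return self.front.pop()[0]

    def query(self) -> Any:
        """Aggregate of the whole window, oldest to newest."""
        f = self.front[-1][1] if self.front else self.identity
        b = self.back[-1][1] if self.back else self.identity
        return self.combine(f, b)


# Usage with a non-commutative monoid (string concatenation).
w = TwoStackWindow(lambda a, b: a + b, "")
for item in ("a", "b", "c"):
    w.insert(item)
oldest = w.evict()   # removes "a"
w.insert("d")
result = w.query()   # aggregate over the window b, c, d
```

Because the aggregation function and its identity are passed in, the same structure serves sums, maxima, or any other monoid, which mirrors the user-programmable aggregation the abstract describes.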

    Comparison of Clustering Algorithms for Learning Analytics with Educational Datasets

    Learning Analytics is becoming a key tool for the analysis and improvement of digital education processes, and its potential benefit grows with the size of the student cohorts generating data. In the context of Open Education, the potentially massive student cohorts and the global audience represent a great opportunity for significant analyses and breakthroughs in the field of learning analytics. However, these potentially huge datasets require proper analysis techniques, and different algorithms, tools and approaches may perform better in this specific context. In this work, we compare different clustering algorithms using an educational dataset. We start by identifying the most relevant algorithms in Learning Analytics and benchmark them to determine, according to internal validation and stability measurements, which algorithms perform better. We analyzed seven algorithms, and determined that K-means and PAM were the best performers among partition algorithms, and DIANA was the best performer among hierarchical algorithms

    Constant-time approximate sliding window framework with error control

    Stream Processing is a crucial element of the Edge Computing paradigm, in which a large number of devices generate data at the edge of the network. This data needs to be aggregated and processed on the move across different layers before reaching the Cloud. Therefore, defining Stream Processing services that adapt to different levels of resource availability is of paramount importance. In this context, Stream Processing frameworks need to combine efficient, low-complexity algorithms for managing sliding windows with the ability to adjust resource demands for different deployment scenarios, from very low capacity edge devices to virtually unlimited Cloud platforms. The Approximate Computing paradigm provides improved performance and adaptive resource demands in data analytics, at the price of introducing some level of inaccuracy that can be calculated. In this paper we present the Approximate and Amortized Monoid Tree Aggregator (A2MTA). It is, to our knowledge, the first general-purpose programmable sliding window framework that combines constant-time aggregations with error-bounded approximate computing techniques. It is well suited to adverse stream processing environments, such as resource-scarce multi-tenant edge computing. The framework can compute aggregations over multiple data dimensions, setting error bounds on any of them, and has been designed to support decoupling computation and data storage through the use of distributed Key-Value Stores to keep window elements and partial aggregations. This project is partially supported by the European Research Council (ERC), Spain under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595).
It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya, Spain under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). Peer Reviewed. Postprint (author's final draft).
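As a toy illustration of trading accuracy for resources under a known error bound, the sketch below approximates a mean by quantizing values into fixed-width buckets. It is not the A2MTA algorithm, whose statistical method is not detailed in the abstract; the class name `QuantizedMean` and the parameter `max_error` are assumptions made for this example.

```python
from collections import Counter


class QuantizedMean:
    """Approximate mean whose absolute error is at most `max_error`.

    Each value is rounded to the nearest multiple of 2 * max_error, so
    every stored value is off by at most max_error (up to floating-point
    rounding) and so is the mean. Memory grows with the number of
    distinct buckets, not with the number of values.
    """

    def __init__(self, max_error: float) -> None:
        self.width = 2 * max_error
        self.counts = Counter()  # bucket index -> number of values
        self.n = 0

    def insert(self, x: float) -> None:
        self.counts[round(x / self.width)] += 1
        self.n += 1

    def mean(self) -> float:
        total = sum(bucket * self.width * c for bucket, c in self.counts.items())
        return total / self.n


values = [20.13, 20.48, 19.97, 21.02, 20.51]
approx = QuantizedMean(max_error=0.25)
for v in values:
    approx.insert(v)
exact_mean = sum(values) / len(values)
approx_mean = approx.mean()
```

A larger `max_error` merges more values into the same bucket, so fewer counters are kept at the cost of a looser bound on the result.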

    Creació d'un sistema multipista de 4 pistes de so per al DSI Sound

    The purpose of this program is the creation of radio reports and news pieces as audio projects, using several sound tracks, for later broadcast. The application is aimed at journalists who work in a radio newsroom and who do not necessarily have extensive computer knowledge. The program has a simple interface that allows working with several tracks to which previously recorded audio clips can be added. Its main features are: the ability to add, move, define an interval for, or delete sound clips within a track; a Drag & Drop method for adding clips from any location; playback of one or several tracks simultaneously; independent playback of a single clip; manual adjustment of the envelope of the audio clips; automatic adjustment of the envelope according to a track prioritization system; 9 different playback modes based on a selected section of the project; saving of projects in XML format for later retrieval, or as a single audio file; undo and redo functions; and zoom on the tracks. The programming has been divided into two distinct parts, the core and the interface. The core can be considered an independent program with no visible part, while the interface is what allows the user to use most of the core's functions through a graphical representation of its elements. The whole has been programmed in Visual Basic 6.0; specifically, the core elements were created using classes and collections, while user controls were used in the interface.

    La enseñanza y aprendizaje de las ciencias sociales y el desarrollo del pensamiento social

    This research project is the result of work whose purpose is to contribute to the development of social thinking through a didactic innovation on the representations of justice and citizenship among students in basic and secondary education. It sought to identify the representations that learners hold on the subject in school, family and social settings, in order to understand and transform them. For this research, a didactic unit was designed and carried out that made it possible to obtain empirical data from classroom experimentation, through different information-gathering techniques such as questionnaires, written productions and participant observation, among others, contributed by the study groups. The data obtained were transcribed to be analysed, interpreted and coded in three stages, through open, axial and selective coding, which made it possible to delimit and extract the most relevant information to create a frame of reference for understanding the different representations of justice and citizenship. This type of work makes it possible to ground the representations of justice and citizenship in the students' prior knowledge in a critical and participatory way, starting from the pedagogical practice of the participating teachers. It is the task of teachers to try to contribute, through the teaching and learning of the social sciences, to the development of social thinking by means of diverse strategies that allow students to develop argumentative and propositional competences that enable them to take a position and take on the challenges of daily life.